Academic Intelligence · Curated Daily

Explore the Frontiers of Global Research

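AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefs.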

01.
arXiv (CS.LG) 2026-06-12

Feature-preserving Latent-EnKF for Data Assimilation of Flows with Shocks

arXiv:2606.12559v1 Announce Type: cross Abstract: The ensemble Kalman filter (EnKF) is widely adopted for sequential data assimilation, but fails for solutions with discontinuities, such as shocks in compressible flows. Uncertainty in shock location induces multimodal ensemble statistics that violate the Gaussian assumptions underlying the EnKF, producing large-scale spurious oscillations in the analysis state. We introduce a feature-preserving latent-EnKF that performs the ensemble update in a learned low-dimensional latent space, where shock and flow features admit a smooth manifold representation, thereby preserving sharp features during EnKF analysis. The updated latent state is mapped back to physical state through a shared decoder for all ensemble members. The algorithm eliminates the member-specific ordered training and positivity flooring used in prior approaches. Numerical experiments on a Sod shock tube and Mach 2 shock interaction with a 2D cylinder, using sparse and noisy observations, show accurate feature recovery of shocks and contact discontinuities without spurious oscillations.
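
The multimodality described in this abstract can be seen in a toy setting. The sketch below is not the paper's method: it builds a hypothetical ensemble of piecewise-constant profiles that differ only in shock location (the states 1.0 and 0.125, the grid size, and the shock positions are all illustrative assumptions) and shows that, at a cell inside the uncertain region, members take one of two values while the ensemble mean takes a value that no member has.

```python
def step_profile(shock_pos, n_cells=10, left=1.0, right=0.125):
    """Piecewise-constant profile on [0, 1): `left` upstream of the shock, `right` downstream."""
    return [left if (i + 0.5) / n_cells < shock_pos else right for i in range(n_cells)]

# Hypothetical ensemble: identical states except for the shock location.
shock_positions = [0.3, 0.4, 0.5, 0.6]
ensemble = [step_profile(s) for s in shock_positions]

# Pointwise ensemble mean, the quantity a Gaussian update is centred on.
mean_profile = [sum(column) / len(ensemble) for column in zip(*ensemble)]

# Distinct values taken by the members at one cell inside the uncertain region.
cell = 5
values_at_cell = sorted({member[cell] for member in ensemble})

print(values_at_cell)      # only the two shock states appear
print(mean_profile[cell])  # an intermediate value that matches neither state
```

A Gaussian fitted to the values at that cell places most of its mass between the two states, which is the mismatch the abstract attributes to shock-location uncertainty.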

02.
arXiv (CS.LG) 2026-06-12

Limits of spectral learning under noise

arXiv:2606.13067v1 Announce Type: new Abstract: Learning functional relationships from noisy data is a central problem in scientific inference. Spectral methods approximate unknown functions by expanding them in a basis and estimating the corresponding coefficients from data, but the stability of these coefficients under noise remains poorly understood. Here we study supervised regression with additive label noise using sparse spectral representations across multiple bases and dimensions. We show that noise induces a predictable drift in the learned coefficient vector whose magnitude depends on the effective number of active spectral modes. After whitening the empirical feature geometry, we derive a closed-form expression for the overlap between noisy and noiseless coefficient vectors, revealing a universal degradation curve governed by a single intrinsic noise scale. Numerical experiments across Fourier, Legendre, Bessel, and Haar bases confirm the theoretical prediction. The results demonstrate that spectral learning exhibits a fundamental noise threshold beyond which coefficient estimates become unstable, placing intrinsic limits on recovering functional structure from noisy data.
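
As a rough illustration of the overlap between noisy and noiseless coefficient vectors, the sketch below fits Fourier coefficients on a uniform grid and reports their cosine similarity. It does not reproduce the paper's whitening step or its closed-form curve; the target function, sample count, mode count, and noise levels are assumptions chosen for the example.

```python
import math
import random

def fourier_features(x, n_modes):
    """Cosine and sine features for modes 1..n_modes at sample point x."""
    feats = []
    for m in range(1, n_modes + 1):
        feats.append(math.cos(2 * math.pi * m * x))
        feats.append(math.sin(2 * math.pi * m * x))
    return feats

def fit_coefficients(ys, n_modes):
    """Least-squares coefficients on a uniform grid, where these features are orthogonal
    (valid while n_modes is below half the number of samples)."""
    n = len(ys)
    coeffs = [0.0] * (2 * n_modes)
    for k, y in enumerate(ys):
        for j, f in enumerate(fourier_features(k / n, n_modes)):
            coeffs[j] += 2.0 * y * f / n
    return coeffs

def overlap(a, b):
    """Cosine similarity between two coefficient vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))

def noisy_overlap(sigma, n_samples=64, n_modes=8, seed=0):
    """Overlap between coefficients fitted to clean labels and to labels with Gaussian noise."""
    rng = random.Random(seed)
    xs = [k / n_samples for k in range(n_samples)]
    clean = [math.sin(2 * math.pi * x) + 0.5 * math.cos(6 * math.pi * x) for x in xs]
    noisy = [y + rng.gauss(0.0, sigma) for y in clean]
    return overlap(fit_coefficients(clean, n_modes), fit_coefficients(noisy, n_modes))

for sigma in (0.0, 0.1, 1.0, 5.0):
    print(sigma, round(noisy_overlap(sigma), 3))
```

Raising `n_modes` while holding `sigma` fixed spreads more noise energy across the coefficient vector, which is one way to probe the dependence on the number of active modes that the abstract mentions.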

03.
bioRxiv (Bioinfo) 2026-06-15

RepGene: Toward a Unified Gene Representation Space Robust to Missing Biological Views

Genes can be described through multiple heterogeneous biological views, including genomic sequence, transcript sequence, protein sequence, textual knowledge, and single-cell expression context, yet existing gene embeddings remain largely modality-specific and difficult to compare or reuse when many views are unavailable. We study a narrower but practically important question: whether pretrained embeddings from these distinct sources can be organized into a shared gene representation interface that remains usable under severe missing-modality conditions. To investigate this question, we introduce RepGene, a lightweight single-branch framework that combines modality adapters, a shared encoder, presence-aware fusion, and self-supervised cross-view objectives to map five biological views into one latent space. Our goal is not to claim a new multimodal learning principle or to establish superiority over all simpler fusion strategies, but to provide an initial technical instantiation for testing whether such a shared interface is feasible in a fixed-feature setting. Under a two-stage protocol in which RepGene is trained self-supervised on frozen upstream embeddings and evaluated by downstream linear probing, we find preliminary evidence that the learned representation is broadly competitive in the full-modality setting and remains informative when only partial modality subsets are observed at inference time. The strongest signal in our study is robustness under missing views: average performance changes are often limited when one modality is removed, and even single-view inference remains non-trivial in the evaluated benchmark regime. These results do not resolve unified biological representation learning, and they should be interpreted in light of incomplete simple-fusion baselines, limited architectural ablation, benchmark dependence, and possible upstream feature exposure. We therefore position RepGene as a feasibility study and a starting point for stronger comparisons, broader benchmarks, and leakage-aware validation.

04.
arXiv (CS.AI) 2026-06-18

DecNefSimulator: A Modular, Interpretable Framework for Decoded Neurofeedback Simulation Using Generative Models

arXiv:2511.14555v4 Announce Type: replace-cross Abstract: Decoded Neurofeedback (DecNef) is a promising non-invasive approach to brain modulation with wide-ranging applications in neuromedicine and cognitive neuroscience. However, progress in DecNef research remains constrained by subject-dependent learning variability, reliance on indirect measures to quantify progress, and the high cost and time demands of experimentation. We present DecNefSimulator, a modular and interpretable simulation framework that formalizes DecNef as a machine learning problem. Beyond providing a virtual laboratory, DecNefSimulator enables researchers to model, analyze and understand neurofeedback dynamics. Using latent variable generative models as simulated participants, DecNefSimulator allows direct observation of internal cognitive states and systematic evaluation of how different protocol designs and subject characteristics influence learning. We demonstrate how this approach can (i) reproduce empirical phenomena of DecNef learning, (ii) identify conditions under which DecNef feedback fails to induce learning, and (iii) guide the design of more robust and reliable DecNef protocols in silico before human implementation. In summary, DecNefSimulator bridges computational modeling and cognitive neuroscience, offering a principled foundation for methodological innovation, robust protocol design, and ultimately, a deeper understanding of DecNef-based brain modulation.

05.
arXiv (CS.CV) 2026-06-16

SLUM-i: Semi-supervised Learning for Urban Mapping of Informal Settlements and Data Quality Benchmarking

Rapid urban expansion has fueled the growth of informal settlements in major cities of low- and middle-income countries, with Lahore and Karachi in Pakistan and Mumbai in India serving as prominent examples. However, large-scale mapping of these settlements is severely constrained not only by the scarcity of annotations but by inherent data quality challenges, specifically high spectral ambiguity between formal and informal structures and significant annotation noise. We address this by introducing a benchmark dataset for Lahore, constructed from scratch, along with companion datasets for Karachi and Mumbai, which were derived from verified administrative boundaries, totaling approximately 900 km² of urban area. This collection is supplemented by four cities from prior literature across Sub-Saharan Africa and Latin America, with comprehensive data quality assessments provided for each city. We also propose a semi-supervised segmentation framework designed to mitigate the class imbalance and distribution mismatch inherent in standard semi-supervised learning pipelines. Our method integrates a Class-Aware Adaptive Thresholding mechanism that dynamically adjusts confidence thresholds to prevent minority class suppression, and a DINOv2-based unlabeled pool filter that removes out-of-distribution tiles prior to training to reduce covariate shift. Extensive experiments across seven cities spanning three continents, repeated over five random seeds, demonstrate gains of up to +5.9 pp mIoU over state-of-the-art semi-supervised baselines, with both components being architecture-agnostic and adding no inference overhead.

06.
arXiv (CS.AI) 2026-06-11

Continual Quadruped Robots Coordination via Semantic Skill Discovery

arXiv:2606.08102v2 Announce Type: replace-cross Abstract: Multi-quadruped coordination has attracted increasing attention due to its enhanced payload capacity, broader contact coverage, and improved adaptability to challenging tasks. Existing methods for multi-quadruped manipulation typically focus on predefined or closed task families, often relying on multi-agent reinforcement learning (MARL) to train task-specific coordination policies. However, such methods struggle in open-ended continual learning settings, where tasks arrive sequentially and robots are expected to acquire new coordination skills while reusing previously learned ones without catastrophic forgetting. To address this challenge, we propose Conquer, a semantic skill-library framework that formulates continual multi-quadruped coordination as a retrieve-adapt-update process. First, to accommodate varying team sizes across tasks, we design a team-structured Self-Allies-Goal (SAG) backbone that supports variable-cardinality robot teams by explicitly modeling each robot's own state, teammate context, and task goal. For each incoming task, Conquer constructs a task-level semantic descriptor from pre-execution information and retrieves a relevant skill from the library for adaptation. After successful execution, Conquer updates the skill library by extracting trajectory-level semantic descriptors and organizing them according to semantic distance, thereby enabling continual skill accumulation and cross-task knowledge transfer. Simulation experiments show that Conquer achieves a final average success rate of 95.6%, demonstrating strong forward transfer and negligible catastrophic forgetting. Real-world rollouts on Unitree Go2 teams further validate the deployment feasibility of Conquer for practical multi-quadruped coordination. Simulation and real-robot demonstration videos are available at: https://conquer-project.pages.dev/.

07.
arXiv (CS.CV) 2026-06-16

Learned JPEG Compression for DNN Vision

JPEG, a lossy image compression technique designed for human viewers, has maintained its dominance for decades. However, in the era of artificial intelligence (AI), a substantial portion of image data, often compressed by JPEG, is and will continue to be consumed by deep neural networks (DNNs) instead of humans, thus creating a need to optimize JPEG for DNN inference performance. To this end, we propose learned JPEG compression for DNN vision (J4D), a novel training framework for determining JPEG encoding parameters to minimize compression rate while maximizing DNN inference performance. The major challenge of solving this optimization problem lies in representing the JPEG codec and compression rate in closed form. By incorporating a differentiable soft quantizer based on a probabilistic quantization scheme, we not only obtain a differentiable proxy for the JPEG codec, but are also able to compute the entropy of the coded source analytically, which is a close estimate of the actual compression rate. Equipped with both the differentiable JPEG codec and the information-theoretic rate estimator, we are then able to solve the aforementioned optimization problem with backpropagation. After training, the learned encoding parameters will be subsequently used in actual JPEG encoding based on probabilistic quantization. Extensive experimental results across multiple datasets and DNN architectures demonstrate that J4D consistently and significantly outperforms the default JPEG and other competitive JPEG codecs optimized for DNNs. Notably, compared to the default JPEG, J4D achieves an increase in accuracy by as much as 11.60% at the same rate, or a reduction of compression rate up to 80.05% at the same accuracy. Additionally, with the help of J4D, we show the potential to design universal JPEG encoding parameters for various DNN architectures for the first time.
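
The abstract does not spell out J4D's quantizer, so the following is only a stand-in: it uses stochastic rounding as one simple probabilistic quantization scheme and computes the entropy of the resulting symbol distribution as a rate estimate. The coefficient values and step sizes are hypothetical, and the function names are illustrative rather than taken from the paper.

```python
import math
from collections import defaultdict

def rounding_probs(value, step):
    """Stochastic rounding of value/step to its two neighbouring integers.

    The probability of rounding up equals the fractional part, so the expected
    quantised value equals value/step and varies smoothly with `step`.
    """
    scaled = value / step
    low = math.floor(scaled)
    p_up = scaled - low
    return {low: 1.0 - p_up, low + 1: p_up} if p_up > 0 else {low: 1.0}

def expected_rate_bits(values, step):
    """Entropy in bits per coefficient of the quantised symbols, averaged over the inputs."""
    mass = defaultdict(float)
    for v in values:
        for symbol, p in rounding_probs(v, step).items():
            mass[symbol] += p / len(values)
    return -sum(p * math.log2(p) for p in mass.values() if p > 0)

coefficients = [-3.2, -1.1, 0.4, 2.7, 5.9]  # hypothetical transform coefficients
for step in (1.0, 4.0, 16.0):
    print(step, round(expected_rate_bits(coefficients, step), 3))
```

Because the symbol probabilities are available in closed form, the entropy can be evaluated without sampling, which is the property that makes a rate term of this kind usable inside a gradient-based objective.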

08.
arXiv (CS.CL) 2026-06-16

A large-scale pipeline for LLM-assisted corpus annotation: variation and change in the English consider construction

As natural language corpora expand at an unprecedented rate, manual annotation remains a significant methodological bottleneck in corpus linguistic work. We address this challenge by presenting a scalable pipeline for automating grammatical annotation in voluminous corpora using large language models (LLMs). Unlike previous supervised and iterative approaches, our method employs a four-phase workflow: prompt engineering, pre-hoc evaluation, automated batch processing, and post-hoc validation. We demonstrate the pipeline's accessibility and effectiveness through a diachronic case study of variation in the English evaluative consider construction (consider X as/to be/Ø Y). We annotate 143,933 'consider' concordance lines from the Corpus of Historical American English (COHA) via the OpenAI API in under 60 hours, achieving 98%+ accuracy on two sophisticated annotation procedures. A Bayesian multinomial GAM fitted to 44,527 true positives of the evaluative construction reveals previously undocumented genre-specific trajectories of change, enabling us to advance new hypotheses about the relationship between register formality and competing pressures of morphosyntactic reduction and enhancement. Our results suggest that LLMs can perform a range of data preparation tasks at scale with minimal human intervention, unlocking substantive research questions previously beyond practical reach, though implementation requires attention to costs, licensing, and other ethical considerations.

09.
arXiv (CS.CV) 2026-06-16

MamBOA: State-Space Architecture for Video Recognition

Fine-grained action recognition demands temporal reasoning that general-purpose architectures address through different cost-accuracy tradeoffs: 3D dense operators couple computation to the input volume, while difference-based methods approximate motion through rigid, hand-crafted subtraction of uncontextualized features - each reflecting a deliberate design choice with corresponding limitations in expressiveness or flexibility. We present MamBOA, a backbone-agnostic temporal framework built upon a novel interleaved scan structure that recasts the selective state-space recurrence (S6) as a native motion synthesizer. By interleaving consecutive feature representations extracted from a pretrained backbone into a single alternating sequence, the proposed scan structurally drives the recurrence to encode both temporal observations of each position within a shared hidden state, separated by only a single decay step - rendering the inter-frame transition an intrinsic component of the state dynamics rather than an externally computed quantity. A cascade of dedicated alignment and decoding operations then distills this joint encoding into an explicit motion representation, which a dual-path pooling mechanism adaptively aggregates by balancing attention-driven selection with uniform temporal coverage. The framework interfaces seamlessly with CNN, Transformer, and Mamba backbone families, adding only ~2.1 GFLOPs per feature pair. On Diving48, MamBOA achieves 85.02% Top-1 accuracy with an image-pretrained backbone and 86.24% with a video-pretrained backbone processing the entire video in a single forward pass - demonstrating that structurally induced state-space dynamics constitute a principled and general foundation for motion modeling.

10.
medRxiv (Medicine) 2026-06-15

Cost-Performance Evaluation of Large Language Models for Aspect-Based Sentiment Analysis of HCAHPS Patient Comments: A Validation Study

Background: Hospital Consumer Assessment of Healthcare Providers and Systems (HCAHPS) free-text comments contain actionable feedback, but timely, scalable, and affordable sentiment analysis remains challenging for health systems that rely on third-party vendors. Objectives: To evaluate cost-performance tradeoffs between a cost-optimized and a flagship large language model (LLM) for aspect-based sentiment analysis of HCAHPS comments, using human inter-rater agreement as a reproducibility benchmark. Methods: We analyzed 512 free-text HCAHPS comments collected from two community hospitals in calendar year 2023. Six trained reviewers (medical students, recent medical graduates, and practicing internists) independently assigned positive, negative, or neutral labels to each comment-aspect pair; the majority label among three reviewers formed the consensus reference standard. Two OpenAI models - GPT-5-nano (cost-optimized) and GPT-5 (flagship) - were prompted in a zero-shot setting via the OpenAI API. We calculated pairwise Cohen's κ to establish a human inter-rater baseline, then compared each model's labels to the consensus using Cohen's κ, accuracy, weighted F1, and per-call cost and latency. Results: Mean human inter-rater agreement was κ = 0.79 (substantial). Both LLMs exceeded this baseline (cost-optimized κ = 0.85; flagship κ = 0.85) with nearly identical accuracy (0.92) and weighted F1 (0.93 vs. 0.93). Performance was strong on positive (F1 ~ 0.97) and negative (F1 ~ 0.90) classes but poor on the underrepresented neutral class (F1

11.
arXiv (CS.LG) 2026-06-19

Predictability as a Fine-Grained Measure for Privacy

arXiv:2606.20546v1 Announce Type: new Abstract: Differential privacy (DP) ensures rigorous individual-level privacy guarantees against even the most knowledgeable attackers, but its worst-case nature can impose a costly privacy-accuracy tradeoff. We introduce privacy via predictability, a fine-grained framework that explicitly incorporates the attacker's core knowledge, a compromised portion of the dataset generated by a stochastic process, and a specified family of queries. Predictability measures privacy leakage as the incremental gain in an attacker's ability to predict sensitive information about unknown individuals after observing the algorithm's output, beyond what can already be inferred from the compromised data. We show that predictability and DP are generally incomparable: each can be small while the other is large. However, in the worst-case regime where all but one individual is compromised, and all binary queries are considered sensitive, predictability implies mutual-information DP. More generally, predictability provides a finer-grained privacy metric tailored to specific sensitive information and specific attacker models. We introduce a general framework, using the generalized method of moments (GMM), to analyze asymptotic predictability when the compromised data is generated by a stationary, ergodic, mixing process. Using this analysis, we derive a predictability-calibrated output perturbation scheme for ERM. Our approach is complementary to DP and can be used alongside DP to provide fine-grained privacy control.

12.
medRxiv (Medicine) 2026-06-22

Association of Digoxin Use at Norwood Discharge with Fontan Completion: A Study from the Pediatric Heart Network Public Dataset

Background: Digoxin use after the Norwood procedure has been associated with improved interstage survival in hypoplastic left heart syndrome and related conditions. Whether this benefit translates into improved longer-term outcomes through staged palliation remains unknown. We aimed to determine the association of digoxin use at Norwood discharge with transplant-free survival and Fontan completion. Methods: We conducted a retrospective cohort study using the Pediatric Heart Network (PHN) Single Ventricle Reconstruction trial public dataset, including 549 infants enrolled at 15 North American centers between 2005 and 2008. Competing risk analysis was used to evaluate Fontan completion and Cox regression to assess death or transplantation within 6 years after the Norwood procedure. Mixed-effects models compared pre-Fontan hemodynamic and echocardiographic right ventricular indices between patients treated with and without digoxin after accounting for center clustering and adjustment for sex, shunt type, heart failure medications at Norwood discharge, and census block poverty level. Results: The 6-year cumulative incidence of Fontan completion was higher among patients discharged on digoxin than among those not receiving digoxin (82% vs 71%; p = 0.013). Competing-risk analysis accounting for death and transplant demonstrated a greater likelihood of Fontan completion among digoxin users (aHR 1.31; 95%CI 1.09-1.58; p = 0.005), without significant difference in the hazard of death or transplant (aHR 0.78; 95%CI 0.53-1.15; p = 0.208). No significant differences in pre-Fontan hemodynamic or echocardiographic indices were observed between groups. Initiation of digoxin post Stage II procedure was not associated with improved survival or likelihood to complete Fontan. Conclusion: Digoxin use at the time of Norwood discharge was associated with a 30% greater likelihood of Fontan completion by 6 years, without accompanying improvement in transplant-free survival. These findings extend prior observations of improved interstage outcomes associated with digoxin use and suggest that treatment may facilitate progression through staged palliation.

13.
arXiv (CS.AI) 2026-06-12

Will AI Agents Free Us From Meaningless Work? A Human-Centered Analysis

arXiv:2606.12430v1 Announce Type: cross Abstract: Some claim that AI agents will free workers from the boring parts of their jobs, yet little is known about how workers themselves identify which tasks should be automated. Prior research focuses on occupations, overlooking that workers experience varying levels of meaning across tasks within the same role. We address this gap with a task-level analysis grounded in Graeber's theory of bullshit jobs. Using ratings from 202 workers on 171 workplace tasks, we (1) validate a five-item scale of perceived bullshitness, (2) show that perceived bullshitness strongly predicts desire for AI delegation, and (3) find that such tasks are also seen as requiring less human oversight. Together, these findings suggest that tasks perceived as bullshit are natural candidates for AI delegation, aligning worker preferences with perceived feasibility.

14.
arXiv (CS.CL) 2026-06-11

PRInTS: Reward Modeling for Long-Horizon Information Seeking

Information-seeking is a core capability for AI agents, requiring them to gather and reason over tool-generated information across long trajectories. However, such multi-step information-seeking tasks remain challenging for agents backed by language models. While process reward models (PRMs) can guide agents by ranking candidate steps at test-time, existing PRMs - designed for short reasoning with binary judgment - cannot capture richer dimensions of information-seeking steps, such as tool interactions and reasoning over tool outputs, nor handle the rapidly growing context in long-horizon tasks. To address these limitations, we introduce PRInTS, a generative PRM trained with dual capabilities: (1) dense scoring based on the PRM's reasoning across multiple dimensions of step quality (e.g., interpretation of tool outputs, tool call informativeness) and (2) trajectory summarization that compresses the growing context while preserving essential information for step evaluation. Extensive evaluations across FRAMES, GAIA (levels 1-3), and WebWalkerQA (easy-hard) benchmarks on multiple models reveal that best-of-n sampling with PRInTS enhances information-seeking in open-source models as well as specialized agents, matching or surpassing frontier models with a much smaller backbone agent and outperforming other strong reward modeling baselines.

15.
arXiv (CS.CV) 2026-06-16

Self-Questioning Vision-Language Models: Reinforcement Learning for Compositional Visual Reasoning

Vision-Language Models (VLMs) are AI systems that process both images and text, yet they often struggle with compositional visual reasoning questions that require chaining multiple steps together, such as identifying objects, counting them, and comparing the results. Existing approaches improve this reasoning by training models on human-written step-by-step explanations, but creating these annotations is expensive and difficult to scale. We propose a self-questioning framework that trains a VLM to break visual questions into smaller sub-questions and answer each one before producing a final response, using a reinforcement learning algorithm called Group Relative Policy Optimization (GRPO). The model is never shown examples of how to decompose questions; it discovers this behavior on its own, guided by a reward signal that scores whether the output contains sub-questions and whether the final answer is correct. We apply this framework to a 3-billion-parameter model, training on both synthetic scenes of geometric shapes (CLEVR) and real-world photographs (A-OKVQA). On A-OKVQA, both self-questioning and standard reinforcement learning substantially improve accuracy over the untrained model (52.2% and 51.6% vs. 46.8%). We introduce the first self-questioning VLM by rewarding not only the final answer, as in standard RL, but also the generation of intermediate sub-questions, enabling it to discover compositional decomposition strategies. These results suggest that teaching AI systems to ask themselves intermediate questions is a promising strategy for complex visual reasoning, particularly when the difficulty of a question warrants explicit step-by-step decomposition.
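
The reward described in this abstract, combined with GRPO's group-relative normalization, can be sketched as follows. The reward weights (`w_format`, `w_answer`) and the four sampled outcomes are assumptions made for illustration; the abstract states only that the reward scores the presence of sub-questions and the correctness of the final answer.

```python
import statistics

def reward(has_subquestions, answer_correct, w_format=0.5, w_answer=1.0):
    """Scalar reward: a format term for emitting sub-questions plus a correctness term.

    The weights are hypothetical; the abstract does not give them.
    """
    return w_format * float(has_subquestions) + w_answer * float(answer_correct)

def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardise each reward against its own sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Four hypothetical responses sampled for the same image-question pair.
outcomes = [(True, True), (True, False), (False, True), (False, False)]
rewards = [reward(sub, ok) for sub, ok in outcomes]
advantages = group_relative_advantages(rewards)

for outcome, adv in zip(outcomes, advantages):
    print(outcome, round(adv, 3))
```

Since advantages are computed relative to the group, a response is reinforced only when it scores above its siblings; a group in which every response earns the same reward contributes no learning signal.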

16.
arXiv (CS.AI) 2026-06-18

Towards Multi-Agent-Simulation-Based Community Note Evaluation

arXiv:2606.18268v1 Announce Type: cross Abstract: Community-based fact-checking that relies on cross-consensus is expanding rapidly on social media platforms. However, the delay and low ratio of cross-consensus community fact-checks rated by human contributors remain a significant challenge. To address this, we first created ComRate, a large-scale dataset comprising 2.5 million community notes and over 209 million ratings sourced from X. We then propose MultiCom, a persona-guided multi-agent rating framework for community note evaluation. MultiCom simulates a diverse rater population by clustering contributors in a matrix-factorized rater space and prompting persona agents to generate structured assessments based on the official community notes rating schema. These agents output structured and explainable judgments, such as confidence, agreement signals and reasons. An out-of-fold calibrated aggregation algorithm combines features such as raw votes and diagnostic reason signals for reliable prediction. Extensive evaluations demonstrate that MultiCom outperforms alternative methods, achieving an average accuracy of 84.7% (balanced accuracy 68.3%, macro-F1 60.1%) on the evaluation set.
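
The gap between accuracy and balanced accuracy reported here is typical of imbalanced label sets. The sketch below computes the three metrics from their standard definitions on a hypothetical two-class example (the labels "n" and "h" and the counts are invented for illustration and are not drawn from ComRate).

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, balanced accuracy, macro-F1) over the classes present in y_true."""
    labels = sorted(set(y_true))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    recalls, f1s = [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        recalls.append(tp / (tp + fn))            # per-class recall
        f1s.append(2 * tp / (2 * tp + fp + fn))   # per-class F1
    return accuracy, sum(recalls) / len(labels), sum(f1s) / len(labels)

# Hypothetical imbalanced set: 90 majority-class notes ("n"), 10 minority-class notes ("h").
y_true = ["n"] * 90 + ["h"] * 10
y_pred = ["n"] * 88 + ["h"] * 2 + ["h"] * 3 + ["n"] * 7

acc, bal_acc, macro_f1 = classification_metrics(y_true, y_pred)
print(round(acc, 3), round(bal_acc, 3), round(macro_f1, 3))
```

Accuracy is dominated by the majority class, whereas balanced accuracy and macro-F1 weight each class equally, so weak minority-class performance pulls them well below the headline accuracy.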

17.
medRxiv (Medicine) 2026-06-23

Associations Between Social Responsiveness and Sleep Disruption are Modulated by Chronotype in Early Adolescence: Cross-Sectional and Prospective Findings from 10,108 Participants of the Adolescent Brain and Cognitive Development (ABCD) Study

Background: Sleep disruption is prevalent in people with neurodevelopmental disorders such as autism, but it is not clear whether it occurs as an endophenotype or secondary to other behaviours. The ABCD Study is a population-based longitudinal study that monitors the health, demography and lifestyle of over 11,000 children in the US. In this study we leverage these data to investigate whether traits consistent with autism (social responsiveness) are associated with sleep disruption independent of lifestyle and other behavioural measures. Methods: Autistic traits were assessed using the Social Responsiveness Scale at age 11, and sleep disruption and behavioural outcomes were assessed at ages 11 and 13 years using the Sleep Disturbance Scale and the Child Behaviour Check List, respectively. Demographic, health and lifestyle-related variables were assessed by caregiver questionnaires. Regression models were applied to investigate associations between autistic traits and sleep outcomes. Results: There was a significant cross-sectional association between sleep disturbance and SRS at age 11 years old that was independent of sex, ethnicity, socioeconomic position, physical activity, sedentary behaviour and anxiety/depression (β = 0.12, 95% CI (0.07, 0.17); p < 0.001), that persisted at age 13, and that was modulated by chronotype, with evening types showing a stronger association. Discussion: Social responsiveness assessed in early adolescence (age 11) was associated with sleep disruption independent of multiple confounding factors and was prospectively associated with sleep disruption at age 13 years. These findings contribute to the evidence that disruption of sleep and circadian timing may have a primary role in the neurobiological mechanisms that mediate autistic traits.

18.
arXiv (CS.LG) 2026-06-16

Time-Varying Audio Effect Modeling by End-to-End Adversarial Training

arXiv:2512.15313v2 Announce Type: replace-cross Abstract: Deep learning has become a standard approach for the modeling of audio effects, yet strictly black-box modeling remains problematic for time-varying systems. Unlike time-invariant effects, training models on devices with internal modulation typically requires the recording or extraction of control signals to ensure the time-alignment required by standard loss functions. This paper introduces a Generative Adversarial Network (GAN) framework to model such effects using only input-output audio recordings, without requiring modulation signal extraction. We propose a convolutional-recurrent architecture trained via a two-stage strategy: an initial adversarial phase allows the model to learn the distribution of the modulation behavior without strict phase constraints, followed by a supervised fine-tuning phase where a State Prediction Network (SPN) estimates the initial internal states required to synchronize the model with the target. Additionally, a new metric based on chirp-train signals is developed to quantify modulation accuracy. Experiments modeling a vintage hardware phaser demonstrate the method's ability to capture time-varying dynamics in a fully black-box context.

19.
arXiv (CS.AI) 2026-06-24

When Preferences Fail to Become Incentives: A Utility-Behavior Gap in Large Language Models

arXiv:2606.22974v2 Announce Type: replace Abstract: Recent work on preference elicitation in large language models (LLMs) has demonstrated that, when given a series of choices between two outcomes, LLMs reveal a coherent, model-specific utility structure. Notably, this structure often includes preferences that the models' trainers did not intend, such as valuing people of some nationalities above others, raising the possibility that LLMs might be forming emergent, misaligned goals, which, if true, would have major safety implications. However, the choice paradigms in which these preferences are observed are not reflective of real-world situations in which misaligned behavior would be a practical concern. Therefore, we design an experimental paradigm to probe whether these preferences serve as motivations for LLM behavior in realistic scenarios. First, we reproduce prior findings on consistent preference elicitation. Next, we create a set of common writing tasks - essays, grant proposal abstracts, incident postmortems, and translations - where quality can be assessed by a blind, independent LLM judge panel. Then, we demonstrate that LLMs can be motivated via direct exhortation and other explicit cues to modulate their output quality on these tasks. Finally, we probe whether utilities inferred from explicitly reported preferences can shift output quality on these tasks by offering LLMs high-utility incentives for high-quality outputs. In all tasks, across all models tested, offering LLMs outcomes that they report in the choice paradigm as being highly preferred does not lead them to create higher quality outputs than offering them dispreferred outcomes, or even no outcomes at all. We conclude that the existence of coherent preferences as demonstrated in choice paradigms should not be taken as evidence that those preferences have incentive value for the models or affect their behavior in other contexts.

20.
arXiv (CS.AI) 2026-06-12

Agentic Large Language Models for Automated Structural Analysis of 3D Frame Systems

arXiv:2606.06525v2 Announce Type: replace-cross Abstract: Large language models (LLMs) have emerged as powerful foundation models with strong reasoning capabilities across domains. Beyond reactive text generation, agentic LLMs enable autonomous workflow execution through modular task decomposition and coordinated tool use. In structural engineering, recent efforts have developed agentic LLMs for automated analysis of plane frames. However, their extension to 3D frames remains underexplored due to challenges in irregular geometric representation, topological consistency, and long-horizon reasoning. This paper proposes an agentic LLM framework for automated structural analysis of 3D frames from natural language inputs. Irregular 3D frames are represented by projection onto a 2D plan, where orthogonal gridlines define spatial coordinates and a matrix of story counts encodes the vertical extrusion of each grid cell. Building on this representation, the framework establishes a multi-agent pipeline: a problem analysis agent parses input into structured JSON; a floor decomposition agent derives the spatial layout of each floor; the 3D geometry is assembled by node, girder, slab, and column agents; support and load agents assign boundary and loading conditions; and code translation agents generate executable SAP2000 scripts. Evaluated on ten representative 3D frames, the proposed framework achieves an average accuracy of 90% across repeated trials, demonstrating consistent and reliable performance.
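The plan-plus-story-matrix representation described in this abstract can be illustrated with a minimal sketch. The abstract does not give the paper's JSON schema or agent interfaces, so everything here is an assumption made for illustration: the function name `frame_nodes`, the `stories[i][j]` indexing (cell `i` along x, cell `j` along y), and a single uniform `story_height`.

```python
from typing import List, Set, Tuple

Node = Tuple[float, float, float]


def frame_nodes(x_grid: List[float], y_grid: List[float],
                stories: List[List[int]], story_height: float) -> Set[Node]:
    """Extrude each plan cell by its story count and collect corner nodes.

    Hypothetical helper: stories[i][j] is the number of stories above the
    cell bounded by x_grid[i], x_grid[i+1], y_grid[j], y_grid[j+1].
    """
    nodes: Set[Node] = set()
    for i in range(len(x_grid) - 1):
        for j in range(len(y_grid) - 1):
            if stories[i][j] == 0:
                continue  # empty cell: nothing is built here
            for level in range(stories[i][j] + 1):  # level 0 is the base
                z = level * story_height
                for x in (x_grid[i], x_grid[i + 1]):
                    for y in (y_grid[j], y_grid[j + 1]):
                        nodes.add((x, y, z))  # the set merges shared corners
    return nodes


# Illustrative stepped frame: two bays along x, one bay along y.
x_grid = [0.0, 6.0, 12.0]
y_grid = [0.0, 5.0]
stories = [[2], [1]]  # left bay has 2 stories, right bay has 1
nodes = frame_nodes(x_grid, y_grid, stories, story_height=3.0)
```

Storing nodes in a set means corners shared by adjacent cells are created once, which is one simple way to keep the topology consistent when cells have different heights.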

21.
arXiv (CS.AI) 2026-06-19

DataMagic: Transforming Tabular Data into Data Insight Video

arXiv:2606.20388v1 Announce Type: cross Abstract: Data videos integrate dynamic charts, voice narration, and synchronized animations to communicate data insights as temporal narratives, making them an effective medium for improving data consumption efficiency in the data management lifecycle. However, producing high-quality data videos requires expertise spanning data analysis, narrative design, and video production. Existing approaches fall short: static visualization tools (e.g., BI dashboards) lack narrative logic and animation; authoring tools require users to pre-prepare visualizations rather than working from raw data; pixel-level video generation models cannot guarantee data fidelity or provenance. We demonstrate DataMagic, an end-to-end interactive system that transforms raw tabular data and natural language queries into narrative data-insight videos. To ensure data fidelity, DataMagic introduces the declarative specification DVSpec, which binds visual and animation elements to underlying data fields through data-driven semantic references. To address the combinatorial explosion of the design space, DataMagic adopts a Generate-then-Orchestrate multi-agent architecture that generates candidate scenes in parallel and then optimizes narrative coherence through global orchestration. Leveraging DVSpec's decoupling of logic and rendering, the system further supports three interaction modes and structured provenance-based data Q&A, transforming one-way videos into explorable interactive data interfaces. Evaluation on 109 real-world samples validates the effectiveness of DataMagic. Homepage: https://datamagic-home.github.io/

22.
arXiv (CS.CL) 2026-06-24

Speculative Pipeline Decoding: Higher-Accuracy and Zero-Bubble Speculation via Pipeline Parallelism

Speculative Decoding (SD) accelerates low-concurrency LLM inference by employing a draft-then-verify paradigm. However, mainstream methods typically rely on multi-token prediction, which introduces escalating prediction difficulty and serial drafting latency. To address these issues, we propose Speculative Pipeline Decoding (SPD), a groundbreaking framework that unlocks the true potential of pipeline parallelism. By partitioning the target LLM into $n$ pipeline stages, SPD allows the LLM to process $n$ tokens within a single sequence in parallel to accelerate decoding. To continuously fill the pipeline in single-sequence decoding, a speculation module aggregates intermediate features across different pipeline depths to predict the next token, executing strictly in parallel with the target model's pipeline step, to realize bounded difficulty, higher acceptance rates, and zero latency bubbles. Our experiments demonstrate that SPD achieves significantly higher theoretical and wall-clock speedup compared to mainstream baselines at moderate pipeline depth, though more aggressive settings require further improvement. Our code is available at https://github.com/yuyijiong/speculative_pipeline_decoding
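The draft-then-verify paradigm that this abstract starts from can be sketched with toy models. This is a minimal illustration of generic greedy speculative decoding, not of SPD itself or of the linked repository. The names `speculative_step`, `toy_target`, and `toy_draft` are hypothetical, and the verification loop calls the target one position at a time for readability, whereas real systems would typically score all drafted positions in a single batched forward pass.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """Draft k tokens, keep the longest run the target agrees with,
    and always finish with one token chosen by the target."""
    drafted: List[int] = []
    for _ in range(k):
        drafted.append(draft(prefix + drafted))

    accepted: List[int] = []
    for tok in drafted:
        expected = target(prefix + accepted)
        if tok != expected:
            accepted.append(expected)  # target's token replaces the first mismatch
            return accepted
        accepted.append(tok)
    accepted.append(target(prefix + accepted))  # bonus token: every draft passed
    return accepted


def toy_target(seq: List[int]) -> int:
    """Toy 'large model': counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: List[int]) -> int:
    """Toy 'small model': agrees with the target except right after a 3."""
    return 0 if seq[-1] == 3 else (seq[-1] + 1) % 10


partial_accept = speculative_step([1], toy_draft, toy_target, k=4)
full_accept = speculative_step([5], toy_draft, toy_target, k=2)
```

In this greedy variant the emitted tokens match what the target alone would have produced, so the draft model affects only how many tokens each step yields, not which tokens are emitted.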

23.
arXiv (quant-ph) 2026-06-16

Achieving double-logarithmic precision dependence in optimization-based quantum unstructured search

arXiv:2603.26039v3 Announce Type: replace Abstract: Grover's algorithm is a fundamental quantum algorithm that achieves a quadratic speedup for unstructured search problems of size $N$. Recent studies have reformulated this task as a maximization problem on the unitary manifold and solved it via linearly convergent Riemannian gradient ascent (RGA) methods, resulting in a complexity of $O(\sqrt{N/M}\log (1/\varepsilon))$, where $M$ denotes the number of target items and $\varepsilon$ denotes the success probability error. In this work, we adopt the Riemannian modified Newton (RMN) method to solve the quantum search problem, under the assumption that the ratio $M/N$ is known. We show that, in this setting, the Riemannian Newton direction is collinear with the Riemannian gradient in the sense that the Riemannian gradient is always an eigenvector of the corresponding Riemannian Hessian. This structure removes the overhead of Hessian inversion and allows the proposed RMN method to retain the local quadratic convergence in terms of the error $\varepsilon$. More precisely, we rigorously prove an overall complexity of $O(\sqrt{N/M}+\log\log(1/\varepsilon))$. Furthermore, our approach remains Grover-compatible, namely, it relies exclusively on the standard Grover diffusion and oracle operators to ensure algorithmic implementability, and its parameter update process can be efficiently precomputed on classical computers.
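For context on the $\sqrt{N/M}$ term, the textbook behavior of standard Grover search can be checked numerically. The sketch below uses only the well-known closed form for the success probability after $k$ Grover iterations, $\sin^2((2k+1)\theta)$ with $\theta = \arcsin\sqrt{M/N}$. It does not implement the RGA or RMN methods from the abstract, and the function names are illustrative.

```python
import math


def grover_success_probability(n_items: int, n_marked: int, iterations: int) -> float:
    """Success probability of standard Grover search after `iterations` steps."""
    theta = math.asin(math.sqrt(n_marked / n_items))
    return math.sin((2 * iterations + 1) * theta) ** 2


def best_iteration_count(n_items: int, n_marked: int) -> int:
    """Usual choice floor(pi / (4 * theta)), which grows like sqrt(N / M)."""
    theta = math.asin(math.sqrt(n_marked / n_items))
    return math.floor(math.pi / (4 * theta))


k_small = best_iteration_count(4, 1)
p_small = grover_success_probability(4, 1, k_small)

k_large = best_iteration_count(1024, 1)
p_large = grover_success_probability(1024, 1, k_large)
```

With $N = 4$ and $M = 1$ a single iteration suffices, while $N = 1024$ needs 25, close to $(\pi/4)\sqrt{1024} \approx 25.1$. Note that both helpers take $M$ and $N$ as inputs, which parallels the abstract's assumption that the ratio $M/N$ is known.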

24.
arXiv (CS.CV) 2026-06-16

Effective and Low-cost Lane-based Map Localization for Vehicle-Centric Route Generation

Driver-centric route representation plays a vital role in intuitive driving guidance systems. This paper presents OLRA, a low-cost, map-localization-based framework that derives driver-view-aligned routes by matching map-based navigation routes with camera-detected lane markings. This alignment process mutually enhances vehicle localization accuracy and visual route consistency. To bridge the evaluation gap across different paradigms, we introduce practical route evaluation metrics and benchmark OLRA against OpenPilot, a representative direct-generation approach. Experimental results on the nuScenes dataset demonstrate that OLRA outperforms OpenPilot in complex road segments and in route estimation at distances beyond 20 meters, achieving lower overall Euclidean error. This study is expected to promote future research in low-cost, map-localization-based route generation methods.

25.
arXiv (quant-ph) 2026-06-19

Random Local Stabilizer Codes in Three Dimensions without String or Self-Similar Fractal Logical Operators

arXiv:2606.19873v1 Announce Type: new Abstract: Quantum error-correcting codes (QECs) are essential components of quantum computation and have deep connections to quantum phases of matter. A key obstruction to passive self-correcting QECs is the presence of string logical operators, which can generate logical errors through constant-energy-barrier processes. Haah's Codes (fracton codes) showed that three-dimensional stabilizer codes can forbid such string logical operators, but their translation-invariant structure supports self-similar fractal logical operators with a logarithmic energy barrier. We introduce the qutrit random cubic codes, a family of local qutrit Calderbank-Shor-Steane stabilizer Hamiltonians with a cube-check structure similar to that of Haah's Code 1 but built from spatially varying stabilizers. We prove that these models retain the no-string property and numerically observe that they have properties distinct from translation-invariant fracton codes: the smallest ground-state degeneracy exponent is $k=2$ for odd $L$ and $k=4$ for even $L$; noncontractible plane-logical operators span the entire logical space; and charge-push diagnostics show that the self-similar fractal operators are absent. These results demonstrate that constrained randomness can fundamentally change the nature of stabilizer codes and improve their self-correction properties. They further point to broader families of quantum error-correcting codes and quantum phases beyond canonical topological and fracton orders.