Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-16

Quantum Information Geometry of Multicomponent Superconducting Fluctuation Transport

arXiv:2606.15928v1 Announce Type: cross Abstract: Quantum geometry underlies many electronic responses, but its transport signatures have so far been established mainly for pure single-particle Bloch states. Whether collective many-body fluctuations possess a measurable quantum geometry remains largely unexplored. Here we show that superconducting fluctuation transport provides a direct probe of quantum information geometry in collective many-body matter. Starting from a multicomponent time-dependent Ginzburg-Landau theory in the Gaussian fluctuation regime, we identify the equilibrium density matrix of fluctuating Cooper pairs as the static pair propagator, which defines a positive mixed-state manifold in momentum space. The geometry of this manifold is directly measurable through paraconductivity: the longitudinal paraconductivity is governed by the quantum Fisher information of superconducting fluctuation modes, while the fluctuational anomalous Hall effect is governed by the mean Uhlmann curvature, the mixed-state counterpart of Berry curvature. This correspondence further yields geometric bounds between these two transport components, with no direct analogue in normal electronic transport. Applied to chiral superconducting fluctuations in quarter-metal systems motivated by rhombohedral multilayer graphene, a symmetry-allowed Lifshitz invariant generates finite mean Uhlmann curvature and logarithmically enhances the anomalous Hall conductivity above the critical temperature. Our results establish collective superconducting fluctuations as an experimentally accessible transport probe of mixed-state quantum information geometry.

02.
arXiv (CS.CV) 2026-06-11

CoVR-R:Reason-Aware Composed Video Retrieval

Composed Video Retrieval (CoVR) aims to find a target video given a reference video and a textual modification. Prior work assumes the modification text fully specifies the visual changes, overlooking after-effects and implicit consequences (e.g., motion, state transitions, viewpoint or duration cues) that emerge from the edit. We argue that successful CoVR requires reasoning about these after-effects. We introduce a reasoning-first, zero-shot approach that leverages large multimodal models to (i) infer causal and temporal consequences implied by the edit, and (ii) align the resulting reasoned queries to candidate videos without task-specific finetuning. To evaluate reasoning in CoVR, we also propose CoVR-Reason, a benchmark that pairs each (reference, edit, target) triplet with structured internal reasoning traces and challenging distractors that require predicting after-effects rather than keyword matching. Experiments show that our zero-shot method outperforms strong retrieval baselines on recall at K and particularly excels on implicit-effect subsets. Our automatic and human analysis confirm higher step consistency and effect factuality in our retrieved results. Our findings show that incorporating reasoning into general-purpose multimodal models enables effective CoVR by explicitly accounting for causal and temporal after-effects. This reduces dependence on task-specific supervision, improves generalization to challenging implicit-effect cases, and enhances interpretability of retrieval outcomes. These results point toward a scalable and principled framework for explainable video search. The model, code, and benchmark are available at https://github.com/mbzuai-oryx/CoVR-R.

03.
arXiv (CS.LG) 2026-06-11

Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria

arXiv:2606.11284v1 Announce Type: cross Abstract: Real-world multi-agent systems, from traffic coordination to resource allocation, are often modeled as general-sum games where individual incentives conflict with collective welfare. In these settings, the central challenge is not merely finding an equilibrium, but selecting socially desirable outcomes among many suboptimal Nash equilibria. Standard deep multi-agent reinforcement learning (MARL) methods struggle with this problem, as value-decomposition approaches are constrained by monotonicity assumptions and policy-gradient methods often converge to stable but socially inefficient equilibria. To address this limitation, we propose $\Phi$-Actor-Critic ($\Phi$-AC), a framework that leverages swap regret minimization to steer learning toward high-welfare correlated equilibria (CE). To make counterfactual regret estimation tractable in deep MARL, $\Phi$-AC employs a centralized attention critic that predicts vector-valued regrets in a single forward pass, avoiding computationally expensive counterfactual simulations. We further introduce a Lagrangian-based equilibrium selection mechanism that optimizes social welfare while enforcing stability through regret constraints. Experiments on matrix games, Multi-Agent Particle Environments (MPE), and the Melting Pot Harvest scenario demonstrate that $\Phi$-AC learns efficient and stable coordination strategies across diverse mixed-motive settings while maintaining high collective return and competitive fairness.

04.
arXiv (CS.CV) 2026-06-16

Task-Instructed Causal Routing of Vision Foundation Models for Multi-Task Learning

Vision foundation models (VFMs) have demonstrated strong robustness and transferability across a wide range of visual tasks. However, each model typically encodes strong inductive biases shaped by its pre-training objective and data domain, resulting in fragmented yet complementary visual knowledge. As a result, a single model often struggles to capture the diverse visual representations required across multiple dense prediction tasks. To address this limitation, we propose TIGER (Task-Instruction-Guided Expert Routing), a framework that coordinates multiple heterogeneous VFMs for multi-task dense prediction. Instead of naively aggregating expert features, TIGER leverages natural-language task instructions to guide a routing network that assigns token-level expert weights conditioned on task semantics, enabling adaptive integration of complementary expert features. TIGER further introduces a counterfactual loss that aligns routing decisions with each expert's causal contribution by measuring prediction changes when experts are excluded, encouraging more reliable and interpretable routing. We evaluate TIGER on two multi-task dense prediction benchmarks, NYUD-v2 and Pascal Context, where it consistently outperforms recent multi-task learning baselines while keeping all VFMs frozen. These results demonstrate that combining instruction-guided expert routing with counterfactual causal alignment enables effective coordination of heterogeneous vision foundation models.

05.
arXiv (quant-ph) 2026-06-15

Quantum geometrical description of hole spin qubits far away from the $\Gamma$-point

arXiv:2606.14683v1 Announce Type: cross Abstract: Hole spin qubits provide one of the leading platforms for spin-based quantum computing due to their large intrinsic spin-orbit interaction (SOI), which enables fast electrical manipulation. The SOI of planar quantum dots has mostly been investigated in theoretical studies by examining the SOI already present in the two-dimensional hole gas (2DHG). Here, we study the SOI created by the in-plane confinement by deriving non-perturbative effective Hamiltonians numerically for hole spin qubits. We find that the quantum geometry of the 2DHG naturally emerges, leading to a meaningful non-perturbative definition of pseudospin valid far away from the $\Gamma$-point. The SOI of the 2DHG and of the in-plane confinement have different forms; therefore, they cannot be turned off simultaneously, ruining the perfect spin-orbit switch functionality of spin qubits. We construct effective Hamiltonians using the symmetry approach for various low-dimensional hole systems: (i) a heavy-hole confined in a SiGe/Ge/SiGe heterostructure, (ii) a light-hole confined in SnGe/Ge, (iii) a gate-defined nanowire in SiGe/Ge/SiGe, and (iv) a hole confined in a Ge/Si core/shell nanowire. The non-perturbative effective Hamiltonians provide results with excellent agreement with the full Hamiltonians.

06.
arXiv (CS.CV) 2026-06-16

Pixel-TTS: Image based Text Rendering for Robust Text-to-Speech

Recent advances in pixel-based text modeling show that representing text as images enables models to exploit visual cues for language understanding. Grounding text in its visual form allows structurally similar characters with different Unicode encodings to produce similar embeddings, benefiting cross-lingual and zero-shot scenarios. Conventional text-based approaches treat each character independently, limiting generalization to unseen characters and requiring embedding expansion during cross-lingual adaptation. We propose Pixel-TTS, the first framework for visually grounded speech synthesis. It renders text as images and projects them through a 2D convolutional layer to generate embeddings. This design eliminates embedding matrix expansion during fine-tuning while improving robustness to unseen characters and orthographic variations. Extensive experiments show Pixel-TTS achieves competitive performance with strong baselines, faster convergence and robust zero-shot generalization.

07.
arXiv (quant-ph) 2026-06-15

Stab-QRAM: A Clifford-Only Quantum Oracle for Affine Boolean Data

arXiv:2509.26494v3 Announce Type: replace Abstract: Oracle-based quantum algorithms require coherent evaluation of classical functions on superposed inputs, and in fault-tolerant architectures this cost is dominated by non-Clifford gates: generic lookup constructions incur $T$-counts that grow with the data size. Here we show that affine Boolean functions $f(\mathbf{x})=A\mathbf{x}+\mathbf{b}$ over $\mathbb{F}_2$ – the algebraic core of parity checks, linear feedback shift registers, and cipher linear layers – are exactly the functions admitting computational-basis-preserving Clifford oracles, and we develop this correspondence into Stab-QRAM, a compiler mapping a specification $(A,\mathbf{b})$ to an ancilla-free circuit of CNOT and $X$ gates with zero $T$-count. Via K\"{o}nig's edge-coloring theorem, the compiled schedule provably attains the minimum depth for its gate set. Case studies spanning Simon-type oracles, block-encodings of $X$-type coset operators, and syndrome extraction for CSS codes show one compiler serving the algorithm, primitive, and error-correction layers of the quantum stack.

08.
medRxiv (Medicine) 2026-06-11

Hantavirus Disease in Uruguay: Trends and Mortality Before and During the COVID-19 Pandemic.

Introduction: Hantavirus disease is an emerging and potentially severe zoonosis of global distribution. In Uruguay, it is transmitted by rodents inhabiting peridomestic, suburban, and rural areas. Global incidence is estimated at 150,000 to 200,000 cases per year, with up to 300 annual cases in the Americas. Since 1997, Uruguay's Ministry of Public Health (MPH) has monitored Hantavirus cardiopulmonary syndrome (HCPS), the most common clinical presentation in the region. By 2019, a total of 271 cases had been identified in the country, with an estimated mortality rate of nearly 50%. Objectives: To describe the clinical, epidemiological, and occupational characteristics of patients with Hantavirus disease in Uruguay during the pre-pandemic (2018-2019) and pandemic (2020-2021) periods. Methods: A descriptive, cross-sectional, observational study was conducted, including all serologically confirmed cases of Hantavirus infection reported to the MPH between 2018 and 2021. Clinical and demographic data were extracted from the mandatory reporting form for zoonotic diseases. Incidence and case fatality rates were calculated, and factors associated with fatal outcomes were analyzed. Results: A total of 58 confirmed cases were identified between 2018 and 2021. Most patients were male (62%), with a mean age of 36.5 years (SD 16). A decline in incidence was observed during 2020-2021, with no significant change in case fatality. Direct rodent exposure was the most frequently associated risk factor. Montevideo and Canelones were the most affected departments. Renal and pulmonary involvement were significantly associated with mortality. Conclusion: Hantavirus remains a relevant public health concern in Uruguay. Although a decrease in incidence was observed during the COVID-19 pandemic years, case fatality rates remained high. The findings underscore the need for sustained surveillance and early recognition, particularly in urbanizing regions.

09.
arXiv (CS.CL) 2026-06-16

It's About Time: Temporal References in Emergent Communication

Emergent communication enables agents to develop bespoke languages that improve communication efficiency. Despite the known importance of temporal structure in natural language, there is no existing evidence of temporal references in emergent communication. This paper addresses this gap, by exploring how agents communicate about temporal relationships. We analyse three potential factors for the emergence of temporal references: environmental, external, and architectural. Our experiments demonstrate that altering the loss function is insufficient for temporal references to emerge; rather, architectural changes are necessary. A minimal change in agent architecture, using a different batching method, allows the emergence of temporal references. This modified design is compared with the standard architecture in a temporal referential games environment, which emphasises temporal relationships. The analysis shows that over 95% of the agents with the modified batching method develop temporal references, without changes to their loss function. We consider temporal referencing necessary for future improvements to the agents' communication efficiency, enabling future agents to use a closer to optimal coding as compared to purely compositional languages. These insights provide the basis for incorporation of temporal references into other emergent communication settings, and investigation of other aspects of language.

10.
arXiv (CS.CV) 2026-06-12

NavWAM: A Navigation World Action Model for Goal-Conditioned Visual Navigation

Goal-conditioned visual navigation requires a robot to act under partial observability by anticipating how its motion will change the future egocentric view and whether that change brings it closer to the goal. Navigation world models provide such visual foresight, but they remain prediction modules that require an external planner to convert predicted futures into closed-loop control. We propose Navigation World Action Model (NavWAM), a diffusion-transformer policy that turns navigation world-model prediction into executable action by representing future observations, goal-progress values, and action chunks in a shared latent sequence. By learning future prediction jointly with the action and value targets that determine closed-loop behavior, NavWAM makes visual foresight directly usable for robot control. We build NavWAM through simulation pretraining and real-robot adaptation, and evaluate it on image-goal navigation against planning-based world models and a representative direct navigation policy. Across offline benchmarks and closed-loop real-robot deployment, NavWAM improves over planning-based world-model baselines in our evaluations while using the default policy mode without CEM-style action search. Project page: https://dachii-azm.github.io/navwam/

11.
arXiv (CS.LG) 2026-06-11

DeMix: Debugging Training Data with Mixed Data Error Types by Investigating Influence Vectors

arXiv:2606.11616v1 Announce Type: new Abstract: High-quality training data is essential for the success of machine learning models. However, real-world datasets often contain mixed types of errors arising from systematic flaws in data preparation pipelines, including label errors, feature errors, and spurious correlations. Effective debugging of training data requires both detecting erroneous samples and identifying their specific error types to enable targeted repair, yet existing data cleaning and attribution methods fail to adequately address this dual requirement. In this paper, we propose DeMix, a novel framework that simultaneously diagnoses erroneous samples and their error types. Our key insight is that different error types produce distinct patterns on model behavior. DeMix captures such error-specific patterns by influence vectors that characterize how each training sample affects model predictions across all validation samples. We formulate training data debugging as a multi-label classification problem where a classifier is developed to predict error types directly from influence vectors. We further introduce an intervention-based learning strategy that guides the classifier to capture invariant rationales specific to each error type, ensuring the learned classifier generalizes effectively. Empirical evaluations on 11 tasks across tabular data prediction, recommendation systems, and LLM alignment demonstrate that DeMix significantly outperforms state-of-the-art approaches, achieving a 22.61% improvement in data debugging F1-score and a 9.32% gain in task model performance after data repair. Code is available at: https://github.com/SJTU-DMTai/DeMix.

12.
arXiv (CS.AI) 2026-06-12

Once-for-All: Scalable Simultaneous Forecasting via Equilibrium State Estimation

arXiv:2606.13285v1 Announce Type: cross Abstract: We introduce Equilibrium State Estimation (ESE), a novel paradigm for simultaneous prediction, where multiple interacting systems require separate yet coordinated forecasts. Such scenarios often arise in real-world settings such as economics and healthcare modeling. Unlike existing approaches that predict one system at a time, ESE forecasts all systems in a single pass. It first estimates the equilibrium state across systems, then generates holistic forecasts based on the difference between the current state and the estimated equilibrium. Extensive experiments on synthetic and real-world datasets, including currency exchange and COVID-19 spread modeling, demonstrate that ESE is at least as accurate as state-of-the-art (SOTA) methods while being significantly faster. In addition, ESE integrates seamlessly with conventional predictors, combining their accuracy with its exceptional efficiency and delivering a 10-70x speedup. With linear-time complexity, ESE scales far better than SOTA methods as the number of systems increases. Moreover, it remains accurate under diverse perturbations, establishing ESE as a fast, generalizable, robust, and scalable multi-prediction method.

13.
arXiv (CS.AI) 2026-06-11

Bridging the Morphology Gap: Adapting VLA Models to Dexterous Manipulation via Intent-Conditioned Fine-Tuning

arXiv:2606.12109v1 Announce Type: cross Abstract: Vision-Language-Action (VLA) models have demonstrated remarkable zero-shot generalization in robotic manipulation, yet the vast majority of pre-trained pipelines remain strictly confined to low-DoF parallel grippers. Adapting these rich semantic priors to high-DoF dexterous hands introduces a severe morphology gap, direct end-to-end joint fine-tuning inherently causes catastrophic forgetting of spatial reasoning and acute action manifold collapse due to data scarcity. In this paper, we present InDex, a novel, data-efficient adaptation framework rooted in cross-morphology semantic inheritance. Rather than discarding the pre-trained 1-DoF parallel grasp output, we repurpose it as a continuous, macroscopic virtual grasp intent proxy to sequentialize the control topology. We implement a two-stage decoupled learning architecture: the first stage parameter-efficiently aligns the VLA backbone to predict continuous arm trajectories and the scalar grasp intent; the second stage freezes this spatial backbone and leverages an intent-conditioned denoising diffusion head to decode fine-grained joint articulations for multi-fingered end-effectors. Extensive simulation benchmarks across a suite of multi-stage, contact-rich dexterous manipulation tasks demonstrate that InDex effectively masters intricate skills with minimal demonstration data, substantially outperforming monolithic baselines while preserving the robust spatial generalizability of the original VLA prior.

14.
arXiv (CS.CV) 2026-06-16

Chronological Blindness: Benchmarking Temporal Reasoning in Vision-Language Models with CHRONOSIGHT

Human perception of visual scenes is inherently temporal. We instinctively recognise whether a fruit is ripening or rotting, whether construction is progressing or being demolished, and approximately how much time separates two photographs of the same subject. Whether large vision-language models (VLMs) share this competence remains an open and practically important question. We introduce CHRONOSIGHT, a rigorously controlled benchmark evaluating five dimensions of visual temporal reasoning: CHRONORANK (chronological ordering of image sequences), CHRONOLOCATE (ordinal stage localisation from a single image), CHRONODELTA (estimation of time elapsed between two images on a logarithmic scale), CHRONOREVERSE (detection of temporally reversed sequences), and CHRONOODD (identification of a temporal outlier within a set). The benchmark comprises 1{,}000 items across eight process families (biological growth, food transformation, physical weathering, construction, environmental change, human ageing, astronomical phenomena, and urban dynamics) spanning timescales from minutes to millennia. We evaluate eight open-source VLMs (500 M to 19 B parameters) under two prompting regimes and collect human performance baselines. Human performance averages 0.89 across tasks; the best open model (Qwen2.5-VL-7B) reaches 0.40 under direct prompting, a gap we term chronological blindness. Lightweight LoRA fine-tuning on 151 examples raises CHRONODELTA accuracy from near-zero to 0.43, transferring zero-shot to related tasks (CHRONOODD: 0.37; CHRONOREVERSE: 0.64)suggesting the bottleneck is partly instruction following rather than visual perception. Benchmark, code, and predictions will be released upon acceptance.

15.
bioRxiv (Bioinfo) 2026-06-16

FlowBench: separating planning, fault recovery and interpretation in agentic bioinformatics

Agentic large language model (LLM) systems are being deployed in bioinformatics faster than they are understood, and single-metric evaluations conflate capabilities that fail independently. We introduce FlowBench, a benchmark that decomposes agentic bioinformatics performance into planning, fault recovery, biological interpretation, and end-to-end output-fidelity. Existing systems achieve high plan completeness, but their closed, single-provider designs prevent attribution of performance to scaffolding versus the underlying model. We therefore built FlowAgent, a modular, provider-agnostic framework whose components can be selectively disabled and whose backbone model can be swapped across providers on a shared harness, and used it to evaluate 23 models from three main providers. Three findings emerge. First, generating a valid workflow plan from a named toolchain is largely solved, whereas inferring an appropriate toolchain from biological intent alone is uniformly difficult regardless of model tier, compressing all models into a narrow 44-57% pass-rate band. Second, ablation shows that the dependency-structured plan and a completeness-reflection step drive performance, while adding a same-context validator-driven retry makes structural quality worse. Third, fault recovery and data-grounded interpretation remain unsolved. Models frequently propose fixes that force a clean exit while leaving the underlying data invalid, and data-grounded interpretation lags internal-knowledge recall by a consistent margin. Safety does not emerge from capability, and reasoning-tier models were among the least reliable at recognising unrecoverable faults. Once planning saturates, agent architecture and refusal calibration, not model scale, are the productive frontier.

16.
arXiv (CS.LG) 2026-06-12

The Geometry of Phase Transitions in Generative Dynamics via Projection Caustics

arXiv:2606.13191v1 Announce Type: new Abstract: Continuous-state generative samplers, including diffusion and flow-matching models, evolve through continuous reverse-time dynamics, yet their samples often undergo abrupt qualitative changes: trajectories commit to modes, semantic alternatives collapse, and small perturbations in narrow time windows can produce large downstream effects. This paper develops a geometric account of such phase-transition-like behaviour. We view denoising as gradient descent on a free energy landscape and show that sharp transitions arise near projection caustics, where the nearest-point projection onto the data support ceases to be unique. Motivated by this perspective, we introduce the Critical Boundary Detector (CBD), as practical diagnostics for score-direction instability. Across toy models, standard diffusion models, and latent text-to-image diffusion models, CBD localises mode commitment, predicts intervention-sensitive windows, and supports targeted control in geometrically sensitive regions. Our results connect geometry of data and dynamics of diffusion generation.

17.
arXiv (CS.LG) 2026-06-18

INDEQS: Informed Neural controlled Differential EQuationS

arXiv:2606.19138v1 Announce Type: new Abstract: Neural Controlled Differential Equations (NCDE) provide a powerful continuous-time framework for forecasting time series, but standard graph-based extensions typically learn spatial structure purely from data, even in settings where a directed graph structure is known a priori. We introduce Informed Neural controlled Differential EQuationS (INDEQS), a graph-based NCDE forecasting method that incorporates prior knowledge of a directed graph at distinct architectural positions. INDEQS separates inner mixing of hidden states across graph nodes from outer mixing between vector field and control, and offers both a lightweight graph-constrained variant and a more expressive variant, learning additional graph connections from data via adaptive graph convolutions. To systematically study when graph informedness is beneficial in forecasting, we devise a continuous advection simulation on directed graphs, yielding synthetic spatio-temporal datasets with known ground-truth flow structure. We then evaluate INDEQS on two real-world tasks: river discharge forecasting on a hydrological network and traffic flow prediction on PeMS08. Across these synthetic and real-world benchmarks, outer informedness consistently improves mean absolute error over an uninformed NCDE with comparable parameter count, particularly on larger graphs, while inner informedness offers a more parameter-efficient alternative when strict adherence to a known adjacency is desired. A comparison of discrete convolutional and continuous-time decoders further shows that continuous decoders yield better accuracy and greater temporal flexibility on real-world tasks. An implementation of INDEQS and the advection simulation is available at https://github.com/Mitchi1/indeqs.

18.
arXiv (quant-ph) 2026-06-12

Fibonacci Steady-States and Persistent Oscillations in an Ordered Multimode Dicke Model

arXiv:2606.13072v1 Announce Type: new Abstract: Ultracold atoms in multimode optical cavities provide a rich testbed for many-body phenomena enabled by light-mediated interactions. Recent experiments include realizations of spin glasses and associative memories, as described by multimode Dicke models with disordered couplings. However, the properties of multimode Dicke models with ordered coupling geometries remain largely unexplored. In this work, we investigate the stable steady-states of the multimode Dicke model with an ordered nearest-neighbor coupling geometry, where $n_c$ atomic clusters are coupled via $n_c-1$ cavity modes. We show that the number of mean-field stable steady-states in the superradiant phase exhibits Fibonacci scaling with the number of atomic clusters, and that a subset of these steady-states exhibit persistent oscillations. Using both the truncated Wigner approximation and the numerically-exact hierarchy of pure states, we further demonstrate that these features of the stable steady-state solutions persist for finite cluster sizes. Ordered multimode Dicke models, such as the nearest-neighbor coupling geometry considered here, are accessible with current experimental technologies and point toward a broader class of strongly interacting dissipative systems with similarly rich behavior.

19.
arXiv (math.PR) 2026-06-16

An Analytical Methodology for Quantifying Airspace Conflict Rate and Complexity

arXiv:2606.14897v1 Announce Type: cross Abstract: Air traffic growth, advanced air mobility, and increasingly autonomous operations are driving the need for scalable and adaptive airspace design methodologies. Central to this challenge is quantifying how traffic flow structure and demand, governed in part by airspace geometry, influence conflict generation and operational complexity. This paper presents an analytical framework for computing conflict rate and conflict probability in structured airspace using stochastic flow models. Traffic streams are modeled as renewal processes with prescribed inter-arrival time distributions, while interactions between flows are captured through geometry-dependent minimum spacing constraints at merges and crossings. Within this formulation, closed-form upper bounds on the expected conflict rate and conflict probability per aircraft are derived as functions of flow configuration and demand. These metrics are interpreted as complementary measures of airspace complexity, reflecting controller workload and per-aircraft operational risk. The methodology is applied to representative hexagonal cell geometries with varying routing structures and flow distributions. Results reveal non-monotonic tradeoffs between routing flexibility, capacity, and conflict generation, with intermediate flow configurations outperforming both highly constrained and highly distributed cases. The proposed framework provides a tractable tool for evaluating airspace design alternatives and complexity-informed traffic management strategies.

21.
arXiv (CS.CL) 2026-06-19

A Survey of On-Policy Distillation for Large Language Models

As Large Language Models continue to grow in both capability and cost, transferring frontier capabilities into smaller, deployable students has become an important engineering problem, and knowledge distillation remains a common technique for this transfer. The prevailing recipe in industrial pipelines, static imitation of teacher-generated text, carries a structural weakness that grows more severe as tasks become longer and more reasoning-intensive. Because the student is trained on flawless teacher prefixes but generates its own at inference, small errors tend to accumulate into trajectories it has rarely been trained to recover from, and the resulting exposure bias has been shown to scale roughly with the square of sequence length. On-Policy Distillation reorganizes the training loop around this observation by having the teacher provide feedback on what the student actually produces, with the goal of reducing the compounding term toward linear and reframing distillation as an iterative correction process rather than single-pass imitation. The resulting literature has expanded along divergence design, reward-guided optimization, and self-play, yet contributions remain scattered across the knowledge distillation, RLHF, and imitation learning communities without a unified treatment. This survey provides such a treatment. We formalize OPD as f-divergence minimization over student-sampled trajectories, organize the field along three design axes (what to optimize, where the signal comes from, and how to stabilize training in practice), and consolidate success conditions, recurring failure modes, and the connection between OPD and KL-constrained reinforcement learning. We close with open problems that emerge from this synthesis, including distillation scaling laws, uncertainty-aware feedback, agent-level distillation, and the growing overlap between knowledge distillation and RL.

22.
arXiv (CS.CV) 2026-06-15

Hierarchical Consistency Learning for Test-time Adaptation in Camouflage Perception

Camouflaged object detection (COD) aims to localize targets that exhibit minimal perceptual differences from backgrounds through physical attributes. Existing methods, constrained by the static train-then-freeze paradigm, suffer from domain rigidity and annotation dependency, limiting their adaptability to scene variations and unseen camouflage patterns. To overcome these, we propose the hierarchical consistency learning (HCL) framework, which integrates test-time adaptation for dynamic representation recalibration. Specifically, we design the hierarchical representation reconstruction (HRR) to alleviate feature entanglement by synergizing spatial reconstruction with dual-stream frequency-domain decomposition, enhancing robustness against appearance homogenization. The pixel and spectrum inference provide structural and contextual priors. We further introduce task affinity guidance (TAG) to propagate knowledge across branches via channel-wise affinity, aligning local discriminative cues and mitigating semantic drift. To ensure semantic invariance, we formulate the prototype consistency calibration (PCC), which aggregates region features into compact prototypes and establishes prototype-feature similarity. This imposes implicit and hierarchical constraints that bridge task and representation gaps. Extensive experiments across four camouflaged and four underwater object benchmarks, under three degradation settings, demonstrate that our method consistently outperforms state-of-the-art approaches, highlighting its robustness and generalization under distribution shifts.

23.
arXiv (CS.LG) 2026-06-19

The Hidden Environmental Cost of Poor Coding Practices in TensorFlow and Keras Applications: A Study on Resource Leaks and Carbon Emissions

arXiv:2606.19799v1 Announce Type: cross Abstract: Efficiency and sustainability are critical considerations in the development and deployment of machine learning (ML) applications. Among the factors influencing sustainability, resource leaks in ML code can introduce hidden inefficiencies that elevate energy consumption and CO2 emissions. Despite this, empirical evidence quantifying their environmental impact remains limited. This emerging results paper presents an initial empirical investigation of two common resource-leak smells, namely Improper Model Reuse (IMR) and Unreleased Tensor References (UTR), and their impact on energy consumption and CO2 emissions in TensorFlow and Keras workloads. Controlled experiments were conducted for each smell by executing identical training tasks while comparing against a smell-free baseline. Our preliminary results show that both smells consistently increase estimated electricity usage and carbon emissions. IMR and UTR increased electricity consumption by approximately 32% and 46%, respectively, with proportional increases in CO2 emissions. Paired statistical tests indicate that these differences are systematic and statistically significant, providing initial empirical evidence that resource-leak smells may degrade ML energy efficiency and environmental sustainability. These findings suggest that resource-leak smells pose measurable risks to both software quality and sustainability, emphasizing the importance of integrating resource-lifecycle management and energy-efficiency considerations into ML development.

24.
arXiv (CS.AI) 2026-06-12

Structured Testbench Generation for LLM-Driven HDL Design and Verification-Oriented Data Curation

arXiv:2606.12983v1 Announce Type: new Abstract: Automated testbench generation has become a critical bottleneck in large language model (LLM)-driven Register Transfer Level (RTL) workflows, where large numbers of candidate designs must be verified rapidly and reliably. Existing prompt-based approaches treat testbench generation as unconstrained code synthesis, yielding stochastic outputs with high token cost, low reproducibility, and insufficient coverage. To address this gap, we present STG, a Structured Testbench Generation framework that exploits the inherent structure of hardware designs to generate deterministic testbenches. As a direct verification tool, STG runs 720x faster than an iterative LLM-based testbench generation flow and higher rate of successful compilation, achieves higher coverage, and reduces false-pass verdicts on incorrect DUTs. STG also helps identify errors in RTL generation benchmarks by exposing faulty benchmark testbenches. As a data curation engine, it is 11x faster than LLM-based filtering on a single CPU core with 127x less energy, and the resulting distilled models provide state-of-the-art performance in our multi-benchmark evaluation. As a test-time scaling oracle, it reduces node count by 14-47\%. Our models are available at https://huggingface.co/collections/AS-SiliconMind/siliconmind-v12.

25.
arXiv (CS.CL) 2026-06-15

OdysSim: Building Foundation Models for Human Behavior Simulation

Large language models are increasingly deployed as human simulators for interactive evaluation and social simulation. Yet helpfulness-driven post-training pulls them toward a homogeneous, overly agreeable assistant register, creating a behavioral Sim2Real gap. We present OdysSim, the largest open systematic investigation of behavioral foundation models, i.e., models trained to simulate human behavior at scale. We propose SOUL, a taxonomy of five capability axes (CONV, SS, COG, ROLE, EVAL) that unifies 62 datasets and 23 benchmark tasks under one framework. Specifically, we curate the OdysSim corpus (21.4M interactions, 10B tokens, retrofitted with back-generated social contexts), construct the SOUL-Index benchmark, and develop an end-to-end training recipe combining midtraining, task-specific RL, and expert distillation. The resulting open 8B OSim model ranks first or tied-first on 8 of 23 tasks, outperforming any individual frontier model by this count, with the strongest gains on conversational and social tasks. Its outputs are also more human-like in length, formatting, and word choice, and it transfers zero-shot to out-of-distribution user simulation on $\tau$-bench, nearly matching real users on reaction alignment (93.2 vs. 93.5). We further show that LLM-as-judge RL induces reward-hacking patterns, and that our detectors can mitigate them during post-training. Together, our findings suggest that behavioral foundation models require rethinking the LLM training paradigm. We release all artifacts to support future research.