Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (quant-ph) 2026-06-17

Breaking the bicycle frame: Coset-based quantum LDPC codes

arXiv:2606.17268v1 Announce Type: new Abstract: Generalizing the construction of two-block group algebra (2BGA) codes, we introduce a family of two-block quantum LDPC codes constructed using the action of a group on the cosets of its subgroup. This replaces the regular group actions of the earlier two-block constructions and significantly expands the search space, yielding new quantum LDPC codes outside the 2BGA family. Through a computer search, we identify several new quantum LDPC codes, including weight-6 codes with parameters $[[48,8,6]]$, $[[96,8,10]]$, and $[[224,12,16]]$, as well as weight-8 codes with parameters $[[84,16,8]]$, $[[112,16,10]]$, $[[128,16,12]]$, and $[[168,16,15]]$. Furthermore, we introduce a maximally packed syndrome extraction schedule of depth $w+2$, including initialization and measurement steps, for any code with a maximum stabilizer weight of $w$ from our family. Under a standard circuit-level noise model, our codes, when decoded using BP-OSD, perform competitively with BB codes, achieving thresholds of $\approx0.65\%$ for the weight-6 family and $\approx0.35\%$ for the weight-8 family. Finally, we introduce a group-theoretic framework to generate sequences of graph-based covers of 2BGA codes, recovering and extending recent results on code constructions of this type.

02.
medRxiv (Medicine) 2026-06-18

Web-based education on Metabolism and Obesity is associated with improved lifestyle and health behaviours among Brazilian school teachers

Background: Obesity is a major global public health challenge, and teachers play a critical role in school-based health promotion. This study examined the perceived impact of a web-based educational program on metabolism and obesity delivered to Brazilian school teachers. Methods: This analytical cross-sectional study included 217 teachers who responded to the evaluation questionnaire after attending the course between 2017 and 2022. Statistical analyses included logistic regression and chi-square tests. Findings: Course completion rate was 81.98%, substantially exceeding the 5-15% typical of global MOOCs. However, ethnic disparities were observed: White respondents were 4.95 times more likely to complete the course than Black respondents (p=0.00097) and Brown respondents were 3.05 times more likely (p=0.0268) than Black respondents. Among non-completers, lack of time (64.7%) was the primary barrier. Participation was concentrated in Sao Paulo (77%), with no respondents from three northern states. Perceived difficulty showed a non-significant trend (p=0.0893) where by Black respondents had the lowest predicted difficulty; the most challenging course material was Scientific Content/Reading papers (50%). Completion was strongly associated with applying learned activities in teaching (p

03.
arXiv (CS.CL) 2026-06-15

Small LLMs: Pruning vs. Training from Scratch

Pruning promises a shortcut to strong small language models. In this work, we examine this promise by pruning Llama-3.1-8B at pruning ratios of 0.5–0.8 with six methods spanning depth, width, and sparse granularities, under two controlled token-matched settings. (1) With the same training token budget, pruned initialization consistently outperforms random initialization. This shows that the parent model provides a strong starting point, although the advantage narrows as the training token budget grows and as the pruning ratio rises, nearly vanishing at the highest pruning ratio we study. (2) When training from scratch is instead given the full token budget consumed by the whole pipeline, pruning at finer granularities still retains an advantage, while coarser structured pruning can be matched or surpassed. This suggests that the parent model transfers knowledge that additional training tokens alone cannot fully recover, but only at fine granularity. Taken together, our results yield a clear recommendation: with a large pretrained model in hand and a limited training token budget, pruning is better than training from scratch; when the training budget is not limited, training from scratch can be competitive for coarser pruning, so a large pretrained parent is not always necessary.

04.
arXiv (CS.CL) 2026-06-16

TokenPilot: Cache-Efficient Context Management for LLM Agents

As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; however, their unconstrained sequence mutations alter layouts, introducing prefix mismatches and cache invalidation. This reveals a critical trade-off between text sparsity and prompt cache continuity. To address this, we present TokenPilot, a dual-granularity context management framework. Globally, Ingestion-Aware Compaction acts as a framework harness to stabilize prompt prefixes and eliminate open-world environmental noise at the ingestion gate. Locally, Lifecycle-Aware Eviction monitors the ongoing residual utility of context segments, enforcing a conservative batch-turn schedule to offload content segments only when task relevance expires. Experiments on PinchBench and Claw-Eval under both isolated and continuous modes demonstrate that TokenPilot reduces costs by 61% and 56% in isolated mode, and 61% and 87% in continuous mode, while maintaining competitive performance compared to prior systems. TokenPilot has been integrated into LightMem2 at https://github.com/zjunlp/LightMem2.

05.
arXiv (CS.AI) 2026-06-17

WallZero: Mastering the Game of WallGo with Strategic Analysis

arXiv:2606.17847v1 Announce Type: new Abstract: WallGo is a recently introduced strategic board game popularized by the 2025 Netflix series The Devil's Plan. Although played on a small 7 x 7 board, its combination of stone movement and wall placement yields high game-tree complexity and intricate strategic interactions. Despite its growing popularity, WallGo remains underexplored. This paper presents WallZero, an AlphaZero-based agent for the two-player WallGo setting. We introduce tailored action and feature designs to improve playing performance significantly. In the evaluation, WallZero defeats two professional Go players who participated in this study, securing on average 1.98x more territory per game. Beyond its strength, we use WallZero to assess game fairness and identify key strategies for mastering WallGo. Interestingly, our results show that the opening used in the Netflix series yields a more balanced game. Our code is available at https://rlg.iis.sinica.edu.tw/papers/wallzero.

06.
arXiv (CS.AI) 2026-06-24

Breaking Shortcut Learning for Cross-Trial EEG-Guided Target Speech Extraction via Two-Stage Training

arXiv:2606.24164v1 Announce Type: cross Abstract: Recent end-to-end models for EEG-guided target speech extraction report impressive results, underscoring potential for neuro-steered hearing technologies. However, our analysis reveals that high within-trial performance can be driven by trial-specific EEG structure that acts as shortcuts for target selection, leading to poor generalization on unseen trials. To overcome this gap, we propose TRUST-TSE, a two-stage framework to mitigate shortcut learning. By introducing contrastive pretraining with attended-speaker negative sampling, we encourage the EEG encoder to capture fine-grained EEG–speech alignment while suppressing trial-identity cues. We also employ a confidence-weighted extraction objective based on EEG–source similarity to guide extraction using the learned representations. Experiments on KUL and DTU datasets show that TRUST-TSE outperforms end-to-end baselines under strict cross-trial protocols, addressing a key reliability bottleneck of existing approaches.

07.
arXiv (CS.AI) 2026-06-15

Unsupervised Learning of Efficient Exploration: Pre-training Adaptive Policies via Self-Imposed Goals

arXiv:2601.19810v2 Announce Type: replace-cross Abstract: Unsupervised pre-training can equip reinforcement learning agents with prior knowledge and accelerate learning in downstream tasks. A promising direction, grounded in human development, investigates agents that learn by setting and pursuing their own goals. The core challenge lies in how to effectively generate, select, and learn from such goals. Our focus is on broad distributions of downstream tasks where solving every task zero-shot is infeasible. Such settings naturally arise when the target tasks lie outside of the pre-training distribution or when their identities are unknown to the agent. In this work, we (i) optimize for efficient multi-episode exploration and adaptation within a meta-learning framework, and (ii) guide the training curriculum with evolving estimates of the agent's post-adaptation performance. We present ULEE, an unsupervised meta-learning method that combines an in-context learner with an adversarial goal-generation strategy that maintains training at the frontier of the agent's capabilities. On XLand-MiniGrid benchmarks, ULEE pre-training yields improved exploration and adaptation abilities that generalize to novel objectives, environment dynamics, and map structures. The resulting policy attains improved zero-shot and few-shot performance, and provides a strong initialization for longer fine-tuning processes. It outperforms learning from scratch, DIAYN pre-training, and alternative curricula. Code is available at: https://github.com/Octavio-Pappalardo/ulee-jax

08.
arXiv (CS.CL) 2026-06-15

Persona-Pruner: Sculpting Lightweight Models for Role-Playing

Language Models (LMs) have shown remarkable potential as role-playing chatbots, delivering consistent, stylized interactions when given a specification of a character or user persona. However, applying these capabilities to real-world applications (e.g., ecosystems with numerous NPCs interacting simultaneously) exposes a critical inefficiency due to the excessive computational cost. In this paper, we question the necessity of dedicating a full, generalist model to a single persona, hypothesizing that a specific character identity relies on only a fraction of the model's total capacity. We observe that naively pruning LMs often severely degrades the role-playing performance for a specific persona; it does not distinguish between redundant knowledge and essential character traits. We propose Persona-Pruner, a framework that sculpts a lightweight role-playing model by isolating persona-specific sub-networks from a single description. Our experiments consistently show that Persona-Pruner preserves role-playing performance substantially more effectively than existing state-of-the-art LLM pruning techniques, reducing the performance drop from the dense model by up to 93.8% over the strongest baseline on RoleBench in LLM-as-a-judge score, while still maintaining general LLM capabilities. Code is available at https://github.com/jsu-kim/Persona-Pruner.

09.
arXiv (CS.AI) 2026-06-19

Playful Agentic Robot Learning

arXiv:2606.19419v1 Announce Type: cross Abstract: Current agentic robot systems can write executable Code-as-Policy programs, observe feedback, and revise behavior across multiple attempts, but they remain largely task-driven: reusable skills are acquired only after explicit instructions. We study Playful Agentic Robot Learning, where an embodied coding agent uses self-directed play as a continual skill-learning stage before downstream tasks arrive. We introduce RATs, Robotics Agent Teams designed for play-time skill acquisition. During play, RATs proposes novel yet learnable exploratory tasks, plans and executes robot-code policies, verifies intermediate progress, diagnoses failures, retries with dense, step-level feedback, and distills successful executions into a persistent code skill library. At test time, the agent reuses relevant skills from this frozen library to help solve new tasks. Experiments in LIBERO-PRO and MolmoSpaces show that play-learned skills improve held-out downstream tasks over no-play and random-play baselines, with 20.6 and 17.0 percentage-point gains over CaP-Agent0 on LIBERO-PRO and MolmoSpaces, respectively. Moreover, the learned skills can be plugged into other inference-time Code-as-Policy agents by simply retrieving them into the context, improving RoboSuite and real-world transfer by 8.9 and 8.8 points, respectively, without finetuning the underlying model.

10.
arXiv (CS.CV) 2026-06-17

SPATIA: Multimodal Generation and Prediction of Spatial Cell Phenotypes

Understanding how cellular morphology, gene expression, and spatial context jointly shape tissue function is a central challenge in biology. Image-based spatial transcriptomics technologies now provide high-resolution measurements of cell images and gene expression profiles, but existing methods typically analyze these modalities in isolation or at limited resolution. We address the problem by introducing SPATIA, a multi-level generative and predictive model that learns unified, spatially aware representations by fusing morphology, gene expression, and spatial context from the cell to the tissue level. SPATIA also incorporates a spatially conditioned generative framework with confidence-aware OT reweighting and morphology-profile alignment for modeling target-state morphology distributions. Specifically, we propose a confidence-aware flow matching objective that reweights weak optimal-transport pairs based on uncertainty. We further apply morphology-profile alignment to encourage biologically meaningful image generation, enabling the modeling of microenvironment-dependent phenotypic transitions. We assembled a multi-scale dataset consisting of 25.9 million cell-gene pairs across 17 tissues. We benchmark SPATIA against 18 models across 12 tasks, spanning categories such as phenotype generation, annotation, clustering, gene imputation, and cross-modal prediction. SPATIA achieves improved performance over state-of-the-art models, improving generative fidelity by 8% and predictive accuracy by up to 3%.

11.
arXiv (quant-ph) 2026-06-24

Wavelet Matrix Product States for Quantum Fields

arXiv:2606.23823v1 Announce Type: new Abstract: We introduce a variational method to solve continuum quantum models with discrete tensor network techniques. The method leverages wavelet matrix product states (wMPS): matrix product states built on top of sufficiently regular ($N\geq 6$) Daubechies scaling functions. These states live in the continuum field theory Fock space, have finite energy density, and can be optimized with standard algorithms, without restriction to free theories. Further, exploiting the multi-resolution analysis built into wavelets, and its quantum circuit description, we can iteratively refine wMPS to obtain accurate approximations at arbitrarily fine length-scales. We showcase the efficiency of the method on the Lieb-Liniger model, computing energy density and correlation functions.

12.
medRxiv (Medicine) 2026-06-22

Rare loss-of-function variants in POLD1, PMS1 and FAN1 modify age at onset of motor symptoms in Huntington's disease

Huntington's disease is a rare neurodegenerative disease whose primary risk factors are inherited expansions of a CAG repeat tract in the HTT gene. Somatic expansion of these tracts leads to neuronal toxicity, neuronal death and clinical disease progression. To identify genetic factors with a major impact on disease onset and progression, we genome sequenced 18,825 individuals for the ENROLL-HD study. Our results show rare inactivating mutations in three genes, all involved in DNA damage repair, are major determinants of age of onset for motor symptoms (n=10,610) and other clinical manifestations. Heterozygote carriers of predicted loss-of-function (pLoF) variants in POLD1 and PMS1 developed motor symptoms an average 20 years (n=3; P=1x10-5) and 7 years (n=6; P=2x10-3) later than non-carriers, respectively. Conversely, heterozygote carriers of pLoF variants in FAN1 (n=30) developed symptoms 10 years earlier (P=2x10-10). Our findings highlight therapeutic strategies and help predict age of onset for at-risk individuals.

13.
arXiv (CS.AI) 2026-06-15

STaR-DRO: Stateful Tsallis Reweighting for Group-Robust Structured Prediction

arXiv:2604.09737v2 Announce Type: replace-cross Abstract: Structured prediction with large language models requires outputs that are label-accurate, ontology-constrained, structurally valid, and evidence-grounded under label imbalance and heterogeneous group difficulty. We present a unified framework for ontology-constrained generation. First, we introduce a modular prompt-engineering architecture combining XML-style structure, expert disambiguation rules, chain-of-thought reasoning, metadata-aware decision logic, schema contracts, and a self-validation gate. It targets recurrent in-context failures, including format drift, label ambiguity, evidence hallucination, and metadata-conditioned confusion. Second, we propose STaR-DRO, combining Tsallis mirror ascent, sparse entmax-style primal mapback, EMA-smoothed group-loss tracking, rescaled ascent signals, and bounded excess-only multipliers. Unlike conventional DRO, which relies on dense Shannon-entropy exponentiated-gradient updates, can introduce high-variance stochastic reweighting, assigns positive adversarial mass to groups that are not persistently hard, and incurs costs through simplex competition, STaR-DRO upweights only persistently hard groups without suppressing easier ones. We evaluate the framework on EPPC Miner, a clinically grounded high-stakes structured-prediction task requiring hierarchical label prediction and evidence-span extraction from patient-provider secure messages. Across 1B-70B Llama models, prompt engineering improves zero-shot extraction, yielding an average label F1 gain of +14.46 and a Span F1 gain of +17.40. Building on supervised fine-tuning, STaR-DRO further improves accuracy and robustness, increasing average label F1 by +1.08 and +2.20 while reducing mean groupwise validation cross-entropy by 21.3% and 14.8% relative to SFT and standard DRO, respectively. These results advance reliable automated communication mining for patient-centered clinical care analysis.

14.
arXiv (CS.LG) 2026-06-12

Boltzmann Attention: Learnable Ising Couplings for Cooperative Attention

arXiv:2606.12478v1 Announce Type: new Abstract: Attention mechanisms are central to modern sequence models, yet standard attention computes relevance primarily through individual query–key similarities. Although softmax normalization introduces competition among positions, a standard attention layer does not explicitly parameterize learnable interactions between attention decisions. This limits its ability to directly model cooperative or antagonistic co-attention structure within the attention mechanism itself. We propose Boltzmann attention, an energy-based generalization in which attention patterns are governed by an interacting Ising model. The method augments the usual data-dependent local fields with learnable pairwise couplings, allowing the model to represent inter-position correlations beyond those captured by softmax or sigmoid attention. Experiments on character-level language modeling and synthetic bracket matching show that Boltzmann attention consistently improves over standard softmax attention within a standard Transformer architecture, with the advantage becoming more pronounced as sequence length increases. A four-way ablation confirms that the improvement arises from the learnable pairwise couplings. These results suggest that explicit inter-position interactions provide a principled enhancement for attention-based sequence modeling. Moreover, the Ising formulation opens a natural path toward quantum-computing-based sampling strategies: we demonstrate that diabatic quantum annealing provides a practical training method while maintaining competitive performance with exact Boltzmann computation.

15.
arXiv (CS.LG) 2026-06-18

Graph Instance Landscapes: When Structural Similarity Does (Not) Reflect Shortest-Path Performance

arXiv:2606.18267v1 Announce Type: cross Abstract: Benchmarking shortest-path algorithms is commonly based on aggregate performance over heterogeneous graph sets, which limits insight into how different search paradigms react to instance structure. We adopt an instance-landscape view of graph benchmarking by embedding graphs into a low-cost structural feature space and clustering them into regions of similar structure. Three benchmark suites are studied: weighted Erdős–Rényi graphs, random geometric (wireless) graphs, and real-world road networks. We evaluate four representative shortest-path solvers spanning uninformed exact search (Dijkstra), bidirectional exact search (bidirectional Dijkstra), heuristic-guided exact search (A$^{*}$), and deque-based strategies (DEQ). Clustering robustness is analyzed under multiple feature-selection schemes, and runtime distributions are compared across landscape regions using non-parametric tests. While generator parameters induce stable structural regions, we find that feature-space similarity does not necessarily imply performance similarity: significant runtime shifts are frequently observed even within the same landscape region. A merged-suite analysis further shows that different benchmark families occupy largely disjoint regions. These results highlight both the potential and the limits of structural landscapes for the structure-aware benchmarking of shortest-path algorithms.

16.
arXiv (CS.AI) 2026-06-16

Multi-Sensor Fusion for UAV Classification Based on Feature Maps of Image and Radar Data

arXiv:2410.16089v2 Announce Type: replace Abstract: The unique cost, flexibility, speed, and efficiency of modern UAVs make them an attractive choice in many applications in contemporary society. This, however, causes an ever-increasing number of reported malicious or accidental incidents, rendering the need for the development of UAV detection and classification mechanisms essential. We propose a methodology for developing a system that fuses already processed multi-sensor data into a new Deep Neural Network to increase its classification accuracy towards UAV detection. The DNN model fuses high-level features extracted from individual object detection and classification models associated with thermal, optronic, and radar data. Additionally, emphasis is given to the model's Convolutional Neural Network (CNN) based architecture that combines the features of the three sensor modalities by stacking the extracted image features of the thermal and optronic sensor achieving higher classification accuracy than each sensor alone.

17.
bioRxiv (Bioinfo) 2026-06-24

A comprehensive analysis of calreticulin mutants reveals distinct biophysicochemical proprieties with a potential for refined targeted therapies

Calreticulin mutations in myeloproliferative neoplasms result in the replacement of the C-terminus acidic sequence with a positively charged tail that causes pathological activation of the thrombopoietin. The two canonical variants are Type-1 and Type-2. The remaining are mainly classified as Type-1 or Type-2 like based on the wild type sequence retained. Here, we performed in silico biophysicochemical analyses of 76 CALR exon 9 frameshift variants by their sequence and predicted biophysical properties, complemented by structural modeling of the mutant homodimers. Beyond confirming the Type-1 versus Type-2 distinction, we found that the Type 1-like variants form a continuum of charge architecture along which two reproducible subgroups can be identified, rather than sharply separated classes. This work refines the conventional mechanism-based classification into a charge-resolved framework and provides testable hypotheses linking novel-tail chemistry to receptor activation in CALR-mutant neoplasms and paves the way for improved targeted therapies based on individual mutants characteristics

18.
arXiv (math.PR) 2026-06-19

Establishing an $\Omega(\sqrt{d})$ complexity lower bound for PDMP samplers and how to break it: a sub-$\sqrt{d}$ algorithm for Gaussian-tailed targets

arXiv:2606.19909v1 Announce Type: cross Abstract: Despite the theoretical appeal of their non-reversibility, to date, no Piecewise Deterministic Markov Process (PDMP) samplers have been developed that scale better than $\mathcal{O}(\sqrt{d})$ in computational complexity with respect to the target dimension $d$. We prove that this is a fundamental limitation by establishing an $\Omega(\sqrt{d})$ lower bound on the algorithmic complexity of PDMP samplers in a standard setup. By relaxing the assumption that the target density must remain invariant at all continuous times, we then demonstrate how to bypass this barrier. Specifically, we introduce a novel PDMP sampling scheme and show that it achieves an empirical complexity of $\mathcal{O}(d^\alpha)$, where $\alpha \in [0.2, 0.3]$ for Gaussian-tailed targets. In addition, this PDMP scheme is locally adaptive in both trajectory length and distance between velocity updates.

19.
arXiv (CS.CV) 2026-06-15

High-Fidelity Video Compression based on Invertible Neural Transform and Implicit Conditioning

Learning-based video compression has recently achieved competitive rate-distortion performance compared to conventional video codecs. However, most existing methods rely on non-invertible analysis-synthesis transforms, with reconstruction quality subject to both quantization and transform approximation errors. This limitation becomes particularly restrictive at higher quality points, where quantization errors are small and transform-induced distortion dominates. To address this, we propose InnVC, an Invertible neural network based Video Codec for wide-range and high-fidelity compression. The core idea is to preserve an invertible main transform path prior to quantization, while injecting content-adaptive context through a compact implicit conditioning field. This decouples strongly correlated video content from harder-to-model fine details, allowing different components to specialize in complementary reconstruction tasks for more efficient compression. To further improve compressibility, we introduce a scheduled masking strategy that progressively concentrates informative content into fewer latent channels for more effective entropy coding. Experiments on the UVG and MCL-JCV benchmarks show that InnVC achieves strong compression performance over a broad quality range, being particularly effective in the high-quality regime, yielding BD-rate reductions of 21.66% in PSNR and 46.06% in MS-SSIM relative to x265 on UVG. To the best of our knowledge, InnVC is the first neural video codec covers operating poins from low bitrate to high fidelity within a single architecture scale, spanning more than 20 dB in PSNR.

20.
arXiv (math.PR) 2026-06-24

Gradient Mean-Field Dynamics with Measure-Valued States: Well-Posedness, Chaos, and Long-Time Stability

arXiv:2606.24385v1 Announce Type: new Abstract: We study a stochastic mean-field interacting particle system whose state space is $\Y = \Tt^d \times \cP(U)$, where the first component represents a spatial variable and the second one is a probability measure over a compact metric space $U$. The dynamics are driven by locally Lipschitz drift operators: the spatial component evolves according to a Brownian diffusion, while the measure-valued component is perturbed by a projected cylindrical noise acting in the Arens–Eells space. We first establish existence and uniqueness of strong solutions for both the $N$-particle system and the associated nonlinear McKean–Vlasov equation under locally Lipschitz and linear growth assumptions on the drift coefficients. We then prove propagation of chaos: as $N\to\infty$, the empirical measure converges in expectation in Wasserstein–1 distance towards the unique McKean–Vlasov solution. Further, we investigate exponential convergence of the nonlinear McKean–Vlasov dynamics towards a unique invariant measure.

21.
arXiv (CS.CL) 2026-06-18

Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning

Long-context reasoning is an essential capability for large language models, particularly when they are deployed as autonomous agents that must reason over lengthy trajectories. Reinforcement learning (RL) has recently emerged as a dominant paradigm for improving this ability, yet existing work largely focuses on reward engineering while diverse training data remains scarce. We revisit this problem from a data-centric perspective and show that a simple yet effective data recipe alone, paired with a minimal outcome-based GRPO setup, suffices to substantially improve long-context reasoning. Our recipe targets three complementary task families – retrieval, multi-evidence synthesis, and reasoning – for which we construct and curate eight datasets totaling ~14K examples. Experiments on three models (Qwen3-4B/8B/30B-A3B) yield average gains of +7.2/+3.2/+6.4 points across seven long-context benchmarks, surpassing prior RL training sets. We further demonstrate that these gains transfer to agentic tasks, where continuing RL training on an agent-tuned model with our data recipe improves GAIA by +4.8 and BrowseComp by +7.0 points. We will release our datasets to facilitate future research.

22.
arXiv (CS.AI) 2026-06-15

CARE: Controlling LLM-Generated Policies through Auditable Review of Evidence in Scientific Experimentation

arXiv:2606.14581v1 Announce Type: cross Abstract: Granting LLMs direct control over costly, irreversible scientific experiments leads to unsafe exploration and unstable performance, but discarding LLM creativity entirely sacrifices significant optimization potential. We introduce CARE (Controlling LLM-Generated Policies through Auditable Review of Evidence in Scientific Experimentation), an auditable controller for high-throughput experimentation (HTE) optimization that keeps a non-LLM incumbent optimizer as the default action path while using LLMs to revise challenger ranking policies. Before each outcome is revealed, a public-evidence intervention gate compares the challenger with the incumbent. It authorizes the challenger's selection only when the evidence available before selection supports the change, with the decision recorded in the audit log. CARE outperforms all other evaluated methods on Minerva/Olympus and ChemLex benchmarks, with final-best improving from 80.0 to 88.5 on Minerva/Olympus and from 83.9 to 92.1 on ChemLex, relative to the public incumbent. Our experiments indicate that LLM self-evolution is more reliable when it expands the proposal space under an auditable controller, rather than directly choosing experiments.

23.
arXiv (quant-ph) 2026-06-16

Against probability: A quantum state is more than a list of probability distributions

arXiv:2601.18872v2 Announce Type: replace Abstract: The state of a quantum system can be represented by listing the outcome probabilities for a tomographically complete set of measurements. Such representations appear throughout physics, for example, in quantum field theory via correlation functions and in quantum foundations within generalized probabilistic frameworks. In this paper, we show a no-go result: To enable useful statements, the probability representation must be topologically robust$\unicode{x2014}$preserving the notion of closeness between states. Yet, a topologically robust probability representation cannot simultaneously retain other essential structure, such as the subsystem structure.

24.
arXiv (quant-ph) 2026-06-19

Charge-Conjugation Violation and Population Asymmetry in Bipartite Fermionic Lattices

arXiv:2606.06138v2 Announce Type: replace-cross Abstract: Charge conjugation violation (CCV) is a central concept in particle physics and appears also for quasiparticles in quantum many-body systems, which typically relies on an embedded external symmetry breaking to the underlying system. An open question is how an intrinsic CCV mechanism could emerge and what its macroscopic consequences would be. We establish sublattice kinks in bipartite fermionic lattices as a concrete setup showing intrinsic CCV. The intrinsic CCV of the sublattice kink is based on the graph-topological nature of the underlying Hamiltonian, with no explicit symmetry breaking taking place. It leads to a population asymmetry of different configurations and imprints a hidden leaf-like structure in the eigenenergy spectrum. The population asymmetry also leads to an imbalanced sublattice-kink production triggered by the vacuum-instability in the quench dynamics. Our work demonstrates the graph topology as the microscopic origin of intrinsic CCV, with the population asymmetry as the macroscopic consequence, of which the proposed setup is highly amenable to experimental implementation via cold-atom quantum simulators.

25.
arXiv (CS.AI) 2026-06-17

AUTOGATE: Automated Clock Gating via Toggling-Aware LLM-based RTL Rewriting

arXiv:2606.17461v1 Announce Type: cross Abstract: Fine-grain clock gating (FGCG) is among the most effective techniques for reducing dynamic power, yet current FGCG optimization flows remain largely manual. Recent LLM-based RTL optimization approaches remain limited by two key drawbacks: (1) the inability to process long waveform traces spanning millions of cycles, and (2) the difficulty of scaling optimization to large hierarchical codebases while preserving correctness. In this work, we present AUTOGATE, the first agentic framework for industry-grade RTL power optimization, enabling workload-aware clock-gating optimization across large hierarchical codebases. AUTOGATE introduces a Machine Learning (ML)-LLM co-design that bridges waveform-level analysis and RTL rewriting. Specifically, we design an ML-based clustering algorithm that distills raw toggling traces into compact, structured representations that guide LLM-based RTL rewriting. This enables accurate identification and application of clock-gating opportunities without requiring LLMs to directly process raw waveform data. To enhance scalability, AUTOGATE employs a hierarchical multi-agent architecture that decomposes large designs into independently optimizable modules, enabling coordinated optimization across deep design hierarchies. We evaluate AUTOGATE on a diverse set of designs ranging from small RTL designs to large industrial-grade codebases. Experimental results show that AUTOGATE consistently reduces dynamic power relative to baselines. Across the small-design suite, AUTOGATE reduces dynamic power by 49.31% on average. On industry-scale designs, it achieves 19.34% and 7.96% dynamic power reductions on NVDLA and BlackParrot, respectively, and up to 6.86% on highly optimized proprietary production designs.