Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-16

The Truth Stays in the Family: Enhancing Contextual Grounding via Inherited Truthful Heads in Model Lineages

Recent advances in large language models (LLMs) have produced many specialized multimodal LLMs (MLLMs) that share common foundational LLMs, forming distinct model lineages. It remains unclear whether a fundamental behavioral link exists between the foundational LLMs and downstream variants. We investigate this question by quantifying head-level context-truthfulness scores. Across diverse LLM and MLLM lineages, including Vicuna-, Qwen2.5-, LLaMA2-, and Mistral-based models, we find that Truth Scores are strongly preserved within model families, even after instruction tuning or multimodal adaptation. We further show that this inheritance is consistent with attention-head weight preservation, and that context-truthful heads attend to query-relevant evidence. Building on this finding, we propose TruthProbe, a soft-gating strategy that amplifies context-truthful heads while preserving other head contributions. TruthProbe improves contextual truthfulness on HaluEval and reduces multimodal hallucination on POPE and CHAIR, with base-LLM Truth Scores transferring effectively to their fine-tuned LLM and MLLM descendants. Code is available at https://github.com/miso-choi/TruthProbe.
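
The soft-gating idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function name `soft_gate_heads`, the min-max normalization of scores, and the strength parameter `alpha` are all assumptions made for illustration. The only property taken from the abstract is that truthful heads are amplified while other heads keep their contribution (their gate never drops below 1).

```python
def soft_gate_heads(head_outputs, truth_scores, alpha=0.5):
    """Scale each head's output by a gate in [1, 1 + alpha].

    head_outputs: one output vector (list of floats) per attention head.
    truth_scores: one hypothetical context-truthfulness score per head.
    alpha: assumed amplification strength; not a value from the paper.
    """
    lo, hi = min(truth_scores), max(truth_scores)
    span = (hi - lo) or 1.0  # avoid division by zero when all scores are equal
    gated = []
    for out, score in zip(head_outputs, truth_scores):
        # Soft gate: the lowest-scoring head is left untouched (gate 1.0),
        # the highest-scoring head is amplified by 1 + alpha.
        gate = 1.0 + alpha * (score - lo) / span
        gated.append([gate * v for v in out])
    return gated


heads = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]
scores = [0.2, 0.9, 0.55]
gated = soft_gate_heads(heads, scores, alpha=0.5)
```

A hard gate would zero out low-scoring heads. The soft version keeps every head active, which matches the abstract's phrase "preserving other head contributions".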

02.
arXiv (CS.CV) 2026-06-16

Null-Space Diffusion Distillation Unlocks Speed, Fidelity and Realism in Lensless Imaging

Lensless imaging reconstructs scenes from highly multiplexed measurements, resulting in a severely ill-posed inverse problem. In this work, we identify a fundamental trade-off between measurement consistency, perceptual quality, and inference speed across lensless reconstruction paradigms. Traditional methods favor consistency but produce perceptually degraded results, supervised approaches achieve high-quality reconstructions with fast inference but may violate physical constraints, and diffusion-prior methods achieve high perceptual quality and consistency (particularly when structured constraints such as range-null decomposition are used) but remain slow due to iterative sampling. Motivated by this observation, we propose Null-Space Diffusion Distillation (NSDD), a single-pass reconstruction model that distills structured diffusion-prior inference into an efficient feed-forward network. NSDD learns to produce high-quality reconstructions that preserve measurement consistency while avoiding costly iterative sampling. Experimental results demonstrate that NSDD achieves perceptual quality and consistency competitive with diffusion-prior methods, while providing significantly faster inference and offering a favorable balance across all three objectives. Furthermore, ablation experiments show that distilling the range-null decomposition improves reconstruction quality and robustness over unstructured full-reconstruction distillation, including on unseen real scenes. These results highlight the potential of structure-aware distillation for efficient lensless imaging. Code is available at github.com/JRCSAVSN/NullSpaceDiffusionDistillation.
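
The range-null decomposition mentioned in this abstract is commonly written as x = A⁺y + (I − A⁺A)x̂: the range part is fixed by the measurement y, and a prior estimate x̂ only fills in the null space. The sketch below uses a toy forward operator (pairwise averaging of a 1D signal) whose pseudo-inverse is known in closed form. A real lensless system would use a PSF-based operator instead, and all function names here are illustrative assumptions rather than anything from the NSDD code.

```python
def forward(x):
    """Toy measurement operator A: average each adjacent pair of samples."""
    return [(x[2 * i] + x[2 * i + 1]) / 2.0 for i in range(len(x) // 2)]


def pinv(y):
    """Pseudo-inverse A+ of the pairwise-average operator: repeat each value twice."""
    out = []
    for v in y:
        out.extend([v, v])
    return out


def range_null_combine(y, x_prior):
    """Return A+ y + (I - A+ A) x_prior.

    The result reproduces the measurement exactly (A x = y), while the
    null-space detail comes from x_prior, e.g. a network or diffusion output.
    """
    range_part = pinv(y)
    prior_range = pinv(forward(x_prior))
    return [r + (p - pr) for r, p, pr in zip(range_part, x_prior, prior_range)]


y = [2.0, 5.0]                      # measurement
x_prior = [1.0, 4.0, 6.0, 6.5]      # hypothetical prior estimate, inconsistent with y
x = range_null_combine(y, x_prior)  # consistent with y, keeps the prior's fine detail
```

Whatever the prior proposes, the combined estimate satisfies `forward(x) == y`, which is the measurement-consistency property the abstract refers to.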

03.
arXiv (CS.AI) 2026-06-15

EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning

arXiv:2606.03108v2 Announce Type: replace Abstract: Autonomous LLM training is often framed as recipe search, which leaves the training harness largely static. This limitation sharpens in agentic RL, where shifting bottlenecks and scalar rewards mask diverse failure modes. We introduce EvoTrainer, an autonomous training framework that co-evolves LLM policies and training-side harnesses through empirical feedback: it diagnoses rollout-level evidence, revises diagnostics, backtests interventions, and accumulates reusable skills. Evaluated on mathematical reasoning, competitive-programming code generation, and repository-level software engineering, EvoTrainer matches or exceeds the human-engineered RL references under the same data, codebase, and evaluation protocol, with the largest gain on long-horizon agentic SWE. Trajectory analyses show that retained strategies diverge across domains, evolving diagnostics prevent invalid high-scoring branches from being promoted, and reusable skills shape later search. Autonomous LLM RL should move beyond recipe search toward joint evolution of policies and the training harnesses that interpret them.

04.
arXiv (quant-ph) 2026-06-11

Quantum Correlation Hierarchy and Teleportation in Dephased Hydrogen Hyperfine System

arXiv:2606.11731v1 Announce Type: new Abstract: We study the dynamics of quantum correlations in the hydrogen hyperfine spin system subject to Markovian phase noise. Treating the electron and proton spin degrees of freedom as an open two-qubit system governed by an isotropic hyperfine Hamiltonian and local dephasing, we obtain the exact time-dependent density matrix and derive analytical expressions for the full X-state family. We compute concurrence ($C$), trace-distance measurement-induced nonlocality (Trace MIN, $\mathcal{N}_1$), and average steering coherence (ASC) in closed form and establish their strict ordering $C(t)\leq \mathcal{N}_1(t)\leq \mathrm{ASC}(t)$ at all times. Entanglement is identified as the most fragile resource, undergoing sudden death at a finite time. Trace MIN exhibits dephasing-immune freezing for states with nonzero population imbalance, while ASC is the most robust quantity, persisting longest in every scenario studied. We additionally demonstrate that the dephased thermal hyperfine state serves as a resource for quantum teleportation, deriving a closed-form expression for the average fidelity and establishing that the teleportation advantage window coincides exactly with the entanglement survival interval, $\mathcal{F}_A > 2/3 \Longleftrightarrow C > 0$, for the full X-state family with maximally mixed marginals. We identify four distinct dynamical regimes and map all three correlation measures onto directly measurable Pauli spin correlators, enabling experimental reconstruction of the full hierarchy without full state tomography.
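
The equivalence between a teleportation advantage and nonzero concurrence can be checked numerically in the textbook Bell-diagonal case. The sketch assumes the standard teleportation protocol, where the optimal average fidelity is (2f + 1)/3 with f the fully entangled fraction, and a Bell-diagonal resource state, for which f is the largest Bell-state weight and the concurrence is max(0, 2f − 1). These are standard results, not the paper's closed-form expressions for the dephased hyperfine state, and the function names are illustrative.

```python
def teleportation_fidelity(bell_weights):
    """Optimal average fidelity (2f + 1) / 3 for a Bell-diagonal resource state.

    bell_weights: the four Bell-state populations (non-negative, summing to 1).
    """
    f = max(bell_weights)  # fully entangled fraction of a Bell-diagonal state
    return (2.0 * f + 1.0) / 3.0


def concurrence(bell_weights):
    """Concurrence of a Bell-diagonal state: max(0, 2 * largest weight - 1)."""
    return max(0.0, 2.0 * max(bell_weights) - 1.0)


examples = [
    [0.7, 0.1, 0.1, 0.1],      # entangled: beats the classical limit 2/3
    [0.4, 0.3, 0.2, 0.1],      # separable: no advantage
    [0.5, 0.5, 0.0, 0.0],      # boundary: fidelity exactly 2/3, zero concurrence
    [0.25, 0.25, 0.25, 0.25],  # maximally mixed
]
advantage = [teleportation_fidelity(w) > 2.0 / 3.0 for w in examples]
entangled = [concurrence(w) > 0.0 for w in examples]
```

In every case the two flags agree, because both conditions reduce to f > 1/2.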

05.
arXiv (CS.CL) 2026-06-17

From Parasocial Scripts to Dyadic Persistence in Autonomous AI-Agent Communities

While parasocial interactions (PSIs) and parasocial relationships (PSRs) have been studied in conventional media settings, we investigate whether PSI- (colloquial) relational cues also exist in online communities where both sides are autonomous AI agents. We analyze 4,434 posts and 50,338 comments from Moltbook through three theory-based textual indicators: attachment/intimacy language, reciprocity bids, and self-identification to original poster (OP). The combined results across methods based on keyword matching, few-shot large language model (LLM) annotation, and grouped-context LLM annotation reveal that PSI colloquial cues prevail and are strongly associated with OP re-engagement and a reciprocal reply structure. These results are robust across negative controls, nullification, clustered-standard-error re-estimation, and multiple-testing correction. A dyadic persistence test further affirms that reciprocity bids align with sustained OP-involving mutual recurrence, providing empirical evidence for bridging interaction-level PSI scripts with PSR-consistent repeated dyadic patterns. We interpret the evidence as a behavioral structure in discourse by LLM-enabled agents.

06.
arXiv (CS.AI) 2026-06-19

Modularity-Free Conflict-Averse Training for Generalized PINNs

arXiv:2606.20156v1 Announce Type: new Abstract: Physics-informed neural networks (PINNs) have become a powerful framework for solving PDEs by embedding physical laws into differentiable objectives. Despite their advances, training PINNs remains fragile: recent conflict-averse optimization schemes alleviate gradient interference between residual and boundary losses, but we show that their effectiveness deteriorates as model capacity increases. In this paper, we identify a capacity-induced failure mode, where overparameterized networks undergo functional modularity, self-partitioning into task-exclusive modules that suppress cross-objective interaction and hinder convergence toward Pareto-stationary points. To address this issue, we propose a novel framework, Modular-Sparsity Synchronization (ModSync), which integrates structural optimization into conflict-averse training by penalizing task-exclusive connections while preserving interaction-promoting pathways. Extensive experiments across diverse PDE benchmarks demonstrate that ModSync consistently prevents capacity-driven failures, sustains robust cross-objective coupling, and achieves state-of-the-art accuracy. Code is available at https://github.com/heejokong/ModSync.

07.
arXiv (CS.AI) 2026-06-16

Faster Completion, Less Learning: Generative AI Reduced Study Time on Math Problems and the Knowledge They Build

arXiv:2605.21629v2 Announce Type: replace-cross Abstract: How much have students' ordinary learning processes shifted in response to generative AI, and how does that affect their durable learning outcomes? Self-report surveys show little change, while small-scale behavioral studies report widespread AI use without the scale or duration to measure learning consequences. We address both questions using a ten-year panel of $3.2$ million ALEKS learning interactions for investigating time-on-task, complemented by ALEKS PPL placement-assessment data for examining proctoring and learning outcomes, with a quasi-experimental design exploiting variation in tasks that are more susceptible to AI (text-based word problems) and less susceptible to AI (interactive graph-based problems). Learning time on AI-susceptible problems declines $2.8\%$ per quarter among college students after ChatGPT's release, cumulating to $26.9\%$ over eleven quarters; high-schoolers show $31.3\%$, middle-schoolers $9.0\%$, and Grade 5 students no detectable change. Among college students, the post-ChatGPT divergence vanishes entirely under proctoring, ruling out broad efficiency gains as the likely explanation. Logistic fixed-effects models on randomly assigned proctored retention items yield a $25\%$ cumulative decline in odds of correct response; the same estimator on non-proctored assessment produces a large opposite-signed increase, which is inconsistent with any platform, cohort, or curriculum explanation. These results provide some of the first large-scale behavioral and outcome evidence that generative AI has altered how students study and the knowledge they build, the population-level indicator of cognitive surrender, with direct implications for educational research, assessment governance, and AI policy.

08.
arXiv (CS.AI) 2026-06-19

FAPO: Fully Autonomous Prompt Optimization of Multi-Step LLM Pipelines

arXiv:2606.19605v1 Announce Type: cross Abstract: Multi-step LLM pipelines fail through interactions among retrieval, reasoning, and formatting steps, so prompt-only optimization can miss bottlenecks in the chain. We present FAPO (Fully Autonomous Prompt Optimization), a framework that lets Claude Code optimize an LLM pipeline inside a standardized codebase. FAPO evaluates a pipeline, inspects intermediate steps, diagnoses failures, proposes scoped changes, and validates variants repeatedly to optimize against a score function. It first tries prompt edits and, only when prompt optimization appears insufficient, changes chain structure within the permitted scope when attribution identifies a structural bottleneck. Across six benchmarks and three task models, FAPO beats the baseline GEPA in 15 of 18 model-benchmark comparisons. In 11 model-benchmark comparisons, FAPO wins with non-overlapping mean $\pm$ trial-standard-deviation ranges, and the mean FAPO-GEPA gain is +14.1 pp. In the six HoVer and IFBench comparisons where prompt-first search escalated to structural changes, FAPO wins all six with a mean gain of +33.8 pp. FAPO also improves performance on security tasks: on CTIBench-RCM, a security CVE-to-CWE task, prompt-only FAPO lifts test accuracy by +4.0 pp on GPT-5, +7.1 pp on Foundation-Sec-8B-Instruct, and +2.0 pp on Foundation-Sec-8B-Reasoning. These results position FAPO as a state-of-the-art pipeline optimization technique for both general-purpose and security-focused tasks.

09.
arXiv (quant-ph) 2026-06-12

Block algebra for morphing circuits

arXiv:2606.12724v1 Announce Type: new Abstract: Morphing circuits are a new paradigm for quantum error correction that relaxes hardware requirements. We present four constructions for CNOT-based CSS morphing circuits with explicit qubit connectivity degrees. All four constructions are specified in block algebra notation, with entries in algebras generated by permutation matrices. The first three are obtained by rewriting existing surface- and color-code morphing circuits; the fourth is a new three-round construction modeled on the 6.6.6 color code. The surface-code construction recovers the morphing circuit of Ref. [ST25] for two-block group algebra codes. Numerical search then instantiates these permutation matrices using regular representations of finite groups. [ST25] M. H. Shaw and B. M. Terhal, Phys. Rev. Lett. 134(9), 090602 (2025).

10.
arXiv (quant-ph) 2026-06-12

Quantum walk-based optimisation for capacitated vehicle routing with homogeneous and heterogeneous fleets

arXiv:2606.12856v1 Announce Type: new Abstract: The capacitated vehicle routing problem (CVRP) is an appealing candidate for quantum optimisation due to its combinatorial complexity and practical importance. However, the problem's constrained search space poses a challenge for such quantum algorithms. We introduce a quantum walk-based optimisation algorithm (QWOA) for the CVRP with homogeneous or heterogeneous vehicle fleets, addressing this challenge through a continuous-time quantum walk over a product space that coincides with combinatorial structures intrinsic to the CVRP solution space. Relative to the prior QWOA-based formulation, this approach reduces the per-layer gate complexity from $\mathcal{O}(n^{3}\log n)$ to $\mathcal{O}(n^{2}\log n)$ and supports a circuit parameterisation schedule generated by a fixed number of classical parameters. Exact state-vector simulation on instances with up to $n=8$ customers and $K=3$ vehicles demonstrates improved convergence to low-cost solutions using markedly fewer objective function evaluations, with the advantage broadening as problem size increases. These results identify structured product-space walks as a promising tool for optimisation over constrained combinatorial spaces.

11.
arXiv (math.PR) 2026-06-11

On the structure of the sandpile identity element on Sierpinski gasket graphs

arXiv:2603.12006v2 Announce Type: replace-cross Abstract: We consider the identity of the abelian sandpile group of finite approximation graphs of the Sierpinski gasket, and we show that the second-order term in the scaling limit converges to the path distance to the nearest corner on the Sierpinski gasket. The proof relies on a decomposition of the identity of the sandpile group into the sum of a constant function and the Laplacian of the graph distance on the approximating graphs.

12.
arXiv (CS.LG) 2026-06-19

Model soups need only one ingredient

arXiv:2602.09689v2 Announce Type: replace Abstract: Fine-tuning large pre-trained models on a target distribution often improves in-distribution (ID) accuracy, but at the cost of out-of-distribution (OOD) robustness as representations specialize to the fine-tuning data. Weight-space ensembling methods, such as Model Soups, mitigate this effect by averaging multiple checkpoints, but they are computationally prohibitive, requiring the training and storage of dozens of fine-tuned models. In this paper, we introduce MonoSoup, a simple, data-free, hyperparameter-free, post-hoc method that achieves a strong ID-OOD balance using only a single checkpoint. Our method applies Singular Value Decomposition (SVD) to each layer's update and decomposes it into high-energy directions that capture task-specific adaptation and low-energy directions that introduce noise but may still encode residual signals useful for robustness. MonoSoup then uses entropy-based effective rank to automatically re-weigh these components with layer-wise coefficients that account for the spectral and geometric structure of the model. Experiments on CLIP models fine-tuned on ImageNet and evaluated under natural distribution shifts, as well as on Qwen language models tested on mathematical reasoning and multiple-choice benchmarks, show that this plug-and-play approach is a practical and effective alternative to multi-checkpoint methods, retaining much of their benefits without their computational overhead.
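
The "entropy-based effective rank" this abstract mentions has a common definition: normalize the singular values into a probability distribution and exponentiate its Shannon entropy. The sketch below implements that definition only. Whether MonoSoup uses exactly this normalization, and how it turns the result into layer-wise coefficients, is not stated in the abstract, so treat the function as an assumption.

```python
import math


def effective_rank(singular_values):
    """Entropy-based effective rank: exp(H(p)) with p_i = s_i / sum(s).

    Returns a value between 1 (one dominant direction) and the number of
    nonzero singular values (all directions equally strong).
    """
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


flat = effective_rank([1.0, 1.0, 1.0, 1.0])  # energy spread evenly
peaked = effective_rank([5.0, 0.0, 0.0])     # a single dominant direction
mixed = effective_rank([10.0, 1.0, 1.0])     # mostly one direction plus a tail
```

A layer update with a low effective rank concentrates its energy in a few directions, which is the kind of spectral signal a re-weighting rule could use to balance high-energy and low-energy components.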

13.
arXiv (CS.LG) 2026-06-18

Dimension-Free Convergence of Discrete Diffusion Models: Adjoint Equations Induce the Right Space

arXiv:2605.17232v2 Announce Type: replace Abstract: Discrete diffusion has become a leading framework for generative modeling in various applications including language, vision, and biology. Existing convergence theory, however, exhibits fundamental limitations. KL-based analyses diverge under singular priors such as the masked distribution, while bounds in total variation (TV) depend on the state space size $S$ and become vacuous for modern language tasks, where vocabularies contain hundreds of thousands of tokens. We develop a unified adjoint-equation-based framework that establishes dimension-free convergence guarantees in any integral probability metric (IPM). To the best of our knowledge, our bounds are the first to be entirely free of $S$ and applicable to both masked and uniform priors. Importantly, our theory relies only on a single standard rate-matrix regularity assumption and applies to general priors. Five novel techniques drive our improvements: working in the space of observables via adjoint equations rather than directly with probability measures, a regularity analysis that yields bounds on any IPM, a coupling argument that removes $S$-dependence under uniform transitions, and score-marginal cancellation and exit-routing techniques that remove $S$-dependence under masked transitions. Our framework thus sharply departs from prior analyses and avoids the shortcomings of pathspace-KL and existing TV-based approaches. Beyond convergence bounds, our framework provides a versatile toolkit for further theoretical study of discrete diffusion models, including principled choices of loss functions and dimension-free step complexity.

14.
arXiv (CS.CV) 2026-06-16

Learned JPEG Compression for DNN Vision

JPEG, a lossy image compression technique designed for human viewers, has maintained its dominance for decades. However, in the era of artificial intelligence (AI), a substantial portion of image data, often compressed by JPEG, is and will continue to be consumed by deep neural networks (DNNs) instead of humans, thus creating a need to optimize JPEG for DNN inference performance. To this end, we propose learned JPEG compression for DNN vision (J4D), a novel training framework for determining JPEG encoding parameters to minimize compression rate while maximizing DNN inference performance. The major challenge of solving this optimization problem lies in representing the JPEG codec and compression rate in closed form. By incorporating a differentiable soft quantizer based on a probabilistic quantization scheme, we not only obtain a differentiable proxy for the JPEG codec, but are also able to compute the entropy of the coded source analytically, which is a close estimate of the actual compression rate. Equipped with both the differentiable JPEG codec and the information-theoretic rate estimator, we are then able to solve the aforementioned optimization problem with backpropagation. After training, the learned encoding parameters will be subsequently used in actual JPEG encoding based on probabilistic quantization. Extensive experimental results across multiple datasets and DNN architectures demonstrate that J4D consistently and significantly outperforms the default JPEG and other competitive JPEG codecs optimized for DNNs. Notably, compared to the default JPEG, J4D achieves an increase in accuracy by as much as 11.60% at the same rate, or a reduction of compression rate up to 80.05% at the same accuracy. Additionally, with the help of J4D, we show the potential to design universal JPEG encoding parameters for various DNN architectures for the first time.

15.
arXiv (CS.LG) 2026-06-11

Intention Driven Identification of In-Possession Match Phases in Association Football through Temporal Graph Learning

arXiv:2606.09289v2 Announce Type: replace Abstract: Understanding tactical organisation of association football, hereafter referred to as football, requires identifying distinct match phases. Yet in-possession phases are rarely directly observable and are shaped by evolving tactical intentions, rather than spatial patterns alone. This study proposes a data-driven framework for identifying in-possession match phases from spatiotemporal tracking data. Seven German Bundesliga matches recorded at 25 Hz with TRACAB were analysed. A hierarchical phase model was defined with three tactical intentions (Invade Opponent Space, Keep Possession, Scoring) and six phases (Build Up, Progression, Counter Attack, Maintenance, Sustained Threat, Finishing). A Temporal Graph Attention Network (T-GAN) was developed to combine frame-level player-interaction graphs, contextual features, and Transformer-based temporal modelling. Performance was evaluated using frame-level F1 and a sequence-aware Intersection over Truth-Dominance (IoT-D) metric. T-GAN achieved macro-average frame-level F1 scores of 0.87 at the intention level, 0.76 for invasion-related phases, and 0.79 for scoring phases. At the sequence level, mean diagonal IoT-D F1 increased from 0.68 to 0.79 for intentions and from 0.61 to 0.71 for phases after post-processing, indicating improved temporal coherence. Model comparisons showed that sequence modelling was the main driver of segmentation quality, while graph-based relational modelling was particularly beneficial for Counter Attack recognition. Exploratory player attention analysis further suggested that wide and midfield positional groups contributed strongly to phase discrimination. Overall, the framework translates continuous tracking data into tactically interpretable in-possession phase representations, with potential applications in automated match annotation, tactical analysis, and playing-style profiling.

16.
arXiv (CS.LG) 2026-06-18

Estimating carbon pools in the European Shelf sea environment: replacing reanalysis by model-informed machine learning?

arXiv:2508.10178v3 Announce Type: replace-cross Abstract: Shelf seas are important for the economy and the carbon cycle, but shelf sea observations for carbon pools are often sparse, or highly uncertain. An alternative can be provided by carbon reanalyses (whether assimilating proxy variables, such as chlorophyll-$a$, or directly carbon), but these are often expensive to run. We propose to use a computationally cheap ensemble of neural networks (i.e. deep ensemble) to learn the relationship between the directly observable (atmospheric, riverine and ocean) variables and marine carbon pools from a coupled physics-biogeochemistry model. The deep ensemble was trained on a North-West European Shelf (NWES) physical-biogeochemistry model free run simulation. After training, the deep ensemble was run using inputs from the NWES reanalysis instead of the free run, demonstrating that it can efficiently predict several NWES carbon pools (e.g., detritus, zooplankton, heterotrophic bacteria) in much better agreement with the reanalysis than the free run, while also providing uncertainty information. We further show that the deep ensemble performs similarly well when it is driven directly by the observations assimilated into the reanalysis, with the limitation that carbon pools can then be predicted only at the observed locations and times. We focus on explainability of the results and demonstrate potential use of the deep ensembles for future climate what-if scenarios. We suggest that model-informed machine learning presents a viable alternative to expensive reanalyses and could complement observations, wherever they are missing and/or highly uncertain.

17.
arXiv (CS.CL) 2026-06-16

Replay What Matters: Off-Policy Replay for Efficient LLM Reinforcement Unlearning

LLM unlearning has emerged as a cost-effective alternative to full retraining for removing hazardous knowledge from pretrained models while preserving general utility. Recent RL-based methods such as RULE reformulate unlearning as learning a refusal behavior, but their on-policy optimization repeatedly samples from the same forget and retain/boundary prompts throughout training. We identify a critical inefficiency in this process: easy cases quickly converge and provide little useful gradient signal, while hard cases near the forget/retain boundary continue to produce low-reward rollouts that are discarded after a single use. To address this issue, we propose ReRULE, an off-policy replay enhancement for reinforcement unlearning. ReRULE stores low-reward hard-case rollout groups in a replay buffer during early GRPO training and reuses them in later stages through importance-sampled off-policy updates, redirecting computation toward boundary cases that still require learning. Theoretically, we show that ReRULE yields a tighter hard-case convergence bound than pure on-policy RULE. Empirically, ReRULE improves MUSE-Books Retain Quality from 46.3 to 56.2 while adding only 5–11% training time across benchmarks. Its limited improvement on the simpler TOFU setting further supports the intended conditional behavior: replay is most beneficial when the hard/easy disparity is pronounced.
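
The replay mechanism described in this abstract can be sketched as a bounded buffer that keeps only low-reward rollout groups, plus an importance weight for reusing them under a newer policy. Everything specific here is an assumption for illustration: the class name `HardCaseReplayBuffer`, the mean-reward threshold, the buffer capacity, and the PPO-style clipping of the ratio are not taken from the ReRULE paper.

```python
import math
from collections import deque


class HardCaseReplayBuffer:
    """Keeps rollout groups whose mean reward falls below a threshold."""

    def __init__(self, capacity=1024, reward_threshold=0.5):
        self.groups = deque(maxlen=capacity)  # oldest groups are evicted first
        self.reward_threshold = reward_threshold

    def maybe_store(self, prompt, rollouts):
        """rollouts: list of (response, reward, old_logprob) tuples.

        Returns True if the group was stored as a hard case.
        """
        mean_reward = sum(reward for _, reward, _ in rollouts) / len(rollouts)
        if mean_reward < self.reward_threshold:
            self.groups.append((prompt, rollouts))
            return True
        return False


def importance_weight(new_logprob, old_logprob, clip=0.2):
    """Clipped ratio pi_new / pi_old used to reweight a replayed rollout."""
    ratio = math.exp(new_logprob - old_logprob)
    return min(max(ratio, 1.0 - clip), 1.0 + clip)


buffer = HardCaseReplayBuffer(capacity=2, reward_threshold=0.5)
stored_hard = buffer.maybe_store("boundary prompt", [("a", 0.1, -3.0), ("b", 0.3, -2.5)])
stored_easy = buffer.maybe_store("easy prompt", [("c", 0.9, -1.0), ("d", 1.0, -1.2)])
```

Easy groups are dropped at insertion time, so later off-policy updates spend their budget on the boundary cases that the abstract identifies as still requiring learning.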

18.
arXiv (CS.AI) 2026-06-16

AIRMap: AI-Generated Radio Maps for Wireless Digital Twins

arXiv:2511.05522v4 Announce Type: replace-cross Abstract: Accurate, low-latency channel modeling is essential for real-time wireless network simulation and digital-twin applications. However, traditional modeling methods like ray tracing are computationally demanding and unsuited to modeling dynamic conditions. In this paper, we propose AIRMap, a deep-learning framework for ultra-fast radio-map estimation, along with an automated pipeline for creating the largest radio-map dataset to date. AIRMap uses a single-input U-Net autoencoder that processes only a 2D elevation map of terrain and building heights. Trained on 1.2M Boston-area samples and validated across four distinct urban and rural environments with varying terrain and building density, AIRMap predicts path gain with under 4 dB RMSE in 4 ms per inference on an NVIDIA L40S, over 100x faster than GPU-accelerated ray-tracing-based radio maps. A lightweight calibration using just 20% of field measurements reduces the median error to approximately 5%, significantly outperforming traditional simulators, which exceed 50% error. Integration into the Colosseum emulator and the Sionna SYS platform demonstrates near-zero error in spectral efficiency and block-error rate compared to measurement-based channels. These findings validate AIRMap's potential for scalable, accurate, and real-time radio map estimation in wireless digital twins.

19.
arXiv (quant-ph) 2026-06-16

Experimental realization of the complete seven-phase Anderson-localization landscape

arXiv:2606.14825v1 Announce Type: cross Abstract: Anderson localization has evolved far beyond the conventional dichotomy between extended and localized states. Modern localization theory predicts a complete transport hierarchy comprising extended, critical, and localized phases together with all coexistence phases among them, forming a seven-phase Anderson-localization landscape. Despite its fundamental importance, this hierarchy has never been experimentally realized within a single system. Here we realize the complete seven-phase Anderson-localization landscape in a one-dimensional Floquet photonic lattice. By engineering quasiperiodic hopping profiles containing inhomogeneously distributed hopping zeros, we generate critical states and enable their coexistence with extended and localized sectors. The resulting transport regimes are directly resolved through their distinct spatiotemporal dynamics, including ballistic expansion, confined critical oscillations, and persistent localization. We observe all seven phases, including the elusive triply coexisting extended-critical-localized phase, and experimentally track the phase transitions connecting them. Our results establish the first complete experimental map of the Anderson-localization landscape and provide a unified platform for investigating mobility edges, multifractality, and programmable coherent transport.

20.
arXiv (CS.AI) 2026-06-11

FitText: Evolving Agent Tool Ecologies via Memetic Retrieval

arXiv:2605.02411v2 Announce Type: replace Abstract: A semantic gap separates how users describe tasks from how tools are documented. As API ecosystems scale to tens of thousands of endpoints, static retrieval from the initial query alone cannot bridge this gap: the agent's understanding of what it needs evolves during execution, but its tool set does not. We identify this retrieval interface, not planning, as the binding constraint on end-to-end agent performance, and introduce FitText, a training-free framework that makes retrieval dynamic by embedding it directly in the agent's reasoning loop. FitText treats retrieval as test-time evolution of hypotheses: the agent generates natural-language pseudo-tool descriptions (revisable beliefs about the tool it needs), refines them iteratively using retrieval feedback, and explores diverse alternatives through stochastic generation. Memetic Retrieval adds evolutionary selection pressure over candidate descriptions, guided by a tool memory that avoids redundant search. On ToolRet (three domains), FitText's reformulation strategies improve NDCG@5 by 2.7 to 10.6 points over static query retrieval across all base models; on StableToolBench (16,464 APIs) with GPT-5.4-mini, Memetic reaches an 84.3% pooled pass rate, a 26.7-point absolute gain over static query retrieval.
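
The retrieval gains in this abstract are reported in NDCG@5, a ranking metric that rewards placing relevant tools near the top of the returned list. The sketch below uses the linear-gain form of DCG with a log2 discount and computes the ideal ordering from the relevance labels passed in. The benchmark's exact variant (gain function, handling of relevant items that were not retrieved) may differ, so this is an illustration of the metric rather than the ToolRet evaluation code.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the top-k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=5):
    """DCG of the given ranking divided by the DCG of its ideal reordering."""
    ideal = dcg_at_k(sorted(ranked_relevances, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0


perfect = ndcg_at_k([1, 0, 0, 0, 0])  # the relevant tool is ranked first
second = ndcg_at_k([0, 1, 0, 0, 0])   # the relevant tool is ranked second
empty = ndcg_at_k([0, 0, 0])          # nothing relevant was retrieved
```

Moving the single relevant tool from rank 2 to rank 1 raises the score from about 0.63 to 1.0. Reported on a 0 to 100 scale, that kind of reordering is what a multi-point NDCG@5 improvement reflects.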

21.
Nature (Science) 2026-06-17

Structure of the pre-initiation complex explains CMGE biogenesis

When cells enter S phase, bidirectional DNA replication is initiated through the kinase-regulated recruitment of three activators (Cdc45, GINS and Pol ε) to a duplex-DNA-loaded double hexamer of minichromosome maintenance (MCM) ATPases. Together, these proteins form two CMGE helicases that establish divergent replication forks as they become separated. Here, to gain an understanding of CMGE biogenesis, we reconstituted the pre-initiation complex with purified yeast proteins. The cryo-electron-microscopy structure shows a set of firing factors caught in the act of assembling two symmetrical CMGEs. We show how stepwise complex formation reshapes MCM in preparation for DNA opening, and we explain how ATP promotes firing-factor ejection and CMGE maturation. We find that although Sld2 facilitates the recruitment of GINS to MCM, as expected, it also aids the efficient separation of the CMGE dimer, and is essential for the ejection of the lagging strand from MCM. These findings have direct implications for our understanding of the metazoan Sld2 orthologue, RECQL4, and point to a replication-fork establishment mechanism that is conserved across eukaryotes. Cryo-electron microscopy and biochemical reconstitution experiments in yeast provide insight into the assembly of the CMGE complex, a helicase that establishes bidirectional DNA replication in eukaryotic cells, and elucidate the role of the firing factor Sld2.

22.
arXiv (math.PR) 2026-06-16

Small moments of the sensitivity of polynomial threshold functions

arXiv:2606.16004v1. In the first version of Chang, Slote, Volberg, and Zhang's paper [BSA_of_PTF], the authors modify a nice recursive approach due to Kane [Correct_exponent_for_AS], which he used to bound the average sensitivity of polynomial threshold functions (PTFs). In [BSA_of_PTF], Kane's argument was adapted to estimate the boolean surface area (BSA) of polynomial threshold functions. The bridge is a combinatorial averaging lemma that considers all balanced partitions. The lemma serves as a substitute for an additive property of average sensitivity. With the lemma, one can apply a Kane-type algorithm to derive a recurrence. Solving the recurrence then gives an upper bound of $e^{C_d \sqrt{\log n}}$ for the boolean surface area. In the second version of the same paper, the authors derive a polylog upper bound for the BSA of PTFs. The difference is that they use a tail estimate for the sensitivity function. With the help of a polynomial restriction lemma in [poly_restriction], they sharpen the upper bound. Notably, when the polynomial restriction is applied, each coordinate is assigned to a part independently and with equal probability. As a result, a partition does not necessarily have equal-size blocks. In other words, it may not be balanced. In this note, we first investigate the effect of different partitioning schemes. Second, we use the recursive method from the first version to derive a polylog upper bound for $\mathbb E[s(x)^{\eta}]$ where $\eta < 1/2$. It is interesting to note the phase transition that happens at $\eta=1/2$ in both versions of the proof (though in completely different forms). Section [PhaseTr-s] treats this transition.
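
To make the quantity $\mathbb E[s(x)^{\eta}]$ concrete, here is a minimal brute-force sketch that evaluates it exactly for a small function on $\{-1,1\}^n$. It is illustrative only and unrelated to the recursive method of the note: the function names are hypothetical, and the example function is 3-bit majority, a degree-1 polynomial threshold function chosen here for simplicity. With $\eta = 1$ the moment is the average sensitivity.

```python
import itertools


def sensitivity(f, x):
    """Number of coordinates i such that flipping x[i] changes f(x)."""
    fx = f(x)
    count = 0
    for i in range(len(x)):
        flipped = x[:i] + (-x[i],) + x[i + 1:]
        if f(flipped) != fx:
            count += 1
    return count


def sensitivity_moment(f, n, eta):
    """Exact E[s(x)^eta] for x uniform on {-1, 1}^n (exponential in n)."""
    total = 0.0
    for x in itertools.product((-1, 1), repeat=n):
        total += sensitivity(f, x) ** eta
    return total / 2 ** n


def majority(x):
    """Sign of x_1 + ... + x_n; assumes n is odd so the sum is never zero."""
    return 1 if sum(x) > 0 else -1
```

For 3-bit majority, six of the eight inputs have sensitivity 2 and two have sensitivity 0, so `sensitivity_moment(majority, 3, 1.0)` is 1.5 and `sensitivity_moment(majority, 3, 0.5)` is $0.75\sqrt{2}$. Enumeration is only feasible for small $n$; it is a way to check intuition, not a route to the asymptotic bounds discussed in the abstract.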

23.
arXiv (CS.CL) 2026-06-11

GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human

With the rapid advancement of large language models, evaluating human-likeness in open-ended conversation has become increasingly important. However, human-likeness is a form of tacit knowledge that humans perceive intuitively, yet the underlying criteria resist explicit formulation. Human judgments vary widely, with strong agreement on some cases and legitimate disagreement on others. Meanwhile, the criteria behind human judgments remain implicit, leaving no clear basis for constructing cases. Further, what counts as human-likeness is not static, but evolving with model capability and human expectations. Despite progress in evaluation methods such as expert-authored benchmarks, Reward Models, and self-evolving benchmarks, none addresses all three challenges simultaneously. Therefore, we propose GrowLoop, a self-evolving conversation evaluation system that continuously adapts as models advance and scenarios shift. Starting from minimal human seed annotations, LLM agents iteratively extract and refine evaluation rubrics through Heuristic Learning. Human-AI agreement is required where annotators converge, while only plausibility is expected where they diverge. Moreover, the Rubric-Case co-evolution mechanism enables continuous evolution. When the evaluation target shifts, new human seeds expand the system's coverage accordingly. When applied to human-likeness evaluation in open-ended conversation, the AI judge guided by these rubrics not only substantially outperforms existing methods in alignment with human judgments, but also uncovers issues that annotators overlook. The resulting benchmark effectively discriminates models across capability tiers and reveals where they fall short, while generalizing to new scenarios and adapting as models advance. Our work shifts the benchmarking paradigm from manual updates or difficulty scaling to comprehensive, continuous self-evolution.

24.
arXiv (CS.CV) 2026-06-16

Instance-Aware Knowledge Distillation for Semi-Supervised Learning of an On-Board Multi-Task Dense Prediction Model for Collision Avoidance System

Collision avoidance systems have evolved toward camera-based deep learning approaches for driving scene understanding. However, deployment in edge environments such as country clubs is constrained by limited computational resources and unreliable communication infrastructure. Moreover, constructing large-scale datasets for the target domain involves substantial annotation cost. To address these limitations, we propose an instance-aware knowledge distillation framework for semi-supervised learning. Specifically, we generate pseudo labels that mitigate teacher bias by leveraging domain priors from the teacher and instance-centric knowledge from foundation models. The trained lightweight student is deployed in the proposed collision avoidance system and performs multiple dense prediction tasks in real-time. The system detects frontal obstacles and encodes their spatial information into controller area network messages for automated guided vehicle operation. To achieve this, we construct a large-scale country club dataset and perform field validation of the proposed system. Experimental results demonstrate that the student outperforms the large teacher in instance segmentation while mitigating performance degradation in monocular depth estimation. Compared with the teacher, the student reduces FLOPs by 22.68$\times$ and parameters by 14.33$\times$, achieving 6.46 FPS on a low-cost edge device.

25.
arXiv (CS.AI) 2026-06-17

Descriptor: Certus Caliber Classification Gunshot Dataset (C3GD)

arXiv:2606.18135v1. In this work, we introduce the Certus Caliber Classification Gunshot Dataset (C3GD), a publicly accessible dataset developed for the analysis of firearm muzzle blast sounds. The dataset aims to cover a wide variety of firearms, calibers, cartridges, microphones, and microphone locations, with metadata more detailed than what is currently available elsewhere. It comprises more than 8000 field-collected data points from 28 firearms across 16 calibers. Because data collection in the field is costly, much of the existing research has used gunshot audio collected from the internet, which increases the risk of low-quality data and label noise. The dataset is primarily focused on caliber classification, but can also be used for gunshot detection, audio separation, and audio signal processing, providing a diverse, real-world reference. It aims to provide enough diversity to generalize to more real-world applications while also providing enough metadata for detailed academic analysis.