Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CL) 2026-06-11

Multi-Agent Reasoning with Adaptive Worker Allocation for Stance Detection

Stance detection requires identifying an author's position toward a target, often from short-form texts where stance is implicit, indirect, or rhetorically framed. Although large language models (LLMs) achieve strong performance on this task, single-pass prompting can be brittle when multiple interpretations are plausible. Existing aggregation strategies, such as majority voting or self-consistency, improve robustness by combining labels, but they discard the intermediate reasoning needed to resolve conflicting interpretations. We introduce a multi-agent reasoning framework with adaptive worker allocation for stance detection that shifts aggregation from label-level voting to reasoning-level synthesis. The framework employs a Manager-Worker architecture in which a Manager adaptively allocates a variable number of Worker agents based on input complexity. Each Worker analyzes the input from a distinct perspective and produces a reasoning-only explanation without emitting a stance label; the Manager then synthesizes these explanations to produce the final prediction. We evaluate the proposed framework on SemEval-2016, P-Stance, and COVID-19 Stance using Llama, Mistral, and Gemini. Results show that the framework yields the largest gains on implicit and context-dependent stance cases, achieving 86.07 Macro-F1 on COVID-19 and 82.90 on SemEval-2016, while remaining competitive on more explicit stance datasets such as P-Stance. These findings suggest that adaptive reasoning-level aggregation is most beneficial when stance cannot be reliably inferred from surface cues alone.

02.
PLOS Computational Biology 2026-06-09

Multi-stable oscillations in cortical networks with two classes of inhibition

by Arnab Dey Sarkar, Bard Ermentrout In the classical view of cortical rhythms, interactions between excitatory pyramidal neurons (E) and inhibitory parvalbumin-expressing interneurons (I) are sufficient to generate gamma- and beta-band oscillations. However, it is now well established that multiple inhibitory interneuron subtypes exist and that they play important roles in the generation and modulation of these rhythms. In this paper, we develop a spiking network model consisting of populations of E, I, and an additional interneuron type, somatostatin-expressing neurons (S), which receive excitation from the E cells and inhibit both the E and I populations. The S cells are further modulated by a third inhibitory subtype, vasoactive intestinal peptide (VIP) neurons, which receive inputs from other cortical areas. We reduce the spiking network to a system of nine differential equations that describe the mean membrane potential, firing rate, and synaptic conductance for each population. Using this reduced model, we identify a wide range of parameters that exhibit multiple coexisting rhythms. Employing tools from nonlinear dynamics, we then explore the roles of the two classes of inhibition, as well as VIP modulation, in shaping the properties of these rhythms.

03.
medRxiv (Medicine) 2026-06-16

Validation of a Smartphone-Image-Based Computer-Vision Model for Lean Mass and Body Fat Estimation Against Dual-Energy X-ray Absorptiometry

Introduction Body composition, rather than body weight alone, is an increasingly important health metric, and preservation of lean mass has become a central concern in obesity treatment, aging, and chronic disease management. Dual-energy X-ray absorptiometry (DXA) provides accurate assessment of fat and lean tissue, but its cost and logistical requirements limit repeated measurement. Computer-vision approaches show promise for estimating adiposity from smartphone images, but lean-mass estimation remains less established. Methods We evaluated a computer-vision body composition model, applied to consumer-grade smartphone photographs, against DXA in a held-out validation sample of 195 adults from an ongoing cross-sectional study. Body fat percentage and total lean mass percentage were co-primary outcomes; for total lean mass percentage, an image-only configuration (no added covariates) was pre-specified as primary. Agreement was quantified using Lin's concordance correlation coefficient (CCC) as the lead statistic, with Pearson correlation, mean absolute error, root mean square error, mean bias, and Bland-Altman limits of agreement. In secondary analyses, appendicular lean mass and total lean mass percentage were each estimated with and without routine anthropometric and demographic inputs (body weight, height, age, and sex). Results Total lean mass percentage agreed with DXA from image features alone (CCC 0.916). Body fat percentage, estimated with routine inputs added, agreed at least as closely (CCC 0.930). Adding routine inputs barely changed agreement for total lean mass percentage but markedly improved it for appendicular lean mass, an absolute quantity that scales with body size. Conclusions A smartphone-image-based model estimated both body fat and lean mass with strong agreement to DXA, with lean mass percentage from image features alone. The approach needs no fixed equipment or ionizing radiation. Whether it can track change over time, including in incretin-based weight loss where lean mass preservation is a concern, was not assessed in this cross-sectional study.

04.
arXiv (CS.AI) 2026-06-16

CIWI-CKT: Chaos-Informed Wave Interference Feature Fusion and Cross-City Knowledge Transfer for Traffic Flow Forecasting

arXiv:2606.15642v1 Announce Type: cross Abstract: Accurate traffic flow prediction remains challenging in cross-city, data-scarce scenarios where limited historical data hinders model generalisation. The chaotic nature of traffic dynamics, complex spatio-temporal dependencies, and heterogeneous urban networks complicate few-shot learning across cities. Existing deep learning approaches either treat traffic as purely deterministic or lack mechanisms to model wave-like interference patterns essential for cross-regime traffic dynamics. To address these limitations, this paper proposes CIWI-CKT, a novel Chaos-Informed Wave Interference Feature Fusion framework with Cross-City Knowledge Transfer. Our framework introduces three core innovations: chaos-informed wave generation that extracts measurable chaos invariants and models traffic as adaptive wave components; meta-interference processing that captures wave interactions between support and query regimes while producing a predictability score for confidence estimation; and chaos-aware meta-learning that enables efficient cross-city knowledge transfer while preserving chaotic characteristics. We establish theoretical guarantees including chaos-to-wave stability, wave-induced dimension reduction, and meta-learning generalisation bounds. Extensive experiments on four real-world traffic datasets demonstrate that CIWI-CKT significantly outperforms state-of-the-art spatio-temporal graph learning, transfer learning, prompt-based, and few-shot methods, improving prediction accuracy while substantially reducing required training data.

05.
arXiv (CS.CL) 2026-06-18

RECOM: A Validity Discrimination Tradeoff in Automatic Metrics for Open Ended Reddit Question Answering

Automatic metrics are the default for evaluating LLM-generated text, yet a metric is quietly asked to do two jobs: tell genuine content alignment from surface coincidence (validity), and tell a better system from a worse one (discriminative power). On open-ended, opinion-driven question answering, the two are in tension. We introduce RECOM (Reddit Evaluation for Correspondence of Models), a contamination-free evaluation dataset of 15,000 r/AskReddit questions (September 2025), each paired with its authentic community replies, which postdate every evaluated model's training cutoff. Scoring five open-source LLMs (7–10B) against every reply each metric paired with a random-derangement noise floor we find that no metric does both jobs well. Cosine similarity separates real from random answers (Cohen's $d \approx 2$) but cannot rank the five models ($|d| < 0.1$); BERTScore precision appears to rank the models (raw $|d|$ up to 0.63), but once response length is controlled this collapses to $|d| = 0.09$ and its validity is weak ($d \approx 0.8$, versus cosine's $\approx 2$). Because every metric scores the same outputs, this validity–discrimination tradeoff is a property of the metrics, not the models, and we argue it stems from representation design. Three independent LLM judges reproduce the validity gap and likewise separate the five models only weakly. We recommend reporting metrics on both axes, with an explicit random-baseline floor. RECOM is publicly available at https://anonymous.4open.science/r/recom-D4B0

06.
arXiv (CS.CV) 2026-06-17

Edit3DGS: Unified Framework for Dynamic Head Editing via 2D Instruction-Guided Diffusion and 3D Gaussian Splatting

We present Edit3DGS, a unified framework for dynamic 3D head editing that integrates 2D instruction-guided diffusion with 3D Gaussian splatting. Unlike prior approaches that separately address frame-based edits or static 3D reconstruction, our method couples semantic controllability in the image domain with photorealistic, temporally consistent 3D representations. Given an input video, editable facial regions are masked and modified using a text-conditioned diffusion model to support fine-grained operations such as expression transformation, attribute modification, and appearance refinement. The edited frames are then aggregated through 3D Gaussian splatting to produce a coherent, high-fidelity avatar that preserves both identity and motion dynamics. To enforce consistency, Edit3DGS incorporates multi-view batch editing and lightweight inpainting strategies that recover lost expressions across timesteps. Experimental results demonstrate that our framework enables controllable, artifact-free head editing with smooth temporal transitions, offering practical applications in virtual avatars, immersive communication, film production, and interactive media.

07.
arXiv (CS.LG) 2026-06-11

Learning Object Manipulation from Scratch via Contrastive Interaction

arXiv:2606.11525v1 Announce Type: cross Abstract: Contrastive Reinforcement Learning (CRL) has seen recent success in a wide variety of goal-conditioned robotics tasks by learning structured representations of the dynamics. However, despite its success in locomotion and simpler control domains, CRL often struggles in interaction-rich manipulation. We argue that a key source of this difficulty is object-centric interaction, such as contact or grasping, that induces distinct changes in the underlying dynamic modes. In this work, we formulate manipulation dynamics as a piecewise-smooth Markov process and show that interaction-induced mode changes create piecewise nonlinear reachability structures that are difficult for standard CRL energy functions to represent and plan over. Based on this analysis, we introduce Interaction-weighted Resampling (IWR). IWR performs interaction-aware resampling around phases before, during, and after interactions, encouraging the learned representation to preserve the mode boundaries that determine future reachability to capture multi-modal and piecewise nonlinear reachability. Across interaction-centric environments, including 2D dynamic control, robotic manipulation, and robot air hockey, IWR improves both sample efficiency and overall performance over prior CRL methods, with 19.8% average improvement in simulation. Finally, using a sim-to-real pipeline with policies trained by IWR, we demonstrate the first real-world goal-conditioned robot air hockey agent capable of hitting goals, improving success from 25% to 60%. Project Page: IWR-arxiv.github.io.

08.
arXiv (CS.CV) 2026-06-16

Decoupled Motion Representation Learning for Moving Infrared Small Target Detection

Infrared small target detection in dynamic scenes remains challenging due to the highly coupled motions among targets, imaging platforms, and dynamic backgrounds. Existing multi-frame methods usually perform implicit temporal modeling, where coherent background dynamics dominate motion correspondence learning, leading to an inherent trade-off between detection and false alarms. In this work, we observe that background motions exhibit strong global coherence, whereas small targets mainly correspond to sparse local motion anomalies. Moreover, many false-alarm responses maintain high consistency with globally coherent motion patterns, indicating that they mainly originate from coherent background dynamics rather than genuine target motions. Based on these observations, we propose a decoupled motion representation learning framework for moving infrared small target detection. Specifically, an explicit motion branch is introduced to model globally coherent motion dynamics using pretrained optical flow priors, together with a structure-preserving self-supervised adaptation strategy for infrared motion correspondence learning. Meanwhile, an implicit motion branch based on deformable feature alignment is designed to capture target-sensitive local motion anomalies under coherent motion guidance. Furthermore, a coherent-motion-guided local anomaly reasoning module is proposed to identify and suppress coherent-motion-induced false responses during localized motion modeling. Extensive experiments on two challenging infrared small target detection benchmarks demonstrate that the proposed method consistently outperforms existing state-of-the-art approaches, particularly in dynamic scenes with complex motions, while maintaining favorable inference efficiency.

09.
arXiv (CS.CV) 2026-06-11

Performance Analysis of YOLOv11 and YOLOv8 for Mixed Traffic Object Detection under Adverse Weather Conditions in Developing Countries

In modern vehicular systems, robust performance under harsh conditions has become a critical problem of autonomous driving. Our study delivers a comprehensive evaluation of the newest iteration of the YOLO series, which is YOLOv11 Nano architecture benchmarked against the widely adopted YOLOv8 Nano as a baseline on a custom fused dataset that combines the Indian Driving Dataset (IDD) [1] and Berkeley Deep Drive Dataset (BDD100K) [2]. We have analyzed the trade-offs among detection accuracy, inference speed, and computational efficiency in high-entropy scenarios involving dense mixed traffic, rain, and low-light conditions. Specifically, YOLOv11n achieves a mean Average Precision (mAP@50) of 46.6%, with a notable 3.2% improvement in Precision over the baseline, effectively reducing false positives in cluttered scenes. Furthermore, the proposed model exhibits enhanced energy efficiency, requiring 22% fewer FLOPs (6.3G vs. 8.1G) while maintaining real-time inference speed of 70.9 FPS on a Tesla T4 GPU, offering an optimal trade-off for safety-critical edge deployment.

10.
medRxiv (Medicine) 2026-06-15

Sociodemographic Disparities in Tafamidis Initiation and Clinical Outcomes in ATTR-CM Across the United States

BACKGROUND Transthyretin amyloid cardiomyopathy (ATTR-CM) is a progressive, life-threatening disease. Sociodemographic factors may influence time to treatment initiation and resulting clinical outcomes, yet these relationships are poorly characterized. OBJECTIVE Assess the effects of sex and race on tafamidis initiation and subsequent outcomes and their interaction with factors such as ATTR-CM type and social deprivation measures. METHODS A retrospective cohort analysis was conducted using the US Komodo Healthcare Map (01/2016-06/2024) among patients with amyloidosis, identified by ICD-10-CM diagnosis codes. Cumulative incidence of treatment initiation and survival probabilities for cardiovascular-related hospitalization (CVH) or death were estimated by Kaplan-Meier, stratified by sex and race. Cox proportional hazards models were fitted for both endpoints to estimate hazard ratios, adjusting for demographics and clinical characteristics. RESULTS Of 11,311 patients identified, White and Black patients (n=9,223) were included in subsequent analyses. Within 12 months of diagnosis, White women had the lowest cumulative incidence of tafamidis initiation (11.4%), followed by Black women (22.0%), Black men (26.7%), and White men (31.0%). Event-free survival at 12 months was lowest in Black women (42.9%), followed by Black men (46.8%), White women (48.6%), and White men (54.4%). Median (95% CI) time to CVH or death was shortest for Black women (8.0 months [6.8-10.0]) followed by Black men (9.9 months [8.8-12.0]), White women (11.0 months [9.6-13.0]), and White men (15.0 months [14.0-16.0]). CONCLUSIONS In this large, real-world cohort of US patients with ATTR-CM, sex and race contributed to disparities in tafamidis initiation and survival, underscoring compounded disparities in both access and outcomes.

11.
arXiv (CS.AI) 2026-06-19

StreamKL: Fast and Memory-Efficient KL Divergence for Boosting Attention Distillation

arXiv:2606.20005v1 Announce Type: cross Abstract: Attention distillation, which trains one attention distribution to match another by minimizing their Kullback-Leibler (KL) divergence, is widely used in knowledge distillation, model compression, continual learning, and sparse-attention LLM training. However, existing approaches materialize both attention distributions before computing the KL reduction, incurring $O(N_QN_K)$ memory and IO costs that become prohibitive at long context lengths. We present StreamKL, the first fused GPU primitive for attention KL divergence that eliminates this quadratic materialization. StreamKL derives a novel online formulation for the coupled two-distribution KL reduction, enabling a single one-pass forward kernel that streams query-key tiles through on-chip SRAM. For the backward pass, StreamKL recomputes attention probabilities tile-by-tile, avoiding storage of quadratic intermediates. We further design and implement efficient GPU kernels with dedicated optimizations. Experiments show StreamKL delivers up to $43\times$ and $14\times$ speedups over baseline methods in the forward and backward passes, respectively. Most importantly, StreamKL reduces the extra HBM footprint of attention distillation from $O(N_QN_K)$ to $O(1)$, enabling long-context distillation on a single GPU.

12.
arXiv (CS.CV) 2026-06-11

How Seemingly Inconsequential Design Choices Dictate Performance of LLMs in Pathology

General-purpose large language models (LLMs) are routinely used as baselines when evaluating specialized pathology models on whole-slide images (WSIs). Because WSIs exceed contemporary model context limits, LLM baselines routinely use small, high-magnification patches processed independently via majority voting, without systematic evaluation of seemingly inconsequential design choices such as patch size, patch count, and magnification. Generalist LLMs have consistently underperformed specialized systems, reinforcing the perception that domain-specific training or architectural adaptation is necessary for pathology tasks involving WSIs. Here, we conduct a systematic factorial analysis of four input design factors: inference mode, patch size, magnification, and patch count. We demonstrate that prior studies have overstated the gap between specialized models and general-purpose LLMs by choosing non-optimized input configurations. On the MultiPathQA benchmark, switching to a single balanced configuration (large patches at lower magnification, processed jointly) raises GPT-5 from 15.1% to 39.5% on cancer-type classification (TCGA) and from 38.1% to 62.9% on organ classification (GTEx). Per-task optimization yields further gains up to 43.9% (TCGA) and 71.6% (GTEx). The same configuration generalizes to two other models and to a fully held-out CPTAC cohort, where it improves Gemini 3 Flash by 23.4 percentage points without any task-specific tuning.

13.
arXiv (CS.LG) 2026-06-16

Semi-Supervised Noise Adaptation: Transferring Knowledge from Noise Domain

arXiv:2606.00558v2 Announce Type: replace Abstract: Transfer learning aims to facilitate the learning of a target domain by transferring knowledge from a source domain. The source domain typically contains semantically meaningful samples (*e.g.*, images) to facilitate effective knowledge transfer. However, a recent study observes that the noise domain constructed from simple distributions (*e.g.*, Gaussian distributions) can serve as a surrogate source domain in the semi-supervised setting, where only a small proportion of target samples are labeled while most remain unlabeled. Based on this surprising observation, we formulate a novel problem termed *Semi-Supervised Noise Adaptation* (SSNA), which aims to leverage a synthetic noise domain to improve the generalization of the target domain. To address this problem, we first establish a generalization bound characterizing the effect of the noise domain on generalization, based on which we propose a Noise Adaptation Framework (NAF). Extensive experiments demonstrate that NAF effectively leverages the noise domain to tighten the generalization bound of the target domain, leading to improved performance. The codes are available at https://github.com/AIResearch-Group/SSNA.

14.
arXiv (CS.LG) 2026-06-12

ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior

arXiv:2505.20076v4 Announce Type: replace Abstract: Post-hoc interpretability methods typically attribute a model's behavior to its components, data, or training trajectory in isolation, and are often tied to a particular level of granularity along the local-to-global spectrum. This leads to explanations that lack a unified view and may miss key interactions. We present ExPLAIND, a theoretically grounded, unified framework that integrates model components, data, and training trajectory while supporting explanations across granularities. We generalize recent work on gradient path kernels, reformulating models trained by AdamW as kernel machines. From the resulting kernel feature maps, we derive novel parameter-wise and step-wise influence scores. We empirically validate the resulting decomposition of model behavior in several settings and apply ExPLAIND to two case studies. Our findings on a Transformer exhibiting Grokking support previously proposed learning phases, while refining the final phase as one in which outer layers align around a representation pipeline learned after memorization. For EuroLLM pretraining, ExPLAIND reveals a two-phase dynamic, with the first characterized by outer-layer MLP learning and the second by increased relative influence of intermediate attention layers. These results establish ExPLAIND as a unified framework for interpreting model behavior and training dynamics.

15.
arXiv (quant-ph) 2026-06-16

Optimizing resource bounds in direct fidelity estimation

arXiv:2606.16336v1 Announce Type: new Abstract: Direct fidelity estimation provides a way to estimate the fidelity between an experimentally prepared state and a desired pure target state without performing full tomography. Two influential formulations were introduced in 2011 by Flammia and Liu and by da Silva, Landon-Cardinal, and Poulin. In these protocols, the total estimation error is controlled through two distinct probabilistic steps: first, the fidelity is approximated using randomly sampled Pauli observables; second, each sampled expectation value is estimated from finitely many measurement outcomes. In this work we show that additional structural information about the noise can substantially sharpen the corresponding resource bounds. In particular, for some canonical channels the effective number of sampled Pauli settings can be reduced, leading to lower measurement cost both in the general pure-state setting and in the case of a stabilizer state. These results illustrate a broader point: worst-case confidence bounds in direct fidelity estimation can be significantly conservative when experimentally relevant structure is ignored. As a technical ingredient, we also revisit the allocation of the total accuracy and confidence budgets between the two probabilistic steps. Reformulating the analysis in terms of separate error parameters yields a constrained optimization problem whose solution lowers the average number of measurements in the general pure-state setting. Numerical simulations based on quantum circuits implemented in Qiskit illustrate both the improvement obtained under structured-noise assumptions and the conservativeness of the original worst-case bounds.

16.
arXiv (CS.AI) 2026-06-11

Mind the Perspective: Let's Reason Recursively for Theory of Mind

arXiv:2606.11724v1 Announce Type: new Abstract: Theory of Mind (ToM) reasoning requires inferring agents' beliefs from partial and asymmetric observations, which remains an open challenge for LLMs. Existing prompting-based approaches improve ToM reasoning through observable-event filtering or temporal belief chains, without explicitly modeling nested beliefs. We introduce RecToM, an inference-time framework for ToM reasoning that models nested beliefs via recursive perspective construction. RecToM constructs each character perspective from the preceding character perspective along the character chain specified by the question, reducing higher-order belief questions to actual-world questions within the final constructed perspective. We further provide a KD45 analysis showing that RecToM's perspective construction induces a well-formed belief modality beyond simple event filtering. Experiments on ToM benchmarks, including Hi-ToM, Big-ToM, and FanToM, across multiple LLM backbones show that RecToM consistently outperforms recent advanced approaches, achieving state-of-the-art performance. Notably, RecToM reaches 100\% accuracy on Hi-ToM with GPT-5.4 and Qwen3.5, a benchmark requiring higher-order ToM reasoning.

17.
arXiv (CS.AI) 2026-06-18

Controllable Quantum Memory Capacity in Quantum Reservoir Networks with Tunable partial-SWAPs

arXiv:2605.12713v3 Announce Type: replace-cross Abstract: In the field of quantum reservoir computing (QRC), many different computational models and architectures have been proposed. From these models, we identify feedback-based models – which use a feedback mechanism to re-embed classical measurements from the QRC – and recurrent models – which use a multi-register approach with memory and readout qubits – as the two major competing architectures that have been discussed and validated on hardware. In this paper, we advance upon the recurrent architectures, which employ a two register approach to endow the QRC with a fading memory. While these approaches have been validated on hardware and have demonstrated great real-world performance on noisy-intermediate-scale-quantum (NISQ) quantum processing units (QPUs), the exact mechanism through which the memory capacity arises is not completely understood or fully controllable. With this, we augment the recurrent approaches and present a hardware-realizable mechanism, which we call a tunable partial-SWAP, that allows for the direct control of the rate of memory dissipation from a QRN implemented on a gate-based QPU. The theory behind this mechanism is discussed in terms of a controlled amplitude-damping channel and validation experiments using a randomized short-term memory capacity (STMC) recall benchmark and the NARMA-5 dataset are conducted using simulation and IBM QPUs, respectively.

18.
arXiv (CS.AI) 2026-06-16

MedAI: Evaluating TxAgent's Therapeutic Agentic Reasoning in the NeurIPS CURE-Bench Competition

arXiv:2512.11682v2 Announce Type: replace Abstract: Therapeutic decision-making in clinical medicine constitutes a high-stakes domain in which AI guidance interacts with complex interactions among patient characteristics, disease processes, and pharmacological agents. Tasks such as drug recommendation, treatment planning, and adverse-effect prediction demand robust, multi-step reasoning grounded in reliable biomedical knowledge. Agentic AI methods, exemplified by TxAgent, address these challenges through iterative retrieval-augmented generation (RAG). TxAgent employs a fine-tuned Llama-3.1-8B model that dynamically generates and executes function calls to a unified biomedical tool suite (ToolUniverse), integrating FDA Drug API, OpenTargets, and Monarch resources to ensure access to current therapeutic information. In contrast to general-purpose RAG systems, medical applications impose stringent safety constraints, rendering the accuracy of both the reasoning trace and the sequence of tool invocations critical. These considerations motivate evaluation protocols treating token-level reasoning and tool-usage behaviors as explicit supervision signals. This work presents insights derived from our participation in the CURE-Bench NeurIPS 2025 Challenge, which benchmarks therapeutic-reasoning systems using metrics that assess correctness, tool utilization, and reasoning quality. We analyze how retrieval quality for function (tool) calls influences overall model performance and demonstrate performance gains achieved through improved tool-retrieval strategies. Our work was awarded the Excellence Award in Open Science. Complete information can be found at https://curebench.ai/.

19.
arXiv (CS.CL) 2026-06-12

Learning to Reason by Analogy via Retrieval-Augmented Reinforcement Fine-Tuning

Retrieval-augmented generation (RAG) has become a standard mechanism for grounding language models in external knowledge, yet conventional retrieval based on lexical or semantic similarity is poorly suited for complex reasoning tasks: a semantically similar problem may demand an entirely different solution strategy, while a superficially different problem may share the same underlying reasoning pattern. We propose Retrieval-Augmented Reinforcement Fine-Tuning (RA-RFT), a post-training framework that teaches language models to reason by analogy. RA-RFT uses gold-relevance distillation to train a retriever that ranks contexts by expected reasoning benefit rather than semantic overlap, and then fine-tunes the policy model via reinforcement fine-tuning methods with retrieved analogous demonstrations, so the model learns to leverage reasoning traces under verifiable outcome rewards. We further analyze the diversity of retrieved contexts and find that reasoning-aware retrieval surfaces complementary solution strategies that provide distinct reasoning scaffolds for individual problems. Across challenging mathematical reasoning benchmarks, RA-RFT consistently outperforms standard reinforcement fine-tuning methods. For example, it improves AIME 2025 average@32 accuracy by 7.1 and 2.8 points over GRPO for Qwen3-1.7B and Qwen3-4B respectively – suggesting that reasoning-aware retrieval is a complementary axis of improvement and orthogonal to advances in reward design or training curricula.

20.
arXiv (CS.LG) 2026-06-16

Analytic Torsion and Spectral Gap Capture Persistent-Laplacian Performance

arXiv:2606.16990v1 Announce Type: new Abstract: While persistent Laplacians (PL) offer a richer geometric representation of data than persistent homology, utilizing their full eigenspectrum for learning tasks is often hampered by high dimensionality and the ``varying length'' problem across different filtration scales. We propose a compact spectral representation that distills the persistent Laplacian into three mathematically grounded invariants: Betti numbers, the spectral gap, and analytic torsion. Across benchmark datasets including MNIST, QM-3D, and SKEMPI WT, we demonstrate that this reduced feature space captures the essential predictive signal of the full spectrum, and in some cases outperforms it, while significantly reducing computational overhead and preventing the noise introduced by higher-frequency eigenvalues. Our results suggest that these invariants provide a principled, fixed-length interface between spectral geometry and topological learning.

21.
Nature Medicine 2026-06-16

<b>Engineered heart muscle passes early clinical milestone</b>

Engineered heart muscle allografts derived from induced pluripotent stem cells show promising early outcomes in patients with treatment-refractory advanced heart failure with reduced left ventricular ejection fraction, in support of further clinical investigation. Engineered heart muscle allografts derived from induced pluripotent stem cells show promising early outcomes in patients with treatment-refractory advanced heart failure with reduced left ventricular ejection fraction, in support of further clinical investigation.

22.
arXiv (CS.CV) 2026-06-11

Q-Fold: Query-Aware Focus-Context Spatio-Temporal Folding for Long Video Understanding

Long-video understanding remains challenging for multimodal large language models, because temporally extended videos often contain thousands of frames and are therefore expensive to process exhaustively. Existing methods usually construct compact visual inputs from long videos under a limited visual budget. However, most of them still follow a frame-centric paradigm and apply similar representations to retained content regardless of its importance. This makes it difficult to preserve both high-fidelity visual evidence and broad temporal coverage. To address this issue, we propose Q-Fold, a training-free input construction framework for long-video understanding. Instead of treating isolated frames as the basic modeling unit, Q-Fold operates on contiguous temporal segments and constructs a heterogeneous Focus–Context representation under query guidance. Query-relevant segments are preserved as high-fidelity Focus Frames, while less relevant segments are folded into chronology-preserving contextual layouts. In this way, Q-Fold preserves critical visual evidence and broad temporal coverage, while better maintaining local temporal continuity within short segments. Experiments on four long-video benchmarks with multiple Video-MLLMs show that Q-Fold consistently improves performance without increasing the input budget. Notably, it achieves gains of up to 9.1 percentage points on an ultra-long video benchmark. Code will be made publicly available.

23.
Nature (Science) 2026-06-08

Daily briefing: Human embryo genomes precisely altered

作者:

The use of ‘base editing’ to precisely tweak human embryos has divided researchers. Plus, the number of lives saved by less-polluting cars in China and how to tip the world towards a sustainable future. The use of ‘base editing’ to precisely tweak human embryos has divided researchers. Plus, the number of lives saved by less-polluting cars in China and how to tip the world towards a sustainable future.

24.
arXiv (CS.CL) 2026-06-11

Fast Speech Foundation Model Distillation Using Interleaved Stacking

Distilling a large speech foundation model (SFM) into an efficient student model has been successfully applied to low-resource environments. Although distillation reduces inference latency, it requires an additional student model training. However, the training efficiency of SFM distillation remains underexplored. In this work, we explore training acceleration of SFM distillation to speed up model deployment. We examine the potential of stacking, in which the model depth is progressively increased through training until the target model depth is reached. While existing stacking methods improve training speed, they suffer from performance degradation. To handle this limitation, we propose interleaved stacking, a novel stacking method that consistently preserves layer position throughout the stacking process. This property is particularly critical in SFMs, in which each layer encodes distinct layer-specific knowledge. We validate the effectiveness of the proposed method on SUPERB.

25.
arXiv (math.PR) 2026-06-11

The Geometry of Admissible Short Selling in Discrete-Time Stochastic Portfolio Theory

arXiv:2606.11191v1 Announce Type: cross Abstract: While discrete-time Stochastic Portfolio Theory (SPT) provides a robust framework for market analysis, existing work on functional generation has predominantly focused on long-only portfolios defined on the entire unit simplex. This paper extends the geometric framework of functional generation to the broader class of bankruptcy-proof long-short portfolios defined on local market state spaces. We establish that, within this admissible setting, pseudo-arbitrage is fully characterized by the concavity of the generating function on the market state space, thereby relaxing the usual global domain requirement. A central contribution of this work is a geometric characterization of the short-selling mechanism. We prove that the presence of short selling is equivalent to the negativity of the maximal concave extension of the generating potential. This phenomenon is linked to the steepness of the logarithmic gradient as the market approaches a zero boundary nested inside the simplex. To systematically exploit this mechanism, we introduce the barycentric scaling transformation, a constructive methodology that maps classical long-only generating functions onto restricted domains to engineer admissible strategies with controlled short-selling exposure. Finally, through the analysis of specific shrunken portfolios, we identify a geometric phase transition: under suitable boundary conditions, admissible strategies exhibit a long-only core and a short-selling region in a qualitative sense (without asserting an exact partition of the state space). This provides a unified geometric perspective on relative arbitrage beyond the long-only constraint.