Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
medRxiv (Medicine) 2026-06-24

Topical fresh Taraxacum mongolicum wet dressing as an adjunct to ceftriaxone for localized skin and soft tissue infections: A single-center assessor-blinded randomized controlled trial

Background: Localized skin and soft tissue infections may need systemic antibacterials, but local inflammation can delay symptom recovery. We evaluated whether topical fresh Taraxacum mongolicum wet dressing added to ceftriaxone was associated with short-term benefit in selected clinically stable adults. Methods: In this single-center, assessor-blinded, three-arm randomized trial, 180 adults aged 18-74 years were randomized 1:1:1 to topical T. mongolicum plus intravenous ceftriaxone, topical T. mongolicum alone, or ceftriaxone alone for 7 days. The primary outcome was day-7 clinical response assessed by blinded independent assessors using prespecified global clinical improvement criteria. Analyses followed the intention-to-treat principle; sensitivity analyses assessed robustness. Results: Day-7 clinical response rates were 91.67% (55/60), 76.67% (46/60), and 68.33% (41/60) in the combined, T. mongolicum, and ceftriaxone groups, respectively (overall P = 0.006). Compared with ceftriaxone alone, combined therapy had a higher response rate (risk difference, 23.3 percentage points; 95% CI, 9.6 to 37.0; risk ratio, 1.34; 95% CI, 1.11 to 1.62). Sensitivity analyses were directionally consistent. Secondary outcomes and bacterial clearance favored the combined group. No serious adverse events were reported. Conclusions: In selected clinically stable adults with localized skin and soft tissue infections, adjunctive topical fresh T. mongolicum plus ceftriaxone was associated with improved short-term outcomes compared with ceftriaxone alone. Findings require cautious interpretation because this was a single-center, partially blinded trial without a placebo dressing control. The dressing should not replace antibiotics, drainage, or urgent care when indicated. Trial registration: International Traditional Medicine Clinical Trial Registry, ITMCTR2026000549.
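The reported effect sizes for combined therapy versus ceftriaxone alone can be recomputed from the group counts (55/60 vs 41/60). The sketch below assumes Wald (normal-approximation) intervals for the risk difference and for the log risk ratio. The abstract does not state which interval method the trial used, but this choice reproduces the quoted 9.6 to 37.0 and 1.11 to 1.62.

```python
import math


def risk_difference_and_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Wald 95% CIs for the risk difference and risk ratio of group A versus group B."""
    p_a, p_b = events_a / n_a, events_b / n_b
    rd = p_a - p_b
    se_rd = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    rr = p_a / p_b
    # Standard error of log(RR) for two independent binomial proportions.
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return {
        "rd": rd,
        "rd_ci": (rd - z * se_rd, rd + z * se_rd),
        "rr": rr,
        "rr_ci": (math.exp(math.log(rr) - z * se_log_rr),
                  math.exp(math.log(rr) + z * se_log_rr)),
    }


res = risk_difference_and_ratio(55, 60, 41, 60)
print(f"RD {100 * res['rd']:.1f} pp, 95% CI {100 * res['rd_ci'][0]:.1f} to {100 * res['rd_ci'][1]:.1f}")
print(f"RR {res['rr']:.2f}, 95% CI {res['rr_ci'][0]:.2f} to {res['rr_ci'][1]:.2f}")
```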

02.
arXiv (CS.AI) 2026-06-19

Bidirectional Tutoring for Developmental Motor Learning in Robots: Co-Developed Interaction Dynamics Support Stable Learning

arXiv:2606.19728v1 Abstract: Infants are well known to develop their motor skills through dense interaction with caregivers. Although such social interaction is crucial for human development, motor-skill learning in robots is often treated as a unidirectional process in which robots passively receive demonstrations from tutors. This overlooks a key property of social interaction: it is inherently bidirectional, with tutor and learner dynamically adapting to each other. In such interactions, the robot's past experiences may function as prior constraints that shape the dynamics of their co-developed trajectories. We hypothesize that bidirectional tutoring allows such constraints to guide the formation of consistent behavioral patterns that preserve behavioral coherence and support generalization, whereas unidirectional interaction lacks such constraints and leads to broader, less consistent behavioral patterns. To examine this hypothesis, we conducted two experiments with a physical humanoid robot performing an object manipulation task: one involving human-robot interaction and another employing an AI tutor interacting with the real robot through an adaptive intervention mechanism designed to examine whether similar effects would emerge under more controlled conditions. We implement the developmental learning framework using a free-energy-principle-based neural network extended with generative replay, which supports stable sequence-by-sequence learning from single tutored episodes. Across both settings, bidirectional tutoring fostered consistent behaviors and stage-wise generalization, while the robot gradually required less tutor guidance. These results suggest that bidirectional tutoring, as an embodied and socially grounded approach, provides an effective scaffold for developmental motor learning in robots.

03.
arXiv (CS.CV) 2026-06-11

Benchmarking Cross-Domain Audio-Visual Deception Detection

Automated deception detection is crucial for assisting humans in accurately assessing truthfulness and identifying deceptive behavior. Conventional contact-based techniques, like polygraph devices, rely on physiological signals to determine the authenticity of an individual's statements. More recently, work on automated deception detection has demonstrated that multimodal features derived from both audio and video may outperform human observers on publicly available datasets. Despite these positive findings, the generalizability of existing audio-visual deception detection approaches across different scenarios remains largely unexplored. To close this gap, we present the first cross-domain audio-visual deception detection benchmark, which enables us to assess how well these methods generalize to real-world scenarios. We benchmark widely adopted audio and visual features with different architectures, comparing single-to-single and multi-to-single domain generalization performance. To further examine the impact of training on data from multiple source domains, we investigate three domain sampling strategies (domain-simultaneous, domain-alternating, and domain-by-domain) for multi-to-single domain generalization. We also propose MM-IDGM, an algorithm that enhances generalization performance by maximizing the gradient inner products between modality encoders. Finally, we propose the Attention-Mixer fusion method to improve performance. We believe this new cross-domain benchmark will facilitate future research in audio-visual deception detection.
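The three multi-source sampling strategies can be read as different batch schedules over the source domains. The sketch below is one plausible interpretation of the strategy names, not the authors' code; the domain names and batch labels are placeholders.

```python
def simultaneous(domains):
    # Every update mixes one batch from each source domain.
    n = min(len(batches) for batches in domains.values())
    return [[(name, batches[i]) for name, batches in domains.items()] for i in range(n)]


def alternating(domains):
    # Updates cycle through the domains round-robin, one domain per update.
    n = min(len(batches) for batches in domains.values())
    return [[(name, batches[i])] for i in range(n) for name, batches in domains.items()]


def domain_by_domain(domains):
    # All batches of one domain are consumed before moving on to the next domain.
    return [[(name, batch)] for name, batches in domains.items() for batch in batches]


sources = {"domain_A": ["a0", "a1"], "domain_B": ["b0", "b1"]}  # hypothetical sources
for schedule in (simultaneous, alternating, domain_by_domain):
    print(schedule.__name__, schedule(sources))
```

Each inner list is one optimizer update, so the schedules differ only in how often the model sees a mix of domains within a single gradient step.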

04.
arXiv (CS.CV) 2026-06-16

GOOSE-M2F: Adapting Mask2Former for High-Fidelity, Long-Tailed Fine-Grained Semantic Segmentation in Unstructured Outdoor Terrain

We present GOOSE-M2F, a task-specific adaptation of Mask2Former for the GOOSE 2D Fine-Grained Semantic Segmentation (FGSS) Challenge at ICRA 2026. The GOOSE benchmark spans 64 fine-grained classes across unstructured outdoor terrain with a severely long-tailed distribution, where rare classes occupy fewer than 50 pixels per image. We extend the Swin-Large Mask2Former baseline with three targeted contributions: (1) 200 object queries to eliminate representational saturation; (2) a Feature Refinement Module (FRM) combining ASPP-lite and CBAM dual-attention; and (3) an Auxiliary Supervision Head that delivers direct per-pixel gradients for rare classes. A multi-stage training strategy combines Distribution-Balanced loss, Rare-Class Copy-Paste augmentation, dynamic IoU-aware re-weighting, and EMA. At inference, a dense sliding-window engine with 2D Gaussian kernel blending and 4-scale TTA adds +10.57%. GOOSE-M2F achieves 70.08% Official Composite mIoU (63.55% fine, 76.61% coarse), placing 3rd on the GOOSE 2D FGSS leaderboard. Code and trained models are publicly available on GitHub (https://github.com/Aditya-Lingam-9000/GOOSE-M2F) and Hugging Face (https://huggingface.co/XYZ9843/GOOSE-M2F).
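Gaussian kernel blending down-weights predictions near window borders, where context is weakest, before averaging overlapping windows. Below is a minimal single-channel sketch; the `predict` callback, window size, stride, and the sigma of one-eighth of the window are assumptions for illustration, not GOOSE-M2F's settings.

```python
import math


def gaussian_kernel(size, sigma_scale=0.125):
    # 2D Gaussian weight map that peaks at the window centre (sigma_scale is assumed).
    sigma = size * sigma_scale
    c = (size - 1) / 2
    return [[math.exp(-((i - c) ** 2 + (j - c) ** 2) / (2 * sigma ** 2)) for j in range(size)]
            for i in range(size)]


def sliding_window_blend(height, width, window, stride, predict):
    """Blend overlapping window predictions into one map with Gaussian weights.

    predict(top, left) must return a window x window list of scores (hypothetical model call).
    """
    kernel = gaussian_kernel(window)
    acc = [[0.0] * width for _ in range(height)]
    norm = [[0.0] * width for _ in range(height)]
    tops = list(range(0, height - window + 1, stride))
    lefts = list(range(0, width - window + 1, stride))
    # Ensure the last window reaches the bottom and right borders.
    if tops[-1] != height - window:
        tops.append(height - window)
    if lefts[-1] != width - window:
        lefts.append(width - window)
    for top in tops:
        for left in lefts:
            pred = predict(top, left)
            for i in range(window):
                for j in range(window):
                    w = kernel[i][j]
                    acc[top + i][left + j] += w * pred[i][j]
                    norm[top + i][left + j] += w
    # Weighted average; every pixel is covered by at least one window, so norm > 0.
    return [[acc[i][j] / norm[i][j] for j in range(width)] for i in range(height)]


# Toy usage: each window "predicts" a constant equal to its top-left offset.
blended = sliding_window_blend(6, 6, window=4, stride=2,
                               predict=lambda top, left: [[float(top + left)] * 4 for _ in range(4)])
```

Pixels covered by several windows take most of their value from the window in which they sit closest to the centre.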

05.
arXiv (CS.CV) 2026-06-19

OTCHA: Optimal Transport-driven Confidence-aware Latent Hub Alignment for Multi-View Medical Image Classification

Multi-view imaging, such as mammography and chest radiography, is a standard component of clinical practice. However, medical images are often unregistered and contain view-specific artifacts or irrelevant background cues that can obscure diagnostically relevant findings. Many existing methods directly fuse per-view representations, allowing such irrelevant content to contaminate the fused embedding and reducing robustness under varying view configurations. We propose OTCHA, a confidence-aware latent hub token alignment module based on optimal transport (OT) that refines patch tokens before fusion for multi-view classification. OTCHA introduces a set of learnable latent hub tokens shared across views. For each view, we compute an OT plan between patch tokens and hub tokens that jointly considers feature similarity and geometry, and augment the OT formulation with token-conditional dustbins to enable partial matching and discard irrelevant tokens. The resulting transport plan provides token-wise matching confidence, which gates hub-mediated message passing and weights a novel optimal-transport-based representation alignment loss to stabilize refinement. Experiments on three multi-view medical image datasets demonstrate consistent improvements over competing baselines across diverse anatomies and view configurations. Our code is available at https://github.com/labhai/OTCHA.
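Partial matching with a dustbin is commonly solved with log-domain Sinkhorn iterations on a score matrix augmented by one extra row and column. The sketch below uses a single scalar dustbin score in the style of SuperGlue. This is a simplification: OTCHA's token-conditional dustbins and its geometry-aware cost are not modelled, and reading "confidence" as the mass not sent to the dustbin is our assumption.

```python
import math


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def sinkhorn_with_dustbin(scores, dustbin_score=0.0, iters=200):
    """Log-domain Sinkhorn for partial matching with one dustbin row and column.

    scores[i][j]: similarity of patch token i to hub token j (higher is better).
    Returns an (m + 1) x (n + 1) plan whose last row and column are the dustbins.
    """
    m, n = len(scores), len(scores[0])
    # Augment the scores: any token may go to the dustbin instead of a hub.
    z = [list(row) + [dustbin_score] for row in scores] + [[dustbin_score] * (n + 1)]
    # Real tokens carry unit mass; each dustbin can absorb every token on the other side.
    log_mu = [0.0] * m + [math.log(n)]
    log_nu = [0.0] * n + [math.log(m)]
    u, v = [0.0] * (m + 1), [0.0] * (n + 1)
    for _ in range(iters):
        u = [log_mu[i] - logsumexp([z[i][j] + v[j] for j in range(n + 1)]) for i in range(m + 1)]
        v = [log_nu[j] - logsumexp([z[i][j] + u[i] for i in range(m + 1)]) for j in range(n + 1)]
    return [[math.exp(z[i][j] + u[i] + v[j]) for j in range(n + 1)] for i in range(m + 1)]


# Three patch tokens, two hubs; the third token resembles neither hub.
plan = sinkhorn_with_dustbin([[3.0, 0.0], [0.0, 3.0], [-3.0, -3.0]])
confidence = [1.0 - row[-1] for row in plan[:-1]]  # mass not sent to the dustbin
print([round(c, 2) for c in confidence])
```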

06.
arXiv (quant-ph) 2026-06-16

Noise-induced shallow circuits and absence of barren plateaus

arXiv:2403.13927v3 Abstract: Motivated by realistic hardware considerations of the pre-fault-tolerant era, we comprehensively study the impact of uncorrected noise on quantum circuits. We first show that in the task of estimating observable expectation values any noise truncates most quantum circuits to effectively logarithmic depth. We then prove that quantum circuits under any non-unital noise do not exhibit barren plateaus for cost functions composed of local observables. However, by using the effective shallowness, we also design an efficient classical algorithm to estimate observable expectation values within any constant additive accuracy, with high probability over the choice of the circuit, in any circuit architecture. Taken together, our results establish that, unless we carefully engineer quantum circuits to take advantage of the noise, noisy quantum circuits are unlikely to offer an advantage over shallow ones for algorithms that output observable expectation value estimates, such as many variational quantum machine learning proposals.
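The intuition behind noise-induced shallowness can be seen in a toy calculation; this is not the paper's theorem. Assume each layer of uncorrected noise damps the influence of all earlier gates on a local expectation value by a factor (1 - p). Then only the most recent L layers matter at tolerance eps, where L grows like log(1/eps)/p, independent of the nominal circuit depth.

```python
import math


def effective_depth(p, eps):
    """Number of most recent noisy layers that still matter at tolerance eps.

    Toy assumption: gates more than L layers back contribute at most (1 - p) ** L.
    """
    return math.ceil(math.log(eps) / math.log(1 - p))


for eps in (1e-2, 1e-3, 1e-6):
    print(eps, effective_depth(0.01, eps))
```

Tightening the tolerance by three orders of magnitude only doubles the effective depth, which is the logarithmic behaviour the abstract refers to.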

07.
arXiv (CS.CL) 2026-06-16

Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving

Large language models (LLMs) have shown promising first-order logic (FOL) reasoning capabilities with applications in various areas. However, their effectiveness in complex mathematical reasoning involving multi-step FOL deductions is still under-researched. While LLMs perform competitively on established mathematical reasoning benchmarks, they struggle with multi-step FOL tasks, as demonstrated by DeepSeek-Prover-V2-7B's low accuracy (4.2%) on our proposed theorem-proving dataset. This issue arises from the limited exploration of diverse proof strategies and the potential for early reasoning mistakes to undermine entire proofs. To address these issues, we propose DREAM, a self-adaptive solution that enhances the Diversity and REAsonability of LLMs' generation strategies. DREAM incorporates an Axiom-Driven Strategy Diversification mechanism to promote varied strategic outcomes and a Sub-Proposition Error Feedback mechanism to help LLMs reflect on and correct their proofs. Our contributions include pioneering advancements in LLMs' mathematical reasoning through FOL theorem proving, a novel inference-stage solution that improves performance by 0.6% to 6.4%, and a curated dataset of 447 mathematical theorems in Lean 4 format for evaluation.

08.
arXiv (math.PR) 2026-06-15

Uniform-in-time error estimates for McKean-Vlasov SDEs with common noise and stochastic algorithms

arXiv:2606.14170v1 Abstract: In this work, by constructing an asymptotic coupling by reflection, we first derive a uniform-in-time estimate of the probability distance between two measure-valued processes induced by a McKean-Vlasov SDE with common noise and an interacting particle system, where the drift terms are dissipative only at long distances. As direct applications of this estimate, we establish uniform-in-time error estimates for the numerical solutions derived via backward, tamed, and adaptive Euler-Maruyama methods. Moreover, as another direct application, we quantify uniform-in-time conditional propagation of chaos.
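To make the setting concrete, here is a minimal tamed Euler-Maruyama scheme for an interacting particle system with common noise. The drift (a double-well term x - x^3, dissipative only for large |x|, plus an attraction toward the empirical mean), the noise intensities, and the step size are hypothetical choices. The sketch illustrates the tamed variant only, not the paper's analysis.

```python
import math
import random


def drift(x, mean):
    # Hypothetical drift: double-well confinement plus mean-field attraction.
    return x - x ** 3 - 0.5 * (x - mean)


def tamed_em_particles(n_particles, n_steps, h, sigma=0.5, sigma0=0.3, seed=0):
    """Tamed Euler-Maruyama for an interacting particle system with common noise."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    sqrt_h = math.sqrt(h)
    for _ in range(n_steps):
        mean = sum(xs) / n_particles     # empirical measure enters through its mean
        dw0 = rng.gauss(0.0, sqrt_h)     # common noise: one increment shared by all particles
        new_xs = []
        for x in xs:
            b = drift(x, mean)
            tamed = b / (1.0 + h * abs(b))  # taming bounds the superlinear drift step
            dw = rng.gauss(0.0, sqrt_h)     # idiosyncratic noise: independent per particle
            new_xs.append(x + tamed * h + sigma * dw + sigma0 * dw0)
        xs = new_xs
    return xs


final = tamed_em_particles(n_particles=200, n_steps=500, h=0.01)
print(sum(final) / len(final))  # empirical mean, conditional on the sampled common noise
```

Because every particle receives the same `dw0`, the empirical measure stays random even as the number of particles grows, which is why propagation of chaos holds only conditionally on the common noise.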

09.
arXiv (CS.CV) 2026-06-16

ScoutVLA: UAV-Centric Active Perception via a Dual-Expert VLA Model for Open-World Embodied Question Answering

Aerial Embodied Question Answering (EQA) requires Unmanned Aerial Vehicles (UAVs) to actively perceive the environment and answer natural language questions. Existing outdoor EQA systems usually stop once the target enters the UAV's field of view, leaving the fine-grained viewpoint adjustment needed for evidence-seeking questions largely unresolved. To address this issue, we introduce FG-EQA, a fine-grained active perception EQA benchmark with more than 40K simulated trajectories and 1K real-world trajectories. Drawing inspiration from the "waggle dance" of scout bees, which iteratively adjust their flight paths to verify target information, we propose ScoutVLA, an evidence-driven Vision-Language-Action model for outdoor EQA. To emulate this active exploration behavior, ScoutVLA features a decoupled dual-expert architecture: a vision-language expert infers the semantic intent to identify missing evidence, while an independent action expert employs high-DoF flow matching to generate continuous viewpoint-refinement trajectories. To balance the competing demands of continuous control and semantic reasoning, we devise a decoupled training strategy with a knowledge insulation mechanism that prevents the action gradients from erasing the model's multimodal reasoning ability. Extensive simulated experiments and a qualitative real-world field study both verify the superiority of ScoutVLA over the state-of-the-art baselines, demonstrating a 10.48× higher average strict success rate and a 7.72× higher average QA correctness.

10.
arXiv (CS.CV) 2026-06-18

Hybrid Transformer-Mamba for Weakly Supervised Volumetric Medical Segmentation

Weakly supervised segmentation enables model training from plane-level labels. Existing methods often rely on 2D encoders, neglecting the volumetric nature of medical data. We propose TranSamba, a hybrid Transformer-Mamba architecture designed to capture 3D context via cross-plane modeling. TranSamba augments a Vision Transformer backbone with Cross-Plane Mamba blocks, leveraging linear-time modeling for efficient information exchange across neighboring planes. This exchange improves in-plane self-attention and subsequent attention maps for object localization. TranSamba maintains linear time complexity and constant space complexity with respect to the input volume depth. Extensive experiments on three datasets covering diverse modalities and pathologies show that TranSamba achieves state-of-the-art performance, demonstrating the generalizable efficacy of cross-plane modeling. Code is available at: https://github.com/YihengLyu/TranSamba.
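The linear-time claim comes from scanning the volume plane by plane with a recurrent state whose size does not depend on depth. The sketch below uses a fixed-decay bidirectional linear recurrence as a simplified stand-in. A real Mamba block uses input-dependent (selective) state-space parameters, and the decay value here is an assumption.

```python
def cross_plane_scan(planes, decay=0.8):
    """Bidirectional linear recurrence over the depth axis of a volume.

    planes: list of per-plane feature vectors (lists of floats), ordered by depth.
    Each direction keeps one running state the size of a single plane's features,
    so time is linear in depth and the recurrent state has constant size.
    """
    dim = len(planes[0])
    out = [[0.0] * dim for _ in planes]
    # Forward pass: h_t = decay * h_{t-1} + (1 - decay) * x_t
    h = [0.0] * dim
    for t, x in enumerate(planes):
        h = [decay * hi + (1 - decay) * xi for hi, xi in zip(h, x)]
        out[t] = [o + hi for o, hi in zip(out[t], h)]
    # Backward pass with the same recurrence, so each plane sees context from both sides.
    h = [0.0] * dim
    for t in range(len(planes) - 1, -1, -1):
        h = [decay * hi + (1 - decay) * xi for hi, xi in zip(h, planes[t])]
        out[t] = [o + hi for o, hi in zip(out[t], h)]
    return out


print(cross_plane_scan([[1.0], [0.0], [0.0]], decay=0.5))
```

A signal in the first plane propagates to later planes with geometrically decaying weight, which is the kind of cross-plane exchange that self-attention within a single plane cannot provide.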

11.
arXiv (quant-ph) 2026-06-25

Fast and Parallel High-Rate STAR Architecture for Megaquop Quantum Simulation

arXiv:2606.25011v1 Abstract: Fault-tolerant quantum simulation is approaching a phase where encoding overhead, logical Clifford operations, magic-state preparation, and rotation synthesis must be optimized together for efficient implementation. Space-Time efficient Analog Rotation (STAR) architectures reduce two of these costs by preparing small-angle rotation magic states directly, and the transversal STAR variant further lowers the Clifford overhead. Existing concrete implementations, however, largely inherit the low $O(1/d^2)$ encoding rate of the surface code, while high-rate codes have not yet been integrated into comparably explicit architectures. Here, we introduce a high-rate STAR architecture for local lattice Hamiltonian simulation based on a symmetry-driven co-design of the algorithm, QEC code, and neutral-atom hardware. Translation symmetries of the target lattice determine the choice of bicycle chain codes, a tunable family of self-dual bivariate bicycle codes that natively implement Clifford gates required for lattice simulation. Disjoint logical representatives allow STAR injections to be performed in parallel on all $k$ logical qubits in a code block, amortizing resource state preparation and enabling practical post-selection rates. On the neutral-atom platform, the same translation symmetry compiles the key logical operations into low-depth, hardware-native acousto-optic-deflector shifts. End-to-end estimates show that an $8 \times 8$ transverse-field Ising simulation to $T^* \approx 8 (zJ)^{-1}$ requires $2240$ physical qubits and $\sim 200$ s per shot, a $\sim 5.5\times$ space reduction relative to a surface-code STAR baseline at comparable speed; for Fermi-Hubbard dynamics to $T^* \approx 4 (zt)^{-1}$, the corresponding estimates are $\sim 6300$ physical qubits and $\sim 200$ s per shot. These results provide a concrete route toward early fault-tolerant quantum simulation with high-rate codes.
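The $O(1/d^2)$ encoding rate can be made concrete with a short calculation. The sketch assumes the standard rotated surface-code layout (d*d data qubits plus d*d - 1 measurement qubits per logical qubit). It also backs out the rough size of the surface-code baseline implied by the quoted 2240 qubits and ~5.5× reduction; both inputs are rounded, so the result is only approximate.

```python
def rotated_surface_code_qubits(d):
    # d*d data qubits plus d*d - 1 measurement qubits encode one logical qubit.
    return 2 * d * d - 1


for d in (5, 11, 21):
    n = rotated_surface_code_qubits(d)
    print(f"d={d}: {n} physical qubits per logical qubit, rate k/n = {1 / n:.4f}")

# Rough size of the surface-code STAR baseline implied by the 8x8 Ising figures.
implied_baseline = 2240 * 5.5
print(implied_baseline)  # 12320.0
```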

12.
arXiv (CS.LG) 2026-06-17

Adaptable Method for Crystal Design across Diverse Constraints and Objectives with Pretrained Property Predictors

arXiv:2410.08562v5 Abstract: Advanced crystal design can accelerate materials discovery across applications from photovoltaics to spintronics. Practical design must satisfy multiple properties and physical constraints, yet existing machine-learning-based approaches to such design often depend on large datasets, retraining, or task-specific generators. Here, we show that direct predictor-guided gradient optimization enables data-efficient, constraint-rich crystal design by combining off-the-shelf predictors with site-wise element masks, template initialization, and task-specific losses. In perovskites, it outperformed generative and Bayesian baselines under three targets (band gap, formation energy, and tolerance factor) and two hard constraints. DFT assessment further showed that band-gap targeting was competitive with a leading generative model despite using predictors trained on roughly one-tenth of the data. By flexibly combining pretrained predictors with application-oriented masks and custom losses, the same framework supported half-metal design. Such modularity could help researchers and engineers translate diverse application requirements directly into optimized candidate crystals with minimal computational cost.
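The core idea of predictor-guided gradient optimization with a site-wise element mask can be shown on a single site. The sketch makes several assumptions. The property model is a toy linear predictor (a hypothetical stand-in for a pretrained, differentiable one). Masked-out elements receive zero probability. Gradient descent adjusts the element logits until the predicted property hits the target.

```python
import math


def softmax_masked(logits, mask):
    # Elements outside the site's allowed set get exactly zero probability.
    exps = [math.exp(l) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def optimize_site(values, mask, target, steps=200, lr=0.5):
    """Gradient descent on one site's element logits to match a target property.

    values[k]: property contribution of element k under a toy linear predictor (hypothetical).
    """
    logits = [0.0] * len(values)
    for _ in range(steps):
        w = softmax_masked(logits, mask)
        pred = sum(wk * vk for wk, vk in zip(w, values))
        dloss_dpred = 2.0 * (pred - target)  # loss = (pred - target) ** 2
        # Softmax Jacobian: d pred / d logit_k = w_k * (v_k - pred)
        grads = [dloss_dpred * wk * (vk - pred) for wk, vk in zip(w, values)]
        logits = [l - lr * g for l, g in zip(logits, grads)]
    w = softmax_masked(logits, mask)
    return w, sum(wk * vk for wk, vk in zip(w, values))


# Four candidate elements for one site; the fourth is excluded by the site mask.
weights, pred = optimize_site(values=[1.0, 2.0, 3.0, 10.0], mask=[True, True, True, False], target=2.6)
print([round(w, 3) for w in weights], round(pred, 3))
```

The mask enforces a hard constraint by construction rather than through a penalty, so the excluded element can never be selected regardless of the loss.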

13.
arXiv (quant-ph) 2026-06-16

Charging Quantum Batteries with Chiral Squeezing

arXiv:2606.16764v1 Abstract: We propose a quantum-battery charger based on a driven bosonic Kitaev chain (BKC), where chiral squeezing converts passive input fluctuations into ordered, non-passive battery states. While a coherent input pulse exhibits phase-sensitive chiral transport, the charging dynamics is dominated by bidirectionally propagating fluctuations that are amplified and squeezed into orthogonal quadratures at opposite chain ends. In contrast to conventional phase-preserving amplifiers, our scheme stores largely extractable energy and achieves a work-like signal-to-noise ratio (SNR) near unity, even in the presence of thermal noise and moderate symmetry-preserving disorder.

14.
arXiv (CS.CL) 2026-06-25

To Isolate or to Score? Model-Adaptive Assessment for Cost-Efficient Multi-Agent RAG

Multi-agent document assessment for retrieval-augmented generation is computationally expensive, driving practitioners toward smaller, deployable models whose assessment mechanisms remain poorly understood. We conduct a controlled study of training-free interventions on 7B-9B instruction-tuned models across diverse QA benchmarks, revealing a sharp dichotomy in how models benefit from assessment. For weaker baselines, the dominant mechanism is per-document isolation. Notably, assessment-free isolation matches full multi-agent assessment, demonstrating that resolving multi-document context confusion, rather than scoring quality, drives outsized gains of up to 50 percentage points. Conversely, for strong baselines where scoring quality matters, we introduce Reasoning-Score Coupling, a label-free perturbation probe that classifies scoring behavior. Integrating these findings, we propose MADARA, a model-adaptive routing architecture. Crucially, MADARA's diagnostic thresholds derived from a single pilot model generalize zero-shot to four unseen model families, providing a robust, lightweight pipeline that eliminates unnecessary computational overhead.

15.
arXiv (CS.CL) 2026-06-19

Characterizing Narrative Content in Web-scale LLM Pretraining Data

The narrative composition of web-scale LLM pretraining corpora remains largely unexplored even though narrative is a fundamental mode of human communication. We present the first fine-grained study of narrative features in Dolma, a 3-trillion-token open pretraining corpus. Drawing on narrative theory, we design a framework spanning three core narrative elements (agency, setting, and events) operationalized as 11 interpretable dimensions. After sampling and annotating a diverse set of 400 passages, we finetune and validate NarraBERT, a RoBERTa-based model for fine-grained narrative prediction. We apply NarraBERT to 3M passages, resulting in a new dataset, NarraDolma. We find that (i) narrative structure is measurable at scale across extremely heterogeneous data, (ii) web text exhibits a continuous, multidimensional narrative structure, and (iii) narrative qualities are unequally distributed across pretraining sources and topics in ways that current curation practices neither measure nor account for. Our framework, dataset, and analyses provide a foundation for understanding how narrative qualities are distributed in LLM pretraining data and for studying how data composition affects narrative reasoning tasks. We publicly release NarraDolma and NarraBERT.

16.
arXiv (CS.CL) 2026-06-24

MedBench v5: A Dynamic, Process-Oriented, and Hallucination-Aware Benchmark for Clinical Multimodal Models

Existing medical AI benchmarks lack process visibility, atomic skill evaluation, and integrated hallucination detection. We introduce MedBench v5, a redesigned benchmark for clinical multimodal models (language, vision-language, and agent systems) that moves from static QA to dynamic, process-oriented evaluation. MedBench v5 features: (1) a dual-dimensional framework combining Clinical Cognitive Responsiveness (14 sub-dimensions) and Medical Atomic Skills (4 agent environments), covering 63 tasks; (2) three switchable information-flow stressors (omission, contradiction, evidence delay) for factorized degradation analysis; (3) a dynamic process audit protocol with five reasoning nodes that produces model-specific failure fingerprints; and (4) hallucination propagation monitoring across initiation, propagation, anchoring, and contradiction interaction, capturing silent hallucination. Experiments on frontier models show that strong overall task performance does not guarantee process stability: stressors mainly disrupt contradiction detection, diagnosis updating, hallucination propagation, and contradiction-based self-correction, while final evidence grounding can remain superficially stable. MedBench v5 provides a unified infrastructure for capability profiling, controllable stress testing, process auditing, and hallucination trajectory analysis in clinical AI evaluation.

17.
arXiv (math.PR) 2026-06-11

On Skorokhod Problems for Reflected and Singular Stochastic Heat Equations

arXiv:2606.11951v1 Abstract: We prove a Skorokhod decomposition for the Markov processes $X^a$ and $X$ associated to the gradient Dirichlet forms with respect to the measures $\rho^a\mu^{\beta}$ and $\rho\mu^{\beta}$, respectively. Here, $\mu^{\beta}$ is the law of the standard Brownian bridge $\beta$, while $\rho^a$ and $\rho$ denote densities which are given by $\rho^a(z) := \mathbf{1}_{[0,\infty)}(\bar{z}_a)$ and $\rho(z) := \int_0^1 \mathbf{1}_{[0,\infty)}(\bar{z}_x) \, dx$, respectively, for all $z\in L^2(0,1)$ which have a (unique) continuous representative $\bar{z}$ which vanishes at zero and one. To this end, we derive infinite-dimensional integration by parts formulas (IbPFs) w.r.t. $\rho^a\mu^{\beta}$ and $\rho\mu^{\beta}$, which contain Hida distributions alongside the usual drift terms. We represent these Hida distributions by integration w.r.t. vector measures of bounded variation. The vector measures in question are constructed via an approximation argument, making use of a generalization of Prokhorov's theorem for vector measures. We further prove that, almost surely, the sample paths of $X^a$ and $X$ take values in the equivalence class of continuous functions vanishing at zero and one for all times and for $dt$-almost all times, respectively. The main motivation for studying $\rho^a\mu^{\beta}$ and $\rho\mu^{\beta}$ lies in the fact that the distributional terms in their IbPFs are simplifications of the distributional term in the IbPF w.r.t. the law of the reflected Brownian bridge on the unit interval $\mu^{|\beta|}$. Representing the latter by integration w.r.t. a vector measure of bounded variation is still an open problem.

18.
arXiv (CS.CL) 2026-06-25

Evaluating LLMs on Real-World Software Performance Optimization

Software performance optimization is a notoriously complex and manual task. Despite the growing use of Large Language Models (LLMs) for code refinement, we still lack benchmarks that capture how optimization actually happens in real-world codebases. Existing frameworks often oversimplify the problem by focusing on isolated functions or a single performance metric, missing the critical trade-offs between execution time and memory footprint, the inherent noise of the measurement environment, and the variability introduced by different input data and execution conditions. We address this by introducing SWE-Pro, a repository-level benchmark derived from 102 expert-written optimizations from open-source projects. Unlike previous benchmarks, SWE-Pro pairs each task with parameterized tests that evaluate runtime, peak memory, and Time-Weighted Memory Usage (TWMU) across varying input data and execution conditions, using noise-aware measurement. Our evaluation shows that current LLMs struggle significantly: runtime gains are negligible, and memory optimizations are nearly non-existent. This stands in sharp contrast to expert implementations, which achieve an aggregate speedup of 15.5x and an aggregate peak memory reduction of 171.3x across the benchmark tasks. Expert-written improvements are observed in 91.2% of tasks for runtime and 65.7% for peak memory. Our findings expose a substantial gap between current LLM capabilities and the demands of expert-level engineering.
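The abstract does not define TWMU precisely. One plausible reading, assumed here, is the area under the memory-versus-time curve. The sketch shows why such a metric separates two runs that a peak-memory metric cannot tell apart; the sample traces are hypothetical.

```python
def time_weighted_memory(samples):
    """Area under a memory-versus-time curve (trapezoidal rule), in MB*s.

    samples: (time_s, memory_mb) pairs sorted by time.
    """
    return sum(0.5 * (m0 + m1) * (t1 - t0)
               for (t0, m0), (t1, m1) in zip(samples, samples[1:]))


def peak_memory(samples):
    return max(m for _, m in samples)


# Two hypothetical runs with the same peak: one holds it, one releases memory early.
holds_peak = [(0.0, 100.0), (1.0, 100.0), (2.0, 100.0)]
releases_early = [(0.0, 100.0), (1.0, 100.0), (1.5, 10.0), (2.0, 10.0)]
for run in (holds_peak, releases_early):
    print(peak_memory(run), time_weighted_memory(run))
```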

19.
medRxiv (Medicine) 2026-06-16

AI-assisted continuous-time modelling of metastatic breast cancer reveals subtype-specific spatiotemporal organ interactions

Metastatic breast cancer is one of the leading causes of premature mortality among women worldwide. A major barrier to optimal care is the marked heterogeneity in both the temporal dynamics of metastatic spread and the organ-specific spatial distribution of metastases. Existing analyses do not adequately capture this complexity, as they either neglect temporal dependencies or assume independence between metastatic sites. As a result, it remains unclear how established metastases influence subsequent organ-specific dissemination. We address this question using patient-level longitudinal trajectories from a large multicentre real-world metastatic breast cancer registry, combined with an AI-assisted disease-progression modelling framework based on continuous-time Markov chains that represent combinations of metastatic sites and account for the non-uniform, practice-driven timing of radiologic response assessments encountered in routine clinical care. We present a stochastic model determined by progression rates, which are parameterised to capture baseline organ-specific transition risks, patient-level covariates, and pairwise inter-organ interaction effects. High-dimensional treatment information is incorporated using a large-language-model-based encoding. We find that metastatic spread follows non-independent spatiotemporal patterns shaped by subtype-specific inter-organ interactions. Visceral metastases, particularly lung and liver metastases, are associated with an increased hazard of subsequent brain metastasis, with effects varying across hormone receptor-positive, HER2-positive, and triple-negative subtypes. Together, these findings define a clinically relevant spatiotemporal architecture of metastatic progression in breast cancer.
This framework enables refined, mechanism-informed risk stratification and provides a data-driven rationale for targeted, risk-adapted surveillance strategies rather than symptom-triggered ones.
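A continuous-time Markov chain whose rates combine a baseline organ risk, patient covariates, and pairwise inter-organ effects can be sketched with a log-linear rate. The log-linear form, the organ list, and all coefficient values below are assumptions for illustration. The sketch ignores the interval-censored assessment times and the LLM-based treatment encoding that the actual model handles.

```python
import math


def transition_rate(target, occupied, covariates, baseline, beta, gamma):
    """Log-linear rate of a new metastasis at organ `target` given already-occupied organs."""
    log_rate = baseline[target]
    log_rate += sum(b * x for b, x in zip(beta[target], covariates))
    # Pairwise interaction: each occupied organ o shifts the log-rate by gamma[(o, target)].
    log_rate += sum(gamma.get((o, target), 0.0) for o in occupied)
    return math.exp(log_rate)


def prob_no_new_site(occupied, organs, covariates, baseline, beta, gamma, dt):
    # The holding time in a CTMC state is exponential with rate equal to the total exit rate.
    total = sum(transition_rate(o, occupied, covariates, baseline, beta, gamma)
                for o in organs if o not in occupied)
    return math.exp(-total * dt)


organs = ["brain", "liver", "lung"]
baseline = {o: math.log(0.1) for o in organs}  # hypothetical base rates per unit time
beta = {o: [0.0] for o in organs}              # one hypothetical covariate with no effect
gamma = {("lung", "brain"): math.log(2.0)}     # hypothetical: a lung lesion doubles the brain rate
print(transition_rate("brain", {"lung"}, [1.0], baseline, beta, gamma))  # about 0.2
print(prob_no_new_site({"lung"}, organs, [1.0], baseline, beta, gamma, dt=1.0))
```

Because interactions enter additively on the log scale, an established lung metastasis multiplies the brain hazard rather than adding to it, which is how non-independence between sites is expressed in this kind of model.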

20.
arXiv (CS.CL) 2026-06-24

SciZoom: A Large-scale Benchmark for Hierarchical Scientific Summarization across the LLM Era

The explosive growth of AI research has created unprecedented information overload, increasing the demand for scientific summarization at multiple levels of granularity beyond traditional abstracts. While LLMs are increasingly adopted for summarization, existing benchmarks remain limited in scale, target only a single granularity, and predate the LLM era. Moreover, since the release of ChatGPT in November 2022, researchers have rapidly adopted LLMs for drafting manuscripts themselves, fundamentally transforming scientific writing, yet no resource exists to analyze how this writing has evolved. To bridge these gaps, we introduce SciZoom, a benchmark comprising 44,946 papers from four top-tier ML venues (NeurIPS, ICLR, ICML, EMNLP) spanning 2020 to 2025, explicitly stratified into Pre-LLM and Post-LLM eras. SciZoom provides three hierarchical summarization targets (Abstract, Contributions, and TL;DR) achieving compression ratios up to 600:1, enabling both multi-granularity summarization research and temporal mining of scientific writing patterns. Our linguistic analysis reveals striking shifts in phrase patterns (up to 10x for formulaic expressions) and rhetorical style (23% decline in hedging), suggesting that LLM-assisted writing produces more confident yet homogenized prose. SciZoom serves as both a challenging benchmark and a unique resource for mining the evolution of scientific discourse in the generative AI era. Our code and dataset are publicly available on GitHub (https://github.com/janghana/SciZoom) and Hugging Face (https://huggingface.co/datasets/hanjang/SciZoom), respectively.

21.
arXiv (CS.AI) 2026-06-12

Counterfactual Credit Policy Optimization for Multi-Agent Collaboration

arXiv:2603.21563v5 Abstract: Collaborative multi-agent large language models (LLMs) can solve complex reasoning tasks by decomposing roles, but reinforcement learning for such systems is limited by credit assignment: shared terminal rewards obscure individual contributions and can encourage free-riding. We introduce two optimizer-agnostic credit assignment methods for converting joint outcomes into agent-specific learning signals. Counterfactual Credit for Policy Optimization (CCPO) estimates an agent's marginal contribution by comparing the realized joint outcome with a counterfactual outcome where that agent is removed. Self-Evaluated Credit for Policy Optimization (SEPO) uses constrained self- and peer-evaluations as a verifier-anchored credit signal while keeping the external task outcome dominant. Both operate at the reward-construction layer rather than as policy optimizers, producing role-specific rewards or advantages for GRPO, GSPO, or REINFORCE++. We instantiate these credit signals in a sequential Think–Solve setting and evaluate them on mathematical reasoning benchmarks. Results show that explicit credit assignment often improves dual-agent reasoning, especially on MATH500 and several out-of-distribution settings, while gains vary across models and datasets. Our code is available at: https://github.com/bhai114/ccpo.
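The counterfactual credit described for CCPO reduces to a marginal-contribution difference: the reward with all agents minus the reward with one agent removed. In the sketch below, the `evaluate` function, the agent names, and the reward values are hypothetical. How an agent is "removed" in practice (for example, what the solver receives when the thinker is absent) is an implementation choice that the abstract does not specify.

```python
def counterfactual_credit(agents, evaluate):
    """Credit for each agent = R(all agents) - R(all agents except that one)."""
    full = evaluate(frozenset(agents))
    return {a: full - evaluate(frozenset(agents) - {a}) for a in agents}


def toy_reward(active):
    # Hypothetical reward: the solver is essential; the thinker adds a small bonus.
    return (1.0 if "solver" in active else 0.0) + (0.2 if "thinker" in active else 0.0)


print(counterfactual_credit(["thinker", "solver"], toy_reward))
```

An agent that contributes nothing receives zero credit even when the shared reward is high, which is how this construction discourages free-riding.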

22.
arXiv (CS.CV) 2026-06-16

SUP-MCRL: Subject-aware Unified Pseudo-feature Coded Multimodal Contrastive Representation Learning for EEG Visual Decoding

Non-invasive brain-computer interfaces suffer severe fidelity degradation in neural visual decoding when generalizing to natural visual experiences. Conventional multimodal contrastive representation learning optimizes only geometric distance alignment, neglecting semantic consistency and subject selectivity, which causes spurious zero-shot alignment. We propose SUP-MCRL, a unified framework integrating three collaborative mechanisms: (1) a Semantic-entity Aware Visual Encoder (SAVE), which learns spatial attention to extract semantic content without pre-trained saliency models; (2) a Unified EEG Enhancer (UEE), which employs multi-scale atrous convolutions and inter-band attention for adaptive cross-subject robustness; and (3) a Prototype-based Progressive Augmenter (PPA), which maintains an EMA-updated pseudo-feature pool to prevent representation collapse. In zero-shot experiments on THINGS-EEG, SUP-MCRL achieves 66.0%/91.9% (Top-1/Top-5) intra-subject accuracy and 24.0%/52.9% leave-one-subject-out (LOSO) accuracy, surpassing state-of-the-art methods. Code is available at https://github.com/NZWANG/SUP-MCRL.
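The PPA component is described only as maintaining an "EMA-updated pseudo-feature pool". A generic exponential-moving-average prototype update, p ← m·p + (1 − m)·f, can be sketched as below. The class name, keying the pool by a label, and the momentum value are hypothetical choices for illustration; the paper's pool layout and update schedule may differ.

```python
class PseudoFeaturePool:
    """Hypothetical pool of prototype vectors, each updated by exponential moving average."""

    def __init__(self, momentum: float = 0.9):
        if not 0.0 <= momentum < 1.0:
            raise ValueError("momentum must be in [0, 1)")
        self.momentum = momentum
        self.prototypes: dict[str, list[float]] = {}

    def update(self, label: str, feature: list[float]) -> list[float]:
        """Blend a new feature into the stored prototype: p <- m * p + (1 - m) * f."""
        if label not in self.prototypes:
            # The first observation initializes the prototype directly.
            self.prototypes[label] = list(feature)
        else:
            old = self.prototypes[label]
            if len(old) != len(feature):
                raise ValueError("feature dimension does not match stored prototype")
            m = self.momentum
            self.prototypes[label] = [m * p + (1.0 - m) * f for p, f in zip(old, feature)]
        return self.prototypes[label]


pool = PseudoFeaturePool(momentum=0.5)
pool.update("cat", [1.0, 0.0])
print(pool.update("cat", [0.0, 1.0]))  # [0.5, 0.5]
```

A high momentum makes prototypes change slowly, which is the usual reason EMA pools are used to stabilize representations against collapse.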

23.
arXiv (CS.AI) 2026-06-16

Beyond Weights and Gradients: A Taxonomy of Federated Learning Messages

arXiv:2606.16891v1

Federated Learning is rapidly evolving beyond the exchange of traditional model weights and gradients, yet existing definitions fail to capture the full scope of modern payloads such as synthetic data and federated analytics. This paper addresses this gap by proposing a formal mathematical definition of a federated message that accounts for both utility and privacy. We introduce a taxonomy that organizes these exchanges into three categories: model structures, statistical summaries, and data-conditioned representations. By evaluating these categories in terms of computational demands, communication costs, and privacy risks, we clarify the trade-offs involved in decentralized training. Our review of 202 recent publications highlights a significant shift since 2021 toward diverse messaging paradigms, signaling a move away from standard deep learning updates toward more specialized information sharing. This framework provides a structured path for future research to optimize federated systems for varying hardware and security requirements.
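The three-way taxonomy can be represented as a small data structure. The category names below come from the abstract; the assignment of example payloads (weights, gradients, federated analytics, synthetic data) to categories is a hypothetical illustration, not the paper's own classification.

```python
from enum import Enum


class MessageCategory(Enum):
    """The three federated-message categories named in the abstract."""
    MODEL_STRUCTURE = "model structures"
    STATISTICAL_SUMMARY = "statistical summaries"
    DATA_CONDITIONED = "data-conditioned representations"


# Hypothetical mapping of payload kinds to categories (illustrative only).
PAYLOAD_CATEGORY = {
    "weights": MessageCategory.MODEL_STRUCTURE,
    "gradients": MessageCategory.MODEL_STRUCTURE,
    "federated_analytics": MessageCategory.STATISTICAL_SUMMARY,
    "synthetic_data": MessageCategory.DATA_CONDITIONED,
}


def categorize(payload_kind: str) -> MessageCategory:
    """Look up the category of a payload kind, rejecting unknown kinds explicitly."""
    try:
        return PAYLOAD_CATEGORY[payload_kind]
    except KeyError:
        raise ValueError(f"unknown payload kind: {payload_kind!r}") from None


print(categorize("synthetic_data").value)  # data-conditioned representations
```

Using an Enum keeps the category set closed, so a system comparing costs or privacy risks per category cannot silently introduce a fourth, unlisted group.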

24.
arXiv (CS.CL) 2026-06-18

TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving

Full-Duplex Speech Language Models (FD-SLMs) are specialized foundation models designed to enable natural, real-time spoken interactions by modeling complex conversational turn-taking such as interruptions, backchannels, and overlapping speech. End-to-end (e2e) FD-SLMs leverage real-world double-channel conversational data to capture nuanced two-speaker dialogue patterns for human-like interactions, but their conversational abilities often degrade compared to pure-text conversation due to prolonged speech sequences and limited high-quality spoken dialogue data. Although interleaved text-speech generation could mitigate this degradation, integrating discrete text tokens into continuous double-channel audio streams may disrupt the precise time alignment required for fluid interaction. To address this, we propose TurnGuide, a novel text-speech interleaved generation approach for e2e FD-SLMs that dynamically segments assistant speech into dialogue turns and interleaves turn-level text and speech generation. This approach allows FD-SLMs to integrate the semantic intelligence of LLMs without compromising the natural acoustic flow. Extensive experiments show that TurnGuide not only significantly improves the ability of e2e FD-SLMs to produce semantically meaningful, coherent speech but also achieves state-of-the-art performance on various turn-taking events. Demos are available at https://dreamtheater123.github.io/TurnGuide-Demo/. Code is available at https://github.com/dreamtheater123/TurnGuide.
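Turn-level interleaving can be pictured as emitting each assistant turn's text tokens and then that turn's speech tokens, rather than interleaving at the word level. The sketch below assumes text precedes speech within a turn and ignores the user channel and timing alignment entirely; the `Turn` type and token strings are hypothetical, and TurnGuide's actual segmentation and ordering may differ.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One assistant dialogue turn (hypothetical representation)."""
    text_tokens: list[str]
    speech_tokens: list[str]


def interleave_turns(turns: list[Turn]) -> list[str]:
    """Emit each turn's text tokens immediately before that turn's speech tokens."""
    sequence: list[str] = []
    for turn in turns:
        sequence.extend(turn.text_tokens)
        sequence.extend(turn.speech_tokens)
    return sequence


turns = [
    Turn(["Hi", "there"], ["<s1>", "<s2>", "<s3>"]),
    Turn(["Sure"], ["<s4>"]),
]
print(interleave_turns(turns))  # ['Hi', 'there', '<s1>', '<s2>', '<s3>', 'Sure', '<s4>']
```

Grouping by turn means text for a whole turn is available as a semantic plan before its audio is generated, while speech tokens within a turn stay contiguous.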

25.
arXiv (CS.CV) 2026-06-11

Lighting-aware Unified Model for Instance Segmentation

Foundation models like the Segment Anything Model (SAM) demonstrate impressive zero-shot generalization but frequently degrade under diverse real-world illumination, particularly for instance segmentation. In this work, we address this limitation by developing Lighting Convolutional-Attention (LCA), an adapter module that enhances segmentation robustness without fine-tuning the heavy backbone. LCA employs a dual-branch architecture to process RGB features alongside contrast maps, enabling physically motivated sensitivity to structural changes rather than illumination artifacts. We optimize LCA through a pairwise training strategy, introducing a targeted loss term that explicitly penalizes discrepancies between clean images and their corresponding illumination variants. To evaluate and support this architecture, we conduct a comprehensive empirical study across multiple existing benchmarks and present a novel Unity-based synthetic dataset specifically designed to accurately replicate complex real-world lighting conditions. Extensive experimental results demonstrate that our approach successfully bridges the domain gap, delivering superior lighting-robust segmentation.
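The pairwise training strategy adds a loss term that penalizes discrepancies between a clean image and its illumination variant. The abstract does not give the formula, so the sketch below assumes a mean squared difference between flattened per-pixel mask probabilities, added to a segmentation loss with a hypothetical weight. Treat it as one plausible form of such a consistency term, not the paper's loss.

```python
def illumination_consistency_loss(clean_probs: list[float], variant_probs: list[float]) -> float:
    """Mean squared difference between mask probabilities for a clean/relit image pair.

    Assumption: masks are flattened to equal-length lists of per-pixel probabilities.
    """
    if not clean_probs or len(clean_probs) != len(variant_probs):
        raise ValueError("inputs must be non-empty and the same length")
    return sum((c - v) ** 2 for c, v in zip(clean_probs, variant_probs)) / len(clean_probs)


def total_loss(seg_loss: float, clean_probs: list[float], variant_probs: list[float],
               weight: float = 1.0) -> float:
    """Segmentation loss plus a weighted consistency penalty (weight is hypothetical)."""
    return seg_loss + weight * illumination_consistency_loss(clean_probs, variant_probs)


clean = [1.0, 0.0, 0.5, 0.5]
relit = [0.5, 0.0, 0.5, 1.0]
print(illumination_consistency_loss(clean, relit))  # 0.125
```

The penalty is zero only when the model predicts the same mask under both lighting conditions, which is the invariance the pairwise strategy targets.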