Academic Intelligence · Curated Daily

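Explore the frontiers of global academic research

AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefs for cross-disciplinary fields.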

01.
arXiv (CS.AI) 2026-06-18

A CEFR-Inspired Classification Framework with Fuzzy C-Means To Automate Assessment of Programming Skills in Scratch

arXiv:2604.00730v2 Announce Type: replace-cross Abstract: Context: Schools, training platforms, and technology firms increasingly need to assess programming proficiency at scale with transparent, reproducible methods that support personalized learning pathways. Objective: This study introduces a pedagogical framework for Scratch project assessment, aligned with the Common European Framework of Reference (CEFR), providing universal competency levels for students and teachers alongside actionable insights for curriculum design. Method: We apply Fuzzy C-Means clustering to 2,008,246 Scratch projects evaluated via Dr.Scratch, implementing an ordinal criterion to map clusters to CEFR levels (A1-C2), and introducing enhanced classification metrics that identify transitional learners, enable continuous progress tracking, and quantify classification certainty to balance automated feedback with instructor review. Impact: The framework enables diagnosis of systemic curriculum gaps, notably a "B2 bottleneck" where only 13.3% of learners reside due to the cognitive load of integrating Logic, Synchronization, and Data Representation, while providing certainty-based triggers for human intervention.
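
The clustering step described in this abstract can be illustrated with the standard Fuzzy C-Means membership formula. The sketch below is a minimal illustration only: the six cluster centers, the one-dimensional score, the "sort centers ascending" reading of the ordinal criterion, and the certainty measure (gap between the two largest memberships) are all assumptions, not the authors' implementation.

```python
from typing import List, Tuple

CEFR_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def fcm_memberships(score: float, centers: List[float], m: float = 2.0) -> List[float]:
    """Standard Fuzzy C-Means membership of one 1-D score in each cluster."""
    dists = [abs(score - c) for c in centers]
    # A score that coincides with a center belongs fully to that cluster.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** power for d_j in dists) for d_i in dists]


def classify(score: float, centers: List[float], m: float = 2.0) -> Tuple[str, float]:
    """Map a score to a CEFR level plus a certainty value in [0, 1]."""
    ordered = sorted(centers)  # assumed ordinal criterion: lowest center -> A1
    u = fcm_memberships(score, ordered, m)
    ranked = sorted(range(len(u)), key=lambda i: u[i], reverse=True)
    best, second = ranked[0], ranked[1]
    certainty = u[best] - u[second]  # assumed certainty measure
    return CEFR_LEVELS[best], certainty


# Hypothetical cluster centers on a Dr.Scratch-like score scale.
HYPOTHETICAL_CENTERS = [3.0, 6.0, 9.0, 12.0, 15.0, 18.0]
```

Under these assumptions, a score midway between two centers gets a certainty near zero, which is the kind of "transitional learner" a certainty-based trigger could route to an instructor.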

02.
arXiv (quant-ph) 2026-06-24

Quantum Correlations of Neutrinos in the Kerr-Newman Space-time

arXiv:2605.10424v2 Announce Type: replace-cross Abstract: Quantum phases provide a connection between gravitation and quantum information, which offers a novel avenue to explore the properties of space-time. In this paper, we investigate the quantum correlations (QCs) of neutrinos in the Kerr–Newman space-time. Both radial and non-radial propagations are considered under the weak-field approximation. The results show that, for inward propagations, the oscillation probabilities and QCs differ significantly from those obtained in the Schwarzschild metric. In the case of radial outward propagation, a larger angular momentum $a$ increases the oscillation period of the survival probability $P_{ee}$, entanglement, and monogamy of nonlocality, whereas a larger charge $Q$ decreases the corresponding periods. For non-radial propagations, $M$ and $a$ can noticeably modulate the amplitudes of the considered QCs, which is not observed in the case of radial propagations. Furthermore, we find that, despite differences in their variation ranges, entanglement and coherence exhibit highly consistent oscillation behaviors in both radial and non-radial propagation cases. These findings provide a comprehensive understanding of neutrino-based relativistic quantum information.

03.
arXiv (math.PR) 2026-06-12

Averaging principles for nonautonomous multiscale McKean-Vlasov stochastic systems

arXiv:2606.12820v1 Announce Type: new Abstract: This paper investigates a class of nonautonomous multiscale McKean-Vlasov stochastic systems. By leveraging the nonautonomous Poisson equation, we rigorously establish both strong and weak averaging principles, accompanied by explicit convergence rates. Notably, the coefficients of the averaging equations derived in the general case retain dependence on the scaling parameter $\varepsilon$. However, under the additional assumptions that the fast-scale coefficients are either asymptotically convergent or time-periodic, we demonstrate that the slow component converges, in the strong or weak sense, to averaging equations with coefficients independent of $\varepsilon$.

04.
arXiv (CS.CL) 2026-06-19

NEST: Narrative Event Structures in Time for Long Video Understanding

Recent progress in vision-language models has enabled the processing of increasingly long video sequences, but the ability to handle extended token streams does not translate to understanding of narrative structure in long videos. Existing long video benchmarks focus on needle-in-a-haystack retrieval rather than evaluating how low-level actions form events, how events interact across time, and how narratives progress, for example, whether a model can connect an early setback, such as a job loss to a later relationship breakup, despite long gaps, intervening scenes, or flashbacks that reframe what occurred. We introduce NEST (Narrative Event Structures in Time for Long Video Understanding), a dataset of 1005 full-length movies (avg. 98 minutes), each annotated with 102 multimodal narrative events grounded in visual content, dialogue, and audio. NEST captures multimodal narrative events with structured annotations grounded in visual content, dialogue, and audio, and links them through relations that reflect narrative structure, including temporal ordering, hierarchical composition, and long-range dependencies. We introduce baselines for event trigger detection (ETD), event localization (EL), event argument extraction (EAE), and event relation extraction (ERE). The benchmark is highly challenging for grounded event discovery, with ETD below 8%, EL under 6%, and EAE below 11%. In contrast, ERE is more tractable once events are given, reaching 35.45% F1 zero-shot and 44.42% F1 after fine-tuning.

05.
arXiv (CS.AI) 2026-06-16

Co-Scraper: query-aware DOM Pruning and Reusable Scraper Synthesis for Lightweight Web Data Extraction

arXiv:2606.14821v1 Announce Type: cross Abstract: The abundant and heterogeneous nature of web content necessitates automated information extraction, and generating scrapers that can be reused across similar web pages offers an effective solution for scalable data extraction. In this work, we propose Co-Scraper, a two-stage framework capable of handling the hierarchical complexity of long HTML documents. By integrating a query-aware DOM pruning mechanism with stable extraction strategy induction, Co-Scraper can effectively transform web content into executable programmatic wrappers using a fine-tuned Qwen3-8B model. On the test set of SWDE, Co-Scraper achieves state-of-the-art performance with an F1 score of 94.78% and a reuse success rate of 90.39%. This framework significantly enhances the accuracy and resilience of data extraction, providing a highly efficient approach for web data acquisition tasks.

06.
arXiv (CS.LG) 2026-06-24

Not All Invariants Are Equal: Curating Training Data to Accelerate Program Verification with SLMs

arXiv:2603.15510v2 Announce Type: replace Abstract: The synthesis of inductive loop invariants remains a critical bottleneck in automated program verification. While Large Language Models (LLMs) show promise in mitigating this issue, they often fail on complex programs, producing invariants that are invalid or computationally ineffective. Although fine-tuning is a natural strategy to address these limitations, obtaining high-quality training data remains an open challenge. We first formalize the properties required for a high-quality training invariant, and then present Wonda, a rigorous data curation pipeline that extracts such invariants from raw verifier output via AST-based normalization followed by LLM-driven semantic rewriting and augmentation with provable quality guarantees. Fine-tuning Small Language Models (SLMs) on Wonda-curated data yields consistent gains across the Qwen3, Llama-3.1, and Mistral families: the 4B and 8B Qwen3 models nearly double invariant correctness and double speedup rates, while Llama-3.1-8B triples both. On the challenging InvBench suite, the same 4B model outperforms an off-the-shelf model 20x its size and matches the end-to-end verification time of GPT-OSS-120B, while a 14B Qwen3 model matches that of the frontier model GPT-5.2, all without test-time compute overhead. Our code is publicly available on GitHub.

07.
bioRxiv (Bioinfo) 2026-06-11

Revealing trajectories of multi-modal voxel-level changes in neurodegenerative diseases using latent event mapping

Neurodegenerative diseases are driven by pathological mechanisms that can be indirectly measured in vivo using multi-modal neuroimaging. However, current computational methods that aim to reconstruct trajectories of voxel-level changes in the brain are either not computationally scalable or not fully interpretable, limiting their ability to reveal associations between disease progression and underlying mechanisms. Here we introduce Latent Event Mapping (LEMING), a generative unsupervised modelling technique that learns a latent map of disease events along a common pseudo-timeline of events. We apply LEMING to amyloid PET and structural MRI data from the Alzheimer's Disease Neuroimaging Initiative to reveal the first voxel-level trajectories of events in Alzheimer's disease. Notably, we show how LEMING can provide new insights into progression-dependent disease mechanisms. We find that acetylcholine receptor density is significantly positively associated with both late-stage amyloid and atrophy events, suggesting that either these receptors are targeted later in disease progression, or that amyloid does not play an active role. This has strong implications for therapeutics that target acetylcholine receptors, particularly for early-stage intervention strategies.

08.
arXiv (quant-ph) 2026-06-17

Engineering entanglement and transport in interacting quantum walks with tailored potentials

arXiv:2606.17825v1 Announce Type: new Abstract: Controlling the interplay between particle propagation and quantum correlation generation is a central challenge in quantum transport. Here, we investigate two distinguishable continuous-time quantum walkers evolving on parallel one-dimensional lattices, interacting via distance-dependent potentials. While on-site interactions reproduce the typical bosonic behaviour, extending the interaction to a linear potential over multiple neighbors introduces controlled Bloch-like oscillations and shifts the bound-pair regime to stronger couplings. More generally, we explore a Coulomb-like interaction parameterized by strength, spatial scaling, and decay rate. This reveals a rich phase diagram including four distinct dynamical regimes: (i) a high-entropy, oscillatory regime akin to a linear potential; (ii) a strongly localized, bound-pair regime; (iii) a novel intermediate regime combining near-ballistic spreading with strong correlations; and (iv) a weakly interacting, free-propagation regime. Notably, regime (iii) achieves concurrent optimization of transport efficiency and entanglement, offering a sweet spot for correlated quantum dynamics. Our results provide a tool for designing interaction-engineered quantum walks with potential applications in quantum information processing and simulations.

09.
arXiv (CS.CL) 2026-06-18

Written by AI, Managed by AI: Semantic Space Control and Index Sickness Elimination Across 391 Consecutive Sessions

The prevailing engineering intuition for addressing conceptual drift in long-horizon LLM collaboration is to trade more formal constraints for more reliable outputs – designing symbolic identifier systems, accumulating defensive rules in System Prompts, expanding context windows. Our engineering record shows that in long-horizon settings, this direction may produce effects contrary to design intent. Using action research methods in a real software project (Bang-v3) spanning approximately one month and 391 collaborative sessions, we document and analyze the failure process of these strategies. When the symbolic system exceeds a complexity threshold, LLMs do not become more accurate – instead, they abandon genuine understanding of business semantics, retreat to self-referential reasoning within the symbolic layer, and generate outputs that appear internally consistent but are physically disconnected from reality. We name this failure pattern "Index Sickness," and its canonical manifestation "Phantom Legislation." We name the underlying principle the "Pang Principle (Semantic Vitality Law)": natural language carrying explicit purpose conveys far greater information quality than symbolic expression. From this, we design and validate its physical engineering mechanism: "Baseline-Log Physical Separation." In the same project, this mechanism reduced AI Instructions volume by ~75%, and across the subsequent ~150 sessions, no recurrence of Index Sickness was observed. A bilingual companion version (Chinese) is included as supplementary material.

10.
arXiv (CS.AI) 2026-06-11

Runtime Enforcement of Hybrid System Properties

arXiv:2606.12022v1 Announce Type: cross Abstract: Runtime enforcement has emerged as a promising approach for ensuring the safety of autonomous and cyber-physical systems operating in uncertain and dynamic environments. Unlike traditional runtime verification, runtime enforcement actively intervenes during execution to prevent property violations by modifying unsafe system behaviors. Existing enforcement frameworks primarily focus on untimed or discrete-time specifications and are often limited to delaying or suppressing events, making them inadequate for reactive systems exhibiting complex continuous dynamics. In this paper, we propose a runtime enforcement framework where safety requirements are modeled using Hybrid Automata (HA). The framework combines discrete-event editing with continuous-time monitoring to support enforcement actions such as suppression, delay, and insertion of events at arbitrary time instants. Upon observing environmental inputs, the automaton is initialized, and runtime reachability analysis is used to synthesize safe corrective actions. We formally define the enforcement problem for safety hybrid automata, establish enforceability conditions, and present an online enforcement algorithm for reactive systems. A detailed case study on an Adaptive Cruise Control (ACC) system demonstrates the effectiveness of the proposed approach in maintaining safety properties under unsafe controller behaviors. Experimental results show that the framework introduces minimal computational overhead while ensuring continuous compliance with safety requirements in real time.

11.
arXiv (CS.CL) 2026-06-19

LedgerAgent: Structured State for Policy-Adherent Tool-Calling Agents

Policy-adherent tool-calling agents in customer-service domains must maintain task states across turns while calling tools and obeying domain policies. Task states consist of relevant facts, identifiers, constraints, and conditions observed through user interaction and tool calls. In standard agents, task states are not represented separately. Observations, tool returns, and policy instructions are placed in the prompt, leaving agents to reconstruct the relevant states from the prompt each time they decide what to do next. This design makes state management implicit, creating two common failure modes. An agent may retrieve the right facts but later ground its decision in stale, missing, or incorrect information; and a syntactically valid tool call may still violate a domain policy that depends on the current task state. We introduce LedgerAgent, an inference-time method for tool-calling agents that maintains observed task states in a separate ledger and renders the states into the prompt. The ledger is also used to check state-dependent policy constraints before environment-changing tool calls are executed, blocking policy violations. Across four customer-service domains and a mixed panel of open- and closed-weight models, LedgerAgent improves average pass^k over a standard prompt-based tool-calling approach, with the largest gains under stricter multi-trial consistency metrics.
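
The two roles of the ledger described in this abstract (rendering observed state into the prompt, and gating environment-changing tool calls) can be sketched as follows. This is a hypothetical illustration: the `Ledger` class, the `refund_within_limit` policy, and `guarded_call` are invented names for this example and are not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Ledger:
    """Holds observed task state separately from the prompt text."""
    facts: Dict[str, Any] = field(default_factory=dict)

    def record(self, key: str, value: Any) -> None:
        self.facts[key] = value

    def render(self) -> str:
        # Rendered into the prompt so the model need not re-derive state.
        return "\n".join(f"{k}: {v}" for k, v in sorted(self.facts.items()))


# A policy returns None when satisfied, or a reason string when violated.
Policy = Callable[[Ledger, Dict[str, Any]], Optional[str]]


def refund_within_limit(ledger: Ledger, args: Dict[str, Any]) -> Optional[str]:
    """Hypothetical policy: a refund may not exceed the observed order total."""
    paid = ledger.facts.get("order_total")
    if paid is None:
        return "order_total not yet observed"
    if args["amount"] > paid:
        return "refund exceeds order_total"
    return None


def guarded_call(ledger: Ledger, policies: List[Policy],
                 tool: Callable[..., Any], args: Dict[str, Any]) -> Dict[str, Any]:
    """Check every state-dependent policy before running the tool."""
    for policy in policies:
        reason = policy(ledger, args)
        if reason is not None:
            return {"status": "blocked", "reason": reason}
    return {"status": "ok", "result": tool(**args)}
```

In this sketch a syntactically valid refund call is still blocked when the ledger shows it would exceed the amount paid, which mirrors the second failure mode the abstract describes.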

12.
arXiv (CS.LG) 2026-06-24

A Theory of Saddle Escape in Deep Nonlinear Networks

arXiv:2605.01288v3 Announce Type: replace Abstract: In deep networks with small initialization, training exhibits long plateaus separated by sharp feature-acquisition transitions. Whereas shallow nonlinear networks and deep linear networks are well studied, extending these analyses to deep nonlinear networks remains challenging. We derive an exact identity for the imbalance of Frobenius norms of layer weight matrices that holds for any smooth activation and any differentiable loss and use this to classify activation functions into four universality classes. On the permutation-symmetric submanifold, the identity combines with an approximate balance law to reduce the full matrix flow to a scalar ODE, giving a critical-depth escape time law $\tau_\star = \Theta(\varepsilon^{-(r-2)})$ governed by the number $r$ of layers at the bottleneck scale rather than the total depth $L$. We find that this same $r-2$ exponent is recovered under He-normal initialization with $r$ bottleneck layers rescaled by $\varepsilon$, where the symmetry manifold is preserved by the flow but not attracting. We find close agreement between our theory and numerical simulations.
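
As a quick reading of the escape-time law quoted above, suppose (as an assumption, since $\Theta$ only fixes the scaling up to constants) that $\tau_\star(\varepsilon) \approx C\,\varepsilon^{-(r-2)}$ with $C$ independent of $\varepsilon$. Halving the initialization scale then changes the plateau length by a factor that depends only on $r$:

```latex
\tau_\star(\varepsilon) \approx C\,\varepsilon^{-(r-2)}
\quad\Longrightarrow\quad
\frac{\tau_\star(\varepsilon/2)}{\tau_\star(\varepsilon)}
  \approx \frac{(\varepsilon/2)^{-(r-2)}}{\varepsilon^{-(r-2)}}
  = 2^{\,r-2}
```

Under that assumption, $r = 3$ bottleneck layers would give roughly a $2\times$ longer plateau and $r = 4$ roughly $4\times$, regardless of the total depth $L$.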

13.
arXiv (CS.AI) 2026-06-11

Vision-Language-Action Jump-Starting for Reinforcement Learning Robotic Agents

arXiv:2604.13733v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) enables high-frequency, closed-loop control for robotic manipulation, but scaling to long-horizon tasks with sparse or imperfect rewards remains difficult due to inefficient exploration and poor credit assignment. Vision-Language-Action (VLA) models leverage large-scale multimodal pretraining to provide generalist, task-level reasoning, but current limitations hinder their direct use in fast and precise manipulation. In this paper, we propose Vision-Language-Action Jump-Starting (VLAJS), a method that bridges sparse VLA guidance with on-policy RL to improve exploration and learning efficiency. VLAJS treats VLAs as transient sources of high-level action suggestions that bias early exploration and improve credit assignment, while preserving the high-frequency, state-based control of RL. Our approach augments Proximal Policy Optimization (PPO) with a directional action-consistency regularization that softly aligns the RL agent's actions with VLA guidance during early training, without enforcing strict imitation, requiring demonstrations, or relying on continuous teacher queries. VLA guidance is applied sparsely and annealed over time, allowing the agent to adapt online and ultimately surpass the guiding policy. We evaluate VLAJS on six challenging manipulation tasks: lifting, pick-and-place, peg reorientation, peg insertion, poking, and pushing in simulation, and validate a subset on a real Franka Panda robot. VLAJS consistently outperforms PPO and distillation-style baselines in sample efficiency, reducing required environment interactions by over 50% in several tasks. Real-world experiments demonstrate zero-shot sim-to-real transfer and robust execution under clutter, object variation, and external perturbations.
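
The idea of a directional regularizer that is annealed away can be sketched in a few lines. The abstract does not give the exact form, so the cosine-based penalty and the linear annealing schedule below are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two action vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def guidance_weight(step: int, anneal_steps: int, w0: float = 1.0) -> float:
    """Assumed schedule: linear decay from w0 at step 0 to 0 at anneal_steps."""
    return w0 * max(0.0, 1.0 - step / anneal_steps)


def direction_penalty(agent_action: Sequence[float], vla_action: Sequence[float],
                      step: int, anneal_steps: int, w0: float = 1.0) -> float:
    """Soft penalty on directional disagreement, added to the PPO loss.

    Only the direction is compared, not the magnitude, so the agent is
    nudged toward the suggestion without being forced to imitate it.
    """
    weight = guidance_weight(step, anneal_steps, w0)
    return weight * (1.0 - cosine_similarity(agent_action, vla_action))
```

Because the weight reaches zero after `anneal_steps`, the penalty vanishes late in training and the policy is then optimized by the RL objective alone, which is one way an agent could end up surpassing the guiding policy.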

14.
bioRxiv (Bioinfo) 2026-06-14

Cellfm-datasets: A Unified Data Infrastructure for Single-Cell and Spatial Transcriptomics Foundation Model Pretraining

Large-scale cell foundation models are increasingly limited not only by model architecture, but also by the data infrastructure required to repeatedly sample sparse transcriptomic profiles from out-of-core cohorts. AnnData/H5AD has become a standard exchange format for single-cell and spatial omics analysis, yet its HDF5-backed layout is not designed for high-frequency random mini-batch loading under multi-worker and distributed pretraining. We present Cellfm-datasets, a data infrastructure artifact that converts H5AD cohorts into a self-describing compressed sparse row (CSR) memmap layout and exposes the resulting corpus through Hugging Face Dataset and IterableDataset interfaces. The artifact stores a shared gene vocabulary, per-sample metadata, optional spatial coordinates, observation metadata, manifests, and checksums, and reconstructs sparse cell or group records at runtime without dense expansion. A unified sampling abstraction supports random-cell groups, manifest-defined biological regions, and coordinate-based spatial blocks, with deterministic sharding across distributed ranks and data-loader workers. Spatial demonstrations on P14 mouse brain transcriptomics sections illustrate region- and block-level sampling over real anatomical structures. In controlled benchmarks on a public heterogeneous ModelScope scRNA-seq subset, Cellfm-datasets reached 60,571 +/- 1,734 samples/s in single-core random loading, scaled to approximately 160,000 samples/s with eight workers, and maintained near-constant process-private memory while reading up to one million cells. By moving sparse single-cell and spatial corpora from model-specific loader code into reusable, validated, and framework-native dataset artifacts, this design may reduce the engineering burden of reproducible cell foundation model pretraining and make repeated training runs, model comparisons, and mixed-modality data reuse easier to standardize.
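
The compressed sparse row (CSR) layout mentioned above stores three flat arrays, which is what makes random access to a single cell cheap. The sketch below uses small in-memory arrays from the Python standard library as a stand-in for memory-mapped files; the toy matrix, `read_cell`, and the round-robin `shard_rows` rule are illustrative assumptions, not the artifact's actual API.

```python
from array import array
from typing import Dict

# Toy expression matrix with 3 cells and 5 genes:
#   cell 0: gene 1 = 2, gene 4 = 1
#   cell 1: no nonzero genes
#   cell 2: gene 0 = 3, gene 2 = 5, gene 3 = 1
data = array("f", [2, 1, 3, 5, 1])      # nonzero values, row by row
indices = array("i", [1, 4, 0, 2, 3])   # gene (column) index of each value
indptr = array("q", [0, 2, 2, 5])       # row r occupies data[indptr[r]:indptr[r+1]]


def read_cell(row: int) -> Dict[int, float]:
    """Reconstruct one sparse cell record without building a dense vector."""
    start, end = indptr[row], indptr[row + 1]
    return {indices[k]: data[k] for k in range(start, end)}


def shard_rows(n_rows: int, rank: int, world_size: int) -> range:
    """Assumed deterministic sharding: round-robin rows across ranks."""
    return range(rank, n_rows, world_size)
```

Reading a cell touches only its own slice of `data` and `indices`, so with memory-mapped arrays the cost would scale with the number of nonzero genes in that cell rather than with the size of the cohort.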

15.
arXiv (CS.CV) 2026-06-16

DifferAD-R1: A Difference-Guided Industrial Anomaly Localization with Multimodal Large Language Models

Industrial anomaly localization aims to accurately identify and localize abnormal regions in industrial products, addressing the critical challenge of detecting unseen defect categories in real-world scenarios. Traditional closed-set methods often suffer from poor cross-scenario generalization, while existing Multimodal Large Language Model (MLLM)-based approaches face two core limitations: they either adopt QA-style paradigms misaligned with the practical demands of localization, or rely on standard optimization techniques such as Group Relative Policy Optimization (GRPO), which fails to deliver effective learning signals for subtle defects. To tackle these issues, this paper proposes DifferAD-R1, an MLLM-augmented reinforcement learning framework tailored for industrial anomaly localization. We design a Difference-Guided dual-image paradigm, which reformulates the localization task as a one-shot difference grounding problem to effectively explore cross-scenario anomalies. A Dual-Consistency Localization Reward is developed for hard-to-detect anomalies, enhancing optimization stability and robustness. Additionally, we integrate a difficulty-aware strategy with adaptive reweighting and group-wise resampling to prioritize learning on challenging instances. To facilitate evaluations in real-world industrial settings, we construct the AD-DualDiff dataset, comprising 13K paired images across 20 categories. Experimental results demonstrate that DifferAD-R1 significantly outperforms existing baselines and achieves competitive performance compared to large-scale models like Qwen3-VL (235B parameters). Our code is publicly available at: https://github.com/Rong2026/work-1.

16.
arXiv (CS.CL) 2026-06-16

RASST: Retrieval-Augmented Simultaneous Speech Translation

Simultaneous speech translation produces target text incrementally from partial speech input. Recent speech large language models have markedly improved SST quality but still struggle with rare and domain-specific terminology. Retrieval augmentation has helped in automatic speech recognition and neural machine translation, but extending it to SST is non-trivial: retrieval must be fast and accurate under partial speech, and the model must decide whether and when to apply retrieved terms during incremental generation. We propose Retrieval-Augmented Simultaneous Speech Translation (RASST), which addresses both challenges. For accurate cross-modal retrieval under partial input, RASST trains a lightweight speech-text retriever that produces chunkwise terminology hints for the Speech LLM via multi-scale retrieval. To use these hints correctly, we synthesize training data that teaches the Speech LLM to decide whether and when to apply each retrieved term. Experiments on ACL 60/60 dev set and the ESO test set show that RASST improves terminology accuracy by nearly 40% and overall translation quality by up to 3 BLEU points, with negligible computational overhead.

17.
bioRxiv (Bioinfo) 2026-06-08

HydraMPP: A lightweight library for distributed massive parallel processing in Python - threading at scale.

We now exist in the era of massive datasets from genomics, large language models, and all the known knowledge of humanity right at our fingertips. Much of this data is becoming more accessible; however, processing such data remains an ongoing issue across systems, including high performance computing (HPC) infrastructures. Massively parallel computing (MPP) has addressed this with a divide-and-conquer approach, splitting workloads across independent nodes (i.e., central processing units (CPUs)) to allow for higher scaling of data. The main engine for this in Python is Ray; however, it has many issues, including a large codebase, security issues, debugging opacity, and memory management issues. Here, we present HydraMPP, a lightweight, easy-to-use library with high auditability and SLURM ergonomics.

18.
arXiv (CS.LG) 2026-06-17

Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach

arXiv:2510.19528v2 Announce Type: replace-cross Abstract: We investigate the fundamental problem of leveraging offline data to accelerate online reinforcement learning - a direction with strong potential but limited theoretical grounding. Our study centers on how to learn and apply value envelopes within this context. To this end, we introduce a principled two-stage framework: the first stage uses offline data to derive upper and lower bounds on value functions, while the second incorporates these learned bounds into online algorithms. Our method extends prior work by decoupling the upper and lower bounds, enabling more flexible and tighter approximations. In contrast to approaches that rely on fixed shaping functions, our envelopes are data-driven and explicitly modeled as random variables, with a filtration argument ensuring independence across phases. The analysis establishes high-probability regret bounds determined by two interpretable quantities, thereby providing a formal bridge between offline pre-training and online fine-tuning. Empirical results on tabular MDPs demonstrate substantial regret reductions compared with both UCBVI and prior methods while remaining competitive with related approaches.

19.
arXiv (quant-ph) 2026-06-17

Hybrid Ferromagnet-SNSPDs: Single photon induced order-to-disorder transition in ferromagnets coupled to thin film superconductors

arXiv:2606.17177v1 Announce Type: cross Abstract: The development of midwave and longwave infrared single photon detectors is crucial for their emerging applications in spectroscopy, remote sensing, exoplanet detection, and free space quantum communications. However, existing sensors need to be operated at extremely low temperatures (0.08-0.9K) to reduce dark noise and hence require the use of advanced cryogenics such as dilution refrigerators or $^3$He cryogens, significantly limiting applications. Here we propose a vortex-engineering approach based on a hybrid phase transition in a ferromagnet/superconductor bilayer to increase the operating temperature of infrared single photon detectors up to 3.75K. We show that the introduction of a ferromagnetic layer produces a local magnetic field which impedes vortex crossing in the superconductor, reducing dark noise. When a single photon is incident, the photon-induced hotspot causes an order-to-disorder transition in the ferromagnet, leading to a vortex-induced phase transition in the superconducting layer. By engineering the ferromagnet's Curie temperature to be close to the device's operating temperature, single photon sensitivity can be achieved at increased operating temperatures. We predict at midwave/longwave infrared wavelengths (3-14$\mu$m) the operating temperature can be raised to 3.25-3.75K, enabling significantly simpler cooling systems.

20.
arXiv (CS.CV) 2026-06-12

ECA: Efficient Continual Alignment for Open-Ended Image-to-Text Generation

Incremental Learning (IL) for Open-ended Image-to-Text Generation (OpenITG) enables models to continuously generate accurate, contextually relevant text for new images while preserving previously acquired knowledge. Unlike prior studies, this paper addresses a more practical scenario in which the predominant category of visual data shifts over time as environments evolve. In this context, we introduce a new notion of continual alignment, which incrementally adapts the alignment module within pre-trained VLMs to preserve high-quality cross-modal representations. Based on this idea, we propose Efficient Continual Alignment (ECA), a novel exemplar-free IL approach for OpenITG. The key challenge is enabling the model to acquire new, task-specific features while minimizing interference with the established alignment without accessing raw data from previous tasks. To address this, ECA employs three core mechanisms: a Mixture of Query (MoQ) module that adapts task-specific query tokens, a Fisher Dynamic Expansion (FeDEx) that dynamically expands model structure based on a Fisher Information Matrix (FIM)-based metric, and an embedding dictionary with Dictionary Replay (DR) to retain past knowledge. To evaluate ECA's performance, we construct four new IL OpenITG benchmarks that better reflect real-world scenarios. Experimental results demonstrate that ECA significantly mitigates catastrophic forgetting and improves IL performance compared to baseline methods. Code and benchmarks are available at https://github.com/Snowball0823/ECA.

21.
arXiv (CS.CL) 2026-06-15

Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2

Structured width pruning of GLU-MLP layers in Llama-3.2 models, guided by the Peak-to-Peak Magnitude (PPM) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities. While performance on tasks relying on parametric knowledge (e.g., MMLU, GSM8K) and perplexity metrics degrades predictably with decreasing expansion ratios, instruction-following capabilities improve at the 2.4x equilibrium ratio (IFEval: +4.8 points / +46% in Llama-3.2-1B and +3.7 points / +39% in Llama-3.2-3B), and multi-step reasoning remains robust (MUSR). This pattern, observed consistently across both evaluated model sizes, challenges the prevailing assumption in compression research that pruning induces uniform degradation. To investigate this, we evaluated seven expansion ratio configurations using comprehensive benchmark suites that assess factual knowledge, mathematical reasoning, language comprehension, instruction-following, and truthfulness. Our analysis identifies the expansion ratio as a critical architectural parameter that selectively reshapes the model's task performance profile, rather than merely serving as a compression metric.

22.
arXiv (CS.CL) 2026-06-17

When Multiple Scripts Matter: Evaluating ASR in Clinical Settings

Automatic speech recognition (ASR) in non-English clinical settings is challenged by multiscript variability, where the same term may appear in multiple valid orthographic forms. Conventional string-matching evaluation metrics often underestimate ASR performance by treating orthographic variants as errors. To address this issue, we introduce MultiClin, a clinical ASR benchmark designed to evaluate robustness to multiscript variability. Experiments across diverse ASR models show that multiscript-aware evaluation provides a fairer assessment of recognition quality than conventional single-reference evaluation. We further investigate the impact of script consistency during training and find that inconsistent script mappings increase orthographic uncertainty and hinder model convergence, with a balanced 50% mapping ratio producing the highest entropy. In contrast, script unification consistently yields the best ASR performance. Our dataset and code are publicly available at: https://github.com/aitrics-ronaldo/Interspeech_MultiClin.
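
The observation that a balanced 50% mapping ratio gives the highest entropy matches a basic property of Shannon entropy for a two-way choice. The sketch below assumes the quantity in question is the binary entropy of choosing between two scripts for a term, which is an interpretation and not confirmed by the abstract.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a two-outcome choice with probability p."""
    if p in (0.0, 1.0):
        return 0.0  # a fully consistent script mapping carries no uncertainty
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))
```

Under this reading, `binary_entropy(0.5)` is the maximum (one bit), while a unified script (`p` equal to 0 or 1) gives zero, consistent with script unification being the easiest target for the model to learn.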

23.
arXiv (CS.AI) 2026-06-16

Edit Knowledge, Not Just Facts via Multi-Step Reasoning over Background Stories

arXiv:2602.02028v2. Enabling artificial intelligence systems, particularly large language models, to update knowledge and flexibly apply it during reasoning remains a central challenge. Existing knowledge editing approaches emphasize atomic facts, improving factual recall but often failing to integrate updated information into a coherent framework usable across contexts. In this work, we argue that knowledge update is fundamentally a reasoning problem rather than a memorization problem. Consequently, a model should be trained in situations where the new information is instrumental to solving a task, combined with pre-existing knowledge, and exercised through multi-step reasoning. Based on this insight, we propose a training strategy built on three principles. First, new knowledge is introduced as a coherent background story that contextualizes novel facts and explains their relation to existing knowledge. Second, models are trained using self-generated multi-hop questions that require multi-step reasoning involving the new information. Third, training is done using knowledge distillation, forcing a student model to internalize the teacher's reasoning behavior without access to the novel information. Experiments show that models trained with this strategy effectively leverage newly acquired knowledge during reasoning and achieve remarkable performance on challenging questions that require combining multiple new facts.
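
The third principle can be pictured with a generic distillation objective: a teacher that sees the background story produces token distributions, and a student that does not see it is trained to match them. The sketch below uses a mean KL divergence over answer positions, which is one common formulation and not necessarily the authors' loss; the function names and toy logits are hypothetical.

```python
from math import exp, log

def softmax(logits):
    """Numerically stable softmax over one position's vocabulary logits."""
    m = max(logits)
    exps = [exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits):
    """Mean KL(teacher || student) across answer token positions.
    teacher_logits: computed WITH the background story in context (assumed).
    student_logits: computed WITHOUT it (assumed)."""
    per_position = [
        kl_divergence(softmax(t), softmax(s))
        for t, s in zip(teacher_logits, student_logits)
    ]
    return sum(per_position) / len(per_position)

# Toy logits: 2 answer positions, vocabulary of 3 tokens.
teacher = [[4.0, 0.5, 0.1], [0.2, 3.5, 0.3]]  # confident thanks to the story
student = [[1.0, 0.9, 0.8], [0.7, 1.1, 0.9]]  # unsure without it

print(distillation_loss(teacher, student))
print(distillation_loss(teacher, teacher))
```

Minimizing such a loss pushes the student toward the teacher's answers even though the story never appears in the student's input, which is the sense in which the knowledge is internalized.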

24.
arXiv (CS.CV) 2026-06-15

Hierarchical Consistency Learning for Test-time Adaptation in Camouflage Perception

Camouflaged object detection (COD) aims to localize targets that exhibit minimal perceptual differences from backgrounds through physical attributes. Existing methods, constrained by the static train-then-freeze paradigm, suffer from domain rigidity and annotation dependency, limiting their adaptability to scene variations and unseen camouflage patterns. To overcome these limitations, we propose the hierarchical consistency learning (HCL) framework, which integrates test-time adaptation for dynamic representation recalibration. Specifically, we design the hierarchical representation reconstruction (HRR) to alleviate feature entanglement by synergizing spatial reconstruction with dual-stream frequency-domain decomposition, enhancing robustness against appearance homogenization. Pixel and spectrum inference provide structural and contextual priors. We further introduce task affinity guidance (TAG) to propagate knowledge across branches via channel-wise affinity, aligning local discriminative cues and mitigating semantic drift. To ensure semantic invariance, we formulate the prototype consistency calibration (PCC), which aggregates region features into compact prototypes and establishes prototype-feature similarity. This imposes implicit and hierarchical constraints that bridge task and representation gaps. Extensive experiments across four camouflaged and four underwater object benchmarks, under three degradation settings, demonstrate that our method consistently outperforms state-of-the-art approaches, highlighting its robustness and generalization under distribution shifts.

25.
medRxiv (Medicine) 2026-06-17

High burden of subclinical TB in Africa revealed from a postmortem cohort.

Tuberculosis (TB) is increasingly recognised as a spectrum of infection and disease, yet the prevalence of viable, asymptomatic Mycobacterium tuberculosis (M.tb) infection remains uncertain. Subclinical tuberculosis (scTB), defined as microbiologically confirmed M.tb infection in the absence of recognised symptoms, is underdetected by symptom-, sputum-, and imaging-based approaches. We conducted postmortem examinations of 94 adults who died from non-infectious causes, none of whom were clinically suspected of TB or reported TB-related symptoms prior to death. Lung and extrapulmonary tissues were cultured for M.tb. Viable M.tb was confirmed in six individuals, corresponding to a prevalence of 6.4% (95% CI: 2.4 to 13.4%). These findings provide direct tissue-based evidence that viable, asymptomatic M.tb infection can persist beyond the reach of conventional clinical detection. Our data suggest that a biologically active reservoir of infection may exist undetected within high-burden settings, with implications for surveillance strategies aimed at TB elimination.
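
The prevalence figure follows directly from the counts in the abstract (6 of 94). The abstract does not say which interval method was used; the sketch below assumes an exact (Clopper-Pearson) binomial interval, computed by bisection with the standard library only, so that the result can be compared against the reported range. Function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve_increasing(g, lo=0.0, hi=1.0, iters=100):
    """Bisection root of a function that increases from negative to positive."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial confidence interval (ASSUMED method)."""
    # Lower limit: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else solve_increasing(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper limit: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else solve_increasing(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

positives, cohort = 6, 94  # counts taken from the abstract
prevalence = positives / cohort
ci_low, ci_high = clopper_pearson(positives, cohort)

print(f"prevalence = {prevalence:.1%}")  # prevalence = 6.4%
print(f"95% CI = {ci_low:.1%} to {ci_high:.1%}")
```

With only six positive cultures the interval is wide, which is worth keeping in mind when extrapolating from this cohort to population-level burden.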