Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-16

Fully Quantum Algorithm for the 1-dimensional linear Lattice Boltzmann Method

arXiv:2606.16514v1 Announce Type: new Abstract: A fully quantum algorithm for solving the one-dimensional linear advection-diffusion equation using the Lattice Boltzmann method as a numerical procedure is presented in this work. We start by presenting a state of the art of the current usage of quantum algorithms for solving ordinary and partial differential equations. We then describe two algorithms for the one-dimensional Lattice Boltzmann method with two degrees of freedom. The first one is an existing hybrid quantum-classical algorithm with measurements at each time step, and the second one is our improved version, viz. a fully quantum algorithm where only one measurement is needed at the end of the algorithm. The fully quantum algorithm is first executed on a quantum simulator and then compared with a classical approach. Subsequently, the fully quantum algorithm is run on a quantum system with 133 qubits to investigate the effect of noise and the depth of the circuit on the output state. We find fluctuations in the final result due to the decoherence noise of the qubits.

02.
arXiv (CS.CV) 2026-06-17

Future Dynamic 3D Reconstruction: A 3D World Model with Disentangled Ego-Motion

Forecasting the evolution of dynamic environments is crucial for autonomous agents. While generative world models have recently achieved high photorealism in 2D video synthesis by mixing ego-motion and environmental dynamics within the image plane, they exhibit physical inconsistencies, such as morphing or vanishing objects, especially over long time horizons. In this paper, we propose FR3D, a world model that predicts a persistent 3D latent representation for future dynamic 3D reconstruction. Unlike prior works that treat the world as a sequence of image-based features, FR3D explicitly decouples the 3D evolution of the scene from the agent's trajectory, treating the inferred ego-motion as a latent proxy for action. This disentanglement resolves the ambiguities between self-motion and world-motion, ensuring geometric consistency into the future. Furthermore, we introduce a teacher-student distillation strategy that leverages the spatial "common sense" of off-the-shelf foundation models, leading to robust zero-shot generalization. Extensive experiments demonstrate FR3D's strong performance for future dynamic 3D reconstruction from monocular observations across multiple datasets, even 2 seconds into the future. Project page: https://fr3d-wm.github.io.

03.
arXiv (quant-ph) 2026-06-24

Syndrome aware mitigation of logical errors

arXiv:2512.23810v2 Announce Type: replace Abstract: Broad applications of quantum computers will require error correction (EC). However, hardware roadmaps indicate that physical qubit numbers will remain limited in the foreseeable future, leading to residual logical errors that constrain the size and accuracy of achievable computations. Recent work suggested logical error mitigation (LEM), which applies known error mitigation (EM) methods to logical errors, eliminating their effect at the cost of a runtime overhead. We introduce syndrome-aware logical error mitigation (SALEM), which mitigates logical errors conditioned on the error syndromes measured during error correction. The runtime overhead of SALEM is exponentially lower than that of LEM schemes which do not make use of syndrome data, enabling substantially larger circuit volumes that can be executed accurately. Compared to the routinely used combination of error correction and syndrome rejection (post-selection), SALEM increases the size of reliably executable computations by orders of magnitude. In the practical setting where space and time overheads are fixed and error reduction methods are compared by their resulting estimation errors, we observe a surprising phenomenon: SALEM, which tightly combines EC with EM, can outperform physical EM even above the standard fault-tolerance (pseudo) threshold. Thus, SALEM can make use of EC in regimes of physical error rates where EC is commonly deemed useless.

04.
arXiv (CS.LG) 2026-06-15

Machine Learning for Biomedical Raman Spectroscopy: From Spectral Acquisition to Clinical Translation

arXiv:2606.14169v1 Announce Type: new Abstract: Raman spectroscopy provides label-free, chemically specific characterization of biological systems and has become an important tool for cancer diagnosis, molecular subtyping, microbiological identification, and intraoperative decision support. Biomedical Raman spectra are, however, high-dimensional, noisy, and affected by fluorescence background, acquisition variability, and biological heterogeneity, making robust computational analysis essential. This review examines the role of machine learning across the biomedical Raman spectroscopy pipeline, from preprocessing and signal correction to unsupervised structure discovery, supervised diagnosis and molecular stratification, representation and transfer learning, explainability, biomarker discovery, and multimodal integration with imaging, pathology, and molecular profiling. Emphasis is placed on the use of machine learning not only for diagnostic classification, but also for biologically interpretable and clinically actionable analysis. We also discuss the main barriers to clinical translation, including limited dataset sizes, inter-instrument variability, inconsistent preprocessing, insufficient external validation, reproducibility concerns, and limited sharing of software, data, and metadata. We argue that progress will require methodological advances together with standardization, robust validation, explainability, and deployment-ready analytical frameworks. By integrating methodological, biomedical, and translational perspectives, this review outlines key directions for developing reliable and clinically deployable Raman-AI systems.

05.
arXiv (CS.AI) 2026-06-17

Patients With Personality: Realistic Patient Simulation through Controlled Diversity and Selective Disclosure

arXiv:2606.17441v1 Announce Type: cross Abstract: Simulating realistic patient interactions is a key requirement to testing clinical applications of LLMs at scale without time-consuming and expensive user studies. However, existing approaches often lack realism and controllability, often oversharing information unprompted, and failing to capture the wide variability of patient behavior. Here, we introduce PatientsWithPersonality (PWP), a patient simulation framework that generates realistic yet diverse virtual patient responses through explicit personality parametrization over a latent patient state. Grounded in HEXACO, a six-dimensional personality space used to quantify and parameterize human behavioral traits, our approach enables fine-grained control over conversational style, cooperativeness, and information disclosure within a unified framework. In a clinician evaluation, PWP is judged nearly as realistic as recorded human actors and clearly ahead of prior simulators, while being flagged as "too informative" far less often. Conditioning on HEXACO axes yields personas whose configured traits are recoverable by both clinicians and an autorater, span a substantially wider behavioral footprint than the closest baseline, and prevent oversharing. Altogether, our framework paves the way for more accurate and informative LLM benchmarking through our realistic and steerable patient simulator.

06.
arXiv (CS.CL) 2026-06-12

From Isolation to Entanglement: When Do Interpretability Methods Identify and Disentangle Known Concepts?

A goal of interpretability is to recover disentangled representations of latent concepts (features) from the activations of neural networks. The quality of features is typically evaluated in isolation, and under implicit independence assumptions that may not hold in practice. Thus, it is unclear to what extent common featurization methods such as sparse autoencoders (SAEs) and probes disentangle one concept from another. We propose a multi-concept evaluation setting using concepts including sentiment, domain, voice, and tense. We evaluate how well featurizers produce disentangled representations of each concept, observing that features are typically sensitive to only one concept, but also that concepts are distributed across many features. Then, we steer these features, measuring whether each concept is independently manipulable, and whether features interact. Even in idealized settings, steering a feature often affects many concepts, despite a near absence of interaction effects. These results suggest that correlational metrics are insufficient to establish steering selectivity, and that demonstrating that two features operate in separate spaces is insufficient to claim that they will be selective for one concept. These results underscore the importance of multi-concept evaluations in interpretability research.

07.
arXiv (CS.CV) 2026-06-16

You Don't Need Strong Assumptions: Visual Representation Learning via Temporal Differences

Progress in AI has largely been driven by methods that assume less. As compute and data increase, approaches with weaker inductive biases generally outperform those with stronger assumptions. This is particularly characteristic of the field of Visual Representation Learning, where approaches have gone from being dominated by Supervised Learning, to Weakly Supervised Learning, to the now widespread success of Self-Supervised Learning without human labels. Yet, even modern Self-Supervised Learning approaches still depend on strong inductive biases such as augmentations, masking, or cropping. If this trend holds, even these remaining biases should become bottlenecks at scale – and our experiments confirm this: the optimal strength of inductive biases decreases as data grows. This motivates the search for approaches that rely on fewer assumptions. To this end, we introduce Temporal Difference in Vision (TDV), a new paradigm for self-supervised learning from video that avoids existing inductive biases, relying instead on a causal assumption that the past causes the future. TDV functions by jointly training an image encoder and a motion encoder so that the current frame's representation plus the encoded motion equals the next frame's representation. Despite not leveraging any strong inductive biases, TDV matches state-of-the-art recipes on dense spatial tasks, laying the foundation for representation learning without strong assumptions.

08.
arXiv (CS.AI) 2026-06-18

Enhancing CVRP Solver through LLM-driven Automatic Heuristic Design

arXiv:2602.23092v2 Announce Type: replace Abstract: The Capacitated Vehicle Routing Problem (CVRP), a fundamental combinatorial optimization challenge, focuses on optimizing fleet operations under vehicle capacity constraints. While extensively studied in operational research, the NP-hard nature of CVRP continues to pose significant computational challenges, particularly for large-scale instances. This study presents AILS-AHD (Adaptive Iterated Local Search with Automatic Heuristic Design), a novel approach that leverages Large Language Models (LLMs) to revolutionize CVRP solving. Our methodology integrates an evolutionary search framework with LLMs to dynamically generate and optimize ruin heuristics within the AILS method. Additionally, we introduce an LLM-based acceleration mechanism to enhance computational efficiency. Comprehensive experimental evaluations against state-of-the-art solvers, including AILS-II and HGS, demonstrate the superior performance of AILS-AHD across both moderate and large-scale instances. Notably, our approach establishes new best-known solutions for 8 out of 10 instances in the CVRPLib large-scale benchmark, underscoring the potential of LLM-driven heuristic design in advancing the field of vehicle routing optimization.

09.
arXiv (CS.AI) 2026-06-12

The Theory of Mind Utility: Formal Specification of a Mentalizing Mechanism

arXiv:2606.12721v1 Announce Type: new Abstract: Inferring others' beliefs requires more than reading surface signals; it requires tracking who told them what, in what order, and how credibly. The Theory of Mind Utility (ToM-U) formalizes this epistemic state inference problem at the computational level of analysis, specifying what mentalizing computes and why without commitment to algorithmic or neural implementation. ToM-U achieves this by constructing Local Epistemic World Models (LEWMs) – directed typed graphs that represent agents, state nodes, and the epistemic relationships among them – and evaluating discrete candidate LEWMs against observed behavior until one achieves sufficient confidence. Five formal definitions specify the LEWM structure, agent node properties including ordered information access history, a bounded proliferation mechanism for recursive mentalizing, three inference procedures, and a residue function that captures the structured trace left by failed mentalizing attempts. ToM-U differs from Bayesian Theory of Mind and adjacent formal accounts, which presuppose rather than derive belief states, and from simulation theory and theory-theory, which lack a formal apparatus for epistemic state inference. The architecture generates directional, falsifiable predictions about mentalizing failure that follow from structural properties of the model rather than auxiliary assumptions, and positions ToM-U as a domain-agnostic mechanism upstream of goal inference and other downstream social cognitive processes.

10.
Nature (Science) 2026-06-10

A prognostic human brain network for diffuse midline glioma

作者:

Diffuse midline gliomas (DMGs) are near-universally lethal tumours of the childhood central nervous system1,2. In animal models, DMGs form brain-wide integrated networks through neuron-to-glioma synapses3–6 and glioma-to-glioma gap junctional coupling3. This extensive connectivity robustly promotes the growth and invasion of DMG3–9 and other glial malignancies10–12 through paracrine mechanisms and direct neuron-to-glioma synapses. However, the organization and clinical implications of these connections in the living human brain remain to be elucidated. Here, we develop tumour network mapping to compute the brain-wide connectivity profile of DMG, defining a conserved brain network across pontine and thalamic DMG associated with patient short-term survival (DMG network). Tumour functional connectivity with the DMG network was independently predictive of patient overall survival across two external validation cohorts. Tumour growth mapped to DMG network-specific trajectories and peak in-network neurometabolic changes across development spatiotemporally aligned with the peak age incidence of DMG. Analyses of single-nucleus RNA sequencing data confirmed diverse synaptic gene enrichment in high-connectivity DMG. Strikingly, incidental surgical resection of high-connectivity thalamic DMG tissue conferred a significant survival advantage. Collectively, these data define a conserved and prognostically important brain network in children with DMG, consistent with the hypothesis that DMGs exploit otherwise healthy brain circuits to promote tumour growth. Tumour network mapping of diffuse midline glioma (DMG) defines a conserved and prognostically important brain network in children with DMG, consistent with the hypothesis that DMGs exploit otherwise healthy brain circuits to promote tumour growth.

11.
arXiv (CS.LG) 2026-06-15

Dynamic Free-Rider Detection in Federated Learning via Simulated Attack Patterns

arXiv:2604.04611v2 Announce Type: replace Abstract: Federated learning (FL) enables multiple clients to collaboratively train a global model by aggregating local updates without sharing private data. However, FL often faces the challenge of free-riders, clients who submit fake model parameters without performing actual training to obtain the global model without contributing. Chen et al. proposed a free-rider detection method based on the weight evolving frequency (WEF) of model parameters. This detection approach is a leading candidate for practical free-rider detection methods, as it requires neither a proxy dataset nor pre-training. Nevertheless, it struggles to detect ``dynamic'' free-riders who behave honestly in early rounds and later switch to free-riding, particularly under global-model-mimicking attacks such as the delta weight attack and our newly proposed adaptive WEF-camouflage attack. In this paper, we propose a novel detection method S2-WEF that simulates the WEF patterns of potential global-model-based attacks on the server side using previously broadcasted global models, and identifies clients whose submitted WEF patterns resemble the simulated ones. To handle a variety of free-rider attack strategies, S2-WEF further combines this simulation-based similarity score with a deviation score computed from mutual comparisons among submitted WEFs, and separates benign and free-rider clients by two-dimensional clustering and per-score classification. This method enables dynamic detection of clients that transition into free-riders during training without proxy datasets or pre-training. We conduct extensive experiments across three datasets and five attack types, demonstrating that S2-WEF achieves higher robustness than existing approaches.

12.
arXiv (math.PR) 2026-06-15

Universality for Products of Random Matrices with i.i.d. Entries and the Fuss–Catalan Number

arXiv:2606.14450v1 Announce Type: cross Abstract: Let \((w_{ij})_{i,j\ge1}\) be a single infinite array of independent identically distributed real- or complex-valued entries of mean zero, variance \(\sigma^2\), and finite fourth moment. Set \(W_n=(w_{ij})_{1\le i,j\le n}\) and \(X_n=n^{-1/2}W_n\). For every fixed \(k\ge1\), we identify the almost sure limiting operator norm of several fixed products built from this family. Define the \(k\)-th freeness coefficient by \[ \gamma_k:=\sqrt{\frac{(k+1)^{k+1}}{k^k}}. \] Then we prove \[ \|X_n^k\|\to\sigma^k\gamma_k \qquad almost surely. \] The same limit holds for products sampled with replacement from any fixed finite pool of independent copies of \(X_n\); in particular, it holds for the product of \(k\) independent copies. Thus, the freeness coefficient captures the non-commuting characteristic between large random matrices %powers and independent or fixed-pool sampled products under the finite fourth moment assumption. The improvement of the classical Bai–Yin-type power estimate from the scale \(\sigma^k(k{+}1)\) to \(\sigma^k \sqrt{k{+}1}\) is a direct corollary of our result. The main technical challenge is to prove the upper bound using a high-moment expansion of %the upper bound is proved by a high-moment expansion of \(\E\Tr((X_n^kX_n^{*k})^m)\). The leading zero-defect trace words are tree-like and are counted by the Fuss–Catalan number \[ F_{k,m}= \frac1{km+1}\binom{(k+1)m}{m}. \] The combinatorial tool helps to devise a defect-sensitive global enumeration: if \(L=km\) and \[ r=(L+1-v)+(L-q), \] then the number of admissible word classes with defect \(r\) is at most \(F_{k,m}(Cm)^{Dr}\). This polynomial-in-\(m\) loss, with degree proportional to the defect, is summable in the logarithmic moment range.

13.
arXiv (CS.AI) 2026-06-16

AgentLeak: A Benchmark for Internal-Channel Privacy Leakage in Multi-Agent LLM Systems

arXiv:2602.11510v3 Announce Type: replace Abstract: Multi-agent Large Language Model (LLM) systems create privacy risks that current output-only benchmarks cannot measure. When agents coordinate on tasks, sensitive data may pass through inter-agent messages, shared memory, and tool arguments, all pathways that final-output audits typically do not inspect. We introduce AgentLeak, a benchmark for evaluating internal-channel privacy leakage in multi-agent LLM systems. AgentLeak instruments seven privacy-relevant communication pathways and provides a large-scale empirical evaluation focused on final outputs, inter-agent messages, and shared memory. Across 1,000 scenarios spanning healthcare, finance, legal, and corporate domains, five production LLMs (GPT-4o, GPT-4o-mini, Claude 3.5 Sonnet, Mistral Large, and Llama 3.3 70B), and 4,979 validated execution traces, we find that multi-agent configurations reduce final-output leakage (C1: 27.2% vs 43.2% in single-agent mode) compared with single-agent baselines but introduce internal channels that raise total system exposure to 68.9% (aggregated across C1, C2, C5). Inter-agent messages (C2) leak at 68.8%, compared with 27.2% for final outputs (C1), meaning that output-only audits miss 41.7% of violations. Across all five models and four domains, the pattern C2 $\geq$ C1 holds consistently. These results suggest, within the evaluated coordinator-worker setting, that privacy risk in multi-agent systems is strongly shaped by architectural coordination channels rather than final-output behavior alone: it arises from internal channels that remain invisible to standard output-level defenses.

14.
arXiv (CS.AI) 2026-06-15

A Two-Stage Statistical Framework for Evaluating Associative Interference in Large Language Models

arXiv:2606.14117v1 Announce Type: cross Abstract: Large language models (LLMs) are increasingly evaluated for bias using adaptations of human psychological paradigms, yet methodological limitations-particularly the conflation of refusal behavior with task performance-have hindered clear interpretation. Here, we adapt the Implicit Association Test (IAT) to a controlled, forced-choice framework and introduce a two-stage modeling approach that separates response compliance from task-consistent classification. Across three contemporary LLMs (Claude Sonnet-4, Gemini 2.5 Pro, and GPT-5), we evaluate associative interference, defined as reduced task-consistency in incongruent relative to congruent conditions. While compliance with the structured response format was uniformly high, interference effects varied substantially across models and domains. Claude Sonnet-4 exhibited strong interference in the Gender–Career domain (DeltaP = 0.086, 95% CrI [0.026, 0.173]) and smaller but credible effects in Gender–Science. Gemini 2.5 Pro showed attenuated interference, and GPT-5 exhibited minimal or no detectable interference across domains. These findings demonstrate that IAT-style associative asymmetries are not a universal property of LLMs, but instead depend on model-specific characteristics. By isolating interference from compliance and modeling item-level variability, this study provides a principled framework for evaluating structured response patterns in LLMs. The results highlight the importance of model-specific assessment and suggest that associative interference can be substantially mitigated in modern systems.

15.
arXiv (math.PR) 2026-06-16

On the Smoluchowski-Kramers approximation for the hyperbolic $O(N)$ linear sigma model and its mean-field limit

arXiv:2606.15214v1 Announce Type: cross Abstract: We study the hyperbolic $O(N)$ linear sigma model, i.e. a system of $N$ interacting stochastic damped nonlinear wave equations (SdNLW) with coupled cubic nonlinearities, posed on the two-dimensional torus and indexed by a parameter $\varepsilon > 0$. We show that as $\varepsilon$ goes to zero (Smoluchowski-Kramers approximation) and $N$ goes to infinity (mean-field limit), each component of the solution to the SdNLW system converges to the solution to the stochastic nonlinear heat equation (SNLH) with a mean-field nonlinearity. We prove such convergence via two regimes: first with $\varepsilon$ going to zero to obtain the parabolic $O(N)$ linear sigma model, i.e. a system of $N$ coupled SNLH, and then with $N$ going to infinity; or first with $N$ going to infinity for each component to obtain the mean-field SdNLW and then with $\eps$ going to zero. As a result, we obtain a commutative diagram regarding the convergence from the hyperbolic $O(N)$ linear sigma model to the mean-field SNLH.

16.
arXiv (CS.AI) 2026-06-16

Phys-JEPA: Physics-Informed Latent World Models for Multivariate Time-Series Forecasting

arXiv:2606.16076v1 Announce Type: cross Abstract: Multivariate forecasting in physical systems requires models that predict coupled temporal variables while preserving meaningful state evolution. Deep forecasters can fit temporal correlations, and physics-informed models can regularize predictions with scientific constraints, but these directions are often connected only at the decoded-output level. As a result, the hidden predictive state that generates future trajectories may remain statistically useful but physically unstructured. We introduce Phys-JEPA, a physics-informed joint-embedding predictive architecture for multivariate time-series forecasting. Phys-JEPA learns a latent world model in which predictive states are decomposed into physical and residual components, and physical consistency is imposed directly on latent states and latent transitions rather than only on decoded forecasts. This formulation uses known physical variables to organize the representation space while retaining residual capacity for unresolved dynamics. On Jena Climate 2009–2016, Phys-JEPA reduces aggregate MSE from 0.12482 to 0.12273 and temperature MSE from 0.01892 to 0.01831 at H=24. On Traffic, full Phys-JEPA improves aggregate MSE over the supervised baseline across all tested horizons, reducing H=192 MSE from 0.800784 to 0.773873. On Electricity, the best variant depends on horizon: static latent consistency is strongest at H=24 and H=48, while full Phys-JEPA gives the best aggregate and target-variable MSE at H=192. These initial results suggest that moving physics-informed learning from output space to latent predictive state space is a promising direction for interpretable temporal world models.

17.
bioRxiv (Bioinfo) 2026-06-14

Somatic variant detection in normal tissues from single-cell sequencing data

A crucial advantage of single-cell sequencing (SCS) is its ability to identify somatic variants in individual cells, enabling phylogenetic analysis of cellular populations within bulk tissues. While identifying somatic variants in tumor tissues via SCS has become a common practice, doing so in normal tissues remains challenging due to the rarity of somatic variants in normal cells. To evaluate the feasibility of somatic variant calling from widely available single-nucleus RNA-seq (snRNA-seq) and single-nucleus ATAC-seq (snATAC-seq) data, we profiled a Cell-line mix of six HapMap samples prepared by the SMaHT consortium using 10x Genomics 5' snRNA-seq (12k cells with 36k mean reads per cell) and snATAC-seq (11k cells with 14k median high-quality fragments per cell) for variant calling. PacBio long-read whole genome sequencing (WGS) data (109x) generated from individual cell lines were used as ground truth. Two computational tools, Monopogen and SComatic, were used for somatic variant calling from the SCS data. Monopogen achieved single nucleotide variant (SNV) detection accuracies of 93.30% in the snRNA-seq and 99.64% in the snATAC-seq data, both of which outperformed SComatic (74.35% and 94.29%, respectively). Monopogen also consistently detected somatic SNVs at cellular fractions as low as 0.5% (2.54% in snRNA and 0.81% in snATAC) in individual samples. Notably, snATAC-seq exhibited higher genomic coverage breadth and larger number of variants detected than snRNA-seq. While the SCS data have lower overall genome coverage than that of the bulk WGS, the single-cell level variant resolution allows Monopogen to assign variants to their cells of origin with over 80% accuracy in both RNA and ATAC modalities, thereby facilitating studies of clonal evolution and cell-type-specific mutagenesis. Other benchmarking methods were also evaluated (DeepVariant, Cellsnp-lite and Mutect2) for comparison. In conclusion, our study demonstrated the feasibility of performing reliable single-cell somatic mutation calling in a cell-line mixture and discussed the strengths and limitations of current computational methods when applied to normal tissues.

18.
arXiv (CS.AI) 2026-06-16

Distilling Drifting Transformers with Representation Autoencoders

arXiv:2606.15553v1 Announce Type: cross Abstract: Representation Autoencoders (RAEs) have improved diffusion and flow models by semantically richer latent space owing to the strongly label-wise clustered DINO features in the pretrained encoders. Yet in the distillation stage, the severe anisotropy and large curvatures caused by the rich semantic representations would hinder the convergence and performance, making the trajectory-based distillation unstable. In this work, we argue that the RAE latent space is compatible with distillation via the newly proposed Drifting Models. We first quantitatively study the curvatures and isotropy statistics across different autoencoders, and theoretically reveal that Drifting Model itself is highly likely to fail on extremely scattered spaces like reconstruction-based VAEs. These motivate us to apply the drifting paradigm directly to representation autoencoders. Our proposed method, Drift-RAE, distills pretrained flow models in RAE latent spaces using Drifting, together with insightful modifications that improve training stability by thereotically aligning drifting fields with other frameworks. Regarding the experimental evidences, we achieve 1.77 FID on ImageNet 256 dataset using only 10k distillation steps, surpassing state-of-the-art RAE distillation methods and appearing comparative with the original Drifting Model without requiring an auxiliary MAE feature extractor. The code will be made publicly available.

19.
arXiv (CS.CL) 2026-06-18

Trade-offs in Medical LLM Adaptation: An Empirical Study in French QA

The development of large language models (LLMs) has led to an increased focus on their adaptation to specialized domains and languages, yet the effectiveness of domain adaptation strategies remains unclear. We present a study of medical domain adaptation using French medical question-answering (QA) as a case study. We compare continual pretraining (CPT), supervised fine-tuning (SFT), and their combination across three model families, multiple sizes, and three initialization types, explicitly disentangling adaptation effects from base model choice. We evaluate both multiple-choice (MCQA) and open-ended QA (OEQA) under greedy and constrained decoding using automatic metrics and LLM-as-a-Judge evaluation. For MCQA, CPT+SFT most often achieves the best scores, but gains over SFT are small and frequently not statistically significant, making SFT a strong and cost-effective default. For OEQA, CPT consistently improves overlap-based metrics, while SFT often degrades generation quality; instruction tuning and CPT+SFT are preferred by LLM-based evaluation. Cross-lingual experiments further show effective transfer from French adaptation to English benchmarks. Overall, we provide practical guidelines for selecting adaptation strategies under computational constraints.

20.
arXiv (CS.AI) 2026-06-24

SEAL: Searching Expandable Architectures for Incremental Learning

arXiv:2505.10457v3 Announce Type: replace-cross Abstract: Incremental learning is a machine learning paradigm where a model learns from a sequential stream of tasks. This setting poses a key challenge: balancing plasticity (learning new tasks) and stability (preserving past knowledge). Neural Architecture Search (NAS), a branch of AutoML, automates the design of the architecture of Deep Neural Networks and has shown success in static settings. However, existing NAS-based approaches to incremental learning often rely on expanding the model at every task, making them impractical in resource-constrained environments. In this work, we introduce SEAL, a NAS-based framework tailored for data-incremental learning, a scenario where disjoint data samples arrive sequentially and are not stored for future access. SEAL adapts the model structure dynamically by expanding it only when necessary, based on a capacity estimation metric. Stability is preserved through cross-distillation training after each expansion step. The NAS component jointly searches for both the architecture and the optimal expansion policy. Experiments across multiple benchmarks demonstrate that SEAL effectively reduces forgetting and enhances accuracy while allocating additional capacity only when required. These results highlight the promise of combining NAS and selective expansion for efficient, adaptive learning in incremental scenarios.

21.
arXiv (CS.CL) 2026-06-19

Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

Reinforcement learning (RL) is a central approach for improving reasoning capabilities in large language models (LLMs), where training efficiency depends critically on how problems are sampled during optimization. Existing adaptive curriculum learning methods typically prioritize prompts of intermediate difficulty, treating problem selection as a standard bandit problem with independent arms and overlooking the structured, heterogeneous nature of the task space. In this work, we frame problem sampling as a manifold-structured bandit problem with endogenous non-stationarity: problems are related through the model's latent representation space, and sampling decisions can steer how learning signals evolve across that space. To operationalize this perspective, we introduce Bayesian Manifold Curriculum (BMC), a structure-aware framework that organizes problems into a hierarchical task tree and applies Bayesian learning to guide sampling. Empirically, we find that different sampling strategies induce non-trivial tradeoffs between productivity (learning signal), diversity (coverage of the task manifold), and utility (evaluation relevance). These results show that prioritizing difficulty alone is insufficient for strong downstream performance, highlighting the importance of incorporating structure and type-awareness into problem sampling.

22.
arXiv (CS.CV) 2026-06-17

Bounding Box Label Propagation for Re-Annotation of Document Layout Analysis Datasets

Datasets in practical document processing scenarios typically grow over time, and their class annotations undergo continuous refinement. This creates significant re-annotation efforts, which are time-consuming and costly. A promising remedy is to re-annotate only a small subset of available documents manually and apply semi-supervised learning techniques that leverage both labelled and unlabelled data. Although there are numerous approaches to tackle this problem for classification, there exists no adaptation for the problem of re-classifying object detection instances, e.g. for document layout analysis. To this end, we propose Bounding Box Label Propagation (BBLP), a pseudo-labelling framework for object detection. An object encoder integrates visual, textual, and positional embeddings from object detection samples to come up with a joint embedding that can be used for Label Propagation on partially annotated datasets in a plug-and-play fashion. Evaluation results indicate that the proposed approach produces high-quality class annotations of bounding boxes. In the D4LA layout analysis dataset, it achieves a mAP of 54.0%, corresponding to 81.6% of fully supervised performance, while using only 10% labelled data. Our work demonstrates the potential of Label Propagation for object detection and lays the groundwork for reducing manual annotation efforts in real-world document processing applications.

23.
arXiv (CS.AI) 2026-06-11

LSTM-Based Detection of Structural Breaks in Property Insurance Loss Reserving: A Climate-Informed Approach

arXiv:2606.11463v1 Announce Type: cross Abstract: Accurate loss reserving is foundational to insurer solvency, yet accelerating climate driven catastrophes systematically violate the stability assumptions on which traditional actuarial methods depend. This white paper presents a research program testing whether Long Short Term Memory (LSTM) neural networks can detect and adapt to these structural breaks faster and more accurately than Chain Ladder, Bornhuetter Ferguson, and Cape Cod methods. Using 15 plus years of regulatory development triangle data from Florida and Louisiana, enriched with NOAA hurricane intensity indices and sea surface temperatures, we hypothesize a targeted improvement of 15, 20% in reserve accuracy for catastrophe exposed years, a threshold grounded both in the prior neural network reserving literature and in the formal convergence results developed here. Beyond empirical validation, we develop a theoretical framework grounding LSTM structural break detection in probabilistic terms, providing formal performance guarantees that compensate for the limited number of catastrophe events in the test period. We document the research design, methodology, expected contributions, and a candid assessment of limitations.

24.
arXiv (CS.CL) 2026-06-16

FinBalance: A Multi-Document Accounting Reconciliation Benchmark

Existing financial-NLP benchmarks mostly evaluate prepared artifacts such as filings, tables, or extracted values. Real accounting begins earlier: source documents must be reconciled into cited journal entries, aggregated into a balance sheet, and checked for contradictions. We introduce FinBalance, a multi-document accounting reconciliation benchmark built from source-document bundles across eight industries, three period types, and five difficulty levels. Human-authored business scenarios, accounting policies, tax/FX treatments, document schemas, distractors, and inconsistency templates are composed by a deterministic generator whose ledger produces journal entries,balance sheets, and 23 inconsistency-code labels. On a 710-record evaluation split, six contemporary LLMs reach at most 46% exact final-balance-sheet accuracy. Four models show a 26-41 pp gap between BS_exact, the model's reported balance sheet, and BS_recon, the balance sheet obtained by replaying its entries through our ledger. Models often recover numerically plausible entries but fail to bind them to supporting documents and aggregate them consistently. Citation-pressure prompting barely changes document-linking errors, while ledger-feedback ablations substantially improve reported balance sheets and expose inconsistency-detection trade-offs. Expert finance reviewers validate the benchmark design and labels.

25.
arXiv (CS.LG) 2026-06-19

PaAno+: Multiscale Encoding and Cross-Variable Attention for Time Series Anomaly Detection

arXiv:2606.20055v1 Announce Type: new Abstract: Time-series anomaly detection has significant practical value for industrial and medical monitoring, as well as other critical domains. Current Transformer- and large-model-based detection approaches incur excessive computational overhead, while existing lightweight alternatives are constrained by insufficient feature extraction and inadequate modeling of dependencies across multivariate variables. To mitigate the above drawbacks, this study develops a lightweight, efficient anomaly detection model, dubbed PaAno, within the patch-oriented representation learning paradigm. In the encoder module, a multiscale feature-extraction backbone is constructed using convolutional kernels with differentiated receptive fields to capture hierarchical temporal characteristics; subsequent cross-scale adaptive attention aggregation, combined with residual connection optimization, further stabilizes feature representation learning. A cross-variable fusion attention module is embedded to explicitly characterize inter-variable correlations, empowering the model to identify anomalous patterns amid intricate operational conditions. Moreover, a novel pretext task based on temporal patch-window sorting is customized to uncover intrinsic structural properties of time series, and triplet loss is leveraged to optimize the patch embedding space for enhanced feature discrimination. Extensive experiments on the TSB-AD benchmark demonstrate that the proposed PaAno achieves state-of-the-art detection accuracy on both univariate and multivariate tasks, yielding significant performance gains across evaluation metrics, including VUS-PR, relative to the original PaAno. Leveraging a compact network design, the presented model achieves favorable computational efficiency, enabling deployment on resource-limited terminals for real-time anomaly inference.