Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-12

Generalized Schrödinger Bridge on Graphs

arXiv:2602.04675v2 Announce Type: replace Abstract: Transportation on graphs is a fundamental challenge across many domains, where decisions must respect topological and operational constraints. Despite the need for actionable policies, existing graph-transport methods lack this expressivity. They rely on restrictive assumptions, fail to generalize across sparse topologies, and scale poorly with graph size and time horizon. To address these issues, we introduce Generalized Schrödinger Bridge on Graphs (GSBoG), a novel scalable data-driven framework for learning executable controlled continuous-time Markov chain (CTMC) policies on arbitrary graphs under state cost augmented dynamics. Notably, GSBoG learns trajectory-level policies, avoiding dense global solvers and thereby enhancing scalability. This is achieved via a likelihood optimization approach, satisfying the endpoint marginals, while simultaneously optimizing intermediate behavior under state-dependent running costs. Extensive experimentation on challenging real-world graph topologies shows that GSBoG reliably learns accurate, topology-respecting policies while optimizing application-specific intermediate state costs, highlighting its broad applicability and paving new avenues for cost-aware dynamical transport on general graphs.

02.
arXiv (CS.AI) 2026-06-18

Self-Evolving Multi-Agent Systems via Textual Backpropagation

arXiv:2506.09046v3 Announce Type: replace-cross Abstract: Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative team focused on a specific subtask. Our framework follows a two-phase optimization strategy: (1) Forward Phase - Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase - Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables our framework to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across seven benchmark datasets, our work surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements.

03.
arXiv (CS.LG) 2026-06-15

Ensembling Sparse Autoencoders

arXiv:2505.16077v2 Announce Type: replace Abstract: Sparse autoencoders (SAEs) are used to decompose neural network activations into human-interpretable features. Typically, features learned by a single SAE are used for downstream applications. However, it has recently been shown that a single SAE captures only a limited subset of features that can be extracted from the activation space. Motivated by this limitation, we introduce and formalize SAE ensembles. Furthermore, we propose to ensemble multiple SAEs through naive bagging and boosting. In naive bagging, SAEs trained with different weight initializations are ensembled, whereas in boosting SAEs sequentially trained to minimize the residual error are ensembled. Theoretically, naive bagging and boosting are justified as approaches to reduce reconstruction error. Empirically, we evaluate our ensemble approaches with three settings of language models and SAE architectures. Our empirical results demonstrate that, compared to an expanded SAE that matches the number of features in the ensemble, ensembling SAEs improves the reconstruction of language model activations along with SAE stability. Additionally, on downstream tasks such as concept detection and spurious correlation removal, SAE ensembles achieve better performance, showing improved practical utility.

04.
PLOS Computational Biology 2026-06-12

A new method for augmenting short time series, with application to pain events in sickle cell disease

Authors: Kumar Utkarsh, Nirmish R. Shah, Tanvi Banerjee, Daniel M. Abrams

Researchers across different fields, including but not limited to ecology, biology, and healthcare, often face the challenge of sparse data. Such sparsity can lead to uncertainties, estimation difficulties, and potential biases in modeling. Here we introduce a novel data augmentation method that combines multiple sparse time series datasets when they share similar statistical properties, thereby improving parameter estimation and model selection reliability. We demonstrate the effectiveness of this approach through validation studies comparing Hawkes and Poisson processes, followed by application to subjective pain dynamics in patients with sickle cell disease (SCD), a condition affecting millions worldwide, particularly those of African, Mediterranean, Middle Eastern, and Indian descent.

05.
arXiv (CS.AI) 2026-06-18

R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations

arXiv:2510.18085v2 Announce Type: replace-cross Abstract: Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.

06.
bioRxiv (Bioinfo) 2026-06-11

Calibrated Uncertainty Quantification for Patient-Level AML Drug Sensitivity Prediction Using Split Conformal Prediction

Accurate prediction of ex vivo drug sensitivity in acute myeloid leukemia (AML) patients from transcriptomic data is a critical challenge for precision oncology. Existing computational approaches have explored uncertainty quantification in cancer drug response prediction primarily using cell line data, while patient-level AML models typically rely on heuristic confidence measures rather than statistically calibrated uncertainty estimates. Here, we present a framework applying split conformal prediction to patient-level AML drug response modeling using the BeatAML 2.0 cohort. We trained Elastic Net and XGBoost regressors on bulk RNA-seq gene expression profiles from 318 AML patients, analyzing 34,764 patient-drug observations across 122 compounds. Baseline models achieved median Pearson R values of 0.291 (Elastic Net) and 0.281 (XGBoost) across 122 drugs. Wrapping these models with split conformal prediction yielded well-calibrated prediction intervals across three confidence levels: empirical coverages of 81.4%, 90.7%, and 95.5% against nominal targets of 80%, 90%, and 95%, respectively. Analysis of prediction interval widths revealed substantial drug-class-specific uncertainty patterns, with HDAC and BCL-2 inhibitors exhibiting markedly higher uncertainty than MDM2 inhibitors, suggesting a potential association between transcriptomic predictability and drug mechanism of action, although several drug classes were represented by only a small number of compounds. Predictive uncertainty was not significantly associated with ELN2017 molecular risk classification (Kruskal-Wallis p=0.395) or NPM1 mutation status (p=0.788). These results demonstrate that statistically valid uncertainty quantification can be achieved for patient-level AML drug response prediction despite substantial biological heterogeneity. 
To the best of our knowledge, no published study has applied split conformal prediction to patient-level ex vivo drug sensitivity prediction in the BeatAML cohort; the approach provides a principled alternative to heuristic confidence scoring.

Keywords: Acute myeloid leukemia (AML); Ex vivo drug sensitivity; Conformal prediction; Uncertainty quantification; Precision oncology; BeatAML; Transcriptomic biomarkers; Machine learning.
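As a rough illustration of the split conformal step this abstract describes, the Python sketch below wraps an arbitrary point predictor with intervals built from absolute residuals on a held-out calibration set. It is not the authors' code: the function names, the absolute-residual score, and the synthetic data are assumptions made for illustration, and the stand-in predictor takes the place of the Elastic Net and XGBoost regressors.

```python
import math
import random


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to support this coverage level.
        return math.inf
    return sorted(scores)[rank - 1]


def split_conformal(predict, calib_x, calib_y, alpha):
    """Turn a fitted predictor plus a calibration set into an interval function."""
    scores = [abs(y - predict(x)) for x, y in zip(calib_x, calib_y)]
    q = conformal_quantile(scores, alpha)
    return lambda x: (predict(x) - q, predict(x) + q)


def demo_coverage(alpha=0.1, n=1000, seed=0):
    """Empirical coverage on synthetic data (hypothetical, not BeatAML)."""
    rng = random.Random(seed)

    def predict(x):
        # Stand-in for a model fitted on a separate training split.
        return 2.0 * x

    def sample(m):
        xs = [rng.uniform(0, 1) for _ in range(m)]
        ys = [2.0 * x + rng.gauss(0, 0.5) for x in xs]
        return xs, ys

    calib_x, calib_y = sample(n)
    test_x, test_y = sample(n)
    interval = split_conformal(predict, calib_x, calib_y, alpha)
    hits = 0
    for x, y in zip(test_x, test_y):
        lo, hi = interval(x)
        hits += lo <= y <= hi
    return hits / n


if __name__ == "__main__":
    print(f"empirical coverage at nominal 90%: {demo_coverage():.3f}")
```

When calibration and test points are exchangeable, intervals built this way cover the true value with probability at least 1 - alpha, which is the property that the reported empirical coverages are checking against the nominal targets.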

07.
arXiv (CS.CL) 2026-06-15

MedLatentDx: Latent Multi-Agent Communication for Cross-Hospital Rare-Disease Diagnosis

Rare diseases affect over 300 million patients across more than 7,000 conditions, yet no single hospital encounters enough cases of any one condition for reliable diagnosis. Cross-hospital collaboration could help by allowing a diagnosing institution to use distributed, case-specific diagnostic evidence, but privacy regulations restrict the transmission of identifiable clinical text across institutional boundaries. This setting raises two challenges: existing medical agent systems often rely on textual evidence exchange, while raw latent states such as hidden states and KV caches may still reveal prompt-derived clinical content. We introduce MedLatentDx, a latent multi-agent communication framework in which hospital agents keep private clinical records and retrieved cases local, and send compact latent KV blocks to a host agent for rare-disease diagnosis. MedLatentDx supports two deployment settings: same-backbone hospital agents use latent KV distillation, while hospitals with different LLM backbones use cross-family latent alignment. On CrossRare-Bench, a self-built large-scale rare-disease benchmark with hospital-level partitions, MedLatentDx improves cross-hospital diagnostic performance while reducing reconstructable clinical content relative to raw-latent communication baselines.

08.
arXiv (CS.CL) 2026-06-16

Who Should Lead Decoding Now? Tracking Reliable Trajectories for Ensembling Masked Diffusion Language Models

Masked Diffusion Language Models (MDLMs) have emerged as a distinct paradigm for sequence generation. As MDLMs become diverse in capabilities and knowledge coverage, an important question is how to combine their knowledge. Toward this, we first investigate the unique decoding dynamics of MDLMs. We find that successful generations exhibit stable confidence dynamics over answer-relevant positions, while unreliable trajectories can often be corrected by injecting promising intermediate states from other models. Guided by this observation, we propose TIE (Trajectory-based Iterative Ensembling), a knowledge fusion framework in which MDLMs iteratively identify reliable decoding trajectories and relay them across models. TIE tracks confidence dynamics over answer-relevant positions to determine which model currently follows a more reliable trajectory and selectively transfers partially denoised sequences across models. As the model on the more promising trajectory often changes across denoising steps, TIE allows different models to contribute complementary strengths at different stages of generation. Strong performance across diverse reasoning tasks, along with our analyses, suggests that TIE offers a practical approach to the underexplored problem of MDLM ensembling.

09.
medRxiv (Medicine) 2026-06-15

Investigation of Intra-Fraction Stability and Inter-Fraction Reproducibility of Deep Inspiration Breath-Hold Across Two Hypofractionated Radiotherapy Regimens in the HYPORT Adjuvant Study.

Background: Deep Inspiration Breath Hold (DIBH) is a widely used respiratory motion management technique for minimizing cardiac dose in left-sided breast radiotherapy. In the Breast HYPORT Adjuvant study, DIBH was employed for cardiac sparing in patients without nodal irradiation using a standardized institutional protocol with the Varian Real-time Position Management (RPM) system. Both moderate-hypofractionation (control arm: 40 Gy in 15 fractions) and one-week hypofractionation (experimental arm: 26 Gy in 5 fractions) regimens were delivered using this protocol. This study aimed to evaluate the robustness of DIBH by analyzing intra-fraction stability and inter-fraction reproducibility of breath-hold amplitude across the two treatment regimens. Methods: Respiratory waveforms acquired during each treatment session were analyzed to determine the median breath-hold amplitude and its standard deviation during beam delivery. Intra-fraction stability was assessed from variations within individual treatment sessions, while inter-fraction reproducibility was evaluated relative to the simulation waveform amplitude across all treatment sessions. These parameters were compared between the two HYPORT regimens to examine breath-hold consistency during treatment delivery. Moreover, an additional comparison was made between the one-week hypofractionation regimen and the first five fractions of the moderate-hypofractionation regimen to evaluate the effect of treatment duration. Lung volumes from free-breathing and DIBH CT scans were analyzed to assess the effectiveness of patient breath-hold training. Results: Both arms demonstrated an average 1.7-fold increase in lung air volume during the breath-hold position, confirming the effective implementation of DIBH during treatment planning and delivery. Structured training resulted in increased breath-hold amplitudes, with gains of 22.87% and 24.16% with respect to the first trial session in the experimental and control arms, respectively. Both regimens receive equivalent doses for approximately the same air volume in the lung. Despite the different prescription doses in the two arms (26 Gy vs. 40 Gy), the experimental arm achieved an equivalent mean heart dose of 2.91% (75.6 cGy) compared with 2.95% (118.51 cGy) in the control arm, suggesting a similar cardiac preservation protocol adopted during treatment planning. Intra-fraction stability was similar between the control arm and the experimental arm, with median amplitude variations of 1.006 mm (95% CI: [0.998-1.015]) and 1.079 mm (95% CI: [1.067-1.097]), respectively. In contrast, inter-fraction reproducibility improved in the experimental arm, with lower deviation from simulation amplitude (0.44 ± 0.24 mm vs. 0.66 ± 0.25 mm) for the entire treatment schedule. The stability and reproducibility of the experimental arm were further compared with the first five fractions of the control arm. The results were similar to those of the experimental arm. Conclusion: In this study, we compared two treatment regimens in terms of intra-fraction stability and inter-fraction reproducibility during DIBH radiotherapy. Both regimens demonstrated comparable intra-fraction stability, indicating effective motion management irrespective of treatment duration. However, the experimental arm showed better inter-fraction reproducibility, suggesting more consistent breath-hold performance throughout the treatment course. Based on stability and reproducibility, a reasonable narrowing of the DIBH gating window may be implemented with minor changes to the institutional protocol. The observed trend highlights the potential for improved consistency with the experimental approach and supports further investigation to better understand the underlying factors and strengthen these findings in future studies.

10.
arXiv (quant-ph) 2026-06-11

Experimental straintronics in nanotube quantum dots

arXiv:2606.12180v1 Announce Type: cross Abstract: Single-wall carbon nanotubes (SWCNTs) are narrow ribbons of graphene with atomically precise edges and a single quantum transport channel at experimentally relevant dopings. This makes them ideal systems to harness quantum transport straintronics (QTS), i.e. using mechanical strain to accurately control quantum transport. We present QTS data from three single-wall carbon nanotube quantum dot (SWCNT-QD) transistors over a broad range of in-situ tunable and reversible uniaxial strain ($\Delta\varepsilon_{\mathrm{mech}} \approx 0$ to $3\%$). We first present the nanofabrication of the suspended SWCNT transistors whose channel lengths are $\approx 30$ nm. The channels are strained by moving gold clamps that firmly hold the nanotubes. We present detailed charge transport data, $dI/dV_{B} - V_{B} - V_{G}$ and $dI/dV_{B} - V_{B} - \Delta\varepsilon_{\mathrm{mech}}$, showing a large mechanical-gating effect of the SWCNT-QDs. The precise reversibility of the data, and their agreement with QTS theory, confirm that the tubes are strained elastically. We demonstrate that the mechanical control of the QD doping is not due to capacitive-gating effects, but to quantitatively predictable bandstructure changes including a strain-tunable bandgap. This precise mechanical control of the doping and bandgap of SWCNT-QDs could find applications in qubits, condensed matter physics, and homojunction molecular transistors.

11.
arXiv (CS.LG) 2026-06-16

Unsupervised Learning for Missing Modalities in Multimodal Learning

arXiv:2606.15743v1 Announce Type: new Abstract: This paper addresses the missing-modality challenge in multi-modal learning by introducing Unsupervised Learning for Missing Modalities in Multi-Modal Learning (UL4M4), a flexible framework that imputes missing feature embeddings in a task-independent manner before supervised prediction. We propose modality-specific normalization and a novel partial-modality distance metric to enable fair clustering of incomplete observations, capturing cross-modal structures while preserving scale-invariance across varying dimensionalities and modality counts. Cluster centers from this unsupervised stage guide an iterative greedy imputation process for any missing modalities during training or inference, supporting arbitrary numbers of modalities and arbitrary missing patterns per sample. The imputation module is lightweight, uses frozen encoders, and decouples from the downstream task, allowing easy integration with any fusion/prediction architecture. Extensive experiments under diverse and highly incomplete regimes demonstrate UL4M4's robustness, achieving, to the best of our knowledge, the first consistent F1-Micro scores above 0.7 on challenging missing configurations even when more than 50% of modality slots are missing. Results are also stable across cluster sizes and significantly outperform state-of-the-art baselines. Code is available here: https://github.com/h-ismkhan/Multimodal-Learning-with-Missing-Modalities-via-Unsupervised-Learning.

12.
arXiv (CS.CV) 2026-06-16

Federated Medical Image Segmentation under Real-World Label Noise: A Benchmark Suite for Noisy Label Learning Method Selection

While federated learning (FL) enables collaborative medical image segmentation without centralizing sensitive data, real-world deployment is frequently complicated by cross-site label imperfections such as contour disagreement, missing or additional structures, and confused labels. Federated noisy label learning (FNLL) aims to mitigate these effects, yet remains underused in practice as existing evidence is largely based on synthetic noise, simplified settings, and limited real-world noisy evaluation. We address this gap by introducing a benchmark suite that combines diverse real-world noisy datasets, deployment-relevant client-noise scenarios, and label-noise-targeted evaluation to support systematic FNLL assessment and informed method selection. The suite combines curated real-world noisy medical image segmentation datasets from diverse sources with a comprehensive federated segmentation framework including various client-noise scenarios and noise-targeted evaluation. The presented suite provides a realistic and discriminative basis for FNLL evaluation in medical image segmentation and establishes a reusable foundation for fair benchmarking, dataset-specific label-noise characterization, and future method development under realistic federated settings. Code is available at https://github.com/MIC-DKFZ/FedSegNoiseBench.

13.
bioRxiv (Bioinfo) 2026-06-11

GeroEngine: Generative single-cell aging trajectories reveal a bidirectionally traversable identity core and direction-specific inflammatory remodeling

Single-cell RNA sequencing (scRNA-seq) maps aging tissues at high resolution but is destructive, preventing longitudinal tracking; dropout and zero-inflation artifacts, amplified by shift-invariant linear simulations, confound age-associated variability. We developed GeroEngine, a technical-artifact-aware framework combining VAE-based trajectory simulation, LOPO cross-validation, linear baselines, reverse traversal, and reverse-directed network inference. In microglia and HSCs, the VAE reduced technical-artifact carryover while preserving trajectory heterogeneity and improving alignment to artifact-reduced reference manifolds. Consensus GeroTargets and GeroRegulators defined tissue-specific GeroNetworks organized into three pillars: lineage/replication identity collapse, a sex-dimorphic endocrine/stress core, and inflammatory remodeling. Forward and reverse simulations aligned to the common young→old aging axis revealed a sign-coherent, direction-specific program: identity/replication targets were bidirectionally recovered, whereas MHC/NF-κB inflammatory programs were preferentially forward-recovered. These results support identity collapse as a deep traversable core of aging and nominate upstream homeostatic restoration over downstream inflammatory suppression.

14.
arXiv (CS.CV) 2026-06-19

LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer

Recent advances in multimodal foundation models unifying image understanding and generation have opened exciting avenues for tackling a wide range of vision-language tasks within a single framework. Despite progress, existing unified models typically require extensive pretraining and struggle to achieve the same level of performance compared to models dedicated to each task. Additionally, many of these models suffer from slow image generation speeds, limiting their practical deployment in real-time or resource-constrained settings. In this work, we propose Layerwise Timestep-Expert Flow-based Transformer (LaTtE-Flow), a novel and efficient architecture that unifies image understanding and generation within a single multimodal model. LaTtE-Flow builds upon powerful pretrained Vision-Language Models (VLMs) to inherit strong multimodal understanding capabilities, and extends them with a novel Layerwise Timestep Experts flow-based architecture for efficient image generation. LaTtE-Flow distributes the flow-matching process across specialized groups of Transformer layers, each responsible for a distinct subset of timesteps. This design significantly improves sampling efficiency by activating only a small subset of layers at each sampling timestep. To further enhance performance, we propose a Timestep-Conditioned Residual Attention mechanism for efficient information reuse across layers. Experiments demonstrate that LaTtE-Flow achieves strong performance on multimodal understanding tasks, while achieving competitive image generation quality with around 6x faster inference speed compared to recent unified multimodal models.
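The idea of assigning each group of Transformer layers to its own subset of timesteps can be pictured with a small routing sketch in Python. This is only an illustration under loud assumptions: the abstract does not say how timesteps are partitioned, so the equal-width contiguous intervals, the function names, and the layer counts below are all hypothetical and not taken from LaTtE-Flow.

```python
def group_for_timestep(t, num_groups):
    """Map a flow timestep t in [0, 1] to a layer-group index.

    Assumption: the timestep axis is cut into equal-width contiguous
    intervals, one per group. The paper may partition it differently.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if num_groups < 1:
        raise ValueError("num_groups must be positive")
    # t == 1.0 would index one past the end, so clamp to the last group.
    return min(int(t * num_groups), num_groups - 1)


def active_layers(t, num_layers, num_groups):
    """Return the indices of the only layers evaluated at timestep t."""
    if num_layers % num_groups != 0:
        raise ValueError("num_layers must be divisible by num_groups")
    per_group = num_layers // num_groups
    g = group_for_timestep(t, num_groups)
    return range(g * per_group, (g + 1) * per_group)


if __name__ == "__main__":
    # Hypothetical sizes: 24 generation layers split into 4 timestep experts.
    for t in (0.0, 0.3, 0.5, 1.0):
        print(t, list(active_layers(t, num_layers=24, num_groups=4)))
```

With these hypothetical sizes, each sampling step touches 6 of 24 layers, which is the kind of saving the abstract attributes to activating only a small subset of layers per timestep.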

15.
arXiv (CS.LG) 2026-06-15

High-Frequency Pricing at Scale for E-Commerce

arXiv:2606.13741v1 Announce Type: new Abstract: This paper presents the design, development, and implementation of a specialized forecast-then-optimize algorithmic pricing tool for sales campaigns in fashion e-commerce. Sales events present unique challenges for pricing including volatile demand patterns, rapid pricing decisions, and the need to balance short-term revenue with long-term profitability. We describe our approach combining daily-resolution demand forecasting using gradient-boosted trees with a multi-objective optimization framework that maximizes both long-term profit and net merchandise value for more than 5 million articles. Our solution addresses key limitations of existing weekly-granularity systems by implementing a forecast-then-optimize architecture that reduces pricing decision time from hours to minutes. We validate our approach through 23 A/B tests across 12 markets during 2023-2024 sales campaigns at Zalando, one of Europe's leading online fashion retailers. Experimental results demonstrate that the new pricing system achieves approximately 6% higher profit while maintaining equivalent performance on sales and revenue compared to the previous manual-algorithmic hybrid approach. Based on these results, the algorithm was successfully deployed to production and now handles the majority of algorithmic pricing decisions for sales campaigns at the company.

16.
medRxiv (Medicine) 2026-06-23

Uptake of minimal intervention dentistry among Romanian dental professionals and trainees: an exploratory cluster and network analysis

Background Minimal intervention dentistry (MID) is promoted as a prevention-oriented approach to caries management, but its integration into routine practice remains uneven. Existing research often examines MID-related knowledge, attitudes, or practices separately, offering limited insight into how these dimensions co-occur within individuals or are conditionally associated. Methods This exploratory cross-sectional survey examined multidimensional MID uptake among 327 Romanian dental students, residents, and specialists from five university centers. Ten MID-related scores were analyzed, including nine formative composites and one single-item peer-norm indicator. K-means clustering examined uptake profiles, and Gaussian graphical model network analysis with stepwise BIC selection examined conditional associations among constructs. Results A two-cluster solution was highly reproducible but modestly separated (n = 144 vs n = 183; average silhouette width = 0.13; mean Jaccard similarities = 0.92 and 0.94). The profiles reflected broadly lower versus higher uptake across knowledge-, belief-, and practice-related dimensions, while perceived peer norms for hygiene instruction showed the opposite pattern. Profile membership was not clearly patterned by gender, age band, professional status, or clinical experience. The primary network included 14 non-zero edges out of 36 possible edges, all positive; the strongest partial association linked diagnostic knowledge to diagnostic methods used in practice (partial r = .22). Familiarity, diagnostic knowledge, and general practices occupied more interconnected positions descriptively, but limited centrality stability precluded interpreting them as intervention targets. Conclusions MID uptake in this sample was better represented as a continuum of modestly differentiated profiles than as sharply separated participant types. 
The findings provide an exploratory map of multidimensional MID uptake and may inform future survey validation, implementation research, and dental education studies. Because the study was cross-sectional, convenience-sampled, and based on self-report, findings should be interpreted as hypothesis-generating rather than causal or population-representative.

17.
arXiv (math.PR) 2026-06-24

History estimation in random recursive trees: Pointwise approach via iterated Jordan centralities

arXiv:2606.24465v1 Announce Type: new Abstract: We study the problem of estimating the arrival times of vertices in a uniform random recursive tree from its unlabeled structure. We adopt a pointwise perspective and analyze the distribution of the relative estimation error, and derive tail bounds that are uniform in both the vertex and the tree size. For the ranking induced by Jordan centrality, the probability that the estimate exceeds the true arrival time by a factor $S$ decays on the order of $1/S$, while the probability of underestimating the arrival time by a factor $1/S$ decays exponentially in $S$. We introduce a refined centrality measure whose overestimation tail decays on the order of $(\log S)/S^{2}$, at the cost of a heavier lower tail of order $1/S^{2}$. These results reveal a tradeoff between upper- and lower-tail performance in arrival-time estimation that is invisible to the previously studied risk functional. Nevertheless, the refined centrality measure attains the optimal order of the risk for all its parameter values.
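For readers unfamiliar with the baseline ranking, the Python sketch below computes Jordan centrality on an unlabeled tree, taken here in its usual sense as the size of the largest component left after deleting a vertex, and orders vertices from most to least central as a proxy for arrival order. It covers only this plain ranking; the refined, iterated measure from the abstract is not reproduced, and the function names and tie-breaking rule are assumptions for illustration.

```python
from collections import defaultdict


def jordan_centrality(edges):
    """Map each vertex of a tree to the size of the largest component
    that remains after deleting it. Requires at least one edge."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    n = len(adj)
    root = next(iter(adj))

    # Iterative DFS: record parents and a preorder (parents before children).
    parent = {root: None}
    order = []
    stack = [root]
    while stack:
        u = stack.pop()
        order.append(u)
        for w in adj[u]:
            if w not in parent:
                parent[w] = u
                stack.append(w)

    # Subtree sizes, accumulated from the leaves upward.
    size = {u: 1 for u in adj}
    for u in reversed(order):
        if parent[u] is not None:
            size[parent[u]] += size[u]

    centrality = {}
    for u in adj:
        largest = n - size[u]  # component on the parent's side
        for w in adj[u]:
            if parent[w] == u:  # w is a child of u
                largest = max(largest, size[w])
        centrality[u] = largest
    return centrality


def rank_by_centrality(edges):
    """Order vertices from most central (estimated earliest) to least.
    Ties are broken by vertex label, an arbitrary choice made here."""
    psi = jordan_centrality(edges)
    return sorted(psi, key=lambda v: (psi[v], v))


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3), (3, 4)]
    print(jordan_centrality(path))
    print(rank_by_centrality(path))
```

On the five-vertex path, the middle vertex leaves components of size 2 and 2 and so ranks first, while the two endpoints each leave a component of size 4 and rank last.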

18.
arXiv (CS.CL) 2026-06-16

Prior over Evidence: Stereotype-Driven Diagnosis in LLM-Based L2 Pronunciation Feedback

Large language models are increasingly deployed for written pronunciation feedback in second-language (L2) English learning, under the assumption that their diagnoses are grounded in the supplied speech evidence rather than in priors from pretraining. This assumption is tested on 1,800 L2-Arctic utterances spanning six L1 backgrounds, three audio-capable LLMs, four pronunciation dimensions, and five evidence conditions ranging from a text-only baseline to numeric acoustic features and raw audio. Each (utterance x model x condition x dimension) cell is scored on three metrics: Rating Accuracy (RA) against gold labels, Evidence Coherence (EC) assessing internal consistency without ground truth, and Grounded Correctness (GC) evaluated against gold evidence. Results show three findings across models. First, rating accuracy and grounded reasoning decouple: 39.6% of judged cells contain internally coherent reasoning that supports a wrong rating, against only 15.8% where the reasoning supports a correct rating. Second, phoneme-level feedback converges to a fixed inventory of L2-English difficulty phones that recurs across all six L1 backgrounds and all evidence conditions. Third, acoustic evidence improves the rating only when the supplied feature directly probes the target dimension: textualised F0 range raises pitch-variation grounding from (0.18-0.19) to (0.45-0.62) across all three models, while stress and phoneme correctness, which require target-to-realisation alignment, remain ungrounded. The same audio waveform without textualised F0 values does not reproduce this improvement. These findings indicate that current general-purpose LLMs are more reliable as verbalisers of externally computed pronunciation evidence than as standalone diagnostic engines.

19.
arXiv (CS.AI) 2026-06-16

Post-Hoc Merging is Not Enough: Many-Shot Model Merging with Loss-Gap Balancing

arXiv:2606.16501v1 Announce Type: new Abstract: Model merging has become a practical post-training strategy for building a single multi-task large language model (LLM) by combining multiple task-specialized models. However, most existing approaches rely on post-hoc merging, in which task-specific models are merged only once after training. This one-shot aggregation often suffers from task interference, leading to information erasure across individual tasks. In this work, we show that replacing post-hoc merging with an iterative many-shot merging protocol is effective in improving multi-task performance. Building on this insight, we propose METIS, Mitigating Erasure from Task Interference for Stable many-shot merging. METIS is a loss-aware many-shot merging method that addresses information erasure in post-hoc merging through task-wise loss-gap weighting and consensus-based masking. Notably, METIS exhibits significant performance improvement on the worst-performing task, effectively mitigating information erasure. (Project page: https://imkyungjin.github.io/METIS/)

20.
arXiv (CS.CL) 2026-06-12

Detect, Remask, Repair: Diffusion Editing for Faithful Summarization of Evolving Contexts

Summaries of real-world events can become outdated as contexts evolve and new information arrives. A common response is to generate a new summary from the updated context, but full regeneration discards the previous draft, can obscure what changed, and may be unnecessary when only a few claims are unsupported. We study localized faithfulness repair: updating outdated spans in an existing summary while preserving supported content. We propose DETECT-REMASK-REPAIR, a diffusion-based framework that identifies, remasks, and repairs outdated regions with masked diffusion language models. To evaluate evolving-context summarization, we introduce StreamSum, a benchmark of synthetic event timelines. Experiments on DialogSum and StreamSum show that localized diffusion repair provides a controllable alternative to full rewriting: faithfulness-steered repair improves early drafts, one-step repair reduces repair cost to under half a second, and the framework enables faithfulness-speed-preservation tradeoffs across datasets. We also find that the framework can provide a post-hoc correction step that improves faithfulness for autoregressive systems.

21.
arXiv (CS.AI) 2026-06-16

User as Code: Executable Memory for Personalized Agents

arXiv:2606.16707v1 Announce Type: new Abstract: A personalized AI agent needs a user memory: a persistent model of who the user is, built across many conversations and consulted on each new one. Today this memory is almost always stored as unstructured text, a knowledge graph, or a flat store of facts, and consulted by retrieval – fetching the entries most similar to the current request. Such "bag-of-facts" memory recalls individual facts well, but because storing a fact and acting on it are separate steps, it struggles to resolve contradictions, aggregate over many records, or enforce rules. We argue that user memory should instead be executable. We introduce User as Code (UaC), a paradigm in which an agent's model of a user is a living software project: typed Python objects hold the user's state and ordinary Python functions encode the rules that govern it, so representing and reasoning about the user happen in one medium an interpreter can run. The enabling mechanism is a two-phase pipeline: an append-only log that never discards a fact, periodically checkpointed into typed code. This changes what memory can do. On standard long-term conversation benchmarks, UaC matches both a full-context upper bound and the strongest prior memory systems on recall (78.8% on LOCOMO). Its advantage emerges where representation matters most. On aggregate questions over a user's history – "how many international trips did I take last year?" – retrieval-based memory collapses (6-43%) while UaC stays near-perfect (99%), because the answer is a one-line computation over typed state rather than a search over text. And because its rules execute deterministically whenever the state changes, UaC can surface unsolicited, safety-critical alerts – such as a newly prescribed drug that conflicts with an allergy recorded months earlier – a capability query-driven memory cannot provide.
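
The idea of holding user state in typed objects and encoding rules as ordinary functions can be sketched as follows. This is a minimal illustration, not the UaC implementation: the names `Trip`, `UserState`, `allergy_alerts`, and `prescribe` are hypothetical, and matching a drug to an allergy by plain name equality is a simplifying assumption.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class Trip:
    country: str   # destination country code (hypothetical field)
    start: date    # departure date


@dataclass
class UserState:
    """Typed user state; all field names are illustrative assumptions."""
    home_country: str
    allergies: set = field(default_factory=set)
    prescriptions: list = field(default_factory=list)
    trips: list = field(default_factory=list)

    def international_trips(self, year: int) -> int:
        # Aggregate question answered as a one-line computation over state.
        return sum(
            1 for t in self.trips
            if t.start.year == year and t.country != self.home_country
        )


def allergy_alerts(user: UserState) -> list:
    # Rule: a prescription whose name matches a recorded allergy is a conflict.
    # Name equality is a simplification made for this sketch.
    return [
        f"ALERT: {drug} conflicts with a recorded allergy"
        for drug in user.prescriptions
        if drug in user.allergies
    ]


def prescribe(user: UserState, drug: str) -> list:
    # The rule runs on every state change, not only when the user asks.
    user.prescriptions.append(drug)
    return allergy_alerts(user)


user = UserState(home_country="KR", allergies={"penicillin"})
user.trips += [
    Trip("JP", date(2025, 3, 2)),
    Trip("KR", date(2025, 5, 9)),    # domestic, not counted
    Trip("FR", date(2025, 11, 20)),
    Trip("US", date(2024, 1, 5)),    # different year, not counted
]
trips_2025 = user.international_trips(2025)
alerts = prescribe(user, "penicillin")
```

Here `trips_2025` is computed directly from the typed trip records rather than retrieved from text, and `alerts` is produced as a side effect of the state update, which is the kind of unsolicited check the abstract describes.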

22.
medRxiv (Medicine) 2026-06-23

Comparative Evaluation of Machine Learning and Deep Learning Models for Early Prediction of Severe Acute Pancreatitis: A Multi-Model Study Using the 2012 Revised Atlanta Classification

**Background:** Acute pancreatitis (AP) is a common gastrointestinal emergency with a subset of patients progressing to severe acute pancreatitis (SAP), which carries substantial morbidity and mortality. Current clinical severity scores such as BISAP, APACHE II, Ranson, and the Modified CT Severity Index require up to 48 hours of observation before reliable assessment is possible, limiting early triage. Machine learning (ML) approaches using routine admission laboratory values may enable earlier, more accurate prediction. **Methods:** We evaluated 11 models spanning three architectural families: classical ML (Logistic Regression, Random Forest, Gradient Boosting), feedforward deep learning (MLP, Residual MLP, Attention MLP), and recurrent deep learning (LSTM, Stacked LSTM, Bidirectional LSTM, LSTM+Attention, CNN-LSTM), on a Chinese AP cohort of 722 patients (585 severe, 137 mild) labelled according to the 2012 Revised Atlanta Classification. Performance was assessed via 5-fold stratified cross-validation using AUC-ROC, F1 score, sensitivity, specificity, and PPV, with decision thresholds optimised for maximal F1. **Results:** Random Forest achieved the highest AUC of 0.877 (F1=0.917, sensitivity=96.8%, PPV=87.1%), followed closely by Gradient Boosting (AUC=0.874, F1=0.918). Classical ML models consistently outperformed deep learning counterparts. CNN-LSTM was the best recurrent model (AUC=0.777) but remained inferior to all classical approaches. LSTM-family models produced AUC values of 0.684-0.777, reflecting the cross-sectional tabular nature of the data. **Conclusions:** Random Forest provides robust, high-sensitivity early prediction of SAP severity using routine admission data. External prospective validation is required before clinical deployment. **Keywords:** acute pancreatitis; severity prediction; machine learning; random forest; deep learning; LSTM; Revised Atlanta Classification; early triage
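
As a quick consistency check on the reported Random Forest figures, F1 is the harmonic mean of sensitivity (recall) and PPV (precision). The short sketch below recomputes it from the two percentages quoted in the abstract; the function name is an illustrative choice, and the inputs are the rounded values as reported, so agreement is only expected to about three decimals.

```python
def f1_from_sensitivity_ppv(sensitivity: float, ppv: float) -> float:
    """Harmonic mean of recall (sensitivity) and precision (PPV)."""
    return 2 * ppv * sensitivity / (ppv + sensitivity)


# Rounded values reported for Random Forest in the abstract.
rf_f1 = f1_from_sensitivity_ppv(sensitivity=0.968, ppv=0.871)
print(round(rf_f1, 3))  # 0.917
```

The recomputed value agrees with the reported F1 of 0.917.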

23.
arXiv (quant-ph) 2026-06-15

Symplectic coherence: a measure of position-momentum correlations in quantum states

arXiv:2507.15738v2 Announce Type: replace Abstract: The interdependence of position and momentum, as highlighted by the Heisenberg uncertainty principle, is a cornerstone of quantum physics. Yet, position-momentum correlations have received little systematic attention. Motivated by recent developments in bosonic quantum physics that underscore their relevance in quantum thermodynamics, metrology, and computing, we establish a general framework to study and quantify position-momentum correlations in quantum states. We introduce symplectic coherence, a faithful and easily computable measure defined as the Frobenius norm of the block of the covariance matrix encoding position-momentum correlations, and demonstrate that symplectic coherence is monotone under relevant operations and robust under small perturbations. Furthermore, using a recent mapping by Barthe et al. (Phys. Rev. Lett. 134, 070604) which relates the covariance matrix of a bosonic state to the density matrix of a finite-dimensional system, we show that position-momentum correlations correspond to beyond-classical correlations in a virtual finite-dimensional quantum state, with symplectic coherence mapping naturally to geometric quantum discord. Taking energy constraints into account, we determine the maximal position-momentum correlations achievable at fixed energy, revealing structural insights about the corresponding optimal states. Finally, we illustrate the operational relevance of symplectic coherence through several examples in quantum information tasks and quantum thermodynamics. In the process, we establish new technical results on matrix norms and quantum covariance matrices, and demonstrate the conceptual significance of viewing covariance matrices as density matrices of virtual quantum states.
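
The definition given above, a Frobenius norm of the position-momentum block of the covariance matrix, can be sketched numerically. This is an illustration under stated assumptions rather than the paper's exact convention: the covariance matrix is assumed to be ordered as (x_1..x_n, p_1..p_n), so the position-momentum block is the upper-right n-by-n block, and no extra normalisation factor is applied. The function name `symplectic_coherence` and the example matrices are hypothetical.

```python
import math


def symplectic_coherence(cov, n_modes):
    """Frobenius norm of the upper-right n x n block of a 2n x 2n matrix.

    Assumes (x_1..x_n, p_1..p_n) ordering; normalisation is an assumption.
    """
    if len(cov) != 2 * n_modes or any(len(row) != 2 * n_modes for row in cov):
        raise ValueError("cov must be a 2n x 2n matrix")
    block = [row[n_modes:] for row in cov[:n_modes]]
    return math.sqrt(sum(v * v for row in block for v in row))


# Single-mode vacuum (hbar = 1 convention): no position-momentum correlation.
vacuum = [[0.5, 0.0],
          [0.0, 0.5]]

# A single-mode state with a nonzero x-p covariance of 0.3
# (determinant 0.31 >= 0.25, so it respects the uncertainty relation).
correlated = [[1.0, 0.3],
              [0.3, 0.4]]

sc_vacuum = symplectic_coherence(vacuum, n_modes=1)
sc_correlated = symplectic_coherence(correlated, n_modes=1)
```

Under these assumptions the vacuum has zero symplectic coherence, while the correlated state's value equals the magnitude of its single off-diagonal entry.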

24.
arXiv (CS.LG) 2026-06-11

MemNovo: Look Back at the Spectrum for Balanced De Novo Peptide Sequencing from Mass Spectrometry

arXiv:2606.11868v1 Announce Type: new Abstract: De novo peptide sequencing from tandem mass spectrometry is pivotal in proteomics, enabling identification of novel peptides without reference databases. While recent Transformer-based encoder-decoder models have achieved remarkable performance, we uncover a critical pathology in their inference dynamics. Through comprehensive feature scaling experiments, we demonstrate that existing auto-regressive peptide decoders tend to over-rely on generated-sequence priors while progressively under-utilizing fine-grained physical evidence from the input mass spectrum. This phenomenon leads to suboptimal results, where generated peptide sequences are biologically plausible yet not faithful to the input spectrum. To rectify this, we propose MemNovo, a training-free and plug-and-play mechanism that re-balances peptide and spectral contributions at inference time. MemNovo alleviates the information bottleneck by establishing a persistent spectral memory bank and injecting retrieved features directly into the final decoding stage via an ultra-conservative residual connection. Theoretical analysis confirms that this mechanism restores the mutual information between the decoder state and the raw spectrum. Extensive experiments on the Nine Species benchmark with two representative baselines, Casanovo and InstaNovo, demonstrate that MemNovo consistently improves both amino acid precision and peptide precision, achieving up to 39.1% relative improvement in peptide precision for Casanovo and up to 3.9% for InstaNovo, with negligible computational overhead.

25.
arXiv (CS.AI) 2026-06-18

Mitigating Anchoring Bias in LLM-Based Agents for Energy-Efficient 6G Autonomous Networks

arXiv:2606.18272v1 Announce Type: cross Abstract: This paper presents an autonomous agentic resource negotiation framework designed to enable zero-touch network slicing in 6G architectures using Large Language Model (LLM) agents. While LLMs offer powerful reasoning capabilities, we demonstrate that such agents inherently suffer from anchoring bias, rigidly adhering to initial heuristic proposals and causing severe network over-provisioning. To systematically mitigate this cognitive bias, we propose a novel randomized anchoring strategy modeled via a Truncated 3-Parameter Weibull distribution. This mathematically bounded approach seamlessly integrates with burst-aware Digital Twins (DTs) employing Conditional Value at Risk (CVaR) to rigorously guarantee strict Service Level Agreement (SLA) tail-latencies. To validate our methodology, we introduce and prove the Bimodal Constraint-Avoidance Utility Theorem, demonstrating that while feasible negotiations follow classical convex bounds, highly constrained scenarios undergo a phase transition governed by an inverse rational decay envelope. Empirical results generated using a locally hosted 1B-parameter model (otel-llm-1b-it) confirm these dual-regime bounds. Our cognitive de-biasing successfully dismantles rigid negotiation patterns, forcing agents into active exploration to safely ride SLA boundaries and boost system energy savings up to 25%. Crucially, the lightweight 1B LLM achieves sub-second inference latencies (0.95s mean), ensuring our multi-agent framework is compatible with the operational timescales of the O-RAN non-Real-Time RAN Intelligent Controller (non-RT RIC). Our source code is available for non-commercial use at https://github.com/HatimChergui.
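
One way to draw a randomized anchor from a truncated 3-parameter Weibull distribution is inverse-CDF sampling, sketched below. This is not taken from the paper's code: the function names, the parameter values (shape, scale, location), and the truncation bounds are all hypothetical, chosen only to show how a bounded random initial proposal could be generated.

```python
import math
import random


def weibull_cdf(x, shape, scale, loc):
    """CDF of a 3-parameter Weibull with location (threshold) `loc`."""
    if x <= loc:
        return 0.0
    return 1.0 - math.exp(-(((x - loc) / scale) ** shape))


def sample_truncated_weibull(rng, shape, scale, loc, lower, upper):
    """Draw one sample restricted to [lower, upper] by inverse-CDF sampling."""
    f_lo = weibull_cdf(lower, shape, scale, loc)
    f_hi = weibull_cdf(upper, shape, scale, loc)
    if not (f_lo < f_hi < 1.0):
        raise ValueError("truncation interval has no usable probability mass")
    # Uniform draw between the CDF values at the bounds, then invert the CDF.
    u = rng.uniform(f_lo, f_hi)
    x = loc + scale * (-math.log1p(-u)) ** (1.0 / shape)
    # Guard against floating-point rounding at the interval edges.
    return min(max(x, lower), upper)


# Hypothetical example: anchors expressed as a fraction of a resource budget.
rng = random.Random(0)
anchors = [
    sample_truncated_weibull(rng, shape=1.5, scale=0.3, loc=0.1,
                             lower=0.2, upper=0.9)
    for _ in range(1000)
]
```

Every value in `anchors` lies within the truncation bounds, so the randomized starting proposal stays inside a known range while still varying from run to run.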