Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
bioRxiv (Bioinfo) 2026-06-18

Benchmarking attention-based methods for vision transformers' interpretability in retinal fundus imaging

Deep learning models based on Vision Transformers (ViTs) have shown strong performance in retinal fundus imaging, but their interpretability remains poorly understood. In particular, attention-based attribution methods are widely used to explain ViT predictions, despite limited evaluation of their faithfulness and biological relevance in medical imaging. Here, we systematically benchmark four attention-based interpretability methods for RETFound, a retinal ViT-based foundation model, that we previously fine-tuned to predict 17 retinal vascular phenotypes from UK Biobank fundus images1. We compare raw attention, attention rollout, gradient-weighted attention rollout, and Chefer's hybrid relevance-based method using both qualitative visualisation and quantitative evaluation frameworks. To assess attribution faithfulness, we perform perturbation-based deletion and insertion experiments, quantifying changes in model predictions as highly attended image regions are progressively removed or restored. To evaluate biological specificity, we run structure-aware analyses combining attribution maps with vessel segmentation and artery-vein labels through the Relative ratio of Attention Intensity (RAI) metric. Across models, attribution maps differed substantially depending on the selected interpretability method, highlighting the need for rigorous quantitative evaluation. Among the evaluated approaches, gradient-weighted attention rollout consistently achieved the strongest perturbation performance and produced attribution maps most closely aligned with the anatomical definition of the predicted retinal traits. Furthermore, vessel-type specific models systematically concentrate attention on the corresponding vascular structures despite being trained using only a single scalar value per image as supervision. These findings demonstrate that attention-based attribution methods capture biologically meaningful vascular representations, while also revealing method-dependent variability in attribution behaviour. This work provides a quantitative framework for evaluating interpretability methods in medical imaging with annotated segmentation and contributes toward more transparent and biologically grounded medical AI systems.

02.
arXiv (CS.CV) 2026-06-24

Training-Time Optical Priors for Wireless Capsule Endoscopy Classification: Hemoglobin-Aware Input Fusion with Cross-Vendor Evaluation

Background. RGB-trained classifiers for wireless capsule endoscopy (WCE) conflate hemoglobin contrast with bile staining and illumination falloff, limiting sensitivity to small-vessel vascular findings such as Lymphangiectasia. We introduce a physics-informed framework that injects an analytic, Monte-Carlo-inspired hemoglobin prior into a standard classifier purely at training time – to our knowledge the first use of an explicit optical light-transport prior in WCE classification. Methods. On Kvasir-Capsule (47,238 frames, 43 patients, 11 evaluable classes; patient-disjoint split) we test, across 6 seeds against an RGB-only EfficientNet-B0 baseline: (i) a 5-channel input-fusion variant feeding the prior P_blood alongside RGB; (ii) a distillation variant that runs on plain 3-channel RGB at inference; and (iii) a three-stream extension adding a temporal Transformer and an autoencoder-residual stream. We replicate across ResNet-18 and ConvNeXt-Tiny and report cross-vendor zero-shot transfer on the public Galar cohort. Results. Input fusion lifts cross-seed macro-AUC 0.760 -> 0.783 (5/6 seeds positive); distillation reaches 0.773; the three-stream model reaches 0.804 (+0.044 over baseline, paired DeLong p < 1e-4). Lymphangiectasia AUC rises 0.238 -> 0.337, sign-consistent across all 6 seeds. A four-variant ablation reveals a parameterization-mechanism boundary: only the spatial-channel form lifts. Cross-vendor zero-shot on Galar retains ~60% of the ConvNeXt-Tiny lift.

03.
arXiv (quant-ph) 2026-06-17

Einstein-Podolsky-Rosen correlations between mechanical oscillators revealed through SU(1,1) interferometry

arXiv:2606.18202v1 Announce Type: new Abstract: Quantum correlations are essential for achieving quantum advantage in computing, communication and sensing. Moreover, their observation challenges and constrains our fundamental understanding of nature. Mechanical oscillators in the quantum regime provide an appealing platform for preparing and investigating quantum correlations at macroscopic scales. Despite substantial progress, however, continuous-variable quantum correlations stronger than entanglement have not yet been observed in this macroscopic regime. Here, we report the experimental observation of continuous-variable Einstein-Podolsky-Rosen correlations between two spatially-separated mechanical oscillators with an effective mass of $\sim 16 \,\mu g$ each. This is achieved by coupling them to a superconducting qubit which allows for engineering a two-mode squeezing interaction when parametrically driven. Crucially, we show that this interaction can be used to witness quantum correlations through the realization of a mechanical SU(1,1) interferometer. Our results expand the toolbox of operations in circuit quantum acoustodynamics and demonstrate that quantum correlations stronger than entanglement can also be observed in macroscopic systems, thereby shedding light on the boundary between quantum and classical regimes.

04.
arXiv (CS.AI) 2026-06-19

Implicit Semantic-Aware Communication Based on Hypergraph Reasoning

arXiv:2606.20162v1 Announce Type: new Abstract: Semantic-aware communication has emerged as a transformative paradigm for next-generation communication systems, shifting the fundamental goal from transmitting bit-level symbols to reliably recovering and understanding the semantic meaning of information. Previous studies have demonstrated that representing the semantic content of source messages as graph-based structures can significantly improve communication efficiency and the accuracy of semantic inference at the receiver. However, existing solutions typically employ graphs that capture only pairwise relationships, thereby neglecting higher-order implicit correlations commonly observed in real-world scenarios, such as group interactions, multi-entity associations, and complex relational contexts. This limitation reduces semantic expressiveness and makes semantic inference susceptible to ambiguity and performance degradation, particularly under noisy or corrupted channel conditions. To address these issues, this paper proposes a novel hypergraph-based implicit semantic reasoning framework, HISR, which leverages hypergraphs to represent complex multi-entity relationships among semantic knowledge entities. In HISR, entities and their associated higher-order relations are mapped into dedicated semantic subspaces tailored to distinct relational contexts. This design not only disentangles diverse semantic interactions to mitigate the over-smoothing effects commonly found in traditional graph embedding methods but also enables robust semantic inference even when partial information loss occurs during transmission. Numerical results show that the proposed HISR achieves up to a 36.6% improvement in implicit semantic interpretation accuracy over the state-of-the-art benchmarks.

05.
arXiv (CS.LG) 2026-06-11

Evaluating and Combating the Impact of Concept Drift on the Performance of Machine Learning-Based Phishing Detection Systems

arXiv:2606.11471v1 Announce Type: cross Abstract: The expansion of the digital domain has resulted in a substantial increase in digital communication, with email emerging as one of the most prominent channels. The proliferation of email communication is apparent in both professional and personal contexts, thereby creating numerous vulnerabilities for malicious actors to exploit. Spam emails, a form of unsolicited correspondence often bearing malicious intent towards recipients, have been an ongoing challenge for email users since the inception of email technology, and this problem has been exacerbated by the growth of the digital landscape. Email spam filters are integral components of email clients, engineered to identify potentially harmful messages and alert users to their malicious content. Phishing, frequently the initial phase of malware-based attacks, is evolving rapidly, with malware becoming increasingly sophisticated over time. A widely adopted approach for detecting malicious activity within malware and spam domains is the application of machine learning. Our aim is to assess the impact of the evolution within the spam email domain on these machine learning-based detection systems and to explore strategies for mitigating associated performance degradation.

06.
arXiv (CS.AI) 2026-06-17

A Machine-Learned Comorbidity Index

arXiv:2606.17450v1 Announce Type: new Abstract: Traditional comorbidity scores (e.g., Charlson and Elixhauser) are widely used for risk adjustment and patient stratification, but they have two key limitations: (i) they are largely mortality-centric and do not align well with other clinical outcomes, and (ii) their linear, rule-based structure cannot capture nonlinear, outcome-specific risk relationships. We propose a Machine-Learned Comorbidity Index (MLCI) that maps diagnosis codes to a single scalar by maximizing the normalized Hilbert-Schmidt Independence Criterion (nHSIC) between the learned score and multiple clinical outcomes. MLCI captures nonlinear risk-outcome dependence and is supported by a theory that characterizes when a unified, informative admission-level ordering can be achieved across outcomes. Empirical results on multiple benchmark electronic health record (EHR) datasets show that MLCI outperforms strong baselines across multiple evaluation metrics.

07.
arXiv (CS.CL) 2026-06-15

OdysSim: Building Foundation Models for Human Behavior Simulation

Large language models are increasingly deployed as human simulators for interactive evaluation and social simulation. Yet helpfulness-driven post-training pulls them toward a homogeneous, overly agreeable assistant register, creating a behavioral Sim2Real gap. We present OdysSim, the largest open systematic investigation of behavioral foundation models, i.e., models trained to simulate human behavior at scale. We propose SOUL, a taxonomy of five capability axes (CONV, SS, COG, ROLE, EVAL) that unifies 62 datasets and 23 benchmark tasks under one framework. Specifically, we curate the OdysSim corpus (21.4M interactions, 10B tokens, retrofitted with back-generated social contexts), construct the SOUL-Index benchmark, and develop an end-to-end training recipe combining midtraining, task-specific RL, and expert distillation. The resulting open 8B OSim model ranks first or tied-first on 8 of 23 tasks, outperforming any individual frontier model by this count, with the strongest gains on conversational and social tasks. Its outputs are also more human-like in length, formatting, and word choice, and it transfers zero-shot to out-of-distribution user simulation on $\tau$-bench, nearly matching real users on reaction alignment (93.2 vs. 93.5). We further show that LLM-as-judge RL induces reward-hacking patterns, and that our detectors can mitigate them during post-training. Together, our findings suggest that behavioral foundation models require rethinking the LLM training paradigm. We release all artifacts to support future research.

08.
arXiv (CS.CV) 2026-06-16

Trusting Right Predictions for Wrong Reasons: A LIME Based Analysis of Deep Learning Interpretability in Lung Cancer Diagnosis

Lung cancer is the leading cause of cancer-related mortality, with approximately 2.5 million new cases and 1.8 million deaths annually, making reliable diagnosis a clinical priority. Although deep learning models have achieved strong performance in lung cancer classification, evaluation has largely focused on predictive accuracy, leaving their decision-making processes insufficiently examined. This study compares three architecturally distinct models: a Convolutional Neural Network (CNN), a pretrained ResNet50, and a Vision Transformer (ViT), trained on the IQ-OTH/NCCD lung cancer CT dataset. Local Interpretable Model-Agnostic Explanations (LIME) were applied to investigate model reasoning. In addition to standard performance metrics, a dual-correlation framework was introduced to measure both prediction agreement and explanation agreement across model pairs. All three models achieved strong classification performance, with ResNet50 attaining 98.61% accuracy, CNN 97.91%, and ViT 93.75%, while all achieved ROC-AUC scores of 0.99. Prediction correlations exceeded 0.99 across all model pairs, indicating highly consistent outputs. However, LIME explanation correlations remained below 0.26, revealing substantial differences in the image regions used to reach those predictions. Analysis of misclassified samples further identified a consistent spatial pattern: incorrect predictions were associated with attention outside the lung parenchyma, whereas correct predictions focused primarily within lung regions. These findings demonstrate that prediction agreement is a poor proxy for reasoning consistency, and that interpretability evaluation must be treated as an independent validation criterion alongside predictive performance in clinical AI systems.

09.
arXiv (CS.CV) 2026-06-16

Automated 3D Kinematic Monitoring for Circadian Activity and Anomaly Detection in Juvenile Fish

Precision aquaculture faces a "phenotyping bottleneck" in tracking high-resolution behavioral traits, as conventional methods cannot quantify instantaneous three-dimensional (3D) physical exertion. To address this, we present a high-throughput 3D behavioral phenotyping framework integrating deep learning object detection with binocular stereo vision for real-time monitoring of juvenile tilapia in high-density environments. The system automates non-contact body length estimation and reconstructs 3D swimming trajectories from absolute spatial coordinates. By eliminating 2D perspective distortions, this approach precisely quantifies 3D velocity and acceleration, marking the first estimation of true physical swimming speeds in free-roaming juveniles. Results show the framework successfully establishes circadian locomotor baselines, serving as an early warning system for physiological stress and providing an objective metric for fish vitality.

10.
arXiv (CS.AI) 2026-06-11

Does the Question Really Matter? Training-Free Data Selection for Vision-Language SFT

arXiv:2603.09715v2 Announce Type: replace Abstract: Visual instruction tuning is crucial for improving vision-language large models (VLLMs). However, many samples can be solved via linguistic patterns or common-sense shortcuts, without genuine cross-modal reasoning, limiting the effectiveness of multimodal learning. Prior data selection methods often rely on costly proxy model training and focus on difficulty or diversity, failing to capture a sample's true contribution to vision-language joint reasoning. In this paper, we propose CVS, a training-free data selection method based on the insight that, for high-quality multimodal samples, introducing the question should substantially alter the model's assessment of answer validity given an image. CVS leverages a frozen VLLM as an evaluator and measures the discrepancy in answer validity with and without conditioning on the question, enabling the identification of samples that require vision-language joint reasoning while filtering semantic-conflict noise. Experiments on Vision-Flan and The Cauldron show that CVS achieves solid performance across datasets. On Vision-Flan, CVS outperforms full-data training by 3.5% and 4.8% using only 10% and 15% of the data, respectively, and remains robust on the highly heterogeneous Cauldron dataset. Moreover, CVS reduces computational cost by 17.3% and 44.4% compared to COINCIDE and XMAS.

11.
arXiv (CS.AI) 2026-06-12

MP3: Multi-Period Pattern Pre-training forSpatio-Temporal Forecasting

arXiv:2606.13119v1 Announce Type: cross Abstract: Spatio-Temporal forecasting is crucial in diverse fields, such as transportation, climate, and energy. Urban spatio-temporal data exhibits temporal mirage: similar short-window inputs have divergent future trends, and vice versa. Existing spatio-temporal graph neural networks (STGNNs) cannot effectively identify such mirages. We argue that the core reason lies in the short-window inputs that have incomplete period observation, heterogeneous global spatial correlation, and cross-period superposition causality. To bridge this gap, we develop a novel Multi- Period Pattern Pre-training (MP3), a plug-and-play pre-training plugin for distinguishing temporal mirages. MP3 presents two core innovations: (1) The multi-period pattern learning is designed to learn multi-period patterns from long time series. Specifically, multi-period temporal modeling leverages edge convolution to identify different multi-period patterns. Multi-period spatial modeling uses a bottleneck project and a global memory bank to capture heterogeneous global spatial relations efficiently. Cross-period pattern interaction employs a causality-enhanced Transformer to capture dependencies across different period patterns. (2) This plugin can seamlessly integrate into existing STGNN backbones to strengthen their forecasting performance. The experiment on five STGNN baselines across five real-world datasets (including a large-scale dataset CA) verify the effectiveness, superior scalability and strong adaptability of MP3, which brings consistent and robust performance improvements across all evaluated baselines. On average, MP3 reduces the MAE 4.7% and the RMSE 5.0%. The code can be available at https://github.com/YAN-outlook/MP3.

12.
arXiv (quant-ph) 2026-06-17

Quantum conditional entropies from convex trace functionals

arXiv:2410.21976v4 Announce Type: replace Abstract: We study geometric properties of trace functionals that generalize those in [Zhang, Adv. Math. 365:107053 (2020)], arising from a novel family of conditional entropies with applications in quantum information. Building on new convexity results for these functionals, we establish data-processing inequalities and additivity properties for our entropies, demonstrating their operational significance. We further prove completeness under duality, chain rules, and various monotonicity properties for this family. Our proofs draw on tools from complex interpolation theory, multivariate Araki–Lieb and Lieb–Thirring inequalities, variational characterizations of trace functionals, and spectral pinching techniques.

13.
arXiv (CS.CL) 2026-06-15

An Empirical Study of Automating Agent Evaluation

Agent evaluation requires assessing complex multi-step behaviors involving tool use and intermediate reasoning, making it costly and expertise-intensive. A natural question arises: can frontier coding assistants reliably automate this evaluation process? Our study shows that simply prompting coding assistants is insufficient for this task. Without domain-specific evaluation knowledge, frontier coding assistants achieve only a 30% execution success rate and produce over-engineered evaluations averaging 12+ metrics per agent, indicating that strong coding ability does not automatically translate to reliable agent evaluation. We introduce EvalAgent, an AI assistant that automates the end-to-end agent evaluation pipeline. EvalAgent encodes evaluation domain expertise as evaluation skills (procedural instructions, reusable code and templates, and dynamically retrieved API documentation) that compose into a trace-based pipeline producing complete evaluation artifacts including metrics, executable code, and reports. To systematically assess generated evaluations, we introduce a meta-evaluation framework alongside AgentEvalBench, a benchmark comprising 20 agents, each paired with evaluation requirements and test scenarios. We further propose the Eval@1 metric to measure whether generated evaluation code both executes and yields meaningful results on the first run. Our experiments show that EvalAgent produces focused evaluations, improving Eval@1 from 17.5% to 65%, and achieving 79.5% human expert preference over baseline approaches. Further ablation studies show that evaluation skills are critical for handling complex evaluation: removing them causes Eval@1 to drop significantly from 65% to 30%.

14.
arXiv (CS.AI) 2026-06-16

CoAgent: Concurrency Control for Multi-Agent Systems

arXiv:2606.15376v1 Announce Type: cross Abstract: Multi-agent LLM systems – coding agents, devops agents, document agents – now routinely run several agents in parallel against the same git tree, Kubernetes cluster, or document. As soon as two of them mutate shared state, they enter the regime classical concurrency control has studied for decades, but classical mechanisms fit LLM agents poorly. A single agent transaction spans minutes of inference, read sets are broad and opaque rather than statically inferable, and the live state agents act on admits neither fork nor buffer, so writes take effect the moment they execute. Locks block long inference intervals; OCC abort-and-retry discards minutes of work on every conflict. This paper builds concurrency control on a capability classical transactions lack: the LLM inside each agent can judge whether a conflicting write invalidates its plan, and can repair exactly the operations that depended on it. Control therefore turns advisory: the runtime informs, the agent repairs. Our protocol, MTPO (Monotonic Trajectory Pre-Order), fixes a serialization order at launch, serves each read the order-filtered value, and applies writes speculatively in place; a one-way notification asks an affected reader to re-judge and patch its plan, while the framework mechanically undoes and reorders misplaced writes through the saga-style inverse each tool registers in advance. At quiescence the run is serializable in the pre-decided order. We realize MTPO as CoAgent, toolcall middleware whose privileged ToolSmith grows footprint-declared, undoable tools online. On ten contended workloads, CoAgent stays within 5\% of serial correctness at a $1.4\times$ speedup and near-serial token cost, where 2PL and OCC surrender nearly all concurrency gains; on a bash-only target system, it grows a 25-tool library online and lifts the task pass rate from 45/71 to 63/71 at $0.80\times$ the time and $0.86\times$ the cost.

15.
bioRxiv (Bioinfo) 2026-06-17

Correcting spatial transcriptomics data affected by a prevalent transcript leakage problem across platforms, species, and tissues

Spatial transcriptomics has been widely applied to study the spatial distribution of cell types, cell states, and specific gene expression in tissue samples. However, we show that there is a prevalent transcript leakage problem in spatial transcriptomics data, where transcripts expressed by a cell diffuse to its neighborhood and are recurrently detected in the nearby cells. By analyzing published data sets, we show that this problem is general across data produced from different tissues and different species using different imaging-based and sequencing-based spatial transcriptomics platforms. It affects both upstream tasks such as expression quantification as well as downstream tasks such as cell-type annotation and detection of spatially-dependent gene expression. To tackle the transcript leakage problem, we propose a reference-free Bayesian model-based method, DeLeakage, which cleans up the data much more effectively than existing denoising methods. DeLeakage also improves cell-type annotation and avoids false detection of spatially dependent expression.

16.
arXiv (CS.CL) 2026-06-16

Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter

Reasoning with a Code Interpreter (CI) has emerged as an effective paradigm for enhancing the reasoning capabilities of large language models (LLMs) through executable computation and iterative verification. Despite its growing adoption, the behavioral properties underlying effective code reasoning remain largely underexplored. In this work, we investigate code reasoning from two distinct perspectives inspired by prior studies of natural language reasoning: extrinsic properties, represented by crucial tokens, and intrinsic properties, represented by code-specific cognitive behaviors. Across multiple LLMs, we find that stronger CI reasoning models consistently exhibit a higher prevalence of crucial tokens and cognitive behaviors, particularly verification, backtracking, and backward chaining. Building on these observations, we examine how these properties can be leveraged during both inference and training. At inference time, appending code-specific crucial tokens improves performance on several reasoning capabilities, including mathematical, ordering, and optimization, while yielding limited benefits elsewhere. At training time, augmenting a state-of-the-art framework with code-specific cognitive behaviors improves supervised fine-tuning and reinforcement learning performance in two of three evaluated models. Further analysis shows that these behaviors reduce overthinking in incorrect responses and improve token efficiency, while also revealing factors that limit gains in a certain model. Our findings provide the first systematic characterization of effective reasoning with CI and demonstrate both the potential and limitations of leveraging key properties to improve CI-based reasoning.

17.
arXiv (CS.LG) 2026-06-17

MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization

arXiv:2606.17526v1 Announce Type: new Abstract: Efficient optimization is essential for training large language models. Although intra-layer selective updates have been explored, a general mechanism that enables fine-grained control while ensuring convergence guarantees is still lacking. To bridge this gap, we propose MGUP, a novel mechanism for selective updates. MGUP augments standard momentum-based optimizers by applying larger step-sizes to a selected fixed proportion of parameters in each iteration, while applying smaller, non-zero step-sizes to the rest. As a nearly {plug-and-play} module, MGUP seamlessly integrates with optimizers such as AdamW, Lion, and Muon. This yields powerful variants such as MGUP-AdamW, MGUP-Lion, and MGUP-Muon. Under standard assumptions, we provide theoretical convergence guarantees for MGUP-AdamW (without weight decay) in stochastic optimization. Extensive experiments across diverse tasks, including MAE pretraining, LLM pretraining, and downstream fine-tuning, demonstrate that our MGUP-enhanced optimizers achieve superior or more stable performance compared to their original base optimizers. We offer a principled, versatile, and theoretically grounded strategy for efficient intra-layer selective updates, accelerating and stabilizing the training of large-scale models. The code is publicly available at https://github.com/MaeChd/MGUP.

18.
medRxiv (Medicine) 2026-06-16

Risk beliefs, intensive digital information and demand for a new preventative health product in public clinics: Evidence from an experiment in Zimbabwe.

Demand for preventative health care is weak in low-income settings. In a field experiment in a low-income, high-risk setting, we evaluated whether demand for a new bio-medical preventative health product, offered free at public health clinics, responds to digital feedback-based intensive information on health risks and benefits of prevention along with a clinic referral enabling access to the product. In our sample of women aged 18-24 years, we find a large correction in risk beliefs sustained six months after the intervention. Against a background of very low baseline usage, within six months we find a 5.8 percentage point increase in take up of the prevention method, a level of uptake which is very large relative to the control group. Reassuringly, there is no meaningful difference in up-take amongst baseline high- risk and low-risk individuals.

19.
arXiv (CS.CL) 2026-06-19

Med-R2: Perception and Reflection-driven Complex Reasoning for Medical Report Generation

Automated medical report generation (MRG) is increasingly used to reduce the burden of manual reporting and for decision support. Large vision-language models (LVLMs) hold great promise for automated MRG due to their fine-grained image-text alignment and advanced text-generation capabilities. Currently, state-of-the-art MRGs primarily focus on adapting pre-trained LVLMs with direct supervised fine-tuning (SFT), a fine-tuning strategy with medical image-report pairs. However, several factors limit the performance of these LVLMs. Firstly, direct SFT enables LVLMs to generate medical reports directly without an intermediate thinking process of pathological feature perception and diagnostic reasoning. This causes a potential failure to perceive pathological features and thus leads to misdiagnosis. Secondly, direct SFT lacks the incorporation of radiology-specific knowledge guidance, causing LVLMs to misinterpret perceived pathological features and make incorrect diagnoses. To address these gaps, we propose a novel fine-tuning strategy named Med-R2. We introduce a perception-driven long reasoning process that precedes report generation and incorporates radiology-specific knowledge as guidance. Additionally, to alleviate potential perceptual errors in complex reasoning, a reflection mechanism is introduced to refine the perception of pathological features and the generated report. Our experiments demonstrate that Med-R2 effectively enhances the capability of pathological features perception and diagnosis accuracy for MRG via fine-tuned LVLMs.

20.
arXiv (quant-ph) 2026-06-15

Quantum Entanglement of Bethe States

arXiv:2606.14140v1 Announce Type: cross Abstract: We investigate the quantum entanglement of Bethe states across a family of integrable spin chains, including the XXX$_{\frac{1}{2}}$ model, its higher-spin generalizations (XXX$_s$), and the non-compact $SL(2,\mathbb{R})$ chain. For on-shell eigenstates, we perform a comprehensive scan of the bipartite entanglement entropy across the entire spectrum of finite chains with periodic boundary conditions, and identify the Bethe solutions that minimize and maximize the entanglement. These extremal solutions follow systematic, spin-dependent patterns in the Bethe quantum numbers. In the XXX$_{\frac{1}{2}}$ spin chain, for the antiferromagnetic chain, the state with minimal entropy always coincides with the lowest-energy state (the ground state) within a given fixed-magnon sector. For the higher-spin XXX$_s$ model, however, the lowest-entropy state is not always identical to the ground state, and can even be the state of highest energy. By contrast, the Bethe roots that maximize entropy exhibit considerably more intricate structure. Our analysis further reveals how special Bethe root configurations, such as singular and strange solutions, affect entanglement, and it uncovers characteristic entanglement features in the non-compact $SL(2,\mathbb{R})$ chain that are absent from compact spin chains. For off-shell Bethe states, we develop an optimization algorithm that extremizes the entanglement entropy over rapidity distributions, enabling us to explore the maximum entanglement achievable by a Bethe state without imposing the Bethe ansatz equations.

21.
arXiv (CS.AI) 2026-06-19

Formal Verification of Learned Multi-Agent Communication Policies via Decision Tree Distillation

arXiv:2606.19632v1 Announce Type: cross Abstract: Multi-agent reinforcement learning (MARL) enables agents to develop coordination strategies through emergent communication, but neural policies lack the formal safety guarantees required for safety-critical robotic deployment in drone swarms and autonomous vehicle fleets. We present the first end-to-end framework for safety verification of learned multi-agent communication policies through policy abstraction: neural policies are distilled into interpretable decision trees, then formally verified, with empirical validation confirming that verified safety properties transfer to original networks. Our four-stage pipeline consists of domain-specific feature extraction from agent observations, decision tree distillation achieving 97.9% +/- 1.2% fidelity to neural policies, automated translation to PRISM probabilistic model checker specifications with complete feature-to-state-variable correspondence, and compositional verification of Probabilistic Computation Tree Logic (PCTL) properties via pairwise decomposition with union-bound aggregation and empirical neighbor modeling. Evaluating Vector-Quantized Variational Information Bottleneck (VQ-VIB) policies for multi-drone coordination with 5-7 agents, we verify 18 temporal logic properties across safety, liveness, and cooperation, achieving 88.9% property satisfaction with all five safety thresholds satisfied (0.3% collision probability vs. 1% threshold). Monte Carlo validation of original neural policies confirms that verified safety properties transfer with

22.
arXiv (CS.CV) 2026-06-17

Structure-Aware Text Recognition for Ancient Greek Critical Editions

Recent advances in visual language models (VLMs) have transformed end-to-end document understanding. However, their ability to interpret the complex layout semantics of historical scholarly texts remains limited. This paper investigates structure-aware text recognition for Ancient Greek critical editions, which have dense reference hierarchies and extensive marginal annotations. We introduce two novel resources: (i) a large-scale synthetic corpus of 185,000 page images generated from TEI/XML sources with controlled typographic and layout variation, and (ii) a curated benchmark of real scanned editions spanning more than a century of editorial and typographic practices. Using these datasets, we evaluate three state-of-the-art VLMs under both zero-shot and fine-tuning regimes. Our experiments reveal substantial limitations in current VLM architectures when confronted with highly structured historical documents. In zero-shot settings, most models significantly underperform compared to established off-the-shelf software. Nevertheless, the Qwen3VL-8B model achieves state-of-the-art performance, reaching a median Character Error Rate of 1.0\% on real scans. These results highlight both the current shortcomings and the future potential of VLMs for structure-aware recognition of complex scholarly documents.

23.
medRxiv (Medicine) 2026-06-16

A Poisson Process Life Expectancy framework for optimising patient lifetime during chemotherapy

Cancer therapy balances between two competing objectives - treatment efficacy against the tumour and the risk of treatment related severe adverse events, including patient death. Most existing optimal control theory (OCT) formulations rely on optimising heuristic cost functionals that lack direct clinical interpretability. In clinical practice treatment efficacy and patient tolerability are primarily assessed through survival metrics and adverse event rates. Here we introduce the Continuous Lifetime Payoff (CLP), a novel OCT objective functional that directly links treatment decisions to patient survival. It explicitly incorporates tumour dynamics, tumour eradication, and patient mortality from tumour progression, drug-related toxicity and age. We fit age-related mortality from life tables and infer parameters from simulated survival data. The CLP provides a clinically grounded framework for optimising chemotherapy regimens.

24.
arXiv (CS.CL) 2026-06-17

Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large Language Models (LLMs) for predicting dementia and depression severity from speech samples collected during standardized history taking interviews with 154 German-speaking subjects. We introduce an observer-based Global Depression Scale (GDS-D) aligned with the established Global Deterioration Scale (GDS), enabling parallel global staging of affective and cognitive symptoms. We compare three LLMs (Mistral 3.1, DeepHermes, Qwen3) in two settings: (1) zero-shot prediction and (2) LLM-based feature extraction for Support Vector Regression, using human and pause-enriched transcripts. Results show that LLMs effectively predict depression severity in zero-shot settings (best MAE of 0.60), while dementia assessment benefits substantially from structured feature extraction (best MAE of 0.78), reducing errors by up to 35% over zero-shot baselines. Pause-enriched transcripts achieve competitive performance with human transcriptions, demonstrating the viability of fully automatic screening pipelines for differential neuropsychiatric assessment.

25.
medRxiv (Medicine) 2026-06-22

Clinical-grade Cuffless Blood Pressure Monitoring via Deep-tissue Diffuse Speckle Pulsatile Flowmetry

Blood pressure (BP) is a vital sign which is measured to diagnose and manage hypertension. However, current methods to measure BP use inflatable cuffs which cause discomfort and limit the frequency at which measurements can be made, or intra-arterial catheters which are invasive and pose infection risks. Here, we propose and evaluate the use of Diffuse Speckle Pulsatile Flowmetry (DSPF) as a cuffless BP measurement method to address these limitations. DSPF is a laser speckle-based technique which simultaneously records blood flow rate and blood volume (i.e. photoplethysmography or PPG) signals from relatively deep vascular tissue. Using information from these signals, we studied DSPFs effectiveness in measuring systolic BP (SBP) and diastolic BP (DBP) through an outpatient study in which 133 patients were recruited, and in measuring beat-to-beat BP waveforms through an inpatient study in which two patients were recruited. In the outpatient study, the DSPF method was able to achieve mean absolute errors (MAEs) of 4.17 mmHg and 2.42 mmHg for SBP and DBP respectively compared to conventional cuff-based methods. It was also able to fulfil the requirements of the AAMI/ESH/ISO 81060-2:2018 standard for BP measurement devices and attain an "A" grade according to the British Hypertension Society grading scheme. For the inpatient study, it produced BP waveforms which had MAEs of 2.35 mmHg and 3.06 mmHg compared to arterial-line measurements for the two patients, respectively. Compared to PPG which has been studied more extensively as a cuffless BP measurement method, we found through ablation studies that DSPF was able to reach significantly lower MAEs and hence better accuracies. DSPF augments the performance of PPG-only methods by leveraging additional information from the blood flow rate signal, and we therefore find it to be a superior cuffless BP measurement method which can potentially be used in outpatient, inpatient, and remote settings.