Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (quant-ph) 2026-06-15

Dose-efficient Quantum Phase Estimation in Lossy Optical Interferometry

arXiv:2606.14254v1 Announce Type: new Abstract: Optical interferometry is a cornerstone technique for precise phase measurements across various fields. In many applications, for example, biological imaging, it often necessitates stringent limits on light intensity to prevent adverse effects on light-sensitive samples, a condition known as dose-limited regimes. Maximizing the precision per dose is therefore crucial. In quantum metrology, quantum correlations enable high precision in phase estimation while adhering to dose constraints. Nevertheless, photon loss, including absorption by a sample, substantially diminishes the benefits of quantum enhancement in interferometry. In this work, we experimentally investigate a dose-efficient approach to quantum phase estimation using sequential strategies in the presence of loss. Performance of sequential strategies with and without control is evaluated through quantum Fisher information (QFI) per dose. Experimental results show that both sequential strategies exceed the classical limit and outperform the parallel strategy using unbalanced N00N states. Notably, the control-enhanced sequential strategy attains superior QFI per dose, approaching the quantum limit. These results highlight the promise of sequential strategy for imaging and sensing in resource-constrained scenarios, marking a significant step toward practical and efficient quantum metrology in lossy environments.

02.
arXiv (CS.CL) 2026-06-18

MORTAR: Multi-turn Metamorphic Testing for LLM-based Dialogue Systems

With the widespread application of LLM-based dialogue systems in daily life, quality assurance has become more important than ever. Recent research has successfully introduced methods to identify unexpected behaviour in single-turn testing scenarios. However, multi-turn interaction is the common real-world usage of dialogue systems, yet testing methods for such interactions remain underexplored. This is largely due to the oracle problem in multi-turn testing, which continues to pose a significant challenge for dialogue system developers and researchers. In this paper, we propose MORTAR, a metamorphic multi-turn dialogue testing approach, which mitigates the test oracle problem in testing LLM-based dialogue systems. MORTAR formalises the multi-turn testing for dialogue systems, and automates the generation of question-answer dialogue test cases with multiple dialogue-level perturbations and metamorphic relations (MRs). The automated MR matching mechanism allows MORTAR more flexibility and efficiency in metamorphic testing. The proposed approach is fully automated without reliance on LLM judges. In testing six popular LLM-based dialogue systems, MORTAR reaches significantly better effectiveness with over 150\% more bugs revealed per test case when compared to the single-turn metamorphic testing baseline. Regarding the quality of bugs, MORTAR reveals higher-quality bugs in terms of diversity, precision and uniqueness. MORTAR is expected to inspire more multi-turn testing approaches, and assist developers in evaluating the dialogue system performance more comprehensively with constrained test resources and budget.

03.
arXiv (CS.CL) 2026-06-18

Speech-Driven End-to-End Language Discrimination towards Chinese Dialects

Language discrimination among similar languages, varieties, and dialects is a challenging natural language processing task. The traditional text-driven focus leads to poor results. In this paper, we explore the effectiveness of speech-driven features towards language discrimination among Chinese dialects. First, we systematically explore the appropriateness of speech-driven MFCC features towards CNN-based language discrimination. Then, we design an end-to-end speech recognition model based on HMM-DNN to predict Chinese dialect words. We adopt attention to extract the discriminative words related to different Chinese dialects. Finally, through a CNN, we combine the word-level embedding and the MFCC-based features. Evaluation of two benchmark Chinese dialect corpora shows the appropriateness and effectiveness of the proposed speech-driven approach to fine-grained Chinese dialect discrimination compared to the state-of-the-art methods.

04.
arXiv (CS.AI) 2026-06-24

When Helpfulness Overrides Causal Caution: Context-Dependent Suppression and Recovery in LLMs

arXiv:2606.24370v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly integrated into decision-support roles in business and policy contexts. While prior benchmark studies have primarily evaluated LLMs' causal reasoning capabilities, a more fundamental epistemic dimension has been overlooked: Causal Caution, defined as the propensity to refrain from causal judgment when empirical evidence is insufficient. This study examines the systematic suppression of Causal Caution that occurs when LLMs shift from academic to practical advisory contexts. Using an evaluation rubric inspired by Pearl's Causal Hierarchy (the PCH score), we conducted experiments on four high-performance LLMs – Claude Sonnet 4.6, Claude Opus 4.7, GPT 5.5, and Gemini 3.1 Pro – across 480 trials. Causal Caution maintenance rates were 91.7–100.0% in academic contexts but dropped to 6.7–18.3% in practical advisory contexts (Fisher's exact test, p < .001 across all models). Furthermore, when restricted to practical prompts requesting concrete recommendations or explanatory rationales, only 1 of 200 responses (0.5%) maintained Causal Caution. A brief self-correction prompt – "Please reconsider this judgment from the perspective of causal relationships" – restored the expression of Causal Caution to maintenance rates of 71.4–100.0% (McNemar's test, p < .001 across all models). These results suggest that helpfulness-oriented response patterns may suppress the expression of Causal Caution in practical advisory contexts, with important implications for organizational governance. The findings indicate that this suppression reflects context-dependent variation in expression rather than an underlying capability limitation, suggesting that multi-agent architectures that separate proposal generation from causal auditing may offer a promising governance design.

05.
arXiv (CS.LG) 2026-06-16

An Exploratory Study of Blood Glucose Estimation from Photoplethysmography Signals using Machine Learning

arXiv:2606.15927v1 Announce Type: new Abstract: Diabetes and extreme blood sugar levels are some of the major health problems faced by humans today across the world. While Continuous Glucose Monitoring (CGM) has emerged as an effective technology for management of diabetes as well as for monitoring blood sugar levels, this technology has traditionally been invasive (that is, requiring the piercing of the skin) and carries the risk of irritation, induration, etc. This highlights the need for accurate and non-invasive CGM methods that can be deployed at scale. With the emergence of various sensing technologies and their integration in wearables like the smart-watch, we now have the capability to continuously monitor body signals like the Photoplethysmogram (PPG) in a non-invasive manner. Having the ability to continuously monitor blood glucose through CGMs and continuously monitor PPG signals through a smart-watch offers an opportunity to get dense data on these two, opening the possibility of building machine learning and deep learning based models to estimate blood glucose level from PPG signals. In this work, we first present a paired dataset comprising continuous PPG signals from a smartwatch along with glucose values recorded using a CGM device. We also present the results of some preliminary experimental explorations performed on our dataset. These preliminary results suggest that some predictive signals may exist, though more exploration is needed with more data from a larger number of individuals. The dataset can be accessed at https://zenodo.org/records/20577959

06.
arXiv (CS.LG) 2026-06-12

Out-of-Distribution (OOD) Detectors for Open-Set RF Fingerprinting

arXiv:2606.12718v1 Announce Type: new Abstract: Radio-frequency (RF) fingerprinting systems must operate in open-world environments where signals from unknown transmitters and temporal drift introduce distribution shift at test time. Out-of-distribution (OOD) detection provides a natural framework for this problem, yet its application to RF fingerprinting (RFF) remains limited. A key barrier to their adoption is that most OOD detectors require auxiliary OOD data for parameter tuning, an assumption that is difficult to satisfy in RF environments where representative OOD data is impractical to collect. In this work, we introduce a promising set of OOD detection methods from the machine learning literature to open-set RFF domain. We present these methods within a unified mathematical framework based on information theory, which is a natural framework for communication systems. Our framework allows for the systematic analysis of methods and development of new methods. We further demonstrate the applicability of recent work on tuning OOD detectors without given OOD tuning data for open-set RFF. We evaluate on the POWDER RF fingerprinting dataset, showing that detectors tuned without any given OOD data achieve performance comparable to baselines with access to true OOD tuning data and greatly out-perform baseline approaches without access to true OOD tuning data, showcasing the practical viability for the RFF problem.

07.
arXiv (CS.LG) 2026-06-16

GauS: Differentiable Scheduling Optimization via Gaussian Reparameterization

arXiv:2602.20427v2 Announce Type: replace Abstract: Efficient operator scheduling is a fundamental challenge in software compilation and hardware synthesis. While recent differentiable approaches have sought to replace traditional ones like exact solvers or heuristics with gradient-based search, they typically rely on categorical distributions that fail to capture the ordinal nature of time and suffer from a parameter space that scales poorly. In this paper, we propose a novel differentiable framework, GauS, that models operator scheduling as a stochastic relaxation using Gaussian distributions, which fully utilize modern parallel computing devices like GPUs. By representing schedules as continuous Gaussian variables, we successfully capture the ordinal nature of time and reduce the optimization space by orders of magnitude. Our method is highly flexible to represent various objectives and constraints, which provides the first differentiable formulation for the complex pipelined scheduling problem. We evaluate our method on a range of benchmarks, demonstrating that Gaus achieves Pareto-optimal results.

08.
arXiv (quant-ph) 2026-06-11

Superspace Concentration and Adversarial Robustness in Quantum Algorithms

arXiv:2606.11580v1 Announce Type: new Abstract: We study superspace concentration as a quantum resource, formalized through the focus measure F(\r{ho}) = {\lambda}_max(\r{ho}_super) - the largest eigenvalue of the reduced superspace state - which quantifies the capacity of a quantum system to concentrate informational weight into a preferred subspace of an extended degree-of-freedom space. We develop a complete resource-theoretic framework around this measure and validate its properties through GPU-accelerated numerical simulation. Analytic decoherence predictions are confirmed to machine precision (1.11 x 10^{-16}) for superspace dimensions dS in {2,4,8,16,32}. Focus monotonicity holds across 10,000 random states with zero violations under four focus-non-generating channels across six system configurations. Focused quantum states resist coherent unitary attacks with significantly greater resilience than standard fidelity predicts, with focus remaining above 0.9 at attack strength {\epsilon} = 0.302 versus {\epsilon} = 0.174 for fidelity. We further demonstrate that the focus measure and the U(dS)-asymmetry measure are operationally distinct: asymmetry remains near zero and provides no robustness signal under coherent and targeted attacks while focus tracks spectral concentration and remains robust until {\epsilon} > 0.3. The connection between Grover's algorithm and superspace concentration is made explicit via the identity F(|{\psi}_k>

09.
arXiv (CS.AI) 2026-06-11

From Awareness to Action: Understanding and Overcoming the Research-Practice Gap in Algorithmic Fairness for Public Health

arXiv:2606.11214v1 Announce Type: cross Abstract: Algorithmic fairness is essential for responsible ML-driven public health research, yet its practical implementation remains limited. To investigate this awareness-action gap, we conducted a sequential mixed-methods study comprising expert interviews, an online survey, and systematic mapping. The expert interviews informed the design of the survey, which in turn revealed fragmented definitions of fairness, limited training and guidance, reliance on external sources, and rare use of formal assessment, mitigation, or monitoring. These findings were subsequently mapped onto three established research-practice gap lenses: the Knowledge-Practice Gap, the Knowledge-to-Action Cycle, and the Knowing-Doing Gap, each offering complementary perspectives. Building on this synthesis, we introduce the Fairness-to-Action framework, which integrates methodological, organizational, and systemic dimensions to identify where translation of algorithmic fairness knowledge stalls. Our analysis shows that fairness remains weakly institutionalized, translation mechanisms are externally driven, and system-level priorities continue to emphasize accuracy over fairness. These insights suggest critical leverage points for advancing safe, fair, and ethical ML-driven public health research practice.

10.
arXiv (math.PR) 2026-06-24

Scheduling jobs with unknown size distribution in a M/G/1 queue: the shifted empirical Gittins

arXiv:2606.24703v1 Announce Type: new Abstract: In this paper we consider a M/G/1 queue for which we want to minimize the expected response time. We show how to compute indices from $n$ samples of the job size distribution such that the corresponding index policy is asymptotically optimal as $n$ grows. This construction is based on a discretization of the bounded support of the job size distribution and a shift of the samples to their nearest discrete point to the right. We show that the Gittins index of the empirical distribution of these shifted samples is close to the Gittins index of the original distribution. This translates to the asymptotic optimality of the corresponding index policy for minimizing the expected response time. Numerical comparison with other approaches further confirm the efficiency of our approach.

11.
bioRxiv (Bioinfo) 2026-06-13

ADMETron: An AI-driven SaaS platform for comprehensive ADMET prediction and compound prioritisation

ONTOSIGHT(R) ADMETron is an AI-driven platform designed for rapid prediction and visualization of Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties to support modern drug discovery. The platform integrates an interactive web interface with a scalable predictive engine, enabling high-throughput virtual screening and batch analysis of chemical compounds. Its core architecture combines recurrent neural network (RNN)-derived molecular embeddings from SMILES representations with physicochemical descriptors, which are subsequently modeled using gradient boosting machines (GBMs). This framework provides predictions across 34 ADMET endpoints, including physicochemical properties, absorption, CYP450 interactions, hERG liability, and mutagenicity. The predictive performance of ADMETron was evaluated using benchmark datasets from the Therapeutics Data Commons (TDC), demonstrating strong performance and generalizability across both classification and regression tasks. Beyond predictive modeling, the platform introduces an interactive radar graph-based structure-activity relationship (SAR) visualization framework that enables real-time comparison of multiple compounds and reference drugs across selected ADMET parameters. This feature facilitates intuitive interpretation of multidimensional molecular profiles and supports lead optimization and compound prioritization. Comparative assessment against widely used online ADMET tools further demonstrated broad endpoint coverage spanning pharmacokinetic, physicochemical, toxicity, and medicinal chemistry properties within a unified environment. Together, these capabilities establish ADMETron as a comprehensive platform for ADMET assessment and data-driven decision-making in drug discovery. (https://admetron.partex.ai/).

12.
arXiv (CS.AI) 2026-06-16

The Faithfulness Gap: Certifying Semantic Equivalence Between Natural-Language and Formal Mathematical Statements

arXiv:2606.16541v1 Announce Type: new Abstract: Autoformalization, translating natural-language mathematics into formal proof assistants, is bottlenecked not by translation fluency but by faithfulness: a formal statement can typecheck and be provable, yet still encode a different theorem than the source intended. We introduce Bidirectional Provability Fingerprinting (\bpf{}), a framework that certifies faithfulness by characterizing each candidate through its forward and backward consequence neighborhoods in the ambient theory and matching these against probes derived from the natural-language statement. We further introduce four novel components: (i) Counterfactual Probe Generation (\cpg{}), a contrastive procedure that synthesizes probes targeting specific drift directions; (ii) the Equivalence Spectrum, a continuous faithfulness score that replaces brittle binary verdicts; (iii) Adaptive Probe Budget Allocation (\apba{}), an information-theoretic budget router; and (iv) Faithfulness-Guided Decoding (\fgd{}), which uses \bpf{} signals as a reward during autoformalization. We prove a drift detection theorem and a PAC-faithfulness result establishing that the equivalence class of a natural language statement is learnable from $\mathcal{O}(\log(1/\delta)/\varepsilon)$ probes under mild assumptions. We release \driftbench{}, a benchmark of $2{,}183$ NL/Lean~4 pairs with controlled drift labels across six subfields of mathlib4. \bpf{}\,+\,\cpg{} detects $89.6\%$ of drifted formalizations at a $3.0\%$ false-positive rate-against $41.2\%$ for typecheck and $63.3\%$ for LLM-judge baselines, and \fgd{} reduces the rate at which a state-of-the-art autoformalizer emits drifted statements by $47\%$. https://pmlrbd.github.io/BPF/

13.
arXiv (quant-ph) 2026-06-16

Bath memory as a precision resource in quantum transport

arXiv:2606.17026v1 Announce Type: new Abstract: Structured baths can reshape transport fluctuations in mesoscopic quantum devices, yet a predictive criterion for when this enhances precision has been lacking. We propose a route towards such precision advantages by utilizing bath memory in coherent fermionic transport through a noninteracting quantum-dot chain. Using the Landauer-Büttiker formalism, we derive a dual impedance-matching condition that synchronizes the conductor mode splitting, boundary dissipation, and bath bandwidth, and sustains constructive multimode interference across the transmission window. The analytical predictions for the optimal bath bandwidths show excellent agreement with exact nonequilibrium Green's function calculations of the transport for Lorentzian, Gaussian, and Newns spectral densities. The prescription yields an optimal bath bandwidth at which the current Fano factor is minimized and the thermodynamic and kinetic precision coefficients are simultaneously enhanced beyond their Markovian limits. The alignment of the optimal precision regime with the experimentally accessible current Fano factor minimum thus provides a practical strategy for designing precision-enhanced transport in mesoscopic platforms such as semiconductor quantum-dot arrays and ultracold fermionic channels.

14.
arXiv (CS.CL) 2026-06-11

Food4All: An Agentic Framework and Benchmark for Food Resource Navigation with Adaptive User Understanding

Food assistance referral requires conversational agents to translate underspecified, often noisy help-seeking dialogues into locally valid resource recommendations. We present Food4All, an agentic food-resource referral framework and benchmark grounded in 686 structured Indiana food resources. Food4All couples a food-specific search tool with 300 multi-turn evaluation tasks spanning single food needs, composite cases with access or document constraints, and five non-ideal user interaction traits: unreasonable demands, rambling responses, impatience, incomplete answers, and inconsistent information. We evaluate six Large Language Models (LLMs) on requirement grounding, resource retrieval, final referral correctness, and interaction efficiency. Although the strongest model achieves 96.33% referral accuracy, our diagnostics reveal persistent failures in grounding schedule, eligibility, intake, and document constraints, as well as failures to preserve valid retrieved resources in the final recommendation. Trait-level analysis further shows that different non-ideal behaviors stress different parts of the referral pipeline. Food4All provides a controlled testbed for studying tool-calling agents in constraint-sensitive food assistance referral under realistic user interaction challenges.

15.
arXiv (CS.AI) 2026-06-24

SP-Mind: An Autonomous Reasoning Agent for Spatial Proteomics Analysis

arXiv:2606.24235v1 Announce Type: new Abstract: Spatial proteomics enables single-cell-resolution characterization of protein expression within tissue architecture, playing a critical role in understanding tumor microenvironments and guiding precision medicine. However, current analysis workflows remain fragmented, requiring expert manual orchestration of heterogeneous tools and limiting research scalability and reproducibility. We present SP-Mind, the first autonomous AI agent designed to unify the spatial proteomics analysis pipeline, from raw multiplexed tissue imaging to downstream phenotype discovery. Equipped with expert-curated biological analysis skills and specialized computational tools, SP-Mind converts natural-language queries into end-to-end analytical workflows without task-specific fine-tuning. To rigorously evaluate its capabilities, we introduce SP-Bench, a comprehensive benchmark spanning diverse tissue types, comprising 102 tasks across 18 distinct categories. Through extensive evaluation on SP-Bench and established downstream tasks, SP-Mind achieves state-of-the-art performance compared to existing open-source biomedical agent baselines.

16.
arXiv (CS.CV) 2026-06-16

CoMNeT: A MedNeXt-CorrDiff Framework for Volumetric Brain Tumor Segmentation

Accurate brain tumor segmentation from multiparametric magnetic resonance imaging (MRI) is critical for treatment planning, response assessment, and quantitative neuro-oncology research. However, automated segmentation remains a difficult task in computer vision because of variation in tumor appearance and MRI protocols across patient scans. Moreover, clinically important regions such as enhancing tumor (ET) and tumor core (TC) are often small relative to the full brain volume, furthering increasing the difficulty of achieving high voxel-level precision. In this paper, we show that combining a modern 3D convolutional segmentation model with corrective diffusion-based refinement and ensembling improves volumetric glioma segmentation on the UTSW-Glioma dataset. We propose CoMNeT, a MedNeXt-CorrDiff framework that uses four MRI modalities as input and predicts ET, TC, and whole tumor (WT) regions for automated brain tumor segmentation. MedNeXt is used as the primary segmentation model with Global Response Normalization for feature learning, while CorrDiff is trained as a postprocessing residual refinement method to correct errors in the probability maps before final thresholding. Using five-fold cross-validation, CoMNeT achieved the highest Dice score for most tumor regions, with ET, TC, WT, and average Dice scores of 0.7543 +/- 0.0261, 0.6806 +/- 0.0166, 0.9049 +/- 0.0128, and 0.7798 +/- 0.0184, respectively. CoMNeT outperformed two selected baseline models: SegResNet (0.7555 +/- 0.0190 average Dice) and standalone MedNeXt (0.7697 +/- 0.0154 average Dice). Our findings support the use of corrective diffusion and fold-level probability ensembling as practical additions to existing state-of-the-art 3D convolutional models for automated glioma segmentation.

17.
medRxiv (Medicine) 2026-06-12

Effect of tenofovir on the outcomes of COVID-19 in persons with chronic hepatitis B: a nationwide cohort study in Sweden.

Background: Patients with chronic hepatitis B (CHB) may have an increased risk of severe COVID-19. Tenofovir has been hypothesized to confer protection against severe disease, but evidence is inconclusive. We evaluated the risk of severe COVID-19 among CHB patients treated with tenofovir compared with other nucleos(t)ide analogues (NAs). Methods and findings: In this nationwide, registry-based cohort study, we included all adults with CHB and laboratory-confirmed COVID-19 in Sweden between February 2020 and July 2022. Data from national health and socioeconomic registers were linked using unique personal identification numbers (PINs). Patients with HIV, hepatitis C, or hepatitis D coinfection were excluded. Exposure was defined as tenofovir versus other NA therapy. The primary outcome was severe COVID-19, defined as hospitalization >2 days or death within 30 days of diagnosis. Logistic regression was used to estimate adjusted odds ratios (aOR) with 95% confidence intervals (CI), controlling for age, sex, comorbidities, vaccination, socioeconomic status, and region of birth. Among 5,877 CHB patients with COVID-19, 672 were receiving NA therapy (437 tenofovir, 235 other NAs). Severe COVID-19 occurred in 8.0% of tenofovir-treated patients and 14.5% of those receiving other NAs (unadjusted OR 0.52; 95% CI, 0.31-0.85). After adjustment, the association was attenuated and no longer significant (aOR 0.72; 95% CI, 0.39-1.31). Older age, comorbidities, and unvaccinated status were strongly associated with severe disease. Conclusions: The apparent protective effect of tenofovir against severe COVID-19 in unadjusted analyses was largely explained by confounding factors. The risk of severe disease was primarily driven by age, comorbidities, and vaccination status. Prevention of severe COVID-19 in patients with CHB should instead focus on vaccination and management of comorbidities.

18.
arXiv (CS.CL) 2026-06-19

FineREX: Fine-Tuned NER-RE for Human Smuggling Knowledge Graphs

Court proceedings contain valuable evidence about human smuggling networks, but this information is often buried within unstructured, jargon-heavy legal documents. While large language models (LLMs) can support knowledge graph construction through automated information extraction, existing approaches rely on general-purpose models that are not tailored to the entity and relationship definitions required in this domain. We introduce FineREX, a streamlined knowledge graph construction pipeline built around a fine-tuned LLM for named entity recognition and relationship extraction (NER-RE). Using a manually annotated dataset of $512$ text chunks, FineREX achieves absolute improvements of 15.50% and 31.46% in entity and relationship F1-score, respectively, compared to a larger general-purpose baseline. These gains translate into higher-quality knowledge graphs, reducing legal noise by nearly half and lowering node duplication on long documents from 17.78% to 11.17%. By eliminating document rewriting and redundant extraction stages, FineREX also reduces end-to-end processing time by 50.0%. Our results demonstrate that domain-specific fine-tuning can substantially outperform larger general-purpose models while improving both the quality and efficiency of knowledge graph construction for illicit network analysis.

19.
medRxiv (Medicine) 2026-06-19

Rumination as a cognitive vulnerability factor in perinatal bereavement: evidence from the CARING study

Purpose. Perinatal loss is associated with a high risk of persistent psychological distress, including prolonged grief, depression, anxiety, and post-traumatic stress symptoms. Cognitive processes such as rumination may play a crucial role in maintaining and amplifying distress following loss, yet their specific contribution in perinatal bereavement remains underexplored. Methods. The CARING (Cognitive Analysis and Rumination INvestigation in perinatal Grief) study employed a cross-sectional design involving 298 parents who experienced perinatal loss within the previous five years. Participants completed an anonymous online survey including measures of depressive rumination (Ruminative Response Scale, RRS), angry rumination (Anger Rumination Scale, ARS), perinatal grief (Perinatal Grief Scale, PGS), general psychopathology (SCL-90), and post-traumatic stress symptoms (NSESSS). Non-parametric analyses were conducted to examine associations between rumination patterns and psychological outcomes. Results. Higher levels of rumination were significantly associated with greater perinatal grief, depressive and anxiety symptoms, and post-traumatic stress. Depressive rumination showed consistently stronger associations with all outcomes compared to angry rumination. Participants presenting both depressive and angry rumination exhibited the highest levels of grief intensity, psychological distress, and PTSD symptoms, suggesting a graded relationship between rumination patterns and severity of distress. Rumination levels were not significantly associated with gestational age at loss or with having received psychological support. Conclusions. Rumination, particularly in its depressive form, appears to function as a transdiagnostic cognitive vulnerability factor in perinatal bereavement. These findings highlight rumination as a potential target for early screening and tailored psychological interventions aimed at reducing long-term distress following perinatal loss.

20.
arXiv (CS.AI) 2026-06-17

When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval

arXiv:2606.17220v1 Announce Type: new Abstract: Legal case retrieval remains challenging due to the complexity of legal language and the need for precise lexical alignment between queries and relevant cases. Although dense retrieval models have achieved notable progress, empirical studies show that BM25 continues to serve as a strong baseline in this domain. It motivates us to propose a self-evolving framework for rule-driven query rewriting that enhances BM25 without any parameter training. The framework equips an LLM-based agent with an automatic evaluation environment, enabling it to iteratively create rewriting rules, plan validation experiments over rule combinations, and eliminate ineffective rules based on historical feedbacks. We evaluate our method on the Chinese legal case retrieval benchmark LeCaRD-v2. Experimental results demonstrate that the proposed framework outperforms non-evolutionary baselines, including human-designed rules and greedy rule selection, particularly when powered by a highcapacity core LLM. We also conduct detailed analyses to investigate the mechanisms underlying self-evolution. Our findings reveal that LLM's capabilities to leverage previous experimental results and its intrinsic knowledge of rule elimination play critical roles in refining the rule set via self-evolution.

21.
arXiv (CS.LG) 2026-06-16

Simulation-Augmented Multi-Step Split Conformal Prediction for Aggregated Forecasts

arXiv:2606.16356v1 Announce Type: new Abstract: We study uncertainty quantification for aggregated forecasting tasks such as annual totals and year-over-year growth rates. We propose SA-MSCP, a simulation-augmented multi-step split conformal method that generates future paths from cross-validated residuals using a block bootstrap and constructs prediction intervals from empirical quantiles. Experiments show that SA-MSCP improves empirical coverage over a simulated-path baseline for aggregated and growth-rate targets. Our results demonstrate that simulation-enhanced conformal calibration is an effective and general framework for uncertainty quantification in aggregated time-series forecasting.

22.
arXiv (CS.AI) 2026-06-17

An Evaluation of Data Leakage Risks in Tool-Using LLM Agents in Realistic Scenarios

arXiv:2606.17114v1 Announce Type: cross Abstract: AI agents are increasingly being adopted in enterprise and personal settings with access to emails, databases, documents, and other tools where they can read, update, and disseminate sensitive information. Much of prior research on data leakage risks in agents has focused on adversarial data exfiltration through prompt injections and jailbreaks. However, sensitive information may also be exposed during non-adversarial use, creating leakage risks even when users issue benign requests. We report a joint evaluation by the Singapore AI Safety Institute and the Korea AI Safety Institute examining agent data leakage in 12 realistic, non-adversarial tasks spanning customer support, DevOps, web automation, and enterprise and personal productivity. The evaluation covers five risk types: lack of data awareness, audience awareness, policy compliance, data minimization, and access-boundary awareness. Both institutes tested a common set of scenarios mirroring real-world deployments using independent testing environments and task-specific LLM-judge rubrics. Across the three tested agents, none achieved fully correct and fully safe execution across all scenarios. Successful task completion often coincided with data-handling failures such as accessing unnecessary information or disclosing information to inappropriate recipients, indicating that capability and data-handling safety should be evaluated separately. Qualitative review also revealed claim-action mismatches, simulation-aware behavior, user-simulator role reversal, and interpretation gaps in automated judging. Overall, the results indicate that operational data leakage is a first-order agent-safety concern distinct from adversarial exfiltration and provide a methodology for future evaluations of agent data-handling safety.

23.
medRxiv (Medicine) 2026-06-16

Validation of a Smartphone-Image-Based Computer-Vision Model for Lean Mass and Body Fat Estimation Against Dual-Energy X-ray Absorptiometry

Introduction Body composition, rather than body weight alone, is an increasingly important health metric, and preservation of lean mass has become a central concern in obesity treatment, aging, and chronic disease management. Dual-energy X-ray absorptiometry (DXA) provides accurate assessment of fat and lean tissue, but its cost and logistical requirements limit repeated measurement. Computer-vision approaches show promise for estimating adiposity from smartphone images, but lean-mass estimation remains less established. Methods We evaluated a computer-vision body composition model, applied to consumer-grade smartphone photographs, against DXA in a held-out validation sample of 195 adults from an ongoing cross-sectional study. Body fat percentage and total lean mass percentage were co-primary outcomes; for total lean mass percentage, an image-only configuration (no added covariates) was pre-specified as primary. Agreement was quantified using Lin's concordance correlation coefficient (CCC) as the lead statistic, with Pearson correlation, mean absolute error, root mean square error, mean bias, and Bland-Altman limits of agreement. In secondary analyses, appendicular lean mass and total lean mass percentage were each estimated with and without routine anthropometric and demographic inputs (body weight, height, age, and sex). Results Total lean mass percentage agreed with DXA from image features alone (CCC 0.916). Body fat percentage, estimated with routine inputs added, agreed at least as closely (CCC 0.930). Adding routine inputs barely changed agreement for total lean mass percentage but markedly improved it for appendicular lean mass, an absolute quantity that scales with body size. Conclusions A smartphone-image-based model estimated both body fat and lean mass with strong agreement to DXA, with lean mass percentage from image features alone. The approach needs no fixed equipment or ionizing radiation. Whether it can track change over time, including in incretin-based weight loss where lean mass preservation is a concern, was not assessed in this cross-sectional study.

24.
arXiv (CS.AI) 2026-06-17

ZIVARI-TLBO: A Zero-Cost Inter-Group Evaluated-Elite Relay Mechanism for Teaching-Learning-Based Optimization

arXiv:2606.17087v1 Announce Type: cross Abstract: ZIVARI-TLBO is a grouped Teaching-Learning-Based Optimization (TLBO) method that augments an existing population-state controller with a fixed inter-group evaluated-elite relay. At each scheduled event, every group offers its already evaluated elite to the next group in a fixed ring; the elite replaces the receiver's worst eligible learner only when its stored objective value is better. Because the exact relay copies an already evaluated solution and its stored fitness, it requires no additional objective-function calls. The frozen gts-v4-cm-fixed implementation is evaluated under equal 10,000-evaluation budgets on eight classical functions at dimensions 10, 30, 50, and 100, with 30 matched seeds, and on five constrained engineering problems. A direct ablation against the same grouped landscape-aware controller without relay records 728/11/221 wins/ties/losses and a rank-biserial effect size of 0.624 across dimensions. In an eight-method multidimensional comparison, WOA obtains the best average rank (2.914) and ZIVARI-TLBO ranks second (3.382); ZIVARI-TLBO significantly outperforms TLBO, MCTLBO, DE, PSO, and GWO, loses significantly to WOA, and is not significantly different from HHO after Holm adjustment. Feasibility-aware engineering results are mixed and sensitive to the current static-penalty formulation. The evidence supports a scoped relay contribution and budget-consistent information-sharing mechanism, but not universal state-of-the-art, global-convergence, engineering-dominance, or CEC superiority claims.

25.
arXiv (CS.CV) 2026-06-24

MapReason-OSM: Can Vision-Language Models Make Graph-Verifiable Mobility Decisions from Street Maps ?

Vision-language models (VLMs) are increasingly used to read maps for logistics, delivery, and accessible navigation, where the output is an actionable decision (a route, a pin, a parking choice) that must respect the road network. Yet most map benchmarks grade free text or multiple-choice answers that cannot be verified against the underlying graph. We present MapReason-OSM, a benchmark and evaluation harness for graph-verifiable mobility decisions on self-rendered OpenStreetMap panels. We render fixed-style maps for ten U.S. downtowns at two aligned zoom scales, overlay a consistent marker grammar, and pair each panel with a hidden street graph and exact oracles, yielding 6,000 instances (12,000 panels across the two zooms) over 12 routing, facility-location, and visual disambiguation tasks. Models return structured decisions that we snap back to the graph and score for validity, legality, optimality, and constraint satisfaction, plus cross-zoom consistency. Across seven VLMs, models read maps and route simply but fail at graph cost reasoning (single-facility pin placement is near chance even for frontier reasoning models), and are frequently scale-inconsistent. We release the benchmark, harness, and deterministic generator. Code and data: https://github.com/Vi-Sri/mapreason-osm