Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-25

Deep Neural Networks with Ordinal Loss for Medical Applications

arXiv:2606.25769v1 Announce Type: new Abstract: In many prediction problems in medical applications, target labels exhibit an inherent ordinal structure, where class ordering reflects clinically meaningful severity levels. The cost associated with misclassification is often non-uniform and asymmetric, as errors between distant ordinal categories may have substantially more severe consequences than errors between adjacent ones, and overestimating disease severity may have different clinical implications than underestimating it. Traditional loss functions such as multi-class cross-entropy treat all misclassifications equally and fail to incorporate this ordering information. Recent advances in ordinal regression aim to address this limitation by integrating rank-based structures into deep learning models. In this work, we introduce the Ordinal Cross-Entropy (OCE) framework, a general and architecture-independent approach for learning from ordinal data. The proposed method extends the standard cross-entropy formulation to account for misclassification severity through an ordinal cost matrix while preserving the probabilistic interpretation and optimization benefits of the conventional loss. We provide a theoretical analysis of the OCE gradient behavior and show that it yields smoother optimization dynamics and improved ordinal consistency. Experiments on benchmark datasets show that our method achieves lower prediction error costs and better calibration compared to existing state-of-the-art ordinal approaches, establishing OCE as a simple yet effective solution for ordinal regression in deep neural networks.

02.
arXiv (CS.CL) 2026-06-25

Sarashina2.2-TTS: Tackling Kanji Polyphony in Japanese Speech Generation via Data Scaling and Targeted Data Synthesis

While large language model (LLM)-based text-to-speech (TTS) systems have achieved high-quality speech synthesis, most existing systems focus on English and Chinese. Japanese, however, remains under-explored, and its unique linguistic challenges, such as widespread context-dependent kanji polyphony, have yet to be adequately tackled. Here we introduce Sarashina2.2-TTS (https://github.com/sbintuitions/sarashina2.2-tts), a Japanese-centric LLM-TTS system that tackles these challenges through a dual approach: data strategy and evaluation methodology. First, we scale training to approximately 361k hours of speech, incorporating a balanced mix of Japanese and English data. Furthermore, we design a targeted data augmentation pipeline covering all 2,136 Joyo (regular-use) kanji designated by Japan's Agency for Cultural Affairs to efficiently address kanji polyphony disambiguation. Second, we introduce the Joyo Kanji Yomi Benchmark (https://github.com/sbintuitions/JoyoKanji-Yomi-Benchmark), covering all 2,136 Joyo kanji and their 4,378 readings. Alongside this benchmark, we propose Kana-CER, a metric that compares synthesized speech against reference readings in the kana space, eliminating orthographic variations to directly measure pronunciation correctness. Experiments demonstrate that our targeted data augmentation significantly improves reading accuracy. Overall, Sarashina2.2-TTS achieves state-of-the-art kanji-level reading accuracy and matches top baselines on general sentence-level pronunciation, while delivering the highest speaker similarity in zero-shot Japanese speech synthesis. Furthermore, cross-lingual evaluation reveals that Sarashina2.2-TTS is the only system that maintains stable Japanese pronunciation regardless of the prompt language, confirming that our balanced training approach improves cross-lingual robustness.

03.
Nature (Science) 2026-06-17

Confined migration induces non-lethal DNA damage in developing neurons

Authors:

Migratory cells tend to have soft nuclei that deform and penetrate narrow spaces1,2. Extensive nuclear deformation during migration can cause nuclear-envelope rupture and DNA damage in cancer cells, which may contribute to malignant transformation during tumour progression3–6. However, the importance of DNA damage in physiological migration is less well understood. Here we demonstrate that the migration of neurons in developing cerebral and cerebellar cortices is accompanied by massive DNA double-stranded breaks (DSBs) due to mechanostress during passage through narrow interstitial spaces. In contrast to many other migratory cells, these DSBs occur without detectable nuclear envelope rupture. Confined migration increases topoisomerase-IIβ covalently bound DSBs, and these lesions are repaired through non-homologous end-joining during brain development without causing cell death. Genome sequencing revealed that DSBs tend to occur at transcriptionally inactive regions. The deletion of ligase IV at the onset of neuronal migration leads to persistent DSB accumulation in cerebellar neurons with moderate transcriptional changes in genes related to synaptic function, neuronal development and stress and immune responses. The mutant mouse develops mild motor deficits in later life, suggesting that the DNA damage generated during normal brain development poses a potential disease risk if left unrepaired. The migration of neurons in developing cerebral and cerebellar cortices is accompanied by massive DNA double-strand breaks due to mechanostress during passage through narrow interstitial spaces.

04.
arXiv (CS.CV) 2026-06-19

S-Agent: Spatial Tool-Use Elicits Reasoning for Spatial Intelligence

Real-world spatial intelligence requires reasoning over a continuous and evolving 3D world, yet existing VLMs and tool-augmented agents largely remain tied to static, stateless inference from isolated visual observations. We introduce \textsc{S-Agent}, a spatial tool-use agentic paradigm for understanding and reasoning over continuous multi-view images and videos. By formulating spatial reasoning as spatio-temporal evidence accumulation rather than isolated frame-level prediction, \textsc{S-Agent} reshapes spatial perception into scene-centric understanding beyond frame-centric recognition. Specifically, \textsc{S-Agent} casts the VLM as a semantic planner that decides what evidence is needed, while a hierarchy of spatial tools and experts grounds objects in 2D, lifts them into 3D geometric evidence, and aggregates this evidence into high-level spatial knowledge (e.g., counting, measurement, orientation, and relative position). Additionally, a temporal memory mechanism, including Scene Memory for maintaining the evolving scene state and Agent Memory for accumulating reasoning context, enables evidence integration across frames and reasoning steps. Comprehensive experiments on multi-view and video spatial reasoning benchmarks show that \textsc{S-Agent} consistently improves both open-source and closed-source VLMs in a training-free manner. Beyond inference-time augmentation, supervised fine-tuning (SFT) on \textsc{S-Agent}-generated spatial trajectories \textsc{S-300K} yields \textsc{S-Agent-8B}, a compact spatial agent that significantly surpasses similar-scale baselines (e.g., Qwen3-VL-8B) and performs comparably to advanced closed-source models (e.g., GPT-5.4 and Gemini 3).

05.
arXiv (quant-ph) 2026-06-25

Sp(2N, R) interferometry in multi-mode Gaussian bosonic systems for optimal metrology and quantum control

arXiv:2606.25768v1 Announce Type: new Abstract: Multi-mode interferometers for bosons in Gaussian states are important systems for quantum metrology with precision beyond the standard quantum limit and for bosonic quantum computing. However, there is a lack of theoretical foundation for generic $N$-mode Gaussian interferometry. In this work, we study quantum metrology and quantum control in multi-mode bosonic systems with quadratic Hamiltonians, exploiting the fundamental Sp$(2N,R)$ symmetry of such interferometers. We show that the optimal quantum control to maximize sensitivity requires aligning squeezing and displacement in the same direction. We propose Sp$(2N,R)$ echo, a multi-mode generalization of the SU$(1,1)$ interferometry, to achieve the sensitivity of phase estimation set by the quantum Fisher information. In addition, we introduce a geometrical means for reversing many-body dynamics with Sp$(2N,R)$ dynamical symmetry, such as dynamics of the bosonic Kitaev chain. Our schemes are readily realizable in optical, atomic, and mechanical platforms.

06.
PLOS Medicine 2026-06-23

Multi-omics biomarkers of endothelial dysregulation preceding chronic lung allograft dysfunction: A prospective cohort study

Authors:

by Giulia Iacono, Christina Begka, Bailey Cardwell, Carmel Daunt, Roxanne Chatzis, Celine Pattaroni, Alana Butler, Matthew Macowan, Bronwyn Levvey, Gregory I. Snell, Glen P. Westall, Benjamin J. Marsland Background Long-term survival of lung transplant recipients remains limited by chronic lung allograft dysfunction (CLAD). CLAD is only diagnosed following a persistent and substantial decline in lung function, after which irreversible damage to the lungs has occurred, limiting opportunities to effectively intervene at an early stage. There is a critical need for earlier detection prior to its clinical manifestation. The immunological drivers of CLAD remain unclear, limiting the development of predictive biomarkers and new therapies. Methods and findings In this hypothesis-generating, prospective cohort study, we profiled the microbial, metabolic, lipidomic, and gene expression dynamics of longitudinally collected broncho-alveolar lavages (BALs) from 56 CLAD-free lung transplant recipients up to 30 months post-transplant, and compared BALs from 13 CLAD-free patients to BALs from 13 patients who developed CLAD. In CLAD-free patients, the first 6 months post-transplant were hallmarked by diminished microbial diversity and increased abundance of Staphylococcus and Candida, coupled with upregulated innate and adaptive immune responses, and elevated nitric oxide metabolism (FDR 

07.
arXiv (CS.CV) 2026-06-17

Mordal: Automated Pretrained Model Selection for Vision Language Models

Incorporating multiple modalities into large language models (LLMs) is a powerful way to enhance their understanding of non-textual data, enabling them to perform multimodal tasks. Vision language models (VLMs) form the fastest growing category of multimodal models because of their many practical use cases, including in healthcare, robotics, and accessibility. Unfortunately, even though different VLMs in the literature demonstrate impressive visual capabilities in different benchmarks, they are handcrafted by human experts; there is no automated framework to create task-specific multimodal models. We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using $8.9\times$–$11.6\times$ lower GPU hours than grid search. We have also discovered that Mordal achieves about 69\% higher weighted Kendall's $\tau$ on average than the state-of-the-art model selection method across diverse tasks.

08.
arXiv (CS.CL) 2026-06-17

Non-Autoregressive Minimum Bayes' Risk Decoding for Fast Speech Recognition

Non-autoregressive (NAR) decoding generates output tokens in parallel, making speech recognition faster than autoregressive decoding, which generates them sequentially from left to right. However, the recognition performance is degraded because NAR decoding cannot resolve uncertainty by conditioning on previously generated tokens. To address this issue, we propose a novel NAR decoding framework based on minimum Bayes' risk (MBR) decoding, termed NAR-MBR decoding, that maximizes the expected utility calculated from samples drawn from the output probability of an NAR model rather than maximizing the output probability. Notably, by leveraging the nature of NAR models, multiple samples are obtained efficiently with a single forward computation. Our experiments across LibriSpeech, Switchboard, AMI, and web presentation corpus demonstrated that our NAR-MBR decoding outperformed previous NAR decoding and ran faster than AR decoding.

09.
arXiv (CS.CL) 2026-06-17

EComAgentBench: Benchmarking Shopping Agents on Long-Horizon Tasks with Distributed Hidden Intent

As LLM-based shopping agents enter production, existing benchmarks fail to capture how a shopper's requirements arrive: stated implicitly in the query, recorded in a profile, or revealed only when the right question is asked. Benchmarks that expose full intent upfront and grade only the final choice can neither pose this long-horizon challenge nor explain which requirement an agent missed. To address this gap, we introduce EComAgentBench, a benchmark of 662 tasks grounded in real Amazon products and reviews. Each task scatters these requirements across a visible query, a tool-gated profile, and scripted clarification; an agent must uncover hidden intent, verify candidates against attributes and review evidence, and commit to a single product within 100 tool calls. Moreover, typed, source-tagged rubrics grade every task, attributing each failure to a requirement and its source. Construction is automated yet reliable, with every answer fixed in code before any text is generated and every sample validated. Our evaluation of seven models reveals that even the strongest attains only 57.1% overall accuracy, and rubric satisfaction degrades from visible to hidden sources. Overall, we believe EComAgentBench will serve as a reproducible foundation for moving shopping agents from single-query search toward dependable assistance over long horizons.

10.
arXiv (CS.CV) 2026-06-12

MaskWAM: Unifying Mask Prompting and Prediction for World-Action Models

World Action Models (WAMs) present a promising paradigm for robotic control via video prediction. However, current WAMs suffer from fundamental spatial bottlenecks: standard text inputs introduce referential ambiguity in cluttered scenes, while unstructured RGB predictions lack semantic grounding and remain biased by task-irrelevant backgrounds. To overcome these limitations, we introduce MaskWAM, an object-centric world-action model. By jointly integrating masks as both explicit inputs and predictions via a unified Mixture of Transformers (MoT), MaskWAM unlocks robust policy generalization. This design provides two key benefits: (1) predicting future masks yields object-centric semantic supervision that suppresses visual noise, significantly enhancing even standard text-conditioned WAMs; and (2) coupling this predictive supervision with first-frame visual prompts, such as target object masks, establishes a precise spatial anchor that substantially reduces language ambiguity. Crucially, as WAMs are inherently vision-driven architectures, direct mask conditioning yields substantially stronger guidance than text alone, establishing a precise and robust paradigm for manipulating unseen objects. Evaluations on LIBERO, RoboTwin, and real-world tasks demonstrate that MaskWAM significantly outperforms baselines in both language-clear and language-ambiguous tasks.

11.
arXiv (CS.AI) 2026-06-17

Temporal Motif-aware Graph Test-time Adaptation for OOD Blockchain Anomaly Detection

arXiv:2605.29526v2 Announce Type: replace-cross Abstract: Ever-evolving transaction patterns have significantly hindered anomaly detection on emerging cryptocurrency blockchains due to the vast number of addresses and diverse anomalous behaviors. Recently, advanced Graph Anomaly Detection (GAD) approaches applied to blockchains have faced two critical challenges: adversarial pattern evolution by malicious actors and the out-of-distribution (OOD) problem caused by varied transaction semantics on blockchains. To address these challenges, we propose a novel framework termed TEmporal Motif-aware Graph Test-Time Adaptation (TEMG-TTA). First, we comprehensively capture the 3-node temporal motif distribution of each active address using an efficient computational mechanism, enabling downstream temporal motif-aware graph learning. Second, we design a simple yet effective test-time adaptation strategy to facilitate the sharing of common patterns between training and testing graphs. Extensive experiments on 5 real-world datasets demonstrate that our proposed TEMG-TTA outperforms state-of-the-art GAD approaches by an average of 54.88\%. A further case study on interpretable motif patterns reveals that TEMG-TTA explicitly characterizes the complex transaction patterns of anomalous addresses, thereby verifying the effectiveness of our technical designs. Our code is publicly available at https://github.com/LuoXishuang0712/TEMG-TTA/.

12.
bioRxiv (Bioinfo) 2026-06-24

BATTLE-AMP: Benchmarking Antimicrobial Peptide Predictors

As antimicrobial resistance outpaces antibiotic development, antimicrobial peptides (AMPs) have emerged as a promising class of alternative antibacterials, and computational predictors are increasingly used to prioritize AMP candidates. Such predictors are typically evaluated on binary AMP/non-AMP classification, which does not test whether they can identify peptides with clinically relevant potency against specific pathogens. We present BATTLE-AMP, a benchmarking framework that evaluates AMP predictors against experimentally measured minimum inhibitory concentrations (MICs) across clinically relevant bacterial species and strains. We surveyed 48 published methods, finding fewer than 25% reproducible, and benchmarked 10 model families (21 variants) using experimental MIC data, synthetic sequence perturbations, activity cliff analyses, and all-atom molecular dynamics (MD) simulations. Four findings emerge: (i) models trained on MIC data outperform binary classifiers regardless of architecture; (ii) the best model depends on the target pathogen, so model selection must be guided by the biological question; (iii) most models cannot distinguish active peptides from inactive sequences with identical amino acid composition; and (iv) activity cliffs remain unresolved by both machine learning and MD, marking a limit of current computational methods. BATTLE-AMP is released as an open Snakemake framework at https://github.com/szczurek-lab/battleamp-snakemake for benchmarking new models and scoring novel candidate libraries.

13.
arXiv (CS.AI) 2026-06-19

SPOT-E: Test-Time Entropy Shaping with Visual Spotlights for Frozen VLMs

arXiv:2606.20244v1 Announce Type: cross Abstract: Vision-language models (VLMs) often underperform on evidence intensive tasks because decisive visual evidence are small, localized, and easy to overlook, leading to failures in evidence readout even when high-level reasoning is intact. Prior inference-time visual interventions can improve grounding without retraining, but they are largely open-loop and lack a mechanism to verify whether highlighted evidence is actually used. We study answer-span prediction entropy as a model-internal feedback signal and show that naive entropy minimization is ambiguous, since low entropy may arise from evidence-grounded confidence or shortcut collapse. To resolve this ambiguity, we introduce low-entropy anchors and an entropy-shaping objective that reduces answer uncertainty while preserving baseline high-confidence tokens. We instantiate this principle in SPOT-E, a plug-and-play test-time method that produces question-conditioned spotlights, optimized per instance via light-weight tuning based on Group Relative Policy Optimization (GRPO). Across all benchmarks and different VLM families, SPOT-E yields consistent gains and improved robustness under visual corruptions. Code is publicly available at: \url{https://github.com/YinBo0927/SPOT-E}

14.
arXiv (CS.LG) 2026-06-17

Multi-Adapter PPO: A Cross-Attention Enhanced Wavelength Selection Framework for LIBS Quantitative Analysis

arXiv:2606.17476v1 Announce Type: new Abstract: Laser-induced breakdown spectroscopy (LIBS) quantitative analysis faces critical challenges in wavelength selection due to high-dimensional spectral data and the fundamental trade-off between prediction accuracy and feature efficiency. This paper presents a novel Multi-Adapter PPO framework that transforms wavelength selection into a reinforcement learning problem, leveraging cross-attention mechanisms and multiple specialized adapters to capture complex spectral relationships. Our approach outperforms traditional Particle Swarm Optimization (PSO) by an average of 28.4\% in comprehensive score and 45.2\% in prediction accuracy across steel and coal datasets. The proposed method demonstrates superior performance in balancing prediction accuracy with feature efficiency, achieving state-of-the-art results in LIBS quantitative analysis while maintaining interpretability and computational efficiency. We released our code and dataset here: https://github.com/Hflying/MAPPO

15.
arXiv (CS.AI) 2026-06-19

FreeStyle: Free Control of Style-Content Dual-Reference Generation from Community LoRA Mining

arXiv:2606.20506v1 Announce Type: cross Abstract: Style-content dual-reference generation aims to synthesize an image that preserves the structure and semantics of a content reference while adopting the style of a separate style reference.Despite recent progress, this setting remains challenging because models must balance content fidelity, style alignment, and instruction following avoiding semantic leakage from the style reference.A key bottleneck is the lack of large-scale triplet data with clean content-style separation and broad long-tail style coverage.In this work, we propose FreeStyle, a scalable dual-reference generation framework based on community LoRA mining.We treat community LoRAs as compositional anchors for style and content, and design a rigorous generation and filtering pipeline to construct large-scale Style-Reference and Content-Reference triplets across multiple base models.To address content leakage, we adopt a two-stage curriculum with stage-specific disentanglement mechanisms: an attention-level enrichment constraint that suppresses style-reference leakage in the style-transfer stage, and a frequency-aware RoPE modulation strategy that targets positional-correspondence-based leakage in the harder dual-reference stage.We also introduce a benchmark covering both style-reference and dual-reference generation, with evaluations on style similarity, content preservation, aesthetics, instruction following, and leakage rejection. The benchmark incorporates a style-invariant Content Alignment Score (CAS) and introduces a calibrated VLM-based Rejection Score for evaluating generation reliability and leakage suppression.Extensive experiments show that our model achieves a strong balance among style alignment, content preservation, and leakage suppression.

16.
arXiv (CS.AI) 2026-06-12

Divination by Prompt: LLM-Mediated Xuanxue on Chinese Social Media

arXiv:2606.12418v1 Announce Type: cross Abstract: The rapid proliferation of large language models (LLMs) has produced a striking cultural practice: using conversational AI for divination. This paper offers one of the first systematic studies of LLM-mediated divination in the context of Xuanxue, an internet-native umbrella term for mystical and spiritual practices on Chinese social media. Using a mixed-methods design, we analyze 23000+ posts and comments from Xiaohongshu and conduct 32 semi-structured interviews with users and professional diviners. Users primarily consult LLMs about pragmatic concerns - romantic relationships, careers, exams, and in-game gacha draws - via two intersecting pathways: trend-driven curiosity enabled by viral visibility and zero-cost access, and event-driven anxiety under conditions of uncertainty. A defining feature is collaborative prompt refinement, which turns users into active prompt engineers. Among commenters expressing a clear stance, perceived efficacy skews positive, with "accuracy" often justified through biographical fit and retrospective confirmation, consistent with Barnum and confirmation bias. Users also develop verification practices such as repeated trials and cross-model comparison. Professional diviners, by contrast, portray LLMs as lacking the "spiritual power" required for genuine divination, reflecting both ontological commitments and economic boundary-work. We also show how participants navigate tensions between scientific and metaphysical frames when interpreting AI-generated readings. Situating these findings in anthropological and cognitive-evolutionary theories of divination, we argue that LLM divination preserves core functions of traditional practice while introducing scalability, repeatability, and prompt-driven co-production that reshape how divinatory authority is constructed and evaluated.

17.
arXiv (CS.LG) 2026-06-25

Approximating velocity fields with planted attractors via Neural-ODEs for classification purposes

arXiv:2606.23550v2 Announce Type: replace-cross Abstract: In this work, Neural ODEs equipped with a curated collection of equilibrium points have been successfully employed for classification tasks. The planted attractors serve as indicators for the target classes, while the velocity field leveraging the universal approximation capabilities of the architecture shapes the dynamical landscape. This process defines the basins of attraction of the trained model, effectively directing each input (provided as an initial condition) toward its corresponding destination target.

18.
medRxiv (Medicine) 2026-06-18

Rare Coding Variants Reveal Distinct Genetic Architectures Across Multidimensional Sleep Phenotypes

Sleep and circadian traits have been widely studied using common variants, but the contribution of rare coding variation remains unclear. We analyzed rare coding variants in 397,065 whole-exome sequenced UK Biobank participants across 36 sleep phenotypes from self-report, diagnoses, sleep medication use and accelerometry, and meta-analyzed results with 171,536 whole-genome sequenced All of Us participants of diverse ancestries, with replication in the Mass General Brigham Biobank (N = 31,275). We identified 260 genes associated with sleep phenotypes, including novel associations with sleep medication use in 29 genes and 24 out of 29 have not previously been reported with any sleep phenotypes. We observed modest but significant rare variant heritability and strong genetic correlations between sleep medication use, insomnia and fatigue. Temporal gene expression trajectory analyses indicate that genes associated with self-reported sleep traits show constant high prenatal expression, whereas genes linked to sleep medication phenotypes exhibit peak expression in the late prenatal period. These findings highlight distinct biological mechanisms captured by different measurement sources of sleep phenotypes and reveal rare-variant-informed targets for therapeutic discovery.

19.
arXiv (CS.CL) 2026-06-25

MedGuards: Multi-Agent System for Reliable Medical Error Detection and Correction

As Large Language Models (LLMs) are increasingly deployed in healthcare settings, accurate error detection and correction in generated or existing text becomes critical, as even minor mistakes can pose risks to patient safety. Existing methods for error detection and correction, including automated checks and heuristic-based approaches, do not generalize well across unseen datasets. In this paper, we propose MedGuards as a medical safety guardrail, which is a new framework that treats medical error detection and correction as a multi-agent in-context learning task. Specialized agents separately detect, localize, and correct errors, while a confidence-guided arbitration mechanism resolves disagreements using reasoning traces and confidence scores. This design enhances interpretability, robustness, and adaptability, without requiring additional training of the base LLMs. Additionally, we introduce the Keyword-Prioritized Correction Score (KPCS), a new evaluation metric that considers whether critical keywords within the reference text are generated correctly, providing a more comprehensive assessment than conventional metrics. Experiments across four multilingual medical datasets consisting of clinical notes demonstrate significant improvements by the proposed framework across several metrics and models. Our aim is to enable safer deployment of LLMs in real-world healthcare applications. For reproducibility, we make our code publicly available at https://github.com/congboma/MedErrBench.

20.
arXiv (CS.AI) 2026-06-17

Offline Preference-Based Trajectory Evaluation

Authors:

arXiv:2606.17541v1 Announce Type: cross Abstract: Offline evaluation of agentic systems often collapses trajectories to terminal success, discarding information about partial progress and inducing widespread ties, creating substantial statistical inefficiency by reducing effective sample size and weakening the ability to distinguish systems. We propose preference-based trajectory evaluation, which compares trajectories directly through temporal preferences over progress and time-to-return profiles. We find that, across diverse agentic and interactive benchmarks, standard success-based metrics produce tied comparisons on roughly 75% of instances, whereas trajectory-aware preferences reduce ties to roughly 35%, improving discriminative power, ranking stability, and data efficiency. Our results suggest that benchmark saturation, often attributed to poor data collection or problem difficulty, may also be explained by the choice of evaluation measure.

21.
medRxiv (Medicine) 2026-06-22

Spatial Analysis and Multilevel Determinants of Hypertension in Zambia: Analysis of the 2017 WHO STEPS Survey

Background: Hypertension is the leading modifiable cardiovascular risk factor globally, with the fastest-growing burden in low- and middle-income countries. This study aimed to estimate national hypertension prevalence, map provincial patterns, assess spatial clustering, and identify individual and community-level determinants among Zambian adults using the 2017 WHO STEPS survey. Methods: This cross-sectional study used data from the 2017 WHO STEPS survey, a nationally representative sample of 4,301 adults aged 18-69 years. Hypertension was defined as systolic BP [&ge;]140 mmHg, diastolic BP [&ge;]90 mmHg, or current antihypertensive use. Spatial autocorrelation was assessed via Moran's I and LISA. Four nested generalised linear mixed models with PSU-level random intercepts identified individual and community-level determinants. Results: Overall weighted hypertension prevalence was 24.0%. Lusaka recorded the highest prevalence (30.2%), followed by Southern (29.9%) and Muchinga (28.3%) provinces; Western Province had the lowest (12.4%). Spatial clustering was statistically significant but modest (Moran's I = 0.0247, p < 0.001). Between-cluster variation reduced from ICC = 5.9% to 1.8% in the full model, indicating geographic differences were largely explained by individual characteristics. Age was the strongest predictor; adults aged 60-69 had nearly sevenfold higher odds than those aged 18-29 (AOR 6.92, 95% CI: 4.95-9.66). Women had lower odds than men (AOR 0.64, 95% CI: 0.52-0.79). Obesity (AOR 2.34), overweight (AOR 1.65), high cholesterol (AOR 1.40), diabetes (AOR 1.35), and single marital status (AOR 1.34) were independently significant. Western Province showed consistently lower odds than Central Province (AOR 0.48). Conclusion: Hypertension affects one in four Zambian adults, driven primarily by age, sex, obesity, dyslipidaemia, and diabetes. Geographically prioritised interventions, including community health worker-led screening programmes in Lusaka and Southern Province, would maximise population-level impact. Population-level salt reduction and alcohol policies represent cost-effective complementary strategies. Longitudinal studies with finer spatial resolution are needed to clarify causal pathways underlying observed geographic clustering and inform SDG Target 3.4 progress.

22.
arXiv (CS.AI) 2026-06-18

Controllable Quantum Memory Capacity in Quantum Reservoir Networks with Tunable partial-SWAPs

arXiv:2605.12713v3 Announce Type: replace-cross Abstract: In the field of quantum reservoir computing (QRC), many different computational models and architectures have been proposed. From these models, we identify feedback-based models – which use a feedback mechanism to re-embed classical measurements from the QRC – and recurrent models – which use a multi-register approach with memory and readout qubits – as the two major competing architectures that have been discussed and validated on hardware. In this paper, we advance upon the recurrent architectures, which employ a two register approach to endow the QRC with a fading memory. While these approaches have been validated on hardware and have demonstrated great real-world performance on noisy-intermediate-scale-quantum (NISQ) quantum processing units (QPUs), the exact mechanism through which the memory capacity arises is not completely understood or fully controllable. With this, we augment the recurrent approaches and present a hardware-realizable mechanism, which we call a tunable partial-SWAP, that allows for the direct control of the rate of memory dissipation from a QRN implemented on a gate-based QPU. The theory behind this mechanism is discussed in terms of a controlled amplitude-damping channel and validation experiments using a randomized short-term memory capacity (STMC) recall benchmark and the NARMA-5 dataset are conducted using simulation and IBM QPUs, respectively.

23.
arXiv (CS.CL) 2026-06-18

LLM Parameters for Math Across Languages: Shared or Separate?

Large language models (LLMs) exhibit substantial cross-lingual variation in mathematical reasoning performance, but it remains unclear whether these differences reflect language-specific parameters or a shared mechanism that manifests differently by language. We present a cross-lingual mechanistic analysis of mathematical reasoning in LLMs, enabling us to localize and compare model parameters that support mathematical reasoning across languages. We find that the extracted math-associated parameters exhibit partial cross-lingual overlap, with the strongest overlap concentrated in intermediate model layers. We further observe that English consistently produces the largest set of math-relevant parameters, whereas lower-resource languages reveal smaller sets of relevant parameters. These results suggest that math-related behavior in multilingual LLMs is neither fully language-invariant nor fully language-specific, but instead exhibits partial cross-lingual parameter overlap with systematic language-dependent differences.

24.
arXiv (CS.AI) 2026-06-15

From Chatbot to Digital Colleague: The Paradigm Shift Toward Persistent Autonomous AI

arXiv:2606.14502v1 Announce Type: new Abstract: Large Language Models (LLMs) are undergoing a fundamental transformation from conversational generators into integrated AI systems capable of reasoning, action, memory, and self-improvement. We conceptualize this transition as a shift from Chatbot to Digital Colleague: from conversational answers to persistent work. We organize this transition along two tightly coupled dimensions. First, at the cognitive core level, LLMs are advancing from Chatbot-era "fast thinking" systems driven by next-token prediction toward Thinking LLMs that leverage inference-time computation, Chain-of-Thought reasoning, reflection, process supervision, and reinforcement learning to support more deliberate and reliable cognition. Second, at the tool-augmented task execution level, LLMs are progressing from tool-calling Agents that invoke external resources in an ad hoc manner toward OpenClaw-style workstation systems (OpenClaw) equipped with persistent Workspaces, skills, verification loops, and governance. The "Workspace + Skill" paradigm makes episodic tool use colleague-like via state persistence, reusable procedures, task closure, and experience reuse. We examine data construction shifts from instruction-response pairs to State-Action-Observation trajectories and evaluation from static benchmarks to sandboxed, auditable, self-evolving AI ecosystems.

25.
arXiv (CS.CV) 2026-06-18

Posterior Continuation with Noise-Conditioned Frequency Exposure for Diffusion Inverse Problems

Diffusion posterior sampling solves inverse problems by combining a pretrained diffusion prior with measurement-consistency guidance. However, full-band guidance can be unreliable at high noise levels, where clean estimates contain score-induced errors and high-frequency measurement directions are weakly identifiable. We argue that posterior guidance should expose measurement frequencies according to the instantaneous diffusion noise level. Based on this principle, we propose a posterior continuation framework that constructs a family of intermediate posteriors whose likelihood emphasizes currently reliable frequency bands and gradually returns to full-band consistency. We instantiate this framework with a stabilized sampler that combines a diffusion predictor, frequency-limited likelihood refinement, and a Haar-domain commitment rule that commits reliable coarse corrections while deferring weakly identifiable details. Across super-resolution, inpainting, and deblurring, our method achieves competitive-to-state-of-the-art restoration performance, including up to 5 dB PSNR improvement on motion deblurring over strong baselines in evaluations on FFHQ and ImageNet.