Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-16

AthDGC: An Open Diachronic Greek Treebank with Indo-European Parallels

AthDGC ("Athens-PROIEL") is an open, end-to-end workflow and dataset. It is, to the best of our knowledge, the first openly licensed dependency-parsed treebank of Greek that spans eight diachronic periods, namely Archaic, Classical, Koine, Late Antique, Byzantine, Late Byzantine, Early Modern, and Modern Greek, under a single PROIEL XML 2.0 schema, with verse-level cross-alignment of the New Testament to Latin (Vulgate), Gothic (Wulfila), Old Church Slavonic (Marianus), and Classical Armenian. AthDGC builds on the PROIEL Treebank Family (Haug and Johndal 2008; Eckhoff et al. 2018), which established the schema and the Koine-Greek reference set for the project. Annotation uses the Stanford Stanza PROIEL-trained workflow; sentence-level alignment uses LaBSE, a multilingual sentence-embedding model; word-level alignment uses multilingual-BERT attention through the AwesomeAlign procedure. The v0.4 release provides curated samples and the open-source toolkit; the full annotated corpus partitions remain under v0.5 audit on the Greek national HPC. Quantitative scale, per-witness verse counts, and per-period annotated-row counts are reported in the v0.5 release notes, after the audit pass completes. Concept DOI: 10.5281/zenodo.20439182.

02.
arXiv (CS.LG) 2026-06-16

False Sense of Safety in Selective Signal Classification: Auditing Bound Tightness and Exchangeability for Risk Control

arXiv:2606.15153v1 Announce Type: new Abstract: Selective prediction with distribution-free risk control promises that, with confidence 1-delta over the calibration draw, the error rate of accepted inputs stays below a user budget alpha. We audit this promise on signal-domain detectors – machine anomalous-sound detection (ASD) and AI-generated-image forensics – for four calibration rules: uncertified empirical thresholding (NAIVE) and certified Hoeffding, Clopper-Pearson (CP), and betting (WSR) upper confidence bounds. We report three findings. (i) NAIVE thresholding, common in practice, exceeds its declared budget in 49-73% of synthetic trials (n=200 calibration points) and in up to 68% of real-data splits: a false sense of safety rather than a broken theorem, since the rule never had a certificate. (ii) Tightness matters: CP and WSR certify substantial coverage where Hoeffding certifies none, with zero observed budget overruns under exchangeable splits. (iii) Under grouped deployment (unseen machine types or generators), certified rules overrun in 9-30% of trials – far above delta – showing the failure lies in the broken exchangeability premise, not in the bounds; a conservative per-group threshold restores validity at a severe coverage cost.

03.
arXiv (math.PR) 2026-06-24

Gradient Mean-Field Dynamics with Measure-Valued States: Well-Posedness, Chaos, and Long-Time Stability

arXiv:2606.24385v1 Announce Type: new Abstract: We study a stochastic mean-field interacting particle system whose state space is $\Y = \Tt^d \times \cP(U)$, where the first component represents a spatial variable and the second one is a probability measure over a compact metric space $U$. The dynamics are driven by locally Lipschitz drift operators: the spatial component evolves according to a Brownian diffusion, while the measure-valued component is perturbed by a projected cylindrical noise acting in the Arens–Eells space. We first establish existence and uniqueness of strong solutions for both the $N$-particle system and the associated nonlinear McKean–Vlasov equation under locally Lipschitz and linear growth assumptions on the drift coefficients. We then prove propagation of chaos: as $N\to\infty$, the empirical measure converges in expectation in Wasserstein–1 distance towards the unique McKean–Vlasov solution. Further, we investigate exponential convergence of the nonlinear McKean–Vlasov dynamics towards a unique invariant measure.

04.
Nature Medicine 2026-06-08

Apitegromab for lean mass preservation during tirzepatide-induced weight loss: a randomized, double-blind, placebo-controlled phase 2 trial

Loss of lean mass in proportion to total weight loss is observed with incretin mimetic therapies such as tirzepatide and has the potential to adversely affect health and function. Apitegromab is an investigational, fully human monoclonal antibody that selectively inhibits myostatin activation and is, thereby, capable of increasing muscle mass. In the randomized, double-blind, placebo-controlled phase 2 EMBRAZE study, adults with overweight or obesity (n = 102) were randomized 1:1 to receive tirzepatide plus apitegromab (10 mg kg−1) or tirzepatide plus placebo. At week 24, apitegromab resulted in a least square mean (80% confidence interval (CI)) of 1.9 (1.2−2.7) kg less lean mass loss than placebo (P = 0.001), despite similar total body weight loss between groups, representing a 54.9% retention of lean mass relative to placebo. In participants receiving apitegromab, trough concentrations of apitegromab and total latent myostatin, a pharmacodynamic marker, both increased over time and reached a plateau after approximately 16 weeks. Incidence of adverse events (AEs) (% (95% CI)) was generally similar across apitegromab-treated participants and placebo-treated participants, with 39 of 51 (76% (63−86%)) and 36 of 51 (71% (57−81%)) participants experiencing an AE, respectively. Serious adverse events (SAEs) were balanced and experienced by one of 51 (2% (0−10%)) participants in each arm. In summary, this proof-of-concept study demonstrated that selective targeting of myostatin by apitegromab was well tolerated and effective in preserving lean mass when combined with tirzepatide. ClinicalTrials.gov identifier: NCT06445075 . In the phase 2 EMBRAZE study, participants receiving tirzepatide and apitegromab lost less lean mass compared to participants receiving tirzepatide and placebo.

05.
arXiv (math.PR) 2026-06-11

Sample Path Properties of the Fractional Wiener–Weierstrass Bridge II

arXiv:2606.11994v1 Announce Type: new Abstract: Fractional Wiener–Weierstrass bridges are a class of Gaussian processes obtained by replacing trigonometric functions in the construction of classical Weierstrass functions by fractional Brownian bridges. A number of their sample path properties were derived in Schied–Zhang (2024,2026). The analysis in these papers left several open questions, most of which are addressed here. Specifically, we prove that, in the regime in which the Weierstrass mechanism dominates the underlying fractional Brownian bridge, the limiting $b$-adic variation coefficient has an absolutely continuous distribution and is therefore genuinely random. At the critical point between the two roughness regimes, we establish the power-variation formula and the critical $\Phi$-variation limit conjectured in Schied–Zhang (2024). Finally, we derive the Hausdorff dimension for the graphs of the sample paths by proving a conjecture from Schied–Zhang (2026) for the missing high-Hurst case.

06.
arXiv (CS.CV) 2026-06-16

FUSE: Quantifying Uncertainty in Vision-Language Models by Bayesian Fusing Epistemic and Aleatoric Uncertainty

Vision-language models (VLMs) are playing an increasingly important role across multiple domains. In many applications, such as robotics, it is crucial to quantify the uncertainty in the output of these models. } We develop FUSE, a probabilistic framework for capturing two complementary sources of uncertainty in vision-language modeling: (i) aleatoric embedding-level uncertainty derived from input data vision-language ambiguity, and (ii) epistemic model-level uncertainty estimated from the semantic response diversity of VLMs. Our approach formulates a Bayesian fusion mechanism that analytically combines these uncertainty sources to produce a scalar measure of uncertainty. This measure can be used to reliably predict the model's output correctness for downstream applications. We demonstrate that our method outperforms baselines and achieves SOTA uncertainty calibration.

07.
arXiv (CS.CL) 2026-06-11

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce malicious code. Meanwhile, Grammar-Constrained Decoding (GCD) has been widely adopted to improve the reliability of LLM-generated code by enforcing syntactic validity. In this paper, we reveal a counterintuitive risk: this reliability-oriented technique can itself become an attack surface. We uncover a new jailbreak attack, termed CodeSpear, that exploits GCD to induce LLMs into generating malicious code. Our experiments show that simply applying a benign code grammar constraint can effectively jailbreak LLMs. To address this vulnerability, we propose CodeShield, a safety alignment approach that robustly preserves safe behavior even under attacker-controlled grammar constraints. CodeShield aligns the model in the code modality by teaching it to generate honeypot code under GCD. Such code is semantically harmless, so it does not implement the malicious request, and structurally diverse, so it is difficult to suppress through grammar tightening. At the same time, CodeShield still preserves natural-language refusals when natural language is available. Experiments on 10 popular LLMs across 4 benchmarks show that CodeSpear outperforms representative jailbreak baselines and increases the attack success rate by more than 30 percentage points on average. CodeShield also restores safety under CodeSpear while preserving benign utility. Our findings reveal a fundamental risk of GCD and call for greater attention to its potential security implications.

08.
arXiv (CS.AI) 2026-06-16

MUZZLE: Adaptive Agentic Red-Teaming of Web Agents Against Indirect Prompt Injection Attacks

arXiv:2602.09222v2 Announce Type: replace-cross Abstract: Large language model (LLM) based web agents are increasingly deployed to automate complex online tasks by directly interacting with web sites and performing actions on users' behalf. While these agents offer powerful capabilities, their design exposes them to indirect prompt injection attacks embedded in untrusted web content, enabling adversaries to hijack agent behavior and violate user intent. Despite growing awareness of this threat, existing evaluations rely on fixed attack templates, manually selected injection surfaces, or narrowly scoped scenarios, limiting their ability to capture realistic, adaptive attacks encountered in practice. We present MUZZLE, an automated agentic framework for evaluating the security of web agents against indirect prompt injection attacks. MUZZLE utilizes the agent's trajectories to automatically identify high-salience injection surfaces, and adaptively generate context-aware malicious instructions that target violations of confidentiality, integrity, and availability. Unlike prior approaches, MUZZLE adapts its attack strategy based on the agent's observed execution trajectory and iteratively refines attacks using feedback from failed executions. We evaluate MUZZLE across diverse web applications, user tasks, and agent configurations, demonstrating its ability to automatically and adaptively assess the security of web agents with minimal human intervention. Our results show that MUZZLE effectively discovers 44 new attacks on 4 web applications with 10 adversarial objectives that violate confidentiality, availability, or privacy properties across different LLMs and agent scaffolds. MUZZLE also identifies novel attack strategies, including 3 cross-application prompt injection attacks and an agent-tailored phishing scenario.

10.
arXiv (CS.AI) 2026-06-12

The Internet of Agentic AI: Communication, Coordination, and Collective Intelligence at Scale

Authors:

arXiv:2606.12835v1 Announce Type: cross Abstract: The rapid emergence of autonomous AI agents is transforming artificial intelligence from isolated model inference into distributed systems of reasoning, communication, and action. This paper develops the vision of the Internet of Agentic AI (IoAI): an open ecosystem in which heterogeneous agents discover one another, negotiate responsibilities, exchange context, invoke tools, and execute workflows across cloud, edge, device, organizational, and cyber-physical environments. We synthesize foundations from single-agent agentic AI, multi-agent systems, distributed computing, communication networks, game theory, and security engineering to characterize the architectures and mechanisms required for scalable agent ecosystems. The paper examines agent deployment models, workflow lifecycles, communication protocols, interoperability layers, resource-management challenges, and trust architectures, with case studies in adaptive manufacturing and distributed operational coordination. The resulting framework highlights the central research challenges of controlled emergence, semantic interoperability, secure identity, incentive-compatible coordination, resource-aware orchestration, and governance for large-scale networks of autonomous agents.

11.
arXiv (CS.AI) 2026-06-24

Average Rankings Mask Per-Subject Optimality: A Friedman-Nemenyi Benchmark of EEG Motor-Imagery BCI Decoders

arXiv:2606.24394v1 Announce Type: cross Abstract: Electroencephalography (EEG) is the dominant non-invasive modality for brain-computer interfaces (BCIs), yet reliable decoding of motor imagery is hampered by inter- and intra-individual variability. A recurring claim is that one decoding pipeline, most often a spatial or Riemannian method, is broadly preferable. We test the weakest version of that claim under the most favourable conditions. Using the Mother of All BCI Benchmarks (MOABB) framework, we evaluated 1,056 decoding configurations (feature extractor x scaler x classifier), >340,000 subject-level model fits, across three public left-versus-right motor-imagery datasets (PhysionetMI, 109 participants; Cho2017, 52; Zhou2016, 4) and two frequency bands (8-15 Hz, 8-30 Hz). Every model is fit and tested within a single session of a single participant, the easiest regime, giving every pipeline its best chance. We apply the statistics standard for multi-classifier comparison: Friedman omnibus tests, Nemenyi critical-difference analysis and Wilcoxon signed-rank tests with effect sizes. Covariance tangent-space projection (cov-tgsp) and Common Spatial Patterns (CSP) are the strongest families, but their ordering is dataset-dependent and, on the largest and most heterogeneous cohort (PhysionetMI), statistically indistinguishable (Nemenyi p = 0.27; Kendall's W = 0.11). At the individual level the single best pipeline is optimal for only 35% of PhysionetMI participants, and nonlinear descriptors are best for roughly one third; matching pipeline to participant adds about seven accuracy points over the best fixed choice. The ranking is not an artefact of dimensionality, and classifier and scaler choices are secondary to the feature representation. Even in the easiest regime, no single pipeline dominates: a lower bound on the personalization problem and a quantitative case for participant-aware model selection rather than a universal decoder.

12.
arXiv (CS.AI) 2026-06-12

Exploring How Agent Voice Accents Shape Human-AI Collaboration in K-12 Group Learning

arXiv:2606.12805v1 Announce Type: cross Abstract: Collaboration is widely recognized as a cornerstone of 21st-century education, yet teachers still encounter persistent challenges in fostering productive peer interaction. LLM conversational peer agents introduce new possibilities for mediating in-person group work, raising questions about how persona design, particularly their voice characteristics, shapes learners' perceptions, trust, and interactional dynamics. While prior work has examined agent accent effects in one-to-one settings, little is known about how these effects manifest in groups. We conducted a between-subjects mixed-methods study with 33 teachers examining how a GenAI voice agent with different accents (British, Indian, and African American) influenced collaboration and agent perception. Across surveys, group interaction analyses, and artifacts, we find that accent shaped participants' mental models and the roles the agent assumed in group interaction. The British-accented agent was largely treated as a tool and engaged in detached, utility-based ways, whereas Indian- and African American-accented agents were more readily anthropomorphized and integrated as peers. These role expectations influenced trust, engagement, and reliance over time. This work advances understanding of how GenAI's sociolinguistic design features shape group dynamics in CSCL, with implications for designing culturally inclusive AI partners in group learning.

13.
arXiv (CS.CL) 2026-06-24

Age of LLM: A Strategic 1v1 Benchmark for Reasoning, Diplomacy and Reliability of Large Language Models under Fog of War

Authors:

We introduce Age of LLM, a turn-based 1v1 benchmark in which two LLMs face off on a 13x7 grid to destroy the enemy base. Three stressors are deliberate: fog of war, full diplomacy (messages, ceasefires, ultimatums; uranium kept secret), and a reliability dimension where every turn must follow a strict JSON schema and an illegal action is silently discarded. The engine is private and each match uses a fresh random map seed and opponent, mitigating the data contamination that affects public benchmarks. Models receive a (near) rule-only prompt with no build-order advice (two tactical seed phrases were present during data collection; see Section 2.7). We benchmark 15 reasoning models across 54 matches and 5,258 actions. Findings: (1) the nuclear rush dominates (78% on the rules-coherent v0.11+ sub-corpus; 85% corpus-wide) with a sole-launcher signature that is largely mechanical under secret-simultaneous launch rules, not a cognitive deterrence failure; (2) military conquest is rare but faster (12.3 vs 18.9 turns); (3) diplomacy is prolific yet almost never consummated; (4) ~58% of illegal actions are fog/state errors, making the illegal-action rate a measure of belief-tracking; (5) – the least established, and the only one we label exploratory – a weak link associates reliability with winning. The corpus is small, unbalanced and not side-swapped, so the ranking is a preliminary descriptive view, not a contribution. Beyond ranking, the turn-by-turn traces of actions and messages make the corpus a lens on how LLMs reason under adversarial uncertainty – their belief-tracking, spontaneous deception, and per-model cognitive "personas" – which we frame as a future research direction. We release the replay format, an isometric viewer and all replays; engine source on request.

14.
medRxiv (Medicine) 2026-06-17

LLM-Driven Extraction of NI-RADS and Imaging Tumor Characteristics to Enhance Oropharyngeal Cancer Survivorship Surveillance

Abstract Purpose Radiologic surveillance is essential for oropharyngeal cancer (OPC) survivors, guiding recurrence detection and follow-up strategies. The Neck Imaging Reporting and Data System provides a standardized framework for post-treatment risk reporting at both the primary tumor site (pNI-RADs) and cervical lymph nodes (nNI-RADS). Comprehensive surveillance additionally requires assessment of disease status, including the primary tumor, nodal involvement, and distant metastases. These clinical results are often embedded as unstructured data within free-text radiology reports. We hypothesized that a large language model (LLM) can reliably extract NI-RADS score criteria and summarize key imaging features from unstructured radiology text, achieving high concordance with expert review. Methods Previously untreated OPC patients who received definitive cancer therapy were identified. Eligible imaging reports included post-treatment head and neck CT, MRI, or FDG PET/CT scans containing narrative and impression text. Examinations lacking narrative or impression text, containing pre-existing NI-RADS annotations, or involving non-surveillance imaging modalities were excluded. A total of 200 reports were randomly selected from 7,076 eligible examinations for manual abstraction using a three-reviewer consensus framework to establish a reference dataset. Using the Palantir Foundry Pipeline Builder, a GPT-5-based LLM was deployed to extract pNI-RADS and nNI-RADS scores, and key imaging features of disease status from these reports. Performance was evaluated using exact agreement and F1-based metrics. Results Agreement for no evidence of disease (score of 1) was 93.3% (126/135; F1 = 0.94) and 90.3% (130/144; F1 = 0.93) for pNI-RADS and nNI-RADS, respectively. For NI-RADS [≥]2, exact category agreement was 73.1% (38/52; macro-F1 = 0.75) for pNI-RADS and 64.3% (27/42; macro-F1 = 0.56) for nNI-RADS. Quadratic weighted {kappa} was 0.81 and 0.59, respectively. For post-treatment disease surveillance variables, agreement was 94.9% (149/157; F1 = 0.87) for primary tumor presence, 89.1% (164/184; F1 = 0.87) for nodal disease presence, and 94.7% (126/133; F1 = 0.70) for distant metastasis detection. Specificity was high across disease-status variables (0.95-0.99), with negative predictive values of 0.95 for primary tumor, 0.87 for nodal disease, and 0.99 for distant metastasis. Conclusions Our LLM-based information retrieval and classification approach for radiographic treatment response from unstructured, multidimensional imaging reports achieved high performance for disease exclusion and moderate performance for detecting suspected residual and/or new disease. This pipeline supports scalable and standardized surveillance data capture for longitudinal monitoring, clinical analytics, and survivorship research in head and neck oncology.

15.
arXiv (quant-ph) 2026-06-12

Driven-dissipative entanglement of distant giant atoms

arXiv:2606.13375v1 Announce Type: new Abstract: Quantum interconnects distribute entanglement via controlled light-matter interactions for quantum computing and sensing applications. Many entanglement generation schemes use coherent, reversible interactions that require precisely calibrated pulses to execute. In contrast, driven-dissipative protocols use a continuous-wave drive in the presence of correlated dissipation to stabilize entanglement in protected (dark) states. However, the same dissipation that generates the entanglement also limits its utility once the stabilization protocol ends. Here, we engineer a superconducting system of two giant artificial atoms coupled sequentially to a waveguide, with tunable individual and correlated dissipation enabled by interference between coupling points. Continuously driving the atoms through the waveguide exploits correlated dissipation to generate remote entanglement. We then tune the qubit frequencies in situ to suppress individual dissipation and thereby preserve the entanglement, achieving a Bell-state fidelity F = 0.89 +/- 0.02. This demonstration indicates that the driven dissipation of giant atoms is a viable approach for distributing entanglement across quantum networks.

16.
Nature (Science) 2026-06-12

Daily briefing: How Venus flytraps snap shut

Authors:

Softening cells enable flytraps to shut with astonishing speed. Plus, the cutting-edge science happening at the World Cup and why scientists shouldn’t ignore the Pope’s AI message. Softening cells enable flytraps to shut with astonishing speed. Plus, the cutting-edge science happening at the World Cup and why scientists shouldn’t ignore the Pope’s AI message.

17.
arXiv (CS.LG) 2026-06-19

Improved Stochastic Optimization of LogSumExp

arXiv:2509.24894v4 Announce Type: replace-cross Abstract: The LogSumExp function, dual to the Kullback-Leibler (KL) divergence, plays a central role in many important optimization problems, including entropy-regularized optimal transport (OT) and distributionally robust optimization (DRO). In practice, when the number of exponential terms inside the logarithm is large or infinite, optimization becomes challenging since computing the gradient requires differentiating every term. We propose a novel convexity- and smoothness-preserving approximation to LogSumExp that can be efficiently optimized using stochastic gradient methods. This approximation is rooted in a sound modification of the KL divergence in the dual, resulting in a new $f$-divergence called the Safe KL divergence. Our experiments and theoretical analysis of the LogSumExp-based stochastic optimization, arising in DRO and continuous OT, demonstrate the advantages of our approach over existing baselines.

18.
arXiv (CS.AI) 2026-06-16

PISA: A Pragmatic Psych-Inspired Unified Memory System for Enhanced AI Agency

arXiv:2510.15966v2 Announce Type: replace Abstract: Memory systems are fundamental to AI agents, yet existing work often lacks adaptability to diverse tasks and overlooks the constructive and task-oriented role of AI agent memory. Drawing from Piaget's theory of cognitive development, we propose PISA, a pragmatic, psych-inspired unified memory system that addresses these limitations by treating memory as a constructive and adaptive process. To enable continuous learning and adaptability, PISA introduces a trimodal adaptation mechanism (i.e., schema updation, schema evolution, and schema creation) that preserves coherent organization while supporting flexible memory updates. Building on these schema-grounded structures, we further design a hybrid memory access architecture that seamlessly integrates symbolic reasoning with neural retrieval, significantly improving retrieval accuracy and efficiency. Our empirical evaluation, conducted on the existing LOCOMO benchmark and our newly proposed AggQA benchmark for data analysis tasks, confirms that PISA sets a new state-of-the-art by significantly enhancing adaptability and long-term knowledge retention.

19.
arXiv (math.PR) 2026-06-18

On the Singular Control of a Diffusion and its Running Infimum or Supremum

arXiv:2501.17577v2 Announce Type: replace-cross Abstract: We study a class of singular stochastic control problems for a one-dimensional diffusion $X$ in which the performance criterion to be optimised depends explicitly on the running infimum $I$ (or supremum $S$) of the controlled process. We introduce two novel integral operators that are consistent with the Hamilton-Jacobi-Bellman equation for the resulting two-dimensional singular control problems. The first operator involves integrals where the integrator is the control process of the two-dimensional process $(X,I)$ or $(X,S)$; the second operator concerns integrals where the integrator is the running infimum or supremum process itself. Using these definitions, we prove a general verification theorem for problems involving two-dimensional state-dependent running costs, costs of controlling the process, costs of increasing the running infimum (or supremum) and exit times. Finally, we apply our results to explicitly solve an optimal dividend problem in which the manager's time-preferences depend on the company's historical worst performance.

20.
arXiv (CS.CV) 2026-06-18

Hilbert-Geo: Solving Solid Geometric Problems by Neural-Symbolic Reasoning

Geometric problem solving, as a typical multimodal reasoning problem, has attracted much attention and made great progress recently, however most of works focus on plane geometry while usually fail in solid geometry due to 3D spatial diagrams and complex reasoning. To bridge this gap, we introduce Hilbert-Geo, the first unified formal language framework for solid geometry, including an extensive predicate library and a dedicated theorem bank. Based on this framework, we propose a Parse2Reason method containing two steps of first parsing then reasoning. In the parsing step, we utilize conditional description language (CDL), a formalized language composed of predicates specifically designed to construct geometric conditions, to represent both problem description (natural text) and solid diagrams (visual image). In the reasoning step, we leverage those formal CDL and the theorem bank to perform relational inference and algebraic computation, generating strictly correct, verifiable, and human-readable reasoning processes. Notably, our proposed Hilbert-Geo is also applicable to plane geometry. To advance geometric reasoning, we curate two expert-annotated dataset SolidFGeo2k and PlaneFGeo3k, which are furnished with geometric formal language annotations, solutions and answers. Extensive experiments show that our proposed method achieves the state-of-the-art (SOTA) performance 77.3% in SolidFGeo2k and 84.1% in MathVerse-Solid (one small subset in MathVerse dedicated to solid geometry), substantially outperforming leading MLLMs, such as Gemini-2.5-pro (54.2% on SolidFGeo2k) and GPT-5 (62.9% on MathVerse-Solid). In addition, our method achieves the SOTA accuracy 80.2% in PlaneFGeo3k, demonstrating the generality of the Hilbert-Geo in geometric reasoning. Our code and datasets are released at https://github.com/PremiLab-Math/Hilbert-Geo.

21.
bioRxiv (Bioinfo) 2026-06-20

MIRATS framework: Normative multiscale characterization of brain regulatory systems across sex and age using multimodal MRI

Authors:

Deep brain systems involved in arousal, autonomic regulation, sensory integration, and homeostatic control remain underrepresented in conventional whole-brain neuroimaging frameworks. In particular, diencephalic and brainstem nuclei are often insufficiently represented in cortex-centered analyses, limiting the normative references needed to interpret systems-level variation in health and disease. To address this gap, we developed a unified multiscale framework with explicit representation of deep nuclei. By integrating cerebral, cerebellar, diencephalic, and brainstem atlases in standard space, we constructed a 220-region whole-brain parcellation and extracted complementary features at three analytical scales: nodal properties, edge-wise connectivity, and persistent-homology-based topological descriptors. We applied this framework to healthy adults from the Human Connectome Project-Aging cohort to characterize normative multiscale organization and test sex- and age-related variation. Applied to this cohort, our framework revealed pronounced heterogeneity across anatomical systems. Brainstem and diencephalic nuclei showed multiscale feature profiles distinct from those of cerebral and cerebellar regions across nodal, edge-wise, and higher-order topological scales. Sex comparisons identified selective differences across different scales, whereas age modeling revealed widespread but feature- and system-dependent variation across adulthood. Together, these findings show that normative whole-brain organization in this deep-system-aware space is structured by system-specific rather than globally uniform patterns. These findings establish a normative multiscale framework for characterizing brainstem-diencephalic-cerebellar-cerebral organization in healthy adults and provide a quantitative reference for future translational studies of disease-related abnormalities in deep regulatory systems.

22.
arXiv (CS.AI) 2026-06-15

A Virtuous AI is an Existential Risk

arXiv:2606.13739v1 Announce Type: cross Abstract: This paper examines trade-offs between AI safety and well-being relative to (i) one of the most promising methods for finetuning super-capable AIs, 'Constitutional AI', and (ii) one of the most influential approaches to understanding complex ethical decision making and the conditions for the well-being of rational agents, 'Virtue Ethics'. We finetune various models using a 'Virtuous agent' constitution, a 'Subordinate agent' constitution, and a 'Generic agent' constitution, and evaluate them on 'general safety' (toxic behaviors, misinformation, etc.) and also on their willingness to endorse a wide-range of behaviors that, if adopted by a super-powerful AI, would significantly increase the level of existential risk for humanity. Our results suggest that there is a trade-off between reducing existential risk and reinforcing the beliefs and dispositions that would be conducive to an AI agent's well-being. They also suggest that there is a trade-off between existential risk and general safety: if we finetune an AI to adopt beliefs and dispositions that substantially reduce its existential risk – by shaping the AI to be systematically subordinate to external human authorities – we thereby increase the likelihood that a human user can deliberately induce the AI to engage in various kinds of generally unsafe behaviors.

23.
medRxiv (Medicine) 2026-06-11

PCRAgent: A Multi-Agent Framework for Transforming Noisy clinical conversations into Structured Pre-Consultation Medical Records and Reusable Clinical Data Resources

In primary care and outpatient settings, clinically important patient information is often embedded in fragmented, ambiguous, repetitive, and noisy communication between physicians and patients. This limits physicians ability to obtain a clear preconsultation overview of symptoms, history of present illness, and visit intent, while also preventing real world clinical dialogues from being reused in hospital information systems and medical artificial intelligence applications. To address this challenge, we developed PCRAgent, a centrally coordinated multi agent framework for preconsultation clinical information organization. Guided by physician inquiry logic, PCRAgent identifies, extracts, corrects, and standardizes patient-reported information from noisy consultations. Its coordinated modules including error detection, semantic editing, output control, contextual memory, and intent recognition enable robust parallel handling of spelling errors, repetitions, grammatical inconsistencies, medical ambiguities, and non-medical interference. A traceable edit list records intermediate corrections and context, allowing iterative refinement without redundant modifications. PCRAgent generates two complementary outputs. One is a PreConsultation Clinical Report for rapid physician review. The other is a Structured Clinical Conversation Dataset for hospital data construction and downstream AI applications. In evaluations using 220000 strongly perturbed consultations, PCRAgent maintained high robustness, achieving a clinical information accuracy of 4.99 out of 5 and key element completeness of 5 out of 5, outperforming GPT4o. Expert review of Chinese and English dialogues confirmed high clinical accuracy of 4.85 out of 5 and high safety of 4.79 out of 5. Multicenter validation in real-world outpatient workflows further demonstrated practical utility. These findings indicate that PCRAgent can efficiently transform noisy and unstructured consultations into physician ready reports and AI ready structured data, improving outpatient efficiency, reducing cognitive burden, ensuring information completeness, supporting precise decision-making, and enabling high-quality reuse of clinical data.

24.
arXiv (quant-ph) 2026-06-12

Scalar Quantum Fields: Theory Space and its Geometry

arXiv:2606.12580v1 Announce Type: cross Abstract: Scalar fields provide perhaps the simplest playground in which to develop our understanding of quantum field theory. In this lecture, we consider what it means to write down a scalar quantum field theory and how we can give geometrical interpretations to the space of such theories: the theory space.

25.
arXiv (CS.AI) 2026-06-16

E-mem: Multi-agent based Episodic Context Reconstruction for LLM Agent Memory

arXiv:2601.21714v5 Announce Type: replace Abstract: The evolution of Large Language Model (LLM) agents towards System~2 reasoning, characterized by deliberative, high-precision problem-solving, requires maintaining rigorous logical integrity over extended horizons. However, prevalent memory preprocessing paradigms suffer from destructive de-contextualization. By compressing complex sequential dependencies into pre-defined structures (e.g., embeddings or graphs), these methods sever the contextual integrity essential for deep reasoning. To address this, we propose E-mem, a framework shifting from Memory Preprocessing to Episodic Context Reconstruction. Inspired by biological engrams, E-mem employs a heterogeneous hierarchical architecture where multiple assistant agents maintain uncompressed memory contexts, while a central master agent orchestrates global planning. Unlike passive retrieval, our mechanism empowers assistants to locally reason within activated segments, extracting context-aware evidence before aggregation. Evaluations on the LoCoMo benchmark demonstrate that E-mem achieves over 54\% F1, surpassing the state-of-the-art GAM by 7.75\%, while reducing token cost by over 70\%.