Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-24

Auxiliary Schmidt Rank as a Resource for Photonic Bell Measurements

arXiv:2606.24591v1 Announce Type: new Abstract: In quantum communication and fusion-based quantum computation, photonic Bell measurements are fundamentally limited when only passive linear optics is employed. While for qubits, some Bell states can be unambiguously identified with static beam splitters and no extra photons or entanglement, additional auxiliary photons or at least additional auxiliary degrees of freedom with a certain level of additional entanglement are needed to approach or attain a complete, deterministic Bell measurement. Here, we prove an exact resource threshold when the same two photons carry system qudits of dimension $d$ and a fixed auxiliary entangled state $\Phi$, possibly distributed over several additional degrees of freedom, with total Schmidt rank $r_\Phi$. We show that a single conclusive Bell-label functional can occur for $r_\Phi\geqslant\lceil d/2\rceil$, but deterministic discrimination of all $d^2$ Bell-state labels requires $r_\Phi\geqslant d$. A maximally entangled rank-$d$ auxiliary state achieves the bound by local Bell-basis sorting between each photon's system and auxiliary degrees of freedom. Thus, the auxiliary Schmidt rank is a certified resource for ancilla-photon-free, embedded photonic Bell measurements.

02.
arXiv (CS.AI) 2026-06-24

VeryTrace: Verifying Reasoning Traces through Compilable Formalism and Structured Verification

arXiv:2606.24124v1 Announce Type: new Abstract: Multi-step reasoning with Chain-of-Thought (CoT) prompting remains fragile: logical errors or hallucinations in early steps silently propagate, producing confident but incorrect conclusions. This paper presents VeryTrace, a zero-shot verification-and-repair framework that formalizes natural-language reasoning traces into a structured, compilable representation. VeryTrace introduces a Domain-Specific Language (DSL) that (i) makes step dependencies explicit, (ii) mechanizes quantitative content as executable expressions, and (iii) structures semantic inferences via deduction schemas. Our hybrid verifier combines deterministic checks for computational correctness, dependency resolution, and constraint satisfaction with targeted LLM audits for non-mechanizable semantic judgments, enabling step-level error localization and repair. Across three diverse domains-competition mathematics (AIME 2025), robotics planning (LLM-BabyBench), and kinship reasoning (CLUTRR), VeryTrace improves accuracy over zero-shot baselines on state-of-the-art LLMs without requiring domain-specific training or in-context examples, demonstrating that formalized trace verification achieves both precision and generalization.

04.
arXiv (CS.AI) 2026-06-24

Disentangling Aleatoric and Epistemic Uncertainty in Physics-Informed Neural Networks. Application to Insulation Material Degradation Prognostics

arXiv:2601.03673v2 Announce Type: replace-cross Abstract: Physics-Informed Neural Networks (PINNs) provide a framework for integrating physical laws with data. However, their application to Prognostics and Health Management (PHM) remains constrained by the limited uncertainty quantification (UQ) capabilities. Most existing PINN-based prognostics approaches are deterministic or account only for epistemic uncertainty, limiting their suitability for risk-aware decision-making. This work introduces a heteroscedastic Bayesian Physics-Informed Neural Network (B-PINN) framework that jointly models epistemic and aleatoric uncertainty, yielding full predictive posteriors for spatiotemporal insulation material ageing estimation. The approach integrates Bayesian Neural Networks (BNNs) with physics-based residual enforcement and prior distributions, enabling probabilistic inference within a physics-informed learning architecture. The framework is evaluated on transformer insulation ageing application, validated with a finite-element thermal model and field measurements from a solar power plant, and benchmarked against deterministic PINNs, dropout-based PINNs (d-PINNs), and alternative B-PINN variants. Results show that the proposed B-PINN provides improved predictive accuracy and better-calibrated uncertainty estimates than competing approaches. A systematic sensitivity study further analyzes the impact of boundary-condition, initial-condition, and residual sampling strategies on accuracy, calibration, and generalization, and the influence of measurement noise on aleatoric uncertainty. Overall, the findings highlight the capability of Bayesian physics-informed learning to support uncertainty-aware prognostics and informed decision-making in transformer asset management by tracking aleatoric and epistemic sources of uncertainty.

05.
arXiv (CS.LG) 2026-06-15

Approximating Whittle-Matern Fields over Discretized Manifolds

arXiv:2606.13827v1 Announce Type: cross Abstract: Markovian Whittle-Matérn fields have been convergently approximated by discrete Gauss Markov Random Fields (GMRFs) with sparse precision matrices using a Finite Element approximation of the two-parameter family, \[ (\kappa^2 - \Delta)^{\alpha/2} u = \mathcal{W}, \;\; \kappa \in \mathbb{R}, \; \alpha \in \mathbb{N}. \] of SPDEs. Using recent developements in the analysis of Discrete Exterior Calculus (DEC), we present a different, yet closely related, convergent GMRF approximation to these Matérn fields over complete, boundaryless Riemannian manifolds discretized as well-centered simplicial complexes. This convergent method (i) is agnostic to $\alpha, \kappa$ and thus allows a universal approximation scheme for the precision and covariance matrices of the entire $(\alpha, \kappa)$-family of GMRFs, so they may be inferred rather than guessed. (ii) inherently models pointwise and piecewise-smoothed measurements of a random field and approximates both equally well (iii) is computationally independent of the interpolants used - it suffers no overhead if one convergent interpolant were replaced with another suitable interpolant over the same mesh. Furthermore, we show that, on discretizations that are well-connected in a precise sense, and volume-concentrated, the precision matrices are spectral functions of a graph-laplacian. We provide a low rank approximator to the family of such Matérn GMRFs and mention a use case: reducing the number of measurements needed to model the GMRF by compressed-sensing.

06.
arXiv (CS.CV) 2026-06-25

RubricRL: Simple Generalizable Rewards for Text-to-Image Generation

Reinforcement learning (RL) has recently emerged as a promising approach for aligning text-to-image generative models with human preferences. A key challenge, however, lies in designing effective and interpretable rewards. Existing methods often rely on either composite metrics (e.g., CLIP, OCR, and realism scores) with fixed weights or a single scalar reward distilled from human preference models, which can limit interpretability and flexibility. We propose RubricRL, a simple and general framework for rubric-based reward design that offers greater interpretability, composability, and user control. Instead of using a black-box scalar signal, RubricRL dynamically constructs a structured rubric for each prompt–a decomposable checklist of fine-grained visual criteria such as object correctness, attribute accuracy, OCR fidelity, and realism–tailored to the input text. Each criterion is independently evaluated by a multimodal judge (e.g., o4-mini), and a prompt-adaptive weighting mechanism emphasizes the most relevant dimensions. This design not only produces interpretable and modular supervision signals for policy optimization (e.g., GRPO or PPO), but also enables users to directly adjust which aspects to reward or penalize. Experiments with an autoregressive text-to-image model demonstrate that RubricRL improves prompt faithfulness, visual detail, and generalizability, while offering a flexible and extensible foundation for interpretable RL alignment across text-to-image architectures.

07.
arXiv (CS.CL) 2026-06-15

ScoreGate: Adaptive Chunk Selection for Retrieval-Augmented Generation via Dual-Score Statistical Fusion

Fixed-cardinality retrieval injects a constant top-K chunks into the generator regardless of query complexity, causing over-retrieval for narrow queries and under-retrieval for compositional ones. We describe ScoreGate, a lightweight score-space decision mechanism that controls retrieval cardinality at inference time using two scores already produced by the standard pipeline: bi-encoder similarity s_i and cross-encoder reranker score r_i, with no additional model inference calls required. Its core insight is that cross-encoder affirmation can rescue semantically relevant chunks that bi-encoder retrieval ranks poorly due to vocabulary mismatch – a failure mode unaddressed by fixed-K or single-score thresholding. On MS MARCO (200 dev queries), ScoreGate achieves MRR@10 = 0.401 with 35% fewer retained chunks than Standard Top-K. On an internal benchmark (n=300, Fleiss' kappa=0.87), ScoreGate observed zero false positives (95% CI [96.4%, 100%]) at 97.77-99.34% recall, with 34.8% fewer tokens per query and only 31ms added latency. Results on both MS MARCO and real-world production traffic suggest that adaptive retrieval cardinality can improve retrieval efficiency without degrading retrieval quality.

08.
arXiv (CS.CV) 2026-06-18

Prior-guided Fusion of Multimodal Features for Change Detection from Optical-SAR Images

Multimodal change detection (MMCD) identifies changed areas in multimodal remote sensing data, demonstrating significant application value in land use monitoring and urban sustainable development. However, literature MMCD approaches exhibit limitations in both cross-modal interaction and exploiting modality-specific characteristics. This leads to insufficient modeling of fine-grained change information, thus hindering the precise detection of semantic changes. To address these problems, we propose STSF-Net, a framework designed for MMCD between optical and SAR images. STSF-Net jointly models modality-specific and spatio-temporal common features to enhance change representations. Specifically, modality-specific features are exploited to capture genuine semantic change signals, while spatio-temporal common features are embedded to suppress pseudo-changes caused by differences in imaging mechanisms. Furthermore, we introduce an optical and SAR feature fusion strategy that adaptively adjusts multimodal feature importance based on semantic priors obtained from visual foundation models. Finally, we introduce the novel Delta-SN6 dataset, the first openly-accessible multiclass MMCD benchmark consisting of very-high-resolution fully polarimetric SAR and optical images. Experimental results on Delta-SN6, BRIGHT, and Wuhan datasets demonstrate that our method outperforms the state-of-the-art by 3.21%, 0.87%, and 1.32% in mIoU, respectively.

09.
arXiv (CS.AI) 2026-06-17

Closing the Feedback Loop: From Experience Extraction to Insight Governance in Verbal Reinforcement Learning

arXiv:2606.17591v1 Announce Type: new Abstract: Training-free verbal reinforcement learning enables LLM agents to learn from world feedback – objective signals such as dynamic task outcomes, market returns, or demand forecasts – by extracting verbal rules from experience and injecting them as context, updating the agent's behavior without parameter changes. However, in non-stationary environments these agents face a retention-forgetting dilemma: retaining stale insights causes negative transfer, while discarding them causes catastrophic forgetting when conditions recur. We identify four requirements for navigating this dilemma – outcome-driven evaluation, persistent structured evidence, non-monotonic knowledge lifecycle, and compositional governance – and show that existing methods invest heavily in experience extraction while underinvesting in insight governance. We propose a three-layer architecture – rules, evidence, and skills – connected by a feedback-driven curation loop that closes the governance gap. Rules capture distilled experience from world outcomes; evidence logs track each rule's reliability across episodes; skills govern which rules to apply, how to resolve conflicts, and when to abstain. On financial forecasting as a case study, where world feedback is naturally abundant, noisy, and non-stationary, we show that the same accumulated experience either degrades performance below the zero-shot baseline or dramatically improves accuracy and risk-adjusted returns, depending on whether the curation loop is present.

10.
medRxiv (Medicine) 2026-06-17

Low-Density Lipoprotein Cholesterol and Dementia Risk: Integrating Mendelian Randomization and Target Trial Emulation Within the Heart-Brain Axis

Background: The heart-brain axis links cardiovascular and neurodegenerative disease through shared vascular and inflammatory mechanisms. Although low-density lipoprotein cholesterol (LDL-C) is an established causal factor in atherosclerotic cardiovascular disease (ASCVD), its relationship with dementia remains uncertain, with midlife elevations associated with increased risk but late-life associations often appearing null or inverse. To address this cholesterol paradox, we integrated mendelian randomization (MR) with an active-comparator new-user target trial emulation. Methods: We applied a triangulated causal inference framework integrating two-sample MR with observational target trial emulation. Genetic variants associated with LDL-C were used as instrumental variables to evaluate Alzheimer disease (AD), dementia with Lewy bodies (DLB), frontotemporal dementia (FTD), and any dementia (AnyDem), with causal estimates derived using inverse-variance weighted models and sensitivity analyses for heterogeneity and pleiotropy. In parallel, an active-comparator new-user design compared statin versus ezetimibe initiation among adults aged 60 years or older using propensity score (PS) overlap weighting and Cox proportional hazards models to evaluate cardiovascular and dementia outcomes. Results: Genetically predicted LDL-C was associated with increased risk of DLB (OR 1.65, 95% CI 1.30-2.10; p

11.
arXiv (quant-ph) 2026-06-12

Electric Field Distortions in Surface Ion Traps with Integrated Nanophotonics

arXiv:2503.20387v3 Announce Type: replace Abstract: The integration of photonic components into surface ion traps provides a scalable approach for trapped-ion quantum computing, sensing, and metrology, enabling compact systems with enhanced stability and precision. However, the introduction of optical apertures in the trap electrodes can distort the trapping electric field. This can lead to excess micromotion (EMM) and ion displacement which degrade the performance of quantum logic operations and optical clocks. In this work, we systematically investigate the electric field distortion in a surface ion trap with integrated waveguides and grating couplers using Finite Element Method (FEM) simulations. We analyze methods to reduce these distortions by exploiting symmetries and transparent conductive oxide materials.

12.
arXiv (quant-ph) 2026-06-19

Measuring Rényi entropy with an Echo Protocol

arXiv:2504.05237v3 Announce Type: replace Abstract: We present efficient and practical protocols to measure the second Rényi entropy, whose exponential is known as the purity. Our approach is based on expressing the purity in terms of transition probabilities generated by an echo-type forward-backward evolution sequence, making it applicable to quantum many-body systems. Notably, our approach does not rely on random-noise averaging, a feature that can be extended to protocols to measure out-of-time-order correlation functions, as we demonstrate. By way of example, we show that our protocols can be practically implemented in superconducting qubit-based platforms, as well as in cavity-QED trapped ultra-cold gases.

13.
arXiv (quant-ph) 2026-06-25

Quantum Error Correction-like Noise Mitigation for Wave-like Dark Matter Searches with Quantum Sensors

arXiv:2511.03253v2 Announce Type: replace-cross Abstract: We propose a quantum error correction-like noise mitigation protocol for enhancing the sensitivity of wave-like dark matter searches with quantum sensors. Our protocol uses multiple sensors to mitigate the noise affecting each sensor individually, allowing for the suppression of excitation noise that is parallel to the dark matter signal. We demonstrate that our protocol can improve the sensitivity to dark matter signals by a factor of $\sqrt{N}$, where $N$ is the number of sensors used, for small $N$. Furthermore, for sufficiently large $N$, we find that our protocol achieves the same performance as the standard quantum limit by the ideal measurement, which non-entangled sensors with parallel noise cannot reach due to the unknown phase of the dark matter field. Our work can be widely applied to various types of signals with unknown phases, and has the potential to enhance the sensitivity of quantum sensors such as arrays of resonant cavities.

14.
arXiv (CS.AI) 2026-06-15

HierSVA: A Data Synthesis Pipeline, Dataset, and Benchmark for LLM-Driven Hierarchical Hardware Formal Verification

arXiv:2606.13706v1 Announce Type: cross Abstract: We present HierSVA, an integrated suite that combines a pipeline, dataset, and benchmark for LLM-driven hierarchical hardware formal verification. HierSVA-SP pairs an RTL preprocessing toolchain with an LLM-in-the-loop formal verification flow to produce reference SystemVerilog Assertions (SVA) on hierarchical RTL. Applying it to BaseJump STL yields HierSVA-DS, a dataset of 342 modules, with hierarchy metadata and depths 0–9, accompanied by a deep subset of 28 module-bug pairs with natural-language specifications and bug variants. HierSVA-B decomposes assertion quality into six metric axes: syntax correctness, assertion proof success rate, vacuity, specification faithfulness, mutation coverage, and formal core coverage. Applying HierSVA-B to twelve recent LLMs reveals three findings. First, the module-level compile rate is 67.1\%; among generated assertions in evaluable runs, 82.1\% prove non-vacuously, but the corresponding assertion sets detect only 70.2\% of eligible injected faults and cover 36.2\% of the formal core. Second, on 211 evaluable model–module entries in the deep subset, assertion sets flag buggy RTL with 0.87 recall, but 40\% of predicted-buggy outcomes are false positives on correct RTL, limiting precision to 0.60. Third, agentic mode improves S1-style provability and strength metrics, but gains plateau and oscillate. Codes and artifacts are available at \href{https://github.com/HierSVAAnon/HierSVACodeAndArtifacts}{https://github.com/HierSVAAnon/HierSVACodeAndArtifacts}. Dataset is available at \href{https://huggingface.co/datasets/AnonymousHierSVA/HierSVA}{https://huggingface.co/datasets/AnonymousHierSVA/HierSVA}.

15.
arXiv (CS.LG) 2026-06-19

A Hybrid GNN-FEM Framework for Phase-Field Fracture Simulation. Physics-Preserving Hybridization for Generalizable Surrogate Modeling

arXiv:2606.19378v1 Announce Type: new Abstract: Scientific machine learning (SciML) has emerged as a promising approach for accelerating simulations of complex physical systems, yet achieving physically consistent and generalizable predictions for nonlinear, history-dependent problems remains a central challenge. In this study, we propose a hybrid GNN–FEM framework for efficient and generalizable phase-field fracture modeling. While phase-field approaches provide a robust variational framework for simulating complex crack evolution, their high computational cost limits practical applications because they require solving coupled, nonlinear, and history-dependent systems within an incremental finite element procedure. To address this challenge, a graph neural network surrogate is integrated into the conventional staggered scheme, replacing the phase-field update at each load increment while retaining the FEM-based displacement solver to enforce mechanical equilibrium and boundary conditions. By preserving the incremental solution structure, the framework remains consistent with history-dependent fracture evolution without requiring the surrogate to approximate the full solution trajectory. This selective surrogate strategy emphasizes the identification of a physically meaningful and incrementally structured learning target, rather than relying on brute-force data generation to learn the full fracture process. The proposed framework achieves strong generalization across varying geometries, loading conditions, material properties, and discretizations through dimensionless feature design, a graph-based formulation on mesh-based domains, and a physics-informed loss derived from the governing phase-field equation. Numerical experiments demonstrate that the hybrid approach reduces computational cost while maintaining accuracy compared with conventional FEM, and exhibits robust predictive performance across diverse problem settings.

16.
arXiv (CS.AI) 2026-06-16

The Faithfulness Gap: Certifying Semantic Equivalence Between Natural-Language and Formal Mathematical Statements

arXiv:2606.16541v1 Announce Type: new Abstract: Autoformalization, translating natural-language mathematics into formal proof assistants, is bottlenecked not by translation fluency but by faithfulness: a formal statement can typecheck and be provable, yet still encode a different theorem than the source intended. We introduce Bidirectional Provability Fingerprinting (\bpf{}), a framework that certifies faithfulness by characterizing each candidate through its forward and backward consequence neighborhoods in the ambient theory and matching these against probes derived from the natural-language statement. We further introduce four novel components: (i) Counterfactual Probe Generation (\cpg{}), a contrastive procedure that synthesizes probes targeting specific drift directions; (ii) the Equivalence Spectrum, a continuous faithfulness score that replaces brittle binary verdicts; (iii) Adaptive Probe Budget Allocation (\apba{}), an information-theoretic budget router; and (iv) Faithfulness-Guided Decoding (\fgd{}), which uses \bpf{} signals as a reward during autoformalization. We prove a drift detection theorem and a PAC-faithfulness result establishing that the equivalence class of a natural language statement is learnable from $\mathcal{O}(\log(1/\delta)/\varepsilon)$ probes under mild assumptions. We release \driftbench{}, a benchmark of $2{,}183$ NL/Lean~4 pairs with controlled drift labels across six subfields of mathlib4. \bpf{}\,+\,\cpg{} detects $89.6\%$ of drifted formalizations at a $3.0\%$ false-positive rate-against $41.2\%$ for typecheck and $63.3\%$ for LLM-judge baselines, and \fgd{} reduces the rate at which a state-of-the-art autoformalizer emits drifted statements by $47\%$. https://pmlrbd.github.io/BPF/

17.
arXiv (CS.CL) 2026-06-17

PACE-RAG: Patient-Aware Contextual and Evidence-Constrained RAG for Clinical Drug Recommendation

Drug recommendation requires a deep understanding of individual patient context, especially for complex conditions like Parkinson's disease. While LLMs possess broad medical knowledge, they fail to capture the subtle nuances of actual prescribing patterns. Existing RAG methods also struggle with these complexities because guideline-based retrieval remains too generic and similar-patient retrieval often replicates majority patterns without accounting for the unique clinical nuances of individual patients. To bridge this gap, we propose PACE-RAG (Patient-Aware Contextual and Evidence-Constrained RAG). Rather than directly copying frequent medications from retrieved patients, PACE-RAG personalizes recommendations by first extracting patient-specific clinical features, retrieving cases around these features, and then refining the final prescription using the patient's current symptoms, active medication history, and focus-specific prescribing tendencies. By analyzing treatment patterns tailored to specific clinical features, PACE-RAG generates patient-specific medication recommendations along with an explainable clinical summary. Evaluated on a Parkinson's cohort and the MIMIC-IV benchmark using Llama-3.1-8B and Qwen3-8B, PACE-RAG achieved state-of-the-art performance, reaching F1 scores of 80.84% and 47.22%, respectively. These results suggest that PACE-RAG is a robust and clinically grounded framework for personalized decision support. Our code is available at: https://github.com/ChaeYoungHuh/PACE-RAG.

18.
arXiv (CS.AI) 2026-06-19

Human-on-the-Loop Orchestration for AI-Assisted Legal Discovery

arXiv:2606.19812v1 Announce Type: new Abstract: Autonomous Large Language Model (LLM) agents are increasingly deployed in electronic discovery (e-discovery), where compounding errors across multi-step reasoning chains can constitute legal malpractice. Unlike single-turn retrieval, agentic workflows operating over privileged document corpora exhibit a class of failure we term "trajectory collapse": an early misclassification silently propagates, rendering an entire privilege review invalid. This paper makes three contributions. First, we propose a structured taxonomy of agentic failures in legal information retrieval, organized by functional stage. Second, we introduce a four-layer verification architecture – spanning planning, reasoning, execution, and uncertainty quantification – designed to intercept these failures before they compound. Third, we present a preliminary simulation study on a synthetic e-discovery corpus that demonstrates how mandatory Human-on-the-Loop (HOTL) escalation thresholds reduce privilege-waiver risk relative to fully autonomous baselines. Our results suggest that calibrated uncertainty thresholds can reduce privilege-waiver risk by up to 61% versus fully autonomous deployment, while routing fewer than one quarter of documents to attorney review.

19.
medRxiv (Medicine) 2026-06-22

Paired plasma and EV-enriched plasma proteomics reveal nonredundant sepsis-associated host-response signatures in critical illness

Background: Plasma proteomics may identify host-response signatures in sepsis, but it is unclear whether extracellular vesicle (EV)-enriched plasma provides distinct or redundant information compared with plasma. We compared paired plasma and EV-enriched plasma proteomes in critically ill patients with sepsis and critically ill non-sepsis controls (CINS). Methods: In this prospective observational study, paired plasma and EV-enriched plasma samples were analyzed from 56 critically ill adults, including 40 patients with sepsis and 16 CINS patients. Protein abundance was quantified using liquid chromatography-tandem mass spectrometry. Analyses compared proteomic depth, protein overlap, global concordance between compartments, and differential protein abundance between CINS and sepsis. Exploratory Gene Ontology enrichment was performed as a supplementary analysis. Results: EV-enriched plasma expanded proteomic detection, identifying 2,476 filtered proteins compared with 506 in plasma. Only 386 proteins were detected in both compartments, while 2,090 were unique to EV-enriched plasma and 120 were unique to plasma. Among shared proteins, plasma and EV-enriched plasma showed modest global concordance across critically ill patients (Spearman coeff = 0.322, p = 9.19 x 10^-11), with similar findings in sepsis alone. Differential abundance analysis identified 11 sepsis-associated proteins in plasma and 22 in EV-enriched plasma. Only SAA1, SAA2, and IGFBP6 were significant in both compartments. Exploratory pathway analysis supported acute-phase and inflammatory enrichment in plasma sepsis-associated proteins, while EV-enriched signals were directionally plausible but did not meet prespecified FDR thresholds. Conclusion: Plasma and EV-enriched plasma proteomics capture related but nonredundant sepsis-associated host-response information in critically ill patients.

20.
arXiv (CS.LG) 2026-06-18

Bridging Data Gaps in Structural Fragility Modeling through Transfer Learning: Methodology and Case Studies

arXiv:2606.18567v1 Announce Type: cross Abstract: This paper presents a methodology-centered transfer learning framework for fragility adaptation under domain shift, class imbalance, and scarce target labels while preserving engineering interpretability and supporting decision-making under uncertainty. Four transfer learning strategies (instance-based, parameter-based, hierarchical Bayesian, and multi-source) are demonstrated through three complementary case studies: (i) instance-based transfer learning via importance weighting, demonstrated on coastal bridge fragility using Hurricane Katrina observations; (ii) parameter-based transfer learning together with hierarchical Bayesian transfer learning, enabling partial pooling across strata and posterior uncertainty quantification, demonstrated on residential building fragility using Hurricane Ian observations; and (iii) multi-source transfer learning that fuses multiple analytical fragility models with learned source weights and regularized target-domain adaptation, demonstrated on seismic bridge fragility using observations from the 2001 Nisqually earthquake. Across these case studies, direct transfer of source models (i.e. using existing state-of-the-art models) fails under domain shift and severe class imbalance, while targeted adaptation substantially improves failure detection and predictive stability in low-data regimes. These findings highlight the need for systematic guidance on diagnostics, strategy selection, and uncertainty reporting when developing and adapting fragility models.

21.
arXiv (CS.CV) 2026-06-25

KidRisk: Benchmark Dataset for Children Dangerous Action Recognition

Children are naturally energetic, and during their spontaneous activities, they often encounter potentially dangerous situations, especially when lacking parental supervision. Identifying actions that pose risks plays a crucial role in ensuring their safety. This paper build a novel challenging dataset, namely KidRisk, including 2,500 short videos of children's actions and 10,000 images for dangerous action of children. We also introduce a benchmark on our newly constructs dataset and find that traditional deep learning models demonstrated limited effectiveness on these tasks. Therefore, we develop vision-language based baselines with exceptional context understanding of visual information. Our proposed methods achieved an accuracy of 83.53% in classifying children's actions and 96.14% in recognizing children's dangerous actions, significantly outperforming traditional approaches. These results confirm that vision-language models are not only feasible but also highly effective in detecting hazardous actions, contributing positively to safeguarding children's safety.

22.
Nature Medicine 2026-06-25

Neoadjuvant stereotactic body radiation therapy with durvalumab and oleclumab in ER<sup>+</sup>HER2<sup>−</sup> breast cancer: a randomized phase 2 trial

Patients with estrogen receptor-positive (ER+), HER2-negative, early breast cancer (BC) have low pathologic complete response (pCR) rates following neoadjuvant chemotherapy. Immune checkpoint inhibitors (ICIs) provide limited benefit in programmed death-ligand 1 (PD-L1)-negative tumors, characterized by an immune-cold tumor microenvironment. Here we hypothesized that immune-modulating stereotactic body radiation therapy (iSBRT; 3 × 8 Gy) could enhance response through tumor microenvironment reprogramming, and that CD73 blockade could further improve efficacy. We conducted a phase 2, randomized, multicenter trial (Neo-CheckRay) in 147 female patients with high-risk, ER+HER2− early BC. Patients received neoadjuvant chemotherapy plus iSBRT alone (No_ICI), with anti-PD-L1 durvalumab (Single_ICI) or with durvalumab plus anti-CD73 oleclumab (Double_ICI). In the intention-to-treat population, the primary endpoint, residual cancer burden 0/1 rate, was 35.4% with No_ICI, 45.1% with Single_ICI and 47.9% with Double_ICI, without statistically significant differences. pCR rates were 16.7%, 29.4% and 33.3%, respectively (P = 0.059). In the per-protocol population (MammaPrint High Risk, n = 131), pCR rates were 16.3%, 32.6% and 35.6%, respectively (P = 0.040). Among PD-L1-negative tumors (n = 91), pCR rates were 3.4%, 28.1% and 30.0%, respectively. No new safety signals were observed. Baseline transcriptomic analysis showed low immune signature expression in PD-L1-negative tumors. Paired baseline and on-treatment biopsies obtained 1 week after iSBRT demonstrated tumor microenvironment reprogramming toward an inflamed phenotype in the iSBRT + anti-PD-L1 arms. These findings suggest that iSBRT + anti-PD-L1 may convert immune-cold ER+HER2− BC into more inflamed tumors and improve response, particularly in PD-L1-negative disease. ClinicalTrials.gov registration: NCT03875573 . In the randomized phase 2 Neo-CheckRay trial, patients with early-stage ER+HER2− breast cancer received neoadjuvant immune-modulating stereotactic body radiation therapy with or without durvalumab, and with or without the anti-CD73 antibody oleclumab, leading to encouraging clinical responses when durvalumab was added including in patients with PD-L1-negative tumors.

23.
arXiv (CS.CL) 2026-06-24

Metis: Bridging Text and Code Memory for Self-Evolving Agents

Self-evolving agents improve over time by distilling experience from past executions and reusing it in future tasks. Existing systems represent such experience either as natural-language text injected into the agent context or as code exposed as callable tools. However, the choice between these representations is typically made at design time rather than derived from the characteristics of the experience itself, leaving the trade-offs between them poorly understood. We present the first controlled study that isolates text memory and code memory over an identical set of experiences. Our results show that the two forms exhibit complementary trade-offs in construction cost, execution efficiency, and transferability, such that neither representation alone is sufficient. Guided by these findings, we propose Metis, a self-evolving agent system built on a hierarchical dual-representation memory. Metis organizes textual experience into execution plans, environment facts, and common pitfalls, and selectively crystallizes recurring plans into validated callable tools. This design combines the broad applicability of text memory with the execution efficiency of code memory while incurring tool-generation cost only when justified by repeated reuse. We evaluate Metis on AppWorld, a challenging benchmark for interactive agents. The results show that Metis improves task accuracy by up to 20.6% over ReAct while reducing execution cost by up to 22.8%. Compared with representative self-evolving agent systems, Metis consistently achieves a better balance between accuracy, execution efficiency, and memory-construction cost.

24.
arXiv (CS.LG) 2026-06-17

Noise-Driven Exploration and Transient Freezing Select Flat Minima in Stochastic Gradient Descent

arXiv:2601.10962v2 Announce Type: replace Abstract: Stochastic gradient descent (SGD) is central to deep learning, yet the dynamical origin of its preference for flatter, more generalizable solutions remains unclear. Here, by analyzing SGD learning dynamics, we identify a nonequilibrium mechanism that governs solution selection during training. Numerical experiments reveal a transient exploratory phase in which SGD trajectories repeatedly escape sharp valleys and migrate toward flatter regions of the loss landscape before becoming confined to a final basin. Using a tractable physical model, we show that SGD noise reshapes the loss landscape into an effective potential that preferentially stabilizes flat solutions. We further uncover a transient freezing mechanism: as training progresses, the flattening landscape suppresses transitions between competing valleys. Stronger SGD noise delays this freezing transition, prolonging the exploratory phase and thereby increasing the probability of convergence to flatter minima. Together, these results provide a unified physical framework connecting learning dynamics, loss-landscape geometry, and generalization, and suggest guiding principles for the design of more effective optimization algorithms.

25.
arXiv (CS.AI) 2026-06-12

Decoding Insect Song: A Multitask Semisupervised Orthoptera Bioacoustic Classifier

arXiv:2606.13236v1 Announce Type: cross Abstract: Passive acoustic monitoring holds great promise for ecological inference, yet existing automated tools are typically narrowly trained and non-transferable. We address these limitations with PULSE, a semi-supervised, multi-task framework for Orthoptera bioacoustics, combining weakly-supervised species classification, self-supervised learning on unlabelled field audio, and knowledge distillation from a general-purpose bioacoustic model. Our domain-adapted specialist model outperforms a state-of-the-art general model across all metrics (macro F1: 0.21 vs. 0.07; AUC: 0.74 vs. 0.45; AP: 0.32 vs. 0.19), with active learning further raising F1 to 0.34 and AUC to 0.84. Beyond classification, the learned embeddings encode ecologically meaningful structure, exposed through an interactive visualisation tool for ecological discovery.