Briefing Chat: The epic journey of Stonehenge’s central stone
Nature staff discuss some of the week’s top science news. Hear the biggest stories from the world of science | 12 June 2026
Academic Intelligence · Curated Daily
AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.
Nature staff discuss some of the week’s top science news. Hear the biggest stories from the world of science | 12 June 2026
arXiv:2603.03998v2 Announce Type: replace Abstract: Quantum Singular Value Transformation (QSVT) provides a unified framework for applying polynomial functions to the singular values of a block-encoded matrix. QSVT prepares a state proportional to $\bA^{-1}\bb$ with circuit depth $O(d\cdot\mathrm{polylog}(N))$, where $d$ is the polynomial degree of the $1/x$ approximation and $N$ is the size of $\bA$. Current polynomial approximation methods are over the continuous interval $[a,1]$, giving $d = O(\sqrt{\kap}\log(1/\varepsilon))$, and make no use of any properties of $\bA$. We observe here that QSVT solution accuracy depends only on the polynomial accuracy at the eigenvalues of $\bA$. When all $N$ eigenvalues are known exactly, a pure spectral polynomial $p_{S}$ can interpolate $1/x$ at these eigenvalues and achieve unit fidelity at reduced degree. But its practical applicability is limited. To address this, we propose a spectral correction that exploits prior knowledge of $K$ eigenvalues of $\bA$. Given any base polynomial $p_0$, such as Remez, of degree $d_0$, a $K\times K$ linear system enforces exact interpolation of $1/x$ only at these $K$ eigenvalues without increasing $d_0$. The spectrally corrected polynomial $p_{SC}$ preserves the continuous error profile between eigenvalues and inherits the parity of $p_0$. QSVT experiments on the 1D Poisson equation demonstrate up to a $5\times$ reduction in circuit depth relative to the base polynomial, at unit fidelity and improved compliance error. The correction is agnostic to the choice of base polynomial and robust to eigenvalue perturbations up to $10\%$ relative error. Extension to the 2D Poisson equation suggests that correcting a small fraction of the spectrum may suffice to achieve fidelity above $0.999$.
arXiv:2606.13994v1 Announce Type: cross Abstract: LLM-based Agents are becoming increasingly capable and widely deployed, creating growing incentives for adversarial misuse in the real-world. A key emerging threat is Decomposition Attacks [glukhov2024breach, jones2024adversaries] in which a harmful task is broken into simpler, benign subtasks that evade safety mechanisms when executed separately but cumulatively fulfill the malicious intent. Although recent benchmarks assess agent safety in multi-turn and multi-tool-use settings, they do not explicitly capture this form of decompositional misuse and may not represent realistic adversarial execution flows. To this end, we introduce DeCompBench, a benchmark designed specifically to evaluate agentic safety under decomposition attacks. DeCompBench is created with a decomposition-by-design principle using a graphical framework and enables harmful task decomposition into individually benign and executable subtasks with realistic workflows. Our experiments using a custom decomposer show that state-of-the-art agents exhibit high refusal rates on monolithic harmful tasks, but significantly lower refusal rates on their decomposed variants, while often inadvertently fulfilling the adversarial objectives. These findings underscore the need for safety evaluations against decomposition attacks and corresponding defenses. Our dataset is publicly available and can be found at https://huggingface.co/datasets/decompositionbench/DeCompBench.
We present EDEN (Emergency Department Electronic Notes), a new and unique large-scale corpus of clinical notes produced in Emergency Departments of Italian hospitals. The corpus, in its current version, is composed of approximately 4 million clinical notes fully anonymized, covering diverse phases of patient care during the stay in the emergency department. In addition, a subset of about six thousand notes has been manually annotated by clinical experts through a structured Case Report Form (CRF) containing 132 items relevant for two patient situations in emergency departments, dyspnea and loss of consciousness. Items may assume numerical values (e.g., for blood saturation), categorical (e.g., for level of consciousness ), binary (e.g., for presence of traumas), and mixed value types. The annotation process involved multiple clinicians and underwent iterative revision to resolve ambiguities in item formulation, resulting in a richly structured (although high imbalanced) resource. The dataset aims to fill a relevant gap of data able to support both the development and the use of Large Language Models in concrete medical applications. We describe the data collection protocol, the on-site anonymisation pipeline, corpus statistics, and the annotation scheme. Finally, we propose CRF-filling as a novel structured information extraction benchmark, and provide zero-shot baseline resulting from Gemma-27B and MedGemma-27B. To the best of our knowledge, the EDEN dataset is the largest freely available corpus of clinical notes existing for the Italian language.
Structured width pruning of GLU-MLP layers in Llama-3.2 models, guided by the Peak-to-Peak Magnitude (PPM) criterion, reveals a systematic dichotomy in how reducing the expansion ratio affects different model capabilities. While performance on tasks relying on parametric knowledge (e.g., MMLU, GSM8K) and perplexity metrics degrades predictably with decreasing expansion ratios, instruction-following capabilities improve at the 2.4x equilibrium ratio (IFEval: +4.8 points / +46% in Llama-3.2-1B and +3.7 points / +39% in Llama-3.2-3B), and multi-step reasoning remains robust (MUSR). This pattern, observed consistently across both evaluated model sizes, challenges the prevailing assumption in compression research that pruning induces uniform degradation. To investigate this, we evaluated seven expansion ratio configurations using comprehensive benchmark suites that assess factual knowledge, mathematical reasoning, language comprehension, instruction-following, and truthfulness. Our analysis identifies the expansion ratio as a critical architectural parameter that selectively reshapes the model's task performance profile, rather than merely serving as a compression metric.
By combining chemistry and biology with AI modelling, Layla Hosseini-Gerami is finding and reviving therapeutics with ‘huge potential’. By combining chemistry and biology with AI modelling, Layla Hosseini-Gerami is finding and reviving therapeutics with ‘huge potential’.
Block-diffusion language models offer a promising path toward faster-than-autoregressive generation by combining block-wise autoregressive decoding with within-block parallel denoising. However, in the few-step regime needed for practical acceleration, standard confidence-thresholded decoding is often brittle: aggressive thresholds hurt quality, while conservative thresholds require unnecessary denoising steps. Existing approaches that address this issue either require additional training or incur extra test-time compute. We present S2D2, a training-free self-speculative decoding framework for block-diffusion language models. Our key observation is that a block-diffusion model becomes autoregressive when the block size is reduced to one, allowing the same pretrained model to act as both drafter and verifier. S2D2 inserts a speculative verification step into standard block-diffusion decoding and uses lightweight routing policies to decide when verification is worth its cost. This yields a hybrid decoding trajectory in which diffusion proposes tokens in parallel, while the autoregressive mode acts as a local sequence-level critic. Across three mainstream block-diffusion families, S2D2 consistently improves the accuracy-speed tradeoff over strong confidence-thresholding baselines. On SDAR, we observe up to $4.7\times$ speedup over autoregressive decoding, and up to $1.57\times$ over a tuned dynamic decoding baseline while improving accuracy by up to $4.5$ points. On LLaDA2.1-Mini, S2D2 remains complementary to built-in self-correction, including a conservative setting where it is $4.4\times$ faster than the static baseline with slightly higher accuracy.
arXiv:2605.04954v2 Announce Type: replace-cross Abstract: Per-instance algorithm selection (PIAS) takes advantage of complementarity between a set of algorithms by deciding which algorithm to run on a given instance. This decision is based on features of the instances, which, in the context of black-box optimization (BBO), require a part of the optimization budget to be computed. This raises two questions: (a) from which fraction of the budget spent on feature computation does PIAS become worth it for BBO, and (b) which fraction of the budget optimizes the tradeoff between feature accuracy and PIAS performance. To this end, we perform a broad study where PIAS with varying sampling budgets for feature computation is compared to the single best algorithm on a broad range of algorithm selection scenarios. These scenarios consist of two portfolio sizes, three problem sets, 4 dimensionalities, and 10 target budgets. We find that PIAS is viable for the majority of tested scenarios, even when as much as a quarter of the total budget is spent on feature computation. The tradeoff for the fraction of the budget spent on feature computation to maximize the benefit of PIAS is highly dependent on the specific AS scenario. Further, on average 20 percent of PIAS loss to the virtual best solver is explained by the budget spent on feature computation, highlighting the importance of properly accounting for the feature budget.
Loss of lean mass in proportion to total weight loss is observed with incretin mimetic therapies such as tirzepatide and has the potential to adversely affect health and function. Apitegromab is an investigational, fully human monoclonal antibody that selectively inhibits myostatin activation and is, thereby, capable of increasing muscle mass. In the randomized, double-blind, placebo-controlled phase 2 EMBRAZE study, adults with overweight or obesity (n = 102) were randomized 1:1 to receive tirzepatide plus apitegromab (10 mg kg−1) or tirzepatide plus placebo. At week 24, apitegromab resulted in a least square mean (80% confidence interval (CI)) of 1.9 (1.2−2.7) kg less lean mass loss than placebo (P = 0.001), despite similar total body weight loss between groups, representing a 54.9% retention of lean mass relative to placebo. In participants receiving apitegromab, trough concentrations of apitegromab and total latent myostatin, a pharmacodynamic marker, both increased over time and reached a plateau after approximately 16 weeks. Incidence of adverse events (AEs) (% (95% CI)) was generally similar across apitegromab-treated participants and placebo-treated participants, with 39 of 51 (76% (63−86%)) and 36 of 51 (71% (57−81%)) participants experiencing an AE, respectively. Serious adverse events (SAEs) were balanced and experienced by one of 51 (2% (0−10%)) participants in each arm. In summary, this proof-of-concept study demonstrated that selective targeting of myostatin by apitegromab was well tolerated and effective in preserving lean mass when combined with tirzepatide. ClinicalTrials.gov identifier: NCT06445075 . In the phase 2 EMBRAZE study, participants receiving tirzepatide and apitegromab lost less lean mass compared to participants receiving tirzepatide and placebo.
Introduction: Hantavirus disease is an emerging and potentially severe zoonosis of global distribution. In Uruguay, it is transmitted by rodents inhabiting peridomestic, suburban, and rural areas. Global incidence is estimated at 150,000 to 200,000 cases per year, with up to 300 annual cases in the Americas. Since 1997, Uruguay's Ministry of Public Health (MPH) has monitored Hantavirus cardiopulmonary syndrome (HCPS), the most common clinical presentation in the region. By 2019, a total of 271 cases had been identified in the country, with an estimated mortality rate of nearly 50%. Objectives: To describe the clinical, epidemiological, and occupational characteristics of patients with Hantavirus disease in Uruguay during the pre-pandemic (2018-2019) and pandemic (2020-2021) periods. Methods: A descriptive, cross-sectional, observational study was conducted, including all serologically confirmed cases of Hantavirus infection reported to the MPH between 2018 and 2021. Clinical and demographic data were extracted from the mandatory reporting form for zoonotic diseases. Incidence and case fatality rates were calculated, and factors associated with fatal outcomes were analyzed. Results: A total of 58 confirmed cases were identified between 2018 and 2021. Most patients were male (62%), with a mean age of 36.5 years (SD 16). A decline in incidence was observed during 2020-2021, with no significant change in case fatality. Direct rodent exposure was the most frequently associated risk factor. Montevideo and Canelones were the most affected departments. Renal and pulmonary involvement were significantly associated with mortality. Conclusion: Hantavirus remains a relevant public health concern in Uruguay. Although a decrease in incidence was observed during the COVID-19 pandemic years, case fatality rates remained high. The findings underscore the need for sustained surveillance and early recognition, particularly in urbanizing regions.
We present WireframeDETR, our submission to the Structured Semantic 3D Reconstruction (S23DR) 2026 Challenge, which requires predicting a 3D building wireframe from multi-view COLMAP point clouds. Our method applies DETR-style set prediction directly to 3D point clouds, producing wireframes as sets of edge coordinate pairs without any intermediate vertex detection stage. We introduce three technical contributions: (1) contrastive denoising training that stabilises noisy Hungarian matching in early epochs; (2) a multi-scale encoder that aggregates the last encoder layer outputs via learned scalar weights; and (3) progressive auxiliary loss weighting that concentrates gradient signal on the decoder layers that most benefit from it. Our model achieves a public test HSS of 0.575 (F1~=~0.664, IoU~=~0.516) and a best validation HSS of 0.534 on the cleaned val split.
arXiv:2606.16424v1 Announce Type: cross Abstract: Non-Hermitian systems host a wide range of unconventional topological phenomena while large-scale simulations in finite three dimensional systems remain challenging because of the rapidly growing number of sites. In particular, higher-order topological corner modes are often studied only in small lattices, where strong finite-size effects can mask their intrinsic behavior. Here, we develop a tensor-network framework that combines quantics tensor cross interpolation with the kernel polynomial method, enabling compact representations of large non-Hermitian tight-binding Hamiltonians and direct calculations of real-space spectral functions for systems exceeding one billion lattice sites. Using this approach, we investigate three-dimensional non-Hermitian higher-order topological insulators with with structured real-space geometries. The unprecedented system size enables direct access to the macroscopic regime and allows corner-mode spectral responses to be resolved in genuinely three-dimensional systems.By tuning the loss strength, we identify distinct in-gap corner modes across weak- and strong-loss regimes.Our results establish tensor-network algorithms as a powerful strategy to perform real-space spectral calculations in exceptionally large non-Hermitian systems.
arXiv:2606.10968v2 Announce Type: replace-cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has become standard for improving LLM reasoning. However, existing PPO-style trust-region mechanisms remain position-agnostic by enforcing uniform thresholds across all tokens independently. This pointwise treatment conflicts with autoregressive generation in two critical ways. First, uniform thresholds ignore autoregressive asymmetry. Early-stage deviations produce compounding sequence-level drift, causing static thresholds to under-regulate early divergence and excessively constrain late-stage exploration. Second, evaluating token-level divergence in isolation overlooks cumulative prefix drift, granting the same divergence allowance regardless of how far the conditioning history has already deviated from the rollout policy. To address this limitation, we propose CPPO (Cumulative Prefix-divergence Policy Optimization), a token-level masking rule that aligns updates with a finite-horizon policy-improvement bound via two coupled mechanisms. First, a position-weighted threshold imposes stricter limits at early positions whose effects persist longer, relaxing constraints for late-stage tokens. Second, a cumulative prefix budget tracks historical deviations, dynamically restricting further token-level deviation to prevent compounding errors along the prefix. Empirically, CPPO enhances training stability and significantly improves reasoning accuracy across various model scales.
arXiv:2606.20326v1 Announce Type: new Abstract: We develop QCPIKAN, the first quantum-classical physics-informed Kolmogorov-Arnold network designed to solve partial differential equations (PDEs). Built upon Chebyshev-polynomial KAN layers and parameterized quantum circuits, this hybrid framework embeds physical constraints into the training loss to enforce physical consistency. Our theoretical investigations grounded in approximation theory prove that this design accelerates high-frequency error convergence to an exponential rate and effectively mitigates numerical dispersion. We validate the framework across three typical seepage scenarios in porous media, including single-phase flow, component transport and two-phase flow. Compared with existing quantum-classical physics-informed neural networks, QCPIKAN achieves superior performance in global prediction accuracy, local error control, dynamic evolution tracking and displacement front localization. This work provides a robust and efficient alternative for solving complex PDEs.
arXiv:2606.13051v1 Announce Type: new Abstract: Despite advances in information extraction driven by deep learning and large language models, performance gaps remain in highly specialized biomedical fields, where domainspecific complexity poses challenges for generalist models. In this work, we focus on the domain of autoimmunity, where the main entities of interest are autoimmune diseases, autoantibodies (i.e., molecules that may mark or cause these diseases), their molecular targets, their location in the body, and their associated clinical signs. Herein, we present AAbAAC (AutoAntibodies and Autoimmunity Annotated Corpus), a corpus of 115 abstracts selected from PubMed, where we manually annotated entities and their relationships. First, AAbAAC was used to evaluate several methods on the task of named entity recognition (NER), and secondly, to fine-tune NER models. Our study demonstrates the utility of AAbAAC for information extraction in the domain of autoimmunity, showing expected improvement in NER performance after finetuning. This illustrates the value of small-scale annotation efforts for specialized domains and contributes to the computational study of autoimmunity. The AAbAAC corpus is available at https://github.com/f-maury/AAbAAC.
arXiv:2510.00831v2 Announce Type: replace Abstract: The increasing complexity of modern power systems, driven by the integration of inverter-based and distributed energy resources, challenges the reliability of conventional protection schemes and motivates the use of machine learning for protection tasks. However, published results are often difficult to compare because datasets, sensing assumptions, and decision horizons vary across studies. This paper presents a controlled comparison of machine learning models for fault classification (FC) and fault localization (FL) under identical sensing, timing, and validation conditions on a common electromagnetic transient dataset, using decision windows of 10-50 ms to reflect protection-relevant time scales. For FC, the best-performing nonlinear models achieve F1 scores above 0.98 already at 10 ms, while lower-capacity models degrade at shorter horizons but improve with longer windows, indicating that relevant fault-type information is already present in the earliest transient. For FL, the top-performing models reach a stable localization error of about 10 % of normalized line length across all evaluated horizons, while weaker models form a clearly separated second performance tier. Line-resolved analysis shows that localization accuracy varies across grid segments, indicating topology-dependent difficulty rather than insufficient temporal context alone. These findings provide a controlled reference for comparing machine learning models across two protection tasks with fundamentally different information requirements.
Entity state tracking is a necessary component of world modeling that requires maintaining coherent representations of entities over time. Previous work has benchmarked entity tracking performance in purely text-based tasks. We introduce MET-Bench, a multimodal entity tracking benchmark designed to evaluate the ability of vision-language models to track entity states across modalities. Using three domains, we assess how effectively current models integrate textual and image-based state updates. Our findings reveal a significant performance gap between text-based and image-based entity tracking. We empirically show this discrepancy primarily stems from deficits in visual reasoning rather than perception. We further show that explicit text-based reasoning strategies improve performance, yet limitations remain, especially in long-horizon multimodal tasks. We apply reinforcement learning to improve entity tracking in open-source VLMs. This yields substantial in-modality gains, but does not transfer robustly across input modalities. Our results highlight the need for improved multimodal representations and reasoning techniques to bridge the gap between textual and visual entity tracking.
arXiv:2606.11870v1 Announce Type: cross Abstract: Machine learning is increasingly applied to accelerate the discovery of novel materials by exploring large compositional and structural design spaces. Yet, the scarcity of high-quality data and the frequent need for out-of-distribution prediction introduce substantial uncertainty, making the assessment of model reliability essential. In this work, we investigate uncertainty quantification as a means to evaluate model confidence in the context of permanent magnet research. In a first study, we benchmark classical and modern machine learning models for predicting intrinsic magnetic properties, focusing on the quality of their uncertainty estimates. We apply Gaussian negative log-likelihood loss and dropout-based Bayesian approximation as practical strategies for estimating predictive uncertainty. In a second study, we transfer these architectural features for uncertainty estimation to a more complex task: predicting coercivity from microstructural information using a graph neural network. Together, these studies demonstrate that uncertainty quantification not only enhances the trustworthiness of predictions but is also transferable across different modeling tasks.
Blood-based biomarkers of dementia are a promising scalable tool for early diagnosis, tracking disease progression, and evaluating therapeutic efficacy. Utility of these biomarkers will not only be dependent on the reliability of their association with pathology but also contingent on their ability to track cognitive status. Previously, we demonstrated diurnal variation in several biomarkers (amyloid beta (A{beta}) 42 and 40, 42/40 ratio, glial fibrillary acidic protein (GFAP), neurofilament light (NfL), and phosphorylated-Tau 217 (p-Tau217)) which has implications for their reliability. Here, we extend these observations to a larger cohort, include brain-derived tau (BD-Tau), which is assumed to be produced exclusively in the brain, and report endocrine measures of circadian rhythmicity. We not only assessed whether these biomarkers vary with time of day, but also whether they associate with daytime function and whether these associations vary with cognitive domain and number of repeated assessments. Data collected in 20 PLWA (72.4{+/-}5.9 years, mean{+/-}SD) and 19 controls (68.9{+/-}9.8 years) were analysed. Participants completed 14 days of home monitoring and one laboratory assessment of sleep and daytime function: mood, daytime sleepiness, reaction time, immediate and delayed memory recall, everyday memory errors. During the 27-hour residential laboratory session, 3-hourly blood samples were collected and analysed for the six blood-based biomarkers of dementia as well as melatonin and cortisol. Rhythmicity of melatonin and cortisol did not differ between groups. P-Tau217 and GFAP (p
arXiv:2606.16301v1 Announce Type: new Abstract: Domain Generalization (DG) aims to train models that generalize to unseen target domains but often overfit to domain-specific features, known as undesired correlations. Gradient-based DG methods typically guide gradients in a dominant direction but often inadvertently reinforce spurious correlations. Recent work has employed dropout to regularize overconfident parameters, but has not explicitly adjusted gradient alignment or ensured balanced parameter updates. We propose GENIE (Generalization-ENhancing Iterative Equalizer), a novel optimizer that leverages the One-Step Generalization Ratio (OSGR) to quantify each parameter's contribution to loss reduction and assess gradient alignment. By dynamically equalizing OSGR via a preconditioning factor, GENIE prevents a small subset of parameters from dominating optimization, thereby promoting domain-invariant feature learning. Theoretically, GENIE balances convergence contribution and gradient alignment among parameters, achieving higher OSGR while retaining SGD's convergence rate. Empirically, it outperforms existing optimizers and enhances performance when integrated with various DG and single-DG methods.
arXiv:2606.16639v1 Announce Type: new Abstract: Multimodal learning exploits complementary information across heterogeneous modalities. The informativeness of each modality can vary widely across samples and training stages. Existing multimodal curriculum learning strategies often assume that the relative complexity of samples remains unchanged throughout training and therefore cannot adapt to model evolution. We propose SPICE (Synergy and Partial Information based Curriculum Evolution), a novel progressive curriculum framework for multimodal interaction learning. Guided by Partial Information Decomposition (PID) theory, our approach decomposes multimodal interactions into redundant, unique, and synergistic information components, enabling an interpretable and dynamic characterization of sample complexity. Building on this decomposition, we design a progressive curriculum that evolves throughout training, allowing the model to transition from learning shared cross-modal cues to modality-specific patterns and, finally, to complex synergistic interactions. Adapting to model evolution, sample ordering is refined in real-time using PID information estimates derived from unimodal and multimodal predictions. Experiments across multiple multimodal benchmarks demonstrate consistent improvements over conventional training and state-of-the-art baselines, highlighting the effectiveness of PID information decomposition and adaptive sample ordering for multimodal curriculum learning.
Vision-Language Models (VLMs) hallucinate objects that are not present, and a growing line of work tries to curb this by feeding the model its own generated caption as auxiliary evidence – assuming that a caption, once available, is something to consume. We show this fails: naively appending a caption can lower accuracy rather than raise it, dropping Qwen2.5-VL-3B$^\dagger$ on HallusionBench by nearly ten points. To understand why, we build GD-Probe, a diagnostic set that pairs a global and a detail question on the same image, so that any difference in caption effect is attributable to the question alone. Caption utility proves to be a per-query property: the same caption helps global questions and harms detail ones, through a single mechanism – an embedded caption competes with the image for attention and pulls the model's evidence onto its own text – whose sign is set by whether the caption covers the queried content. Crucially, this regime is readable from quantities the decoder already emits, with no attention access or grounding. We turn this into GEASS (Gated Evidence-Adaptive Selective Caption Trust), a training-free, logit-level module that decides per query how much of the caption to trust, gating it by the clean path's confidence, weighting it by the entropy reduction it induces, and raising the evidence bar when the two pathways disagree. Across four VLMs and two benchmarks (POPE and HallusionBench), GEASS improves over both vanilla inference and contrastive decoding under a single fixed setting, adding only two forward passes and no parameters.
Migratory cells tend to have soft nuclei that deform and penetrate narrow spaces1,2. Extensive nuclear deformation during migration can cause nuclear-envelope rupture and DNA damage in cancer cells, which may contribute to malignant transformation during tumour progression3–6. However, the importance of DNA damage in physiological migration is less well understood. Here we demonstrate that the migration of neurons in developing cerebral and cerebellar cortices is accompanied by massive DNA double-stranded breaks (DSBs) due to mechanostress during passage through narrow interstitial spaces. In contrast to many other migratory cells, these DSBs occur without detectable nuclear envelope rupture. Confined migration increases topoisomerase-IIβ covalently bound DSBs, and these lesions are repaired through non-homologous end-joining during brain development without causing cell death. Genome sequencing revealed that DSBs tend to occur at transcriptionally inactive regions. The deletion of ligase IV at the onset of neuronal migration leads to persistent DSB accumulation in cerebellar neurons with moderate transcriptional changes in genes related to synaptic function, neuronal development and stress and immune responses. The mutant mouse develops mild motor deficits in later life, suggesting that the DNA damage generated during normal brain development poses a potential disease risk if left unrepaired. The migration of neurons in developing cerebral and cerebellar cortices is accompanied by massive DNA double-strand breaks due to mechanostress during passage through narrow interstitial spaces.
arXiv:2603.19423v2 Announce Type: replace-cross Abstract: Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete complex multi-step tasks. Practitioners deploy defense-trained models to protect against prompt injection attacks that manipulate agent behavior through malicious observations or retrieved content. We reveal a fundamental capability-alignment paradox: defense training designed to improve safety systematically destroys agent competence while failing to prevent sophisticated attacks. Evaluating defended models against undefended baselines across 97 agent tasks and 1,000 adversarial prompts, we uncover three systematic biases unique to multi-step agents. Agent incompetence bias manifests as immediate tool execution breakdown, with models refusing or generating invalid actions on benign tasks before observing any external content. Cascade amplification bias causes early failures to propagate through retry loops, pushing defended models to timeout on 99\% of tasks compared to 13\% for baselines. Trigger bias leads to paradoxical security degradation where defended models perform worse than undefended baselines while straightforward attacks bypass defenses at high rates. Root cause analysis reveals these biases stem from shortcut learning: models overfit to surface attack patterns rather than semantic threat understanding, evidenced by extreme variance in defense effectiveness across attack categories. Our findings demonstrate that current defense paradigms optimize for single-turn refusal benchmarks while rendering multi-step agents fundamentally unreliable, necessitating new approaches that preserve tool execution competence under adversarial conditions.
Background Accurate assessment of the QT interval is challenging in the presence of QRS prolongation, such as during ventricular pacing or bundle branch block. Current correction methods are heterogeneous and lack consensus. To evaluate the relationship between QRS duration and QT interval during ventricular pacing and to develop a practical correction method for QT assessment. Methods In this prospective single-centre study, 94 patients undergoing electrophysiology study for supraventricular tachycardia were included. Standardised pacing was performed at the same cycle length from the right ventricular (RV) apex, high output and low output pacing from His catheter, and coronary sinus (reference). QRS and QT intervals were measured from 12-lead ECGs. Changes in QT (QT) and QRS duration (QRS) were analysed using linear regression and mixed-effects modelling. QT correction formulas of the form QT corrected = QT N x QRS were evaluated using Bland-Altman analysis across multiple coefficients. Results A significant positive correlation between QRS and QT was observed across all pacing sites (r = 0.52-0.74, p < 0.001). In mixed-effects modelling, QRS was a strong independent predictor of QT (0.59, p < 0.001), with no significant interaction between pacing site and QRS, supporting a consistent relationship across pacing locations. Bland-Altman analysis demonstrated that correction coefficients of 0.65-0.70 minimised systematic bias compared with lower coefficients, with similar precision across models (SD 16 ms) and no evidence of proportional bias. A coefficient of 0.65 provided the most balanced performance between bias and variability. Conclusion QT prolongation during ventricular pacing is primarily driven by QRS widening and follows a consistent linear relationship across pacing sites. A simple correction using QT corrected = QT 0.65 x (QRS 100 ms) provides a practical and accurate method for QT assessment, with potential clinical applicability in patients with conduction abnormalities or ventricular pacing.