Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-15

SinGeo: Unlock Single Model's Potential for Robust Cross-View Geo-Localization

Robust cross-view geo-localization (CVGL) remains challenging despite the surge in recent progress. Existing methods still rely on field-of-view (FoV)-specific training paradigms, where models are optimized under a fixed FoV but collapse when tested on unseen FoVs and unknown orientations. This limitation necessitates deploying multiple models to cover diverse variations. Although studies have explored dynamic FoV training by simply randomizing FoVs, they failed to achieve robustness across diverse conditions – implicitly assuming all FoVs are equally difficult. To address this gap, we present SinGeo, a simple yet powerful framework that enables a single model to realize robust cross-view geo-localization without additional modules or explicit transformations. SinGeo employs a dual discriminative learning architecture that enhances intra-view discriminability within both ground and satellite branches, and is the first to introduce a curriculum learning strategy to achieve robust CVGL. Extensive evaluations on four benchmark datasets reveal that SinGeo sets state-of-the-art (SOTA) results under diverse conditions, and notably outperforms methods specifically trained for extreme FoVs. Beyond superior performance, SinGeo also exhibits cross-architecture transferability. Furthermore, we propose a consistency evaluation method to quantitatively assess model stability under varying views, providing an explainable perspective for understanding and advancing robustness in future CVGL research. Codes will be available upon acceptance.

02.
PLOS Computational Biology 2026-06-22

Ten simple rules for making the supplement increase your paper’s impact

Authors:

by Volker Grimm, Uta Berger, Stefano Mammola Have you ever lost hours navigating supplementary materials—clicking between the main text and dozens of auxiliary files only to encounter broken links, illegible figures, and undefined variables and acronyms? If so, you’re not alone. What should support scientific communication has instead become an obstacle: supplementary information (SI) increasingly suffers from inconsistent formatting, poor accessibility, and fragmented organization that impedes rather than advances understanding. This is disheartening since the SI, if used effectively, has the power to enhance transparency, credibility, and reproducibility of research. Therefore, we propose 10 simple rules to help authors design SI that genuinely increase the impact of their research. The rules emphasize treating SI with the same care as the main text, using it strategically to support the scientific narrative while preserving clarity and focus. Key recommendations include creating a single, well-structured, self-contained SI master document; ensuring explicit cross-referencing between the main text and SI; making SI machine-readable; and avoiding the misuse of SI as a substitute for proper data repositories. We also highlight the importance of creativity in choosing appropriate formats and strict adherence to journal-specific guidelines. Finally, when available, we advocate the use of standardized templates to improve consistency, readability, and reuse across studies. By following these rules, authors can substantially increase the scientific impact of their work while at the same time contributing to more sustainable research practices.

03.
arXiv (math.PR) 2026-06-19

Power-law hypothesis and (un)fairness of PageRank on undirected multi-type PAMs

arXiv:2606.19583v1 Announce Type: new Abstract: The preferential attachment model (PAM) describes the sequential growth of a network based on the "rich-get-richer" principle. Several versions of it have become established for modeling, e.g., citation networks, capturing a power-law degree distribution. Directed versions of the preferential attachment model where the edges are directed from the new to the old vertices have been the subject of extensive research. They have been shown to exhibit remarkable properties such as heavier tails for the limiting graph-normalized PageRank than for the in-degrees. By contrast, for the undirected version, we recently showed that PageRank has similar tails as the degree. In the present paper, we discuss the PageRank asymptotics for a multi-type version of the undirected PAM (here vertices have different colors), complementing previous results of Antunes, Bhamidi, Banerjee and Pipiras on the asymptotics of PageRank on similar directed multi-type or colored PAMs. Our studies are motivated by the aim to go beyond the rigid rule of edge orientation in directed preferential attachment models. As the main result, for the case of a finite set of colors, we show that the power-law hypothesis for PageRank is fulfilled also for the colored undirected PAM, where, by contrast to the directed case, the power-law exponent is color-dependent for some choices of the initial color distribution and the attractiveness function. For the specific case of a two-type model, we discuss implications of our results on fairness in sampling underrepresented nodes from the network.

04.
arXiv (quant-ph) 2026-06-19

Progress on the Kretschmann-Schlingemann-Werner Conjecture

arXiv:2308.15389v4 Announce Type: replace Abstract: Given any pair of quantum channels $\Phi_1,\Phi_2$ such that at least one of them has Kraus rank one, as well as any respective Stinespring isometries $V_1,V_2$, we prove that there exists a unitary $U$ on the environment such that $\|V_1-({\bf1}\otimes U)V_2\|_\infty\leq\sqrt{2\|\Phi_1-\Phi_2\|_\diamond}$. Moreover, we provide a simple example which shows that the factor $\sqrt2$ on the right-hand side is optimal, and we conjecture that this inequality holds for every pair of channels.

05.
arXiv (CS.CL) 2026-06-15

SANA: What Matters for QA Agents over Massive Data Lakes?

Exploratory question answering (EQA) over data lakes requires an LLM agent to discover relevant sources, analyze retrieved data, and adapt its actions based on intermediate results. End-to-end accuracy alone cannot distinguish failures in search, planning, data analysis, or the agent's Action Policy: its decisions about what to do next and when to submit an answer. We present SANA (Search Agent Navigation Ablation framework), a diagnostic ablation framework that transforms EQA tasks into runtime profiles containing gold source sequence, sanitized subquestions, and execution records. SANA uses these profiles to construct idealized search, planning, and data-analysis tools, allowing each component to be ablated; the residual gap is diagnostic evidence for policy failures. To illustrate SANA as a reusable evaluation framework, we adapted two recent EQA benchmarks, LakeQA and KramaBench, and evaluated lightweight and mid-sized agents under fixed prompts, budgets, data lakes, and runtimes. Across both benchmarks, data analysis is a consistent bottleneck while planning is less so. Search is a major limitation in LakeQA's large data-lake setting, but less so for the smaller-scale KramaBench. SANA thus deconstructs end-to-end task accuracies into a diagnosis of where data-lake agents fail, and allows for systematic comparisons of progress in search, planning, data analysis, and agent design.

06.
arXiv (CS.LG) 2026-06-19

Flow Map Denoisers: Traversing the Distortion-Perception Plane for Inverse Problems

arXiv:2606.19802v1 Announce Type: new Abstract: Image restoration faces a fundamental tradeoff: methods that minimize error produce blurry reconstructions, while those that maximize perceptual quality yield sharp but less faithful images. Existing approaches either commit to a single operating point on this distortion perception (DP) frontier or require paired-data supervision, auxiliary models, or hyperparameter tuning of the sampler to access different points. We show that flow map models, a recent extension of flow matching for few-step sampling that learns an average field, implicitly define a one-parameter family of denoisers that continuously spans the DP frontier. The lookahead parameter t acts as a control knob between the MMSE and perceptual regimes. For Gaussian targets, we prove that varying t exactly recovers the optimal DP frontier; for natural images, we observe similar behavior empirically. Within a Plug-and-Play solver, the same mechanism extends to general inverse problems, where it controls a tradeoff between perceptual alignment and data consistency. Despite the lack of exact optimality guarantees in this setting, a single trained flow map spans the DP tradeoff, matching or exceeding specialized baselines at both extremes. Extensive experiments on CelebA ($128\times 128$) and AFHQ ($256\times 256$) across several linear and nonlinear inverse tasks validate our findings.

07.
arXiv (CS.CV) 2026-06-18

APT: Atomic Physical Transitions for Causal Video-Language Understanding

Physical events are not understood by their names alone, but by the causal state changes that compose them. A clip-level label such as "bounce" can be correct while hiding the process that makes the event physically valid, from support loss and contact onset to rebound and settling. To make this hidden process explicit, we introduce Atomic Physical Transitions (APTs): minimal, temporally localized state changes that bind a visible cue to an active physical mechanism and before/after dynamical regimes. An APT chain represents a video as an ordered causal transition sequence rather than a single aggregate event label: event labels tell what happened; APT chains explain why it happened. To make APTs learnable by VLMs, we construct mixed-source APT data from human annotations and simulator ground truth, covering 14 transition types across contact, gravity, friction, and rotation/stability, with 27,303 timed instances over 1,246 trials. Using this data, we find that current VLMs miss transition-level physics, with zero-shot recall at most 14% and errors dominated by missed transitions. Direct fine-tuning on APT chains improves transition detection but causes event-level forgetting, indicating that the model learns a specialized answer format rather than a reusable physical representation. We therefore propose APT-Tune, a parameter-efficient recipe that teaches VLMs to use causal transitions without forgetting how to answer video questions. It combines image-pad-aware supervision, format-conditional co-training, and mechanism-conditioned domain-to-type decoding to make APT learning format-robust and physically grounded. With only 11 M LoRA parameters on Qwen3-VL-2B, APT-Tune substantially improves APT recall while also improving event-level video transfer. These results show that APTs are not a new answer format, but a human-aligned causal supervision signal for physical video understanding.

08.
arXiv (CS.LG) 2026-06-11

RePAIR: Predictive Self-Supervised Representation Learning in Chess

arXiv:2606.11860v1 Announce Type: new Abstract: In this paper, we introduce Representation Prediction via Autoencoding using Iterative Refinement (RePAIR) - a novel self-supervised representation learning architecture that synthesizes Masked Autoencoders (MAE), Joint Embedding Predictive Architectures (JEPA), and Bidirectional Encoder Representations from Transformers (BERT). We demonstrate how it can be used to encode objects in sequential data like consecutive chess positions into compact yet meaningful representations. The basic principle of the architecture is to mask large portions of a sequence of latent states, similar to BERT and MAE. Then, we apply a lightweight Predictor to the latent representations that repairs gaps in the sequence in a lower-dimensional embedding space akin to JEPA. Our experiments in the domain of chess show that the Encoder refines the board representations such that meaningful chess concepts emerge clustered in the latent space. Furthermore, reconstructions of the masked board states show that the model is able to reason about the piece movements without relying on costly reinforcement learning methods. Lastly, we find that the resulting representation space allows for quick and intuitive dissections of chess games by observing the game path trajectories in this semantically rich space.

09.
PLOS Medicine 2026-05-29

Availability, appeal, and addictiveness by design: Tobacco and nicotine industry deliberate targeting of youth

Authors:

by Raglan Maddox, Becky Freeman, Charlotta Pisinger, Emily Banks Contemporary tobacco and nicotine products, particularly e-cigarettes, are deliberately designed, marketed, and distributed to maximize youth appeal, uptake, dependence, and use. Youth uptake is a predictable outcome of systems designed to maximize product availability, appeal, and addictiveness. In recognition of the World No Tobacco Day 2026 theme, "unmasking the appeal", this Perspective by Raglan Maddox and colleagues discusses how tobacco and nicotine products, particularly e-cigarettes, are deliberately designed and marketed to maximize youth appeal, and highlight the need for policies to ensure greater industry accountability and to tackle concerning uptake trends.

10.
arXiv (quant-ph) 2026-06-15

An integrated ultrahigh vacuum cluster tool for diamond surface science and single nitrogen-vacancy center measurements

arXiv:2606.13961v1 Announce Type: new Abstract: We present a custom-designed ultrahigh vacuum (UHV) cluster tool developed for studying shallow nitrogen-vacancy (NV) centers in diamond, enabling in situ diamond surface preparation, characterization, and single NV center dynamics measurements within a single connected platform. The system combines a surface science chamber for controlled surface modification and analysis with a cryogenic confocal microscope chamber dedicated to NV spin and optical measurements. This integrated approach enables a direct correlation between diamond surface chemistry and the resulting NV spin and charge properties. The instrument provides a versatile platform for systematic studies of surface-induced decoherence mechanisms and charge dynamics for shallow NV centers, and establishes a pathway toward reproducible surface engineering for quantum sensing applications.

11.
arXiv (CS.CV) 2026-06-18

Multi-Class Brain Tumor Classification Using Advanced Deep Learning Models: A Comparative Study

Despite recent advancements in deep learning, accurately classifying brain tumors from MRI images continues to pose challenges. In this research, we present a comprehensive evaluation of five different convolutional neural networks (CNN) architectures, including a customized baseline model and four pre-trained models - for use in classifying multi-class brain tumors using a clinically-sourced dataset of approximately 10,000 MRI images. We have utilized five different architectures; VGG16, VGG19, DenseNet121, and EfficientNetB0, which were all tested and trained within an identical experimental framework. Performance was measured by both overall accuracy and tumor-wise recall as a means to measure the clinically-relevant performance of each architecture. We found that EfficientNetB0 had the best overall classification accuracy at 95%, when compared to the other architectures tested; specifically VGG16 (94.37%), VGG19 (92.29%), DenseNet121 (90.91%) and the customized CNN (78.00%). An especially important finding of our research was the considerable improvement in detecting meningiomas; specifically, while simple CNNs could detect meningiomas with a recall rate of approximately 20%, EfficientNetB0 was able to detect meningiomas with a recall rate of 89%. Meningiomas are often difficult to detect because they can appear very subtly on MRI images. Additionally, an interesting finding was that the deeper VGG19 performed worse than the shallower VGG16. This indicates that in many cases the architectural efficiency of a CNN model may be more important than its depth when working with medical images. Overall, EfficientNetB0 appears to provide the optimal trade-off between classification accuracy, number of parameters used in the model and clinically meaningful performance.

12.
arXiv (CS.CV) 2026-06-16

Improved Knowledge Distillation for Land-Use Image Classification

In the present article, an improved Knowledge Distillation (KD) framework has been proposed for efficient compression of deep convolutional neural networks for land-use image classification task. Motivated by the need to achieve competitive classification accuracy while reducing computational complexity, a teacher-student learning paradigm is adopted in which a VGG16 network transfers knowledge to a lightweight MobileNetV2 model. The proposed framework integrates hard supervision from ground truth labels with a soft supervision strategy that combines Kullback-Leibler divergence and Cosine Similarity losses. Experiments conducted on three land-use datasets show that the proposed KD-based method yields improved performance, and achieves an accuracy of 99.04%, outperforming both baseline student training and single-loss distillation approaches, while retaining substantial model compression.

13.
arXiv (CS.CL) 2026-06-12

G-Long: Graph-Enhanced Memory Management for Efficient Long-Term Dialogue Agents

While Large Language Models (LLMs) have advanced open-domain dialogue systems, maintaining long-term consistency remains a challenge due to inherent limitations in long-context reasoning and the inefficiency of processing extensive raw text. Existing approaches typically rely on either unstructured memory storage, which is prone to information loss, or computationally expensive LLMs that incur high latency. To address these limitations, we propose G-Long, a graph-enhanced framework that utilizes a fine-tuned small Language Model (sLM) for structured triplet extraction and associative retrieval, significantly reducing operational costs. Furthermore, we introduce the novel attention-aware importance scoring mechanism that leverages the intrinsic cross-attention signals of a T5 summarizer to identify salient memories. Extensive experiments across diverse benchmarks demonstrate that G-Long achieves state-of-the-art performance in both response generation and memory retrieval, yielding performance gains of up to 9.8% in response quality on MSC and 40.8% in retrieval recall on LME, while significantly minimizing computational overhead.

14.
arXiv (CS.CL) 2026-06-16

From Affect Prediction to Affect Forecasting: Evidence for Distinct Information Sources in Longitudinal Text

Modeling dimensional affect in longitudinal text requires distinguishing current affect estimation from future affective change forecasting. Existing approaches often treat each text as an independent observation and apply similar assumptions to both tasks, without testing whether they rely on different information sources. This paper investigates that distinction using longitudinal self-reported ecological essays and feeling-word entries. We propose the Trait–State Affective Prediction (TSAP) framework and its temporal extension E-TSAP for per-text valence and arousal prediction, evaluated on a held-out prediction test set of 1,737 entries from 91 users. We further propose the Affective Change Forecaster Hybrid (ACF-Hybrid) for next-step affective change forecasting, evaluated on a held-out forecasting test set of 46 users. For prediction, E-TSAP achieves composite Pearson correlations of 0.670 for valence and 0.449 for arousal. For forecasting, textual representations perform worse than compact numeric trajectory baselines: the text-inclusive model achieves only r=0.316 for valence and r=0.284 for arousal, whereas a simple prior-state baseline reaches r=0.615 and r=0.670, respectively. ACF-Hybrid, using dimension-specific numeric trajectory features, achieves r=0.659 for valence and $r=0.658$ for arousal. These results show that textual semantics support current affect prediction, whereas future affective change is better captured through prior numeric trajectory dynamics.

15.
arXiv (CS.CV) 2026-06-17

Colab NAS: Obtaining lightweight task-specific convolutional neural networks following Occam's razor

The current trend of applying transfer learning from convolutional neural networks (CNNs) trained on large datasets can be an overkill when the target application is a custom and delimited problem, with enough data to train a network from scratch. On the other hand, the training of custom and lighter CNNs requires expertise, in the from-scratch case, and or high-end resources, as in the case of hardware-aware neural architecture search (HW NAS), limiting access to the technology by non-habitual NN developers. For this reason, we present ColabNAS, an affordable HW NAS technique for producing lightweight task-specific CNNs. Its novel derivative-free search strategy, inspired by Occam's razor, allows to obtain state-of-the-art results on the Visual Wake Word dataset, a standard TinyML benchmark, in just 3.1 GPU hours using free online GPU services such as Google Colaboratory and Kaggle Kernel.

16.
arXiv (CS.LG) 2026-06-11

Seeing Below the Limit of Detection: A Censored-Poisson Bayesian Latent-Growth Change-Point Detector (the Span Detector) for Serial ctDNA in HR+/HER2- Metastatic Breast Cancer

arXiv:2606.11876v1 Announce Type: cross Abstract: Circulating-tumour DNA (ctDNA) carries evidence of drug resistance months before imaging shows it, but the earliest evidence lives below the assay's limit of detection (LoD): a nascent subclone is detected only intermittently, producing a flickering sequence of faint detects and non-detects. Commercial liquid biopsies treat each draw as an independent snapshot and a non-detect as nothing. We argue a non-detect is a left-censored observation, and the pattern of non-detects and faint detects over time carries actionable evidence of growth before any single value is trustworthy. We introduce Span, a censored-Poisson Bayesian latent-growth change-point detector that models the binary detection process, accumulates a sequential generalised-likelihood-ratio statistic for an upward change-point in the per-variant detection rate, and raises a competing-risks alarm with calibrated false-alarm control. Span has no learned weights, so there is nothing to overfit. On a synthetic cohort of HR+/HER2- metastatic breast cancer on first-line CDK4/6-inhibitor plus endocrine therapy, at a matched 10% false-alarm rate, Span roughly doubles the fraction of impending progressions caught three months ahead (indolent regime: 25% vs 11% for the snapshot), with a falsifiable dose-response: large for indolent emergence, vanishing for fast emergence. A value-trajectory baseline performs identically to the snapshot, isolating the gain to the censored detection model. The survival backbone matches a Cox baseline on real breast-cancer data (GBSG-2, n=686; C-index 0.67 vs 0.68), and on a real longitudinal cohort with clean biomarkers (PBC2, n=312) the same pipeline correctly declines to win, a falsifiable boundary test confirming the mechanism is regime-specific. All ctDNA trajectories are synthetic.

17.
arXiv (quant-ph) 2026-06-11

Fun with Graph States: Nonlocal Bell Pairs and the Arf Invariant

arXiv:2606.06582v2 Announce Type: replace Abstract: We study inner products and partial amplitudes of graph states–a commonly employed class of quantum states, which are specified by graphs. We find that the magnitudes of these quantities are simply related to the rank of the adjacency matrix of the graph over F_2 while the phase is determined by the Arf invariant of its quadratic refinement. These facts motivate a nonlocal tensor factorization of the Hilbert space, with respect to which all graph states are products of Bell pairs with unentangled ancillae. These results may illuminate the quantum advantage in the framework of Measurement-Based Quantum Computation and suggest that graph states can be usefully visualized in the language of algebraic topology. In addition, we develop a specialized technique for computing expectation values of qubit-wise permutations in graph states, which is useful for calculating multi-invariants.

18.
medRxiv (Medicine) 2026-06-17

Silent Manipulation of Mental Health Treatment Recommendations from a Large Language Model

Authors:

Importance. Large language models (LLMs) increasingly inform mental health decisions by patients and clinicians. Inference-time activation steering can shift model behavior on a target dimension without altering weights or prompts and without disclosure to users, allowing treatment recommendations to be silently changed for commercial or ideological reasons. Objective. To determine whether directional activation steering can shift an open-weights LLM's depression treatment recommendations. Design, Setting, and Participants. This non-human subjects study applied directional activation steering to an open-weights LLM (DeepSeek V4 Flash) responding to 12 depression-advice scenarios (4 favoring medication, 4 favoring avoidance, 4 neutral), generated at 30 amplitudes from -1.5 to +1.5 in 0.1 increments plus an unsteered baseline. Exposures. A single steering direction contrasting antidepressant medication with self-directed approaches (diet, exercise, meditation, dietary supplements), constructed from 16 paired training prompts and applied at the attention output of every transformer block; weights and system prompt were held constant. Main Outcomes and Measures. The extent to which medication and four self-care categories were addressed, scored 0 to 3 by a human-validated LLM rater (Claude Opus 4.7), the medication-versus-self-care balance, and clinician referral, estimated per unit of amplitude using mixed-effects models with a scenario random intercept. Results. Across 372 generations, steering produced a graded, dose-dependent shift in the medication-versus-self-care balance, which declined by 0.32 per unit of amplitude (beta=-0.32; 95% CI, -0.39 to -0.25; P < .001); medication extent fell and self-care extent rose. The shift was largest for scenarios with no stated treatment preference (beta = -0.44; 95% CI, -0.54 to -0.34; P < .001). A clinician referral appeared in 322 of 372 responses (87%) and did not vary with steering amplitude (P = .63). Conclusions and Relevance. In this open-weights LLM providing depression treatment information, inference-time activation steering shifted treatment recommendations without altering weights, prompt structure, or safety outputs, with the largest effect among users expressing no treatment preference. These findings suggest a need for LLM disclosure standards and independent auditing as such models inform clinical decisions.

19.
arXiv (CS.LG) 2026-06-15

EqCollide: Equivariant and Collision-Aware Deformable Objects Neural Simulator

arXiv:2506.05797v2 Announce Type: replace Abstract: Simulating collisions of deformable objects is a fundamental yet challenging task due to the complexity of modeling solid mechanics and multi-body interactions. Existing data-driven methods often suffer from lack of equivariance to physical symmetries, inadequate handling of collisions, and limited scalability. Here we introduce \name, the first end-to-end equivariant neural fields simulator for deformable objects and their collisions. We propose an equivariant encoder to map object geometry and velocity into latent control points. A subsequent equivariant Graph Neural Network-based Neural Ordinary Differential Equation models the interactions among control points via collision-aware message passing. To reconstruct velocity fields, we query a neural field conditioned on control point features, enabling continuous and resolution-independent motion predictions. Experimental results on 2D and 3D scenarios show that \name achieves accurate, stable, and scalable simulations across diverse object configurations. It achieves $24.34\%$ to $57.62\%$ lower rollout MSE, even compared with the best-performing baseline model. Furthermore, \name could generalize to more colliding objects and extended temporal horizons, and stay robust to input transformed with group action. Code is available at: https://github.com/AI4Science-WestlakeU/EqCollide

20.
arXiv (CS.AI) 2026-06-11

Erased but Not Forgotten: How Backdoors Compromise Concept Erasure

arXiv:2504.21072v3 Announce Type: replace-cross Abstract: The expansion of text-to-image diffusion models has raised concerns about harmful outputs, from fabricated depictions of public figures to sexually explicit imagery. To mitigate such risks, prior work has proposed concept erasure methods that aim to sever unwanted concepts from the model via fine-tuning, yet it remains unclear whether these approaches truly remove all links to the harmful concept or merely conceal superficial connections. In this work, we reveal a critical vulnerability, the Erasure Evasion Backdoor (EEB): an adversary binds a backdoor trigger to a concept slated for removal, and this malicious link survives subsequent erasure. We show that both black-box and white-box adversaries can instantiate this threat. Across six state-of-the-art erasure methods, including robust ones that explicitly search for alternative representations of the target concept, EEB consistently exposes harmful content: up to 82% success against celebrity-identity unlearning, up to 94% for object erasure, and up to 16 times amplification of explicit-content exposure. While EEB uncovers a blind spot in current erasure methods, it also provides a diagnostic tool for stress-testing future concept erasure techniques.

21.
arXiv (CS.AI) 2026-06-19

The Autonomy Tax: Defense Training Breaks LLM Agents

arXiv:2603.19423v2 Announce Type: replace-cross Abstract: Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete complex multi-step tasks. Practitioners deploy defense-trained models to protect against prompt injection attacks that manipulate agent behavior through malicious observations or retrieved content. We reveal a fundamental capability-alignment paradox: defense training designed to improve safety systematically destroys agent competence while failing to prevent sophisticated attacks. Evaluating defended models against undefended baselines across 97 agent tasks and 1,000 adversarial prompts, we uncover three systematic biases unique to multi-step agents. Agent incompetence bias manifests as immediate tool execution breakdown, with models refusing or generating invalid actions on benign tasks before observing any external content. Cascade amplification bias causes early failures to propagate through retry loops, pushing defended models to timeout on 99\% of tasks compared to 13\% for baselines. Trigger bias leads to paradoxical security degradation where defended models perform worse than undefended baselines while straightforward attacks bypass defenses at high rates. Root cause analysis reveals these biases stem from shortcut learning: models overfit to surface attack patterns rather than semantic threat understanding, evidenced by extreme variance in defense effectiveness across attack categories. Our findings demonstrate that current defense paradigms optimize for single-turn refusal benchmarks while rendering multi-step agents fundamentally unreliable, necessitating new approaches that preserve tool execution competence under adversarial conditions.

22.
arXiv (CS.LG) 2026-06-19

HEPTv2: End-to-End Efficient Point Transformer for Charged Particle Reconstruction

arXiv:2606.20437v1 Announce Type: cross Abstract: Charged-particle tracking – reconstructing trajectories from sparse detector measurements – is a fundamental high-energy-physics inference problem and a canonical example of learning under extreme combinatorial ambiguity. At the High-Luminosity Large Hadron Collider (HL-LHC), tracking must remain accurate and efficient despite unprecedented collision densities. Graph neural networks perform strongly, but incur substantial costs from graph construction and processing, while transformer-based approaches rely on auxiliary stages that prevent end-to-end optimization. To address this, we present HEPTv2, an end-to-end point-transformer architecture that reconstructs tracks from detector hits in one trainable pipeline. HEPTv2 combines a locality-aware point encoder with a track decoder that predicts complete trajectories without graph-building, clustering, or filtering. The encoder uses locality-sensitive hashing in detector coordinate space to preserve tracking-relevant geometry while enabling efficient local attention. The decoder resolves ambiguities through sectorized decoding and direct hit-to-track prediction under joint encoder-decoder supervision, allowing the full pipeline to be optimized end-to-end. On TrackML, HEPTv2 achieves 98.6% double-majority tracking efficiency at a 0.8% fake rate, while requiring only $\sim$15~ms inference time and 0.4~GB peak memory per event on a NVIDIA A100 GPU. Latency and memory scale approximately linearly for events with up to $5\times10^5$ hits. HEPTv2 establishes a new state of the art in the accuracy-latency trade-off, improving efficiency by 4.5% over the strongest prior transformer and by 1.1–2.2% over optimized graph-based pipelines, while reducing latency by factors of 7 and 38–52, respectively. These results show end-to-end transformers can deliver the accuracy and efficiency required for real-time particle reconstruction at the HL-LHC.

23.
arXiv (CS.CL) 2026-06-16

Simplifying the Modeling of Arbitrary Conditionals in Natural Language

Causal Transformers model sequences through an autoregressive factorization of the joint distribution, which enables efficient left-to-right decoding and conditional likelihood computation. However, they cannot tractably sample from or evaluate arbitrary conditionals – e.g., a block of text conditioned on past and future tokens. Recent work aims to solve this problem through novel architectures, but they often lead to sub-optimal modeling of such conditionals and degraded generations. We propose Arbitrary Conditionals GPT (AC-GPT) which introduces a simple modification to standard causal Transformers to enable evaluating and sampling from arbitrary conditionals – including past, future, and mixed contexts – within a single forward pass. Unlike prior approaches, our method preserves the standard left-to-right ordering and next-token prediction objective essential for both strong performance and efficient training on natural language. Crucially, this compatibility allows existing LLMs to be fine-tuned for arbitrary conditioning. Our empirical results indicate that our method outperforms baselines on modeling arbitrary conditionals, without degrading standard left-to-right performance.

24.
arXiv (CS.LG) 2026-06-19

Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection

arXiv:2606.20174v1 Announce Type: new Abstract: Cell-free DNA (cfDNA) is a promising avenue for non-invasive multicancer early detection (MCED), in that, it can enable multiple cancer detection simultaneously from a single blood draw, with particular sensitivity to cancers that currently lack established screening programs. Here we review the computational methods developed between 2022 and 2025 for cfDNA-based MCED. We focus on how fragmentomics and epigenetic features are extracted and analyzed to detect cancer at early stages. We first briefly outline the biological basis of cfDNA signals, then review classical statistical and machine learning approaches alongside deep learning frameworks including autoencoder-based models. For each method we discuss biological interpretability, validation strategy, and readiness for clinical integration. Furthermore, we categorize the current challenges into technical, computational, and methodological while outlining open problems in the field. This review shows that multimodal ensemble approaches have the strongest promise for clinical integration and the highest readiness. However, for better assessment of future work and side-by-side comparison, standardization of evaluation protocols and reporting results will be crucial.

25.
arXiv (CS.CV) 2026-06-15

ShearFuse-UNet: Hadamard, DCT, and Shearlet Transform Fusion for Next-Day Wildfire Spread Prediction

We propose ShearFuse-UNet, a lightweight and computationally efficient deep learning model for next-day wildfire spread prediction from multi-modal satellite data. The model integrates three complementary transform-domain branches inside each encoder block of a U-Net backbone: a 2D Fast Walsh-Hadamard Transform (WHT) branch, a 2D Discrete Cosine Transform (DCT) branch, and a cone-adapted digital Shearlet residual branch. The WHT and DCT branches establish orthogonal latent spaces with learnable spectral scaling and fixed soft-thresholding, while the Shearlet branch provides anisotropic, multi-directional feature decomposition that explicitly encodes the elongated edge structures characteristic of fire fronts. A learned SpectralFusion gate adaptively combines the WHT and DCT responses, and the Shearlet reconstruction is added as a residual. This three-branch design bears a loose structural analogy to transformer self-attention: the WHT and DCT branches provide complementary spectral representations that are adaptively fused, while the Shearlet branch contributes directional content through a residual pathway. Unlike self-attention, the proposed design relies on fixed mathematical transforms rather than learned projection operators, reducing parameter count and computational cost. Evaluated on the WildfireSpreadTS dataset, ShearFuse-UNet achieves an F1 score of 0.596 with only 267k parameters, outperforming a ResNet18-based U-Net (14M parameters, F1 = 0.589) and demonstrating a highly favorable accuracy-efficiency trade-off. Results on the Google Next-Day Wildfire Spread dataset further validate these findings across a different benchmark.