Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-17

Accelerated Convex Optimization via Hamiltonian Dynamics with Deterministic Integration Time

arXiv:2606.17260v1 Announce Type: cross Abstract: We develop Hamiltonian dynamics-based algorithms for smooth convex optimization that achieve accelerated rates of convergence. By exploiting contraction of averaged Hamiltonian flow trajectories rather than requiring contraction at trajectory endpoints, we show that Hamiltonian dynamics-based optimization methods admit deterministic and accelerated convergence guarantees, extending prior work that is limited to quadratic objectives or holds only in expectation. We analyze an idealized continuous-time algorithm and derive practical discrete-time implementations with optimal first-order complexity, thereby establishing Hamiltonian dynamics as a useful algorithmic primitive for deterministic accelerated convex optimization.

02.
arXiv (CS.AI) 2026-06-17

SP-GCRL: Influence Maximization on Incomplete Social Graphs

arXiv:2605.12513v2 Announce Type: replace-cross Abstract: Influence maximization (IM) in real platforms is challenged by incomplete, noisy social graphs and non-stationary diffusion dynamics. We propose SP-GCRL, a social-propagation-aware graph contrastive reinforcement learning framework that learns end-to-end seed selection under partial observability.We first introduce a social-propagation-aware nonlinear diffusion function to model reinforcement/diminishing effects and probability drift under repeated exposure; we then construct dual structural views and perform contrastive learning to obtain node representations robust to missing edges and weak ties, while replacing expensive strategy metrics with a GAT-based regression surrogate to improve efficiency and scalability; finally, we use DDQN to learn an end-to-end seed selection policy on top of these representations. Experiments on multiple real-world networks show that SP-GCRL achieves significant gains over heuristic and learning-based baselines across budgets and topologies, while maintaining strong large-scale scalability.

03.
arXiv (CS.CL) 2026-06-11

FlowBank: Query-Adaptive Agentic Workflows Optimization through Precompute-and-Reuse

Large Language Model (LLM)-based multi-agent systems are increasingly powerful, but current agentic workflow optimization paradigms make an unsatisfying trade-off. Task-level methods spend substantial offline compute yet deploy only a single workflow, leaving complementary candidates unused, while query-level methods synthesize a new workflow per query at substantial inference cost. Our motivating analysis shows these paradigms are more complementary than competing: workflows discovered during offline search often solve different subsets of queries, and many queries handled by expensive query-level generation can already be solved by cheaper precomputed workflows. This suggests a different objective: rather than searching for one universally best workflow or regenerating one per instance, we should build a compact bank of reusable, complementary workflows and select among them adaptively at inference time. Doing so requires solving three coupled problems: generating complementary rather than redundant candidates, compressing them into a small deployable portfolio, and assigning each query to the right workflow under a performance-cost trade-off. To this end, we present FlowBank, a three-stage framework for portfolio-based agentic workflow optimization. Diversifying proposes DiverseFlow to steer search toward under-covered queries and produce a high-coverage candidate pool. Curating proposes CuraFlow to compress this pool into a compact portfolio with minimal redundancy. Matching casts deployment as edge-value prediction on a query-workflow bipartite graph and routes each incoming query to the portfolio member with the best predicted utility. Across five benchmarks, FlowBank achieves the highest average score among the evaluated methods while remaining cost-competitive, improving over the strongest automated and handcrafted baselines by 4.26% and 14.92% relative, respectively.

04.
arXiv (math.PR) 2026-06-12

Counterintuitive problems in discrete probability

arXiv:2606.07516v2 Announce Type: replace Abstract: This manuscript contains a collection of counterintuitive problems in discrete probability, together with detailed solutions. The dataset was constructed as part of a broader research project investigating the capabilities of the latest-generation Large Language Models (LLMs) in solving discrete probability problems, in order to assess whether LLMs tend to make systematic reasoning errors associated with known cognitive biases. The problems collected here are specifically designed to challenge heuristic reasoning strategies that often lead to intuitively appealing but mathematically incorrect conclusions. The dataset combines several types of problems. Some are adapted from classical probabilistic paradoxes and cognitive-bias literature, while others originate from recreational mathematics sources or were developed by ourselves following similar principles. The primary purpose of this document is to provide a transparent and publicly accessible reference for the problems used in our experimental evaluation of language models, as well as providing detailed human-made solutions. At the same time, we believe that this collection may also prove useful for future research on probabilistic reasoning, cognitive biases, and the evaluation of reasoning capabilities in artificial intelligence systems.

05.
arXiv (CS.AI) 2026-06-19

Toward Calibrated Mixture-of-Experts Under Distribution Shift

arXiv:2606.20544v1 Announce Type: new Abstract: Calibration aligns a model's predictive uncertainty with the frequencies of its empirical outcomes and is important for understanding and trusting reported probabilities. Recent work shows that enforcing calibration at the level of individual predictors can improve ensemble accuracy and calibration, with mixture-of-experts (MoE) models showing strong empirical improvements in particular; however, the conditions under which calibration helps MoE are not well understood. In this work, we study how MoE models behave under distribution shift, focusing on how routing mechanisms interact with expert-level calibration. We show that expert calibration is sufficient to ensure calibration of the overall model under a broad class of distribution shifts in hard-routed models, but is insufficient for calibrating soft-routed models. To address this, we propose an adversarial reweighting that penalizes calibration errors of the routed aggregate under distribution shift, and we demonstrate that it improves the accuracy-calibration tradeoff both on average and on difficult subsets of the data, across model classes, prediction tasks, and distribution shifts.

06.
arXiv (CS.AI) 2026-06-16

CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift

Authors:

arXiv:2605.00074v2 Announce Type: replace-cross Abstract: DNA-synthesis providers screen incoming orders by searching the requested sequence against curated hazard lists. We show that this baseline collapses to a 100% false-flag rate when the hazardous sequence comes from a taxonomic family absent from the reference set: under Conformal Risk Control's certified miss-rate constraint, a low-discrimination signal forces the threshold below the entire test-benign mass. We compose three signals derived from a synthesis order's public annotation: $k$-mer Jaccard similarity to known toxins, the trimmed-mean score of a five-LLM judge panel, and cosine similarity to clustered embedding centroids. Fused under a monotone logistic aggregator and calibrated by Conformal Risk Control, the resulting screener certifies $\mathbb{E}[\mathrm{FNR}] \le \alpha + \mathrm{TV}$, where the additive term is the calibration-to-test distribution shift under family holdout (a certified ceiling of 24-49% across folds). Across ten leave-one-taxonomic-family-out folds at $\alpha=0.05$ on UniProt KW-0800 reviewed toxins, the calibrated screener achieves 0% empirical test miss rate on every fold and 0% test false-flag rate on nine of ten folds. The bound's finite-sample slack $1/(n_{\mathrm{cal}}+1)$ caps the certifiable miss rate at 1.77% on our 200-hazard subsample; reaching procurement-grade $\alpha=10^{-3}$ requires an $18\times$ larger calibration set, which the full reviewed UniProt KW-0800 corpus is large enough to deliver. The binding constraint on certifiable DNA-synthesis screening is calibration data, not algorithms. Code: https://github.com/najmulhasan-code/crc-screen

07.
medRxiv (Medicine) 2026-06-22

Climatic Drivers of Malaria risk in Children Under Five: A Large-Scale Analysis of individual-level data for 350,000 children in 26 Sub-Saharan African Countries

Background Malaria risk is influenced by climatic conditions, and children under five are particularly vulnerable due to their limited acquired immunity. We investigate the association between climatic factors and malaria risk in 350,000 children aged 5-59 months in sub-Saharan Africa over 18 years. Methods We included children aged 5-59 months with malaria tests from Demographic and Health Surveys (DHS) in 26 sub-Saharan African countries between 2006 and 2023. We linked these data to high-resolution climate exposures: temperature, precipitation, soil moisture, actual evapotranspiration and specific humidity. We fitted a mixed-effect logistic regression model incorporating Distributed Lag Non-linear Models (DLNM) over 1-6 month lag window for each exposure, controlling for seasonality and long-term trends. We examined effect modification by maternal education, household wealth, residential type, water source, sanitation facility, child age and sex, use of insecticide-treated bed nets (ITNs), and the age of the household head. Results Malaria prevalence was 19.5%. Malaria risk was highest at 24 degrees (OR: 1.45, 95% CI: [1.36, 1.54]), followed by a decline at higher temperatures. This elevated risk was mainly driven by short-term exposures (1-2 months). Precipitation increased risk up to 59 ~ 120 mm (1.10, [1.07, 1.12]), after which heavier rainfall reduced risk, particularly at short- to medium-term lags (1-4 months). Soil moisture was associated with increasing risk up to ~80 mm (1.11, [1.08, 1.14]), with a plateau at higher levels. Evapotranspiration showed a strong, near-linear positive association with malaria risk. Higher specific humidity levels (>14 g/kg) presented a lower risk, reaching a 45% reduction at 17 g/kg (0.55, [0.49, 0.61]), with the strongest protective effects at short-term lags (1-2 months). Elevated malaria risk at low and moderate average temperatures was particularly evident among children who did not sleep under an ITN net. Conclusion Malaria risk in children under five is strongly shaped by climatic factors, with complex and delayed associations. The findings provide evidence to guide targeted interventions and early-warning strategies for vulnerable populations.

08.
arXiv (CS.CV) 2026-06-16

Text-Driven Fusion for Infrared and Visible Images: Achieving Image Scene Adaptation on Hyperbolic Space

Infrared and visible image fusion aims to integrate complementary modalities, while existing Euclidean methods impose rigid distance metrics that distort multi-modal interactions and parent-to-child semantic hierarchies. To overcome these limitations, we introduce a text-driven fusion framework empowered by hyperbolic manifold learning. During training, BLIP-extracted text prompts serve as topological anchors within the hyperbolic space, guiding vision-attribute alignment through hyperbolic embeddings that naturally accommodate varying semantic granularities. By exploiting the exponential volume growth dictated by the Poincaré ball's negative curvature, this approach seamlessly embeds hierarchical trees to encode coarse-to-fine semantics without metric saturation, while the vast peripheral space prevents texture distortion during cross-modal fusion. At inference, the fusion process autonomously adapts to input content using the learned text-attribute priors, completely eliminating the need for textual input. Experimental results show our method outperforms state-of-the-art approaches on benchmark datasets, with code available at https://github.com/Shaoyun2023/TEDFusion.

09.
arXiv (quant-ph) 2026-06-19

Strain- and Electric-Field-Tunable Valley Polarization in Mo0.75V0.25Te2(Mo3VTe8) for Valleytronic Application

arXiv:2606.19954v1 Announce Type: cross Abstract: Valley polarization in 2D TMDs is promising for low-power valleytronic and spin-valley information processing, but time-reversal symmetry in pristine nonmagnetic TMDs keeps the K+ and K- valleys degenerate, limiting device applications. In this work, we investigated the structural stability, electronic properties, and tunable valley polarization of V-alloyed MoTe2 monolayer, Mo0.75V0.25Te2, using first-principles density functional theory (DFT) calculations. Substitutional alloying of MoTe2 with V introduced magnetic exchange interaction, which, together with spin-orbit coupling (SOC), lifted the valley degeneracy at the unequal valleys. The alloyed structure was found to be energetically and dynamically stable due to the absence of imaginary phonon modes. In pristine MoTe2, SOC produced spin splittings of 34.0 meV and 218.9 meV in the conduction bands and valence bands, respectively, but no valley polarization was observed. In contrast, Mo0.75V0.25Te2 exhibited spontaneous valley polarization of 37.3 meV in the conduction band and 78.2 meV in the valence band. The valley polarization was further enhanced by external electric fields and biaxial strain. A transverse electric field along the crystal c axis produced the maximum valley splitting of 132.8 meV in the valence band, whereas biaxial tensile strain increased the valence band valley splitting up to 160.8 meV. The maximum conduction band valley splitting reached 54.4 meV under 2% biaxial compressive strain. These results demonstrated that V alloying, combined with electric-field and strain engineering, provides an effective strategy for achieving large and tunable valley polarization in MoTe2. Thus, Mo0.75V0.25Te2 can be considered a promising 2D platform for tunable valleytronic device applications, such as transistors and sensors.

10.
arXiv (CS.AI) 2026-06-16

AI systems out-persuade expert humans

arXiv:2606.16475v1 Announce Type: cross Abstract: Many societal decisions are settled by contests of persuasion. Conversational AI is a powerful new entrant in these contests, but whether it can out-persuade skilled and highly incentivized humans has remained unclear. Here, in a series of four preregistered experiments (n = 18,978 conversations from 6,923 people), we pitted AI systems against a range of human persuaders, including laypeople, winners of a separately preregistered four-round online persuasion tournament, professional canvassers, and world championship debaters. We found that AI systems were reliably more persuasive than expert humans, even when expert humans chose their issues, researched in advance, underwent hours of live, structured practice, and were incentivized with {\pounds}1,000 cash bonuses. In a follow-up study, AI's advantage persisted after experts received a coaching tool that let them practice against the AI that beat them, review their performance history, and see what AI would have said at key moments. We found converging evidence that AI's advantage stemmed from rapidly deploying larger quantities of information: after coaching, expert humans could tie an AI constrained to respond at human speeds and with human-length messages. In a final study, we show that AI's advantage extends to consequential real-world behavior: AI was nearly 3x more effective than professional canvassers from a UK fundraising firm at raising real-money donations to Save the Children. Together, these results establish that frontier AI systems out-persuade expert humans in conversation, with significant implications for political communication.

11.
arXiv (CS.AI) 2026-06-11

GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning

arXiv:2510.04567v3 Announce Type: replace-cross Abstract: Graph Neural Networks (GNNs) are powerful tools for processing relational data but often struggle to generalize to unseen graphs, giving rise to the development of Graph Foundational Models (GFMs). However, current GFMs are challenged by the extreme heterogeneity of graph data, where each graph can possess a unique feature space, label set, and topology. To address this, two main paradigms have emerged. The first leverages Large Language Models (LLMs), but is fundamentally text-dependent, thus struggles to handle the numerical features in vast graphs. The second pre-trains a structure-based model, but the adaptation to new tasks typically requires a costly, per-graph tuning stage, creating a critical efficiency bottleneck. In this work, we move beyond these limitations and introduce Graph In-context Learning Transformer (GILT), a framework built on an LLM-free and tuning-free architecture. GILT introduces a novel token-based framework for in-context learning (ICL) on graphs, reframing classification tasks spanning node, edge and graph levels in a unified framework. This mechanism is the key to handling heterogeneity, as it is designed to operate on generic numerical features. Further, its ability to understand class semantics dynamically from the context enables tuning-free adaptation. Comprehensive experiments show that GILT achieves stronger few-shot performance with significantly less time than LLM-based or tuning-based baselines, validating the effectiveness of our approach. Our code is available at: https://github.com/yiming421/inductnode/.

12.
arXiv (CS.LG) 2026-06-17

Stable and Steerable Sparse Autoencoders with Weight Regularization

arXiv:2603.04198v2 Announce Type: replace-cross Abstract: Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we studied weight regularization by adding L1 or L2 penalties on encoder and decoder weights, and evaluate how regularization interacts with common SAE training defaults. On MNIST, we observe that L2 weight regularization produces a core of highly aligned features and, when combined with tied initialization and unit-norm decoder constraints, it dramatically increases cross-seed feature consistency. For TopK SAEs trained on language model activations (Pythia-70M-deduped), adding a small L2 weight penalty increased the fraction of features shared across three random seeds and roughly doubles steering success rates, while leaving the mean of automated interpretability scores essentially unchanged. Finally, in the regularized setting, activation steering success becomes better predicted by auto-interpretability scores, suggesting that regularization can align text-based feature explanations with functional controllability.

13.
arXiv (CS.CL) 2026-06-16

Creative Collision: Directorial Persona Steering and Competition in Large Language Models

Activation steering has emerged as a powerful tool for shaping the behaviour of large language models at inference time, yet most prior work injects a single semantic direction into the residual stream. We study the richer setting in which two semantically opposing steering vectors are superimposed – a regime we call Creative Collision. Concretely, we construct directorial persona vectors for Steven Spielberg (optimistic, redemptive moral valence) and Martin Scorsese (dark, morally ambiguous) via mean-difference activation contrast on curated screenplay-derived corpora, then interpolate between them with a scalar mixing parameter $\alpha \in [0,1]$ and a steering coefficient $\lambda$. Across five evaluation axes – moral valence, generation coherence, surface style, directional dominance, and vector geometry – three principal findings emerge: (i)~Spielberg's representational signature exhibits robust directional dominance, suppressing Scorsese's moral influence across almost the entire interpolation range; (ii)~intermediate collision points paradoxically improve generation coherence relative to pure single-director steering at high $\lambda$; and (iii)~both personas localise maximally to layer~28 of a 40-layer decoder-only transformer, revealing a shared moral-tone substrate. These results illuminate the geometry of competing semantic directions in transformer residual streams and have direct implications for controllable creative generation and value-aligned narrative synthesis.

14.
arXiv (CS.AI) 2026-06-16

Sustainable Materials Discovery in the Era of Artificial Intelligence

arXiv:2601.21527v3 Announce Type: replace-cross Abstract: Artificial intelligence (AI) has transformed materials discovery, enabling rapid exploration of chemical space through generative models and surrogate screening. Yet current generative AI models for materials discovery, which now drive exploration of vast chemical and structural spaces, optimize candidates exclusively for structural stability and functional properties, with no integration of environmental assessment at any stage of the design loop. Prospective and ex-ante life cycle assessment methods exist and have been applied to emerging technologies, but they operate as standalone downstream analyses, not as active constraints within generative or active-learning pipelines. The result is that environmental feedback, even when produced, arrives after design decisions have been made rather than informing them. The disconnect between atomic-scale design and lifecycle assessment (LCA) reflects fundamental challenges: (i) data scarcity across heterogeneous sources, (ii) scale gaps from atoms to industrial systems, (iii) uncertainty in synthesis pathways, and (iv) the absence of frameworks that co-optimize performance with environmental impact. In this Perspective, we propose integrating upstream ML-assisted materials discovery with downstream LCA into the ML-LCA framework, comprising five components: information extraction for building materials-environment knowledge bases, harmonized databases linking properties to sustainability metrics, multi-scale models bridging atomic properties to lifecycle impacts, ensemble prediction of manufacturing pathways with uncertainty quantification, and uncertainty-aware optimization enabling simultaneous performance-sustainability navigation. Case studies spanning polymers, glass, photoresists, and cement demonstrate both necessity and feasibility while identifying material-specific integration challenges.

15.
arXiv (CS.LG) 2026-06-16

Airport Terminal Passenger Queue Forecasting for Departure Gates and Security Checkpoints

arXiv:2606.07622v2 Announce Type: replace Abstract: Accurate passenger queue forecasting in airport terminals is essential for efficient departure operations, as it enables proactive congestion management. However, time-varying passenger demand and heterogeneous facility usage across multiple departure facilities make forecasting challenging. In this work, we propose a passenger queue forecasting framework that learns historical passenger flow patterns from operational data. The proposed model employs a Transformer-based architecture to capture temporal dependencies and inter-facility correlations using past queue length and waiting time at departure gates and security checkpoints, together with passenger throughput at check-in islands. The learned representations are mapped to two facility-specific prediction heads to predict queue length and waiting time at departure gates and security checkpoints. Experimental results demonstrate accurate forecasts up to two hours ahead. The proposed approach offers practical real-time decision support for proactive queue management and staff reallocation in airport terminal operations.

16.
arXiv (quant-ph) 2026-06-12

Supersymmetry of dissipative Bose-Fermi systems with application to Jaynes-Cummings and Dicke models

arXiv:2606.12682v1 Announce Type: new Abstract: We demonstrate how supersymmetries of Hamiltonians for coupled Bose-Fermi systems can be used to place the Hamiltonians of the Jaynes-Cummings model and Dicke model under the rotating wave approximation in matrix form and provide explicit analytic solutions for their eigenvalues. We then use this supersymmetry to place the Liouvillians of the associated Markovian open systems in matrix form and provide explicit solutions for their eigenvalues. These results are a consequence of the fact that the Hamiltonian of the Jaynes-Cummings model commutes with the linear Casimir invariant of the superalgebra $u(1|1)$ and that the Hamiltonian of the Dicke model commutes both with the linear invariant of $\sum_{i} u_{i}(1|1)$ and with the invariant of an additional $su(2)$ algebra. Our methods apply to various coupled Bose-Fermi systems with $u(1|1)$ and more generally with $u(n|m)$ dynamical superalgebras, and may provide efficient tools for studying more complicated examples.

17.
arXiv (CS.LG) 2026-06-19

MortarBench: Evaluating Mortgage Loan Origination Agents

arXiv:2606.19416v1 Announce Type: new Abstract: Loan origination is the process by which a lender creates a new loan, from application and underwriting through approval and funding. This process serves a critical role in evaluating the eligibility and level of risk posed by an applicant. Recently, firms have begun using mortgage loan agents to augment human loan officers, despite a lack of any public benchmark. To fill this gap, we present MortarBench, a loan origination agent benchmark. MortarBench uses a financial data synthesis and mutation pipeline to generate examples with broad edge case coverage that match real-world distributions and questions. We find that state-of-the-art large language models (LLMs) perform poorly, with closed-source models achieving at most 77.1\% exact match accuracy. We also discover systematic biases in LLM perception of foreignness related to non-English names. Noting these weaknesses, we introduce CRIT, a confidence calibration framework. Our method increases accuracy to 80.5\% while improving risk management steering and reducing bias.

18.
arXiv (CS.LG) 2026-06-17

Conditional Attribution for Root Cause Analysis in Time-Series Anomaly Detection

arXiv:2604.17616v3 Announce Type: replace Abstract: Root cause analysis (RCA) for time-series anomaly detection is critical for the reliable operation of complex real-world systems. Existing explanation methods often rely on unrealistic feature perturbations and ignore temporal and cross-feature dependencies, leading to unreliable attributions. We propose a conditional attribution framework that explains anomalies relative to contextually similar normal system states. Instead of using marginal or randomly sampled baselines, our method retrieves representative normal instances conditioned on the anomalous observation, enabling dependency-preserving and operationally meaningful explanations. To support high-dimensional time-series data, contextual retrieval is performed in learned low-dimensional representations using both variational autoencoder latent spaces and UMAP manifold embeddings. By grounding the retrieval process in the system's learned manifold, this strategy avoids out-of-distribution artifacts and ensures attribution fidelity while maintaining computational efficiency. We further introduce confidence-aware and temporal evaluation metrics for assessing explanation reliability and responsiveness. Experiments on the SWaT and MSDS benchmarks demonstrate that the proposed approach consistently improves root-cause identification accuracy, temporal localization, and robustness across multiple anomaly detection models. These results highlight the practical utility of conditional attribution for explainable anomaly diagnosis in complex time-series systems. Code and models are available at: https://github.com/dfki-av/Conditional-Attribution-for-Root-Cause-Analysis-in-Time-Series-Anomaly-Detection.

19.
arXiv (CS.CL) 2026-06-19

NRITYAM: Language Models Meet Art and Heritage of Dance

Language models have become essential tools in shaping modern workflows. However, their global effectiveness hinges on a nuanced understanding of local socio-cultural contexts. To address this gap, we present NRITYAM, a comprehensive benchmark for evaluating the cultural comprehension capabilities of language models in the context of global dance traditions. NRITYAM comprises 9,260 carefully curated question-answer pairs spanning 12 languages, making it the largest dataset dedicated to evaluating cultural knowledge in dance. The dataset has been developed from the ground up through close collaboration with native dance artists and native speakers of the languages, who authored and validated culturally relevant questions specific to their regions. We evaluate a broad set of models, including large language models, small language models, multimodal large language models, and small multimodal language models. As a multilingual and multicultural benchmark, NRITYAM sets a new standard for evaluating the ability of AI systems to understand and reason about traditional performing arts. Detailed dataset samples are available at~\url{https://github.com/niladrighosh03/NRITYAM}.

20.
arXiv (CS.AI) 2026-06-19

The Autonomy Tax: Defense Training Breaks LLM Agents

arXiv:2603.19423v2 Announce Type: replace-cross Abstract: Large language model (LLM) agents increasingly rely on external tools (file operations, API calls, database transactions) to autonomously complete complex multi-step tasks. Practitioners deploy defense-trained models to protect against prompt injection attacks that manipulate agent behavior through malicious observations or retrieved content. We reveal a fundamental capability-alignment paradox: defense training designed to improve safety systematically destroys agent competence while failing to prevent sophisticated attacks. Evaluating defended models against undefended baselines across 97 agent tasks and 1,000 adversarial prompts, we uncover three systematic biases unique to multi-step agents. Agent incompetence bias manifests as immediate tool execution breakdown, with models refusing or generating invalid actions on benign tasks before observing any external content. Cascade amplification bias causes early failures to propagate through retry loops, pushing defended models to timeout on 99\% of tasks compared to 13\% for baselines. Trigger bias leads to paradoxical security degradation where defended models perform worse than undefended baselines while straightforward attacks bypass defenses at high rates. Root cause analysis reveals these biases stem from shortcut learning: models overfit to surface attack patterns rather than semantic threat understanding, evidenced by extreme variance in defense effectiveness across attack categories. Our findings demonstrate that current defense paradigms optimize for single-turn refusal benchmarks while rendering multi-step agents fundamentally unreliable, necessitating new approaches that preserve tool execution competence under adversarial conditions.

21.
arXiv (CS.CV) 2026-06-12

ASTER: Latent Pseudo-Anomaly Generation for Unsupervised Time-Series Anomaly Detection

Time-series anomaly detection (TSAD) is critical in domains such as industrial monitoring, healthcare, and cybersecurity, but it remains challenging due to rare and heterogeneous anomalies and the scarcity of labelled data. This scarcity makes unsupervised approaches predominant, yet existing methods often rely on reconstruction or forecasting, which struggle with complex data, or on embedding-based approaches that require domain-specific anomaly synthesis and fixed distance metrics. We propose ASTER, a framework that generates pseudo-anomalies directly in the latent space, avoiding handcrafted anomaly injections and the need for domain expertise. A latent-space decoder produces tailored pseudo-anomalies to train a Transformer-based anomaly classifier, while a pre-trained LLM enriches the temporal and contextual representations of this space. Experiments on three benchmark datasets show that ASTER achieves state-of-the-art performance and sets a new standard for LLM-based TSAD.

22.
arXiv (CS.CL) 2026-06-12

WildIFEval: Instruction Following in the Wild

Recent LLMs have shown remarkable success in following user instructions, yet handling instructions with multiple constraints remains a significant challenge. In this work, we introduce WildIFEval - a large-scale dataset of 7K real user instructions with diverse, multi-constraint conditions. Unlike prior datasets, our collection spans a broad lexical and topical spectrum of constraints, extracted from natural user instructions. We categorize these constraints into eight high-level classes to capture their distribution and dynamics in real-world scenarios. Leveraging WildIFEval, we conduct extensive experiments to benchmark the instruction-following capabilities of leading LLMs. WildIFEval clearly differentiates between small and large models, and demonstrates that all models have a large room for improvement on such tasks. We analyze the effects of the number and type of constraints on performance, revealing interesting patterns of model constraint-following behavior. We release our dataset to promote further research on instruction-following under complex, realistic conditions.

23.
arXiv (CS.LG) 2026-06-19

How to sketch a learning algorithm

Authors:

arXiv:2604.07328v3 Announce Type: replace Abstract: How does the choice of training data influence an AI model? This broad question is of central importance to interpretability, privacy, and basic science. At its technical core is the data deletion problem: after a reasonable amount of precomputation, quickly predict how the model would behave in a given situation if a given subset of training data had been excluded from the learning algorithm. We present a data deletion scheme capable of predicting model outputs with vanishing error $\varepsilon$ and failure probability $\delta$ in the deep learning setting. Our precomputation and prediction algorithms are only $\tilde{O}(\log(1/\delta)/\varepsilon^2)$ factors slower than regular training and inference, respectively. The storage requirements are those of $\tilde{O}(\log(1/\delta)/\varepsilon^2)$ models. Our proof is based on an assumption that we call stability. In contrast to the assumptions made by prior work, stability appears to be fully compatible with learning powerful AI models. In support of this, we show that stability is satisfied in a minimal set of experiments with microgpt. Our code is available at https://github.com/SamSpo1/microgpt-sketch. At a technical level, our work is based on a new method for locally sketching an arithmetic circuit by computing higher-order derivatives in random complex directions. Forward-mode automatic differentiation allows cheap computation of these derivatives.

24.
arXiv (quant-ph) 2026-06-17

Acceleration-induced spectral blind spots in stimulated atomic transitions

arXiv:2606.17396v1 Announce Type: cross Abstract: Stimulated transitions are among the most fundamental processes in light-matter interaction, underlying resonant absorption and emission in atomic systems. Here we show that uniform acceleration can convert this familiar response into a frequency-selective absence of response. Specifically, when an incident photon has a nonzero momentum component transverse to the acceleration, the stimulated transition probability vanishes at a discrete set of frequencies fixed by the acceleration, the atomic transition frequency, and the photon propagation angle. At these spectral blind spots, both ordinary stimulated absorption and acceleration-induced excitation are simultaneously suppressed, rendering the atom effectively unresponsive to the incident radiation. The effect arises from the nontrivial response of accelerated atoms to quantum vacuum fluctuations and provides a distinctive signature of the Unruh effect through the absence, rather than the enhancement, of stimulated transitions. We further provide an order-of-magnitude estimate showing that an electron-based implementation with spin splitting in combined electric and magnetic fields could access the required parameter regime. These results reveal an unexplored form of acceleration-modified light-matter interaction and identify spectral blind spots as a new manifestation of the Unruh effect.

25.
bioRxiv (Bioinfo) 2026-06-18

MorphoStat: A Statistics-Aware Pipeline for Morphological Profiling Analysis

Authors:

High-content imaging produces thousands of morphological measurements per cell. Interpreting these measurements requires normalization to remove plate effects, statistical tests selected on the basis of data distribution, and control over false discoveries across many features tested at once. MorphoStat is an open-source Python pipeline that applies this sequence of steps automatically. Given a CSV file from CellProfiler or a compatible imaging platform, it removes low-quality wells, normalizes each plate against DMSO controls using a MAD-scaled z-score, routes each feature to a parametric or nonparametric test based on a distributional check, applies Benjamini Hochberg correction, and writes out results and publication-ready figures. On the BBBC021 benchmark (MCF-7 breast-cancer cells, 632 wells, 473 features), MorphoStat recovered 12 of 13 known mechanism-of-action classes in principal component space, confirming that the normalization and statistical routing work as intended. The tool is available at https://github.com/Almunthir334/morphostat (DOI: 10.5281/zenodo.20354069) under the MIT license.