Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-19

Folded Transport MCMC: Eliminating Label Switching by Sampling on a Fundamental Domain

作者:

arXiv:2606.04307v2 Announce Type: replace Abstract: In Bayesian mixture models and other exchangeable-component models, the posterior is invariant under permutation of component labels, creating m! equivalent modes-the label-switching problem. Standard MCMC methods either mix poorly across these modes or rely on post-hoc relabelling that cannot guarantee the sampler has converged. We propose Folded Transport MCMC (FolT-MCMC), which eliminates label switching before sampling by restricting the Markov chain to a fundamental domain-a sorted or reflected subspace containing exactly one representative from each symmetric mode. The proposal is a learned normalising flow whose density is symmetrised over the group orbits, ensuring correct targeting on the reduced space. We show that this construction preserves a computable convergence diagnostic based on the oscillation of the log-density ratio, and that the diagnostic becomes sharper on the fundamental domain whenever the original-space flow under-covers one or more symmetric modes. Experiments on Gaussian mixtures (d=2-20), label-switching targets (up to 24 equivalent modes), a standard Bayesian three-component mixture posterior, and real accelerometer data from a supertall building show improvement ratios of 2x to 145x, with the folded diagnostic stable across dimensions while the unfolded diagnostic collapses.

02.
arXiv (CS.CV) 2026-06-11

MFEN:Multi-Frequency Expert Network for Visible-Infrared Person Re-ID

Visible-infrared person re-identification (VI-ReID) is challenging due to the large modality discrepancy between visible and infrared images. We contend that this discrepancy is largely related to differing lighting conditions, including differences in light wavelength and light source type. Recently, frequency-based VI-ReID approaches have achieved notable success because frequency information can better extract identity-relevant contours and details while excluding irrelevant lighting and color. However, existing methods either do not distinguish different frequency bands or focus on only one band, which is insufficient under diverse lighting conditions. To perform comprehensive frequency domain learning, we propose a Multi-Frequency Expert Network (MFEN) that enables multi-frequency modulation and adaptively combines different bands through a mixture-of-experts design. We further introduce Random Frequency Augmentation (RFA) and Frequency Auxiliary Optimization (FAO) to better train MFEN. The three modules are complementary and jointly capture critical frequency-domain details for robust representation learning. Extensive experiments on three VI-ReID datasets demonstrate the effectiveness of our approach.

03.
arXiv (CS.CV) 2026-06-16

CoIRL-AD: Collaborative-Competitive Imitation-Reinforcement Learning in Latent World Models for Autonomous Driving

End-to-end autonomous driving models trained with imitation learning (IL) often generalize poorly, particularly in long-tail scenarios where expert demonstrations are sparse. Reinforcement learning (RL) can provide complementary task-level supervision, but applying RL to real-world autonomous driving is challenging in offline settings without interactive simulators, where datasets are dominated by expert actions and provide limited behavioral diversity. We propose CoIRL-AD, a competitive dual-policy framework that integrates IL and RL under a unified offline training regime. CoIRL-AD decouples imitation and reward optimization into separate actors to alleviate objective conflicts, uses imagined future rollouts for long-horizon reward estimation, and introduces a competition mechanism that selectively transfers beneficial behaviors while keeping RL anchored to expert-like driving. Experiments on the nuScenes benchmark show that CoIRL-AD consistently improves robustness over strong IL-based baselines, with especially large gains in cross-city generalization and long-tail scenarios. Code is available at: https://github.com/SEU-zxj/CoIRL-AD.

04.
arXiv (CS.CL) 2026-06-16

Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents

Graphical user interface (GUI) agents powered by multimodal large language models (MLLMs) have shown greater promise for human-interaction. However, due to the high fine-tuning cost, users often rely on open-source GUI agents or APIs offered by AI providers, which introduces a critical but underexplored supply chain threat: backdoor attacks. In this work, we first unveil that MLLM-powered GUI agents naturally expose multiple interaction-level triggers, such as historical steps, environment states, and task progress. Based on this observation, we introduce AgentGhost, an effective and stealthy framework for red-teaming backdoor attacks. Specifically, we first construct composite triggers by combining goal and interaction levels, allowing GUI agents to unintentionally activate backdoors while ensuring task utility. Then, we formulate backdoor injection as a Min-Max optimization problem that uses supervised contrastive learning to maximize the feature difference across sample classes at the representation space, improving flexibility of the backdoor. Meanwhile, it adopts supervised fine-tuning to minimize the discrepancy between backdoor and clean behavior generation, enhancing effectiveness and utility. Extensive evaluations of various agent models in two established mobile benchmarks show that AgentGhost is effective and generic, with attack accuracy that reaches 99.7\% on three attack objectives, and shows stealthiness with only 1\% utility degradation. Furthermore, we tailor a defense method against AgentGhost that reduces the attack accuracy to 22.1\%. Our code is available at \texttt{anonymous}.

05.
arXiv (CS.AI) 2026-06-19

Uncertainty-Aware Reward Modeling for Stable RLHF

arXiv:2606.19818v1 Announce Type: cross Abstract: Reinforcement learning from human feedback (RLHF) aligns large language models by training reward models on preference data and optimizing policies to maximize predicted rewards. However, this pipeline faces two fundamental challenges: (1) reward models cannot signal when their predictions are unreliable, since they usually act as deterministic point estimators; and (2) modern group-based policy optimization can amplify unreliable reward signals, as exemplified by GRPO's uniform treatment of rewards during advantage computation. As policies explore increasingly diverse responses, these two limitations create a critical vulnerability: unreliable reward estimates may be granted disproportionate influence, triggering severe reward hacking. We propose Uncertainty-Aware Reward Modeling (UARM), which equips reward models with calibrated uncertainty via quantile-based conformal prediction and reweights GRPO advantages through heteroscedastic variance decomposition. Experiments across HelpSteer, UltraFeedback, and PKU-SafeRLHF demonstrate that UARM significantly improves reward model calibration, reduces reward hacking, and enhances downstream alignment quality compared to standard GRPO and uncertainty-agnostic baselines.

06.
arXiv (CS.AI) 2026-06-19

One Probe Won't Catch Them All: Towards Targeted Deception Detection

arXiv:2602.01425v2 Announce Type: replace Abstract: Linear probes are a promising approach for monitoring AI systems for deceptive behaviour. Previous work has shown that a linear classifier trained on a contrastive instruction pair and a simple dataset can achieve good performance. However, these probes exhibit notable failures even in straightforward scenarios, including spurious correlations and false positives on non-deceptive responses. In this paper, we demonstrate that deception detection is inherently heterogeneous: while a single universal probe achieves modest improvements (+0.032 AUC), post-hoc oracle analysis reveals substantially higher potential (+0.108 AUC) when probes are matched to specific deception types, and synthetic validation experiments suggest this ceiling is achievable a priori when the deception type is known in advance. Our findings reveal that instruction pairs capture deceptive intent rather than content-specific patterns, explaining why prompt choice dominates probe performance (70.6% of variance). Given this heterogeneity, we conclude that organizations should define their specific threat models and deploy appropriately matched probes rather than seeking a universal deception detector.

07.
arXiv (CS.CL) 2026-06-11

LifeSentence: Language models can encode human life course trajectories from longitudinal panel data

Forecasting human life outcomes is important to gain insights into how individuals attain long and healthy lives. Conventional statistical approaches yield limited accuracy, potentially due to discarding the sequential structure of the life course. Modern methods such as transformer architectures require large scale training data that most longitudinal panel studies lack. Here we introduce LifeSentence, a model for life-course reasoning that bridges large language models with longitudinal panel data. By representing each life event as a structured natural-language record and instruction-tuning a pretrained 24-billion-parameter language model across an 18-task evaluation taxonomy spanning prediction, robustness and reasoning, LifeSentence supplements panel data with distributional knowledge already encoded during pretraining. Trained on approximately 65,000 individuals from the German Socio-Economic Panel - roughly 45 times fewer than prior transformer-based approaches - LifeSentence outperforms classical and deep learning baselines across all task families, achieving a threefold improvement in joint event-and-timing prediction from best baselines and 91.2% Kendall's tau when reconstructing chronological order from timestamp-stripped event sets. Without explicit supervision, the model recovers documented patterns of social stratification, including the education premium, the gender wage gap and the motherhood penalty, from discrete event sequences alone. A natural-language interface further enables qualitatively new research queries, such as connecting an early-life history to a specified late-life endpoint, establishing LifeSentence as both a predictive tool and a probe for counterfactual exploration of human biographies.

08.
arXiv (CS.AI) 2026-06-19

Physical Atari: A Robust and Accessible Platform for Real-time Reinforcement Learning on Robots

arXiv:2606.19357v1 Announce Type: cross Abstract: We built a robot called the Robotroller that actuates an Atari CX40+ controller and a device called the Atari Devbox that renders the game frame and the reward signal from the Arcade Learning Environment on a screen. The Robotroller and the Atari Devbox, together with an off-the-shelf camera and a desktop computer, constitute a system that can be used to study reinforcement learning algorithms in the physical world. We call the full system Physical Atari. In this paper, we detail the key decisions that make Physical Atari a robust and accessible platform. To make the system robust, we designed the Robotroller so that all movement is done through bearings, which reduces wear. Additionally, we wrote software that monitors the state of the servos at a high frequency and intervenes to limit stress. To make the system accessible, we used affordable off-the-shelf components and parts that can be manufactured using consumer 3D printers. Physical Atari can be built for under $1,000 and has been used for weeks of non-stop reinforcement learning experiments without any mechanical failures. We used it to validate that reinforcement learning algorithms can learn directly on robots and show that even small distribution shifts between learning and deployment can significantly degrade the performance of policies. Our results underscore the importance of on-device adaptation for strong performance on robots.

09.
arXiv (CS.CL) 2026-06-19

Self-Preference Is Weak or Absent in Verifiable Instruction-Following Revision: A Four-Model Test Under Genuine Authorship

Large language models (LLMs) increasingly review and revise text, including their own. A documented self-preference bias (models favoring their own generations when acting as judges) raises the question of whether models also resist valid corrections to their own writing. We test this in a setting where "valid" is decided not by another model but by a deterministic verifier: instruction-following revision on IFEval. A model writes a draft; the official IFEval checker confirms the draft violates a constraint and that a candidate edit fixes it; the model then accepts or rejects that edit either as the genuine in-context author or as a fresh model that sees the draft neutrally. Across four mid-tier model families and 85 author-versus-fresh comparisons, we find no detectable self-preference: authors reject verified-good fixes to their own drafts at essentially the same rate as fresh models judging the same drafts (gap -5.1 pp, 95% CI [-12.9, +2.7]). A self-skepticism hint from a smaller pilot did not replicate at scale. The one robust observation is qualitative: when authors do reject a verified-good fix, 97% of their stated reasons are flaw-catching rather than preference, that is, about the character of rejections, not an elevated rate. Effects smaller than ~13 pp cannot be excluded at this sample size.

10.
arXiv (CS.CV) 2026-06-16

Fusion-E2Pulse: A Multimodal Event-RGB Fusion Network for Non-contact Pulse Wave Reconstruction

Non-contact pulse wave reconstruction hinges on the precise recovery of waveform morphology, including the dicrotic notch. Conventional Red-Green-Blue (RGB)-based methods, which extract physiological signals from recorded facial videos, are constrained by the integral imaging mechanism of standard cameras, where the exposure process induces a smoothing effect that attenuates subtle vascular pulsation details. Conversely, neuromorphic event cameras, while offering exceptional sensitivity to intensity fluctuations, are inherently susceptible to noise and artifacts induced by minor motion. To exploit the synergy between frame-based integration and event-based differential sensing, we propose a novel multimodal network named Fusion-E2Pulse. This framework utilizes filtered RGB signals as structural priors to suppress motion artifacts, while leveraging the high-sensitivity of event streams to recover fine-grained morphological details. Experimental results demonstrate that Fusion-E2Pulse achieves state-of-the-art performance, effectively balancing noise suppression and morphological fidelity, achieving a mean absolute error of 0.78 bpm for heart rate estimation, a waveform correlation of 0.89, and a systolic phase duration error of 16.74 ms, validating its efficacy in reconstructing fine-grained pathological features.

11.
arXiv (quant-ph) 2026-06-11

Dissociative recombination and ion-pair formation in $\mathrm{HeH^+}$ isotopologues: A time-dependent wave-packet study including rotational coupling

arXiv:2606.11352v1 Announce Type: cross Abstract: We present a comprehensive theoretical investigation of dissociative recombination (DR) and resonant ion-pair (RIP) formation in $\mathrm{HeH^+}$ isotopologues using time-dependent wave-packet propagation methods. Nuclear dynamics are treated on a set of 23 coupled electronic states, including $^2\Sigma$, $^2\Pi$, and $^2\Delta$ symmetries, in both adiabatic and strictly diabatic representations, with rotational couplings explicitly included. Reaction cross sections are computed over collision energies ranging from 0 to 50 eV. The results reveal that inclusion of a large manifold of resonant states and rotational couplings significantly enhances the DR cross section relative to earlier theoretical studies. In the diabatic representation, $^2\Sigma$ states dominate the recombination dynamics, while in the adiabatic representation, $^2\Pi$ and $^2\Delta$ states contribute significantly at low collision energies. For RIP formation, two different diabatization schemes yield systematically larger cross sections than previous models, highlighting the sensitivity of ion-pair production to electronic coupling structure. Isotopic effects are examined, showing a clear inverse dependence of cross section magnitude on reduced mass. The present results underscore the importance of multi-state coupling and nonadiabatic effects in accurately describing electron-molecule collision processes in primordial and astrophysical plasmas.

12.
arXiv (CS.LG) 2026-06-19

Alternating Direction Method of Multipliers for Nonlinear Matrix Decompositions

arXiv:2512.17473v3 Announce Type: replace-cross Abstract: We present an algorithm based on the alternating direction method of multipliers (ADMM) for solving nonlinear matrix decompositions (NMD). Given an input matrix $X \in \mathbb{R}^{m \times n}$ and a factorization rank $r \ll \min(m, n)$, NMD seeks matrices $W \in \mathbb{R}^{m \times r}$ and $H \in \mathbb{R}^{r \times n}$ such that $X \approx f(WH)$, where $f$ is an element-wise nonlinear function. We evaluate our method on several representative nonlinear models: the rectified linear unit activation $f(x) = \max(0, x)$, suitable for nonnegative sparse data approximation, the component-wise square $f(x) = x^2$, applicable to probabilistic circuit representation, and the MinMax transform $f(x) = \min(b, \max(a, x))$, relevant for recommender systems. The proposed framework flexibly supports diverse loss functions, including least squares, $\ell_1$ norm, and the Kullback-Leibler divergence, and can be readily extended to other nonlinearities and metrics. We illustrate the applicability, efficiency, and adaptability of the approach on real-world datasets, highlighting its potential for a broad range of applications.

13.
arXiv (CS.AI) 2026-06-12

A Minimal Model of Bounded Trade-Off Screening in Multi-Attribute Choice

arXiv:2606.13201v1 Announce Type: new Abstract: Human decision-making often involves choosing between multi-attribute alternatives, yet classical models assume fully compensatory utility aggregation despite evidence that people reject options with poor performance on critical attributes. We propose a bounded trade-off reasoning framework in which decisions are governed by a screening process that evaluates the balance between gains and losses across attributes. The model introduces a trade-off tolerance parameter that controls acceptable imbalance and can vary across contexts. Through simulation, we show that this mechanism produces preference patterns that differ from standard utility-based models and captures context-dependent variation in trade-off behavior. These results establish bounded trade-off screening as a plausible computational mechanism for multi-attribute choice and generate testable predictions for future behavioral studies.

14.
medRxiv (Medicine) 2026-06-15

Investigation of Intra-Fraction Stability and Inter-Fraction Reproducibility of Deep Inspiration Breath-Hold Across Two Hypofractionated Radiotherapy Regimens in the HYPORT Adjuvant Study.

Background: Deep Inspiration Breath Hold (DIBH) is a widely used respiratory motion management technique for minimizing cardiac dose in left-sided breast radiotherapy. In the Breast HYPORT Adjuvant study, DIBH was employed for cardiac sparing in patients without nodal irradiation using a standardized institutional protocol with the Varian Real-time Position Management (RPM) system. Both moderate-hypofractionation (control arm - 40Gy in 15 fractions) and one-week hypofractionation (experimental arm - 26 Gy in 5 fractions) regimens were delivered using this protocol. This study aimed to evaluate the robustness of DIBH by analyzing intra-fraction stability and inter-fraction reproducibility of breath-hold amplitude across the two treatment regimens. Methods: Respiratory waveforms acquired during each treatment session were analyzed to determine the median breath-hold amplitude and its standard deviation during beam delivery. Intra-fraction stability was assessed from vari- ations within individual treatment sessions, while inter-fraction reproducibility was evaluated relative to the simula- tion waveform amplitude across all treatment sessions. These parameters were compared between the two HYPORT regimens to examine breath-hold consistency during treatment delivery. Moreover, an additional comparison was made between the one-week hypofractionation regimen and the first five fractions of the moderate-hypofractionation regimen to evaluate the effect of treatment duration . Lung volumes from free-breathing and DIBH CT scans were analyzed to assess the effectiveness of patient breath-hold training. Results: Both arms demonstrated an average 1.7-fold increase of air volume in lung during the breath-hold position, confirming the effective implementation of DIBH during treatment planning and delivery. Structured training resulted in increased breath-hold amplitudes, with gains of 22.87% and 24.16% with respect to the first trial session in the experimental and control arms, respectively. Both regimens receive equivalent doses for approximately the same air volume in lung . Despite the different prescription doses in the two arms (26 Gy vs. 40 Gy), the experimental arm achieved an equivalent mean heart dose of 2.91% (75.6 cGy) compared with 2.95% (118.51 cGy) in the control arm, suggesting a similar cardiac preservation protocol adopted during treatment planning. Intra-fraction stability was similar between the control arm and the experimental arm, with median amplitude variations of 1.006 mm (95% CI: [0.998-1.015]) and 1.079 mm (95% CI: [1.067-1.097]), respectively. In contrast, inter-fraction reproducibility improved in the experimental arm, with lower deviation from simulation amplitude (0.44 {+/-} 0.24 mm vs. 0.66 {+/-} 0.25 mm) for the entire treatment schedule. The stability and reproducibility of experimental arm were further compared with the first five fractions of the control arm. The results were similar to those of the experimental arm. Conclusion: In this study, we compared two treatment regimens in terms of intra-fraction stability and inter-fraction reproducibility during DIBH radiotherapy. Both regimens demonstrated comparable intra-fraction stability, indicating effective motion management irrespective of treatment duration. However, the experimental arm showed better inter- fraction reproducibility, suggesting more consistent breath-hold performance throughout the treatment course. Based on stability and reproducibility, a reasonable narrowing of the DIBH gating window may be implemented with minor changes to the institutional protocol. The observed trend highlights the potential for improved consistency with the experimental approach and supports further investigation to better understand the underlying factors and strengthen these findings in future studies.

15.
arXiv (CS.LG) 2026-06-15

Beyond task performance: Decoding bioacoustic embeddings with speech features

arXiv:2606.14662v1 Announce Type: new Abstract: Pretrained audio embeddings are standard in bioacoustics, yet little is known about which acoustic features these models encode, nor which are useful for a given task. This hinders transparency and limits extension to rare species or data-scarce domains. Here we reveal which speech-like features are encoded in bioacoustic representations. Using the 88~eGeMAPS features across six taxonomic groups, we apply linear and nonlinear regression probes to quantify which acoustic properties each model captures. Results confirm a ``no free lunch'' pattern: no single model captures the full feature space. A concatenated embedding achieves the highest performance, suggesting complementary acoustic space coverage across models. Loudness features are best encoded ($R^2 = 0.76$) while F0 is hardest to recover ($R^2 = 0.33$). By cross-referencing recoverability with per-species feature salience (NMI), we derive data-driven model selection guidance for bioacoustics.

16.
arXiv (CS.CL) 2026-06-12

Ontology Memory-Augmented ASR Correction for Long Text-Speech Interleaved Conversations

Automatic speech recognition (ASR) correction has traditionally focused on isolated utterances or short local contexts. However, as text and speech become increasingly interleaved in long interactions, ASR correction requires conversation-level contextual evidence. Existing ASR correction methods often rely on the current hypothesis or concatenate raw dialogue history. In such contexts, sparse correction evidence can be difficult to locate amid redundancy and noise. Addressing these challenges, we propose an ontology memory-augmented ASR correction framework for long text-speech interleaved conversations. The framework organizes preceding interaction history into a dynamically updatable ontology memory, where entities, terminology, surface variants, potential ASR confusions, and semantic relations are stored as retrievable nodes for context-grounded correction. To evaluate this setting, we construct RAMC-Corr, a dataset derived from MAGIC-RAMC for long-range ASR correction with grounded context. Experiments on RAMC-Corr show that our method improves over direct correction in 9 out of 10 paired backbone-setting combinations and encourages more selective and evidence-grounded corrections for context-dependent ASR errors.

17.
arXiv (CS.AI) 2026-06-16

An AI Security Agent for University ACMIS: Multi-Vector Threat Detection and Automated Response

arXiv:2606.08270v2 Announce Type: replace-cross Abstract: University Academic Management Information Systems (ACMIS) are high-value targets for a wide spectrum of security threats including brute-force login attacks, payment fraud, privilege escalation, insider data theft, and academic integrity violations. Traditional rule-based intrusion detection systems are inadequate because many malicious activities are structurally indistinguishable from normal operations. This paper presents an AI-based security agent for ACMIS that combines supervised anomaly detection, behavioural analytics, and a natural language processing chatbot for secure password recovery. The agent monitors five operational layers: authentication, authorisation, financial transactions, user behaviour, and system health, and responds through a four-tier risk escalation framework. A modular architecture allows the core engine to be extended to other institutional systems. Experiments on a simulated ACMIS event log dataset of 147,922 sessions demonstrate a threat detection macro-average F1 of 0.966, compared to 0.156 for a rule-based baseline and 0.836 for a sequence-only (LSTM) baseline, with end-to-end critical-tier automated response latency under 1 ms on a single-node prototype. The integrated recovery chatbot achieves 97.1 percent identity verification accuracy and an 87.3 percent mass-reset attack detection rate with zero false positives on legitimate high volume recovery periods.

18.
arXiv (CS.CV) 2026-06-16

Imitating What Works: Simulation-Filtered Modular Policy Learning from Human Videos

The ability to learn manipulation skills by watching videos of humans has the potential to unlock a new source of highly scalable data for robot learning. Here, we tackle prehensile manipulation, in which tasks involve grasping an object before performing various post-grasp motions. Human videos offer strong signals for learning the post-grasp motions, but they are less useful for learning the prerequisite grasping behaviors, especially for robots without human-like hands. A promising way forward is to use a modular policy design, leveraging a dedicated grasp generator to produce stable grasps. However, arbitrary stable grasps are often not task-compatible, hindering the robot's ability to perform the desired downstream motion. To address this challenge, we present Perceive-Simulate-Imitate (PSI), a framework for training a modular manipulation policy using human video motion data processed by paired grasp-trajectory filtering in simulation. This simulation step extends the trajectory data with grasp suitability labels, which allows for supervised learning of task-oriented grasping capabilities. We show through real-world experiments that our framework can be used to learn precise manipulation skills efficiently without any robot data, resulting in significantly more robust performance than using a grasp generator naively.

19.
arXiv (CS.LG) 2026-06-16

Deep Learning-Based Lunar Crater Terrain Relative Navigation

arXiv:2606.14776v1 Announce Type: cross Abstract: Accurate position estimation is crucial for the successful implementation of future lunar landings using autonomous vehicles, especially in dangerous environments with sparse terrain features. In this paper, we propose a terrain relative navigation (TRN) algorithm combining our deep-learning crater detector, which was designed specifically for the NASA Crater Detection Challenge problem, and an Extended Kalman Filter (EKF). Our detector analyzes crater features from the monocular images acquired from orbit, and their matches with craters from a global database are identified via a Hungarian assignment approach followed by the consensus-based outliers removal method. The estimated measurements are then used to refine an EKF, where spacecraft pose estimation in the Lunar-Centered Lunar-Fixed (LCLF) frame of reference, augmented with altitude aiding information, constrains radial drift. The simulation results indicate that even if the spacecraft is off from its actual location up to 5 km, TRN could recover from this situation, achieving navigation error reduction to a few hundred meters. It should be noted that in order to maintain crater feature correspondences, it is important to match the image resolution and the scales within the scene to the detector training set distribution.

20.
arXiv (CS.LG) 2026-06-11

Phase Transitions in Attention: A Bayesian Theory of Copy Head Emergence

arXiv:2606.12058v1 Announce Type: cross Abstract: Attention is the key mechanism underlying in-context learning in transformers, and attention patterns have been observed empirically to emerge abruptly during training. We present a Bayesian theory of feature learning in attention; we then focus on how the copy subcircuit in the first layer of an induction head is learned by analyzing a single-layer softmax attention network trained on a copy task. We derive a closed-form posterior over the attention matrix and reduce it to a low-dimensional order parameter space. This reduction reveals a phase transition in the amount of training data, which we verify using both Bayesian sampling and standard training with Adam. We contrast our results with linear attention and find that softmax attention exhibits a first-order phase transition while in linear attention an initial second-order phase transition is followed by a smooth, continuous evolution toward the structured attention pattern (crossover). Our work provides a first-principles theoretical account of the abrupt emergence of the copy subcircuit, reminiscent of the one observed in training large language models.

21.
arXiv (CS.CV) 2026-06-16

WaveDINO: Learning-Based Atmospheric Correction of Unwrapped InSAR Interferograms Validated by GNSS: Results at Laguna del Maule and Campi Flegrei Volcanoes

Interferometric Synthetic Aperture Radar (InSAR) enables effective monitoring of volcanic deformation; however, the observed signals are often corrupted by atmospheric phase delays, seasonal surface changes, and decorrelation effects. Existing atmospheric correction methods, such as numerical weather model-based methods, can reduce these effects but do not consistently remove atmospheric artefacts and may introduce residual biases. To address these limitations, we propose a novel learning-based method for denoising unwrapped InSAR interferograms, using a hybrid training strategy that combines physically motivated synthetic deformation with real atmospheric noise. Specifically, we introduce WaveDINO, a wavelet-based multi-scale denoising framework conditioned on frozen DINOv3 foundation-model features and terrain information. Training uses synthetic magma-source deformation superimposed on short-term interferograms to expose the network to realistic atmospheric statistics while retaining known ground truth. Performance is evaluated on both controlled synthetic data and long-term real interferograms from Laguna del Maule (Chile) and Campi Flegrei (Italy), with independent GNSS measurements used for validation. WaveDINO consistently outperforms competing models, improving agreement with GNSS measurements, and reducing mean GNSS misfit by approximately 3% and 19% at two sites, respectively, while surpassing weather-model-based corrections.

22.
arXiv (CS.CL) 2026-06-15

Simulating Students' Java Programming Errors with Large Language Models

Understanding student errors in the programming is a cornerstone of programming education, yet obtaining a representative set of student errors for any newly designed task remains slow and costly, since authentic submissions only accumulate after extensive classroom deployment. This paper explores whether large language models (LLMs) can serve as scalable proxies for students by simulating realistic logical errors in code submissions. Using the CodeWorkout dataset of 74,000+ unique student Java submissions across 37 problems, we evaluate five LLMs under three mainstream prompting strategies: Input-Output (IO), Chain-of-Thought (CoT), and iterative Self-Refine. We assess performance along two key dimensions: diversity (the range of distinct error patterns) and alignment (alignment with authentic student mistakes), and examine how these vary by struggling level of programming tasks. Our quantitative findings reveal that while all models generate diverse errors, their alignment to human submissions diverges: Claude Sonnet 4 achieves the most balanced performance. In addition, we conducted a blinded expert annotation study (N = 401) comparing synthetic and authentic errors. This qualitative analysis confirms that the generated errors are functionally indistinguishable from authentic student errors. Moreover, higher-struggling-level problems elicit more diverse but less student-like errors. These results highlight trade-offs in using LLMs to simulate human learners and suggest design considerations for integrating synthetic errors into teachable agents, intelligent tutoring systems, and large-scale learning analytics.

23.
medRxiv (Medicine) 2026-06-18

The Effectiveness of aromatherapy and its supportive Interventions on anxiety and pain among breast cancer patients: A systematic review and meta-analysis

Introduction: Breast cancer treatments are often associated with pain and anxiety, which can hinder physical functioning and overall quality of life, even after treatment. Complementary therapies, such as aromatherapy, can be used to alleviate pain and reduce anxiety in breast cancer patients. This project aimed to synthesize current global evidence on the effectiveness of aromatherapy. Method: This systematic review followed the PRISMA 2020 guidelines, with a comprehensive, systematic search conducted in PubMed, CINAHL, Cochrane Library, and SCOPUS for randomized controlled trials (RCTS) published from 2015 to 2025. Eligible studies included adult women breast cancer surgery patients who received aromatherapy during various periods of breast cancer. Where possible, data from the included studies were pooled using meta-analysis. GRADE approach was used to assess certainty of findings. Results: The search yielded 84 studies. Out of these, six were included in this review. On average, aromatherapy reduces pain and anxiety scores by 0.79 (standard mean difference (SMD)=-0.79, 95% CI -1.42, -0.16) and 0.53 (SMD=-0.53, 95 CI=-0.90, -0.16) units, respectively, compared to control condition [Low-quality of evidence]. The combination of aromatherapy with music reduces pain and anxiety by 1.26 (SMD= -1.26, 95 CI=-1.65, -0.87) and 1.08 (SMD = -1.08, 95 % CI: -1.45, -0.70) units respectively compared to standard care [Low-quality of evidence]. Conclusion: There is a potential role for the use of aromatherapy and music therapy, to alleviate anxiety and pain, especially for non-preoperative anxiety and pain. Further research is needed to inform the integration of aromatherapy into the management of anxiety and pain.

24.
arXiv (math.PR) 2026-06-12

Characterizing metric-space-valued processes: separating classes and weak invariance principles for measure-theoretic inference

arXiv:2606.13084v1 Announce Type: cross Abstract: This article investigates stochastic processes taking values in metric spaces that lack a topological vector space structure, a regime characterized by intricate interplay between topological, geometric, and temporal dependence structures. It is formally established that spaces admitting an isometric Hilbertian embedding constitute a strict subclass within the much broader class of metric spaces possessing the ball property. While traditional kernel methods are susceptible to geometric distortion when the underlying space cannot be isometrically embedded into a Hilbert space, we bypass such limitations by exploiting a fundamental structural property inherent to this broader class; namely, that Borel probability measures are uniquely determined by their values on balls. These separating classes provide the foundation for the subsequently introduced measure-theoretic inference methodology. We derive uniform convergence of a family of time-dependent random measures, alongside weak invariance principles for the corresponding nonstationary random fields. This framework explicitly exposes how dependence and geometric complexity influence sample path regularity. Furthermore, because the rapid decay of small-ball probabilities can prohibit the existence of limiting distributions for supremum-based discrepancy measures, we develop $L^p$-based alternatives. By directly leveraging the introduced convergence results, this approach circumvents the need for higher-order $U$-process formulations. Finally, for spaces that do admit an isometric Hilbertian embedding, and where $U$-processes naturally arise, we establish limit theory for both degenerate and nondegenerate multi-parameter $U$-processes, and demonstrate that local discrepancy tests maintain asymptotic stability under dynamic parameter regimes.

25.
arXiv (CS.LG) 2026-06-11

Machine-learning-based multipoint optimization of fluidic injection parameters for improving nozzle performance

arXiv:2409.12707v2 Announce Type: replace-cross Abstract: Fluidic injection offers a promising solution to improve the performance of the overexpanded single expansion ramp nozzles (SERNs) during vehicle acceleration. However, determining the injection parameters that yield the best overall performance across multiple nozzle operating conditions remains a challenge. The gradient-based optimization method requires gradients of injection parameters at each design point, which can lead to high computational costs when using computational fluid dynamics (CFD) simulations. This paper uses a pretrained neural network to replace CFD during optimization, enabling quick calculation of the nozzle flow field at multiple design points. Considering the physical characteristics of the nozzle flow field, a prior-based prediction strategy is adopted to enhance the model's accuracy. In addition, the neural network's back-propagation algorithm computes gradients quickly by running the computation only once, thereby greatly reducing gradient computation time compared to the finite difference method. As a test case, the average nozzle thrust coefficient of an SERN at seven design points is optimized, resulting in a 1.14\% improvement. The time cost is greatly reduced compared with traditional optimization methods, even when the time required to establish the training database is included.