Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-11

Visualizing LLM Latent Space Geometry Through Dimensionality Reduction

arXiv:2511.21594v3 Announce Type: replace Abstract: Large language models (LLMs) achieve state-of-the-art results across many natural language tasks, but their internal mechanisms remain difficult to interpret. In this work, we extract, process, and visualize latent state geometries in Transformer-based language models through dimensionality reduction. We capture layerwise activations at multiple points within Transformer blocks and enable systematic analysis through Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). We demonstrate experiments on GPT-2 and LLaMa models, where we uncover interesting geometric patterns in latent space. Notably, we identify a clear separation between attention and MLP component outputs across intermediate layers, a pattern not documented in prior work to our knowledge. We also characterize the high norm of latent states at the initial sequence position and visualize the layerwise evolution of latent states. Additionally, we demonstrate the high-dimensional helical structure of GPT-2's positional embeddings and the sequence-wise geometric patterns in LLaMa. We make our code available at https://github.com/Vainateya/Feature_Geometry_Visualization. A better formatted blog-post with identical content is available at https://iclr-blogposts.github.io/2026/blog/2026/vis-llm-latent-geometry/.

02.
arXiv (CS.CV) 2026-06-17

Attention Alignment Between Humans and Vision-Language Models

Visual perception depends on top-down goals and bottom-up sensory mechanisms. Vision-language models implement both, allowing us to treat each component as a separable hypothesis about what drives where we look. We compared spatial attention maps from six vision-language models against human fixation heatmaps recorded on 200 images during two tasks (general description and social captioning). The six models spanned a 2$\times$2 factorial of CNN vs.\ ViT encoders crossed with LSTM vs.\ Transformer decoders, plus Molmo 7B-D and Qwen3.5 9B. We found that both decoder and encoder architecture shaped alignment, but decoder choice dominated. LSTM vs.\ Transformer decoders increased alignment by 40–50 percentage points (80–87\% vs.\ 40–59\% of the human noise ceiling). In contrast, CNN vs.\ ViT encoders contributed a secondary 5–20 point advantage depending on decoder family, with CNN-LSTM the most aligned model overall (85–87\%). Despite their alignment advantage, LSTM-decoder attention maps were spatially diffuse and minimally task-differentiated; ViT-Transformer, the weakest in alignment, showed the sharpest spatial concentration and strongest task differentiation. A hemispatial-neglect simulation confirmed that ablating attention impacted LSTM decoders more than Transformer decoders. In an exploratory extension using TRIBE-simulated synthetic neural responses, fixation alignment and neural relevance dissociate: CNN-Transformer attention maps better predicted synthetic brain activity despite lower fixation alignment, with attention maps best predicting early visual cortex. Together, top-down and bottom-up components trade off what they predict in behavioral and synthetic neural data.

03.
arXiv (CS.AI) 2026-06-18

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

arXiv:2606.19245v1 Announce Type: new Abstract: Artificial intelligence (AI) agents promise to accelerate drug discovery by compressing interpretation and decision-making loops, but practical deployment requires trusted evaluation on realistic program decisions. We introduce TherapeuticsBench Preclinical Pharmacology (TxBench-PP), a verifiable benchmark for small-molecule preclinical pharmacology and the first focused slice of a broader TherapeuticsBench effort across drug-discovery stages and therapeutic modalities. TxBench-PP tests whether agents can recover accurate conclusions from real-world assay data rather than memorized facts from literature. The benchmark contains 100 evaluations indexed by program stage, assay type, and task structure, spanning mechanism-of-action (MoA) and pharmacodynamic (PD) reasoning, compound-target engagement, causal target validation, developability and safety, and translational efficacy. Agents receive realistic workflow snapshots, inspect files in a coding environment, and return structured answers graded deterministically. Across 16 model-harness configurations, comprising 11 models and 4,800 trajectories, no system reliably recovered preclinical pharmacology decisions. The strongest configuration, Claude Opus 4.8 / Pi, passed 59.3\% of endpoint attempts (178/300; 95\% CI, 51.1-67.6), followed by GPT-5.5 / Pi at 55.3\% (166/300; 47.0-63.6).

04.
arXiv (quant-ph) 2026-06-16

Non-perturbative CPMG scaling and qutrit-driven breakdown under compiled superconducting-qubit control: a single-qubit study

作者:

arXiv:2603.29525v3 Announce Type: replace Abstract: Decoherence in superconducting qubits arises from both multilevel dynamics and structured environmental noise, yet perturbative models cannot capture all resulting signatures. Here, EmuPlat couples instruction-set-architecture-level waveform generation to the hierarchical equations of motion HEOM under $1/f$ non-Markovian pure dephasing. In the resulting non-perturbative regime – where filter-function predictions become quantitatively uninformative – CPMG scaling of a three-level superconducting transmon yields one calibration result, two physical findings, and one structural null. Y-CPMG exhibits axis-dependent scaling-law breakdown – non-monotonic decoherence, partial coherence revival, and pronounced X–Y population asymmetry ($0.204$ vs ${

05.
arXiv (CS.AI) 2026-06-12

ARMOR-MAD: Adaptive Routing for Heterogeneous Multi-Agent Debate in Large Language Model Reasoning

arXiv:2606.13197v1 Announce Type: new Abstract: Multi-agent debate (MAD) can improve large language model reasoning, but fixed debate pipelines often waste computation and can amplify correlated errors among similar agents. We propose ARMOR-MAD, a training-free heterogeneous MAD framework that treats debate as conditional computation. ARMOR-MAD combines three components: Pre-debate Agreement Routing (PAR) decides whether independently generated Round-0 answers require debate; Early Agreement Stopping Evaluator (EASE) stops debate after convergence; and Semantic Outlier Detection (SOD) down-weights abnormal final answers during aggregation. Across MATH Level 5, GSM8K, MMLU, and MMLU-Pro, ARMOR-MAD consistently improves over fixed-round heterogeneous debate with the same model pool, reaching 65.5\%, 96.5\%, 90.0\%, and 81.5\% accuracy, respectively. The results suggest that genuine model heterogeneity and agreement-based control are both important for making MAD more accurate and efficient.

06.
Nature (Science) 2026-06-10

Light-induced quantum friction of carbon nanotubes in water

Friction slows down moving objects at both macroscopic and microscopic scales1. At the electronic level, quantum friction describes direct transfer of momentum between a liquid and the electrons of a solid2. Owing to its microscopic nature, this phenomenon remains experimentally challenging to capture3. Here we show that near-infrared fluorescent single-walled carbon nanotubes (SWCNTs) exhibit light-induced quantum friction in water. It is measured by observing an excitation-power-dependent linear decrease of around 50% in the diffusion constants of functionalized SWCNTs in aqueous solution. This effect disappears when excitons are localized, as in the case of SWCNTs with quantum defects. We further show that the chemical manipulation of exciton concentration by molecules that increase or decrease SWCNT fluorescence also modulates the diffusion constant by up to a factor of 2. Optical pump terahertz (THz) probe spectroscopy shows an instantaneous response (around 30 cm−1) that we assign to direct exciton–water coupling in the range of water Debye modes. It is followed by an increasing (>100 ps) response in the range of intermolecular translational modes of the hydrogen bond network of water (>100 cm−1), resembling heating. Classical molecular dynamics simulations further support a mechanism in which the fluctuating dipole moments of excitons create frictional forces. These findings establish light-induced quantum friction between excitons in SWCNTs and water and show that electronic excitations can be used to control nanoscale motion and fluid properties. Near-infrared fluorescent carbon nanotubes exhibit light-induced quantum friction in water, in which exciton interactions slow nanoscale motion and enable optical control of diffusion and fluid dynamics.

07.
arXiv (quant-ph) 2026-06-17

Stochastic signal sensing with finite energy and dead time at the fundamental quantum limit

arXiv:2606.18133v1 Announce Type: new Abstract: State preparation, measurement, and reset operations take finite time and use finite energy in realistic experiments, yet the impact of this on optimal quantum metrological protocols is not properly understood. We study the effect on sensing a stochastic signal, relevant for the detection of ultralight dark matter and other searches for fundamental physics. We prove that two-mode squeezed vacuum is the optimal probe state given a finite mean-energy constraint for a family of incoherent sensing problems, including noise sensing and quantum illumination. For estimating a gain independent of a loss, we show that entanglement is a required resource to achieve the fundamental quantum limit and observe a non-Gaussian to Gaussian transition in the optimal unentangled state as the dead time increases. We apply our results to bulk acoustic wave resonators.

08.
Nature (Science) 2026-06-17

A prototype differential atom interferometer for fundamental physics

Gravitational waves and ultralight dark matter are among the most compelling frontiers in fundamental physics, motivating proposals for very-long-baseline atom interferometerssuch as AION1, MAGIS2, AICE3 and AEDGE4 that aim to detect at frequencies at which ground-based5 and space-borne6 laser interferometers lose sensitivity. Very-long-baseline atom interferometers look for signals by comparing the quantum phase evolution of widely separated atomic ensembles interrogated by a common laser. However, their performance depends critically on suppressing noise sources, particularly laser phase noise. The experimental validation of such noise rejection remains an important challenge. Here we demonstrate a prototype differential atom interferometer based on the single-photon clock transition of fermionic 87Sr. Thus, we obtain a gradiometer configuration with a species intrinsically suited to kilometre-scale and space-baseline operation. The instrument operates at the standard quantum limit7 with no excess noise beyond atom shot noise. The differential configuration maintains quantum-limited sensitivity in the presence of several radians of artificially injected laser phase noise per shot, which emulates the conditions expected in a very-long-baseline atom interferometer. We also demonstrate the recovery of coherent oscillatory signals across a broad frequency range under fully phase-randomized conditions, a capability that is inaccessible to a single interferometer operating in the same regime. These results provide an experimental validation of the noise-immune measurement principle underlying very-long-baseline atom interferometers and mark an important step towards next-generation quantum sensors for gravitational-wave detection and searches for ultralight dark matter8,9. A prototype differential atom interferometer operates at the standard quantum limit with no excess noise beyond atom shot noise, achieving performance in line with the specifications for future long-baseline atom interferometers.

09.
arXiv (quant-ph) 2026-06-19

Random Local Stabilizer Codes in Three Dimensions without String or Self-Similar Fractal Logical Operators

作者:

arXiv:2606.19873v1 Announce Type: new Abstract: Quantum error-correcting codes (QECs) are essential components quantum computation and have deep connections to quantum phases of matter. A key obstruction to passive self-correcting QECs is the presence of string logical operators, which can generate logical errors through constant-energy-barrier processes. Haah's Codes (fracton codes) showed that three-dimensional stabilizer codes can forbid such string logical operators, but their translation-invariant structure supports self-similar fractal logical operators with a logarithmic energy barrier. We introduce the qutrit random cubic codes, a family of local qutrit Calderbank-Shor-Steane stabilizer Hamiltonians with similar cube-check structure as Haah's Code 1 but built from spatially varying stabilizers. We prove that these models retain the no-string property and numerically observe that they have properties distinct from translation-invariant fracton codes: the smallest ground-state degeneracy exponent is $k=2$ for odd $L$ and $k=4$ for even $L$; noncontractible plane-logical operators span the entire logical space; and charge-push diagnostics show that the self-similar fractal operators are absent. These results demonstrate that constrained randomness can fundamentally change the nature of stabilizer codes and improve their self-correction properties. They further point to broader families of quantum error-correcting codes and quantum phases beyond canonical topological and fracton orders.

10.
medRxiv (Medicine) 2026-06-10

Assessment of the accuracy of lung lesions diagnosis in adolescents with osteosarcoma using artificial intelligence

Background. Lung metastases in osteosarcoma (OS) are the main cause of the death. The accuracy of the diagnosis of nodules by computed tomography (CT) of the lungs is critically important for determining the disseminated stage of the disease and planning surgical treatment. The use of artificial intelligence (AI) in the search for lung nodules increases the accuracy of diagnosis and reduces the chance of missing metastases. Objective: to evaluate the accuracy of lung nodules diagnosis in adolescents with OS using AI. Methods. A retrospective assessment of CT scans of adolescents with OS was performed. A pathological nodule with an average size of [≥]4 mm was considered a target finding. The diagnostic accuracy of an AI algorithm previously trained on an adult dataset was evaluated, and the number of false positives (FP) and false negatives (FN) was determined. Sensitivity, specificity, accuracy, area under the ROC curve (AUC), positive predictive value, negative predictive value, and F1-measure were calculated. Based on the obtained results, the effectiveness of the algorithm was assessed. Results. 248 CT scans of adolescents with OS were evaluated. The following results were obtained: in 5 cases, the AI algorithm showed a FP result (2.02%), in 34 cases, it showed a FN result (13.71%), and in 209 cases, a correct result (both true positive and true negative) (84.27%). The diagnostic accuracy of the algorithm was 0.843 (95% CI 0.794-0.887). The application of the AI algorithm in the practice of an X-ray doctor in a specific clinical task would allow to increase the sensitivity from 0.805 to 0.891, while ensuring an absolute decrease in the number of FN results by 8.59% and a relative decrease by 44%. Conclusion. The obtained results confirm the practical value of the application of the AI algorithm and justify the implementation of AI-assisted systems in the diagnostic protocols for lung metastases in adolescents with OS.

11.
arXiv (CS.LG) 2026-06-19

Diffuse AI Control on Fuzzy Tasks

arXiv:2606.08892v2 Announce Type: replace Abstract: AI models deployed in critical domains, such as AI safety research, may subtly sabotage our efforts due to misalignment. Diffuse AI Control is a subfield of AI safety concerned with mitigating risks from AI sabotage distributed over long deployment horizons (diffuse threats). These risks are particularly pernicious on fuzzy tasks, i.e. tasks which are hard to grade or require intuition. To understand diffuse threats on fuzzy tasks, we introduce a framework that considers AI control as an adversarial game between a blue team and a red team. The blue team uses a weak trusted model to construct a weak score against which they would train a strong, potentially subversive model to remove the subversion propensity if it were present. The red team then tries to find model behaviors that are rated highly by the weak score, and thus might not be trained out, but actually correspond to poor performance. We test our framework on the task of writing experimental proposals for research questions from recent ML papers. We use a language model with access to the original paper as a proxy "ground-truth" scorer. Our red team discovers subversive behaviors using multi-objective evolutionary prompt optimization. We show that Opus~4.6 can write proposals that are worse according to the ground truth proxy than those of GPT-OSS-20B, while the weak scorer rates them as highly as the best proposals from Opus 4.6. We then propose an adversarial optimization algorithm for the blue team that discovers more robust prompts for the weak model. This algorithm produces a blue team prompt that our red team optimization fails to exploit.

12.
arXiv (CS.CV) 2026-06-16

Implementation of Licensed Plate Detection and Noise Removal in Image Processing

作者:

Car license plate recognition system is an image processing technology used to identify vehicles by capturing their Car License Plates. The car license plate recognition technology is also known as automatic number-plate recognition, automatic vehicle identification, car license plate recognition or optical character recognition for cars. In Malaysia, as the number of vehicle is increasing rapidly nowadays, a pretty great number of vehicle on the road has brought about the considerable demands of car license plate recognition system. Car license plate recognition system can be implemented in electronic parking payment system, highway toll-fee system, traffic surveillance system and as police enforcement tools. Additionally, car license plate recognition system technology also has potential to be combined with various techniques in other different fields like biology, aerospace and so on to achieve the goal of solving some specialized problems.

13.
medRxiv (Medicine) 2026-06-12

Metastatic Patterns and Treatment Characteristics of Triple-Negative Breast Cancer in Nigeria: A Retrospective Cohort Study

Background: Triple-negative breast cancer (TNBC) is an aggressive breast cancer subtype characterized by the absence of estrogen receptor, progesterone receptor, and human epidermal growth factor receptor 2 expression. It is associated with limited targeted treatment options, early relapse, and a high propensity for visceral metastasis. Data describing metastatic patterns and treatment characteristics of TNBC in Nigeria remain limited. Methods: This retrospective descriptive cohort study included 869 patients with TNBC managed at the Medserve-LUTH Cancer Center, Lagos University Teaching Hospital, Nigeria between June 2019 and June 2024. Demographic, clinicopathologic, metastatic, and treatment-related data were extracted from electronic medical records. Descriptive statistics were used to summarize patient characteristics, metastatic patterns, and treatment profiles. Associations between metastatic disease and selected clinicopathologic and treatment variables were explored using Pearsons chi-square test. Complete-case analysis was applied throughout. Results: The mean age at presentation was 52.09 {+/-} 12.26 years. Most patients were married (79.1%), postmenopausal (64.3%), and of Yoruba ethnicity (56.8%). Advanced disease predominated, with Stage III and Stage IV disease accounting for 42.9% and 35.6% of cases, respectively. Invasive ductal carcinoma was the most common histologic subtype (77.0%), while Grade II tumours constituted 51.3% of graded cases. Surgery was performed in 73.1% of patients, predominantly mastectomy (70.9% of surgical procedures). Chemotherapy was administered to 83.2% of patients, most commonly anthracycline-based regimens (41.8%), while radiotherapy was delivered to 63.5% of patients, with hypofractionated schedules of 42-43 Gy in 15-16 fractions accounting for 47.2% of radiotherapy courses. Metastatic disease was documented in 32.9% of evaluable patients. Lung metastasis was the most frequent site (62.5%), followed by bone (46.3%), regional lymph node invasion (38.5%), liver (23.0%), and brain (22.6%). Tumour grade and histologic subtype were not significantly associated with metastatic disease, whereas radiotherapy exposure demonstrated a significant association with metastatic status ({chi}{superscript 2} = 10.35, p = 0.001). Conclusion: TNBC in this Nigerian cohort was characterized by advanced-stage presentation, invasive ductal predominance, extensive use of multimodality treatment, and substantial visceral metastatic burden. Lung metastasis was the most common metastatic site. These findings provide contemporary real-world data on TNBC in Nigeria and highlight the continuing need for earlier diagnosis, timely referral, and sustained investment in comprehensive cancer care services.

14.
Nature (Science) 2026-06-18

Daily briefing: The brain builds a sentence neuron by neuron

作者:

Researchers have tracked the electrical activity of individual brain cells during conversation in real time. Plus, the history of GPS and a cross-species transplant that could reveal clues about the origin of animals. Researchers have tracked the electrical activity of individual brain cells during conversation in real time. Plus, the history of GPS and a cross-species transplant that could reveal clues about the origin of animals.

15.
arXiv (CS.CL) 2026-06-16

Equity with Efficiency: An Empirical Study of Tokenizers for Multilingual Large Language Models

Multilingual large language models (LLMs) depend on subword tokenization to bridge discrete text and continuous neural representation. State-of-the-art multilingual LLMs often use Byte-level Byte-Pair Encoding (BPE) tokenizers that structurally favor high-resource languages and Latin scripts. For speakers of underrepresented languages, particularly those across Southeast Asia, this bias inflates inference costs and widens cross-lingual capability gaps. We present the first systematic comparison of equitable tokenizers on a unified benchmark spanning 11 Southeast Asian languages. Beyond tokenizer-level analysis of compression efficiency and cross-lingual equity, we assess downstream task performance through controlled 1.5B-parameter language model training using the same training data. Our results show that Parity-aware BPE lies on the Pareto frontier of the efficiency-equity trade-off, achieving strong compression parity at competitive cost. Morphology-Driven Byte Encoding delivers the best semantic reasoning performance through morphologically richer representations, albeit at a higher computational expense. Byte Latent Transformer underperforms on downstream tasks, possibly because its architectural assumptions misalign with the constraints of limited low-resource training data. Together, our findings demonstrate that cross-lingual fairness and tokenization efficiency are not fundamentally at odds, and offer practical guidance for designing equitable multilingual models.

16.
arXiv (CS.AI) 2026-06-12

HybridCodeAuthorship: A Benchmark Dataset for Line-Level Code Authorship Detection

arXiv:2606.12620v1 Announce Type: cross Abstract: Thanks to the rapid adoption of AI code assistants powered by large language models (LLMs), industry codebases are, increasingly, a hybrid of AI- and human-authored code. For risk management and productivity analysis purposes, it is crucial to enable fine-grained location detection of AI-generated code. To develop algorithms for this task, quality benchmarks are needed to assess performance. However, existing benchmarks tend to comprise academic, LeetCode-style problems and presume a code snippet is either completely human-authored or completely AI-authored, which is not reflective of the diverse intents and styles of industry codebases utilizing AI code assistants. To fill these gaps, we introduce HybridCodeAuthorship, a novel benchmark of Python code files with interleaved human- and AI-authored lines of code to simulate authentic utilization of AI code assistants. In this paper, we first present our dataset construction pipeline, which leverages CodeSearchNet, a massive collection of links to open sourced repositories on GitHub. We then benchmark the performance of two state-of-the-art AI-generated code detection algorithms at both the line- and chunk-level. Experimental results demonstrate that HybridCodeAuthorship is a challenging benchmark with a top-scoring algorithm, AIGCode Detector, obtaining a highest F1 score of 0.48 and 0.56 on chunk-level and line-level code detection tasks, respectively.

17.
arXiv (CS.CL) 2026-06-16

In-Domain Supervised Pathology Report Classification: A Reproducible Pipeline from Data Curation to Production-Matched Evaluation

We introduce an in-domain supervised pipeline designed to counter the out-of-distribution performance drop that hampers supervised biomedical NLP models, a problem observed when models trained on pathology reports are moved across cancer registries. Our contribution is a reproducible recipe for training a supervised classifier from routinely collected cancer registry data. It describes how to build the in-domain training set and a production-matched holdout, and to choose operating points that keep the false-negative rate (FNR) very low while keeping reviewer workload manageable. The pipeline standardizes data curation with facility-stratified sampling and separate handling of reports linked to registry cases, and includes a blinded manual audit to estimate positive-case prevalence and label noise. On a 418k-report holdout set, the Kentucky model achieved FNR 0.003 and false-positive rate (FPR) 0.097, improving over the Seattle-trained MOSSAIC OncoID baseline (FNR 0.010, FPR 0.183) and raising F1 from 0.860 to 0.922. In a blinded manual review of 600 reports, estimated positive prevalence declined from 0.500 to 0.398, indicating substantial label noise with errors concentrated in rare primary sites.

18.
arXiv (CS.LG) 2026-06-18

A Streaming Sparse Cholesky Method for Derivative-Informed Gaussian Process Surrogates Within Digital Twin Applications

arXiv:2511.00366v2 Announce Type: replace-cross Abstract: Digital twins are developed to model the behavior of a specific physical asset (or twin), and they can consist of high-fidelity physics-based models or surrogates. A highly accurate surrogate is often preferred over multi-physics models as they enable forecasting the physical twin future state in real-time. To adapt to a specific physical twin, the digital twin model must be updated using in-service data from that physical twin. In this paper, we combine and extend several previous surrogate-related advancements with the goal of demonstrating an end-to-end digital twin (DT) solution for predicting performance of an aircraft structure (the physical asset). To this end, we extend Gaussian process (GP) models to include derivative data, for improved accuracy, with dynamic updating to ingest physical twin data during service. Including derivative data, however, comes at a prohibitive cost of increased covariance matrix dimension. We circumvent this issue through our modified dynamic sparse Cholesky linear system solver. Numerical experiments demonstrate that the prediction accuracy of the derivative-enhanced sparse Cholesky GP method produces improved models upon dynamic data additions. Lastly, we demonstrate the developed algorithm within a DT framework to model fatigue crack growth in an aerospace vehicle, thereby exhibiting through our assembled engineered system how digital twin technologies can be combined in practice.

19.
arXiv (CS.LG) 2026-06-18

Stealthy World Model Manipulation via Data Poisoning

arXiv:2606.18697v1 Announce Type: new Abstract: Model-based learning agents use learned world models to predict future states, plan actions, and adapt to new environments. However, the process of updating world models from collected experience creates a training-time attack surface: adversarially poisoned fine-tuning trajectories can manipulate the learned dynamics and thereby corrupt downstream planning. In this paper, we propose SWAAP, the first two-stage data poisoning framework for learned world models. In the first stage, SWAAP identifies a harmful target world model that induces low-return behavior under planning while remaining close to clean dynamics, using first-order bilevel optimization enabled by a transition-gradient theorem. In the second stage, SWAAP realizes this target through stealth-constrained gradient matching, modifying only a limited fraction of fine-tuning transition targets so that the induced training gradients steer the victim model toward the adversarial target, while a prediction-error regularizer encourages the poisoned targets to remain close to the world model's natural approximation error. To assess attack stealthiness, we evaluate defenses and detectability across three stages of the poisoning pipeline: pre-training detection of poisoned transitions, robust training during fine-tuning, and test-time monitoring of the resulting world model. Across diverse continuous-control tasks, SWAAP causes substantial performance degradation while keeping poisoned transitions close to clean data and evading the evaluated non-adaptive residual/CUSUM/TRIM-style defenses. These results reveal a practical vulnerability in world-model adaptation pipelines and highlight the need for robustness methods that protect both world-model training data and learned dynamics.

20.
arXiv (CS.LG) 2026-06-15

Compressed Computation is (probably) not Computation in Superposition

arXiv:2606.14673v1 Announce Type: new Abstract: We study whether the Compressed Computation (CC) toy model (Braun et al., 2025) is an instance of computation in superposition. The CC model appears to compute 100 ReLU functions with just 50 neurons, achieving a better loss than expected from only representing 50 ReLU functions. We show that the model mixes inputs via its noisy residual stream, corresponding to an unintended mixing matrix in the labels. Splitting the training objective into the ReLU term and the mixing term, we find that performance gains scale with the magnitude of the mixing matrix and vanish when the matrix is removed. The learned neuron directions concentrate in the subspace associated with the top 50 eigenvalues of the mixing matrix, suggesting that the mixing term governs the solution. Finally, a semi-non-negative matrix factorization (SNMF) baseline derived solely from the mixing matrix reproduces the qualitative loss profile and improves on prior baselines, though it does not match the trained model. These results suggest CC is not a suitable toy model of computation in superposition.

21.
arXiv (math.PR) 2026-06-16

Exponential Convengence of DLRA for SDEs

arXiv:2606.15843v1 Announce Type: new Abstract: We study dynamical orthogonal (DO) approximations of stochastic differential equations and investigate their long-time behaviour. The DO formulation represents the solution by a low-rank decomposition and leads to a coupled system consisting of an evolution equation on the Stiefel manifold and a reduced stochastic process. We establish the well-posedness of the strong DO system and derive quantitative error estimates between the original stochastic differential equation and its low-rank approximation in the Wasserstein distance. Our main contribution is the analysis of invariant probability measures for the DO dynamics. Under suitable dissipativity, Lipschitz continuity, and non-degeneracy assumptions on the coefficients, we prove the existence of an invariant probability measure for the strong DO system. The proof combines uniform moment estimates, a Krylov–Bogoliubov argument for an associated frozen system, and a Kakutani-Fan-Glicksberg fixed-point theorem to recover the self-consistent dynamics. We further show that the induced low-rank process admits an invariant probability measure and discuss the structure of invariant measures through several illustrative examples. These results provide a rigorous foundation for the use of dynamical low-rank approximations in the approximation of long-time statistical properties of stochastic dynamical systems.

22.
arXiv (CS.AI) 2026-06-19

See-and-Reach: Precise Vision-Language Navigation for UAVs within the Field of View

arXiv:2606.20045v1 Announce Type: cross Abstract: UAV Vision-Language Navigation (UAV-VLN) is typically formulated as a holistic search-and-reach problem, where long-range target discovery and final target approach are optimized and evaluated jointly. This formulation makes it difficult to assess a critical capability of aerial embodied agents, namely whether a UAV can accurately ground a visible target and translate vision-language evidence into precise 3D motion once the target enters its field of view. To address this limitation, we introduce UAV-VLN-FOV, a target-visible navigation task that isolates the see-and-reach stage and enables a more diagnostic evaluation of terminal reaching ability. We further propose 3DG-VLN, a vision-language waypoint prediction framework guided by dynamic 3D direction cues to enhance fine-grained visual grounding and spatial direction alignment for precise target reaching. Specifically, 3DG-VLN adaptively processes high-resolution front-view and downward-view observations to preserve fine-grained visual and geometric details for target grounding. It also updates the target-relative direction online during closed-loop navigation, allowing the agent to maintain spatial alignment with the target and reduce accumulated direction drift. To support this task, we construct a dedicated high-resolution benchmark which contains 2,717 trajectories with target-oriented high-level instructions, high-resolution front-view and downward-view egocentric observations, and continuous 3D waypoint annotations. Experiments show that 3DG-VLN outperforms competitive UAV-VLN baselines, achieving a 13.82\% improvement in success rate. Real-world trials further demonstrate the potential of 3DG-VLN for practical see-and-reach navigation. The source code and benchmark are available at https://github.com/xuefanfu/3DG-VLN.

23.
arXiv (math.PR) 2026-06-11

Heat kernel estimates for Markov processes with blowing-up jump kernels

arXiv:2512.24807v2 Announce Type: replace Abstract: In this paper, we establish sharp two-sided heat kernel estimates for a large class of purely discontinuous symmetric Markov processes on closed subsets $F$ of $\mathbb{R}^d$, whose jump kernels blow up on a Borel subset $\Sigma$ of $F$. We assume that $F\setminus \Sigma$ is a $\kappa$-fat set and is dense in $F$. To the best of our knowledge, this is the first work establishing sharp heat kernel estimates for jump processes whose jump kernels blow up on part of the state space. The jump kernels under consideration take the form $J(x,y)=|x-y|^{-d-\alpha}{\mathcal B}(x,y)$, where $\alpha\in (0,2)$ and the function ${\mathcal B}(x,y)$ blows up at a subset $\Sigma$ of $F$. A fundamental obstacle is that the tails of the jump measures are not uniformly bounded, and hence standard techniques in heat kernel analysis do not provide a priori off-diagonal estimates. To overcome this difficulty, we develop a new approach based on weighted integral estimates for the heat kernel that are sensitive to both the blow-up behavior of the jump kernel and the geometry of $F\setminus \Sigma$. Examples of processes falling within our general framework include traces of isotropic $\alpha$-stable processes in $C^{1,\rm Dini}$ sets, processes in Lipschitz sets arising in connection with the nonlocal Neumann problem, and a large class of resurrected self-similar processes in the closed upper half-space.

24.
arXiv (CS.AI) 2026-06-17

When Rules Learn: A Self-Evolving Agent for Legal Case Retrieval

arXiv:2606.17220v1 Announce Type: new Abstract: Legal case retrieval remains challenging due to the complexity of legal language and the need for precise lexical alignment between queries and relevant cases. Although dense retrieval models have achieved notable progress, empirical studies show that BM25 continues to serve as a strong baseline in this domain. It motivates us to propose a self-evolving framework for rule-driven query rewriting that enhances BM25 without any parameter training. The framework equips an LLM-based agent with an automatic evaluation environment, enabling it to iteratively create rewriting rules, plan validation experiments over rule combinations, and eliminate ineffective rules based on historical feedbacks. We evaluate our method on the Chinese legal case retrieval benchmark LeCaRD-v2. Experimental results demonstrate that the proposed framework outperforms non-evolutionary baselines, including human-designed rules and greedy rule selection, particularly when powered by a highcapacity core LLM. We also conduct detailed analyses to investigate the mechanisms underlying self-evolution. Our findings reveal that LLM's capabilities to leverage previous experimental results and its intrinsic knowledge of rule elimination play critical roles in refining the rule set via self-evolution.

25.
arXiv (CS.LG) 2026-06-16

Deep Learning-Based Lunar Crater Terrain Relative Navigation

arXiv:2606.14776v1 Announce Type: cross Abstract: Accurate position estimation is crucial for the successful implementation of future lunar landings using autonomous vehicles, especially in dangerous environments with sparse terrain features. In this paper, we propose a terrain relative navigation (TRN) algorithm combining our deep-learning crater detector, which was designed specifically for the NASA Crater Detection Challenge problem, and an Extended Kalman Filter (EKF). Our detector analyzes crater features from the monocular images acquired from orbit, and their matches with craters from a global database are identified via a Hungarian assignment approach followed by the consensus-based outliers removal method. The estimated measurements are then used to refine an EKF, where spacecraft pose estimation in the Lunar-Centered Lunar-Fixed (LCLF) frame of reference, augmented with altitude aiding information, constrains radial drift. The simulation results indicate that even if the spacecraft is off from its actual location up to 5 km, TRN could recover from this situation, achieving navigation error reduction to a few hundred meters. It should be noted that in order to maintain crater feature correspondences, it is important to match the image resolution and the scales within the scene to the detector training set distribution.