Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

A Multi-level Analysis of Factors Associated with Student Performance: A Machine Learning Approach to the SAEB Microdata

arXiv:2510.22266v3 Announce Type: replace-cross Abstract: Identifying the factors that influence student performance in basic education is a central challenge for formulating effective public policies in Brazil. This study introduces a multi-level machine learning approach to classify the proficiency of 9th-grade and high school students using microdata from the System of Assessment of Basic Education (SAEB). Our model uniquely integrates four data sources: student socioeconomic characteristics, teacher professional profiles, school indicators, and principal management profiles. A comparative analysis of four ensemble algorithms confirmed the superiority of a Random Forest model, which achieved 90.2% accuracy and an Area Under the Curve (AUC) of 96.7%. To move beyond prediction, we applied Explainable AI (XAI) using SHAP, which revealed that the school's average socioeconomic level is the most dominant predictor, demonstrating that systemic factors have a greater impact than individual characteristics in isolation. The primary conclusion is that academic performance is a systemic phenomenon deeply tied to the school's ecosystem. This study provides a data-driven, interpretable tool to inform policies aimed at promoting educational equity by addressing disparities between schools.

02.
arXiv (CS.AI) 2026-06-12

CausalMoE: A Billion-Scale Multimodal Foundation Model for Granger Causal Discovery with Pattern-Routed Heterogeneous Experts

arXiv:2606.13024v1 Announce Type: cross Abstract: Granger Causal Discovery (GCD) is fundamental for analyzing temporal dependencies in complex systems. However, existing neural GCD methods predominantly rely on a "one-size-fits-all" paradigm, struggling to capture distribution shifts and dynamic regime changes inherent in real-world time series. This often leads to entangled representations and spurious causal graphs. In this paper, we propose CausalMoE, a billion-scale multimodal Granger causal foundation model that explicitly models patch-level heterogeneity. CausalMoE introduces a Pattern-Routed Mixture of Heterogeneous Experts, which dynamically identifies latent temporal patterns and routes patches to specialized domain experts, effectively decoupling regime-specific mechanisms from shared dynamics. To ensure interpretable graph recovery, we design a Causality-Aware Self-Attention mechanism operating across variables, yielding sparse Granger causal graphs via proximal optimization. Furthermore, CausalMoE is the first to integrate LLMs and VLMs to align numerical signals with textual and visual priors, regularizing causal estimation in complex scenarios. Extensive experiments demonstrate that CausalMoE establishes a new state-of-the-art on fully supervised benchmarks, while effectively generalizing to few-shot settings where traditional methods fail.

03.
arXiv (CS.CV) 2026-06-11

AVIS: Adaptive Test-Time Scaling for Vision-Language Models

Modern Vision-Language Models (VLMs) benefit from chain-of-thought prompting and test-time scaling, but these gains often come with prohibitive inference cost due to large visual contexts and long decoding chains. We view this cost through two coupled axes: Visual Context Scaling (VCS), which controls how much visual evidence is passed to the language model, and Visual Reasoning Scaling (VRS), which controls how much inference-time reasoning search is performed. Existing methods typically optimize one axis at a time, leaving the joint allocation of compute across these axes underexplored. We introduce Adaptive Visual Inference Scaling (AVIS), a lightweight policy that adapts both VCS and VRS per query. AVIS realizes VCS through Key Diversity Visual (KDV) pruning, a training-free $O(N)$ key-based rule for removing redundant visual tokens before prefilling, and realizes VRS through adaptive self-consistency, using a learned difficulty predictor to select the number of reasoning rollouts. AVIS is deployment-friendly and compatible with shared-prefill inference, where all rollouts reuse a single prefilling pass and KV cache. Across diverse image and video reasoning benchmarks, AVIS improves the accuracy–compute trade-off relative to VCS-only and VRS-only baselines, and remains effective on top of RL post-trained VLMs while keeping compute and latency low.

04.
arXiv (CS.AI) 2026-06-16

FOUNDv2: Learning Unified User Quantized Tokenizers for User Representation

arXiv:2508.00956v3 Announce Type: replace-cross Abstract: User representation learning serves as a fundamental pillar for personalized services on large-scale web platforms. Despite its importance, conventional continuous embedding methods face significant challenges, including the lack of a unified paradigm for multi-source data integration, prohibitive storage overhead due to low information density, and the lack of multi-scale modeling granularity. To overcome these limitations, we introduce FOUNDv2, a comprehensive user representation scheme centered on the Unified User Quantized Tokenizer U2QT) framework. FOUNDv2 transforms heterogeneous user data into a standardized discrete token space through a robust two-stage architecture. Specifically, the framework first extracts compact feature representations and subsequently employs a multi-view RQ-VAE to discretize them into storage-efficient tokens using shared and source-specific codebooks. To empower these representations with predictive intelligence, we further design multi-scale alignment objectives to capture both fine-grained behavioral dependencies and macro-temporal periodicity. Extensive experiments on various benchmarks demonstrate that FOUNDv2 consistently outperforms task-specific baselines while achieving substantial reductions in storage and computational costs. Finally, the large-scale deployment of FOUNDv2 on Alipay validates its practical scalability and efficiency across diverse industrial scenarios. The main code is available at: https://github.com/chuanhe1999/FOUNDv2.

05.
arXiv (CS.CL) 2026-06-11

A PubMed-Scale Dataset of Structured Biomedical Abstracts

Structured abstracts are important for biomedical literature processing, by facilitating information retrieval, text mining, and knowledge synthesis. However, a vast portion of abstracts indexed in PubMed remain unstructured, presenting a significant bottleneck for downstream text-processing workflows and applications. To resolve this limitation, we introduce Structured PubMed, a comprehensive corpus of section-labeled biomedical abstracts compiled from the complete PubMed database, encompassing over 23.2 million research-article records. The corpus is divided into two distinct subsets: a collection of 5.9 million author-structured abstracts parsed from official XML files, and an automatically labeled collection of 17.2 million originally unstructured abstracts structured via a verbatim-extraction Large Language Model pipeline. Every record is harmonized under a unified five-section schema and mapped to its original PubMed identifier, publication type, and publication date. This dataset can be utilized to train sentence-classification models, benchmark text-segmentation architectures, and perform large-scale, section-specific information extraction at an unprecedented PubMed-wide scale.

06.
bioRxiv (Bioinfo) 2026-06-11

OCOO-T : A SIMPLE AND SCALABLE VIRTUAL CELL MODEL FOR TRANSCRIPTIONAL PERTURBATION RESPONSE PREDICTION

Predicting single-cell transcriptional responses to genetic, chemical and cytokine perturbations is a fundamental challenge in computational biology and AI Virtual Cell (AIVC) modeling, with direct implications for drug discovery and the elucidation of gene regulatory networks. Existing approaches often rely on auxiliary cell-state encoders, hierarchical variational autoencoders, dedicated Transformer encoder-decoder modules, or gene-interaction priors to compress high-dimensional expression profiles into latent representations. While effective, these designs increase architectural complexity and may limit scalability and generalizability. This paper introduces OCOO-T, a minimalist flow-matching-based AIVC model for transcriptional perturbation response prediction. OCOO-T utilizes a vanilla Transformer stack that operates directly on continuous gene expression profiles and formulates perturbation response prediction as a continuous-time denoising process. Perturbation embeddings, dosage information, and cell-line/cell-type specificity are integrated through adaptive layer normalization and in-context tokens. Comprehensive evaluations on Tahoe100M, Replogle, and PBMC benchmarks demonstrate that OCOO-T achieves state-of-the-art performance across diverse perturbations and cell types while effectively scaling to long transcriptional profiles through patching and depatching of cellular contexts. By leveraging the simplicity of Transformer-based denoising for single-cell omics, OCOO-T provides an effective and scalable framework for in-silico cellular simulation.

07.
arXiv (CS.CV) 2026-06-11

Weakly Supervised Segmentation as Semantic-Based Regularization

Weakly supervised semantic segmentation (WSSS) trains dense pixel-level segmentation models from partial or coarse annotations such as bounding boxes, scribbles, or image-level tags. While recent work leverages foundation models such as the Segment Anything Model (SAM) to generate pseudo-labels, these approaches typically depend on heuristic prompt choices and offer limited ways to incorporate prior knowledge or heterogeneous labels. We address this gap by taking a neurosymbolic perspective: integrating differentiable fuzzy logic with deep segmentation models. Weak annotations and domain-specific priors are unified as continuous logical constraints that fine-tune SAM under weak supervision. The refined foundation model then produces improved pseudo-labels, from which we train a second-stage prompt-free segmentation model. Experiments on Pascal VOC 2012 and the REFUGE2 optic disc/cup segmentation dataset show that our logic-guided fine-tuning yields higher-quality pseudo-labels, leading to state-of-the-art segmentation accuracy that often exceeds densely supervised baselines.

08.
arXiv (CS.LG) 2026-06-19

Exploring the potential of AlphaEarth and TESSERA embeddings for Fine-scale Local Climate Zone Mapping: A case study across five cities in Switzerland

arXiv:2606.20034v1 Announce Type: new Abstract: Understanding urban spatial morphology is critical for climate modeling, risk assessment, and sustainable urban design, and Local Climate Zone (LCZ) mapping provides the basic framework for this. However, many cities still use coarse ~100-m resolution LCZ records, which are unsuitable for fine-scale urban research. In this study, precomputed embeddings from TESSERA (Feng et al., 2025) and AlphaEarth (Brown et al., 2025) are compared to traditional Sentinel-1/2 (S1S2) composites in five Swiss cities to see if they can upscale coarse LCZ maps to 10-m resolution using an attention-based U-Net. Three experiments assess multi-city transferability, the impact of higher-resolution reference data, and temporal robustness to year-to-year phenology changes. We find that all datasets achieve strong performance with test data Intersection-over-Union (IoU) ranging from 0.59-0.69 and 0.77-0.82 in the first two experiments. TESSERA consistently outperforms both S1S2 and AlphaEarth across both settings As expected, we find that the transfer of embedding-based models from one year to another remains an open challenge. Overall, however, our results demonstrate the promising potential of embeddings derived from EO foundation models to reduce time consuming preprocessing, respectively, manual feature engineering tasks and to guide a universal deep learning-based LCZ mapping workflow. When combined with a simple location-aware attention U-Net architecture, the embeddings enhance regional transferability and scalability, supporting the development of comprehensive and reproducible fine-scale LCZ maps for global urban climate applications Improving reference data quality remains the strongest lever for further accuracy gains.

09.
arXiv (CS.CV) 2026-06-16

Beyond Scalar Distances: Semantic Attribute Gradients from Frozen MLLMs for Visual Embeddings

Vision encoders for retrieval are typically trained with class-label supervision: each training pair reduces to a scalar that uniformly pushes the embedding apart or pulls it together, as if every visual attribute either differed or matched. A multimodal large language model (MLLM), shown the same pair, can articulate those attributes and use them to predict whether the images share a class. We propose SAGA, a framework that turns this language-grounded, attribute-aware perception into a training signal for the encoder itself. Specifically, we use Group Relative Policy Optimization (GRPO) to reward the MLLM for correct predictions on the vision encoder's tokens. Since correct predictions require those tokens to expose the specific attributes that differ or match between the pair, the gradient pushes the encoder to encode them, replacing the uniform pair-level scalar with attribute-resolved supervision. An auxiliary attention-distillation loss anchors the encoder's embedding to tokens the MLLM attended to, and a standard metric-learning loss shapes the embedding geometry for nearest-neighbour retrieval. The MLLM is frozen throughout and discarded at inference, matching the deployment cost of a metric-learning baseline. SAGA improves Recall@1 by 3 to 6 points over state-of-the-art baselines on CUB-200-2011, Cars-196, FGVC-Aircraft, and iNaturalist Aves on zero-shot image retrieval.

10.
arXiv (CS.AI) 2026-06-12

Speculative Rollback Correction for Quality-Diverse Web Agent Imitation

arXiv:2606.12485v1 Announce Type: cross Abstract: Training interactive web agents through imitation learning from expert trajectories has emerged as a highly effective approach. However, determining the optimal timing for expert intervention presents a critical challenge in this context. Delayed intervention often leads to the accumulation of early-stage errors, pushing the page state into an irrecoverable regime. Conversely, premature or excessive intervention causes the agent to become overly reliant on expert policies, trapping the model in local optima characterized by a single, rigid trajectory. We propose Speculative Rollback Correction (SRC), a branch-level imitation framework for resettable agent environments. Instead of requesting teacher labels at every visited state or correcting only after a completed trajectory, SRC uses fixed-horizon branch review: the student executes a short speculative segment before teacher review, and the teacher localizes the first harmful deviation only when local progress breaks. Rollback preserves useful prefixes, while successful rollouts are filtered by a hard verifier and retained in a lightweight quality-diversity archive. The resulting data supports next-action supervised fine-tuning on both localized corrections and verifier-passing trajectories. On WebArena-Infinity, SRC collects 977 verifier-passing trajectories and 9,183 next-action examples; fixed-horizon review improves the recovery-versus-query tradeoff over step-level review while retaining verifier-passing solution variants. Code is available at https://github.com/LongkunHao/SRC_gui_agent.

11.
medRxiv (Medicine) 2026-06-22

Use of the Pharmacy First service in England in the first 12 months: geographic variation and health system context

Objectives: The Pharmacy First (PF) service was introduced across England from 31 January 2024 to expand the clinical role of community pharmacies and improve access to primary care. This paper describes use of PF in its first 12 months, in terms of uptake, access routes, consultation outcomes, geographic variations, service costs and antimicrobial supply. Methods: A descriptive analysis of all PF consultations submitted for payment to NHS Business Services Authority in England between 31 January 2024 and 31 January 2025. Pharmacy-level consultation data were linked to national data on population, location and pharmacy characteristics. PF use was examined using population-standardised consultation rates and consultations per pharmacy. Results: During the first year of implementation, 2,205,731 PF consultations were recorded as delivered across 11,349 pharmacies, with payment of GBP123 million to pharmacies. Uptake increased steadily over time. Most consultations were for acute sore throat (33%) and uncomplicated urinary tract infection (27%), with corresponding antibiotics, phenoxymethylpenicillin and nitrofurantoin being the most supplied. Most people self-referred (74%) into the service, with 95% of consultations managed without onward referral. Substantial geographic variation was observed. Northern regions had higher use based on the eligible population. The South East and Midlands had higher activity per pharmacy. London showed a distinct pattern, with higher self-referral into the service, lower medication supply and higher referral to other healthcare services. Higher consultation volume was weakly associated with pharmacy characteristics, including opening hours, pharmacy type and retail setting, and local context, in terms of socio-economic and geographic factors. Conclusions: PF had immediate uptake and is operating primarily as a direct-access model for common acute conditions. Findings suggest that PF is contributing to improved access to care and may shift demand away from general practice. However, the service uptake appears to be shaped by geographic location, proximity to other healthcare services and pharmacy characteristics.

12.
arXiv (quant-ph) 2026-06-19

Asymmetric and chiral dynamics of two-component anyons with synthetic gauge flux

arXiv:2512.19139v3 Announce Type: replace-cross Abstract: In this work, we investigate the non-equilibrium dynamics in a one-dimensional two-component anyon-Hubbard model, which can be mapped to an extended Bose-Hubbard ladder with density-dependent hopping phase and synthetic gauge flux. Through numerical simulations of two-particle dynamics and the symmetry analysis, we reveal the asymmetric transport with broken inversion symmetry and two dynamical symmetries in the expansion dynamics. The expansion of two-component anyons is dynamically symmetric under spatial inversion and component flip, when the sign of anyonic statistics phase or the signs of gauge flux and interaction are changed. In the non-interacting case, we show the dynamical suppression induced by both the statistics phase and gauge flux. In the interacting case, we demonstrate that both chiral and antichiral dynamics can be exhibited and tuned by the statistics phase and gauge flux. The dynamical phase regimes with respect to the chiral-antichiral dynamics are obtained. These findings highlight the rich dynamical phenomena arising from the interplay of anyonic exchange statistics, synthetic gauge fields, and interactions in multi-component anyons.

13.
arXiv (CS.AI) 2026-06-16

Prototyping an AI-powered Tool for Energy Efficiency in New Zealand Homes

arXiv:2509.05364v2 Announce Type: replace-cross Abstract: Residential buildings contribute significantly to energy use, health outcomes, and carbon emissions. In New Zealand, housing quality has historically been poor, with inadequate insulation and inefficient heating contributing to widespread energy hardship. Recent reforms, including the Warmer Kiwi Homes program, Healthy Homes Standards, and H1 Building Code upgrades, have delivered health and comfort improvements, yet challenges persist. Many retrofits remain partial, data on household performance are limited, and decision-making support for homeowners is fragmented. This study presents the design and evaluation of an AI-powered decision-support tool for residential energy efficiency in New Zealand. The prototype, developed using Python and Streamlit, integrates data ingestion, anomaly detection, baseline modeling, and scenario simulation (e.g., LED retrofits, insulation upgrades) into a modular dashboard. Fifteen domain experts, including building scientists, consultants, and policy practitioners, tested the tool through semi-structured interviews. Results show strong usability (M = 4.3), high value of scenario outputs (M = 4.5), and positive perceptions of its potential to complement subsidy programs and regulatory frameworks. The tool demonstrates how AI can translate national policies into personalized, household-level guidance, bridging the gap between funding, standards, and practical decision-making. Its significance lies in offering a replicable framework for reducing energy hardship, improving health outcomes, and supporting climate goals. Future development should focus on carbon metrics, tariff modeling, integration with national datasets, and longitudinal trials to assess real-world adoption.

14.
arXiv (CS.CL) 2026-06-11

Context-Aware Multimodal Claim Verification in Spoken Dialogues

Every day, millions absorb claims from podcasts and streams that no fact-checker ever sees. Spoken misinformation is built through conversation, where credibility comes not from facts alone but from how claims are framed, reinforced, or left unchallenged across turns. Yet fact-checking has focused on isolated text, leaving dialogue audio under-studied. We introduce MAD2, a new Multi-turn Audio Dialogues benchmark for spoken claim verification, containing 1,000 two-speaker dialogues with 3,368 check-worthy claims and approximately 10 hours of audio, and propose calibrated multimodal fusion of a context-aware audio encoder and a dialogue-aware text model. Across settings, adding dialogue context improves verification, but the gains depend on scenario type. Using only preceding context often matches offline performance, supporting live-moderation settings, and audio contributes most when transcript-based models are destabilized by additional context. Overall, conversational structure matters more for verification than misinformation framing.

15.
arXiv (CS.LG) 2026-06-16

The Reverse Telescoping Coordinate System for Positive Definite Matrices: Geometry, Computation, and Generative Modeling

arXiv:2606.15442v1 Announce Type: cross Abstract: We design a new unconstrained coordinate system where a $p\times p$ symmetric positive definite (SPD) matrix $\Theta$ is represented by a reverse telescoping map $\Theta(x)=\rm{RT}(x)$, with $x=(v,d,r)\in\mathbb{R}\times\mathbb{R}^{(p-1)}\times\mathbb{R}^{p(p-1)/2}$, representing respectively the log volume or log determinant; and the shape, as encoded by log relative diagonal scales and partial covariances among the nodes. This construction results in important properties not available in other charts, e.g., matrix logarithm, such as Jacobian depending on only the log-determinant. A useful feature of our construction is $x$ contains a lossless symbolic representation of both the matrix and its inverse. Many important computations involving a matrix and its inverse can be performed in $O(p^2)$ in the transformed domain, while it is the rendering of results in matrix forms (on demand) that must incur an $O(p^3)$ cost. Moreover, two unit-determinant matrices in the transformed domain can be joined by a straight line with pathwise unit determinant. For generative modeling, this allows designing a split volume-shape flow model trained by conditional flow matching for transporting the shape over the unit-determinant path, with a separate one-dimensional flow for transporting the volume or the determinant. The forbidding SPD constraint, tamed thus into a powerful guiding force, leads to the surprising insight that it is in some sense easier to design a volume-normalized shape flow for SPD compared to the unconstrained $\mathbb{R}^{p\times p}$, with no intrinsic notion of volume to aid normalization, unlike the determinant of SPD matrices. We apply our construction for up to $p=200$ in generative modeling of SPD matrices on a difficult synthetic bimodal target, and in generating brain connectivity networks by models trained on fMRI data; as well as in intrinsic diffusion on the SPD manifold.

16.
bioRxiv (Bioinfo) 2026-06-12

CAREPath: Semantic Context-Aware Reasoning Paths with Mechanism-Augmented Embeddings for Drug Repurposing

Biomedical knowledge graphs (BKGs) that include drugs, genes, and diseases support drug repurposing by connecting drugs to diseases through gene-mediated multi-hop paths, thereby enabling mechanism-of-action reasoning. However, deeper traversal does not necessarily improve mechanistic reasoning: long paths grow combinatorially and frequently pass through hub genes, producing irrelevant gene regulatory signals, whereas overly constrained or sparse paths may miss broader biological context. We propose CAREPath, a KG-LLM framework inspired by depth-first search (DFS)-like and breadth-first search (BFS)-like reasoning to balance mechanistic specificity, scalability, and context recovery. The DFS-like module constrains traversal to short disease-gene-drug paths, converts each path into a structured prompt, and encodes it with a biomedical language model to generate semantic path embeddings. Complementarily, the BFS-like module constructs entity-level mechanism-context embeddings from one-hop gene neighborhoods and enriches them through similarity-guided augmentation using pharmacologically related drugs and gene-signature-similar diseases. Across five biomedical KGs, CAREPath achieves the best overall AUPRC among 18 baselines, improving performance by up to 3.8%. Additional analyses show that semantic short-path encoding contributes most to performance, while mechanism-context augmentation improves robustness under sparse evidence and strengthens Gene Ontology functional agreement. Case studies and recently FDAapproved indications further demonstrate its practical relevance, positioning CAREPath as an interpretable framework for scalable and mechanism-aware drug repurposing. Source code is available at https://github.com/hamppy-song/CAREPath.

17.
arXiv (CS.CV) 2026-06-18

BindEdit: Taming Attention Leakage for Precise Multi-Object Image Editing

Real image editing enables precise manipulation of visual content, yet existing methods often fail in complex multi-object scenarios, causing semantic blending, object duplication, or incomplete edits. We attribute these failures to attention leakage, where signals across spatial regions and text tokens become entangled during the denoising process. Specifically, we identify two distinct forms of leakage: Edit-Token Leakage, where ambiguous token-region alignment leads to object blending, and Source Dominance Leakage, where tokens of unchanged source objects overwhelm the attention intended for target entities. To resolve these leakages, we propose BindEdit, which enforces attention-level constraints within a single diffusion trajectory. To suppress Edit-Token Leakage, BindEdit jointly regularizes cross- and self-attention so that each target token group is bound to its corresponding spatial region while maintaining instance-level separation. To suppress Source Dominance Leakage, a cross-attention re-balancing mechanism amplifies target token influence and attenuates residual source semantics within editable regions. Moreover, a region fidelity term ensures that each target concept is expressed coherently across the entire editing mask. Additionally, we propose a comprehensive multi-object benchmark encompassing diverse object counts and categories. Extensive experiments demonstrate that BindEdit consistently outperforms existing methods within a single diffusion trajectory, maintaining robust performance across both single- and multi-object editing scenarios.

18.
arXiv (CS.CV) 2026-06-11

Temporal2Seq: A Unified Framework for Temporal Video Understanding Tasks

With the development of video understanding, there is a proliferation of tasks for clip-level temporal video analysis, including temporal action detection (TAD), temporal action segmentation (TAS), and generic event boundary detection (GEBD). While task-specific video understanding models have exhibited outstanding performance in each task, there remains a dearth of a unified framework capable of simultaneously addressing multiple tasks, which is a promising direction for the next generation of AI. To this end, in this paper, we propose a single unified framework, coined as Temporal2Seq, to formulate the output of these temporal video understanding tasks as a sequence of discrete tokens. With this unified token representation, Temporal2Seq can train a generalist model within a single architecture on different video understanding tasks. In the absence of multi-task learning (MTL) benchmarks, we compile a comprehensive co-training dataset by borrowing the datasets from TAD, TAS, and GEBD tasks. We evaluate our Temporal2Seq generalist model on the corresponding test sets of three tasks, demonstrating that Temporal2Seq can produce reasonable results on various tasks and achieve advantages compared with single-task training on this framework. We also investigate the generalization performance of our generalist model on new datasets from different tasks, which yields superior performance to the specific model.

19.
arXiv (CS.CV) 2026-06-19

MMD-SLAM: Structure-Enhanced Multi-Meta Gaussian Distribution-Guided Visual SLAM

3D Gaussian Splatting (3DGS) has significantly boosted novel view synthesis and high-fidelity scene reconstruction, expanding the potential of 3DGS-based Visual Simultaneous Localization and Mapping (SLAM) methods. However, most existing systems fail to fully exploit the underlying structural information, which limits rendering quality and often leads to inconsistent maps. To address these limitations, we propose MMD-SLAM, a structure-enhanced Visual SLAM framework that leverages the Atlanta World (AW) assumption to guide a Multi-Meta Gaussian representation for photorealistic mapping. First, we introduce a point-line fusion strategy for pose optimization, where 3D line segments are incorporated to improve tracking robustness and provide additional constraints for mapping. Second, we design a Multi-Meta Gaussian representation with dominant directions, explicitly encoding structural priors from the AW hypothesis. Finally, we propose a Gaussian evolution strategy that adapts to scene geometry and incorporates structural cues into global optimization. Extensive experiments demonstrate that these innovations enable MMD-SLAM to achieve state-of-the-art performance in both tracking accuracy and mapping quality. e.g., our method achieves a 48.56% reduction in ATE RMSE on ScanNet and a 5.71% improvement in PSNR on Replica, compared with MonoGS.

20.
arXiv (CS.LG) 2026-06-17

Recursive Learning Without Collapse: A Weighting-Based Stabilization Framework

arXiv:2502.18049v5 Announce Type: replace-cross Abstract: Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and developing more effective training strategies have become central challenges in generative model research. In this paper, we investigate this phenomenon within a novel framework, where generative models are iteratively trained on a combination of newly collected real data and synthetic data from the previous training step. To develop an optimal training strategy for integrating real and synthetic data, we evaluate the performance of a weighted training scheme in various scenarios, including Gaussian distribution estimation, generalized linear models, and nonparametric estimation. We theoretically characterize the impact of the mixing proportion and weighting scheme of synthetic data on the final model's performance. Our key finding is that, across different settings, the optimal weighting scheme under different proportions of synthetic data asymptotically follows a unified expression, revealing a fundamental trade-off between leveraging synthetic data and model performance. In some cases, the optimal weight assigned to real data corresponds to the reciprocal of the golden ratio. Finally, we validate our theoretical results on extensive simulated datasets and a real tabular dataset.

21.
arXiv (CS.AI) 2026-06-19

Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies

arXiv:2505.22829v2 Announce Type: replace-cross Abstract: This paper bridges distribution shift and AI safety through a comprehensive analysis of their conceptual and methodological synergies. While prior discussions often focus on narrow cases or informal analogies, we establish two types connections between specific causes of distribution shift and fine-grained AI safety issues: (1) methods addressing a specific shift type can help achieve corresponding safety goals, or (2) certain shifts and safety issues can be formally reduced to each other, enabling mutual adaptation of their methods. Our findings provide a unified perspective that encourages deeper integration between distribution shift and AI safety research.

22.
arXiv (CS.AI) 2026-06-17

SSIL: Self-Supervised Imitation Learning for End-to-End Driving

arXiv:2308.14329v4 Announce Type: replace-cross Abstract: In autonomous driving, the end-to-end (E2E) driving approach that predicts vehicle control signals directly from sensor data is rapidly gaining attention. To learn a safe E2E driving system, one needs an extensive amount of driving data and human intervention. Vehicle control data is constructed by many hours of human driving, and it is challenging to construct large vehicle control datasets. Often, publicly available driving datasets are collected with limited driving scenes, and collecting vehicle control data is only available by vehicle manufacturers. To address these challenges, this paper proposes the first self-supervised learning framework, Self-Supervised Imitation Learning (SSIL), for E2E driving. The proposed SSIL framework can learn vision-based E2E driving networks without using driving command data or a pre-trained model. To construct pseudo steering angle data, proposed SSIL predicts a pseudo target from the vehicle's poses at the current and previous time points that are estimated with light detection and ranging sensors. In addition, we propose a new cross-attention-based conditioning approach (CACA) for a vision encoder in E2E driving, where a high-level instruction serves as the conditioning signal for visual information. Our numerical experiments with three different benchmark datasets demonstrate that the proposed SSIL framework achieves very comparable E2E driving accuracy with the supervised learning counterpart. Furthermore, the proposed pseudo-label predictor outperformed an existing one using proportional integral derivative controller, and proposed CACA achieved superior performance over existing conditioning approaches.

23.
arXiv (CS.CL) 2026-06-11

Toward Generalist Autonomous Research via Hypothesis-Tree Refinement

Scientific progress depends on a repeated loop of exploration, experimentation, and abstraction. Researchers test candidate directions, interpret the evidence, and carry the resulting lessons into later attempts. We study how an AI agent can run this loop autonomously over long horizons. We introduce Arbor, a general framework for autonomous research that combines a long-lived coordinator, short-lived executors, and Hypothesis Tree Refinement (HTR), a persistent tree that links hypotheses, artifacts, evidence, and distilled insights across time. The coordinator manages global research strategy over the tree, while executors implement and test individual hypotheses in isolated worktrees. As results return, Arbor updates the tree, propagates reusable lessons, refines the search frontier, and admits verified improvements. This design turns autonomous research from a sequence of local attempts into a cumulative process in which strategy, execution, and evidence are carried across time. We evaluate Arbor under Autonomous Optimization (AO), an operational setting where an agent improves an initial research artifact through iterative experimentation without step-level human supervision. Across six real research tasks in model training, harness engineering, and data synthesis, Arbor achieves the best held-out result on all six tasks, attaining more than 2.5x the average relative held-out gain of Codex and Claude Code under the same task interface and resource budget. On MLE-Bench Lite, Arbor reaches 86.36% Any Medal with GPT-5.5, the strongest result in our comparison.

24.
arXiv (CS.CL) 2026-06-19

Ensembles of Large Language Models for Identifying EQ-5D Studies in PubMed Based on Their Abstracts

The rapid increase in scientific publications leads to the fact that manual study screening in systematic literature reviews (SLRs) is increasingly resource consuming, inefficient, and inconsistent. Classifying studies that clearly report health-related quality-of-life results, such as EQ-5D data, requires a high level of clinical interpretation and poses challenges for human reviewers. This study investigates the use of Google's Gemini and Gemma large language models (LLMs) in automating EQ-5D detection in the PubMed biomedical database based only on published abstracts. A multi-phase framework is proposed that integrates few-shot prompting, weight ensembling aggregation, and a soft stacking meta-classifier. Nine LLMs are evaluated on a dataset of PubMed studies manually labeled by two experts regarding EQ-5D reporting. The weighted ensemble of gemini-2.5-pro, gemma-3-12b, and gemma-3-27b obtained a 0.74 weighted F1-score and 0.74 accuracy, exceeding individually attained results. The ensembling of top-performing models improved the balance between precision and recall compared to individual models, while the soft stacking approach provided greater reliability and interpretability. Feature analysis shows that the probability results from the models are important in guiding the final predictions. The findings suggest that an ensemble-based LLM setup is a reliable and scalable approach for automating screening in biomedical research.

25.
medRxiv (Medicine) 2026-06-15

Investigation of Intra-Fraction Stability and Inter-Fraction Reproducibility of Deep Inspiration Breath-Hold Across Two Hypofractionated Radiotherapy Regimens in the HYPORT Adjuvant Study.

Background: Deep Inspiration Breath Hold (DIBH) is a widely used respiratory motion management technique for minimizing cardiac dose in left-sided breast radiotherapy. In the Breast HYPORT Adjuvant study, DIBH was employed for cardiac sparing in patients without nodal irradiation using a standardized institutional protocol with the Varian Real-time Position Management (RPM) system. Both moderate-hypofractionation (control arm - 40Gy in 15 fractions) and one-week hypofractionation (experimental arm - 26 Gy in 5 fractions) regimens were delivered using this protocol. This study aimed to evaluate the robustness of DIBH by analyzing intra-fraction stability and inter-fraction reproducibility of breath-hold amplitude across the two treatment regimens. Methods: Respiratory waveforms acquired during each treatment session were analyzed to determine the median breath-hold amplitude and its standard deviation during beam delivery. Intra-fraction stability was assessed from vari- ations within individual treatment sessions, while inter-fraction reproducibility was evaluated relative to the simula- tion waveform amplitude across all treatment sessions. These parameters were compared between the two HYPORT regimens to examine breath-hold consistency during treatment delivery. Moreover, an additional comparison was made between the one-week hypofractionation regimen and the first five fractions of the moderate-hypofractionation regimen to evaluate the effect of treatment duration . Lung volumes from free-breathing and DIBH CT scans were analyzed to assess the effectiveness of patient breath-hold training. Results: Both arms demonstrated an average 1.7-fold increase of air volume in lung during the breath-hold position, confirming the effective implementation of DIBH during treatment planning and delivery. Structured training resulted in increased breath-hold amplitudes, with gains of 22.87% and 24.16% with respect to the first trial session in the experimental and control arms, respectively. Both regimens receive equivalent doses for approximately the same air volume in lung . Despite the different prescription doses in the two arms (26 Gy vs. 40 Gy), the experimental arm achieved an equivalent mean heart dose of 2.91% (75.6 cGy) compared with 2.95% (118.51 cGy) in the control arm, suggesting a similar cardiac preservation protocol adopted during treatment planning. Intra-fraction stability was similar between the control arm and the experimental arm, with median amplitude variations of 1.006 mm (95% CI: [0.998-1.015]) and 1.079 mm (95% CI: [1.067-1.097]), respectively. In contrast, inter-fraction reproducibility improved in the experimental arm, with lower deviation from simulation amplitude (0.44 {+/-} 0.24 mm vs. 0.66 {+/-} 0.25 mm) for the entire treatment schedule. The stability and reproducibility of experimental arm were further compared with the first five fractions of the control arm. The results were similar to those of the experimental arm. Conclusion: In this study, we compared two treatment regimens in terms of intra-fraction stability and inter-fraction reproducibility during DIBH radiotherapy. Both regimens demonstrated comparable intra-fraction stability, indicating effective motion management irrespective of treatment duration. However, the experimental arm showed better inter- fraction reproducibility, suggesting more consistent breath-hold performance throughout the treatment course. Based on stability and reproducibility, a reasonable narrowing of the DIBH gating window may be implemented with minor changes to the institutional protocol. The observed trend highlights the potential for improved consistency with the experimental approach and supports further investigation to better understand the underlying factors and strengthen these findings in future studies.