Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
PLOS Computational Biology 2026-06-02

Linking reduced prefrontal microcircuit inhibition in schizophrenia to EEG biomarkers in silico

by Sana Rosanally, Frank Mazza, Heng Kang Yao, Faraz Moghbel, Hannah Seo, Etay Hay Reduced cortical inhibition by parvalbumin-expressing (PV) interneurons in schizophrenia is thought to be associated with impaired processing in the prefrontal cortex and altered EEG signals such as oddball mismatch negativity (MMN). Recent studies also suggest loss of somatostatin (SST) interneuron inhibition. However, establishing the link between reduced interneuron inhibition and reduced MMN experimentally in humans is currently not possible. To overcome these challenges, we simulated spiking activity and EEG during baseline and oddball response in detailed models of human prefrontal microcircuits in health and schizophrenia, with reduced PV and SST interneuron inhibition as constrained by postmortem patient data. We showed that reduced PV interneuron inhibition can account for the decreased MMN amplitude seen in schizophrenia, with a threshold below which the amplitude effect was low as seen in at-risk patients. In contrast, reduced SST interneuron inhibition did not affect the MMN amplitude. We further showed that both types of inhibition loss were necessary to account for changes in resting EEG in schizophrenia, with reduced SST interneuron inhibition increasing broadband power, and reduced PV and SST interneuron inhibition both leading to a right shift from alpha to beta frequencies. Our study thus links reduced PV and SST interneuron inhibition in schizophrenia to distinct EEG biomarkers that can serve to improve stratification and early detection using non-invasive brain signals.

02.
arXiv (CS.CV) 2026-06-12

SalArt-VQA: Diagnosing Whether VLMs Understand Salient Artifacts in Generated Images

Vision-language models (VLMs) are increasingly used to detect whether AI-generated images contain visible artifacts, yet their ability to analyze such artifacts remains poorly understood. A correct image-level decision can still hide important failures: a model may correctly flag an artifact while relying on the wrong visual cue, selecting the wrong region, or describing a defect that the image does not support. To evaluate these behaviors directly, we introduce SalArt-VQA, a diagnostic benchmark for fine-grained SALient ARTifact understanding in AI-generated images. SalArt-VQA contains 950 images and 3,681 human-authored multiple-choice questions spanning artifact images, matched real reference images, and paired generated reference images. Four aligned question types evaluate presence detection, semantic localization, spatial grounding, and evidence-grounded defect identification, while the reference splits test calibration and abstention when the annotated defect is absent. Across 20 VLMs, SalArt-VQA reveals failures that image-level detection accuracy hides: the strongest model reaches 99.37% detection recall on artifact images but answers all four artifact-side questions correctly on only 53.26% of images. Comparing artifact images with artifact-free references reveals a sensitivity-calibration tradeoff: sensitive models often make unsupported artifact claims, while conservative models avoid false alarms largely by missing real artifacts. These results show that high artifact detection accuracy alone does not imply grounded artifact understanding. SalArt-VQA exposes these hidden failure modes and provides a fine-grained evaluation of whether VLM artifact claims are supported by local visual evidence.

03.
arXiv (CS.CL) 2026-06-17

From Trainee to Trainer: LLM-Designed Training Environment for RL with Multi-Agent Reasoning

Reinforcement learning pipelines for Large Language Model (LLM) training often rely on manually redesigned environments between stages, requiring practitioners to heuristically infer which configuration will best improve the current policy. To automate this process, we propose the LLM-as-Environment-Engineer framework in which the current policy model analyzes failure trajectories together with contextual information and proposes modifications to the next-stage training environment configuration. We also introduce MAPF-FrozenLake, a controllable testbed whose generator exposes multi-dimensional environment configurations, making it suitable for studying and benchmarking environment redesign. On this testbed, we condition the environment engineer on structured summaries of policy behavior, failure cases, and environment statistics, from which it produces the configuration for the next training stage. With Qwen3-4B as the backbone, our framework achieves the strongest aggregate performance on our benchmarks, outperforming larger proprietary LLMs (e.g., GPT, Gemini) and fixed-environment training baselines. We further analyze which forms of context are most effective, finding that successful environment updates rely on failure evidence and preserve configurations that already work. Interestingly, the current RL checkpoint serves as a better environment engineer than the original base model, suggesting that policy learning improves the model's ability to diagnose its remaining weaknesses.

04.
arXiv (CS.CV) 2026-06-11

Echoes of the Prior: A Computational Phenomenology of Forgetting

Memory is not merely the storage of data; it is the scaffolding of reality. When biological memory fades, the world does not simply turn black; it regresses into an unrecognizable chaos. Echoes of the Prior is an interactive installation that attempts to visualize this subjective phenomenology of forgetting. By inducing controlled synaptic decay within a Feed-Forward 3D Reconstruction model, we create an artistic analogy for the erosion of the brain's predictive priors. We position the Neural Network not as a tool for engineering, but as a cognitive proxy - a silicon brain whose structural degeneration evokes the disorienting, poetic, and terrifying experience of losing one's grip on the world. Ultimately, we offer this framework as a catalyst, inviting the wider community to explore the uncharted potential of neuromorphic aesthetics in visualizing the fragility of intelligence. Interactive demo see https://decart-4d.github.io/.

05.
arXiv (CS.LG) 2026-06-16

Federated Foundation Language Model Post-Training Should Focus on Open-Source Models

arXiv:2505.23593v4 Announce Type: replace Abstract: Post-training of foundation language models has emerged as a promising research domain in federated learning (FL) with the goal to enable privacy-preserving model improvements and adaptations to user's downstream tasks. Recent advances in this area adopt centralized post-training approaches that build upon black-box foundation language models where there is no access to model weights and architecture details. Although the use of black-box models has been successful in centralized post-training, their blind replication in FL raises several concerns. Our opinion is that using black-box models in FL contradicts the core principles of federation such as data privacy and autonomy. In this paper, we critically analyze the usage of black-box models in federated post-training, and provide a detailed account of various aspects of openness and their implications for FL.

06.
medRxiv (Medicine) 2026-06-12

Heterogeneity of Treatment Effect of Aspirin and Clinically Significant Bleeding in Older Adults

Aim: The global population of older adults is growing, and older age is linked to higher bleeding risk. Although guidelines discourage aspirin for primary prevention in healthy older adults due to bleeding harms outweighing benefits, many continue taking it without a clear indication. It remains unclear whether all older adults face uniform aspirin-related bleeding risk or if certain subgroups are more vulnerable. Methods: We analyzed data from 19,114 ASPREE trial participants to develop machine learning models using 116 baseline variables. Random forest (RF) and random survival forest (RSF) models predicted 5-year bleeding risk, and participants were stratified into low, intermediate, and high-risk groups based on the 20th and 80th percentiles of predicted risk. We assessed heterogeneity of treatment effect (HTE) by testing treatment-by-risk group interactions on the relative scale using Fine-Gray models, and on the absolute scale using observed 5-year cumulative incidence rates. Results: Over a median follow-up of 4.7 years, 626 major bleeding events occurred. The RF model had moderate discrimination (AUC = 0.65, 95% CI: 0.63-0.67) and good calibration (Brier = 0.032, 95% CI: 0.029-0.034). Statistically significant HTE was observed on the relative scale, with the greatest relative increase in bleeding risk seen in the low-risk group (subdistribution hazard ratio = 2.26, 95% CI: 1.27-4.01). On the absolute scale, low-risk participants experienced higher bleeding with aspirin (absolute risk difference (ARD) = 1.17%, 95% CI: 0.37-1.95), but heterogeneity in ARDs was not statistically significant (Cochran's Q p > 0.45). Similar findings were observed when using the RSF model. Conclusion: Participants at lowest baseline bleeding risk experienced the greatest relative increase in bleeding risk with aspirin therapy. We found statistically significant heterogeneity in treatment effects on the relative but not absolute scale. These findings support an individualized, risk-based approach to aspirin therapy decision-making in older adults.

07.
arXiv (CS.CV) 2026-06-15

HiLo-Token: Input-Adaptive High-Low Frequency Token Compression for Efficient Image Editing

Creative image editing tools, such as Photoshop's Remove or Generative Fill buttons, are central to everyday customer use and account for a major share of traffic in Photoshop and Lightroom. However, current generative AI models face significant latency challenges, which become even more pronounced when transitioning from convolution-based U-Nets to Diffusion Transformers (DiTs). In our evaluation on hundreds of representative image editing samples spanning a wide range of mask ratios, the DiT module alone accounts for an average of 73% of the total model latency, even after being distilled from 50 timesteps down to 8 timesteps. To tackle this challenge, we propose $HiLo-Token$, an input-adaptive token compression framework that allocates more token budget to high-frequency, rich-context regions while assigning fewer tokens to low-frequency areas. Specifically, for the editing region specified by the user mask, we retain all tokens within a dilated mask to preserve strong locality and contextual relevance. Outside the editing region, we introduce a simple yet effective high-frequency token selection strategy based on spatial frequency to capture important local details, while using tokens from a 16x downsampled image to represent low-frequency components and preserve the blurry but global structure. Extensive experiments on production-level evaluation data validate the effectiveness of the proposed method, achieving 3.13x, 2.59x, and 1.67x DiT speedups on A100-80GB for image editing tasks across small, medium, and large mask ratio categories with average ratios of 6.38%, 15.92%, and 35.36%, respectively, without any regression in generation quality.

08.
arXiv (CS.CL) 2026-06-18

Rethinking Reward Supervision: Rubric-Conditioned Self-Distillation

Post-training of reasoning language models is commonly driven by supervised distillation and reinforcement learning with verifiable rewards. Distillation often relies on chain-of-thought annotations that are expensive to obtain and may themselves be noisy, incomplete, or partially incorrect; even when the final solution is correct, an imperfect rationale can interfere with learning. Reinforcement learning with verified rewards, on the other hand, typically compresses evaluative feedback into a scalar signal, obscuring which aspects of a response should be improved. We propose Rubric-Conditioned Self-Distillation, a framework that incorporates rubrics as structured, fine-grained feedback for on-policy self-distillation. Our method conditions the teacher model on criterion-level rubrics and uses it to provide token-level guidance on the student's own sampled trajectories. This design avoids treating a single reference rationale as the sole supervision target. Instead, rubrics specify what a strong response should satisfy, enabling more fine-grained credit assignment over the reasoning process than scalar reward optimization. We instantiate this framework with a two-stage pipeline that first learns to generate task-specific rubrics and then trains a rubric-guided reasoner. We evaluate on a diverse suite of science reasoning benchmarks and results show that rubric-conditioned self-distillation effectively converts rubric-level criteria into token-level guidance over the reasoning process, surpassing GRPO by 1.0 points and OPSD by 0.9 points on average.

09.
arXiv (CS.CL) 2026-06-24

The Sparse Frontier: Sparse Attention Trade-offs in Transformer LLMs

Sparse attention offers a promising strategy to extend long-context capabilities in Transformer LLMs, yet its efficiency-accuracy trade-offs remain unclear due to the lack of comprehensive evaluation. We address this gap with the largest-scale empirical analysis to date of training-free sparse attention, evaluating six methods across multiple model families and sizes, sequences up to 128K tokens, and sparsity levels up to 0.95 (i.e., $1/20$ attention budget) on nine diverse tasks. We first organise the rapidly evolving landscape of sparse attention methods into a taxonomy along four design axes. Our analysis then yields actionable insights: 1) sparse attention is effective: larger sparse models outperform smaller dense ones at equivalent cost, improving the Pareto frontier; 2) for the training-free methods we study, fine-grained per-query importance estimation during prefilling remains impractical-due to both the cost of estimation and the lack of sparse kernels that translate fine-grained sparsity into wall-clock gains-forcing a task-dependent choice between global-to-token and block-to-block selection. Instead, during decoding, token-to-page selection becomes feasible, enabling better generalisation and higher sparsity tolerance; 3) longer sequences tolerate higher sparsity, suggesting that fixed-budget methods in production are suboptimal. Together, these findings provide practical guidance for deploying sparse attention and methodological recommendations for future evaluations. Our code is available at https://github.com/PiotrNawrot/sparse-frontier.

10.
arXiv (CS.AI) 2026-06-24

FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation

arXiv:2507.16696v3 Announce Type: replace-cross Abstract: Industrial signal analysis is hindered by severe data heterogeneity, which we characterize as the M5 problem. Existing solutions rely on specialized models that lack robustness and scalability, while large-scale pre-training has rarely been investigated in this area. In this work, we derive a prioritized roadmap for the M5 problem and propose FISHER, a Foundation model for multi-modal Industrial Signal compreHEnsive Representation. To address the foremost multi-sampling-rate problem, FISHER utilizes a novel sub-band modeling approach that treats sampling rate increments as concatenated sub-band information, enabling the adaptive usage of full signal bandwidth without resampling. FISHER is pre-trained by teacher-student self-distillation over external audio and music data. We also establish the RMIS benchmark, comprising 19 datasets across four modalities. In the experiment, FISHER outperforms 24 state-of-the-art series encoders (up to 2B) with much smaller sizes (up to 16x), showcasing groundbreaking diagnostic accuracy and remarkable versatility. We further demonstrate that 1) seamless adaptation to variable sampling rates is the key to generalization 2) audio and music data provide better temporal variability, which is essential for pre-training. Both FISHER and RMIS are open-sourced.

11.
arXiv (CS.CV) 2026-06-24

Spherical-to-ERP Epipolar Rectification for Single-Axis Disparity in 360 Stereo

Omnidirectional stereo images provide full-surround perception but violate the geometric assumptions of classical disparity estimation: in spherical or fisheye views, epipolar correspondences follow curved great-circle paths, producing two-dimensional displacements that cannot be treated as single-axis disparity before geometric rectification. In this work, we adopt a standard spherical-to-equirectangular (ERP) projection as a preprocessing step, which straightens epipolar curves and restores a one-dimensional disparity structure - horizontal for left-right rigs and vertical for top-bottom rigs. Building on our previously introduced RAFT + Epipolar-Aligned Channel Selection (EACS) framework, originally developed for rectilinear and ERP stereo, we examine whether the same modular pipeline remains accurate when the input originates from spherical stereo imagery. After ERP projection, dense optical flow from RAFT is reduced to disparity by retaining only the baseline-aligned flow component. Experiments on synthetic fisheye stereo datasets show that this spherical-to-ERP-to-RAFT+EACS pipeline produces accurate, smooth, and structurally consistent disparity maps at real-time speed. These findings confirm that established ERP preprocessing can be effectively combined with our earlier RAFT+EACS method to enable practical, interpretable, and efficient disparity estimation from spherical stereo, providing a straightforward pathway for extending conventional stereo pipelines to 360 imaging.

12.
arXiv (quant-ph) 2026-06-24

Active interference suppression in frequency-division-multiplexed quantum gates via off-resonant microwave tones

arXiv:2601.14547v3 Announce Type: replace Abstract: The increasing number of control lines connecting quantum processors to external electronics constitutes a major bottleneck in the realization of large-scale quantum computers. Frequency-division multiplexing is expected to enable control of multiple qubits through a single microwave cable; however, interference from off-resonant microwave tones hinders precise qubit control. Here, we propose an active interference suppression method for frequency-division-multiplexed simultaneous gates on microwave-controlled qubits. We demonstrate that the deliberate incorporation of off-resonant microwave tones improves single-qubit gate fidelity. In particular, the gate infidelity scales inversely with the square of the number of microwave tones when off-resonant orthogonal or quasi-orthogonal tones are incorporated. Furthermore, we show that fast oscillations, neglected under the rotating wave approximation, degrade the gate fidelity, and that this degradation can be mitigated through optimized frequency allocation. The proposed approach is simple and effective for improving the performance of frequency-division-multiplexed quantum gates.

13.
medRxiv (Medicine) 2026-06-22

A Plasmodium vivax controlled human infection and transmission model to evaluate interventions across the life cycle

Background Plasmodium vivax is an underappreciated cause of malaria disease burden. No reproducible and standardized full life-cycle controlled human malaria infection (CHMI) model to accelerate development of novel interventions is available. Methods This transmission-CHMI trial was conducted in Nijmegen, Netherlands. Healthy, malaria-naive adults were sequentially enrolled into three cohorts of four and inoculated with the asexual blood-stage isolate PvW1. Primary endpoint was proportion of oocyst-positive laboratory-reared Anopheles stephensi mosquitoes. The sequential design allowed for adaptations between cohorts. At parasitemia >10 parasites/microL or symptom onset, participants received oral gametocyte-sparing treatment (GST): mepacrine (Cohort 1 and 3; 100 mg at 0, 8 16 hours, then once daily for 3 days) or piperaquine (Cohort 3; 480 mg single-dose). Transmission was assessed by direct skin feeding (DSF) and membrane feeding assay (DMFA) with and without enrichment of gametocytes. End-of-study treatment was atovaquone-proguanil (1000/400 mg once daily for 3 days). The trial was registered: NL-OMON57011. Findings Participants were enrolled between September 17, 2024 and March 25, 2025, all (12/12) developed parasitemia and transmitted PvW1 to mosquitoes. No serious adverse events occurred. Most adverse reactions were related to malaria. Mepacrine and piperaquine reduced asexual parasitemia while preserving gametocytemia and transmission. Peak transmission occurred within 3 days after GST and depended on the parasite developmental cycle, with highest gametocyte-infectivity ~48 h post ring-stage. In Cohort 3, mosquito infection reached 100% in all transmission assays. Median peak oocyst counts were 24 (IQR: 14-31) for DSF, 17 (12-19) for DMFA, and 150 (116-199) for enriched DMFA. A two-fold increase in pre-GST maximal parasitemia was associated with 20 additional oocysts (95% CI 8,6-32) in enriched DMFA. Sporozoites were viable in primary human hepatocytes. Interpretation A PvW1 transmission-CHMI is reproducible and safe, enabling P. vivax sporozoite production, relapse models and evaluation of transmission-blocking interventions.

14.
arXiv (CS.CV) 2026-06-16

Metis: A Generalizable and Efficient World-Action Model for Autonomous Driving and Urban Navigation

World action models~(WAMs) have shown great promise for autonomous driving and urban navigation. Built upon Vision-Language-Action models or video generation models, existing approaches suffer key limitations: (1) High inference latency due to future observation prediction at test time, and (2) tightly coupled video and action modeling leading to representational mismatch and degraded generalization. To address both issues, we propose Metis, an end-to-end WAM framework that decouples video generation and action prediction. Specifically, Metis employs a Mixture-of-Transformers architecture with dedicated experts for video generation and action prediction, preserving the intrinsic distributional properties of each task. To enhance efficiency, we introduce an asymmetric attention mask that enables joint training of both experts while allowing the action model to bypass explicit video generation during inference. This design ensures training-inference consistency and significantly reduces computational costs without compromising planning performance. Extensive experiments demonstrate state-of-the-art performance on the NAVSIM navhard and navtest benchmarks and the CityWalker navigation benchmark, validating both the generalizability and efficiency across diverse tasks. Real-robot deployments further confirm the practical feasibility of our approach.

15.
arXiv (CS.LG) 2026-06-16

Empirical Study of Pop and Jazz Mix Ratios for Genre-Adaptive Chord Generation

作者:

arXiv:2605.04998v2 Announce Type: replace-cross Abstract: This revision updates a pop-to-jazz chord-generation rehearsal study. Best-epoch metrics still show that modest pop rehearsal preserves pop accuracy while improving jazz prediction, but v2 corrects released-checkpoint selection: the released F1 equals Phase 0, F2 had a transcription error, and ft-pop80-v2 restores a hash-distinct jazz-adapted F1 across 3 seeds.

16.
arXiv (CS.AI) 2026-06-19

DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis

arXiv:2604.13416v2 Announce Type: replace-cross Abstract: Advances in radiance fields have enabled photorealistic novel view synthesis. In several domains, large-scale real-world datasets have been developed to support comprehensive benchmarking and to facilitate progress beyond scene-specific reconstruction. However, for distractor-free radiance fields, a large-scale dataset with clean and cluttered images per scene remains lacking, limiting the development. To address this gap, we introduce DF3DV-1K, a large-scale real-world dataset comprising 1,048 scenes, each providing clean and cluttered image sets for benchmarking. In total, the dataset contains 89,924 images captured using consumer cameras to mimic casual capture, spanning 128 distractor types and 161 scene themes across indoor and outdoor environments. A curated subset of 41 scenes, DF3DV-41, is systematically designed to evaluate the robustness of distractor-free radiance field methods under challenging scenarios. Using DF3DV-1K, we benchmark nine recent distractor-free radiance field methods and 3D Gaussian Splatting, identifying the most robust methods and the most challenging scenarios. Beyond benchmarking, we demonstrate an application of DF3DV-1K by fine-tuning a diffusion-based 2D enhancer to improve radiance field methods, achieving average improvements of 0.96 dB PSNR and 0.057 LPIPS on the held-out set (e.g., DF3DV-41) and the On-the-go dataset. We hope DF3DV-1K facilitates the development of distractor-free vision and promotes progress beyond scene-specific approaches. The dataset and leaderboard are available at https://johnnylu305.github.io/df3dv1k_web/.

17.
medRxiv (Medicine) 2026-06-15

A More-Than-Human Approach to Designing for Mental Health: Remixing Prototypes for the Contexts of Complex Healthcare Infrastructures

Digital mental health tools (DMHTs) often fail to be successfully implemented in clinical settings. While user- and human-centred design frameworks are frequently proposed for developing effective tools, they are insufficient to address the sociotechnical complexity of healthcare environments. This paper addresses this limitation by detailing the application of a more-than-human design framework to incorporate wider contextual factors into design decisions. To demonstrate the application of this more-than-human design framework, we present a case study showcasing the design of one specific feature within a DMHT intended to support Health Improvement Practitioners (HIPs) in New Zealand's Integrated Primary Mental Health and Addictions (IPMHA) service. Our process blends usage-context storyboards with interface prototypes, using think-aloud interviews to test the contextual fit of our prototypes. The initial design concept failed due to contextual factors such as inconsistent wait times and the administrative burden on clients and clinic staff. This led to a pivot to a more context-appropriate, practitioner-focused, in-session concept for digital psychometric administration and automated scoring. This case study demonstrates that for DMHTs to be viable within complex healthcare environments, design must focus on more than the needs of a single user, incorporating multiple stakeholders and contextual variables across the wider service-delivery context.

18.
arXiv (CS.AI) 2026-06-24

Toward Self-Evolution-Ready Workflow Harnesses: A Reversible Migration Path and Convertibility Taxonomy for Expert LLM Pipelines

arXiv:2606.24598v1 Announce Type: cross Abstract: While expert-validated "LLM + script" workflows deliver significant value, they remain static: they encode hard-won domain knowledge yet fail to adapt execution based on feedback. Existing agent research predominantly targets greenfield agents and synthetic benchmarks, leaving the migration of active legacy workflows unresolved. To bridge this gap, we present a reversible, Strangler-Fig migration path that refactors legacy workflows into composable, typed, and auditable stages. Central to this framework is a three-tier convertibility taxonomy (A/B/C), implemented as a routing stage within the system harness, which diagnoses a workflow's readiness and routes it accordingly.

19.
arXiv (CS.CV) 2026-06-18

Stimulus Motion Perception Studies Imply Specific Neural Computations in Human Visual Stabilization

Even during fixation the human eye is constantly in low amplitude motion, jittering over small angles in random directions at up to 100Hz. This motion results in all features of the image on the retina constantly traversing a number of cones, yet objects which are stable in the world are perceived to be stable, and any object which is moving in the world is perceived to be moving. A series of experiments carried out over a dozen years revealed the psychophysics of visual stabilization to be more nuanced than might be assumed, say, from the mechanics of stabilization of camera images, or what might be assumed to be the simplest solution from an evolutionary perspective. The psychophysics revealed by the experiments strongly implies a specific set of operations on retinal signals resulting in the observed stabilization behavior. The presentation is in two levels. First is a functional description of the action of the mechanism that is very likely responsible for the experimentally observed behavior. Second is a more speculative proposal of circuit-level neural elements that might implement the functional behavior.

20.
PLOS Medicine 2026-05-21

Semaglutide-associated risk of nonarteritic anterior ischemic optic neuropathy in patients with type 2 diabetes: A systematic review and meta-analysis of observational studies

by Jędrzej Chrzanowski, Magdalena Walicka, Jacek Burzyński, Małgorzata Zaraś, Arkadiusz Michalak, Wojciech Fendler Background Semaglutide, a glucagon-like peptide-1 receptor agonist, is widely used for the management of type 2 diabetes (T2DM). Recent case reports have raised concerns about a potential association between semaglutide use and the development of nonarteritic anterior ischemic optic neuropathy (NAION), a rare but vision-threatening condition. We aimed to evaluate whether semaglutide use is associated with an increased risk of NAION in patients with T2DM. Methods and findings We conducted a systematic review and meta-analysis of observational studies comparing patients with T2DM aged ≥12 years treated with semaglutide to those receiving other glucose-lowering therapies. We searched PubMed, Scopus, and Web of Science databases from January 2023 to November 2025. Two reviewers independently extracted data on study design, population characteristics, and outcomes. Risk of bias was assessed using the Newcastle–Ottawa Scale, and ROBINS-I v.2. Certainty of the evidence was graded according to the GRADE framework. Pooled hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated using fixed-effects models; sensitivity analyses included crude and subgroup HRs, and overlapping study replacement. Leave-one-out analysis was conducted to assess small-study effects and publication bias. Results were contextualized within other meta-analyses, systematic reviews, consensus statements, and regulatory communications on the topic.Five eligible observational studies met the inclusion criteria, and 7 additional studies were included in the sensitivity analysis. Semaglutide use was associated with a significantly increased hazard of NAION compared with nonsemaglutide glucose-lowering regimens (HR 2.17, 95% CI [1.73, 2.74]; p 

21.
arXiv (CS.AI) 2026-06-16

Recurrent Reasoning on Symbolic Puzzles with Sequence Models

arXiv:2606.15686v1 Announce Type: new Abstract: Large language models often appear strong on symbolic and algorithmic tasks, yet this apparent strength can hide brittle behaviour when problems become longer, harder, or slightly out of distribution. A major limitation of current reasoning benchmarks is that many primarily test whether a model can produce a valid answer, while paying less attention to whether the solution is minimal, robust, and stable under controlled difficulty scaling. We introduce RecurrReason, a difficulty-controlled benchmark of four recurrent logic puzzles (Tower of Hanoi, River Crossing, Block World, and Checkers Jumping) with BFS-optimal trajectories and a single interpretable difficulty parameter $N \in \{1,\dots,10\}$, totalling 10{,}817 unique puzzles and 285{,}933 moves. We benchmark two Transformer families, an encoder-decoder model (T5-style) and a decoder-only model (GPT-2-style), under consistent data splits and evaluation criteria, training on $N{=}1$ to $7$ and evaluating on both held-out in-distribution instances and harder out-of-distribution instances at $N{=}8$ to $10$. Fine-tuned pre-trained T5 achieves 97.27\% validation and 81.00\% OOD accuracy on Block World; all models score 0.00\% on River Crossing under all conditions. Failure mode analysis reveals that architecture is a stronger determinant of success than scale. Pre-training transfers only to puzzles with locally structured transition functions. Our code and dataset will be open-sourced upon acceptance.

22.
arXiv (CS.LG) 2026-06-15

Provably Safe, Yet Scalable Reinforcement Learning

arXiv:2606.14536v1 Announce Type: new Abstract: Safe reinforcement learning (RL) aims to learn policies that optimize rewards while satisfying constraints. Predominant approaches rely on soft-constrained policy optimization, which has achieved empirical success but does not provide formal safety guarantees for the learned policy. In contrast, methods with strict guarantees typically rely on explicit certificate functions, whose construction requires the direct synthesis and verification of control-invariant sets, a process that scales poorly with state dimension and often yields overly conservative behavior. In this paper, we present the Provably Safe, yet Scalable RL (PS2-RL) framework, a novel two-phase architecture for learning provably safe policies in a scalable manner, designed to overcome the key bottlenecks of prior methods. Rather than explicitly computing invariant sets, PS2-RL leverages a learned backup policy to forward-integrate the system dynamics, generating an implicit control-invariant set online. In the first phase, the backup policy is trained with our proposed safe-arrival value function, which characterizes the optimal backup policy for invariant-set construction. In the second phase, an RL policy is trained end-to-end through a differentiable projection layer that strictly enforces the safety guarantees induced by the learned backup policy. By maximizing the volume of the implicit control-invariant set in the first phase, the resulting PS2 policy from the second phase is performant and scalable, while maintaining provable safety. Crucially, PS2-RL imposes no restrictions on the underlying RL algorithm and can be plugged into any existing training pipeline. We establish theoretical guarantees for the proposed framework and evaluate it on robotic control tasks with state dimensions up to 10, a regime in which prior provably safe RL methods struggle or become impractical.

23.
arXiv (quant-ph) 2026-06-11

An Introduction to the Foundations and Interpretations of Quantum Mechanics

arXiv:2603.09818v2 Announce Type: replace Abstract: This article surveys a selection of key conceptual and interpretational developments in quantum mechanics, tracing the theory from its foundational postulates to contemporary discussions of measurement, nonlocality, and the emergence of classicality. Beginning with the structure of Hilbert space and the postulates governing state evolution and measurement, the epistemic stance of the Copenhagen interpretation and its modern reformulations are examined. The Einstein-Podolsky-Rosen argument, Bell's theorem, and Hardy's paradox are then discussed as probes of locality and realism, alongside the deterministic but explicitly nonlocal de Broglie-Bohm theory. The measurement problem and the implications of contextuality are analyzed in relation to objective collapse models, which introduce new physical dynamics to account for definite outcomes. Finally, the role of decoherence in the suppression of interference and the emergence of classical behavior is explored, together with the interpretational frameworks of many-worlds and consistent histories. This material aims to provide a coherent introductory overview of how several of the most prominent interpretations address the central concern of what quantum mechanics tells us about the nature of physical reality.

24.
arXiv (CS.CL) 2026-06-11

Pretrained self-supervised speech models can recognize unseen consonants

Modern pretrained self-supervised automatic speech recognition models are trained on large-scale audio data to encode speech into contextualized representations. However, their training data are heavily skewed toward high-resource languages with little data from low-resource languages, raising concerns about the potential underrepresentation of typologically uncommon speech sounds such as click consonants primarily found in Khoisan languages. This leads to our central research question: Can these models recognize click consonants as accurately as other speech sounds? To address this question, we fine-tune and compare pretrained self-supervised speech models (Wav2Vec2 and HuBERT) on data from two click-rich Khoisan languages (G|ui and West !Xoon). Our results reveal that the fine-tuned models consistently recognize clicks more accurately than non-clicks, suggesting that self-supervision enables generalization across human speech sounds including rare phonemes.

25.
arXiv (CS.LG) 2026-06-24

Polaris: A Godel Agent Framework for Small Language Models through Experience-Abstracted Policy Repair

arXiv:2603.23129v3 Announce Type: replace Abstract: Gödel agent realize recursive self-improvement: an agent inspects its own policy and traces and then modifies that policy in a tested loop. We introduce Polaris, a Gödel agent for compact models that performs policy repair via experience abstraction, turning failures into policy updates through a structured cycle of analysis, strategy formation, abstraction, and minimal code pat ch repair with conservative checks. Unlike response level self correction or parameter tuning, Polaris makes policy level changes with small, auditable patches that persist in the policy and are reused on unseen instances within each benchmark. As part of the loop, the agent engages in meta reasoning: it explains its errors, proposes concrete revisions to its own policy, and then updates the policy. To enable cumulative policy refinement, we introduce experience abstraction, which distills failures into compact, reusable strategies that transfer to unseen instances. On MGSM, DROP, GPQA, and LitBench (covering arithmetic reasoning, compositional inference, graduate-level problem solving, and creative writing evaluation), a 7-billion-parameter model equipped with Polaris achieves consistent gains over the base policy and competitive baselines.