Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
Nature (Science) 2026-06-16

Mathematicians are developing rules for AI use — other fields should follow

作者: 未知作者

The mathematics community is right to call for transparency, integrity and fairness to be protected when AI tools are used. Researchers in other disciplines could learn from this approach. The mathematics community is right to call for transparency, integrity and fairness to be protected when AI tools are used. Researchers in other disciplines could learn from this approach.

04.
arXiv (CS.AI) 2026-06-24

When Language Overwrites Vision: Over-Alignment and Geometric Debiasing in Vision-Language Models

arXiv:2605.08245v4 Announce Type: replace-cross Abstract: Vision-Language Models (VLMs) increasingly power high-stakes applications, from medical imaging to autonomous systems, yet they routinely hallucinate, confidently describing content not present in the input. We investigate the root causes of these failure modes with a mechanistic analysis focusing on the decoder-based VLMs. We trace these failure modes to a geometric over-alignment: to bridge the modality gap required by attention mechanisms, decoder-based VLMs over-align visual embeddings with the text manifold, injecting a statistical linguistic bias that systematically overshadows fine-grained visual evidence. While prior work either aggressively closes this gap or suppresses hallucinations through expensive black-box decoding strategies, none addresses the underlying geometric cause. We provide the first quantitative characterization of this over-alignment, demonstrating that linguistic bias concentrates in the top principal components of a universal, dataset-agnostic text subspace. Building on this insight, we propose two complementary remedies: a training-free inference strategy and a bias-aware fine-tuning paradigm, both of which explicitly project out this subspace from visual representations. Our methods significantly reduce hallucinations across POPE, CHAIR, and AMBER benchmarks, and improve CLAIR scores on long-form captioning tasks, with the training-free variant adding no computational overhead over the base model.

05.
arXiv (CS.LG) 2026-06-16

Constraining the outputs of ReLU neural networks

arXiv:2508.03867v2 Announce Type: replace-cross Abstract: We introduce a class of algebraic varieties naturally associated with ReLU neural networks, arising from the piecewise linear structure of their outputs across activation regions in input space, and the piecewise multilinear structure in parameter space. By analyzing the rank constraints on the network outputs within each activation region, we derive polynomial equations that characterize the functions representable by the network. We further investigate conditions under which these varieties attain their expected dimension, providing insight into the expressive and structural properties of ReLU networks.

06.
medRxiv (Medicine) 2026-06-22

Three multimodal large language models fail at clinically actionable breast pathology in three different directions

Background. Breast cancer treatment depends on histopathological features, such as grade and receptor-defined subtype; however, specialist pathologist access is constrained when the workforce is limited. Commercial multimodal large language models (MLLMs) accept hematoxylin and eosin (H&E) image tiles through paid interfaces without local hardware or fine-tuning. However, prior pathology evaluations addressed only coarse tasks. Whether they reach treatment-determining accuracy and whether vendors agree remain unclear. Methods. We aimed to evaluate three vendor-designated flagship MLLMs (Claude Sonnet 4.6, Gemini 2.5 Pro, GPT-5.5) in 427 invasive breast cancer cases. Each case went to all three with identical H&E tiles and prompts, and the subtype was inferred in the second call. The reference was an institutional sign-out report of an immunohistochemistry-derived subtype. We calculated the concordance, sensitivity, specificity, Cohen's kappa, and pairwise McNemar and Bowker tests. Findings. Claude ranked highest by raw histologic-type concordance but lowest by kappa, classifying all 23 lobular and seven micropapillary carcinomas as invasive breast carcinoma of no special type. The models anchored the Nottingham grade to three modal grades. None of the models reliably identified human epidermal growth factor receptor 2-positive disease. The failure direction was vendor-specific: Claude and GPT-5.5 were under-detected, whereas Gemini was over-called. Twelve prompt variants (4,056 calls) did not recover sensitivity. Interpretation. No current commercial MLLM reaches deployment-ready accuracy for any treatment-determining feature of breast pathology. As each vendor fails in its own fixed direction, changing vendors alters the type of error rather than removing it; therefore, the value of these models is assistive rather than autonomous. At USD 0.20-0.50 per case, they may serve as supervised draft generators that leave the diagnosis with the pathologist.

07.
medRxiv (Medicine) 2026-06-18

Development and Initial Validation of the Quality of life Evaluation in NF2-related Schwannomatosis Trials (QUEST) Assessment

Individuals with NF2-related schwannomatosis (NF2-SWN) experience a complex constellation of physical, emotional, and social symptoms that substantially impact quality of life (QoL). Although disease-specific patient-reported outcome measures are increasingly important for evaluating treatment benefit in clinical trials, existing NF2-SWN QoL measures have limitations in content coverage and sensitivity to change. This study describes the development and initial validation a new disease-specific QoL assessment – the Quality of Life Evaluation in NF2-related Schwannomatosis Trials (QUEST). Using a three-phase, mixed-methods approach, items were generated through concept elicitation interviews with individuals with NF2-SWN and clinicians, prioritized via patient survey data, and refined through iterative cognitive debriefing procedures. The resulting 21-item QUEST assesses the extent to which NF2-SWN has negatively impacted a persons daily life over the past seven days. Initial psychometric evaluation was conducted in an international sample of 174 individuals with NF2-SWN aged 15 years and older (117 women (67%), 158 White individuals (89%)). Exploratory factor analysis supported a four-factor structure, and the total score demonstrated excellent internal consistency and strong test-retest reliability. Evidence of construct validity was demonstrated through hypothesized associations with disease-specific, generic, and domain-specific QoL measures, as well as known-groups validity based on self-reported disease severity and number of prior surgeries. Incremental validity analyses indicated that QUEST explained unique variance beyond existing measures. Together, findings support the QUEST as a reliable and valid disease-specific QoL measure with strong content validity and feasibility for use as a clinical trial endpoint in NF2-SWN.

08.
arXiv (CS.AI) 2026-06-19

Improving End-to-End Speech Recognition for Dysarthric Speech through In-Domain Data Augmentation

arXiv:2606.19797v1 Announce Type: cross Abstract: Dysarthric speech recognition is crucial for facilitating effective communication among individuals with dysarthria. However, accurately recognizing dysarthric speech poses significant challenges due to varying severity levels and limited data availability. In this paper, we explore data augmentation techniques for dysarthric automatic speech recognition (ASR) systems by fine-tuning the End-to-End pre-trained Wav2Vec2 model, with a specific focus on severity levels. To address the challenges of data scarcity and the need for extensive data in fine-tuning pre-trained ASR systems for dysarthric speech, we investigate four prominent data augmentation methods: Speaking-Rate Modification (SRM), Pitch Modification (PM), Formant Modification (FM), and vocal tract Length Perturbation (VTLP), tailored to different aspects of dysarthria. The study uses individually fine-tuned Wav2Vec2 models for each severity class as baseline systems. Additionally, we conducted severity-specific fine-tuning of the ASR model using augmented data. Results demonstrate distinct efficacy patterns for each augmentation technique across severity levels. The best WERs were achieved with SRM ($s$=0.8) for low (9.02\%) and medium (38.11\%) severities, and with PM ($\tau$=0.8) for high severity (55.15\%), reflecting relative improvements of 30.02\%, 16.64\%, and 15.47\%, respectively. These results confirm the effectiveness of the augmentation methods in improving dysarthric ASR performance.

09.
medRxiv (Medicine) 2026-06-15

Differential DNA Methylation and Delirium After Anesthesia and Surgery

Background: DNA methylation is an epigenetic modification that regulates gene expression in response to environmental exposures. We measured differential DNA methylation levels in blood before after general anesthesia and surgery in participants with and without postoperative delirium (POD) and postoperative neurocognitive disorder (PNCD). Methods: Blood sampling, delirium assessment and cognitive testing were prospectively performed at baseline before non-cardiac, non-neurologic surgery, and at 24 hours (24h) and 6 weeks (6wk) thereafter in 94 participants comprising 13 with POD and 81 without POD, and 40 with PNCD and 54 without PNCD 6wk after surgery who were matched for age and sex in the INTUIT and MADCO cohorts. DNA methylation was assessed using the Illumina Infinium MethylationEPIC Beadchip. Results: 132 differentially methylated positions (DMPs) annotated to 198 differentially methylated genes (DMGs) were identified in 94 participants 24h after surgery compared to baseline with a local false discovery rate (LFDR)

10.
arXiv (CS.CL) 2026-06-19

From 50K to 8.2 Million in 24 Hours: Vozinha's Algorithmic Consecration and the Multilingual Making of World Cup Visibility

We present a multilingual computational discourse analysis of how language constructed the algorithmic consecration of Vozinha, the 40-year-old Cape Verde goalkeeper, after Spain 0-0 Cape Verde at the 2026 FIFA World Cup. The study contributes a multilingual corpus in Portuguese, Spanish, English, and French; a nine-frame narrative taxonomy with cue-based frame annotation; a reproducible annotation pipeline combining LLM-assisted suggestion with human validation; and an analysis of cross-lingual narrative diffusion across discourse phases. We treat the platform follower count itself, narrated as "50k to 8M", as a linguistic object: a circulating and narratable proof of visibility rather than a mere measurement. The follower-growth timeline is used only as contextual metadata: we reconstruct a conservative phase structure, not a continuous API-native series, and type every datapoint by value class, confidence, and evidence type. The only exact primary scraper anchor is 8,235,652 followers at 2026-06-16 15:47 UTC; all other figures are reported as estimated ranges or thresholds, including an estimated pre-match baseline of 45k-56k. Findings suggest that distinct languages carried distinct frames: Portuguese mobilization, Spanish crisis, English nation-making, and a shared platform-metric spectacle through which peripheral athletic performance became globally visible. As a v0.1 pilot, the paper releases the corpus schema, frame taxonomy, annotation guidelines, hashed visual-evidence log, and typed timeline, while flagging full double annotation and inter-annotator agreement as planned work.

11.
arXiv (CS.CV) 2026-06-16

DynFS-MoE: Dynamic Functional-Structural Mixture-of-Experts for Post-Traumatic Epilepsy Diagnosis

Post-traumatic epilepsy (PTE) is a severe complication of traumatic brain injury (TBI), yet early identification remains challenging due to the complex structural and functional alterations it induces in the brain. To address this, we propose a dynamic multimodal Mixture-of-Experts (MoE) framework that integrates functional and structural MRI through time-aware functional-structural encoding and class-conditioned expert routing. Within this framework, modality-specific and cross-modal experts learn complementary representations, while a Modality-Class MoE (MCoE) module dynamically dispatches expert weights according to each classification objective. Experimental results across three binary classification tasks demonstrate that the framework consistently outperforms static fusion baselines, and high-interpretability analyses further reveal meaningful region-of-interest (ROI) interactions. This dynamic multimodal expert framework effectively captures class-dependent brain interaction patterns and provides an interpretable approach for PTE diagnosis and risk stratification.

12.
arXiv (CS.CL) 2026-06-16

Generative causal testing to bridge data-driven models and scientific theories in language neuroscience

Representations from large language models are highly effective at predicting BOLD fMRI responses to language stimuli. However, these representations are largely opaque: it is unclear what features of the language stimulus drive the response in each brain area. We present generative causal testing (GCT), a framework for generating concise explanations of language selectivity in the brain from predictive models and then testing those explanations in follow-up experiments using LLM-generated stimuli.This approach is successful at explaining selectivity both in individual voxels and cortical regions of interest (ROIs), including newly identified microROIs in prefrontal cortex. We show that explanatory accuracy is closely related to the predictive power and stability of the underlying predictive models. Finally, we show that GCT can dissect fine-grained differences between brain areas with similar functional selectivity. These results demonstrate that LLMs can be used to bridge the widening gap between data-driven models and formal scientific theories.

13.
arXiv (CS.LG) 2026-06-16

FEnc$^2$: Unifying Data Packing for Efficient Private Inference via Convolution and Architecture-Aware Fragment Encoding

arXiv:2606.16359v1 Announce Type: cross Abstract: Fully Homomorphic Encryption (FHE) enables privacy-preserving machine learning but incurs extreme computational and memory overhead. These costs come not only from expensive low-level primitives, including Number Theoretic Transform (NTT), rotation, and key-switching, but also from inefficient ciphertext packing at the application level. Existing packing strategies typically preserve either neighboring data elements or feature grouping, but not both, leading to wasted ciphertext slots, excessive rotations, and inflated ciphertext counts. We propose FEnc2, a unified and principled fragment-based encoding framework for CKKS-based private convolutional neural network inference. FEnc2 optimizes slot utilization, rotation complexity, and ciphertext density through two components: 1)Conv-aware Encoding, which analytically selects an optimal fragment size to decouple spatial dependencies and jointly minimize inner-outer rotations across layers, and 2)Arch-aware Ct Compression, which restores ciphertext density after feature- or channel-reduction layers. Together, these transformations reshape encrypted workload structure and reduce homomorphic operations by one to two orders of magnitude. With full memory capacity utilized, i.e., at maximum batch size, FEnc2 achieves end-to-end latency speedups over the state-of-the-art Orion of up to 228.83x on GPU and 226.06x on CPU for LeNet on MNIST, and up to 4.55x on GPU and 9.43x on CPU for MobileNet on ImageNet. FEnc2 is hardware-agnostic yet architecturally transformative: by optimizing encrypted tensor layout before execution, it reduces ciphertext count and workload pressure on hardware, complementing primitive-level optimizations such as NTT and keyswitch accelerators. These results show that application-level data layout is a first-order architectural design dimension for encrypted inference and an important enabler for next-generation FHE systems.

14.
arXiv (CS.AI) 2026-06-11

MoCA-Agent: A Market-of-Claims Code Agent for Financial and Numerical Reasoning

arXiv:2606.11537v1 Announce Type: new Abstract: Financial and tabular question answering requires more than fluent reasoning: answers must be grounded in the exact facts, formulas, units, signs, and scales that support them. A single misread cell or incorrect operation can silently produce a plausible but wrong result. We introduce \textsc{MOCA-Agent}, a market-of-claims code agent that replaces free-form multi-agent debate with claim-level verification. The system decomposes each question into typed atomic claims, asks specialist trader agents to buy or sell those claims, clears their orders into confidence-weighted accept/reject decisions, and synthesizes an executable Python program from market-supported evidence. A code-aware verifier then checks the program for execution, structural consistency, and common financial reasoning errors, with at most one market-aware repair round. Across ten public benchmarks spanning financial numerical reasoning, general tabular reasoning, ESG question answering, and multimodal chart reasoning, \textsc{MOCA-Agent} achieves strong performance using a fixed Qwen3.6-27B backbone, including $78.3\%$ on FinQA, $76.0\%$ on FinanceMath, $71.2\%$ on MultiHiertt, $86.9\%$ on ESGenius, and $85.6\%$ average on FinChart-Bench. These results show that aggregating evidence at the level of atomic claims, rather than whole answers, improves robustness in high-stakes numerical reasoning.\footnote{The code and data are available: https://github.com/UBC-NLP/MoCA-Agent.

15.
arXiv (CS.CV) 2026-06-17

Attention Alignment Between Humans and Vision-Language Models

Visual perception depends on top-down goals and bottom-up sensory mechanisms. Vision-language models implement both, allowing us to treat each component as a separable hypothesis about what drives where we look. We compared spatial attention maps from six vision-language models against human fixation heatmaps recorded on 200 images during two tasks (general description and social captioning). The six models spanned a 2$\times$2 factorial of CNN vs.\ ViT encoders crossed with LSTM vs.\ Transformer decoders, plus Molmo 7B-D and Qwen3.5 9B. We found that both decoder and encoder architecture shaped alignment, but decoder choice dominated. LSTM vs.\ Transformer decoders increased alignment by 40–50 percentage points (80–87\% vs.\ 40–59\% of the human noise ceiling). In contrast, CNN vs.\ ViT encoders contributed a secondary 5–20 point advantage depending on decoder family, with CNN-LSTM the most aligned model overall (85–87\%). Despite their alignment advantage, LSTM-decoder attention maps were spatially diffuse and minimally task-differentiated; ViT-Transformer, the weakest in alignment, showed the sharpest spatial concentration and strongest task differentiation. A hemispatial-neglect simulation confirmed that ablating attention impacted LSTM decoders more than Transformer decoders. In an exploratory extension using TRIBE-simulated synthetic neural responses, fixation alignment and neural relevance dissociate: CNN-Transformer attention maps better predicted synthetic brain activity despite lower fixation alignment, with attention maps best predicting early visual cortex. Together, top-down and bottom-up components trade off what they predict in behavioral and synthetic neural data.

16.
arXiv (CS.LG) 2026-06-24

Parallel Manifold Steering: Efficient Adaptation of Large Associative Memories via Residual Energy Shaping

arXiv:2606.24396v1 Announce Type: new Abstract: Large Transformer models function as Dense Associative Memories (DAMs), retrieving knowledge via high-dimensional attractor dynamics driven by the self-attention mechanism \citep{ramsauer2020hopfield, wu2024attention}. However, adapting these frozen memory systems to new tasks presents a fundamental ``Plasticity-Stability'' dilemma. Current methods either risk catastrophic interference by modifying synaptic weights directly (e.g., LoRA) \citep{hu2021lora} or degrade associative capacity by clogging the retrieval buffer with static prompt tokens (e.g., VPT) \citep{jia2022vpt}. In this work, we propose H-Res (Hierarchical Residual Steering), a mechanism that modulates the effective energy landscape of the Transformer without altering its global equilibrium or expanding its sequence length. By formulating adaptation as a control problem on the activation manifold \citep{chen2018neuralode}, H-Res learns a state-dependent vector field that steers token trajectories into task-specific basins of attraction. We formally prove that H-Res preserves the attention entropy of the foundation model and facilitates Neural Collapse \citep{papyan2020prevalence}. Empirically, Manifold Steering outperforms global weight modification by 26\% on associative retrieval tasks and eliminates the computational overhead of prompt-based methods, scaling effectively to structured domains \citep{zha2023vtab}.

17.
arXiv (CS.AI) 2026-06-11

Rule Taxonomy and Evolution in AI IDEs: A Mining and Survey Study

arXiv:2606.12231v1 Announce Type: cross Abstract: The adoption of AI-powered Integrated Development Environments (AI IDEs) has introduced "Rules" as a novel software artifact, allowing developers to persistently inject project-specific constraints and architectural guidelines into the context of Large Language Models (LLMs). Despite their role in aligning AI behavior with developer intent, the taxonomy, evolution, and practical impact of these rules remain largely unexplored. To bridge this gap, we conducted a mixed-methods empirical study on AI IDE rules. By mining 83 open-source projects and extracting 7,310 rules, we established a comprehensive taxonomy comprising 5 primary and 25 secondary categories. We then triangulated these artifacts with survey responses from 99 practitioners. Our analysis identified a contrast between developer priorities and actual configurations: while practitioners rate architectural constraints as highly important, rule files in repositories primarily consist of low-level workflow and code formatting constraints. Furthermore, our analysis of 1,540 rule evolution events revealed that rules are updated frequently. Repository data further indicate that rule evolution is primarily driven by constructive context expansions (29.17%) and enrichments (26.59%). In contrast, surveyed developers reported modifying rules primarily to correct AI errors (77.78%), typically by adding new negative constraints rather than editing existing ones. Finally, an artifact compliance assessment of 160 rule evolution events revealed that updating rules significantly improves the adherence of software artifacts, with the average artifact compliance rate increasing by 22.99% (from 49.14% to 72.13%) following an update. Our study provides empirical insights that can help developers optimize prompting strategies and guide tool builders in designing automated conflict-detection and context-management mechanisms for AI IDEs.

18.
arXiv (CS.AI) 2026-06-24

Themis: An explainable AI-enabled framework for Reinforcement Learning with Human Feedback

arXiv:2606.24622v1 Announce Type: new Abstract: Training safe Reinforcement Learning (RL) systems is inherently challenging, with no guarantee of avoiding unwanted behaviors. The most effective defenses against this are (i) transparency through explainability and (ii) alignment via human feedback. While both show promising results, no publicly available framework currently combines them. To address this, we introduce Themis, an XAI-enabled testing and evaluation framework for Reinforcement Learning from Human Feedback. Themis supports over 200 widely used environments and is easily configurable for experiments in RL, transparency, and alignment. Our results show that Themis can train reward models that match or outperform the environment's true reward signal using human preferences. We also provide a cloud-based platform for collecting human feedback and managing experiments. It is user-friendly, auto-scalable, and supports large participant groups across multiple experiments without extra development overhead. Tests show Themis can support one thousand users in back-to-back experiments on a modest commercial machine.

19.
arXiv (CS.CV) 2026-06-16

VANDERER: Map-Free Exploration using Future-Aware and Visual-Curiosity-Guided Diffusion Policy

Mobile agents require efficient exploration strategies to map unseen environments and autonomously plan tasks. Traditional methods rely on generating occupancy maps and optimizing the sequence in which unexplored regions are visited. However, in sensor-constrained settings, such as those limited to monocular cameras, generating accurate occupancy maps is challenging. To address this, we propose VANDERER, an exploration framework that leverages a Visual Curiosity Module (VCM) to guide pre-trained diffusion policies using only monocular image data. This curiosity module predicts the outcomes of proposed actions via a navigation world model and evaluates them through a curiosity cost. The cost then guides the diffusion process toward generating actions that maximize exploration. Evaluated across diverse simulated environments, VANDERER consistently outperforms established baselines, exploring an average of 13.4% more area than NoMaD. Our results reveal a direct correlation between visual and geometric curiosity in outdoor environments, demonstrating that VANDERER can effectively leverage this relationship for efficient exploration using sensor-constrained agents.

20.
arXiv (CS.CL) 2026-06-12

MDForge: Agentic Molecular Dynamics Pipeline Design under Sparse Simulator Feedback

Molecular dynamics (MD) is the canonical in-silico method for atomistic molecular science, simulating molecular behavior from first-principle physics. Designing an MD pipeline for a new system requires substantial expert knowledge: running it on even one molecule is expensive, ruling out trial-and-error. We automate this expert pipeline-design process with an LLM agent. Unlike existing MD agents that orchestrate a predefined tool set, we treat pipeline design as open-ended code generation in which the agent's behavior is reshaped online by verbal reward. Specifically, we build MDForge, an LLM agent whose in-context update rule densifies the sparse reward via a multi-agent debate among physics experts. On three SAMPL host-guest binding free-energy benchmarks, MDForge automatically designs MD pipelines competitive with human experts. Deployed on a library of unseen candidate guests, its CB[7] pipeline discovers a novel binder that wet-lab competition NMR confirms is a high-affinity, picomolar CB[7] binder. Our data and code are available at https://github.com/Zehong-Wang/MDForge.

21.
medRxiv (Medicine) 2026-06-10

Trajectories of brain structure and function in young adult carriers of genetic frontotemporal dementia variants

Background and Objectives: Converging evidence hints at neurodevelopmental effects in genetic frontotemporal degeneration (FTD). In cross-sectional studies, for some genes, young adult FTD variant carriers show differences in brain volumes and cognition compared to familial non-carriers. However, longitudinal trajectories may more sensitively capture FTD-related neurodevelopmental vs. neurodegenerative changes than cross-sectional approaches. This study examined longitudinal trajectories of brain volumes, executive function, and plasma biomarkers in young adult carriers compared to familial non-carriers, as measures of neurodevelopmental and neurodegenerative outcomes of FTD-causing variants. Methods: This longitudinal cohort study comprised participants, aged 18-30 years, from the FTD Prevention Initiative across Europe, Canada, and the USA. Genetic groups included C9orf72 (47%), MAPT (30%), and GRN (23%). Linear mixed-effects models were computed to assess longitudinal outcomes across age between groups, controlling for sex, scanner (for brain volumes), and education (for executive function); random effects accounted for between-subject variability nested within family membership. Results: Variant carriers (n=147) and familial non-carriers (n=113) did not differ in age (mean{+/-}SD, 25.9{+/-}3.2 years), sex (53% female), or number of visits (2.1{+/-}1.7). Young adult C9orf72 repeat expansion carriers exhibited smaller thalamic volumes than non-carriers at the reference age of 26 years (b=-982.8mm3, SE=317.0, p=0.0046, f2=0.32), with relatively stable trajectories across ages 18-30 (i.e., no change over time). Trajectories of rostral anterior cingulate volumes differed in C9orf72 carriers and non-carriers across age, where carriers showed relatively stable trajectories and non-carriers showed age-appropriate declines (b=64.4mm3, SE=29.9, p=0.035, f2=0.07). For MAPT and GRN, there were little to no differences in total brain, cortical, or subcortical volumes between groups and over time. No longitudinal differences were observed between carriers and non-carriers in executive function, or plasma NfL or GFAP for any genetic group. Discussion: C9orf72 repeat expansions were linked to smaller average thalamic volumes and stable trajectories between ages 18 to 30, supporting potential neurodevelopmental origins. The modest evidence supporting an absence of difference in neurodegenerative biomarkers and executive function suggests minimal early neurodegeneration and functional preservation in young adulthood.

22.
medRxiv (Medicine) 2026-06-16

Using visual biofeedback to reduce step length error at fast walking speeds is feasible after stroke

Background and Purpose: Walking after stroke is often characterized by persistent biomechanical impairments and reduced walking capacity. While visual biofeedback can improve gait mechanics and fast walking can enhance capacity, it is unclear whether individuals post-stroke can effectively use biofeedback at higher walking speeds to address both deficits simultaneously. This study examined the effects of walking speed on the ability of participants with chronic stroke to reduce step length (SL) errors using visual biofeedback. Methods: Sixteen individuals with chronic stroke walked on a treadmill at slow, self-selected, and fast speeds with and without visual SL biofeedback. Absolute SL error relative to individualized targets was calculated for paretic and non-paretic limbs. Linear mixed-effects models with piecewise linear splines assessed the effects of speed, limb, and feedback condition. Post hoc comparisons were performed for significant interactions. Results: At lower speeds, increasing speed reduced SL error in both limbs (p < 0.001). At higher speeds, the effects of speed were dependent on limb and condition (p < 0.001). Paretic SL error increased with speed without feedback but remained stable with feedback (p < 0.001). Non-paretic SL error decreased with speed regardless of condition. SL error was greater in the paretic limb overall (p < 0.001). Discussion and Conclusions: Fast walking alone did not reduce paretic SL errors. Participants with chronic stroke can effectively use visual biofeedback to reduce paretic SL errors at higher speeds, supporting its integration into high-intensity gait training to simultaneously treat biomechanical impairments and walking capacity deficits after stroke.

23.
arXiv (CS.AI) 2026-06-15

MA-ProofBench: A Two-Tiered Evaluation of LLMs for Theorem Proving in Mathematical Analysis

arXiv:2606.13782v1 Announce Type: new Abstract: Large Language Models (LLMs) have made notable progress in automated theorem proving, yet existing formal benchmarks remain limited in both mathematical coverage and difficulty. Most are concentrated in areas that are easier to formalize, such as algebra and elementary number theory, and provide limited coverage of subfields that require deeper reasoning, including mathematical analysis. To address this gap, we introduce MA-ProofBench, to the best of our knowledge, the first formal theorem-proving benchmark dedicated to Mathematical Analysis. The benchmark contains 200 formalized theorems covering 6 core topics and 27 subcategories, including measure and integration theory, complex analysis, and functional analysis. The problems are divided into two difficulty levels, an undergraduate level (Level I, 100 problems) and a Ph.D. qualifying level (Level II, 100 problems), to evaluate how well LLMs perform formal reasoning at different mathematical depths. Each problem is constructed through a human-led, LLM-assisted formalization pipeline followed by independent expert review, ensuring that the formal statements remain faithful to the original mathematics. We evaluate a range of recent general-purpose reasoning models and formal theorem provers on MA-ProofBench. However, most models perform poorly: even the best-performing model, GPT-5.5, achieves only 16% Pass@8 on Level I and 5% on Level II, while most models stay close to 0% on Level II. Further analysis identifies Mathlib hallucinations and incomplete proofs as the two dominant failure modes, while an evaluation on the natural-language version of the benchmark exposes a clear gap between informal and formal reasoning. MA-ProofBench is intended to serve as a reliable reference for tracking progress in formal mathematical reasoning in advanced domains.

24.
arXiv (CS.LG) 2026-06-19

Score Approximation for Diffusion Models on Arbitrary Low-Dimensional Structures

arXiv:2606.19894v1 Announce Type: new Abstract: The remarkable success of score-based diffusion models has spurred significant efforts to establish their theoretical foundations. However, existing complexity bounds for score approximation rely heavily on restrictive assumptions like Lipschitz continuous densities or smooth manifold supports, which are routinely violated by the singularities, sharp boundaries, and disjoint clusters inherent to real-world perceptual data. This work establishes a universal score approximation theorem that works for any distribution supported on any compact set of upper Minkowski dimension $d$. Using a novel discrete-mixture formulation, we prove that the score function can be approximated with a ReLU network whose complexity grows exponentially only with $d$, thus breaking the exponential curse of ambient dimensionality. Combined with existing theories on accurately solving the backward diffusion SDE for arbitrary compact distributions, our work shows that diffusion models readily adapt to irregular, non-smooth data structures, explaining their competence in real-world generative tasks.

25.
arXiv (CS.CV) 2026-06-24

A Geometry-Informed Computer Vision Method for Detecting and Examining Overtaking Vehicles From A Bicycle

Instrumented bicycle studies have produced direct field evidence on vehicle passing behavior, but extracting overtaking events from continuous rear-facing video has remained dependent on manual, frame-by-frame annotation. This bottleneck constrains sample sizes and limits naturalistic cycling safety research. We present a geometry-informed computer vision pipeline that automates overtaking event detection from a single bicycle-mounted camera without multi-sensor configurations or explicit camera calibration. The system combines RT-DETR object detection with ByteTrack multi-object tracking through a three-stage geometric validation module enforcing bearing angle trend, apparent size growth, and spatial confirmation criteria derived from perspective projection principles. Validated on 315 manually annotated real-world overtaking events from urban roads in Ann Arbor, Michigan, the pipeline achieved 97.8% recall with zero false positives. The system identified overtaking intentions a mean of 2.44 seconds before vehicle passage, with 84.1% of events exceeding the 1.5-second human reaction time threshold, demonstrating feasibility for active cyclist warning. Lateral passing distance measurements from 96 events revealed 33.3% of passes below the 5-foot (152.4 cm) threshold, consistent with non-compliance rates in prior field and self-reported studies. A preliminary calibration-free lateral distance estimation approach using bounding box geometric features achieved mean absolute errors of 13-14 cm under leave-one-out cross-validation, sufficient to distinguish close passes from standard passes for safety categorization. By automating event isolation from consumer-grade footage, the system removes the primary annotation bottleneck of instrumented bicycle research and provides a scalable foundation for vehicle-bicycle interaction analysis across larger datasets and diverse urban environments.