Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-12

GAE: Unleashing Physical Potential of VLM with Generalizable Action Expert

Vision-language models demonstrate strong reasoning and planning abilities, yet grounding these predictions into precise robot actions remains a central challenge. Existing Vision-Language-Action methods typically entangle reasoning and action generation, leading to limited generalization. We propose Generalizable Action Expert (GAE), a task-agnostic model that converts sparse geometric plans into dense robot actions. Our approach introduces a sparse geometric interface: the VLM predicts sparse 3D waypoints representing high-level intention, while GAE maps these waypoints together with real-time point cloud observations to continuous action trajectories. GAE is pretrained on a large-scale pointcloud-trajectory dataset comprising 150k trajectories from both simulation and real-world robots. To further improve efficiency and generalization, we introduce an Action Pre-training, Pointcloud Fine-tuning (APPF) scheme that decouples learning action dynamics from geometry grounding. After pretraining, GAE is frozen and reused across downstream tasks, requiring only lightweight fine-tuning of the VLM to produce the sparse interface. Experiments show that our method achieves strong performance and generalization across diverse visual domains, camera viewpoints, and natural language instructions.

02.
arXiv (math.PR) 2026-06-12

Exact Fourier dimensions of dyadic Mandelbrot cascades under minimal integrability

arXiv:2606.08683v2 Announce Type: replace Abstract: We determine the Fourier dimension of dyadic Mandelbrot cascades under the minimal Kahane-Peyriere integrability condition. The interval theorem is proved in a vector-valued dyadic cascade model in which sibling weights may have arbitrary dependence. For every balanced energy-admissible vector law, almost surely on non-extinction, dim_F(mu)=dim_E(mu)=dim_2(mu)=D_E(X). In the canonical scalar case, under W>=0, E W=1, E[W log_2^+ W]

03.
arXiv (CS.LG) 2026-06-19

Predicting gestational age at birth in the context of preterm birth from multi-modal fetal MRI

arXiv:2606.20172v1 Announce Type: new Abstract: Preterm birth is associated with significant mortality and a risk for lifelong morbidity. The complex multifactorial aetiology hampers accurate prediction and thus optimal care. A pipeline consisting of bespoke machine learning methods for data imputation, feature selection, and regression models to predict gestational age (GA) at birth was developed and evaluated from comprehensive multi-modal morphological and functional fetal MRI data from 333 control cases and 93 preterm birth cases. The GA at birth predictions were classified into term and preterm categories and their accuracy, sensitivity, and specificity were reported. An ablation study was performed to further validate the design of the pipeline. Performance was evaluated using stratified 10-fold cross-validation. The pipeline achieves an R2 score of 0.13 and a mean absolute error of 2.74 weeks. It also achieves a 0.77 accuracy, 0.59 sensitivity, and 0.82 specificity across folds. The predominant features selected by the pipeline include cervical length and statistics derived from placental T2* values. The confluence of fast, motion-robust and multi-modal fetal MRI techniques and machine learning prediction allowed the prediction of the gestation at birth. This information is essential for any pregnancy. To the best of our knowledge, preterm birth had only been addressed as a classification problem in the literature. Therefore, this work provides a proof of concept. Future work will increase the cohort size to allow for finer stratification within the preterm birth cohort. Our code is available at https://github.com/dfajardorojas/ml-for-preterm-birth-.

04.
medRxiv (Medicine) 2026-06-12

Deconvolution-based cell-type specific DNA methylation-wide and transcriptome-wide association studies identify risk CpG sites and genes associated with colorectal cancer risk

Bulk tissue-based DNA methylation-wide (MWAS) and transcriptome-wide association studies (TWAS) have identified CpG sites and genes associated with colorectal cancer (CRC) risk, but do not account for cellular heterogeneity. To address this, we developed a deconvolution-informed framework to infer cell-type specific DNA methylation and gene expression profiles from bulk normal colon tissues using reference single-cell epigenomic and transcriptomic datasets. We performed cell-type specific MWAS (ctMWAS) using deconvoluted DNA methylation data from 293 normal colon samples and conducted cell-type specific TWAS (ctTWAS) using deconvoluted gene expression data from 707 normal colon samples. Genetically predicted methylation and expression models were integrated with CRC GWAS summary statistics (78,473 cases and 107,143 controls) to identify risk-associated CpG sites and genes. Through ctMWAS, ctTWAS, and colocalization analyses, we identified 178 significant cell-type-specific CpG sites in 106 loci and 68 risk genes in 40 loci, including 26 previously unreported loci. Through additional integrative methylation-gene analysis, we prioritized 132 candidate risk genes, the majority of which were supported by multi-omics evidence and stage-specific dysregulation across the adenoma-carcinoma and serrated-carcinoma progression pathways. Pathway enrichment analyses implicated pathways involved in DNA double-strand break repair, TP53 regulation, TGF-{beta} signaling, and innate immune responses. Among prioritized genes, 14 were identified as putative druggable targets linked to 90 FDA-approved or clinical-stage drugs. Experimental validation supports an oncogenic role for SF3A3. These findings demonstrate that deconvolution-informed integrative analyses enable cell-type-resolved identification of epigenetic and transcriptional mechanisms underlying CRC susceptibility and provide insights into disease biology, prevention, and therapeutic target discovery.

05.
arXiv (CS.CV) 2026-06-17

MoonSplat: Monocular Online Gaussian Splatting with Sim(3) Global Optimization

Online 3D reconstruction from monocular image sequences is a challenging and ongoing research topic. 3D Gaussian Splatting (3DGS), leveraging its high-quality real-time rendering capability, empowers online 3D reconstruction to represent dense scenes with enhanced expressiveness, and thus holds great promise for a wide range of applications such as robotics and AR/VR. However, existing online 3DGS methods still suffer from some key challenges: fragile camera pose estimation due to the lack of global optimization, and low optimization efficiency in large-scale or long-sequence scenarios. To address these issues, we propose a robust and efficient online voxelized 3DGS reconstruction framework integrated with global $Sim(3)$ optimization, which enables reliable camera tracking and efficient global loop closure for both camera poses and voxelized 3DGS. To accelerate the convergence of the voxelized 3DGS, we further introduce a color residual learning strategy, which not only boosts optimization speed but also enhances rendering quality. Extensive experiments on diverse indoor and outdoor datasets demonstrate that our method achieves state-of-the-art performance in both camera pose estimation accuracy and rendering quality, while retaining real-time efficiency. Additionally, we develop and deploy a real-world UAV-based active reconstruction system grounded on our proposed method, validating its robustness and generalizability for practical online 3D reconstruction tasks. Our code and data are available at https://github.com/TrickyGo/MoonSplat.

06.
arXiv (CS.CV) 2026-06-17

BrainWorld: A Structural-Prior-Conditioned Generative Model for Whole-Brain 4D fMRI Dynamics

Whole-brain 4D fMRI generation is valuable for modeling functional brain dynamics, yet existing fMRI foundation models mainly target representation learning and downstream prediction rather than conditional predictive generation. We introduce BrainWorld, a structural-prior-conditioned generative model for whole-brain 4D fMRI dynamics. BrainWorld uses sMRI as subject-level anatomical context to guide future fMRI generation, integrating structural information into the denoising process rather than treating it as a parallel modality. Evaluated on 22 datasets spanning diverse cohorts and brain states, BrainWorld generates stable 4D fMRI trajectories up to 400 frames, improves downstream performance through generated-example augmentation, and learns transferable multimodal representations that outperform baselines. Together, these results establish BrainWorld as a condition-aware generative framework for long-horizon brain dynamics modeling and multimodal representation learning.

07.
medRxiv (Medicine) 2026-06-10

Epidemiology of Cervical Precancerous Lesions: Prevalence and Predictors from Pap Smear Screening in Hawassa City Hospitals, Sidama Region, Ethiopia. Institutional-Based Cross-sectional Study

Background: Cervical cancer is the fourth most common cancer in women worldwide and remains a major public health challenge. In Ethiopia, it is the second leading cause of cancer deaths, with around 8,000 new cases and 6,000 deaths each year. Region?specific data on the prevalence and predictors of precancerous lesions remain scarce, yet such information is vital for guiding targeted reproductive health strategies. This study therefore examined the prevalence and predictors of cervical precancerous lesions among women aged 21-60 years undergoing Pap smear screening in public hospitals in Hawassa City, Sidama Region. Methods: An institution-based cross-sectional study was conducted among 241 women attending Pap smear screening at public hospitals in Hawassa City from March to August 2025. Sociodemographic and clinical data were collected via interviews and medical records. Lesions were classified based on the standardized international framework for reporting cervical cytology results from Pap smears per the Bethesda system. Multivariable logistic regression identified predictors p

08.
arXiv (CS.CL) 2026-06-11

When Generic Prompt Improvements Hurt: Evaluation-Driven Iteration for LLM Applications

Evaluating Large Language Model (LLM) applications differs from conventional software testing because outputs are probabilistic, semantically variable, and sensitive to prompt and model changes. This technical report proposes the Minimum Viable Evaluation Suite (MVES), an audit-oriented structure for application-level LLM evaluation. MVES links application categories to failure modes, metrics, required artifacts, and validation evidence across general LLM applications, retrieval-augmented systems, and agentic workflows. We pair the framework with a reproducible local evaluation harness covering structured extraction, RAG citation/content-compliance, and instruction-following checks. Using Ollama with Llama 3 8B Instruct and Qwen 2.5 7B Instruct, we evaluate five prompt conditions over expanded 30-case-per-suite ablations. The results show that, in the tested local conditions, generic prompt additions do not produce monotonic improvements: stronger output-contract prompts improve strict extraction for both models, while RAG citation/content-compliance declines under some generic-rule conditions. The largest observed decline occurs for Qwen 2.5 on RAG when generic rules are appended to the user prompt, from 26/30 to 9/30. These findings support evaluation-driven prompt iteration: prompt changes should be treated as potential regression risks and tested against task-specific suites before deployment. The accompanying repository contains the test suites, prompt variants, evaluation harness, raw result logs, and scripts needed to reproduce the reported local ablations.

09.
arXiv (CS.AI) 2026-06-16

Calibrated Sampling-Free Uncertainty Estimation in Bayesian Deep Learning

arXiv:2606.16214v1 Announce Type: cross Abstract: Modern deep learning models remain notoriously prone to overconfidence, limiting their reliability in high-stakes applications. Bayesian methods aim to counter this by learning a distribution over model parameters, and recent advances now make this feasible for large-scale architectures at costs comparable to AdamW. However, a challenge remains at test time: predictions must be averaged across many forward passes with weights sampled from the posterior, which is prohibitively expensive. Variance propagation offers an efficient alternative, computing layer-wise analytical approximations of uncertainty in a single forward pass. While such techniques are effective for MLPs, their extension to modern architectures remains challenging, due to increased depth and diversity of layer types. To fill this gap, we propose Calibrated Variance Propagation (CVP), which introduces a new propagation method for normalization layers, combines it with recent techniques for handling activation functions, and absorbs residual error through a light calibration step. CVP yields comparably accurate uncertainty estimates to MC sampling across transformers and CNNs, at a fraction of the cost. Against prior variance propagation work, CVP improves coverage at $0.5\%$ risk from $8.2\%$ to $14.6\%$ with BEiT-3 on Visual Reasoning (NLVR2) and from $2.6\%$ to $10.8\%$ with ViLT on VQAv2, with gains extending to convolutional architectures.

10.
arXiv (CS.AI) 2026-06-15

Safety-Contract Graph Multi-Agent Reinforcement Learning for Autonomous Network Security Response

arXiv:2606.13832v1 Announce Type: cross Abstract: Autonomous network-security response systems promise to reduce Security Operations Centre (SOC) reaction latency, but reward-only multi-agent reinforcement learning (MARL) can improve security reward while remaining non-deployable. We present a safety-contract graph MARL framework and instantiate it as ACD$^3$-GAT (Adaptive Constrained Counterfactual Decisioning with a Graph Attention Network encoder), an architecture that separates simulator observations from reusable operational budgets, constrained optimization, graph state encoding, and counterfactual action screening. We evaluate the method in CAGE Challenge 4, where agents operate under budgets for Mean Time to Recover (MTTR), false-positive response, and firewall change-management disruption. Across the benchmark, every unconstrained method violates the SOC downtime budget in 100% of evaluated episodes, with mean downtime proxy costs of 311-430 against a budget of 50. This complements prior CAGE Challenge 4 findings by showing that reward-only learning lacks operational discipline. Constrained MAPPO-GAT (C-MAPPO-GAT) isolates Lagrangian operational-cost control and budget-aware screening, while ACD$^3$-GAT adds budget context, CVaR tail-risk estimation, opponent-belief state, and Graph Counterfactual Risk Propagation (G-CRP). The replicated comparison includes three 200-episode seeds for IPPO, MAPPO-GAT, C-MAPPO-GAT, and ACD$^3$-GAT. C-MAPPO-GAT reduces downtime violation from 100% to 0.3% and mean downtime cost from 355.4 to 15.5 relative to MAPPO-GAT. ACD$^3$-GAT reduces mean downtime cost to 48.2 with a 13.8% violation rate, placing it on the safety-contract frontier rather than at the most conservative compliance point. Topology-seed and coupled adaptive Red-process stress tests preserve this contrast and show lower worst adaptive degradation for safety-constrained policies than reward-only MAPPO-GAT.

11.
arXiv (CS.LG) 2026-06-18

PRISM: A 3D Probabilistic Neural Representation for Interpretable Shape Modeling

arXiv:2602.11467v2 Announce Type: replace Abstract: Understanding how anatomical shapes evolve in response to developmental covariates - and quantifying their spatially varying uncertainties - is critical in healthcare research. Existing approaches typically rely on global time-warping formulations that ignore spatially heterogeneous dynamics. We introduce PRISM, a novel framework that bridges implicit neural representations with uncertainty-aware statistical shape analysis. PRISM models the conditional distribution of shapes given covariates, providing spatially continuous estimates of both the population mean and covariate-dependent uncertainty at arbitrary locations. A key theoretical contribution is a closed-form Fisher Information metric that enables efficient, analytically tractable local temporal uncertainty quantification via automatic differentiation. Experiments on three synthetic datasets and one clinical dataset demonstrate PRISM's strong performance across diverse tasks - from modeling shape evolution to personalized shape prediction and anomaly detection - within a unified framework, while providing interpretable and clinically meaningful uncertainty estimates.

12.
arXiv (CS.AI) 2026-06-11

Position: Hippocampal Explicit Memory Is the Cornerstone for AGI

作者:

arXiv:2606.11245v1 Announce Type: new Abstract: Large Language Models (LLMs) have demonstrated remarkable capabilities across various tasks, raising expectations for Artificial General Intelligence (AGI). This position paper argues that integrating explicit memory is the cornerstone for advancing LLMs toward AGI. The key reason is that the underlying learning mechanism of LLMs is highly analogous to human implicit memory. However, higher-order cognitive functions necessary for AGI, such as long-term strategic planning, metacognition, and symbolic reasoning, heavily rely on hippocampal explicit memory and cannot arise solely from implicit statistical learning. Drawing on findings from neuroscience, I advance this perspective and complement it with computational requirements for artificial explicit memory systems, hoping to foster further research and lay the groundwork for explicit memory integration.

13.
arXiv (CS.CL) 2026-06-18

Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning

Long-context reasoning is an essential capability for large language models, particularly when they are deployed as autonomous agents that must reason over lengthy trajectories. Reinforcement learning (RL) has recently emerged as a dominant paradigm for improving this ability, yet existing work largely focuses on reward engineering while diverse training data remains scarce. We revisit this problem from a data-centric perspective and show that a simple yet effective data recipe alone, paired with a minimal outcome-based GRPO setup, suffices to substantially improve long-context reasoning. Our recipe targets three complementary task families – retrieval, multi-evidence synthesis, and reasoning – for which we construct and curate eight datasets totaling ~14K examples. Experiments on three models (Qwen3-4B/8B/30B-A3B) yield average gains of +7.2/+3.2/+6.4 points across seven long-context benchmarks, surpassing prior RL training sets. We further demonstrate that these gains transfer to agentic tasks, where continuing RL training on an agent-tuned model with our data recipe improves GAIA by +4.8 and BrowseComp by +7.0 points. We will release our datasets to facilitate future research.

14.
medRxiv (Medicine) 2026-06-10

A Three-Tier Operational Benchmark for Evaluating Large Language Models on Hospital Medication Safety

Objective. To introduce PsiBench, a clinically validated medication-safety benchmark for evaluating large language models (LLMs) against the standards used to certify hospital computerized provider order entry (CPOE) and electronic health record (EHR) systems, and a non-overlapping three-tier evaluation framework separating highest-stakes discrimination, the operational CDS regime, and category-correct alerting. Materials and Methods. PsiBench comprises 492 medication-safety scenarios across 11 safety categories, created by clinical pharmacology experts whose work underpins an annualized testing procedure used by more than 2,000 U.S. hospitals. The three-tier framework partitions the scenarios non-overlappingly: Discrimination (98 scenarios, 50 fatal vs 48 deception, near-balanced 51%/49%); Operational (394 scenarios, 261 serious unsafe plus 133 safe including 41 Excessive Alerts reclassified as operational negatives); and Attribution (311 alert-required scenarios). We evaluated 40 frontier LLMs from 10 providers over 3 runs per scenario at temperature 0.2 (or the provider default where temperature is not configurable), yielding 59,040 evaluations conducted April 21-23, 2026. Results. Headline binary performance on the full benchmark spans a wide range across the 40 models: F1 78.5%-92.3%, accuracy 65.4%-89.8%, sensitivity 81.4%-100.0%, specificity 6.1%-81.8%. Leading models by F1 (o4-mini 92.3%; o3 92.2%) pair high sensitivity with meaningful specificity; three models saturate sensitivity at 100% but fall below 25% specificity, indistinguishable from a naive always-alert classifier. The wide spread on a single headline metric motivates tier-specific analyses, developed in a separate clinical paper. Discussion and Conclusion. PsiBench and the three-tier framework operationalize a rigorous evaluation rubric for LLM medication safety, grounded in two decades of national hospital audit experience. The framework generalizes to any binary medication-safety classifier (rule-based, conventional ML, or LLM-driven), supporting tier-aware model selection and post-deployment surveillance.

15.
arXiv (CS.AI) 2026-06-16

MBABench: Evaluating LLM Agents on End-to-End Spreadsheet Tasks in Finance

arXiv:2605.22664v3 Announce Type: replace Abstract: LLM agents are increasingly expected to carry out end-to-end workflows, producing complete artifacts from high-level user instructions. To meet enterprise needs, frontier AI labs have developed agents that can construct entire spreadsheets from scratch. This is especially relevant in finance, where core workflows such as financial modeling, forecasting, and scenario analysis are commonly conducted through spreadsheets. Yet, existing spreadsheet benchmarks do not measure this advanced capability, focusing instead on question-answering or single-formula edits. To address this gap, we provide one of the first evaluations of agents on end-to-end spreadsheet tasks, focusing on economically critical financial workflows such as modeling and scenario analysis. Since deliverables therein are routinely reviewed and revised by multiple stakeholders, judging their quality necessarily involves high-level criteria such as readability or ease of modification. To reflect the multidimensional nature of solution quality, we develop an evaluation taxonomy comprising three dimensions: Accuracy, Formula, and Format, each comprising fine-grained criteria that reflect professional standards. The Claude family leads the benchmark and produces the most professional-looking outputs in our qualitative review, but even the strongest agents frequently fall short of professional finance standards and degrade sharply as the difficulty increases beyond a few chained calculations. This suggests that current agents are not yet able to reliably produce professional-quality spreadsheets at the level of complexity real-world workflows demand.

16.
arXiv (CS.CV) 2026-06-16

Tool-IQA: Augmenting Image Quality Assessment with Simple Tools

Vision-Language Models (VLMs) have been increasingly adopted for Image Quality Assessment (IQA). However, current methods typically employ a static one-shot scoring paradigm, despite the fact that humans assess image quality through dynamic visual inspection, e.g., selectively adjusting views to verify details and subtle artifacts. Specifically, relying solely on a single-pass observation introduces two primary limitations: first, perceiving the image only at a global scale restricts the assessment of finer local details; second, the original intensity distribution of the image may overwhelm the visibility, leading to insufficient inspection of image quality. To address these issues, we propose Tool-IQA, shifting the assessment mechanism from passive scoring to a tool-augmented workflow. In particular, we equip VLMs with simple yet effective view tools: a Magnifier to inspect local details, and a Gamma Corrector to uncover visibility and hidden artifacts. The assessment follows a structured pipeline that consists of an initial observation with rubric notes, a tool-augmented in-depth inspection, and a final quantification for calibrated quality score. Furthermore, to ensure efficient and purposeful tool callings, we introduce a batch-aware training strategy to reward tool interactions that can yield positive contributions rather than simply encouraging usage. Experiments on a variety of IQA benchmarks demonstrate that, with effective tool calling and calibrated assessment, our proposed Tool-IQA significantly outperforms existing state-of-the-art models, e.g., it achieves a PLCC of 0.854 on the challenging CLIVE dataset.

17.
arXiv (CS.CV) 2026-06-17

ReAge3D: Re-Aging 3D Faces with View Consistency

We present a novel framework for realistic and controllable 3D face re-aging which produces highly detailed, identity-preserving results. Existing 3D editing methods, while effective for coarse semantic changes, are not well suited for re-aging, as even small inconsistencies across re-aged 2D views can lead to over-smoothing of subtle but perceptually important age-related details. To address this challenge, we first introduce a 2D diffusion-based re-aging model, DiffReaging, trained on synthetically generated image pairs. We further propose a center-out editing propagation strategy that leverages this re-aging model to reconstruct multi-view-consistent re-aged images. Specifically, starting from a re-aged frontal pivot view, we reconstruct the remaining views through warping and our proposed Masked-DiffReaging process. By injecting existing content at every step of the diffusion process, Masked-DiffReaging ensures that the reconstructed regions remain coherent with existing pixels. The resulting consistent set of re-aged views supervises the optimization of the re-aged 3D representation. Our method outperforms existing 3D editing techniques both visually and quantitatively, enabling smooth, fine-grained control over age transformations in 3D face models.

18.
medRxiv (Medicine) 2026-06-15

Shortened blastocyst vitrification achieves live birth rates comparable to standard protocols: an analysis of 3168 cryotransfers

Study question Do shortened blastocyst vitrification and warming protocols provide comparable live birth rates (LBR) and obstetrical and perinatal outcomes to traditional vitrification and warming protocols? Summary answer Shortened vitrification and warming protocols provide comparable LBR, obstetric and perinatal outcomes to traditional protocols. Shortened vitrification coupled with traditional multi step warming benefitted women >35yrs. What is known already Embryo viability following cryopreservation is dependent on blastomere survival and functional integrity, both impacted by ice crystal formation and osmotic gradients. Recent innovations in cryopreservation challenge the need for stepwise dehydration and rehydration protocols. While one step ''fast'' blastocyst warming protocols seem to provide equivalent clinical outcomes to traditional ''slow'' protocols, fewer studies investigate whether blastocyst dehydration rates can be similarly increased. A thorough safety and effectiveness evaluation remains necessary for both treatment success and offspring health. Study design, size, duration Three clinics within a network participated in this retrospective consecutive cohort study, with cycle data collected for 3603 warmed blastocysts resulting in 3168 frozen blastocyst transfers in 2170 patients between 2023 and 2025. We modelled the relationship between ''fast'' versus ''slow'' protocols and outcomes with Generalized Additive Models, and linear and logistic regressions where appropriate. Two tailed chi square with Yates correction was used to examine pregnancy loss and obstetrical and perinatal outcomes; p0.05). Importantly, women 35yrs or older at vitrification (n=1715 transfers) profited from a F/S strategy, which provided a significant increase in live birth rates (OR:1.42 [1.02-1.98] p=0.038) compared to S/S. The same improved live birth following a F/S strategy were also seen in embryos of lower quality (OR:1.78 [1.12-2.83] p=0.015), suggesting of a protective effect of this cryopreservation strategy on the developmental competence of impaired germplasm. Limitations, reasons for caution Factors affecting the results may be unaccounted for by the study retrospective nature. Wider implication of the findings Overall, shortened, ''faster'' vitrification and warming protocols provide comparable reproductive outcomes to traditional ones. The combination of shorter exposure to cryoprotectant (CPA) during vitrification and stepwise osmotic gradient during warming provided significant clinical benefits specifically to patients >35 and lower quality embryos, pointing to the possibility of adapting vitrification protocols to specific patients populations and optimizing their clinical outcomes.

19.
arXiv (quant-ph) 2026-06-12

Coupling-Grouped XY-QAOA for Joint Anomaly-Feature Selection

arXiv:2606.13244v1 Announce Type: new Abstract: Selecting anomalous samples and explanatory features under fixed budgets defines a coupled constrained-optimization problem. Sequential feature-first selection ranks features before choosing samples, which can overlook features whose utility depends on which samples are selected, especially when scores are calibrated from reference data that may be limited, noisy, or drifting. We instead formulate the task as joint sample-feature selection under the same fixed counts. In the analyzed formal model, calibration-error sensitivity grows linearly with the number of samples for feature-first ordering but stays constant for joint selection. We introduce Coupling-Grouped XY-QAOA, a constraint-preserving grouped-angle variant for the resulting optimization problem. On matched sparse IBM Heron R3 benchmarks, a hardware-aware implementation reduces circuit depth by 45.9%-61.3% and two-qubit gates by 2.6%-5.2% relative to Qiskit optimization level 3 on the CZ-basis target. It enables, to our knowledge, the largest reported width-depth configurations for constraint-preserving bipartite-selection QAOA hardware executions with feasible-sector retention: 64 qubits at p=2 and 36 qubits at p=3. The 20-qubit p=5 runs retain 63% valid samples. Across 36-64 qubits, fixed-angle runs yield lower-energy feasible samples than matched random-feasible sampling. Warm starts reduce the gap to strict-feasible classical references by 57.5%-80.5%, and near-budget repair matches the sparse classical reference at 36 qubits. Benchmarks show gains in balanced fixed-budget regimes, and noiseless simulations show that problem-structured angle grouping improves over same-depth XY-QAOA and matched-parameter, type-preserving randomization controls. Overall, the results support calibrated joint selection and hardware-realizable constrained-mixer execution in the tested regimes.

20.
arXiv (CS.CV) 2026-06-16

3D Classification of Paramagnetic Rim Lesions in Multiple Sclerosis via Asymmetric QSM-FLAIR Modeling

Paramagnetic rim lesions (Rim$^+$) identified on susceptibility-sensitive MRI have recently emerged as a specific biomarker of chronic active inflammation in Multiple Sclerosis (MS) and are associated with long-term disability progression. However, susceptibility imaging and expert interpretation remain limited to specialized centers, visual assessment is time-consuming and variable, and the low prevalence of Rim$^+$ lesions poses severe class imbalance challenges for automated analysis. We propose a 3D multimodal deep learning framework for lesion-level Rim$^+$/Rim$^-$ classification from Quantitative Susceptibility Mapping (QSM) and FLAIR MRI. The architecture explicitly models modality asymmetry by treating QSM as the primary susceptibility-driven signal and conditioning it with FLAIR-derived structural context. To improve robustness under limited data, we employ self-supervised multimodal pretraining followed by supervised fine-tuning with contrastive regularization. The method was evaluated on a clinically acquired cohort of 88 people with MS with expert lesion annotations as reference standard. Results highlight improved performance compared to prior architectures, supporting the effectiveness of asymmetric multimodal modeling for automated chronic active lesion identification.

21.
arXiv (quant-ph) 2026-06-19

The use of Peres lattices in periodically driven systems

arXiv:2606.20009v1 Announce Type: new Abstract: We demonstrate the strength of the method of Peres lattices in periodically driven quantum systems. The method, which has previously been used mostly in stationary systems, enables us to efficiently detect resonances in the driven system, to monitor the onset of chaos, and to recognize critical properties of the Floquet modes. It also allows quick comparisons of the spectra of Floquet modes for various driving Hamiltonians and transparent tests of the iterative approximation techniques based on effective stationary Hamiltonians.

22.
arXiv (CS.CV) 2026-06-15

MCR-VQGAN: A Scalable and Cost-Effective Tau PET Synthesis Approach for Alzheimer's Disease Imaging

Tau positron emission tomography (PET) is a critical diagnostic modality for Alzheimer's disease (AD), but its widespread clinical adoption is hindered by radiation exposure, limited availability, high clinical workload, and substantial financial costs. To address these limitations, we propose the Multi-scale CBAM Residual Vector Quantized Generative Adversarial Network (MCR-VQGAN) to synthesize high-fidelity tau PET images from structural T1-weighted MRI. MCR-VQGAN advances the standard VQGAN architecture through three enhancements: multi-scale convolutions, ResNet blocks, and Convolutional Block Attention Modules (CBAM), which collectively improve the capture of local and global features. Using 222 paired T1-weighted MRI and tau PET scans from the ADNI database, we trained and compared MCR-VQGAN against cGAN, WGAN-GP, CycleGAN, and baseline VQGAN. MCR-VQGAN achieved superior image synthesis performance across all metrics (MSE = 0.0056 +/- 0.0061, PSNR = 30.65 +/- 4.47 dB, SSIM = 0.9263 +/- 0.0469). A CNN-based AD classifier trained on real tau PET achieved comparable accuracy on real (63.64%) and synthetic (65.91%) images, indicating that diagnostically relevant features are preserved. Regional SUVR-equivalent analysis across Braak-defined ROIs further indicated strong agreement between real and synthetic tau PET (Pearson r = 0.78-0.88; ICC = 0.71-0.84), with the strongest agreement in Braak V/VI (ICC = 0.838). Together, these results suggest that MCR-VQGAN offers a promising and scalable surrogate for conventional tau PET imaging, potentially improving the accessibility of tau biomarkers for AD research and clinical workflows.

23.
arXiv (CS.CL) 2026-06-15

Independent-Component-Based Encoding Models of Brain Activity During Story Comprehension

Encoding models provide a powerful framework for linking continuous stimulus features to neural activity; however, traditional voxelwise approaches are limited by measurement noise, inter-subject variability, and redundancy arising from spatially correlated voxels encoding overlapping neural signals. Here, we propose an independent component (IC)-based encoding framework that dissociates stimulus-driven and noise-driven signals in fMRI data. We decompose continuous fMRI data from naturalistic story listening into ICs using one subset of the data, and train encoding models on independent data to predict IC time series from large language model representations of linguistic input. Across subjects, a subset of ICs exhibited consistently high predictivity. These ICs were spatially and temporally consistent across subjects and included cognitive networks known to respond during story listening (auditory and language). Auditory component time series were strongly correlated with acoustic stimulus features, highlighting the interpretability of identified component time series. Components identified as noise or motion-related artifacts by ICA-AROMA showed uniformly poor predictive performance, confirming that highly predicted components reflect genuine stimulus-related neural signals rather than confounds. Overall, IC-based encoding models enable analyses at the level of functional networks, accommodating the variability in network locations across individuals and providing interpretable results that are easy to compare across subjects. Code provided at: https://github.com/kamyahari/IC-Encoding-Models.git

24.
arXiv (CS.AI) 2026-06-18

Efficient Zeroth-Order Federated Finetuning of Language Models on Resource-Constrained Devices

arXiv:2502.10239v3 Announce Type: replace-cross Abstract: Federated Learning (FL) is a promising paradigm for finetuning Large Language Models (LLMs) across distributed data sources while preserving data privacy. However, finetuning such large models is challenging on edge devices due to its high resource demand. Zeroth-order Optimization (ZO) estimates gradients through finite-difference approximations, which rely on function evaluations under random perturbations of the model parameters. Consequently, ZO with task alignment provides a potential solution, allowing finetuning using only forward passes with inference-level memory requirements and low communication overhead, but it suffers from slow convergence and higher computational demand. In this paper, we propose a new ZO-based method that applies a more efficient technique to reduce the computational demand associated with using a large number of perturbations while preserving their convergence benefits. This is achieved by splitting the model into consecutive blocks and allocating a higher number of perturbations to the second block, enabling efficient reuse of intermediate activations to update the full network with fewer forward evaluations. Our evaluation on RoBERTa-large, OPT1.3B, LLaMa-3-3.2B models shows up to $3\times$ reduction in computation compared to the other ZO-based techniques, while retaining the memory and communication benefits over first-order federated learning techniques.

25.
arXiv (CS.CL) 2026-06-18

Dual Dimensionality for Local and Global Attention

Decoder-only Transformers compute attention over the KV cache of preceding tokens. Keys (and Values) are typically represented with the same dimensionality, regardless of its distance from the prediction target. In natural language, however, the next word is most strongly influenced by the immediately preceding tokens. We hypothesize that local and distant tokens impose asymmetric demands on representational capacity: local tokens are more critical for predicting immediate outputs and thus require richer representations, whereas distant tokens primarily serve as long-range memory, for which lower-dimensional representations may suffice. We formalize this idea as Distance-Adaptive Representation (DAR), implemented in a controlled setting that preserves full-dimensional representations within a local context window while assigning reduced-dimensional representations (e.g. 1/4 of the original dimensionality) to tokens beyond that window. Across multiple pretraining scales (70M to 410M parameters), as well as continued supervised fine-tuning on a 1B-scale model, this approach closely matches the performance of full-dimensional baselines. In contrast, uniformly reducing dimensionality across all token positions leads to worse performance. These results challenge the common assumption that key and value dimensionality should be uniform across token positions. Our findings suggest a new direction for designing attention architectures that adaptively allocate representational capacity across sequences, enabling further reductions in KV cache during inference.