Academic Intelligence · Curated Daily

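Explore the Frontiers of Global Scholarship

AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefings.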

01.
medRxiv (Medicine) 2026-06-12

Room-Specialized Mixture-of-Experts for In-Home ADL Recognition with Ambient Sensors

Monitoring activities of daily living (ADLs) in the home is a promising approach for tracking dementia progression in older adults. While ambient sensor-based ADL systems are well-studied, most existing ADL recognition systems rely on globally trained models that ignore the spatial organization of in-home activities. In real deployments, where training data are sparse and highly home-specific, global transformer models may fail to capture room-dependent behavioral structure. We propose a deterministic Mixture of Experts (MoE) architecture for in-home ADL recognition, in which each expert is a compact transformer specialized to one room of the home (bedroom, kitchen, bathroom, living area). Input segments are routed using a deterministic gating strategy based on room-level motion activity and time-of-day priors for sleep-related behaviors. Unlike learned routing networks, the proposed gate encodes domain knowledge about where ADLs are likely to occur, reducing model complexity under limited per-home training data. By decomposing ADL recognition into room-specific activity spaces, the proposed architecture reduces competition between dominant and low-frequency activities under highly imbalanced residential data. We evaluated the system on data collected via low-cost ambient sensors (motion, light, temperature, humidity) and Raspberry Pi edge devices across five homes, with ground-truth ADL labels provided by participants and caregivers. Across the five homes, the proposed MoE consistently outperformed global transformer, 1D CNN, and Random Forest baselines, achieving macro-F1 scores ranging from 0.60 to 0.88, highlighting the importance of home-specific modeling in real-world deployments. These findings suggest that room-aware expert specialization may provide a practical and interpretable strategy for low-data ADL recognition in real-world residential environments.
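The deterministic routing described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the function name `route_segment`, the 22:00 to 06:00 night window, the bedroom fallback, and the tie-breaking order are all hypothetical choices made for the example.

```python
from datetime import time

# Room experts named in the abstract; the identifiers themselves are assumptions.
ROOMS = ("bedroom", "kitchen", "bathroom", "living_area")


def route_segment(motion_counts, segment_time,
                  night_start=time(22, 0), night_end=time(6, 0)):
    """Pick one room expert for a sensor segment using fixed rules only.

    motion_counts: dict mapping room name -> motion events in the segment.
    segment_time:  datetime.time at the segment's midpoint.
    The night window and the bedroom fallback are hypothetical priors.
    """
    is_night = segment_time >= night_start or segment_time < night_end
    total_motion = sum(motion_counts.get(room, 0) for room in ROOMS)

    # Time-of-day prior: a motionless night-time segment is treated as
    # sleep-related and sent to the bedroom expert.
    if is_night and total_motion == 0:
        return "bedroom"

    # Otherwise route to the room with the most motion activity.
    # max() returns the first maximal item, so ties resolve in ROOMS order.
    return max(ROOMS, key=lambda room: motion_counts.get(room, 0))
```

Because the gate has no trainable parameters, the only per-home learning happens inside each room's expert, which is the property the abstract credits for reduced model complexity under sparse data.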

02.
bioRxiv (Bioinfo) 2026-06-14

Robust integration of weakly anchored spatial multi-omics

Spatial multi-omics holds great promise for dissecting complex biological processes, though inherent technical constraints continue to limit its widespread adoption. Currently, most studies therefore measure distinct omics features on separate tissue sections, necessitating spatial diagonal integration. An emerging practical solution is to leverage hematoxylin and eosin (H&E) images as an integration anchor, given their ubiquity, low cost, and compatibility across tissue preparations. However, this anchor is frequently compromised in real-world settings by variations in H&E staining style, absence of reliable histological landmarks, and mismatches in spatial resolutions across omics modalities. To address this, we introduce SpaWeaver, a computational framework that couples a pathology foundation model with a graph Transformer and a latent feature aligner module, providing a highly robust solution for weakly anchored spatial omics data diagonal integration. Extensive experiments demonstrate that SpaWeaver exhibits superior robustness against isolated or synergistic weak-anchoring factors. The spatial multi-omics profiles generated by SpaWeaver link molecular features originally separated on two sections, unlocking diverse downstream analyses once exclusive to co-assayed spatial multi-omics data, including niche-aware cell-cell communication inference and multi-omics resolved cell state. In this study, it unveils tumor-distance-dependent fibroblast-CD4+ T-cell signaling in human colon adenocarcinoma and identifies a hypoxic glycolytic tumor state with pyknotic nuclei in human ovarian cancer. Overall, our approach bridges readily accessible single-omics measurements across weakly anchored tissue sections, enabling unified spatial multi-omics characterization and system-level tissue analysis.

03.
arXiv (quant-ph) 2026-06-16

Optimal Toffoli-Depth Multi-Controlled Toffoli Decomposition in 2D Qubit Layout

arXiv:2606.15113v1 Announce Type: new Abstract: The multi-controlled Toffoli (MCT) gate is a key primitive in quantum arithmetic, oracle construction, and quantum cryptanalysis. Although recent work has established optimal Toffoli-depth MCT decompositions under all-to-all qubit connectivity, their realization on near-term quantum hardware with restricted qubit connectivity remains largely unexplored. While general-purpose quantum mappers can route arbitrary circuits, they do not explicitly exploit the repeated interaction patterns inherent in MCT decompositions. In this paper, we study architecture-aware mappings of optimal Toffoli-depth MCT decompositions onto restricted two-dimensional qubit layouts. We begin with structured geometric placements that preserve the parallelism of state-of-the-art Toffoli and MCT decompositions with no additional depth overhead. We further introduce a motif-based packing framework in which decomposition layers are represented by interaction motifs derived from basic Toffoli gates. By embedding these motifs vertex-disjointly into hardware graphs, we characterize the minimum-size topologies supporting the required qubit resources and derive explicit bounds on the resulting depth overhead under tight qubit budgets. Finally, we compare these bounds with routing-aware placement heuristics and empirically evaluate the effectiveness of embedding different motifs across a range of hardware topologies.

04.
arXiv (CS.LG) 2026-06-11

Machine-learning-based multipoint optimization of fluidic injection parameters for improving nozzle performance

arXiv:2409.12707v2 Announce Type: replace-cross Abstract: Fluidic injection offers a promising solution to improve the performance of the overexpanded single expansion ramp nozzles (SERNs) during vehicle acceleration. However, determining the injection parameters that yield the best overall performance across multiple nozzle operating conditions remains a challenge. The gradient-based optimization method requires gradients of injection parameters at each design point, which can lead to high computational costs when using computational fluid dynamics (CFD) simulations. This paper uses a pretrained neural network to replace CFD during optimization, enabling quick calculation of the nozzle flow field at multiple design points. Considering the physical characteristics of the nozzle flow field, a prior-based prediction strategy is adopted to enhance the model's accuracy. In addition, the neural network's back-propagation algorithm computes gradients quickly by running the computation only once, thereby greatly reducing gradient computation time compared to the finite difference method. As a test case, the average nozzle thrust coefficient of an SERN at seven design points is optimized, resulting in a 1.14% improvement. The time cost is greatly reduced compared with traditional optimization methods, even when the time required to establish the training database is included.

05.
medRxiv (Medicine) 2026-06-22

Understanding and Usefulness of Effect Size and Certainty of Evidence: A Cross-sectional Survey of Evidence-Based Practice Competencies Among Registered Dietitians

Introduction: Understanding of absolute and relative estimates (i.e., effect size), and certainty of evidence corresponding to those estimates, is a fundamental evidence-based practice competency to promote informed clinical decision-making. While research has been conducted in the medical profession, there is no published research on these competencies in the nutrition and dietetics profession. Methods: Among registered dietitians, our main objectives were to assess (1) their understanding and perceived usefulness of three absolute and two relative estimate approaches to assess effect size, (2) their perceived usefulness of certainty of evidence, and (3) factors influencing their understanding and perceived usefulness. We conducted a web-based, cross-sectional survey among dietitians recruited from the Academy of Nutrition and Dietetics (United States). Participants received effect estimates based on hypothetical dietary interventions vs. usual diet for reducing myocardial infarction risk. Results: Of the 11,050 dietitians who received the survey link, 210 participated (2.0% response rate), and only completers (n=114) were included in the analysis. Participants demonstrated a similar understanding of the relative (27.6%) and absolute (27.5%) estimates, with Risk Difference (30.7% correct responses) being the best understood approach and Number Needed to Treat (24.6%) being the least. Understanding of the five approaches did not differ from random guessing (p>0.05). While perceived usefulness scores were similar across the five approaches, they were highest when data were presented as Relative Risk [mean (SD): 4.82 (1.50)]. Dietitians rated the usefulness of certainty of evidence favorably [mean (SD): 5.07 (1.83), on a 7-point scale], and no factors were associated with correct understanding. Conclusion: Dietitians may have limited understanding of how to interpret effect sizes, a finding consistent with surveys of other health professionals. To optimize informed decision-making between dietitians and clients, dietetic programs and continuing education platforms should consider additional training on interpreting effect sizes and certainty of evidence for effect sizes.
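For readers unfamiliar with the estimates named in this abstract, the sketch below computes them from two event risks using their standard textbook definitions. The function name `effect_sizes` and the example risks are hypothetical, and relative risk reduction is included as an assumption, since the abstract does not list all five approaches used in the survey.

```python
def effect_sizes(risk_control, risk_treated):
    """Return common absolute and relative effect estimates.

    Both arguments are event risks expressed as proportions in (0, 1].
    """
    if risk_control <= 0:
        raise ValueError("risk_control must be positive")

    # Absolute estimate: how many fewer events per person treated.
    risk_difference = risk_control - risk_treated
    # Relative estimates: risk in the treated group as a fraction of control.
    relative_risk = risk_treated / risk_control
    relative_risk_reduction = 1.0 - relative_risk
    # Number needed to treat is the reciprocal of the risk difference and is
    # undefined (reported here as infinity) when the two risks are equal.
    if risk_difference != 0:
        number_needed_to_treat = 1.0 / risk_difference
    else:
        number_needed_to_treat = float("inf")

    return {
        "risk_difference": risk_difference,
        "relative_risk": relative_risk,
        "relative_risk_reduction": relative_risk_reduction,
        "number_needed_to_treat": number_needed_to_treat,
    }


# Hypothetical risks, not taken from the survey: 10% with usual diet, 8% with intervention.
example = effect_sizes(0.10, 0.08)
```

With these made-up risks the same effect reads as a risk difference of 2 percentage points, a relative risk of 0.80, or a number needed to treat of 50, which illustrates why the presentation format can change how large an effect appears.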

06.
arXiv (CS.CV) 2026-06-11

CellNet – Localizing Cells using Sparse and Noisy Point Annotations

Counting living cells is an important step in many biological research workflows. Our collaborators at the Wellcome Sanger Institute study vital genes in humans via large-scale saturation genome editing screening, which requires counting cells a great number of times. Computer-vision-based automation is crucial for high throughput and resource efficiency. In this work, we develop a regression-based deep learning computer vision algorithm to detect and count cells in phase-contrast microscopy images. To reduce annotation effort, which in practice often becomes a bottleneck, we focus on counting cells using only sparse point annotations, which are fast and easy to acquire. By comparison to state-of-the-art zero-shot methods, we show that regression-based counting is a promising alternative in low-data regimes. Through developing methods to automatically count living cells in microscopy images, we contribute to valuable research on the human genome. The code is available at https://github.com/beijn/cellnet.

07.
Nature (Science) 2026-06-08

Targeting Cancer-Specific Mutations with RNA-Triggered Chromatin Shredding

Genetic mutations that drive cancer often occur in tumor suppressor proteins, including the p53 transcription factor, which is altered in ~40-50% of cases [1,2]. However, current therapies fail to target most such mutations because the mutant proteins typically lack defined drug-binding pockets, and restoring the endogenous function has proven challenging. Here, we programmed CRISPR-Cas12a2, an RNA-guided nuclease with trans-nucleolytic cleavage activities [3,4], to selectively kill cancer cells by targeting cancer-specific transcripts. This approach limits cell growth by inducing trans shredding of chromatin, triggering DNA damage responses and cell death. Unlike existing methods, RNA-guided Cas12a2 senses cellular RNA signatures, enabling precise targeting of undruggable mutations. Transcript-activated chromatin shredding provides a new approach to precision disease treatments for undruggable targets.

08.
arXiv (CS.CV) 2026-06-15

CaricHarmony: Contrastive Diffusion Paths for Identity-Preserving Caricature Synthesis

Sketch-based caricature synthesis suffers from a fundamental failure mode: when identity and shape conditions are combined in diffusion models, they create destructive interference that causes inevitable collapse toward either bland portraits or unrecognizable distortions. We identify the root cause as condition signal contamination – competing probability distributions in the denoising trajectory that make balanced generation impossible. We present CaricHarmony, the first training-free method that explicitly resolves this contamination through parallel uncontaminated diffusion paths. During inference, we maintain three paths: $\mathcal{P}^{\mathrm{i}}$ (pure identity), $\mathcal{P}^{\mathrm{s}}$ (pure shape), and $\mathcal{P}^{\mathrm{i+s}}$ (harmonized output). Novel energy functions operating on cross-attention features provide gradient guidance that steers $\mathcal{P}^{\mathrm{i+s}}$ toward optimal balance: $\mathcal{E}_{\mathrm{shape}}$ ensures sketch fidelity through layout and semantic alignment, while $\mathcal{E}_{\mathrm{id}}$ employs token-level correspondence matching robust to extreme distortions. Unlike DemoCaricature requiring 70 seconds per-identity fine-tuning or CaricatureBooth constrained to Bezier curves, CaricHarmony accepts any sketch format and generates in under 16 seconds. Experiments demonstrate state-of-the-art performance: 0.8615 shape CLIP score (vs. 0.8450) under comparable identity consistency score, with 7.81 overall user preference score (vs. 6.06). Our method fundamentally reconceptualizes the ID-shape conflict as conditioning signal contamination for diffusion models, enabling unprecedented creative control while preserving recognition.

09.
arXiv (CS.AI) 2026-06-16

PISA: A Pragmatic Psych-Inspired Unified Memory System for Enhanced AI Agency

arXiv:2510.15966v2 Announce Type: replace Abstract: Memory systems are fundamental to AI agents, yet existing work often lacks adaptability to diverse tasks and overlooks the constructive and task-oriented role of AI agent memory. Drawing from Piaget's theory of cognitive development, we propose PISA, a pragmatic, psych-inspired unified memory system that addresses these limitations by treating memory as a constructive and adaptive process. To enable continuous learning and adaptability, PISA introduces a trimodal adaptation mechanism (i.e., schema updation, schema evolution, and schema creation) that preserves coherent organization while supporting flexible memory updates. Building on these schema-grounded structures, we further design a hybrid memory access architecture that seamlessly integrates symbolic reasoning with neural retrieval, significantly improving retrieval accuracy and efficiency. Our empirical evaluation, conducted on the existing LOCOMO benchmark and our newly proposed AggQA benchmark for data analysis tasks, confirms that PISA sets a new state-of-the-art by significantly enhancing adaptability and long-term knowledge retention.

10.
arXiv (CS.LG) 2026-06-17

ReRAM-aware Model Finetuning addressing I-V Non-linearity and Retention Errors

arXiv:2606.17471v1 Announce Type: new Abstract: Traditional CPU, GPU, and NPU architectures are increasingly limited by the von Neumann bottleneck. While In-Memory Computing (IMC) using ReRAM crossbar arrays offers a high-density, energy-efficient alternative, its practical deployment is constrained by its non-idealities. Existing hardware-aware training frameworks often require training from scratch, which is computationally prohibitive for modern large-scale models. In this work, we propose a finetuning-based hardware-aware training algorithm that enables robust DNN deployment on ReRAM with minimal training overhead. Our approach mitigates I-V non-linearity by applying a range-shrunk sinh transformation and incorporates retention errors directly into a regularization loss during the finetuning process. We evaluate our framework across models and tasks such as image classification and question-answering (QA). Experimental results demonstrate that our method achieves accuracy similar to that of the base model on large-scale models like ResNet18 and DeiT-Tiny. For the MobileNetV3 family on ImageNet, the technique incurs less than 2% accuracy degradation. Further, applying the technique to the SQuAD v2 dataset results in only a 1-point degradation in F1 score.

11.
arXiv (CS.CL) 2026-06-17

Environment-Grounded Automated Prompt Optimization for LLM Game Agents

LLM agents in interactive environments are highly sensitive to their prompts, yet prompt engineering remains a manual, task-specific process. We introduce an automated prompt optimization framework for LLM agents that decomposes the observation-to-action pipeline into a goal-conditioned descriptor agent and an action selection agent, and iteratively refines each module's prompt through an LLM-driven evolutionary loop guided by environment returns. We propose a behavior analyzer to attribute episode outcomes to specific prompt components, and a mutator to propose targeted revisions to the prompt, before validating them through environment rollouts. We evaluate on all five BabyAI tasks in the BALROG benchmark, comparing our pipeline against BALROG's RobustCoTAgent under both plain and guided prompt initializations. Optimization improves performance consistently across tasks and conditions, without requiring updates to the model weights. On PutNext, a multi-step coordination task where the RobustCoTAgent achieves 0% success, our framework reaches up to 72.5% success rate using the same underlying LLM with optimized prompts. These results suggest that a multi-agent framework, combined with automatic prompt optimization, enhances LLMs without the need for fine-tuning or extensive human supervision.

12.
arXiv (CS.CV) 2026-06-16

SUP-MCRL: Subject-aware Unified Pseudo-feature Coded Multimodal Contrastive Representation Learning for EEG Visual Decoding

Non-invasive brain-computer interfaces suffer severe fidelity degradation in neural visual decoding when generalizing to natural visual experiences. Conventional multimodal contrastive representation learning solely optimizes geometric distance alignment, neglecting semantic consistency and subject selectivity, causing spurious zero-shot alignment. We propose SUP-MCRL, a unified framework integrating three collaborative mechanisms: (1) Semantic-entity Aware Visual Encoder (SAVE), learning spatial attention to extract semantic content without pre-trained saliency models; (2) Unified EEG Enhancer (UEE), employing multi-scale atrous convolutions and inter-band attention for adaptive cross-subject robustness; and (3) Prototype-based Progressive Augmenter (PPA), maintaining an EMA-updated pseudo-feature pool to prevent representation collapse. Zero-shot experiments on THINGS-EEG achieve 66.0%/91.9% (Top-1/Top-5) intra-subject and 24.0%/52.9% LOSO accuracy, surpassing state-of-the-art methods. Code is available at https://github.com/NZWANG/SUP-MCRL.

13.
arXiv (CS.AI) 2026-06-16

FlowState: Sampling-Rate-Equivariant Time-Series Forecasting

arXiv:2508.05287v3 Announce Type: replace-cross Abstract: Existing time series foundation models (TSFMs), often based on transformer variants, lack adaptability to different sampling rates, struggle with generalization across varying context and target lengths, and are computationally inefficient. We introduce FlowState, a novel TSFM architecture that achieves sampling-rate-equivariant forecasting through a unified design that pairs a state space model (SSM) encoder with a functional basis decoder (FBD). This design enables continuous-time modeling and dynamic time-scale adjustment, allowing FlowState to inherently generalize across all possible temporal resolutions, and dynamically adjust the forecasting horizons without retraining. We further propose an efficient pretraining strategy that improves robustness and accelerates training. Despite being one of the smallest TSFMs, FlowState achieves state-of-the-art results on the widely used GIFT-Eval benchmark, while demonstrating superior adaptability to unseen sampling rates. Our detailed analyses confirm the effectiveness of its components, and we demonstrate its unique ability to adapt to varying input sampling rates.

14.
arXiv (CS.CL) 2026-06-19

Closing the Calibration Gap in Semantic Caching

Semantic caching cuts LLM inference costs by serving a cached response to semantically similar queries. Standard practice evaluates these systems using PR-AUC, a metric that only measures how well scores rank and ignores whether they are usable at a fixed threshold. We show this mismatch leads to systematically poor deployment choices, as models with the highest PR-AUC are often the worst in operation. We introduce Precision-Cache Hit Ratio (P-CHR) AUC, a cache-aware metric that measures precision across cache utilization levels, and Calibration Retention Rate (CRR), which captures how much offline ranking quality survives at deployment. We decompose the operational gap between offline and deployed quality into a recoverable calibration component and an irreducible structural component fixed by the dataset's positive rate. Our experiments show that the calibration gap is governed by the training objective rather than data scale, and post-hoc calibration only partially closes it. Ultimately, model selection for semantic caching is a calibration problem, not a ranking one, and measuring it is the first step to closing the gap.
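One plausible reading of "precision across cache utilization levels" is sketched below. The abstract does not give the formal definition of P-CHR AUC, so everything here is an assumption: the function names, the choice of one curve point per distinct score threshold, and the trapezoidal integration over the observed hit-ratio range.

```python
def precision_hit_ratio_curve(scores, labels):
    """Return (cache_hit_ratio, precision) points, one per distinct threshold.

    scores: similarity score of each query's best cached candidate.
    labels: 1 if serving that cached response would be correct, else 0.
    A query counts as a cache hit when its score is at or above the threshold.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    n = len(pairs)
    points = []
    correct = 0
    for i, (score, label) in enumerate(pairs, start=1):
        correct += label
        # Emit a point only after the last query of a tied score group, so
        # every point corresponds to a threshold that can actually be set.
        if i == n or pairs[i][0] != score:
            points.append((i / n, correct / i))
    return points


def area_under_curve(points):
    """Trapezoidal area under the precision curve over the observed hit ratios."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Unlike a ranking-only metric, each point on this curve is tied to a concrete threshold and the share of traffic the cache would absorb at that threshold, which is the operational view the abstract argues for.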

15.
medRxiv (Medicine) 2026-06-15

Pulmonary extracellular vesicles drive alveolar macrophage dysfunction via microRNA transfer in Acute Respiratory Distress Syndrome

Background: Alveolar macrophage (AM) dysfunction contributes to Acute Respiratory Distress Syndrome (ARDS) pathogenesis. We investigated the role of extracellular vesicles (EVs) in mediating this dysfunction. Methods: Pulmonary EVs were isolated from broncho-alveolar lavage and non-directed bronchial lavage samples of ventilated sepsis patients with and without ARDS, and post-operative control patients via ultracentrifugation. AMs were isolated from lung tissue resections of lobectomy patients. AMs were treated with pooled EVs for 24 hours prior to functional, metabolic and autophagy profiling. EV cargo was profiled via small RNA transcriptomics and proteomics. Mechanistic role of EV microRNAs was assessed via mimic / antagomir transfection. Results: Pulmonary EVs from sepsis patients with ARDS impaired AM efferocytosis, and control EVs had no effect. ARDS EV treatment enhanced AM mitochondrial-linked respiration, but not glycolysis. ARDS EV treatment impaired LC3B-II and LAMP1 expression, indicating dysregulated AM autophagy-lysosomal machinery. Proteomics revealed downregulation of innate immune pathways in ARDS EVs. Transcriptomics revealed enrichment of 24 microRNAs in ARDS EVs; miR-652-3p was the most enriched, validated by RT-qPCR. EV miR-652-3p was associated with 90-day mortality (9.20 vs 0.59 RQ, p=0.0295) and inversely correlated with oxygenation (PaO2/FiO2). AM transfection with miR-652-3p mimic induced similar dysregulation of function and autophagy as ARDS EVs. Transfection of ARDS EVs with antagomirs to miR-652-3p prior to AM treatment partially rescued efferocytosis and autophagy. Conclusions: Targeting EV miR-652-3p may restore alveolar macrophage function and reduce excessive inflammation, thus offering a novel therapeutic strategy for patients with ARDS.

16.
PLOS Medicine 2026-05-08

Climate change and non-communicable diseases: An invisible syndemic

by Gokul Parameswaran, Sadeer Al-Kindi, Sanjay Rajagopalan

Climate change accelerates non-communicable diseases (NCDs) through cascading environmental disruptions and is attributed to driving increased NCD-related mortality. Yet this syndemic remains invisible and underfunded. We detail why addressing the climate-NCD intersection is critical for improving health. In this Perspective, Sanjay Rajagopalan and colleagues discuss how climate change accelerates non-communicable diseases (NCDs) and exacerbates NCD-related mortality, and call for greater visibility and funding to address this syndemic and improve human health.

17.
arXiv (CS.AI) 2026-06-16

Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time

arXiv:2606.15631v1 Announce Type: cross Abstract: Extending a vision-language-action (VLA) policy to a new task typically requires task-specific teleoperated demonstrations and per-task fine-tuning, making adaptation costly in both data collection and compute. In this paper, we show that this target-side per-task adaptation cost can be replaced by retrieval. Our retrieval-augmented policy is trained once on paired demonstrations from the target embodiment (query) and a cheaper embodiment (pool, e.g., human-hand video), then frozen. New tasks are added at deployment by appending pool-side demonstrations to a retrieval pool. The frozen policy conditions on retrieved trajectories at every control step, so new tasks are absorbed by indexing data rather than updating parameters. Fine-tuning is needed only to take on a new, unseen embodiment, not for each new task. We show that retrieval improves policies beyond a specific backbone, including standard VLA policies, but its effect is especially pronounced in Cosmos Policy, a video-generation-based world-action model (WAM). In this setting, retrieval supplies coarse task progression, while the WAM's future-image objective provides an additional visual consistency signal that strengthens the retrieval-conditioned actions. On PushT, we study how retrieval provides a reusable high-level motion prior for cross-embodiment generalization to unseen goal angles, while on RoboTwin 2.0 our method outperforms cross-embodiment baselines on unseen tasks, and we additionally demonstrate the method on a real robot.

18.
arXiv (CS.LG) 2026-06-16

A Fully First-Order Layer for Differentiable Optimization

arXiv:2512.02494v2 Announce Type: replace Abstract: Differentiable optimization layers enable learning systems to make decisions by solving embedded optimization problems. However, computing gradients via implicit differentiation requires solving a linear system with Hessian terms, which is both compute- and memory-intensive. To address this challenge, we propose a novel algorithm that computes the gradient using only first-order information. The key insight is to rewrite the differentiable optimization as a bilevel optimization problem and leverage recent advances in bilevel methods. Specifically, we introduce an active-set Lagrangian hypergradient oracle that avoids Hessian evaluations and provides finite-time, non-asymptotic approximation guarantees. We show that an approximate hypergradient can be computed using only first-order information in $\tilde{O}(1)$ time, leading to an overall complexity of $\tilde{O}(\delta^{-1}\epsilon^{-3})$ for constrained bilevel optimization, which matches the best known rate for non-smooth non-convex optimization. Furthermore, we release an open-source Python library that can be easily adapted from existing solvers. The source code is available at https://github.com/guaguakai/FFOLayer.

20.
arXiv (CS.AI) 2026-06-15

CSPO: Constraint-Sensitive Policy Optimization for Safe Reinforcement Learning

arXiv:2606.14415v1 Announce Type: new Abstract: Safe reinforcement learning (Safe RL) aims to maximize expected return while satisfying safety constraints, typically modeled as Constrained Markov Decision Processes (CMDPs). While primal-dual methods scale well to deep RL, they often suffer from delayed constraint correction, leading to oscillatory behavior and prolonged safety violations. In this paper, we propose Constraint-Sensitive Policy Optimization (CSPO), a first-order primal-dual method that incorporates local constraint sensitivity into policy updates. CSPO augments the primal objective with a constraint-sensitive correction derived from the shortest signed distance to the safety boundary, enabling smarter recovery steps back to safety, compensating for delayed Lagrange multiplier updates, reducing oscillations near the boundary, and preserving the KKT solutions of the original constrained problem. Experiments on navigation and locomotion benchmarks demonstrate that CSPO achieves faster safety recovery and high reward preservation, resulting in higher constrained returns compared to state-of-the-art primal-dual and penalty-based methods.

21.
arXiv (CS.LG) 2026-06-17

Clarify Before You Draw: Proactive Agents for Robust Text-to-CAD Generation

arXiv:2602.03045v2 Announce Type: replace Abstract: Large language models have recently enabled text-to-CAD systems that synthesize parametric CAD programs (e.g., CadQuery) from natural-language prompts. In practice, however, geometric descriptions can be under-specified or internally inconsistent: critical dimensions may be missing and constraints may conflict. Existing fine-tuned models tend to reactively follow the user instructions and hallucinate dimensions when the text is ambiguous. To address this, we propose a proactive agentic framework for text-to-CadQuery generation, named ProCAD, that resolves specification issues before code synthesis. Our framework pairs a proactive clarifying agent, which audits the prompt and asks targeted clarification questions only when necessary to produce a self-consistent specification, with a CAD coding agent that translates the specification into an executable CadQuery program. We fine-tune the coding agent on a curated high-quality text-to-CadQuery dataset and train the clarifying agent via agentic SFT on clarification trajectories. Experiments show that proactive clarification significantly improves robustness to ambiguous prompts while keeping interaction overhead low. ProCAD outperforms frontier closed-source models, including Claude Sonnet 4.5, reducing the mean Chamfer distance by 79.9% and lowering the invalidity ratio from 4.8% to 0.9%. Our code and datasets are publicly available at https://github.com/BoYuanVisionary/Pro-CAD.

22.
arXiv (quant-ph) 2026-06-11

Classical representation of the dynamics of quantum spin chains

arXiv:2502.10502v3 Announce Type: replace-cross Abstract: Since the advent of quantum mechanics, classical probability interpretations have faced significant challenges. A notable issue arises with the emergence of negative probabilities when attempting to define the joint probability of non-commutative observables. In this work, we propose a resolution to this dilemma for quantum spin chains, by introducing an exact representation of their dynamics in terms of classical continuous-time Markov chains (CTMCs). These CTMCs effectively model the creation, annihilation, and propagation of pairs of classical particles and antiparticles. The quantum dynamics then emerges by averaging over various realizations of this classical process.

23.
arXiv (CS.CV) 2026-06-16

Deep Residual Injection for Full-Spectrum Forensic Signal Perception in Multimodal Large Language Models

Multimodal large language models (MLLMs) have been increasingly adopted in forensics for their robust semantic understanding. As AI-generated images become more realistic, semantic-level inconsistencies alone are often insufficient for reliable detection. This motivates a critical question: whether MLLMs can achieve full-spectrum forensic signal perception, i.e., capturing low-level generator artifacts without sacrificing pre-trained semantic knowledge. We further perform a layer-wise analysis of forensic signal perception in MLLMs, showing that semantic information is primarily formed in the early-to-middle layers, whereas direct fine-tuning for artifact learning disrupts these semantic representations. Based on this insight, we propose Deep Visual Residual MLLM (Deep-VRM) to preserve early semantic processing while injecting artifact-specific visual signals as a residual path into an intermediate layer, where they are fused with semantic token representations and propagated through subsequent trainable layers. This enables later layers to jointly model semantic reasoning and signal-level forensic cues, and surprisingly, the model learns to adaptively leverage different levels of forensic signals depending on the input, achieving robust and generalizable detection performance. Extensive experiments show that our method achieves state-of-the-art performance across most benchmarks. The code and data are available at https://github.com/KQL11/Deep-VRM.

24.
arXiv (CS.AI) 2026-06-17

Riemann-Bench: A Benchmark for Moonshot Mathematics

arXiv:2604.06802v2 Announce Type: replace Abstract: Recent AI systems have achieved gold-medal-level performance on the International Mathematical Olympiad, demonstrating remarkable proficiency at competition-style problem solving. However, competition mathematics represents only a narrow slice of mathematical reasoning: problems are drawn from limited domains, require minimal advanced machinery, and can often reward insightful tricks over deep theoretical knowledge. We introduce Riemann-Bench, a private benchmark of expert-curated problems designed to evaluate AI systems on research-level mathematics that goes far beyond the olympiad frontier. Problems are authored by Ivy League mathematics professors, graduate students, and PhD-holding IMO medalists, and routinely took their authors weeks to solve independently. Each problem undergoes double-blind verification by two independent domain experts who must solve the problem from scratch, and yields a unique, closed-form solution assessed by programmatic verifiers. We evaluate frontier models as unconstrained research agents, with full access to coding tools, search, and open-ended reasoning, using an unbiased statistical estimator computed over 100 independent runs per problem. Our results reveal that all frontier models currently score below 10%, exposing a substantial gap between olympiad-level problem solving and genuine research-level mathematical reasoning. By keeping the benchmark fully private, we ensure that measured performance reflects authentic mathematical capability rather than memorization of training data.

25.
arXiv (CS.AI) 2026-06-17

A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction

arXiv:2606.17649v1 Announce Type: cross Abstract: The high cost of fine-tuning LLMs poses a significant economic barrier; pre-hoc performance prediction offers a critical solution to substantially reduce this expense. However, the theoretical limits of pre-hoc performance prediction remain unexplored. We formulate it as a stochastic estimation problem under information constraints, decomposing prediction risk into two components: an intrinsic limit (static data-model compatibility) and a reducible optimization variance. We prove that optimization variance admits a necessary lower bound on its decay rate, implying fundamental constraints on how quickly uncertainty dissipates, regardless of the predictor used. Based on these dynamics, we derive a budget-optimal probing principle and introduce a predictability phase diagram that organizes tasks into three distinct regimes: Static-Sufficient, Dynamic-Critical, and Noise-Dominant. Extensive experiments on synthetic and real-world benchmarks validate these theoretical regimes and demonstrate the efficiency of our probing strategy.