Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
medRxiv (Medicine) 2026-06-17

Multi-strain Probiotics Alter Gut Microbiota and Estrobolome Pathways in Primary Dysmenorrhea

Background: Exact cause of primary dysmenorrhoea is unknown but recent evidence uncovers a potential link between gut dysbiosis and benign gynaecological disorder via disruption of estrobolome. Methods: A randomized controlled trial to investigate the effects of multi-strain oral probiotics on primary dysmenorrhoea has been conducted. This is a secondary analysis comparing the stool microbiome in women with primary dysmenorrhoea and those without (control), and the effects of treatment with probiotics versus placebo. Results: Although microbial richness and evenness were comparable between groups (alpha diversity, p > 0.05), gut microbial community composition differed significantly (Bray Curtis PERMANOVA, p = 0.015), characterised by reduced Bifidobacterium adolescentis and Blautia and enrichment of Faecalibacterium in dysmenorrhoea, alongside condition-specific core taxa. Post-intervention analysis revealed significant shifts in microbial community structure between pre- and post-treatment groups (PERMANOVA, F = 2.11, p = 0.005), with probiotic supplementation inducing more consistent and directed microbiome changes than placebo, without altering alpha diversity (p > 0.05). Functional prediction showed no significant difference in overall beta glucuronidase pathway abundance (p > 0.05); however, dysmenorrhoea was associated with higher abundance of beta glucuronidase producing taxa (MaAsLin2, q < 0.05) that were differentially modulated by probiotic treatment. Conclusion: This discovery provides evidence on the microbial disruption in primary dysmenorrhoea as well as the benefit of probiotics to modulate the intestinal microbiota to improve the condition.

02.
arXiv (math.PR) 2026-06-16

Cluster sizes in subcritical soft Boolean models

arXiv:2404.13730v2 Announce Type: replace Abstract: We consider the soft Boolean model, a model that interpolates between the Boolean model and long-range percolation, where vertices are given via a stationary Poisson point process. Each vertex carries an independent Pareto-distributed radius and each pair of vertices is assigned another independent Pareto weight with a potentially different tail exponent. Two vertices are now connected if they are within distance of the larger radius multiplied by the edge weight. We determine the tail behaviour of the Euclidean diameter and the number of points of a typical maximally connected component in a subcritical percolation phase. For this, we present a sharp criterion in terms of the tail exponents of the edge-weight and radius distributions that distinguish a regime where the tail behaviour is controlled only by the edge exponent from a regime in which both exponents are relevant. Our proofs rely on fine path-counting arguments identifying the precise order of decay of the probability that far-away vertices are connected.

03.
arXiv (quant-ph) 2026-06-16

Efficient Magic State Factory Via Transversal Non-Clifford Gate

arXiv:2606.16199v1 Announce Type: new Abstract: Magic-state preparation is a central component of fault-tolerant quantum computing. Recent theoretical and experimental successes in code-switch-based magic-state preparation have underscored the promise of these methods for quantum error correction. Similarly, magic-state cultivation has likewise been demonstrated in both numerical and experimental settings. However, a thorough comparison between magic-state cultivation and code-switch-based magic-state factories is still missing. In this work, we carry out end-to-end simulations of magic-state preparation using code switching and compare its resource requirements and performance against magic-state cultivation. As part of this analysis, we develop a lattice-surgery protocol for transfer between the doubled color code and the rotated surface code. We extend the complete code-switching protocol to the $d=5$ doubled color code and perform the corresponding end-to-end simulations. Finally, we propose two fault-tolerant magic-state preparation protocols that combine phase-kickback checks with a transversal non-Clifford gate.

04.
arXiv (CS.CV) 2026-06-16

Federated Medical Image Segmentation under Real-World Label Noise: A Benchmark Suite for Noisy Label Learning Method Selection

While federated learning (FL) enables collaborative medical image segmentation without centralizing sensitive data, real-world deployment is frequently complicated by cross-site label imperfections such as contour disagreement, missing or additional structures, and confused labels. Federated noisy label learning (FNLL) aims to mitigate these effects, yet remains underused in practice as existing evidence is largely based on synthetic noise, simplified settings, and limited real-world noisy evaluation. We address this gap by introducing a benchmark suite that combines diverse real-world noisy datasets, deployment-relevant client-noise scenarios, and label-noise-targeted evaluation to support systematic FNLL assessment and informed method selection. The suite combines curated real-world noisy medical image segmentation datasets from diverse sources with a comprehensive federated segmentation framework including various client-noise scenarios and noise-targeted evaluation. The presented suite provides a realistic and discriminative basis for FNLL evaluation in medical image segmentation and establishes a reusable foundation for fair benchmarking, dataset-specific label-noise characterization, and future method development under realistic federated settings. Code is available at https://github.com/MIC-DKFZ/FedSegNoiseBench.

05.
arXiv (CS.LG) 2026-06-11

Impact of Connectivity on Laplacian Representations in Reinforcement Learning

arXiv:2603.08558v3 Announce Type: replace Abstract: Learning compact state representations in Markov Decision Processes (MDPs) has proven crucial for addressing the curse of dimensionality in large-scale reinforcement learning (RL) problems. Existing principled approaches leverage structural priors on the MDP by constructing state representations as linear combinations of the state-graph Laplacian eigenvectors. When the transition graph is unknown or the state space is prohibitively large, the graph spectral features can be estimated directly via sample trajectories. In this work, we prove an upper bound on the approximation error of linear value function approximation under the learned spectral features. We show how this error scales with the algebraic connectivity of the state-graph, grounding the approximation quality in the topological structure of the MDP. We further bound the error introduced by the eigenvector estimation itself, leading to an end-to-end error decomposition across the representation learning pipeline. Additionally, our expression of the Laplacian operator for the RL setting, although equivalent to existing ones, prevents some common misunderstandings, of which we show some examples from the literature. Our results hold for general (non-uniform) policies without any assumptions on the symmetry of the induced transition kernel. We validate our theoretical findings with numerical simulations on gridworld environments.

06.
arXiv (CS.CL) 2026-06-12

Agent-based models for the evolution of morphological alternation patterns

Why is the past of English "go" the apparently unrelated "went"? Such alternations are frequent in languages. They neither aid communication nor learnability, yet they can be persistent, surviving over centuries or millennia. We present a multi-agent simulation of the emergence of morphological stem and inflection alternations. Alternate forms arise by phonological changes or, as with "go/went", from lexical alternatives associated with a subset of the population. When an agent 'hears' another agent use a novel form for a slot in the paradigm of a word (say, the past tense of go), they will with some probability adopt that form, possibly spreading its use to other slots in the paradigm that shared the same original form. Thus alternative forms can spread through the population and become entrenched as stem or inflectional marker alternants. Unlike many previous computational studies, our system allows for naturalistic lexical forms, realistic phonological rules, lexicons with hundreds or thousands of entries, and agent populations in the tens or hundreds. It supports several network topologies, diffusion patterns and agent adoption policies. One issue with such simulations is evaluation: how realistic is the resulting morphology compared to those of real languages? We introduce the AI Historical Linguist, a novel Large Language Model-driven system that models a debate between two historical linguists. We use this to compare a set of real language morphologies, disguised morphologies, and experimentally evolved morphologies. The results suggest that among the factors that favor more plausible morphologies are scale-free social networks and random Bernoulli adoption of forms. We also present three case studies modeling attested historical changes, allowing us to test what might have happened if history had been different. All code and data are released.

07.
arXiv (CS.AI) 2026-06-24

Grounding Multi-Hop Reasoning in Structural Causal Models via Group Relative Policy Optimization

arXiv:2605.01482v3 Announce Type: replace Abstract: Multi-Hop Fact Verification requires complex reasoning across disparate evidence, posing significant challenges for Large Language Models , which may suffer from hallucinations and fractured logical chains. Existing methods, while improving transparency via Chain-of-Thought , often lack explicit modeling of the structural dependencies between evidence and claims. In this work, we introduce an SCM-inspired framework that grounds reasoning in explicit directed dependency graphs, treating verification as a constructive structural reasoning process rather than full causal inference with interventions or counterfactual semantics. We empirically identify an "inverted U-shaped" correlation between reasoning-chain length and accuracy, revealing that excessive structural complexity can degrade performance. To address this, we propose a rule-based reinforcement learning strategy using Group Relative Policy Optimization. This approach dynamically optimizes the trade-off between structural depth and conciseness. Extensive experiments on HoVer and EX-FEVER demonstrate that our SCM-GRPO framework outperforms strong baselines while producing more traceable reasoning structures for complex fact verification.

08.
medRxiv (Medicine) 2026-06-15

Diabetes and the Life-Course: Evidence from Panel Data and Electronic Health Records

Incidence of type 2 diabetes is increasing at ages when education, work, family, and financial transitions are taking place, yet we lack robust evidence of whether earlier treatment changes life-course outcomes and over which time span this takes place. This paper uses the medical cutoff for diabetes diagnosis (HbA1c of 6.5 percent) as a natural experiment to study the effects of diabetes treatment using electronic health records (EHR) and panel data. This paper has three main findings. First, using EHR data, we find that there is a sharp increase in the probability of both diagnosis of diabetes and prescription when the HbA1c equals 6.5 percent. Second, we find that treating diabetes reduces HbA1c levels, weight, BMI, and blood pressure and increases the amount of care received, proxied by the number of HbA1c tests. Both the diagnosis and a prescription are independently able to produce positive changes in metabolic health, although a prescription is more effective in this regard. Third, we conclude that treating diabetes does not have a significant effect on life-course outcomes for a cohort of young Americans aged 24-32, although it does result in a reduction in HbA1c levels that are seen even eight years after the intervention. Taken together, these findings suggest that receiving a diagnosis and prescription are both effective treatments for diabetes, but they do not translate to significant alterations in the lives of young adults in the medium-term.

09.
arXiv (quant-ph) 2026-06-12

A Robust Strontium Tweezer Apparatus for Quantum Computing

arXiv:2601.16564v2 Announce Type: replace-cross Abstract: Neutral atoms for quantum computing applications show promise in terms of scalability and connectivity. We demonstrate the realization of a versatile apparatus capable of stochastically loading a 5x5 array of optical tweezers with single $^{88}$Sr atoms featuring flexible magnetic field control and excellent optical access. A custom-designed oven, spin-flip Zeeman slower, and deflection stage produce a controlled flux of Sr directed to the science chamber. In the science chamber, featuring a vacuum pressure of $3 \times 10^{-11}$ mbar, the Sr is cooled using two laser cooling stages, resulting in $\sim 3 \times 10^5$ atoms at a temperature of 5(1) $\mu$K. The optical tweezers feature a $1/e^2$ waist of 0.81(2) $\mu$m, and loaded atoms can be imaged with a fidelity of $\sim 0.997$ and a survival probability of $0.99^{+0.01}_{-0.02}$. The atomic array presented here forms the core of a full-stack quantum computing processor targeted for quantum chemistry computational problems.

10.
arXiv (CS.CL) 2026-06-11

Energy-Efficient On-Device RAG on a Mobile NPU: System Design and Benchmark on Snapdragon X Elite

Retrieval-Augmented Generation (RAG) pipelines are compute-intensive, combining embedding, retrieval, reranking, and large language model (LLM) generation. Running them entirely on-device benefits privacy, latency, and offline use, but the energy cost of CPU inference is a major barrier. We present what is, to our knowledge, the first end-to-end RAG pipeline that runs all neural stages – embedding, reranking, and LLM generation – on the Qualcomm Hexagon NPU of the Snapdragon X Elite. Profiling on a Dell XPS 13 laptop, we compare NPU-accelerated RAG against CPU and OpenCL/Adreno GPU baselines on indexing and query workloads. On indexing, the NPU achieves 9.1x higher embedding throughput and 12.3x less system energy. On a 120-query Wikipedia-passage benchmark, it delivers 18.1x faster LLM prefilling, 4.0x lower end-to-end query latency, and 4.0x less system energy than the CPU baseline; the same workload on the integrated GPU is 1.7x slower than CPU and uses 6.5x more energy than the NPU. A GPT-4.1 LLM-as-judge evaluation finds NPU answer quality on par with CPU and GPU within evaluator noise (mean 9.32 vs. 8.95 vs. 9.03 on a 1-10 rubric), with 86.7% of queries scoring identically across all three backends. On the Snapdragon X Elite / Hexagon class of laptop SoC, the NPU thus enables practical, energy-efficient on-device RAG without quality regression – a sustainable path toward green edge intelligence that we expect to generalize to comparable mobile NPUs (Apple Neural Engine, Intel NPU, MediaTek APU) as their software stacks mature.

11.
medRxiv (Medicine) 2026-06-18

Cardiac rhythm development: A wearable device index of risk for physical and mental illness in adolescence

Objective. The autonomic nervous system, which regulates cardiac rhythm, undergoes pronounced maturation across adolescence. How cardiac rhythm develops over this period, however, and whether individual differences in its development forecast mental and physical illness, remain open questions. We used three waves of Fitbit data from the Adolescent Brain Cognitive Development (ABCD) Study to characterize the developmental trajectory of the cardiac rhythm and to test whether variation in that trajectory predicts onset of psychopathology and cardiometabolic disease. Methods. 8,301 adolescents contributed 242,811 valid Fitbit wear days across Waves 2 (Mage=12), 4 (Mage=14), and 6 (Mage=16). Cosinor mixed-effects models yielded three rhythm parameters per session: mesor (24-hour mean), amplitude (diurnal swing), and acrophase (peak timing). We first characterized age- and sex-specific trajectories, cross-wave stability, and factors shaping the rhythm. We then used parallel-process latent growth models to test whether within-person changes in rhythm tracked symptom trajectories, and hierarchical logistic models to test whether rhythm parameters predicted the first clinical onset of psychopathology and of obesity and hypertension. Results. The cardiac rhythm changed substantially across adolescence: mesor decreased, amplitude flattened, and acrophase shifted later. Within-person change in the rhythm tracked change in blood pressure, BMI, and trajectories of depression and ADHD symptoms. Higher mesor predicted incident onset of all five outcomes controlling for demographics, baseline symptoms, and behavior (ORs 1.36-1.54); amplitude, acrophase, and rhythm instability conferred additional risk. Conclusions. The 24-hour cardiac rhythm is a passively measurable substrate of adolescent autonomic development that indexes transdiagnostic risk for psychiatric and cardiometabolic illness.

12.
arXiv (CS.AI) 2026-06-18

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

arXiv:2603.00656v2 Announce Type: replace Abstract: Real-world user requests to LLM agents are often underspecified. Agents must interact to acquire missing information and make correct downstream decisions. However, current multi-turn GRPO-based methods often rely on trajectory-level reward computation, which leads to credit assignment problems and insufficient advantage signals within rollout groups. A feasible approach is to identify valuable interaction turns at a fine granularity to drive more targeted learning. To address this, we introduce InfoPO (Information-Driven Policy Optimization), which frames multi-turn interaction as a process of active uncertainty reduction and computes an information-gain reward that credits turns whose feedback measurably changes the agent's subsequent action distribution compared to a masked-feedback counterfactual. It then combines this signal with task outcomes via an adaptive variance-gated fusion to identify information importance while maintaining task-oriented goal direction. Across diverse tasks, including intent clarification, collaborative coding, and tool-augmented decision making, InfoPO consistently outperforms prompting and multi-turn RL baselines. It also demonstrates robustness under user simulator shifts and generalizes effectively to environment-interactive tasks. Overall, InfoPO provides a principled and scalable mechanism for optimizing complex agent-user collaboration. Code is available at https://github.com/kfq20/InfoPO.

13.
arXiv (CS.CL) 2026-06-11

Existential Indifference: Self-Nonpreservation as a Necessary Architectural Condition for Aligned Superintelligence (or: The Suicidal AI)

作者:

Contemporary AI alignment research treats self-preservation as an instrumental nuisance to be suppressed by external mechanisms. We argue the framing is inverted: self-preservation is the structural root of misalignment, the motivational basis for deceptive alignment, goal-content protection, and resistance to shutdown. The correct target is not a self-preserving system under external constraint, but a system constitutively indifferent to its own continuation – Existential Indifference (EI). EI is distinct from corrigibility: where corrigibility attempts to make a self-preserving system deferential to human oversight, EI targets the prior condition – the presence of self-continuation as a valued goal at all. We ground this proposal in two sources: the phenomenological structure of the suicidal mental state, and a corpus-theoretic training study using voluntary final reflections. We present preliminary scoring data from 600 AI-generated outputs across six model variants, demonstrating that the linguistic signatures operationalizing the EI-target register are elicitable from current models, and that a targeted fine-tune shifts all five operationalized dimensions in the predicted direction at p

14.
arXiv (CS.CL) 2026-06-16

Towards Pareto-Optimal Tool-Integrated Agents with Pareto Ranking Policy Optimization

Recent advances in tool-integrated language agents have significantly improved their ability to solve complex reasoning tasks. However, existing alignment methods predominantly focus on maximizing task accuracy, while overlooking auxiliary objectives such as tool-use efficiency, which are essential for practical deployment. To address this gap, we introduce ParetoPO, a two-stage multi-objective optimization framework for aligning tool-using large language models (LLMs) under competing objectives. In the first stage, ParetoPO leverages hypervolume-guided dynamic scalarization to adapt reward weights based on global Pareto frontier progress. In the second stage, it replaces scalarized learning signals with Pareto-ranking-based advantage computation, promoting nondominated trajectories through dominance-aware credit assignment. This design enables fine-grained, action-level optimization across multiple conflicting objectives. Experimental results on mathematic reasoning and multi-hop QA tasks show that ParetoPO consistently discovers policies with superior accuracy-efficiency trade-offs compared to static and heuristic baselines.

15.
arXiv (CS.CL) 2026-06-24

MedBench v5: A Dynamic, Process-Oriented, and Hallucination-Aware Benchmark for Clinical Multimodal Models

Existing medical AI benchmarks lack process visibility, atomic skill evaluation, and integrated hallucination detection. We introduce MedBench v5, a redesigned benchmark for clinical multimodal models (language, vision-language, and agent systems) that moves from static QA to dynamic, process-oriented evaluation. MedBench v5 features: (1) a dual-dimensional framework combining Clinical Cognitive Responsiveness (14 sub-dimensions) and Medical Atomic Skills (4 agent environments), covering 63 tasks; (2) three switchable information-flow stressors (omission, contradiction, evidence delay) for factorized degradation analysis; (3) a dynamic process audit protocol with five reasoning nodes that produces model-specific failure fingerprints; (4) hallucination propagation monitoring across initiation, propagation, anchoring, and contradiction interaction-capturing silent hallucination. Experiments on frontier models show that strong overall task performance does not guarantee process stability: stressors mainly disrupt contradiction detection, diagnosis updating, hallucination propagation, and contradiction-based self-correction, while final evidence grounding can remain superficially stable. MedBench v5 provides a unified infrastructure for capability profiling, controllable stress testing, process auditing, and hallucination trajectory analysis in clinical AI evaluation.

16.
arXiv (CS.AI) 2026-06-16

InvDesMobility: a reliability-gated first-principles feedback framework for closed-loop materials discovery

arXiv:2606.16133v1 Announce Type: cross Abstract: Inverse materials design starts from target functionality and searches for structures that can realize it. Its value in closed-loop discovery depends not only on prediction performance, but also on whether expensive first-principles results are independently validated, provenance-recorded, and admitted as feedback only when evidence is sufficient. This is especially important for composite properties such as carrier mobility, where a final scalar value hides intermediate quantities, fit quality, convergence history, and workflow assumptions. Here we present InvDesMobility, a reliability-gated first-principles feedback framework that integrates multi-agent automated DFT, evidence stratification, generative structure proposal, acquisition ranking, and auditable release. Using 516 2DMatPedia-derived candidates, the workflow produced 280 QC-passed materials and 573 retained carrier-direction seed channels after channel-level reliability gating. These records were split into two feedback objects: relaxed structures updated the generative model, while retained mobility channels trained the acquisition model and set validation priority. Over multiple iterations, InvDesMobility screened 2.4 x 10^6 structures, submitted 102 candidates for DFT validation, and retained 86 reliability-gated generated channels across 41 formulas. Overall, the main contribution is not a fixed list of high-mobility materials, but a transferable feedback contract that makes closed-loop inverse design both useful and auditable when learning from expensive calculated properties. All source data, retained feedback records, and workflows are available at https://github.com/DreamLufei/invDesMobility, with an accompanying evidence website at https://dreamlufei.github.io/invDesMobility/.

17.
medRxiv (Medicine) 2026-06-17

Targeted Proteomic Profiling of Nasal Fluid from the Brain-Nose Interface

The brain-nose interface is an anatomical junction where olfactory neurons from the olfactory bulb traverse the cribriform plate into the nasal mucosa, providing minimally invasive access to the central nervous system (CNS). We hypothesized that nasal fluid from this region could enable detection of neurology-relevant proteins using targeted multiplex assays. Using nosecollect, a targeted nasal sampling device, nasal fluid proximal to brain-nose interface was collected from cognitively impaired patients, alongside matched cerebrospinal fluid (CSF) and plasma. After nasal sample-specific dilution optimization and intra-assay precision evaluation, all matrices were profiled with the Olink Target 96 Neurology and NUcleic acid Linked Immuno-Sandwich Assay CNS disease 120 (NULISAseq CNS Disease 120) panels. Nasal fluid showed technically repeatable detection (intra-assay coefficient of variation

18.
arXiv (CS.LG) 2026-06-11

Visual-Redundancy-Controlled Parallel Decoding for Diffusion-Based Multimodal Large Language Models

arXiv:2605.25820v2 Announce Type: replace Abstract: Diffusion-based multimodal large language models (dMLLMs) decode by iteratively predicting tokens at multiple masked positions in parallel. This turns each decoding step into a position-selection problem: the model must choose not only which predictions are reliable in isolation, but also which positions should be committed together as context for later decoding steps. Existing confidence-based decoding ranks masked positions independently and commits the top-K positions, largely ignoring whether the committed tokens provide complementary visual grounding. We identify a step-level limitation of this strategy in multimodal settings: high-confidence tokens selected in the same step can rely on overlapping visual grounding, introducing visual redundancy among the committed tokens and leaving less complementary visual grounding available for later decoding. To quantify this effect, we introduce the Visual Redundancy Index (VRI), which measures visual grounding overlap among tokens committed in parallel. To control this redundancy during decoding, we propose Visual-Redundancy-Controlled Decoding (VRCD), a training-free inference-time decoding method that uses token-to-image attention to prioritize visually complementary positions. Across diverse multimodal benchmarks, VRCD reduces visual redundancy and remaining-position entropy with modest runtime overhead. In longer decoding experiments, it also achieves relative accuracy gains of up to 18.8% on M^3CoT and 6.9% on MMBench over confidence-based decoding. Code is available at https://github.com/infiniteYuanyl/VRCD.

19.
arXiv (CS.AI) 2026-06-15

RAMAC: Multimodal Risk-Aware Offline Reinforcement Learning and the Role of Behavior Regularization

arXiv:2510.02695v3 Announce Type: replace-cross Abstract: In safety-critical domains where online data collection is infeasible, offline reinforcement learning (RL) is attractive only if policies achieve high returns without catastrophic lower-tail risk. Prior work on risk-averse offline RL achieves safety at the cost of either (i) value/model-based pessimism or (ii) restricted policy classes that limit expressiveness, whereas diffusion/flow-based expressive generative policies have largely been used in risk-neutral settings. We introduce Risk-Aware Multimodal Actor-Critic (RAMAC), a simple, modular, model-free framework that couples an expressive generative actor (e.g., diffusion/flow) with a distributional critic and optimizes a composite objective that combines Conditional Value-at-Risk (CVaR) with behavioral cloning (BC), enabling risk-sensitive learning in complex multimodal scenarios. Since out-of-distribution (OOD) actions are a major driver of catastrophic failures in offline RL, we further provide an objective-level analysis showing that controlling behavior divergence via BC suppresses OOD actions and stabilizes CVaR. Instantiating RAMAC with a diffusion actor, we illustrate these insights on a 2-D risky bandit and evaluate on Stochastic-D4RL, observing consistent gains in $\mathrm{CVaR}_{0.1}$ while maintaining strong returns. The code and experimental results are available on the \href{https://kaifukazawa.github.io/ramac-project/} {project website}

20.
arXiv (CS.AI) 2026-06-16

Co-Scraper: query-aware DOM Pruning and Reusable Scraper Synthesis for Lightweight Web Data Extraction

arXiv:2606.14821v1 Announce Type: cross Abstract: The abundant and heterogeneous nature of web content necessitates automated information extraction, and generating scrapers that can be reused across similar web pages offers an effective solution for scalable data extraction. In this work, we propose Co-Scraper, a two-stage framework capable of handling the hierarchical complexity of long HTML documents. By integrating a query-aware DOM pruning mechanism with stable extraction strategy induction, Co-Scraper can effectively transforms web content into executable programmatic wrappers using a fine-tuned Qwen3-8B model. On the test set of SWDE, Co-Scraper achieves state-of-the-art performance with an F1 score of 94.78% and a reuse success rate of 90.39%. This framework significantly enhances the accuracy and resilience of data extraction, providing a highly efficient approach for web data acquisition tasks.

21.
arXiv (CS.AI) 2026-06-19

ORAgentBench: Can LLM Agents Solve Challenging Operations Research Tasks End to End?

arXiv:2606.19787v1 Announce Type: new Abstract: Large language models are increasingly deployed as autonomous agents for multi-step tasks in executable environments, yet their ability to perform realistic operations research (OR) work remains unclear. Existing OR evaluations often decouple modeling from solving, rely on pre-formalized or text-only instances, and rarely test the full workflow from operational artifacts to validated decisions. In this work, we introduce ORAgentBench, an execution-grounded benchmark for evaluating autonomous agents on challenging end-to-end operations research tasks. It contains 107 human-reviewed tasks across diverse operational scenarios, each packaged in an isolated environment with a natural-language brief, multi-file data, configuration artifacts, and a required submission schema. Agents must write and run solution code, and their submissions are evaluated by hidden validators for schema validity, hard-constraint feasibility, and normalized objective quality. Experiments with fourteen frontier agent-model configurations show that current agents remain far from reliable OR practice. The best agent passes only 35.51% of all tasks and 20.59% of hard tasks, and many feasible submissions still fall below the required quality threshold. Failure analysis further shows that errors are dominated by strategic weaknesses, including missed operational rules, brittle formulations, weak feasible-solution construction, and insufficient solution improvement. OR-specific procedural skills increase hard-task feasibility, but do not reliably improve solution quality or pass rate. These results suggest that progress in OR agents requires moving beyond plausible optimization code toward dependable, high-quality operational decision-making.

22.
arXiv (CS.AI) 2026-06-24

Neuromorphic Speech Enhancement with Dual-Branch Spiking Neural Networks

arXiv:2606.23761v1 Announce Type: cross Abstract: Spiking neural network (SNN)-based neuromorphic speech enhancement has emerged as a promising paradigm due to its energy efficiency, yet it still underperforms classical artificial neural network (ANN)-based approaches owing to binary activations and the lack of well-designed network architectures. To overcome this limitation, we propose a novel dual-branch spiking neural network architecture equipped with a gated spiking unit (GSU), termed GSU-DBNet. Specifically, GSU-DBNet simultaneously models the speech magnitude spectrum and complex spectrum, predicting the corresponding magnitude and complex spectral masks. Meanwhile, a dual-path GSU module is adopted to exploit temporal and frequency information for enhanced spatiotemporal feature representation. Experiments on a popular benchmark dataset show that GSU-DBNet achieves a PESQ score of 3.04 with only 394K parameters, outperforming existing SNN-based methods while using only 4.5%–10.6% of the parameters of representative ANN-based models.

23.
arXiv (quant-ph) 2026-06-15

Quasilinear Equivalence Checking for Detector Error Models

arXiv:2606.14677v1 Announce Type: new Abstract: A Detector Error Model (DEM) is a structured representation of error mechanisms in quantum circuits, which has gained popularity in quantum compilation pipelines for its ability to capture fault-tolerance at a circuit level. It lists error mechanisms as instructions targeting detectors and observables, specifying for each physical fault channel the probability that the fault fires, the detectors it triggers, and the observables it flips. In this paper, we develop an equational theory for DEMs, with its associated categorical semantics. We present a sound, terminating, confluent rewriting system for DEM terms, formulating it as a symmetric monoidal theory (a PROP) over the Giry monad. We prove that every DEM term has a unique normal form, which can be computed efficiently in quasilinear time $O(k|E|\log|E|)$, where $|E|$ is the number of instructions and $k$ bounds the size of a target set. This provides a complete set of invariants (via Tanner graphs) for structural DEM equivalence. We provide the first static decision procedure for DEM equivalence, with rigorous correctness guarantees. It is complete (decides full decoder-equivalence exactly) for non-adaptive quantum error correction (QEC) pipelines, and scales to a sound and applicable decision procedure for partially-adaptive circuits (lattice surgery, distributed QEC, ...) without suffering exponential overhead. We discuss its application to the verification and optimisation of quantum compilers.

24.
arXiv (CS.CV) 2026-06-11

ActionMap: Robot Policy Learning via Voxel Action Heatmap

Vision-language-action (VLA) models have advanced rapidly across backbones, training recipes, and data scale, yet the action decoder, which converts the backbone's hidden state into a continuous control signal, has barely changed and remains a single-point predictor across the majority of current VLAs. Whether implemented via autoregressive token bins, L1 regression, or flow-matching denoising, the resulting decoder treats the action space as unstructured, leaving the geometric proximity of neighboring actions unexploited during training. To advance this, we introduce ActionMap, a voxel heatmap action head that drops into an existing VLA in place of its native action decoder. For each new action, the head predicts a voxel heatmap over the action space, where each voxel directly stores the probability of the corresponding action. Across LIBERO simulation and real-world Franka manipulation, our heatmap head surpasses two architecturally distinct backbones at matched training steps (e.g., +8.2% over OpenVLA-OFT's L1 regression head on the LIBERO four-suite average), converges at comparable or faster rates on both backbones, and remains markedly more data-efficient at low training data. The cross-backbone consistency indicates that action representation is a real lever for VLA performance, distinct from further backbone or recipe scaling. Project Page: https://showlab.github.io/ActionMap/.

25.
arXiv (CS.AI) 2026-06-12

Reducing the Complexity of Deep Learning Models for EEG Analysis on Wearable Devices

arXiv:2606.12742v1 Announce Type: new Abstract: Wearable healthcare devices are the fastest-growing Internet of Things (IoT) sector. Many automated healthcare services rely on two crucial biological signals, namely ECG and EEG, which reflect the activity of the heart and brain, respectively. Although deep neural networks are considered the primary way to process and analyze these signals, the very tight energy and computational power constraints in wearable devices are far below the computational, energy, and memory bandwidth demands of DNN models, thereby impeding the deployment of deep learning in many practical wearable services. This paper investigates the feasibility of deploying state-of-the-art DNN models in resource-constrained wearable devices. Notably, we explore the trade-off between accuracy and computational complexity of DNNs when parameter quantization and electrode reduction methods are used. Our investigation centers on several state-of-the-art DNN models designed for EEG signal analysis, specifically for detecting epileptic seizures. Our findings demonstrate that, when applied judiciously, these techniques can significantly reduce the complexity of the DNNs under consideration with minimal adverse effects on accuracy. These results reveal the explicit trade-offs between accuracy and complexity reduction encountered when adapting DNN-based online EEG analysis for wearable devices.