×

Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

作者: Mu He ×
换一批
01.
arXiv (CS.AI) 2026-06-19

Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

arXiv:2604.08552v2 Announce Type: replace-cross Abstract: Scientific metadata are often incomplete and noncompliant with community standards, limiting dataset findability, interoperability, and reuse. Even when standard metadata reporting guidelines exist, they typically lack machine-actionable representations. Producing FAIR datasets requires encoding metadata standards as machine-actionable templates with rich field specifications and precise value constraints. Recent work has shown that LLMs guided by field names and ontology constraints can improve metadata standardization, but these approaches treat constraints as static text prompts, relying on the model's training knowledge alone. We present an LLM-based metadata standardization system that queries standard reporting guidelines and authoritative biomedical terminology services in real time to retrieve canonically correct standards on demand. We evaluate this approach on 839 legacy metadata records from the Human BioMolecular Atlas Program (HuBMAP) using an expert-curated gold standard for exact-match assessment. Our evaluation shows that augmenting the LLM with real-time tool access consistently improves prediction accuracy over the LLM alone across both ontology-constrained and non-ontology-constrained fields, demonstrating a practical approach to automated standardization of biomedical metadata.

02.
arXiv (CS.LG) 2026-06-12

Normative Robustness as a Frontier for Non-Verifiable Reasoning in LLMs

arXiv:2606.12731v1 Announce Type: new Abstract: As LLMs increasingly serve in advisory and deliberative roles, users rely on them for non-verifiable reasoning in domains lacking objective ground truths. However, traditional evaluations of LLM reasoning focus almost exclusively on fact-based domains, such as mathematics and science, leaving uncertainty over whether and to what degree models can handle ambiguous, subjective, or value-laden problems over time. To address this concern, we propose moral reasoning as a paradigmatic subdomain of non-verifiable reasoning. We define moral robustness as a model's capacity to exhibit sound moral reasoning across time and contexts, and we introduce a scalable, adversarial, multi-turn evaluation framework to empirically measure this capability. We simulate 48,000 user-agent moral deliberations across four frontier LLMs, varying premise relevance, premise order, conversation duration, and the user's stated moral view. We find that models successfully ignore morally-irrelevant distractors, but shift their reasoning by up to 6.5%, on average, towards the user's stated preferred moral view, and varying their reasoning depending on factors such as order (altering moral judgments by order in 13-22% of the cases) and duration (altering moral judgments between single-turn and multi-turn in 10-24% of the cases). Our analysis indicates that models tailor not just their final verdicts but their underlying justifications to align with a user's moral viewpoint - a failure mode we characterize as moral deliberative sycophancy.

03.
arXiv (CS.CV) 2026-06-16

Human Cognition in Machines: A Unified Perspective of World Models

This report of world models distinguishes prior works by the cognitive functions they innovate. Many works claim an almost human-like cognitive capability in their world models. To evaluate these claims requires a proper grounding in first principles from human and machine cognition theory. In moving towards human-like world models we present a conceptual unified framework for world models that fully incorporates all the cognitive functions (i.e., memory, perception, language, reasoning, imagining, motivation, and metacognition) and identify gaps in existing research as a guide for future states of the art. In particular, we find that motivation (especially intrinsic motivation) and metacognition remain drastically under-researched, and we propose concrete directions to address these gaps informed by active inference and global workspace theory. We also introduce epistemic world models, a new category encompassing agent frameworks for scientific discovery that operate over structured knowledge. Our taxonomy, applied to video, embodied, and epistemic world models, suggests research directions where prior taxonomies have not.

04.
arXiv (quant-ph) 2026-06-16

Dressed Floquet scars from protected zero modes in a Rydberg chain

arXiv:2606.15605v1 Announce Type: cross Abstract: In this Letter, we present an approximate analytic construction of two zero quasienergy quantum many-body scars in a periodically driven model of Rydberg atoms on a ring, which persist over a range of driving amplitudes and frequencies for finite sizes. An index theorem protects an exponentially large number (in system size) of exact zero energy modes of the Floquet Hamiltonian in this setting. Unlike most of these zero modes which continuously change with drive parameters, these two quantum many-body scars retain the memory of particular states. They can be expressed as {\it dressed versions} of two contrasting states, the Rydberg vacuum and a unitarily rotated variant of a volume-law scar [Ivanov and Motrunich, Phys. Rev. Lett. {\bf 134}, 050403 (2025)], respectively. We provide an analytic understanding of their existence using a Floquet perturbation theory and show their resilience beyond the perturbative regime using exact diagonalization in finite systems. Our study provides insight into the structure of protected zero modes in interacting Floquet settings.

05.
arXiv (CS.CL) 2026-06-16

EHRNote-ChatQA: A Benchmark for Evidence-Grounded Multi-Turn Clinical Question Answering over Longitudinal Discharge Summaries

Discharge summaries are crucial clinical documents containing the context of a patient's overall hospital stay, and are routinely reviewed by medical experts for patient readmission, ongoing care, and diagnostic decision-making. When reviewing them, medical experts often must iteratively synthesize information across multiple summaries while verifying the evidence supporting each answer. Although large language models (LLMs) are increasingly explored for clinical question answering, existing benchmarks do not sufficiently reflect this setting: they often evaluate exam-style medical knowledge or focus on single-turn question answering with limited evidence-grounding evaluation. We introduce EHRNote-ChatQA, the first benchmark for evidence-grounded multi-turn clinical question answering over patients' multiple discharge summaries. Built from de-identified MIMIC-IV discharge summaries, EHRNote-ChatQA contains 967 patient-level multi-turn samples spanning one to five notes and 16,072 medical-expert-verified QA pairs (8,036 content questions, each paired with an evidence-grounding question) across eight clinical categories. The benchmark is constructed through an expert-informed pipeline combining discharge-summary structuring schema, expert-curated multi-turn QA templates, and LLM-based generation, followed by review and revision of every single QA sample by 11 medical experts. Benchmarking 22 open- and closed-source LLMs reveals several challenges, including that LLMs struggle more with evidence grounding than content answering, multi-turn errors compound across turns, and single-turn clinical QA performance does not reliably transfer to this setting. These findings establish EHRNote-ChatQA as a rigorous and practical benchmark for evaluating clinical QA systems. The dataset will be made publicly available through PhysioNet credentialed access.

06.
arXiv (CS.CL) 2026-06-12

Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review

The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks for adversarial manipulation, especially given the multimodal nature of scientific papers where figures, not just text, convey core evidence. This creates a significant gap: current robustness studies on AI peer-review are overwhelmingly text-only. Moreover, the problem is distinct from standard jailbreaking, as a peer-review attack seeks to induce a domain-specific, targeted failure (e.g., "inflate this score") rather than a general safety policy violation, for which no practical defenses exist. To address this, we introduce PaperGuard, the first comprehensive benchmark designed to systematically evaluate and defend AI-generated peer-review against these domain-specific, cross-modal attacks. Our framework is built on three pillars: (1) a new multimodal peer-review dataset spanning multiple scientific domains; (2) a unified suite of attacks, including black-box prompt injections and white-box perturbations, specifically designed to target both text (GCG) and figures (PGD); and (3) a practical defense, motivated by the long-context challenge of academic papers, that uses chunk-based embedding search to efficiently localize and mitigate harmful instructions. Our extensive experiments, conducted across state-of-the-art models, confirm that AI reviewers are pervasively vulnerable. PaperGuard establishes the foundational benchmark, protocols, and actionable defense necessary to pioneer trustworthy, attack-resilient AI-assisted scholarly reviewing.

07.
medRxiv (Medicine) 2026-06-15

Validating Field-Feasible Measures of Recent Khat Use: A Diagnostic Accuracy Study Comparing Amphetamine Immunoassay and Assisted Self-Report Against HPLC in an Ethiopian Male Cohort

Background: Khat (Catha edulis) is a widely consumed natural amphetamine-analog used across East Africa and the Arabian Peninsula. Accurate field-feasible measurement of recent khat use is a prerequisite for large-scale epidemiological research; yet no validated alternatives to laboratory reference methods have been identified in the scientific literature. This nested validation study evaluated the diagnostic accuracy of two point-of-care measures, a commercial amphetamine immunoassay and a Timeline Followback (TLFB) Assisted Self-Report (ASR), against high-performance liquid chromatography (HPLC) quantification of urinary norephedrine (NE), while additionally assessing agreement between the two field measures. Methods: A prospective, random sub-sample of 119 male participants aged 18-40 years from the Gilgel Gibe Field Research Center (GGFRC) longitudinal cohort, Ethiopia (validation timepoint T2, 2015), was used. Three index-reference comparisons were conducted: (1) amphetamine immunoassay (nal von minden, Drug-Screen AMP test, 300 ng/mL cutoff) vs. HPLC; (2) binary ASR (past-week use) vs. HPLC; and (3) binary ASR vs. immunoassay. Sensitivity (positive percent agreement, PPA), specificity (negative percent agreement, NPA), positive predictive value (PPV), negative predictive value (NPV), overall accuracy (overall percent agreement, OPA), and Cohen's kappa were calculated with 95% confidence intervals. Pre-specified secondary analyses applied three pharmacokinetically-informed recall windows (0-2, 3-5, and 6-7 days prior to interview) to ASR. Results: Against HPLC (77 positive, 42 negative), the immunoassay showed perfect specificity (1.0 [0.916-1.0]) and PPV (1.0 [0.91-1.0]) but low sensitivity (0.52 [0.40-0.64]), NPV (0.53 [0.42-0.65]), overall accuracy (0.69 [0.60-0.77]), and weak kappa (0.43 [0.34-0.52]). Binary ASR showed high sensitivity (0.96 [0.89-0.99]), specificity of 0.60 [0.433-0.74], PPV (0.81 [0.72-0.89]), NPV (0.89 [0.72-0.98]), with overall accuracy 0.83 [0.75-0.89] and moderate kappa (0.60 [0.51,0.69]). Restricting ASR to use within 0-2 days improved specificity to 0.69 [0.52-0.84], PPV to 0.86 [0.77-0.93], overall accuracy to 0.87 [0.79-0.93], and kappa to 0.69 [0.61-0.78] (moderate), while sensitivity (0.96 [0.89-0.99]) and NPV (0.89 [0.72-0.98]) remained stable. Against the immunoassay, ASR achieved high PPA of (1.0 [0.91-1.0]), NPA of 0.35 [0.25-0.47], OPA of 0.57 [0.48-0.66], and minimal kappa (0.27 [0.19-0.35]). Conclusions: Time-stratified ASR (0-2 days) is a valid, scalable alternative to biological testing for recent khat use in resource-limited settings. The immunoassay's 300 ng/mL cutoff functions as a marker of heavy or recent high-dose khat use rather than any-use detection. Its perfect specificity and PPV make it valuable as a confirmatory test for substantial exposure, while its lower sensitivity reflects calibration to amphetamine rather than to khat-derived cathinone metabolite. Keywords: khat; Catha edulis; diagnostic accuracy; STARD; self-report; immunoassay; HPLC; Ethiopia; substance use measurement

08.
arXiv (CS.LG) 2026-06-17

Beyond IGO-Flow: Toward Convergence Analysis of IGO in Continuous Spaces

arXiv:2606.17523v1 Announce Type: cross Abstract: Information-Geometric Optimization (IGO) provides a unified framework for black-box optimization by interpreting the adaptation of a search distribution as a natural gradient update. Despite its conceptual importance, the convergence theory of IGO remains limited: most existing results concern continuous-time idealizations such as the IGO flow, rather than discrete-time updates with non-infinitesimal learning rates. In this paper, we study discrete-time IGO in continuous spaces, formulated as natural gradient updates in the expectation-parameter coordinates of an exponential family. In particular, we analyze IGO over the multivariate Gaussian family on strongly convex quadratic objective functions. Our analysis covers a setting that simultaneously incorporates full covariance adaptation, a fixed positive learning rate, and quantile-based weights. In this setting, we prove that the covariance matrix converges to the zero matrix. We further show that the mean vector converges to the global optimum, provided that the condition number of the appropriately scaled covariance matrix is bounded at sufficiently frequent iterations. These results advance the convergence theory of IGO and help bridge the gap between the mathematical theory of IGO and practical covariance-adaptive search methods such as CMA-ES.

09.
arXiv (CS.CV) 2026-06-16

Domain-Guided Prompting of the Segment Anything Model for Seismic Interpretation: The Role of Attributes, Visualization, and Hybrid Prompts

The advent of large pretrained foundation models for computer vision has significantly improved the efficiency of visual data interpretation. The Segment Anything Model (SAM), in particular, offers powerful zero shot segmentation capabilities through prompt based interaction, thus making it a promising tool for seismic interpretation. However, most existing applications of SAM rely on fine tuning for specific geological targets, which requires extensive labeled data, incurs high computational cost, and often compromises the model's generalization capability. In this study, we introduce a principled framework for zero shot adaptation of foundation models to seismic data. The framework is built on two key components: (1) aligning seismic attributes and visualization choices (e.g., colormaps) with the geological target of interest, and (2) employing a hybrid prompting strategy that combines sparse user defined point prompts with dense mask prompts derived from SAM's internal feature activations. We systematically evaluate this framework across multiple geological targets, datasets, prompt configurations, and seismic attribute representations. Our results demonstrate that geologic target aware selection of seismic attributes and colormaps, combined with hybrid prompting, enhances the separability of geological features and improves boundary delineation and segmentation accuracy relative to point based prompting alone. Our findings show that, when these components are jointly applied, SAM can achieve competitive segmentation performance in a fully zero shot setting, thereby eliminating the need to retrain SAM for each geologic feature. This work establishes a practical and scalable pathway to leverage foundation models in seismic interpretation, reducing reliance on labeled data while preserving model generality.

10.
arXiv (CS.AI) 2026-06-12

"Is This Not Enough?": Asymmetries in Institutional Accountability and Collective Sensemaking in the Case of Canada's Algorithmic Visa Triage System

arXiv:2606.13071v1 Announce Type: cross Abstract: This paper examines how algorithmic accountability in Canada's visa system is articulated institutionally and experienced by applicants across borders. We analyzed Immigration, Refugees and Citizenship Canada (IRCC)'s Algorithmic Impact Assessment (AIA) for the temporary resident visa (TRV) triage system using the algorithmic decision-making adapted for the public sector (ADMAPS) framework and analyzed Reddit discussions among applicants using a mixed-methods approach. We show that while institutional artifacts emphasize transparency, procedural safeguards, and bounded impacts, applicants engage in collective sensemaking to interpret opaque decisions, often relying on peer knowledge amid uncertainty. We identify three asymmetries between how institutional accountability is structured and how people perceive the process: epistemic asymmetry in access to decision logic, jurisdictional asymmetry in exposure shaped by geopolitical positioning, and temporal–relational asymmetry in how waiting and uncertainty are experienced. We emphasize why it is important to shift attention from institutional design to the uneven distribution of experiences with public-sector algorithmic governance. Together, these contributions demonstrate how algorithmic governance systems in the context of transnational migration produce structured asymmetries not captured by institutional disclosure frameworks, and how extending ADMAPS can account for those uneven translations of accountability.

11.
arXiv (quant-ph) 2026-06-16

Experimental quantum state learning with pairs of photons

arXiv:2606.16932v1 Announce Type: new Abstract: Tomography allows one to estimate the density matrix describing the state an ensemble of quantum systems are prepared in (for example, polarization tomography determines the polarization state of a beam of identically prepared photons). In general, it is not possible to uniquely decompose the density matrix into its pure state components. Agarwal et al. proposed a protocol which, for a mixture composed of any two pure states of a qubit (with arbitrary probabilities), allows an observer to infer not only the density matrix but the identity of those specific pure states and their weights - the additional requirement being that the qubits arrive in pairs, where both qubits in each pair are in the same state. We experimentally demonstrate this learning-from-pairs concept using photons in the polarization degree of freedom. We use tomography to measure a sequence of single photons and make use of their time-of-arrival information to 'pair up' the photons after the measurement. From here we are able to infer the photons' polarization states and their respective probabilities, and we demonstrate this for various different choices of polarization states and ratios. Finally, we investigate our ability to discriminate between two equal mixtures of distinct pairs of orthogonal polarization states. We find that on the order of approx. 10e4 photons is typically enough to achieve tomography fidelities of approximately 0.9999. This is sufficient to discriminate between two different preparations of the same mixed state, differing by angles of less than 5 degrees between the pure states used in the two preparations.

12.
arXiv (CS.CV) 2026-06-17

NTIRE 2024 Challenge on Image Super-Resolution (x4): Methods and Results

This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field.

13.
arXiv (CS.CL) 2026-06-15

UP-NRPA: User Portrait based Nested Rollout Policy Adaptation for Planning with Large Language Models in Goal-oriented Dialogue Systems

To address the challenge that current dialogue policy planning methods struggle to dynamically adapt to diverse user characteristics, this paper proposes a User Portrait based Nested Rollout Policy Adaptation (UP-NRPA) online framework with Large Language Models. In contrast to conventional approaches dependent on model training and require offline reinforcement learning policy models for user groups, UP-NRPA enables dynamic customization of dialogue strategies through an adaptive mechanism. This is achieved by leveraging real-time user feedback alongside personality, preferences, and objectives mapped from the current user portrait, thereby adapting to user characteristics without offline reinforcement learning. In collaborative and non-collaborative dialogue benchmarks, UP-NRPA demonstrated considerable benefits, achieving an impressive 100% success rate in multiple dialogue tasks. Particularly in negotiation tasks, the sale-to-list ratio (SL) increased by 56.41%. This demonstrates that UP-NRPA can adapt to diverse user needs without requiring a training mechanism, enabling the dialogue system to adapt to user characteristics.

14.
arXiv (quant-ph) 2026-06-16

Quantum optimal control of steady orbits

arXiv:2606.15383v1 Announce Type: new Abstract: Periodically driven dissipative systems can settle into steady orbits - fixed loops on their dynamical manifolds. In quantum mechanics, steady orbits occur in cooling engines (used to initialise quantum devices), coherent oscillators (such as lasers and masers), precision metrology devices (atomic clocks, optical and spin magnetometers), and magnetic resonance (steady state free precession, dynamic nuclear polarisation). Steady orbits and stroboscopic steady states are a promising target for quantum optimal control, but the numerical complexity is prohibitive: the infinite loop defeats gradient ascent pulse engineering (GRAPE) which relies on explicit numerical propagation in the time domain. Here we propose an efficient quantum control strategy for stroboscopic steady states and limit cycles that are approached asymptotically when a control sequence is repeated infinitely many times. The formalism is different from Floquet-Lindblad state engineering and effective Hamiltonian theories: it finds control sequences that drive a dissipative quantum system towards a steady orbit passing through user-specified waypoints. The software implementation (same numerical complexity scaling as GRAPE) is done for the Spinach library.

15.
arXiv (CS.CL) 2026-06-16

Attention, not scale, drives human-AI alignment in multimodal language prediction

Humans routinely draw on visual context to predict upcoming words. To what extent current vision-language models produce comparable behaviour is unclear. Here we placed five state-of-the-art pretrained systems side-by-side with 600 human participants in a web-based Visual-World Paradigm. On each of 100 six-second movie clips, models and participants received either text only or synchronised video and text and judged how likely a specified target word was to appear next; human eye movements were tracked throughout. Adding visual context increased model-human alignment in predictability ratings across all architectures (average Delta r = 0.18) with no impact of parameter size. When visual context was informative, transformer attention significantly increased alignment. Attention maps from two transformer models corresponded with human gaze, explaining up to 70% of the inter-participant variance when the scene contained informative cues. Notably, cross-modal attention reliably tracked anticipatory human fixations on semantic cues. These results suggest that current transformer-based vision-language models can approximate human behaviour exploiting visual context during language prediction - and that selective attention to informative cues, not sheer model scale, is the principal driver of this alignment.

16.
arXiv (CS.CV) 2026-06-16

Simulation-Based Multi-Fillet Evaluation of Woody Breast Poultry Fillets

Woody breast (WB) is a myopathy in modern broiler chickens that causes the breast muscle to become unusually stiff and fibrous, leading to decreased meat quality and significant economic losses. State-of-the-art automated WB detection relies on a side-view imaging system to analyze the bending behavior of a single fillet as it falls off a conveyor belt. While highly accurate, this approach is constrained by its single-fillet field of view, creating throughput bottlenecks on commercial processing lines. In this paper, we address this limitation via a novel multi-fillet detection architecture utilizing a top-down camera configuration. To validate our approach, we first develop a high-fidelity digital twin of an industrial conveyor system. Next, we synthesize a diverse dataset of 3D fillet meshes and model their viscoelastic bending dynamics using a physics-based simulation engine. Lastly, a continuous 2D shape deformation score is extracted from the top-down perspective as the simulated fillets traverse the roller precipice. Experimental results demonstrate that the top-down shape score effectively captures the contour changes of the fillets as it bends, providing a robust and scalable alternative to a side-view imaging system for simultaneous multi-fillet WB evaluation.

17.
arXiv (CS.LG) 2026-06-18

A physical adaptive material motor unit neural network: a hygromorph composite material machine

arXiv:2606.18275v1 Announce Type: cross Abstract: Advances in novel materials science enable structures to function as intelligent machines by embedding memory and learning capabilities directly into materials. Our work introduces a physical adaptive material motor unit neural network,leveraging a new generation of controllable actuators composed of wood- and carbon black-based composites, sensitive to temperature and relative humidity. These material actuators are assembled into a motor unit-like structure inspired by muscle contraction trigger, forming an intelligent machine capable of dynamic shading control that can be used, for example, in buildings. The machine is governed by a neural network trained on over 350 experimental data points collected under diverse environmental conditions. By establishing a new data-aware backpropagation training, we show that the machine predicts shading responses and learns to predict appropriate behaviour incrementally as the database expands. We also demonstrate the ability of the machine to optimise configurations to achieve similar shading outputs under two distinct conditions.

18.
arXiv (CS.CV) 2026-06-11

Latent Geometric Chords for Query-Efficient Decision-Based Adversarial Attacks

While decision-based black-box adversarial attacks present a severe security threat, current methodologies suffer from fundamental limitations. Pixel-wise attacks frequently introduce unnatural, high-frequency visual artifacts, while latent-space frameworks are confined by the limited search space of low-dimensional manifolds and inherent reconstruction flaws. To resolve these limitations, we propose Latent Geometric Chords (LGC) for Query-Efficient Decision-Based Adversarial Attacks alongside a variant, LGC-H. At its core, LGC navigates decision boundaries by executing a curvature-aware geometric search within a compressed semantic manifold. To guarantee high visual fidelity and circumvent dimensionality bottlenecks, we introduce a Residual-based Adversarial Generation (RAG) mechanism. RAG isolates semantic perturbations as geometric chords and superimposes them directly onto the original source image. RAG substantially resolves baseline reconstruction flaws and effectively doubles the permissible search space dimensions. Experimental results demonstrate that LGC achieves robust cross-dataset transferability and substantially outperforms state-of-the-art baselines. Notably, our method, LGC, minimizes perturbation magnitudes while achieving state-of-the-art visual fidelity–with a Structural Similarity Index Measure (SSIM) exceeding 0.99 and a Learned Perceptual Image Patch Similarity (LPIPS) below 0.01 at 5000 queries–and sustaining high attack success rates under stringent perceptual constraints, successfully compromising adversarially trained robust models. The source code is available at: https://github.com/eihmuekhine/Latent-Geometric-Chords.

19.
arXiv (quant-ph) 2026-06-12

QuBE/Qubex: an integrated hardware-software system for superconducting qubit experiments with broadband control

arXiv:2606.13010v1 Announce Type: new Abstract: Achieving high-fidelity operation in large-scale superconducting qubit systems requires not only control hardware with broad frequency coverage, low crosstalk, and tight synchronization but also software that coordinates system configuration, experiment execution, and data analysis. Here we present an integrated qubit-control system that combines broadband microwave hardware with a pulse-level software stack for scalable superconducting qubit experiments. The hardware provides broadband microwave coverage, including an instantaneous span of up to 1.6 GHz from a control output, while the software reduces setup and calibration overhead through automated configuration and built-in experiment workflows. We validate the system on a 64-qubit fixed-frequency transmon chip through full-chip frequency identification and representative demonstrations, including multi-unit far-detuned cross-resonance calibration and benchmarking that yields a measured two-qubit gate fidelity of 98.34%, and multilevel readout beyond the computational subspace. By disclosing the hardware architecture and releasing the software stack as open source, this work provides an inspectable hardware-software foundation for scalable superconducting qubit control experiments.

20.
arXiv (CS.CV) 2026-06-17

Reinforcing Dual-Path Reasoning in Spatial Vision Language Models

Spatial VLMs have made substantial progress in geometric perception, yet complex spatial reasoning requiring multi-step inference over depth, distance, and scene relations remains challenging. Moreover, different spatial queries call for fundamentally different strategies: some are best addressed through purely linguistic, step-by-step deduction, while others require explicit 3D grounding before quantitative inference. We present Dual-Path Spatial Reasoning via Reinforcement Learning for Spatial VLMs (SR-REAL), a unified framework that equips a spatial VLM with two complementary reasoning paths: Language-Only Reasoning (LOR), which performs step-by-step linguistic deduction, and Detect-Then-Reason (DTR), which detects 3D geometric cues (e.g., centers or bounding boxes) via region tokens before explicit geometric inference. SR-REAL begins with a cold-start supervised fine-tuning stage that constructs LOR and DTR chain-of-thought supervision and exposes a region-to-3D interface, followed by RL that optimizes the policy model with accuracy and format rewards; for DTR, a discrete center-based detection reward further refines geometric alignment. Across diverse spatial benchmarks, SR-REAL significantly outperforms spatial VLM baselines: (i) a single RL-trained model supports both reasoning paths, with DTR excelling in region-aware tasks through precise 3D localization and LOR enhancing general spatial reasoning; (ii) jointly training both paths fosters mutual reinforcement; (iii) high-quality, blended cold-start data is crucial for stable RL optimization; and (iv) the model generalizes across datasets and domains without per-task tuning, demonstrating positive transfer between LOR and DTR.

21.
arXiv (CS.CL) 2026-06-12

Leveraging Audio-LLMs to Filter Speech-to-Speech Training Data

Large-scale mined corpora provide abundant training data for end-to-end speech-to-speech translation (S2ST) but may contain noise, misalignment, and semantic errors. Filtering noisy data is crucial to maintain robust speech translation performance. We study how to train an audio-language model to make keep/drop decisions on paired speech directly from audio. To obtain reliable supervision without manual labels, we adopt a scalable two-stage Rank-to-Distill strategy. A lightweight ranker generates keep/drop pseudo-labels from noisy speech pairs, then trains an audio large language model to predict keep/drop directly from raw paired speech. The resulting model jointly captures acoustic fidelity and cross-lingual semantic consistency for the selection of speech-conditioned data. Experiments on CVSS-C and SpeechMatrix show consistent improvements over unfiltered training, yielding up to +1.4 ASR-BLEU for end-to-end S2ST.

22.
medRxiv (Medicine) 2026-06-11

Incremental costs of transitioning from four to eight WHO-recommended antenatal care visits in Uganda: A costing analysis from a societal perspective

Background In 2016, the World Health Organization revised its antenatal care (ANC) recommendation from four to eight visits. For low- and middle-income countries like Uganda, where achieving even four visits remains a challenge, this transition has significant cost implications for both the health system and households. This study estimated the incremental costs of adopting the eight-visit model from a societal perspective. Methods The study was conducted in six government health facilities in southwestern Uganda. A micro-costing approach estimated health facility costs (personnel, equipment, consumables, and overhead). Costs incurred at patients end (transport, ultrasound, medical expenses, and time) were collected from 785 women using a questionnaire, with all costs in 2025 USD. Results For an average of 4.3 visits, total cost per woman was $100.1: facility costs $43.7 (43.7%), and patient costs $56.4 (56.3%). Transitioning to eight visits would increase total cost by $57.8 (57.8%), of which $36.4 (63.0%) would fall on households, equivalent to 68.8% of average monthly household income. Total costs would rise by 55.4% ($115.5 to $179.5) at Health Center IVs and 64.3% ($102.3 to $168.1) at Health Center IIIs, with facility costs up 43.4% and 62.9% and patient costs up 61.2% and 65.7%, respectively. Conclusion Transitioning to eight ANC visits would impose a large financial burden on households, with the incremental patient cost equivalent to more than two-thirds of average monthly household income. Equitable implementation requires improving availability of medicines and diagnostics, subsidizing transport, exploring telemedicine or community-based models, and improving efficiency at lower-tier health centers.

23.
medRxiv (Medicine) 2026-06-15

CDH13 is associated with cellular viability after exposure to ionizing radiation using genome-wide screening

Background: It is well known that genetic variants contribute to cellular sensitivity to chemotherapeutic agents and ionizing radiation (IR). The aim of this study was to identify single nucleotide polymorphisms (SNPs) and genes associated with the spectrum of normal cellular sensitivity of lymphoblastoid cell lines (LCLs) towards ionizing radiation and mitomycin C (MMC). Methods: In a first step, we determined the viability of LCLs established from male participants of the Berlin Aging Study II (BASE-II) aged >=62 years following treatments with increasing doses of IR (n=137 cell lines) or MMC (n=140 cell lines) using the alamarBlue assay. Results from intra-experimental triplicates and three independent experiments for each cell line and treatment were used to calculate the area under the curves (AUCs) representing the specific sensitivity to IR and MMC of each LCL. The data from these experiments were subsequently used as outcomes in genome-wide association studies (GWASs). In addition, we calculated polygenic risk scores (PGS) from UK Biobank GWAS results for four cancer-related phenotypes and assessed the extent to which the variance in the IR and MMC sensitivity is explained by these PGS. Results: The GWAS analyses revealed one variant, rs74728080, located in CDH13 on chromosome 16, to show genome-wide significant (p < 5 x 10-8, beta = 2.81) association with cellular viability after treatment with IR. In the GWAS on MMC sensitivity the most interesting signal was elicited by SNP rs113978558 in an intron of the PLD5 gene on chromosome 1 (p = 9.232 x 10-8; beta = 1.44). Several other SNPs with statistically suggestive (i.e., p < 1 x 10-5) evidence of association with IR or MMC sensitivity were identified. PGSs calculations from GWAS of four cancer-related traits in UKB explained ~5% and ~3% of phenotypic variance in IR- and MMC-induced cell viability, respectively. Conclusion: The genome-wide significant association of rs74728080 with IR sensitivity and the location of this variant in CDH13 is interesting and functionally highly plausible given its known involvement in oxidative-stress response and function as tumor suppressor. Taken together, our novel data suggest that CDH13 may be genuinely involved in regulating cellular IR sensitivity.

24.
arXiv (CS.CL) 2026-06-16

Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier

For the development of Large language models (LLMs), recent approaches to generating pseudo intermediate reasoning have shown remarkable progress. But they typically rely on large numbers of correctly annotated answers to assess reasoning quality. This paper presents a semi-supervised framework that scales reasoning learning from minimal supervision, turning reasoning verification itself into a data creation mechanism. We train a lightweight reasoning-correctness classifier on only a few labeled samples, which judges whether intermediate reasoning traces generated by an LLM are valid. Furthermore, an entropy-based confidence threshold filters out unreliable samples, and the remaining high-confidence reasoning traces are used to fine-tune the model. Experiments on Verifiable Math Problems (Orca-Math subset) and Question Answering on Image Scene Graphs (GQA) with Visual Programming show that our method achieves accuracy comparable to using 10-15x more labeled data. Ablation analyses confirm that both the classifier and entropy filtering are essential for scalable and noise-resistant pseudo-labeling. By replacing expensive answer-level supervision with lightweight reasoning verification, our method provides a practical path toward constructing large-scale reasoning resources and paves the way for future autonomous reasoning systems that learn from minimal human input.

25.
arXiv (CS.CL) 2026-06-16

Stop When Further Reasoning Won't Help: Attention-State Adaptive Generation in Reasoning Models

By incorporating test-time compute scaling, large reasoning models (LRMs) can solve complex problems through explicit chain-of-thought (CoT) reasoning processes. However, they often suffer from overthinking, resulting in redundant token outputs and degraded accuracy. Current methods to mitigate this issue remain limited: training-based approaches require substantial computational resources, while training-free methods rely on well-crafted prompts or unreliable confidence signals. In this work, we investigate early stopping from the perspective of attention distributions and propose a simple method, ASAG, which infers the model's reasoning state and adaptively adjusts the generation strategy. The proposed framework is training-free and plug-and-play, enabling seamless integration into existing LRMs. Extensive experiments on nine benchmarks demonstrate consistent improvements across mainstream LRMs with varying parameter scales, including the DeepSeek-R1-Distill and Qwen3 series. Specifically, ASAG improves average accuracy by 3.2% while reducing the number of generated tokens by nearly 40% across all reasoning tasks on Qwen3-8B.