Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
Nature (Science) 2026-06-09

People are turning to AI chatbots to plug gaps in health information

A systematic assessment of health-related queries to a chatbot powered by artificial intelligence highlights shortfalls in health-care provision and the responsibilities of AI companies. A systematic assessment of health-related queries to a chatbot powered by artificial intelligence highlights shortfalls in health-care provision and the responsibilities of AI companies.

02.
arXiv (CS.LG) 2026-06-15

Optimal Hidden-Target Learning for Online Inventory Optimization on General Convex Sets

arXiv:2606.14679v1 Announce Type: new Abstract: Online inventory optimization (OIO) is online convex optimization with physical memory: inventory carryover makes the feasible action set depend on the past. A natural principle, used in stochastic inventory learning and recently in OIO under a single linear capacity constraint, is to maintain a hidden target chosen by an online learner and implement its projection onto the currently feasible order-up-to set. We prove that this simple principle is optimal for OIO on arbitrary bounded convex capacity sets. With online gradient descent as the base learner, the method improves the best known regret guarantee for OIO on general convex sets from inverse to inverse-square-root dependence on the common-demand probability, and we prove a matching lower bound. The same principle gives the first polylogarithmic regret guarantee for strongly convex losses and the first dynamic regret guarantee adapting to Euclidean path variation on general convex capacity sets. The analysis introduces a norm alignment principle: the right state variable is the distance from the hidden target to the feasible set, measured in the same norm as the projection. Under norm alignment, this distance evolves pathwise as a scalar queue, with target movement as arrival and common demand as service. This reduction to one-dimensional queue control resolves the state dependence and extends the guarantees to general convex capacity sets, beyond the reach of prior productwise approaches. Experiments on synthetic and real-world inventory data corroborate the theory.

03.
arXiv (CS.AI) 2026-06-16

MiroBench: Benchmarking Realism in Agentic Simulation of Real-world Discussions

arXiv:2606.14715v1 Announce Type: cross Abstract: LLM agents are increasingly used to simulate real world interactions, but it remains unclear whether simulated behaviors preserve the content patterns and interaction dynamics of real human behaviors. Existing evaluations remain fragmented, which makes it difficult to compare systems or measure progress. In this paper, we focus on Reddit discussions as a concrete first step toward evaluating real-world social simulation. Reddit threads provide public, topic-grounded, multi-party interactions where people share experiences, debate, seek advice, express emotion, and collectively respond to products, events, and social issues. These discussions offer an observable window into broader social behavior, making them a useful setting for testing whether LLM agents can reproduce not only fluent text, but also the distributional patterns and interaction dynamics of real online communities. We introduce MiroBench, a benchmark for Reddit discussion simulation built from 4,292 real Reddit threads. MiroBench uses statistical tests to compare generated and real discussions across four major aspects: repetition and semantic uniformity, narrative content, toxicity and aggression, and structural complexity. Experiments across five domains and five models show that current simulators remain distributionally mismatched with real Reddit threads, while a lightweight prompt-based improvement procedure provides only limited gains. MiroBench offers a concrete benchmark for measuring, diagnosing, and improving realism in LLM-based social simulation.

04.
medRxiv (Medicine) 2026-06-15

VarEx: A Large Language Model Pipeline for Automated Extraction of Exposures, Outcomes, and Covariates from Epidemiologic Studies

Objective: Observational studies are essential for investigating risk factors for Alzheimer's disease and related dementias (ADRD), but inconsistent reporting and selection of covariates can contribute to residual confounding, omitted-variable bias, and reduced reproducibility. We developed and evaluated VAREX (Variable Extraction), a large language model (LLM)-based information extraction framework designed to automatically identify exposures, outcomes, and covariates from epidemiologic studies and populate structured evidence repositories. Materials and Methods: VAREX combines retrieval-augmented generation, biomedical language-model embeddings, semantic chunking, cross-encoder reranking, and prompt-engineered LLM workflows to extract epidemiologic variables from full-text biomedical articles. The framework was evaluated using a reference-standard corpus of observational studies examining blood pressure variability (BPV) and Alzheimer's disease-related dementias (ADRD), together with external validation datasets involving other exposure-outcome relationships. Extracted variables were compared with independently curated human reference standards using semantic matching and one-to-one assignment procedures. Covariates were additionally classified into ten epidemiologically relevant semantic categories. Results: In the primary BPV[->]ADRD corpus (10 studies), VAREX achieved a precision of 0.91, recall of 0.84, and F1-score of 0.87 for variable extraction. Covariate classification accuracy was 0.90, yielding a strict extraction-and-classification F1-score of 0.78. External validation datasets demonstrated comparable performance across diverse epidemiologic domains, with extraction F1-scores ranging from 0.73 to 0.85. Category-level performance was strongest for health behaviors (F1=0.96), sociodemographic variables (F1=0.90), and medication exposures (F1=0.89). Compared with published estimates of manual systematic-review effort, VAREX reduced processing time from approximately 61 minutes to 9 minutes per article, representing an 85.7% reduction in review time. Discussion: These findings demonstrate that LLM-based information extraction can accurately identify and classify epidemiologic variables across heterogeneous observational-study designs. Automated extraction enables scalable construction of structured repositories of exposures, outcomes, and covariates while substantially reducing the labor required for evidence synthesis and systematic reviews. Conclusion: VAREX provides an effective framework for automated extraction and classification of epidemiologic variables from the biomedical literature. By supporting large-scale evidence synthesis and structured knowledge resource development, VAREX may facilitate more rigorous observational research, improved confounder identification, and enhanced reproducibility in epidemiology.

05.
arXiv (CS.CV) 2026-06-12

Ex-Omni: Enabling 3D Facial Animation Generation for Omni-modal Large Language Models

Omni-modal large language models (OLLMs) aim to unify multimodal understanding and generation, yet extending them to jointly produce speech and 3D facial animation remains largely unexplored despite its importance for natural human-computer interaction. A key challenge is the mismatch between the discrete semantic reasoning of LLMs and the dense temporal dynamics required for 3D facial motion. We propose Expressive Omni (Ex-Omni), an open-source model that augments OLLMs with native speech-accompanied 3D facial animation. Ex-Omni decouples semantic reasoning from temporal generation through a blendshape-aware speech unit generator and a blendshape decoder, where speech units provide temporal scaffolding and hidden speech representations carry facially relevant cues. We further introduce a unified token-as-query gated fusion (TQGF) mechanism for controlled semantic injection, as well as InstructS2SF-1200K, a dataset consisting of 1200K samples for pre-training. Extensive experiments show that Ex-Omni maintains competitive speech understanding and generation ability while achieving better audio-visual synchronization and lower face-generation latency than cascaded pipelines.

06.
arXiv (CS.CL) 2026-06-11

A Controlled Study of Decoding-Time Truthfulness Methods on Instruction-Tuned LLMs

作者:

In this work, we introduce CHAIR (Classifier of Hallucination As ImproveR), a supervised framework for detecting hallucinations by analyzing internal logits from each layer of every token. Our method extracts a compact set of features such as maximum, minimum, mean, standard deviation, and slope-from the token logits across all layers, enabling effective hallucination detection without overfitting. Experiments on TruthfulQA and MMLU datasets demonstrate that CHAIR significantly improves detection accuracy, particularly in zero-shot scenarios, showcasing its robustness and generalizability. Beyond hallucination detection, CHAIR highlights the potential of using internal representations for designing advanced decoding strategies. By leveraging patterns in logits, we suggest that more sophisticated models and adaptive decoding methods could further reduce hallucinations and enhance text completion quality. CHAIR not only offers a practical solution for detecting hallucinations but also lays the groundwork for exploring richer representations in LLMs to improve their factuality and coherence.

07.
arXiv (CS.LG) 2026-06-15

A Longitudinal Attribute-Conditioned Neural Network for Modeling Health-State Transition Probabilities in Temporally Irregular Data: The LANTERN Framework

arXiv:2606.13880v1 Announce Type: new Abstract: Accurate estimation of long-term care transition probabilities is central to disability insurance pricing, reserving, and solvency assessment. Classical actuarial multi-state models commonly rely on Markov, semi-Markov, or proportional-hazard specifications, which provide a direct connection to cohort projection but may be restrictive for irregular longitudinal health data with nonlinear aging patterns and heterogeneous covariate histories. This paper develops a well-calibrated estimator of multi-state transition probabilities for irregular longitudinal health data. The model learns from individual health history, incorporates the time elapsed between observations, and conditions transition probabilities on demographic and socioeconomic attributes. It produces a valid probability distribution over the next observed health state, with four possible states: healthy, mild disability, severe disability, and death. Individual probabilities are aggregated by age group and origin state to form transition matrices compatible with actuarial cohort projection. Using longitudinal data from the Health and Retirement Study, we compare the proposed estimator with logistic regression, gradient-boosted trees, a recurrent neural network, and a last-state persistence benchmark. The evaluation considers probabilistic accuracy, endpoint discrimination and calibration for severe disability and death, risk concentration, and transition matrix error after aggregation. The proposed estimator improves severe disability discrimination relative to logistic regression and gradient-boosted tree benchmarks, maintains strong calibration, and yields the lowest transition matrix error among the evaluated models in the held-out test analysis. Results show that a structured machine learning estimator can support long-term care transition modeling when judged by calibration and projection fidelity, beyond discrimination.

08.
arXiv (CS.LG) 2026-06-18

Neural Network Implementation of the Renormalization Group for Fault Diagnosis with Class Imbalance

arXiv:2606.18326v1 Announce Type: new Abstract: The application of machine learning models in practical tasks faces challenges such as class imbalance and multidimensional noise. This paper proposes RGNet, a neural network architecture based on the concept of the renormalization group (RG), for hierarchical coarse-graining of the feature space. The model sequentially compresses the input dimensionality and concatenates all scales before classification, allowing it to capture both local details and global patterns. The notion of RG-flows is introduced - interpretable low-dimensional representations whose visualization via t-SNE reveals a discrete curvilinear structure confirming the effectiveness of coarse-graining. Experimental results are presented on the imbalanced AI4I dataset. The obtained results demonstrate that RGNet is a universal, interpretable, and competitive solution for fault prediction in applications with imbalanced classes.

09.
arXiv (CS.LG) 2026-06-16

Scalable Graph Condensation with Evolving Capabilities

arXiv:2502.17614v3 Announce Type: replace Abstract: The rapid growth of graph data creates significant scalability challenges as most graph algorithms scale quadratically with size. To mitigate these issues, Graph Condensation (GC) methods have been proposed to learn a small graph from a larger one, accelerating downstream tasks. However, existing approaches critically assume a static training set, which conflicts with the inherently dynamic and evolving nature of real-world graph data. This work introduces a novel framework for continual graph condensation, enabling efficient updates to the distilled graph that handle data streams without requiring costly retraining. This limitation leads to inefficiencies when condensing growing training sets. In this paper, we introduce GECC (\underline{G}raph \underline{E}volving \underline{C}lustering \underline{C}ondensation), a scalable graph condensation method designed to handle large-scale and evolving graph data. GECC employs a traceable and efficient approach by performing class-wise clustering on aggregated features. Furthermore, it can inherit previous condensation results as clustering centroids when the condensed graph expands, thereby attaining an evolving capability. This methodology is supported by robust theoretical foundations and demonstrates superior empirical performance. Comprehensive experiments including real world scenario show that GECC achieves better performance than most state-of-the-art graph condensation methods while delivering an around 1000$\times$ speedup on large datasets.

10.
arXiv (CS.LG) 2026-06-19

Online Dynamic Batching with Formal Guarantees for LLM Training

arXiv:2606.19989v1 Announce Type: cross Abstract: Modern LLM training breaks a core assumption behind offline batch samplers: the true training cost of a sample is only observable after preprocessing, augmentation, templating, tokenization, and multimodal visual-token expansion. Unless one pays for a preprocessing- and augmentation-dependent length cache, batch construction is therefore blind to the quantity that determines padding, memory use, and GPU saturation. We introduce Online Dynamic Batching (ODB), a DataLoader-side drop-in system that moves batch formation to this point of accurate observability while preserving DDP step alignment. We formalize this synchronization requirement as the Distributed Group Alignment Problem and prove deadlock-free bounded termination with default join-mode identity coverage and opt-in non-join sample-quota closure. ODB requires no model, optimizer, or attention-kernel changes and is released as online-dynamic-batching with lightweight trainer adapters. Across public 2B/8B Qwen3-VL runs on UltraChat/LLaVA/ShareGPT4o, ODB improves literal emitted-sample throughput vs. fixed-batch Standard by 1.58-2.51x on single-node Full FT/LoRA and 1.71-3.78x on two-node Full FT, with Standard-comparable quality; production MM-Mix reaches 4.43x. Against GMT/BMT offline token-budget oracles, ODB is within 15% on UltraChat/LLaVA and faster on high-CV ShareGPT4o: 2.24-2.39x single-node Full FT/LoRA and 3.06-3.69x two-node Full FT. Together, ODB occupies the online/drop-in regime for high-heterogeneity LLM fine-tuning: large throughput gains at Standard-comparable quality, formal DGAP guarantees, and no length-cache precompute or kernel rewrites.

11.
arXiv (CS.AI) 2026-06-16

MemPO: Self-Memory Policy Optimization for Long-Horizon Agents

arXiv:2603.00680v4 Announce Type: replace Abstract: Long-horizon agents face the challenge of growing context size during interaction with environment, which degrades the performance and stability. Existing methods typically introduce the external memory module and look up the relevant information from the stored memory, which prevents the model itself from proactively managing its memory content and aligning with the agent's overarching task objectives. To address these limitations, we propose the self-memory policy optimization algorithm (MemPO), which enables the agent (policy model) to autonomously summarize and manage their memory during interaction with environment. By improving the credit assignment mechanism based on memory effectiveness, the policy model can selectively retain crucial information, significantly reducing token consumption while preserving task performance. Extensive experiments and analyses confirm that MemPO achieves absolute F1 score gains of 25.98 over the base model and 7.1 over the previous SOTA baseline, while reducing token usage by 67.58% and 73.12%. The code is released at https://github.com/TheNewBeeKing/MemPO.

12.
arXiv (quant-ph) 2026-06-11

Integrable Massless and Massive Fermions

作者:

arXiv:2603.11172v2 Announce Type: replace-cross Abstract: One-dimensional integrable fermions can be classified into massless and massive regimes, and the $R$-operator for the latter can be constructed from that of the former. Here, I define integrable massless fermions by the simultaneous satisfaction of the Yang-Baxter equation (YBE) and Shastry's decorated YBE (DYBE) by the $R$-matrix. This notion is strictly more general than Maassarani's `free-fermion algebra', yet more restrictive than the notion of free fermions in exactly solvable quantum models or in integrable two-dimensional classical vertex models dual to quantum spin chains. Within this framework, there emerge two archetypal mechanisms for opening a spectral gap and generating massive fermions: (i) breaking time-reversal symmetry by coupling to external field, and (ii) introducing time-reversal symmetric interactions. These paradigms are realized, respectively, in the XY chain in a longitudinal field and in the Hubbard model, both of which possess non-relativistic, bivariate $R$-matrices. Integrability conditions on local Hamiltonians for both massless and massive fermions are identified, and schematic procedures for uniquely determining their $R$-matrices are proposed.

13.
arXiv (CS.LG) 2026-06-16

Mixtures of Subspaces for Bandwidth Efficient Context Parallel Training

arXiv:2606.16384v1 Announce Type: new Abstract: Pretraining language models with extended context windows enhances their ability to leverage rich information during generation. Existing methods split input sequences into chunks, broadcast them across multiple devices, and compute attention block by block which incurs significant communication overhead. While feasible in high-speed clusters, these methods are impractical for decentralized training over low-bandwidth connections. We propose a compression method for communication-efficient context parallelism in decentralized settings, achieving a remarkable compression rate of over 95\% with negligible overhead and no loss in convergence. Our key insight is to exploit the intrinsic low-rank structure of activation outputs by dynamically constraining them to learned mixtures of subspaces via efficient reparameterizations. We demonstrate scaling billion-parameter decentralized models to context lengths exceeding 100K tokens on networks as slow as 300Mbps, matching the wall-clock convergence speed of centralized models on 100Gbps interconnects.

14.
arXiv (CS.CL) 2026-06-16

LiFT: Local Search via Linear Programming for Overfitting-Controlled Transformers

This paper proposes a Linear Programming (LP)-based local search framework for fine-tuning pretrained transformer models with explicit control against overfitting. The approach formulates transformer fine-tuning as a bilevel optimization-based regularization problem, in which model parameters and regularization hyperparameters are jointly updated. Information collected during initial warm-up iterations, including validation gradients and training Hessian information, is used to construct a local descent direction by solving an LP that minimizes a scaled directional derivative while preserving training optimality. This validation-aware descent direction enables focused local updates of both parameters and regularization hyperparameters, reducing overfitting without requiring repeated full retraining cycles. The resulting method, termed Linear Programming-based Fine-Tuning (LiFT) for transformers, differs from conventional fine-tuning by systematically identifying task-specific updates rather than relying on heuristic or grid-based hyperparameter selection. Experiments on GPT-2 Small fine-tuned on WikiText-2 demonstrate that LiFT enables effective adaptation through selective tuning of transformer blocks and regularization parameters, yielding consistent improvements in test perplexity across multiple layer configurations and regularization settings, with particularly pronounced gains in overfitting-prone scenarios. Beyond empirical performance, LiFT establishes a principled connection between transformer fine-tuning, bilevel optimization, local search, and regularization theory.

15.
arXiv (CS.CL) 2026-06-16

Evaluating the Robustness of Proof Autoformalization in Lean 4

Proof autoformalization aims to translate a mathematical informal proof written in natural language into a formal proof in a formal language such as Lean~4. Several works have developed LLM-based models for proof autoformalization. However, existing evaluations have typically focused on translating well-formed informal proofs from curated datasets. We argue that a robust proof autoformalizer must remain faithful even for informal proofs that diverge from these idealized ones, and we present the first study on the robustness of proof autoformalization models. We formulate two categories of perturbations and evaluate robustness under each: a global perturbation paraphrases the informal proof in a different style, under which the formalization should remain consistent; a local perturbation alters a value, symbol, or proof step, possibly in a counterfactual way, and a robust formalization should faithfully reflect the perturbation rather than reverting to the original one or inferring a different one on its own. We build a benchmark with both perturbations on miniF2F and MATH-500, and automatically measure how stable a proof autoformalization's correctness is under global perturbations and how faithfully its output reflects local perturbations. We evaluate seven recent models, all of which are sensitive to global perturbations and mostly fail to remain faithful under local perturbations. Code and data are available via https://github.com/ucr-rai/robust-proof-autoformalization.

16.
arXiv (CS.CL) 2026-06-11

Massive Open-Vocabulary Keyword Spotting

Automatic speech recognition systems have been shown to under-perform when it comes to transcribing words rarely seen in the training data, namely specialized terminology. Open-vocabulary keyword spotting, combined with contextual biasing, has been shown to mitigate this issue. However, existing systems can only handle glossaries of a few hundred terms without becoming an infeasible bottleneck. We propose a system that stores features with a memory footprint up to 128 times smaller than a comparable baseline and allows users to process massive databases while remaining open-vocabulary. Without fine-tuning the speech recognition model, our system achieves a comparable entity recall as uncompressed solutions, even in languages not seen during training.

17.
arXiv (CS.CL) 2026-06-16

HyDRA: Hybrid Dynamic Routing Architecture for Heterogeneous LLM Pools

Production LLM deployments increasingly maintain heterogeneous model pools spanning order-of-magnitude cost differences. Existing routers make binary strong-vs-weak decisions and couple learned parameters to specific model identities, requiring retraining whenever the catalog changes. We present HyDRA (Hybrid Dynamic Routing Architecture), a framework that predicts fine-grained, multi-dimensional capability requirements per query and matches them against configuration-defined model profiles via shortfall matching. A ModernBERT encoder with K=4 independent sigmoid heads scores each query along reasoning, code generation, debugging, and tool use; a shortfall-matching algorithm then selects the cheapest model whose capabilities meet the predicted requirements. The deployed predictor runs at 86 ms median CPU inference latency in production, and is fully decoupled from the model catalog – adding or removing models requires only a configuration change, with zero retraining. On SWE-Bench Verified (5-model pool: GPT-5.4-mini, Claude Haiku 4.5, GPT-5.3 Codex, Claude Sonnet 4.6, GPT-5.4), HyDRA's tunable shortfall threshold spans three regimes: peak-quality exceeds the always-strong Claude Sonnet 4.6 baseline (75.4% vs. 74.2% resolution) at 12.9% cost savings; iso-quality matches Sonnet at 54.1% cost savings, a 6x improvement over our prior in-house binary router at 9.1%; aggressive pushes savings to 72.5% for a 3.2-point quality trade. Results generalize across LiveCodeBench, BigCodeBench, and tau-bench. HyDRA is deployed to all users in GitHub Copilot's VS Code Chat auto-mode and – to our knowledge for the first time in the LLM routing literature – demonstrates language-invariant routing across CJK, European, and other script families.

18.
arXiv (CS.LG) 2026-06-11

How Low Can You Go? Active Learning for Sparse Model Discovery in the Ultra-Low-Data Limit

arXiv:2606.12182v1 Announce Type: new Abstract: Identifying the governing equations of complex dynamical systems remains a fundamental challenge across science and engineering. While early approaches relied on empirical data and heuristics, modern data-driven methods offer greater flexibility and fewer assumptions. However, data acquisition in real-world settings is often expensive. This work addresses this challenge by introducing an active learning strategy for dynamics discovery in the ultra-low data limit. Rather than sampling randomly, our method iteratively prioritizes regions that are most informative for model identification. This approach builds on Sparse Identification of Nonlinear Dynamics (SINDy), and utilizes an ensemble extension, E-SINDy, to estimate epistemic uncertainty and guide the sampling for both ordinary and partial differential equations (ODEs/PDEs). For ODEs, an exhaustive analysis is conducted on the Lorenz system across varying data budgets and noise levels. For PDEs, two systems with contrasting dynamical characteristics are examined: the Burgers' equation, where a sharp shock front creates a distinction between informative and uninformative regions, and the Kuramoto-Sivashinsky equation, which presents a more spatially complex sampling landscape. Across all scenarios, the proposed method accurately identifies the governing dynamics with significantly fewer data samples than random sampling.

19.
arXiv (math.PR) 2026-06-12

Data-driven subsampling rates for diffusion parameter estimation of SDEs

arXiv:2606.13615v1 Announce Type: new Abstract: We study the problem of diffusion parameter estimation for stochastic differential equation (SDE) models in scenarios where data and model are compatible only on specific scales that have yet to be determined. We introduce a simple and efficient method for selecting suitable rates at which given time series data should be subsampled in order to ensure that the statistical structure of the subsampled data is consistent with the behavior of the SDE model on an infinitesimal scale. Our approach is based on analyzing the statistics of the lengths of monotonically increasing or decreasing segments in the subsampled data sequence, which we refer to as monotone runs. As an analytical foundation, we prove for a large class of SDEs with additive noise that the lengths of monotone runs at an infinitesimal scale are approximately geometrically distributed with success probability $1/2$. This universal characterization is employed to derive an automated method for selecting appropriate subsampling rates for given time series data that is directly applicable in real-world scenarios and does not rely on an asymptotic framework of multiscale diffusions. The approach is demonstrated using an application from industrial mathematics concerning surrogate models for fiber lay-down curves in production processes of nonwoven textiles.

20.
arXiv (CS.CL) 2026-06-17

HistoRAG: Embedding Historical Methodology in Retrieval-Augmented Generation Through Critical Technical Practice

Retrieval-Augmented Generation (RAG) is the prevailing architecture for grounding language model outputs in external evidence, yet its dominant evaluation paradigms and default configurations remain oriented toward factual question-answering. For interpretive disciplines such as historical studies, RAG embeds assumptions that conflict with scholarly practice. We introduce HistoRAG, a framework that translates historiographical principles into concrete architectural interventions. Separated retrieval and generation decouples source discovery from interpretation, temporal windowing enforces balanced source representation across the research period as a methodological requirement of historical inquiry, and LLM-as-judge evaluation makes relevance judgments transparent and contestable. We evaluate these interventions using SPIEGELragged, applied to 102,189 articles from Der Spiegel (1950-1979). Each intervention addresses a measurable deficiency in standard RAG: era-specific vocabulary retrieves zero chunks from the 1950s when using 1970s terminology, evidence of the temporal skew that motivates windowing; vector similarity and LLM-assessed relevance correlate only weakly (Spearman rho = 0.275), motivating post-retrieval evaluation; and keyword-based and semantic retrieval surface largely disjoint source pools, motivating an architecture in which both operate as complementary retrieval layers under a shared LLM evaluation filter. We also introduce the concept of Zwischentexte (intermediate texts that function as interpretive proposals rather than findings) as a framework for responsible integration of LLM-generated text into scholarly practice. The architecture offers a model for how domain-specific epistemological commitments can be translated into RAG design decisions, and may transfer to other interpretive disciplines working with large corpora.

21.
arXiv (quant-ph) 2026-06-11

On the Addressability Problem on CSS Codes

arXiv:2502.13889v4 Announce Type: replace Abstract: Recent discoveries in asymptotically good quantum codes have intensified research on their application in quantum computation and fault-tolerant operations. This study focuses on the addressability problem within CSS codes: we ask what circuits might implement logical gates on strict subsets of logical qubits. With some notion of fault-tolerance, we prove several impossibility results: for CSS codes with non-zero rate, one cannot address a logical $H$, $HS$, $SH$, or $\mathsf{CNOT}$ to any non-empty strict subset of logical qubits using a circuit made only from 1-local Clifford gates. Furthermore, we show that one cannot permute the logical qubits in a code purely by permuting the physical qubits, if the rate of the code is (asymptotically) greater than 1/3 and the distance is at least 3. We can show a similar no-go result for $\mathsf{CNOT}$s and $\mathsf{CZ}$s between two such high-rate codes, albeit under a more restrictive assumption on the circuit, which we call "global" (though recent addressable CCZ gates use global circuits). This work pioneers the study of distance-preserving addressability in quantum codes, mainly by considering automorphisms of the code. This perspective offers new insights and potential directions for future research. We argue that studying this trade off between addressability and efficiency of the codes is essential to understand better how to do efficient quantum computation.

22.
arXiv (CS.CL) 2026-06-11

LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization

We introduce LibriConvo, a synthetic conversational speech corpus for speaker diarization and automatic speech recognition (ASR), built by instantiating the previously proposed Speaker-Aware Simulated Conversation (SASC) framework in a dataset and benchmarking setting. The main contribution of this paper is a corpus construction pipeline and benchmark derived from that framework. To make the data more suitable for downstream ASR and diarization, conversational timing statistics are estimated from English CallHome using external voice activity detection, long pauses are compressed, LibriTTS utterances are grouped by book to improve local semantic continuity, and room impulse responses are selected with a spatial-plausibility heuristic. The resulting corpus contains 240.1 hours of audio across 1,496 dialogues involving 830 speakers, partitioned into speaker-disjoint train, validation, and test splits. We report baseline results for both diarization and ASR. On the test split, Sortformer outperforms the pyannote pipeline in diarization (11.1\% vs.~24.4\% DER). For ASR, a Fast Conformer-CTC XLarge model fine-tuned with Serialized Output Training achieves 7.29\% WER and 6.97\% cpWER, outperforming zero-shot Whisper-large-v3. These results position LibriConvo as a practical benchmark for studying synthetic conversational speech and for evaluating multi-speaker speech processing systems.

23.
arXiv (quant-ph) 2026-06-16

Boson Sampling as a Probe of Chaotic and Integrable Quantum Dynamics in a Photonic Chip

arXiv:2605.25398v2 Announce Type: replace Abstract: Quantum chaos plays a key role in understanding complex quantum dynamics, while integrated photonics offers unique advantages for quantum applications, including high-speed operation, scalability, and programmable unitary transformations. However, integrated photonic approaches to probing quantum chaos remain largely unexplored, owing to the absence of a clear connection between programmable photonic dynamics and established chaos diagnostics. In this work, we establish Fock-state boson sampling as a practical probe of quantum chaos by exploiting the sensitivity of multiphoton interference to the random-matrix properties of underlying single-particle unitary dynamics. More importantly, we design and fabricate a programmable quantum photonic chip to experimentally implement this framework, achieving the first integrated-photonic demonstration of quantum-chaos probes based on boson sampling. Experimental results show that the three complementary probes proposed in this work, namely the distance to Porter–Thomas statistics, Shannon entropy, and Out-of-Time-Ordered-Correlator-equivalent observables, exhibit close agreement with theoretical predictions and consistently distinguish chaotic and integrable dynamics. Our work provides a scalable route for investigating complex quantum dynamics on programmable photonic platforms while leveraging the intrinsic advantages of boson sampling through multiphoton interference and complex output statistics.

24.
arXiv (CS.CL) 2026-06-12

From Passive Generation to Investigation: A Proactive Scientific Peer Review Agent

Large language models (LLMs) have shown promise in automating scientific peer review. However, existing approaches often struggle to generate in-depth reviews supported by concrete evidence. We argue that a key limitation is the lack of flexibility to proactively investigate suspicious parts of a paper based on accumulated evidence, as human reviewers do. In this paper, we explore how to enable an LLM-based review agent to perform such proactive investigation. We find that this can be naturally formulated as a Markov Decision Process (MDP), and propose ProReviewer, a scientific peer review agent that proactively reviews a paper guided by a maintained, structured review log. The structured review log serves as a workspace for the agent to track evidence and intermediate findings collected during review. Experiments show that ProReviewer with an 8B backbone, trained by supervised fine-tuning and optimized by reinforcement learning, achieves the highest average score across five quality dimensions, outperforming prompt-based methods with much larger frontier LLMs by up to 39% and the strongest fine-tuned baseline by 16% relatively. It also attains the highest win rates against baselines in human evaluation.

25.
arXiv (quant-ph) 2026-06-12

Stable, bidirectional electro-optic transduction in thin film lithium tantalate

arXiv:2606.12726v1 Announce Type: new Abstract: Efficient and stable microwave-optical transduction is a key enabling technology for distributed superconducting quantum computing and heterogeneous quantum networks. Electro-optic transducers based on thin-film lithium niobate (TFLN) have shown strong promise, but demonstrations to date have been limited by various factors such as low frequency bias drift, low efficiency, fabrication complexity, and scalability. Here we demonstrate the first integrated electro-optic microwave-optical transducers realized in thin-film lithium tantalate (TFLT), a material platform offering Pockels nonlinearity comparable to TFLN together with improved bias stability and high-power handling. We fabricate superconducting microwave resonators coupled to tunable photonic-molecule optical resonators using wafer-scale deep ultraviolet lithography, offering high-throughput production of hundreds of devices per wafer. Across six devices we observe coherent bidirectional conversion between C-band optical photons and 4.9-5.5 GHz microwave photons, with measured on-chip efficiencies and inferred single-photon coupling rates g_0/2{\pi} ~ 1 kHz consistent with theory. Continuous operation over multiple days is achieved using a static bias field with minimal feedback, demonstrating a major operational advantage. We further characterize optical loss statistics, microwave resonator performance, and optically induced added noise under pulsed pumping, finding less than one added photon for 100 microsecond pulses at the highest measured efficiencies. These results establish TFLT as a scalable and robust electro-optic platform for future quantum interconnects and modular quantum processors.