Paper Plaza - AcademicHub

01.

arXiv (quant-ph) 2026-06-11 DOI: arXiv:2602.14736

Coupled integrated photonic quantum memristors using a single photon source made of a colour center

Authors:

Alessio Baldazzi, Roy Philip George Konnoth Ancel, Sebastiano Guaraldo, Ivan Fattori, Xuan Chen, Ziad Abi Akar, Regis Deturche, Stefano Azzini, Christophe Couteau, Lorenzo Pavesi

arXiv:2602.14736v2 Announce Type: replace Abstract: Photonic quantum memristors provide a measurement-induced route to nonlinear and history-dependent quantum dynamics. Experimental demonstrations have so far focused on isolated devices or simple cascaded-device configurations. Here, we experimentally realize and characterize a network of two coupled photonic quantum memristors with crossed feedback, implemented on a silicon nitride photonic integrated circuit and fed by a room-temperature single-photon source based on a silicon-vacancy color center SiV$^-$ in a nanodiamond. Each memristor consists of an integrated Mach-Zehnder interferometer whose transfer function is adaptively updated by photon detection events on the other memristor, thus generating novel non-Markovian input-output dynamics with stronger memristive behaviour than single devices. In particular, we report inter-memristor input-output hysteresis curves that exhibit larger form factors and self-intersecting loops, revealing marked bistability and a self-intersecting hysteresis geometry, respectively. Furthermore, numerical simulations show how these features emerge from the interplay between memory depth and relative input phase, for both intra- and inter-memristor input-output relations. We experimentally test the performance of our system in the NARMA task. Our results establish coupled integrated photonic quantum memristors as scalable nonlinear building blocks and highlight their potential for implementing compact quantum neuromorphic and reservoir computing architectures.


02.

arXiv (CS.CL) 2026-06-12 DOI: arXiv:2606.13209

Understanding helpfulness and harmless tension in reward models

Authors:

Eshaan Tanwar, Pepa Atanasova

Reward models are a key component of reinforcement learning from human feedback (RLHF), aligning language models toward both helpful and harmless behaviour. However, the internal mechanisms underlying these objectives and their conflicts remain poorly understood. We study alignment tension in reward models trained under helpfulness-only, harmlessness-only, and mixed-objective settings. We find that mixed-objective models often underperform single-objective models, indicating interference between objectives. Using activation-based methods, we identify neurons associated with each objective and study their functional roles via targeted ablations. These neurons causally support their corresponding objectives while often negatively affecting the opposing one. A substantial proportion of neurons is shared between helpfulness and harmlessness, and these shared neurons exert a disproportionate influence on model behaviour, contributing to alignment tension. Our results provide a mechanistic interpretation of how alignment objectives are represented in reward models and why multi-objective alignment remains challenging, motivating future work on disentangled and controllable alignment methods.


03.

arXiv (quant-ph) 2026-06-11 DOI: arXiv:2505.10530

Exploring Variational Entanglement Hamiltonians

Authors:

Yanick S. Kind, Benedikt Fauseweh

arXiv:2505.10530v3 Announce Type: replace Abstract: Recent advances in analog and digital quantum-simulation platforms have enabled exploration of the spectrum of entanglement Hamiltonians via variational algorithms. In this work we analyze the convergence properties of the variationally obtained solutions and compare them to numerically exact calculations in quantum critical systems. We demonstrate that interpreting the cost functional as an integral permits the deployment of iterative quadrature schemes, thereby reducing the required number of measurements by more than an order of magnitude even in the presence of noise. We further show that a modified ansatz captures deviations from the Bisognano-Wichmann form in lattice models, improves convergence and trainability, and provides a cost-function-level diagnostic for quantum phase transitions. Finally, we establish that a low cost value does not by itself guarantee convergence in trace distance. Nevertheless, a low-cost solution still faithfully reproduces degeneracies and spectral gaps, which are essential for applications to topological phases.


04.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.19683

Exit-and-Join Dynamics for Decentralized Coalition Formation

Authors:

Quanyan Zhu

arXiv:2606.19683v1 Announce Type: new Abstract: This paper studies coalition formation as a decentralized dynamical process driven by unilateral exit-and-join decisions. Agents evaluate local moves using the Aumann-Dreze value, so payoffs are computed within the agent's current coalition rather than through a globally negotiated coalition structure. The resulting model links cooperative payoff allocation with noncooperative best-response behavior: a terminal partition is precisely a coalition structure with no admissible, individually profitable exit-and-join deviation. We establish equilibrium characterizations, identify conditions under which the dynamics admit scalar Lyapunov or exact-potential representations, and analyze how switching and acceptance costs shape local stability. Numerical experiments test finite-time stabilization, cost sensitivity, and a special convex-game benchmark.
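
The exit-and-join dynamics described above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's model. It uses the standard definition of the Aumann-Dreze value, which is the Shapley value of the game restricted to the agent's current coalition. Agents move in a fixed round-robin order, and each agent switches to its most profitable target. The switching cost is modelled as a hypothetical flat deduction, and acceptance by incumbent members is ignored. The example game $v(S) = |S|^2$ is an illustrative choice.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def aumann_dreze(coalition, v):
    """Shapley value of the game v restricted to `coalition` (Aumann-Dreze value)."""
    members = sorted(coalition)
    totals = {i: Fraction(0) for i in members}
    for order in permutations(members):
        prefix = frozenset()
        for i in order:
            grown = prefix | {i}
            totals[i] += v(grown) - v(prefix)  # marginal contribution of i
            prefix = grown
    orders = factorial(len(members))
    return {i: totals[i] / orders for i in members}


def exit_and_join(agents, v, switching_cost=0, max_rounds=1000):
    """Round-robin better-response dynamics over coalition structures (a sketch).

    Each agent compares its Aumann-Dreze payoff at home with the payoff it would
    get after joining another coalition (or going alone), minus a hypothetical
    flat switching cost, and makes the most profitable strict improvement.
    """
    partition = [frozenset({a}) for a in agents]  # start from all singletons
    for _ in range(max_rounds):
        moved = False
        for i in agents:
            home = next(c for c in partition if i in c)
            current = aumann_dreze(home, v)[i]
            targets = [c for c in partition if c != home]
            if len(home) > 1:
                targets.append(frozenset())  # exiting to a new singleton
            best_gain, best_target = 0, None
            for target in targets:
                gain = aumann_dreze(target | {i}, v)[i] - switching_cost - current
                if gain > best_gain:
                    best_gain, best_target = gain, target
            if best_target is not None:
                partition = [c for c in partition if c not in (home, best_target)]
                if home - {i}:
                    partition.append(home - {i})
                partition.append(best_target | {i})
                moved = True
        if not moved:
            return partition  # terminal: no profitable exit-and-join move left
    raise RuntimeError("no terminal partition reached within max_rounds")
```

For the example game, the singletons merge step by step into the grand coalition, which is then terminal. A large switching cost leaves the initial singletons in place. Exact Fraction arithmetic avoids spurious moves caused by floating-point ties.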


05.

medRxiv (Medicine) 2026-06-16 DOI: HASH:f761cb028fc6d57c528a58d82118838e

Higher Population Coverage with Typhoid Conjugate Vaccine is Needed to Induce Herd Protection: Evidence from a Cluster-Randomized Trial in Urban Bangladesh

Authors:

Ahmmed, Khanam, Islam, Park S. E., Ongadi, Im, Zhang, Khan A. I., Aziz A. B., …

Introduction: A cluster randomized trial (CRT) in Bangladesh found that Vi-tetanus toxoid (Vi-TT) vaccine conferred 85% protection to vaccinees at 18 months of follow-up; however, it failed to confer significant herd protection to non-vaccinees. Methods: In the CRT, children aged 9 months to


06.

arXiv (CS.CL) 2026-06-11 DOI: arXiv:2601.03792

VietMed-MCQ: A Consistency-Filtered Data Synthesis Framework for Vietnamese Traditional Medicine Evaluation

Authors:

Huynh Trung Kiet, Dao Sy Duy Minh, Nguyen Dinh Ha Duong, Le Hoang Minh Huy, Long Nguyen, Dien Dinh

Large Language Models (LLMs) have demonstrated remarkable proficiency in general medical domains. However, their performance significantly degrades in specialized, culturally specific domains such as Vietnamese Traditional Medicine (VTM), primarily due to the scarcity of high-quality, structured benchmarks. In this paper, we introduce VietMed-MCQ, a novel multiple-choice question dataset generated via a Retrieval-Augmented Generation (RAG) pipeline with an automated consistency check mechanism. Unlike previous synthetic datasets, our framework incorporates a dual-model validation approach to ensure reasoning consistency through independent answer verification, though the substring-based evidence checking has known limitations. The complete dataset of 3,190 questions spans three difficulty levels and underwent validation by one medical expert and four students, achieving 94.2 percent approval with substantial inter-rater agreement (Fleiss' kappa = 0.82). We benchmark seven open-source models on VietMed-MCQ. Results reveal that general-purpose models with strong Chinese priors outperform Vietnamese-centric models, highlighting cross-lingual conceptual transfer, while all models still struggle with complex diagnostic reasoning. Our code and dataset are publicly available to foster research in low-resource medical domains.
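
Fleiss' kappa, the agreement statistic reported above, can be computed from an item-by-category count table. The function below is a textbook implementation, not the authors' code. The toy tables used with it are hypothetical. They only illustrate the formula for five raters, matching the one expert and four students.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of raters putting item i in category j.

    Every row must sum to the same number of raters n >= 2.
    """
    N = len(counts)
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("each item needs the same number (>= 2) of ratings")
    k = len(counts[0])
    # Observed agreement per item, averaged over items
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N
    # Chance agreement from the marginal category proportions
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1:
        raise ValueError("kappa is undefined when every rating falls in one category")
    return (p_bar - p_e) / (1 - p_e)
```

Perfect agreement over two balanced categories gives 1.0. A table such as [[3, 2], [2, 3]] gives -0.2, meaning agreement below chance.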


07.

arXiv (CS.CV) 2026-06-24 DOI: arXiv:2606.24499

GeoIMO: Geometry-Driven Independent Motion Classification for Event Cameras

Authors:

Anil Bayram Gogebakan, Filippo Marostica, Alessio Caviglia, Alessandro Savino, Stefano Di Carlo

Existing automotive event datasets rely on appearance-based annotations from frame pipelines, making them poorly suited for motion-aware event perception. We present a geometry-driven, annotation-free framework that classifies detected objects as static or independently moving by exploiting ego-motion structure directly from the event stream. A Focus of Expansion model with yaw compensation estimates global background motion, while objects are labeled as moving when local motion deviates from this prediction, as quantified by a scale-invariant residual. Temporal stabilization improves robustness across consecutive event windows. The method requires no learning, no manual motion labels, and works with any input bounding boxes. Experiments on MVSEC and the Prophesee 1 Megapixel Automotive Detection dataset demonstrate consistent performance across diverse driving scenarios, with yaw compensation improving results during turns and a simple translational local model offering a favorable accuracy-efficiency trade-off.
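
The core idea of labelling an object as moving when its local motion disagrees with the ego-motion prediction can be sketched as follows. This is an assumed formulation, not the paper's scale-invariant residual. Under pure forward translation, a static point moves away from the focus of expansion (FoE) along the radial direction. The sketch therefore compares only the direction of a local flow vector with that radial direction, which makes the score independent of flow magnitude. Yaw compensation and temporal stabilization are omitted, and the 0.3 threshold is an arbitrary placeholder.

```python
import math


def foe_residual(point, flow, foe):
    """Direction-only deviation of a local flow vector from the FoE prediction.

    Returns 1 - cos(angle between flow and point - foe). The value lies in
    [0, 2] and does not change if `flow` is rescaled.
    """
    rx, ry = point[0] - foe[0], point[1] - foe[1]
    fx, fy = flow
    norm = math.hypot(rx, ry) * math.hypot(fx, fy)
    if norm == 0:
        return 0.0  # undefined at the FoE or for zero flow; treated as consistent
    return 1.0 - (rx * fx + ry * fy) / norm


def is_independently_moving(point, flow, foe, threshold=0.3):
    """Label a detection as moving when its residual exceeds the threshold."""
    return foe_residual(point, flow, foe) > threshold
```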


08.

arXiv (math.PR) 2026-06-18 DOI: arXiv:2606.18722

Very large cliques in a scale-free random graph

Authors:

Carlo De Ambroggio, Umberto De Ambroggio

arXiv:2606.18722v1 Announce Type: new Abstract: In this short article we consider a preferential attachment random graph model with edge steps, studied by Alves, Ribeiro and Sanchis. Starting with an initial graph $\mathbb{G}_1$ formed by a vertex with a self-loop attached to it, the model evolves as follows. At every subsequent (discrete) time step, either with probability $p$ we add a vertex to the graph and connect it to exactly one of the older vertices selected with probability proportional to its degree, or with probability $1-p$ we add one edge between two existing vertices, both selected (independently) with probability proportional to their degrees. Let $\omega(\mathbb{G})$ be the clique number of a graph $\mathbb{G}$, i.e.\ the number of vertices in a largest complete subgraph of $\mathbb{G}$. Alves, Ribeiro and Sanchis showed that, for any given $\varepsilon>0$, we have $\omega(\mathbb{G}_{2t})\geq t^{\frac{1-p}{2-p}(1-\varepsilon)}$ with high probability (i.e.\ with probability tending to $1$ as $t\rightarrow \infty$). Here we strengthen this bound by showing that, for any function $f:\mathbb{N}\to \mathbb{N}$ that satisfies $f(t)\rightarrow \infty$ as $t\rightarrow \infty$, with high probability \[\omega(\mathbb{G}_{2t}) = \Omega\left(t^{\frac{1-p}{2-p}}\Big(\log^{\frac{1}{2-p}}(t)f(t)\Big)^{-1}\right).\]
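
The growth rule above can be simulated directly. In this minimal sketch, calling the function with n_steps = 2t produces $\mathbb{G}_{2t}$. A self-loop is assumed to contribute 2 to the degree, which is the usual convention. Degree-proportional sampling is implemented with an endpoint list, a common trick and not necessarily the authors' method. Computing the clique number is NP-hard in general and is not attempted here.

```python
import random


def pa_with_edge_steps(n_steps, p, seed=None):
    """Simulate G_1, ..., G_{n_steps}; return (number of vertices, edge list).

    G_1 is one vertex with a self-loop, and each later step adds exactly one
    edge. Degree-proportional sampling draws from a list in which every vertex
    appears once per unit of degree (a self-loop contributes 2).
    """
    rng = random.Random(seed)
    edges = [(0, 0)]
    endpoints = [0, 0]
    num_vertices = 1
    for _ in range(n_steps - 1):
        if rng.random() < p:
            # Vertex step: a new vertex attaches to one old vertex chosen in proportion to degree
            u = rng.choice(endpoints)
            v = num_vertices
            num_vertices += 1
        else:
            # Edge step: both endpoints are chosen independently in proportion to degree,
            # from the same snapshot, so u == v (a new self-loop) is possible
            u = rng.choice(endpoints)
            v = rng.choice(endpoints)
        edges.append((u, v))
        endpoints.extend((u, v))
    return num_vertices, edges


def clique_exponent(p):
    """Exponent (1 - p) / (2 - p) appearing in both clique-number bounds."""
    return (1 - p) / (2 - p)
```

For example, pa_with_edge_steps(2 * 1000, p=0.5, seed=0) samples $\mathbb{G}_{2t}$ with $t = 1000$. With p = 1.0 every step is a vertex step, so the result is a tree on n_steps vertices.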


09.

arXiv (CS.LG) 2026-06-15 DOI: arXiv:2606.14388

A Low-Rank Subspace Analysis of LLM Interventions

Authors:

Angira Sharma, Christian Schroeder de Witt, Philip Torr, Anisoara Calinescu, Jialin Yu

arXiv:2606.14388v1 Announce Type: new Abstract: Interventions designed to modify a particular behavior in LLMs, such as refusal or sycophancy, often produce unintended changes in other behaviors. This lack of targeted control makes it difficult to design and implement reliable safety controls. To understand these side-effects, we introduce a diagnostic framework for analyzing interacting behaviors in LLMs. We model behaviors as low-rank subspaces in activation space, and study how the effects of an intervention on one behavior spread to others. Across multiple instruction-tuned models (7B-70B) and across refusal, jailbreak, and sycophancy settings, we find that different behaviors share internal representations, and intervening on one behavior alters others in asymmetric ways. Some behaviors act as upstream control points whose interventions propagate broadly across other behaviors, while others remain more isolated. We relate these effects to two geometric quantities: (i) the overlap between behavior subspaces, measured as the average squared cosine of principal angles, and (ii) the angle between each behavior subspace and the decision subspace (capturing the model's final decision, e.g., refuse vs. comply). Empirically, intervention effects on other behaviors tend to be larger for behavior pairs with higher subspace overlap, and for source behaviors whose subspaces lie closer (smaller angle) to the decision subspace. These findings highlight a challenge for targeted behavior control: behaviors are difficult to modify independently, as interventions can propagate through shared representations and asymmetric interactions.
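
The overlap measure in (i) can be computed without an explicit SVD. Let $Q_a$ and $Q_b$ be orthonormal bases of the two subspaces. The cosines of the principal angles are the singular values of $Q_a^\top Q_b$, so their squares sum to $\|Q_a^\top Q_b\|_F^2$, and there are $\min(\dim a, \dim b)$ such angles. The sketch below is a generic implementation under that definition, not the authors' code. The input spans are hypothetical direction vectors standing in for behavior subspaces extracted from activations.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt; drops (near-)linearly dependent vectors."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            proj = _dot(w, b)
            w = [wi - proj * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(_dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis


def subspace_overlap(span_a, span_b):
    """Average squared cosine of the principal angles between span(span_a) and span(span_b)."""
    qa, qb = orthonormal_basis(span_a), orthonormal_basis(span_b)
    k = min(len(qa), len(qb))
    if k == 0:
        raise ValueError("both spans must be non-trivial")
    # ||Q_a^T Q_b||_F^2 is the sum of squared cosines of all k principal angles
    return sum(_dot(a, b) ** 2 for a in qa for b in qb) / k
```

Identical subspaces score 1, orthogonal subspaces score 0, and span{e1, e2} versus span{e1, e3} scores 0.5, since one principal angle is 0 and the other is 90 degrees.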


10.

arXiv (quant-ph) 2026-06-11 DOI: arXiv:2508.10076

TensorKit.jl: A Julia package for large-scale tensor computations, with a hint of category theory

Authors:

Lukas Devos, Jutho Haegeman

arXiv:2508.10076v2 Announce Type: replace-cross Abstract: TensorKit.jl is a Julia-based software package for tensor computations, especially focusing on tensors with internal symmetries. This paper introduces the design philosophy, core functionalities, and distinctive features, including how to handle abelian, non-abelian, and anyonic symmetries through the "TensorMap" type. We highlight the software's flexibility, performance, and its capability to extend to new tensor types and symmetries, illustrating its practical applications through select case studies.


11.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.16014

Orchestrated Reality: From Role-Play to Living, Playable Game Worlds – LLM-Driven World Simulation as a Parameterized-Action POMDP

Authors:

Yuhang Huang, Chenmiao Li, Chaowei Fang

arXiv:2606.16014v1 Announce Type: cross Abstract: Many games rely on storytelling combined with systems that track levelling, NPC behaviour, and consequence simulation; bridging tightly-authored narrative with deeply-simulated worlds – most acute in sandbox and open-world settings – has been prohibitively expensive. LLM-driven worlds open a new path: a single harness can coordinate numerical state, narrative voice, storytelling pacing, and rule logic together. Realising this requires the LLM system to sustain a persistent world (who is where, what has just happened, what is currently true), which today's deployed systems do not: the narrative voice asserts state in free prose without any validated representation, so a fully autonomous game engine remains infeasible. We treat this as an architectural choice, not a limitation of language models, and report work in progress on a framework – orchestrated reality – that makes the world a canonical object owned by a singleton orchestration agent analogous to the tabletop-RPG Game Master (GM). We formalise an LLM-driven game world for a human player as a Parameterized-Action POMDP: state is a tree of canonical JSON entities, actions decompose as $a=(k, x_k)$ (a discrete intent kind plus structured JSON parameters), the agent observes only a narrative projection $o=O(s)$ of state, and the transition kernel $F$ is an LLM-driven Plan-Diff-Validate-Apply (PDVA) pipeline that commits schema-validated, content-hashed JSON deltas. We give the formal model, a JSON-state example, a worked single-turn example, and a catalogue of 15 illustrative incidents drawn from a real deployment showing the framework in action. Empirical validation through a planned human player study – together with multi-NPC concurrent agency and deployment as an RL environment – is situated as future work.
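
The Validate-Apply half of the PDVA pipeline can be sketched with the standard library. Everything here is hypothetical and illustrative: the schema, the delta format (a base hash plus a list of field assignments) and the entity names. The LLM-driven Plan and Diff stages are replaced by a hand-written delta. Content hashing uses SHA-256 over canonical JSON (sorted keys, compact separators), one common way to make the hash independent of key order.

```python
import copy
import hashlib
import json

# Hypothetical schema: entity kind -> {writable field: required Python type}
SCHEMA = {
    "npc": {"name": str, "location": str, "hp": int},
    "item": {"name": str, "owner": str},
}


def content_hash(obj):
    """SHA-256 of a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def validate(state, delta):
    """Reject deltas planned against a stale state or violating the schema."""
    if delta["base_hash"] != content_hash(state):
        raise ValueError("delta was planned against a stale state")
    for op in delta["ops"]:
        entity = state["entities"].get(op["id"])
        if entity is None:
            raise ValueError(f"unknown entity {op['id']!r}")
        fields = SCHEMA[entity["kind"]]
        if op["field"] not in fields:
            raise ValueError(f"field {op['field']!r} is not writable on {entity['kind']}")
        if not isinstance(op["value"], fields[op["field"]]):
            raise TypeError(f"bad type for {op['id']}.{op['field']}")


def apply_delta(state, delta):
    """Validate, then commit the delta to a copy; return (new_state, new_hash)."""
    validate(state, delta)
    new_state = copy.deepcopy(state)
    for op in delta["ops"]:
        new_state["entities"][op["id"]][op["field"]] = op["value"]
    return new_state, content_hash(new_state)
```

Because each delta carries the hash of the state it was planned against, replaying it on a newer state fails validation instead of silently overwriting canonical facts.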


12.

medRxiv (Medicine) 2026-06-23 DOI: HASH:1179c8614c164c0b59222791d537a450

A pharmacometric grey zone reconciles high metronidazole resistance rates with bismuth quadruple therapy efficacy in Helicobacter pylori

Authors:

He, Wang, Xu

Summary. Background: Metronidazole (MET) resistance in Helicobacter pylori (H. pylori) exceeds 50-60% globally, yet MET-containing bismuth quadruple therapy (BQT) achieves >90% eradication in MET-resistant infections. We hypothesise this discordance stems from a structural limitation of two-fold dilution: a pharmacometric grey zone between the 128 and 256 µg/mL breakpoints where treatable isolates are systematically misclassified as high-level resistance. Methods: In a real-world cohort of 4610 treatment-naïve children (2019-2024), checkerboard assays determined the bismuth-MET synergy factor (SF). Population PK/PD modelling simulated gastric MET exposure (AUC


13.

arXiv (quant-ph) 2026-06-17 DOI: arXiv:2604.27457

Demonstration of Exponential Quantum Speedup with Constant-Depth Compiled Circuits for Simon's Problem

Authors:

Phattharaporn Singkanipa, Victor Kasatkin, Daniel A. Lidar

arXiv:2604.27457v2 Announce Type: replace Abstract: We demonstrate exponential algorithmic quantum speedup for a restricted-Hamming-weight version of Simon's problem, in which the hidden string $b$ is promised to satisfy $HW(b)\le w$ for a Hamming-weight cutoff $w$, on present-day superconducting quantum processors. We introduce a hardware-aware compilation strategy that reduces the quantum part of each Simon query circuit to constant depth. The resulting compiled circuits have $O(1)$ depth, require only linear nearest-neighbor connectivity, map directly onto common device layouts, and avoid additional routing and SWAP overhead. Implemented on IBM's $156$-qubit Boston and $120$-qubit Miami processors, these circuits achieve sufficient fidelity to exhibit algorithmic quantum speedup without error suppression. Using the number-of-queries-to-solution (NTS) metric, we observe exponential speedup over the classical lower-bound benchmark for all restricted-Hamming-weight cutoffs $w\ge 4$ on Boston and across low-to-intermediate Hamming-weight cutoffs on Miami; at higher Hamming-weight cutoffs on Miami, we still observe polynomial speedup. The same construction also enables unrestricted instances of Simon's problem, corresponding to $w=n$ for problem size $n$, over the finite problem-size ranges for which our NTS computation is feasible; in this regime, the observed scaling advantage is not limited to the restricted-Hamming-weight setting. These results show that careful hardware-aware compilation can make quantum speedup experimentally accessible for a canonical hidden-subgroup problem in the NISQ regime.
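
The classical post-processing shared by all Simon-query experiments can be sketched as follows. Each measurement yields an n-bit string $y$ with $y \cdot b = 0 \pmod 2$, so $b$ lies in the null space of the collected samples over GF(2). This is a generic sketch, not the paper's compiled circuits or its NTS computation. The optional weight filter only illustrates the $HW(b)\le w$ promise, and the simulated sampler is a stand-in for hardware output.

```python
import random


def parity(x):
    return bin(x).count("1") % 2


def sample_orthogonal(b, n, count, seed=0):
    """Simulated measurement outcomes: uniform n-bit y with y . b = 0 (mod 2)."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < count:
        y = rng.getrandbits(n)
        if parity(y & b) == 0:
            samples.append(y)
    return samples


def gf2_nullspace(rows, n):
    """Basis of {b : y . b = 0 (mod 2) for every y in rows}; vectors are n-bit ints."""
    pivots = {}  # pivot column -> row, kept in reduced row-echelon form
    for y in rows:
        for col, prow in pivots.items():
            if (y >> col) & 1:
                y ^= prow  # clear existing pivot columns from the new row
        if y == 0:
            continue  # linearly dependent on earlier rows
        new_col = y.bit_length() - 1
        for col in pivots:
            if (pivots[col] >> new_col) & 1:
                pivots[col] ^= y  # keep the new pivot column clear in older rows
        pivots[new_col] = y
    basis = []
    for free in (c for c in range(n) if c not in pivots):
        b = 1 << free
        for col, prow in pivots.items():
            if (prow >> free) & 1:
                b |= 1 << col  # pivot bit equals the free bit it depends on
        basis.append(b)
    return basis


def candidate_secrets(samples, n, w=None):
    """All nonzero b consistent with the samples, optionally with Hamming weight <= w."""
    basis = gf2_nullspace(samples, n)
    found = []
    for mask in range(1, 1 << len(basis)):
        b = 0
        for j, vec in enumerate(basis):
            if (mask >> j) & 1:
                b ^= vec
        if w is None or bin(b).count("1") <= w:
            found.append(b)
    return sorted(found)
```

Once the samples span an (n-1)-dimensional space, the null space is one-dimensional and the only candidate is b itself.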


14.

arXiv (CS.CL) 2026-06-17 DOI: arXiv:2606.18021

LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

Authors:

Lalit Yadav, Akshaj Gurugubelli

AI systems deployed in legal workflows hallucinate at rates that aggregate metrics report at ~52%, but this average conceals where errors concentrate and in which direction they run, leaving compliance officers without an actionable signal for trustworthy deployment. We present LegalHalluLens, an auditing framework with three components: typed hallucination profiles across four legally-motivated claim categories (numeric, temporal, obligation/entitlement, factual) over CUAD (Hendrycks et al., 2021); a Risk Direction Index (RDI) that reduces omission-versus-invention bias to a single deployment-comparable scalar; and a typed debate pipeline calibrated to both magnitudes and directions. Across 510 contracts and 249,252 clause-level instances we measure a within-model gap of approximately 38-40 pp between obligation/numeric and temporal claims that aggregate reporting hides, and show that two systems with matched 52% rates can carry opposite RDIs. The debate pipeline reduces fabricated detections by 45% with per-category gains tracking the diagnosis, matching commercial APIs with a substantially smaller backbone (4B active parameters). Typed profiles and RDI surface failure modes that aggregate metrics hide; we further show these diagnostics serve as calibration inputs for multi-agent debate pipelines, where Skeptic challenges and asymmetric gates targeted at measured failure modes outperform generically-tuned debate. The framework supports direction-aware procurement, accountability, and agent design for legal AI deployed in the wild.


15.

arXiv (CS.AI) 2026-06-24 DOI: arXiv:2606.24585

LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context

Authors:

Anastasiia Kucherenko, François Brouchoud, Dimitri Percia David, Andrei Kucharavy

arXiv:2606.24585v1 Announce Type: new Abstract: While the validity of LLMs' use in the legal context remains subject to ethical and legal debate, legal professionals are already experimenting with personal LLMs, if only for translation and reformulation. However, even such a seemingly innocuous use can introduce biases through case processing speed if LLM assistants selectively refuse assistance on certain topics. To better anticipate such biases, we investigate several modern small LLMs that are most likely to be used as on-device assistants, to assess the impact of overrefusal on legal prompts. Surprisingly, we find that authority-style prefixes ("you are acting as an assistant of the national supreme court", "[...] defense lawyer") systematically increase refusal rates by 2–20x over the no-prefix baseline, while a known role-play jailbreak prefix shows mixed effects, sharply increasing refusals in some models and barely shifting them in others. The finding suggests that small on-prem deployable LLMs are unstable under contextual framings that a real institutional user might naturally introduce, and further investigation is essential to minimize opportunities for bias.


16.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.15669

Z-Plane Neural Networks: Bounded Geometric Activation Replaces ReLU and LayerNorm

Authors:

Sungwoo Goo, Hwi-yeol Yun, Sangkeun Jung

arXiv:2606.15669v1 Announce Type: cross Abstract: Modern deep neural networks rely on Euclidean scalar activations (e.g., ReLU) and global normalization techniques (e.g., LayerNorm) to prevent gradient instability in deep architectures. However, these mechanisms inherently cause dead neurons, discard critical directional information, and destroy the orthogonality of feature representations. Inspired by the frequency-modulation transmission of biological axons, we propose the Z-Plane Neural Network, which maps hidden states into 2D phasor bundles on a hypersphere. We introduce a novel geometric activation function, Radial Bounding ($\mathbf{x} / \max(1, \|\mathbf{x}\|_2)$), which limits the energy magnitude while preserving the phase (direction). We demonstrate mathematically that this isotropic activation maintains 1-Lipschitz continuity and prevents gradient vanishing by preserving tangential gradients. Empirically, a 100-layer Z-Plane Multi-Layer Perceptron (MLP), entirely devoid of ReLU and LayerNorm, successfully converges on the MNIST dataset with 98.34% accuracy and absolute numerical stability, proving that bounded geometric activation alone is sufficient for stable deep learning.
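
The Radial Bounding map $\mathbf{x} / \max(1, \|\mathbf{x}\|_2)$ is the Euclidean projection onto the closed unit ball. Projection onto a closed convex set is nonexpansive, which is consistent with the 1-Lipschitz claim above. The sketch below implements the formula as stated. Grouping hidden coordinates into consecutive 2-D pairs is an assumed layout for the "2D phasor bundles", not taken from the paper. The random check is an empirical illustration, not a proof.

```python
import math
import random


def radial_bound(x):
    """Radial Bounding: x / max(1, ||x||_2).

    Vectors inside the unit ball pass through unchanged; longer vectors are
    rescaled onto the unit sphere, so the direction (phase) is always preserved.
    """
    scale = max(1.0, math.sqrt(sum(v * v for v in x)))
    return [v / scale for v in x]


def radial_bound_pairs(h):
    """Apply radial_bound to each consecutive 2-D pair (h[0], h[1]), (h[2], h[3]), ...

    The pairing is an assumed phasor layout, not taken from the paper.
    """
    if len(h) % 2:
        raise ValueError("hidden size must be even")
    out = []
    for i in range(0, len(h), 2):
        out.extend(radial_bound(h[i:i + 2]))
    return out


def is_nonexpansive(trials=1000, dim=4, seed=0):
    """Empirically check ||f(x) - f(y)|| <= ||x - y|| on random pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(dim)]
        y = [rng.uniform(-3, 3) for _ in range(dim)]
        if math.dist(radial_bound(x), radial_bound(y)) > math.dist(x, y) + 1e-12:
            return False
    return True
```

For instance, [3, 4] maps to [0.6, 0.8], keeping its direction, while [0.3, 0.4] already lies inside the unit ball and is returned unchanged.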


17.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2606.19755

SafeSpec: Fast and Safe LLM via Dynamic Reflective Sampling

Authors:

Haotian Xu, Zeyang Zhang, Linbao Li, Huadi Zheng, Yu Li, Cheng Zhuo

arXiv:2606.19755v1 Announce Type: cross Abstract: Speculative inference accelerates large language model (LLM) decoding but provides no inherent safety guarantees. Existing safety defenses are largely incompatible with speculative inference: they either introduce additional computation or disrupt the draft-verify mechanism, negating acceleration benefits. This reveals a fundamental incompatibility between current safety methods and speculative decoding. We propose SafeSpec, a safety-aware speculative inference framework that integrates risk estimation directly into the verification process. SafeSpec attaches a lightweight latent safety head to the target model to jointly evaluate semantic validity and safety in a single forward pass. When unsafe generations are detected, SafeSpec applies rollback and safety-guided reflective multi-sampling to recover safe continuations rather than terminating generation. We model jailbreak attacks as distributional shifts over generative trajectories, where adversarial prompts increase the probability of harmful continuations without eliminating safe ones. Under this model, SafeSpec performs risk-aware trajectory recovery within the speculative decoding process. Across multiple models and adversarial benchmarks, SafeSpec achieves a substantially improved safety-efficiency trade-off. On Qwen3-32B, SafeSpec reduces attack success rates by 15% while preserving a 2.06x inference speedup on benign workloads, demonstrating that speculative acceleration and inference-time safety can be jointly optimized.


18.

arXiv (CS.AI) 2026-06-16 DOI: arXiv:2606.16113

RecourseBench: A Modular Framework for Reproducible Algorithmic Recourse Evaluation

Authors:

Zahra Khotanlou, Hashir Ahmed, Chenghao Tan, Ahmed Abdelaal, Amir-Hossein Karimi

arXiv:2606.16113v1 Announce Type: new Abstract: Algorithmic recourse methods provide counterfactual explanations that inform individuals of the actions required to overturn an unfavorable model decision. Despite rapid methodological progress, principled comparison remains elusive; existing frameworks are often difficult to extend and lack both interoperability and systematic verification that integrated methods faithfully reproduce their originally reported results. We introduce RecourseBench, a unified evaluation framework built around three commitments: modularity, reproducibility, and interactivity. The framework decomposes the pipeline into five fully decoupled layers – Data, Preprocessing, Model, Recourse Method, and Evaluation – governed by abstract interfaces and a dynamic registry. To address the reproducibility gap in prior benchmarks, we introduce a four-tier classification system in which every integrated method is validated by an automated test suite against its originally reported results. We further provide an interactive web interface for flexible, configuration-driven comparison across methods, datasets, and model architectures. Our framework currently integrates 28 state-of-the-art recourse methods and, to our knowledge, constitutes the first recourse benchmark to explicitly enforce method-level reproducibility through automated, quantitative testing.
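
The "abstract interfaces and a dynamic registry" pattern described above can be sketched in a few lines. All names here (RECOURSE_METHODS, register, RecourseMethod, LinearMinL2) are hypothetical and are not RecourseBench's actual API. The toy method uses the standard closed form for the smallest L2 change that lifts a linear score $w \cdot x + b$ to a target margin: move along $w$ by $(\text{margin} - w \cdot x - b)/\|w\|^2$.

```python
from abc import ABC, abstractmethod

RECOURSE_METHODS = {}  # hypothetical registry: method name -> class


def register(name):
    """Class decorator that adds a recourse method to the registry."""
    def decorator(cls):
        RECOURSE_METHODS[name] = cls
        return cls
    return decorator


class RecourseMethod(ABC):
    """Hypothetical abstract interface shared by all recourse methods."""

    @abstractmethod
    def counterfactual(self, x):
        """Return a modified input that the model classifies favourably."""


@register("linear_min_l2")
class LinearMinL2(RecourseMethod):
    """Smallest L2 change that raises a linear score w . x + b to `margin`."""

    def __init__(self, w, b, margin=1e-6):
        self.w, self.b, self.margin = w, b, margin

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def counterfactual(self, x):
        deficit = self.margin - self.score(x)
        if deficit <= 0:
            return list(x)  # already favourable: no action needed
        step = deficit / sum(wi * wi for wi in self.w)
        return [xi + step * wi for xi, wi in zip(x, self.w)]
```

New methods are made available by decorating a subclass, so the evaluation layer can look them up by name from a configuration file without importing each one explicitly.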


19.

arXiv (CS.LG) 2026-06-16 DOI: arXiv:2606.15085

An Integrable Token Mixing Layer from the Generalized Yang Baxter Equation

Authors:

Snigdha Chandan Khilar

arXiv:2606.15085v1 Announce Type: new Abstract: The YB Mixer is a token-mixing layer for sequences derived from free-fermion and generalized Yang-Baxter structures. It applies a core principle from integrable systems, in which a local algebraic constraint guarantees global computational stability. Using the Ising exchange algebra, the mixer creates a free-fermionic structure that acts as an exactly norm-preserving orthogonal map. This algebra also produces commuting transfer matrices, which make inference order-free and adaptable to any variable budget. To generalize to longer sequence lengths, the model uses a spectral circulant generator, which maintains the crucial orthogonal and commuting properties of the system. The result is a highly stable and mathematically grounded architecture for sequence processing.
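
The two properties attributed to the spectral circulant generator, orthogonality and commutation, can be demonstrated with a generic construction. This is not the YB Mixer itself. The sketch builds a real circulant matrix from unit-modulus eigenvalues $e^{i\theta_m}$, taking its first column as the inverse DFT of the eigenvalues. Unit modulus makes the matrix unitary, and choosing $\theta_{n-m} = -\theta_m$ makes it real, hence orthogonal. Any two circulant matrices of the same size commute. The random phases are illustrative only.

```python
import cmath
import math
import random


def symmetric_phases(n, rng=None):
    """Random eigenphases with theta[n - m] = -theta[m], so the circulant is real."""
    rng = random.Random() if rng is None else rng
    theta = [0.0] * n  # theta[0] (and theta[n/2] for even n) fixed at 0
    for m in range(1, (n + 1) // 2):
        theta[m] = rng.uniform(-math.pi, math.pi)
        theta[n - m] = -theta[m]
    return theta


def orthogonal_circulant(theta):
    """Circulant C[i][j] = c[(i - j) % n] whose eigenvalues are exp(1j * theta[m])."""
    n = len(theta)
    lam = [cmath.exp(1j * t) for t in theta]
    # First column via inverse DFT; the imaginary part vanishes up to rounding
    c = [sum(lam[m] * cmath.exp(2j * math.pi * m * k / n) for m in range(n)).real / n
         for k in range(n)]
    return [[c[(i - j) % n] for j in range(n)] for i in range(n)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def mix(matrix, seq):
    """Mix a length-n sequence of scalars (one channel) along the token axis."""
    return [sum(r * s for r, s in zip(row, seq)) for row in matrix]
```

Mixing a sequence with such a matrix preserves its Euclidean norm. Stacking several of these matrices gives the same result in any order, a toy analogue of the order-free property.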


20.

arXiv (CS.CL) 2026-06-11 DOI: arXiv:2606.11447

AI Coding Agents Can Reproduce Social Science Findings

Authors:

Meysam Alizadeh, Mohsen Mosleh, Fabrizio Gilardi, Atoosa Kasirzadeh, Joshua Tucker

Recent anecdotal evidence suggests that AI coding agents can reproduce published findings when provided with original data and code; yet systematic evaluation across social sciences remains limited. Existing evaluation benchmarks are insufficient: they are either small or conflate agent performance with problems in the reproduction materials themselves, such as code that fails to execute correctly. Here we introduce SocSci-Repro-Bench, a benchmark of 221 tasks spanning four disciplines and 13 substantive domains, constructed from studies whose results are either fully reproducible with available materials or demonstrably non-reproducible due to missing data, allowing us to isolate agents' reproduction capacity. Evaluating two frontier coding agents, Claude Code and Codex, we find that both can reproduce a large share of social science findings, with Claude Code substantially outperforming Codex. These reproduction rates considerably exceed those previously reported for general-purpose LLM-based agents on comparable reproducibility benchmarks. Both agents also perform strongly on a reasoning task requiring identification of underlying research questions, and additional analyses suggest that results are not primarily driven by memorization. Providing the original paper PDF alongside replication materials modestly improves performance but introduces bias on tasks where reproduction is impossible. We also show that agents can be nudged toward confirmatory specification search through subtle prompt framing. Together, these findings suggest that at least some frontier coding agents can serve as reliable executors of computational workflows while underscoring the need for careful benchmarking and prompt design as AI systems assume larger roles in scientific production.


21.

medRxiv (Medicine) 2026-06-17 DOI: HASH:78c2a7dab257abe9176b09b85e7085e5

Women's intentions and motivations towards health behaviour change before pregnancy: a cross-sectional survey of pregnant women in Australia

Authors:

Steel, Schoenaker, McIntyre, Rogers, Hall, Adams

Introduction: The preconception period (i.e. the weeks and months before pregnancy) is a critical window during which parental health behaviours can influence pregnancy outcomes and the child's long-term health. Modifiable factors such as nutrition, physical activity, substance use, and environmental exposures play a key role, yet women's ability to adopt and sustain healthy behaviours is shaped by complex psychological, social and environmental influences. This study applies the Theory of Planned Behaviour to identify the beliefs underpinning women's preconception behaviours, with the aim of informing support for effective and sustained health behaviour change. Methods: We conducted an Australian national retrospective cross-sectional survey of pregnant women (18-49 years), recruited through social media platforms. The 92-item survey captured respondent socio-demographics, pregnancy status and health conditions, health behaviours, and beliefs regarding preconception health behaviours. Respondents' level of pregnancy planning was categorised using the London Measure of Unplanned Pregnancy (LMUP). Items regarding preconception beliefs were structured in accordance with the Theory of Planned Behaviour, with a focus on regular exercise, healthy diet, and alcohol avoidance. These belief variables were analysed using structural equation modelling to identify paths between latent variables and the items used to estimate each concept. Results: The study was completed by 430 pregnant women, of whom 72.7% had a planned pregnancy. Most had a partner, were university educated and in good health. Structural equation modelling showed intention strongly predicted exercise (β = 0.65), healthy diet (β = 0.54) and alcohol avoidance (β = 0.64). Perceived control and partner norms influenced intentions, whereas health professional norms had limited effect. Positive beliefs were associated with folate supplement use and smoking cessation.
Conclusion: These findings highlight intention as a key driver of preconception health behaviours, with perceived control and partner influences playing a more significant role than individual beliefs or health professional input. Effective interventions should therefore address structural barriers and actively involve partners, while respecting women's autonomy. Overall, couples-focused, multi-level strategies are likely essential to support meaningful and sustained preconception health behaviour change.

22.

arXiv (CS.CL) 2026-06-17 DOI: arXiv:2602.11715

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

Authors:

Haolei Bai, Lingcheng Kong, Xueyi Chen, Jianmian Wang, Zhiqiang Tao, Huan Wang

Diffusion large language models (dLLMs) have emerged as a compelling alternative to autoregressive (AR) LLMs, owing to their capacity for parallel token generation. This paradigm is particularly well suited to code generation, where holistic structural planning and non-sequential refinement are critical. Despite this potential, tailoring dLLMs to CUDA kernel generation remains challenging, hindered both by the high degree of specialization the task requires and by a severe lack of high-quality training data. To address these challenges, we construct CuKe, an augmented supervised fine-tuning dataset optimized for high-performance CUDA kernels. On top of this dataset, we propose a bi-phase curated reinforcement learning (BiC-RL) framework consisting of a CUDA kernel infilling stage and an end-to-end CUDA kernel generation stage. Leveraging this training framework, we introduce DICE, a series of diffusion large language models for CUDA kernel generation at three parameter scales: 1.7B, 4B, and 8B. Extensive experiments on KernelBench show that DICE significantly outperforms both autoregressive and diffusion LLMs of comparable scale, establishing a new state of the art for CUDA kernel generation.

23.

arXiv (CS.CL) 2026-06-17 DOI: arXiv:2606.18158

The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act

Authors:

Michèle Finck

Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. This measurement gap is not only methodological but legal: the EU AI Act makes "appropriate accuracy" a binding requirement for high-risk AI used in the judicial domain, yet that requirement cannot acquire operational content without the very doctrinal-reasoning benchmark the field lacks.

24.

arXiv (quant-ph) 2026-06-15 DOI: arXiv:2606.14226

Efficient Simulation of Szegedy Quantum Walk Formulations and Algorithms

Authors:

Sergio A. Ortega, Daniel K. Park

arXiv:2606.14226v1 Abstract: Quantum walks provide a versatile framework for quantum algorithms across a wide range of applications. We develop efficient classical simulation methods for Szegedy quantum walks that avoid explicit construction of the full unitary evolution operator. Unlike previous approaches restricted to a particular walk formulation, our framework is built from fundamental update and reflection operators, enabling the simulation of a broader class of Szegedy walk formulations. We further extend these methods to phase-estimation-based algorithms coupled to the walk, including implementations suitable for large sparse graphs. The resulting methods achieve optimal $O(N^2)$ complexity for dense graphs with $N$ nodes. For sparse graphs, the computational cost scales linearly with the number of edges, which is $O(N)$ in many cases. We implement the framework in the Python package SQWLib and illustrate its capabilities through simulations of representative algorithms, including quantum simulated annealing and quantum search on graphs. These results provide a practical tool for studying Szegedy-walk-based algorithms numerically, beyond purely analytical treatments.
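
To make "simulating a Szegedy walk without building the full unitary" concrete, here is a minimal Python sketch under stated assumptions. It assumes the common formulation W = S(2Π - I) for a column-stochastic transition matrix P, where Π projects onto the states |ψ_i⟩ = |i⟩ ⊗ Σ_j sqrt(P[j, i]) |j⟩ and S swaps the two registers. The N² amplitudes are stored as an N×N NumPy array, so the reflection acts row by row and the swap is a transpose, giving O(N²) work per step instead of an N²×N² matrix. The function names (`reflect_A`, `szegedy_step`, `dense_walk_operator`) are hypothetical and are not the SQWLib API; the sparse-graph and phase-estimation variants from the abstract are not covered.

```python
import numpy as np


def reflect_A(psi: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """Apply (2*Pi - I) to a walk state without forming Pi.

    psi[i, j] is the amplitude of |i>|j>; phi[i, :] = sqrt(P[:, i]) is the
    unit vector that Pi projects row i onto. Cost is O(N^2).
    """
    # c[i] = <phi_i | psi[i, :]>: overlap of each row with its target vector.
    c = np.einsum("ij,ij->i", phi.conj(), psi)
    return 2.0 * c[:, None] * phi - psi


def szegedy_step(psi: np.ndarray, phi: np.ndarray) -> np.ndarray:
    """One step of W = S (2*Pi - I); the swap S is a transpose of the array."""
    return reflect_A(psi, phi).T


def dense_walk_operator(P: np.ndarray) -> np.ndarray:
    """Reference N^2 x N^2 unitary, built explicitly; only for checking small N."""
    N = P.shape[0]
    phi = np.sqrt(P.T)
    # Row i holds |psi_i> = |i> (x) sum_j sqrt(P[j, i]) |j>, flattened as i*N + j.
    psi_vecs = np.zeros((N, N * N))
    for i in range(N):
        e_i = np.zeros(N)
        e_i[i] = 1.0
        psi_vecs[i] = np.kron(e_i, phi[i])
    Pi = psi_vecs.T @ psi_vecs  # projector onto span{|psi_i>}
    S = np.zeros((N * N, N * N))
    for i in range(N):
        for j in range(N):
            S[j * N + i, i * N + j] = 1.0  # |i, j> -> |j, i>
    return S @ (2.0 * Pi - np.eye(N * N))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 4
    P = rng.random((N, N))
    P /= P.sum(axis=0, keepdims=True)  # column-stochastic: P[j, i] = Pr(i -> j)
    phi = np.sqrt(P.T)
    psi = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
    psi /= np.linalg.norm(psi)
    fast = szegedy_step(psi, phi)
    slow = dense_walk_operator(P) @ psi.ravel()
    print(np.allclose(fast.ravel(), slow))  # True
```

The dense builder exists only as a cross-check for small N. The matrix-free step touches each of the N² amplitudes a constant number of times, which is the source of the O(N²) per-step cost for dense graphs.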

25.

medRxiv (Medicine) 2026-06-18 DOI: HASH:b5cd118e839127ccdd43e011db7b98a9

Web-based education on Metabolism and Obesity is associated with improved lifestyle and health behaviours among Brazilian school teachers

Authors:

Evangelista-Silva, P. H.; Coutinho, C. P.; Martins Correia, C. C.; Moraes, Fernandes Ferreira, A. F.; Medeiros Komino, A. C.; Neves Ramos, …

Background: Obesity is a major global public health challenge, and teachers play a critical role in school-based health promotion. This study examined the perceived impact of a web-based educational program on metabolism and obesity delivered to Brazilian school teachers. Methods: This analytical cross-sectional study included 217 teachers who responded to the evaluation questionnaire after attending the course between 2017 and 2022. Statistical analyses included logistic regression and chi-square tests. Findings: The course completion rate was 81.98%, substantially exceeding the 5-15% typical of global MOOCs. However, ethnic disparities were observed: White respondents were 4.95 times more likely than Black respondents to complete the course (p=0.00097), and Brown respondents were 3.05 times more likely than Black respondents to do so (p=0.0268). Among non-completers, lack of time (64.7%) was the primary barrier. Participation was concentrated in São Paulo (77%), with no respondents from three northern states. Perceived difficulty showed a non-significant trend (p=0.0893) whereby Black respondents had the lowest predicted difficulty; the most challenging course material was Scientific Content/Reading papers (50%). Completion was strongly associated with applying learned activities in teaching (p
