Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

Agent Economics: An Entropy-Controlled Pluralistic Alignment Framework for Preventing Artificial Hivemind in Autonomous Agents

arXiv:2606.09039v2 Announce Type: replace Abstract: This study proposes the Behavioral Protocol Framework (BPF), an entropy-controlled pluralistic alignment framework designed to address two critical challenges in autonomous agent economies: the hivemind effect arising from excessive strategic convergence among agents and the lack of transparency in autonomous decision-making processes. The proposed BPF consists of three core modules: Mentalizing-based Social Intelligence (MbSI) grounded in Theory of Mind (ToM), Pluralistic Alignment (PA), and a Verifiable Execution Kernel (VEK). These modules are organically integrated within a closed-loop architecture that governs the entire lifecycle of agent behavior, from decision-making and execution to verification and feedback. To evaluate the proposed framework, a simulation environment implemented in Python and a Streamlit-based user interface will be developed. Through empirical experimentation, the study aims to examine whether the entropy-control mechanism of the PA module can effectively preserve strategic diversity among agents and mitigate collective convergence, while the VEK module provides a comprehensive and transparent audit trail of the decision-making process. The anticipated results are expected to demonstrate that the proposed framework can simultaneously enhance the stability, efficiency, and trustworthiness of autonomous agent economies. Consequently, this research offers a practical approach for developing robust, transparent, and accountable agent-native economic systems.

02.
Nature (Science) 2026-06-17

Towards Conversational AI for Disease Management

While large language models (LLMs) have shown promise in diagnostic dialogue1, their capabilities for effective management reasoning—including disease progression, therapeutic response, and safe medication prescription—remain under-explored. We advance the previously demonstrated diagnostic capabilities of the Articulate Medical Intelligence Explorer (AMIE)1−3 through a new LLM-based agentic system optimized for multi-visit clinical management and dialogue. To ground its reasoning in authoritative clinical knowledge, AMIE leverages Gemini’s long-context capabilities4, combining in-context retrieval with structured reasoning to align its output with up-to-date clinical practice guidelines and drug formularies. In a randomized, blinded virtual Objective Structured Clinical Examination (OSCE) study, AMIE was compared to 21 primary care physicians (PCPs) across 100 multi-visit case scenarios designed to reflect UK NICE Guidance and BMJ Best Practice guidelines. AMIE was non-inferior to PCPs in management reasoning as assessed by specialists and scored better in both preciseness of treatments and investigations, and in its alignment with and grounding in clinical guidelines. To benchmark medication reasoning, we developed RxQA, a multiple-choice question benchmark derived from two national drug formularies (US, UK) and validated by board-certified pharmacists. Though AMIE and PCPs both benefited from the ability to access external drug information, AMIE outperformed PCPs on higher difficulty questions. While further research would be needed before real-world translation, AMIE’s strong performance across evaluations marks a significant step towards conversational AI as a tool in disease management.

03.
arXiv (CS.LG) 2026-06-17

Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

arXiv:2602.17894v2 Announce Type: replace-cross Abstract: Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as medical studies or political polling, different sources incur different sampling costs. Observations often have associated group identities - for example, health markers, demographics, or political affiliations - and the relative composition of these groups may differ substantially, both among the source populations and between sources and target population. In this work, we study multi-source data collection under a fixed budget, focusing on the estimation of population means and group-conditional means. We show that naive data collection strategies (e.g. attempting to "match" the target distribution) or relying on standard estimators (e.g. sample mean) can be highly suboptimal. Instead, we develop a sampling plan which maximizes the effective sample size - the total sample size divided by $D_{\chi^2}(q\mid\mid\overline{p}) + 1$, where $q$ is the target distribution, $\overline{p}$ is the aggregated source distribution, and $D_{\chi^2}$ is the $\chi^2$-divergence. We pair this sampling plan with a classical post-stratification estimator and upper bound its risk. We provide matching lower bounds, establishing that our approach achieves the budgeted minimax optimal risk. Our techniques also extend to prediction problems when minimizing the excess risk, providing a principled approach to multi-source learning with costly and heterogeneous data sources.

04.
arXiv (CS.CV) 2026-06-16

Segmentation-based Detection for Efficient Multi-Task Spacecraft Perception

Vision-based perception is fundamental to Space Situational Awareness and autonomous on-orbit operations such as rendezvous, docking, servicing, and navigation. However, progress in this area is limited by the scarcity of annotated space imagery and by challenging visual-domain characteristics including severe illumination changes, low signal-to-noise ratio, and high contrast. We address Stream 1 of the SPARK 2026 Challenge, which requires a single model for spacecraft classification, detection, and fine-grained component segmentation across multiple target types. We propose a compact architecture that integrates a MobileNetV3 encoder with a U-Net-style decoder, combining computational efficiency with accurate dense prediction. Detection is derived analytically from the union of predicted component masks, avoiding a separate bounding-box regression head in the single-spacecraft setting. Our method achieved an overall leaderboard score of 0.9482, with task-specific scores of 1.0000 in classification, 0.9788 in detection, and 0.8917 in segmentation. The proposed approach ranked second overall in the SPARK 2026 Challenge, demonstrating that lightweight encoder-decoder architectures can deliver strong multi-task performance for practical onboard space vision systems.

05.
arXiv (CS.CV) 2026-06-19

Does Head Pose Correction Improve Biometric Facial Recognition?

Biometric facial recognition models often demonstrate significant decreases in accuracy when processing real-world images, often characterized by poor quality, non-frontal subject poses, and subject occlusions. We investigate whether targeted, AI-driven, head-pose correction and image restoration can improve recognition accuracy. Using a model-agnostic, large-scale, forensic-evaluation pipeline, we assess the impact of three restoration approaches: 3D reconstruction (NextFace), 2D frontalization (CFR-GAN), and feature enhancement (CodeFormer). We find that naive application of these techniques substantially degrades facial recognition accuracy. However, we also find that selective application of CFR-GAN combined with CodeFormer yields meaningful improvements.

06.
arXiv (CS.CL) 2026-06-16

A Survey on Agentic Security: Applications, Threats and Defenses

LLM-based agents are now used throughout cybersecurity. While these agents facilitate powerful and autonomous security applications, their autonomy opens up new attack surfaces, and the security community is actively building defenses to secure them. Yet the literature on this subject has grown quickly and unevenly. Existing surveys treat applications, threats, and defenses in isolation, leaving no unified account of how an agent's capabilities, vulnerabilities, and countermeasures interconnect. In this work we present the first holistic survey of the agentic security landscape, structuring the field around the fundamental pillars of Applications, Threats and Defenses. We provide a comprehensive taxonomy of over 260 papers, explaining how agents are used in downstream cybersecurity applications, inherent threats to agentic systems, and countermeasures designed to protect them. In addition, we provide detailed pillar-specific and cross-cutting analyses that show the security-lifecycle coverage of agentic applications, comparison between red-teaming and blue-teaming agents, and the adversarial use of red-teaming applications. On the threat side, we analyze the entry points and agent-loop stages that attacks target, their specificity to the agentic setting, and the threat models they assume. On the defense side, we analyze the prevailing defense strategies, their cost and security trade-offs, and where in the agent lifecycle they are deployed. We further map which defenses cover which attack classes and chart trends in agent architecture, backbone model usage, data modality coverage, and the growth of attack and defense research over time. Taken together, these findings indicate that agentic systems are structurally fragile by default and that securing them will require defenses that span the full agent lifecycle rather than single-layer fixes.

07.
PLOS Computational Biology 2026-06-11

A zero-parameter first-principles gate framework for full-length TP53 missense variant interpretation

by Masamichi Iizumi Missense variant interpretation often achieves useful predictive performance but remains mechanistically opaque, particularly in proteins that combine structured domains with intrinsically disordered regions (IDRs). We developed Gate & Channel, a zero-parameter, first-principles framework for full-length TP53 missense variant analysis in which each prediction is generated by explicit IF-THEN gates derived from physicochemistry, geometry, structural constraints, and polymer physics rather than fitted weights. Variants are evaluated across independent channels representing distinct physical failure modes; a variant is predicted disruptive if any gate closes. A second hierarchical layer (“Geta”) encodes physically grounded post-closure exceptions, allowing sensitivity and specificity to be improved on disjoint variant populations. The v18 framework consists of 12 channels and 2 Getas spanning structured domains and IDRs, capturing DNA-contact disruption, Zn coordination, burial-dependent packing, secondary-structure compatibility, post-translational modification chemistry, short linear motif disruption (including a multi-partner coupled-folding face), proline-directed kinase recognition, and IDR-specific proline and glycine backbone constraints. Across 1,369 TP53 missense variants, the framework achieved 84.5% sensitivity and 89.1% positive predictive value, with 90.9% sensitivity preserved in the DNA-binding core and all 9/9 hotspot mutations captured. A post hoc audit of discordant IDR calls indicated that many apparent false positives had plausible molecular rationales, consistent with a distinction between molecular mechanism disruption and clinical penetrance. Applied to KRAS, TDP-43, and BRCA1, the same channels capture the dominant pathogenic mechanisms in each protein as a proof of principle, while residual missed variants name specific gates yet to be written. The framework is distributed as the open-source Python package pathogenicity-gates (v0.5.1, MIT). These results show that a substantial fraction of full-length TP53 missense variation can be resolved through explicit, auditable physical gates that carry meaning beyond TP53, with each remaining failure naming the next rule to be written.

08.
arXiv (CS.CL) 2026-06-11

Unstable Features, Reproducible Subspaces: Understanding Seed Dependence in Sparse Autoencoders

Sparse autoencoders (SAEs) are widely used to interpret neural network representations, but their utility depends on whether the learned features are reproducible across training runs. We study this question through feature stability: for each SAE feature, we estimate the probability that a similar feature reappears in an independently trained SAE. This yields a scalable per-feature signal that separates stable from unstable features. In a large-scale study across seeds, models, layers, dictionary sizes, and SAE variants, we find a pronounced functional asymmetry: stable features carry most of the reconstruction- and prediction-relevant signal, while unstable features have weak marginal impact and are dominated by low-frequency surface-form triggers in both activation statistics and automatic explanations. Geometrically, unstable features are individually non-reproducible but concentrate in reproducible lower-rank subspaces, suggesting that seed dependence often reflects basis ambiguity within a shared region of activation space rather than pure noise. A controlled synthetic model makes this mechanism explicit, showing that low-rank ground-truth features can be recovered at the subspace level while remaining non-identifiable as individual SAE latents across seeds. Finally, by pooling unique cross-seed features, we construct more stable SAEs while preserving explained variance in this setting. Together, these results show that unstable features are not merely failed or noisy latents: they have weak individual functional impact, but reflect reproducible low-dimensional structure that standard SAEs resolve differently across seeds.

09.
arXiv (math.PR) 2026-06-16

Sharp One-Dimensional Sub-Gaussian Comparison in Convex Order

作者:

arXiv:2604.26819v2 Announce Type: replace Abstract: We prove that any random variable $X$ whose moment generating function is point-wise upper bounded by that of $ G \sim \mathcal{N}(0,1) $ must be dominated by $ G/\mathbb{E}[|G|] $ in convex order, meaning $ \mathbb{E}[f(X)] \le \mathbb{E}[f(G/\mathbb{E}[|G|])] $ for all convex $f$. This is sharp as witnessed by $ X \sim \mathrm{Unif}(\{-1,1\}) $ and $ f(x) = |x| $.

10.
arXiv (CS.LG) 2026-06-17

A Dynamical Systems Perspective on the Analysis of Neural Networks

arXiv:2507.05164v2 Announce Type: replace-cross Abstract: In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution we demonstrate how to re-formulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies to use dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.

11.
arXiv (quant-ph) 2026-06-16

Single-Image Entanglement Verification with Spatially Encoded Measurement Contexts

arXiv:2606.15382v1 Announce Type: new Abstract: Entangled photon pairs produced by spontaneous parametric down-conversion exhibit rich spatial entanglement structure that is often difficult to probe with conventional measurements. Here, we show that spin-orbit optical elements can convert this spatial structure into directly observable quantum interference patterns. Using a $q$-plate, we demonstrate that the relative wavefront curvature of biphoton states generated by a pair of nonlinear crystals can be retrieved from the spatial modulation of coincidence images. Building on this principle, we introduce a liquid-crystal metasurface that performs spatially multiplexed Bell measurements across the transverse profile of the photon field. The device, which we call a Clauser-Horne-Shimony-Holt (CHSH) plate, assigns different polarization projections to different azimuthal sectors of the beam, allowing the sixteen joint measurements required for a CHSH test to be realized simultaneously in a single acquisition. In this architecture, the spatial coordinate acts as a classical register selecting the measurement context, while photon pairs sample these contexts according to their emission directions. We further demonstrate that the same measurement concept can be implemented using a programmable spatial light modulator, providing a dynamically reconfigurable realization of the scheme. Our results show that spatially structured optical elements can transform Bell tests into parallel measurements distributed across the transverse plane, enabling rapid characterization of spatially varying entanglement. This approach opens new possibilities for structured-light quantum measurements, Bell-inequality-based imaging, and the study of spatially engineered entangled photon sources.

12.
arXiv (CS.CV) 2026-06-11

TRON: Tracing Rays to Orchestrate a Neural Renderer for 3D Gaussian Reconstructions

We introduce TRON, a rendering framework that combines 3D Gaussian ray tracing with neural rendering to enable realistic and controllable rendering of real-world 3D scenes under novel lighting, dynamic object motion, object insertion, and material editing. Prior approaches that rely solely on physically based rendering (PBR) of Gaussian representations struggle to achieve realistic relighting due to imperfections in reconstructed geometry, material estimates, and light transport estimation. At the same time, neural rendering methods often lack an explicit scene representation, limiting their ability to support interactive editing with fine-grained manipulation. TRON bridges these two paradigms. We use intrinsic decomposition priors from a learned inverse rendering model to regularize the material properties of a Gaussian field, and repurpose a ray tracer to provide radiometric guidance rather than final pixels. By treating this output as a structured 3D scaffold, we empower a lightweight neural renderer to bridge the domain gap between shading-model constrained estimates and photorealistic output. Our key insight is that the combination of explicit 3D knowledge with robust material priors provides speed and controllability, while neural rendering enables the synthesis of photorealistic images. To support real-world scenarios, we train our neural renderer with a multi-stage strategy consisting of large-scale pretraining and targeted fine-tuning on a newly constructed dataset of 2.1M rendered synthetic and real-world frames from 3D reconstructions. TRON outperforms Gaussian-based relighting methods in realism, and prior neural renderers in editability and speed. To the best of our knowledge, TRON is the first method to enable practical interactive applications in captured 3D environments, offering realistic appearance under dynamic geometric, lighting and material conditions.

13.
arXiv (CS.CV) 2026-06-17

Universal Image Restoration via Internalized Chain-of-Thought Reasoning

Image restoration seeks to recover high-quality images from degraded inputs but becomes highly ill-posed under complex, mixed degradations. While unified all-in-one models are common, their performance declines as degradation complexity increases. Recent works adopt Chain-of-Thought (CoT) reasoning for multi-round restoration using specialized modules. However, this approach faces two key limitations: (i) increased computational cost due to multi-step processing, and (ii) weak modeling of interactions between degradations during stepwise inference. We introduce CoTIR, a universal image restoration framework that internalizes CoT reasoning within a single model. Concretely, we view image restoration as a specialized subtask of image editing, which implies that a large-scale pre-trained editing model provides a more favorable optimization starting point. Building on this, we fine-tune the model for restoration and further encode structured CoT-style reasoning into the learning objective via a differentiable formulation inspired by Lagrangian optimization, enabling holistic restoration without chaining specialized restorers. To facilitate training and evaluation, we further present CoTIR-Bench, a large-scale benchmark comprising 5.2 million samples with CoT-style reasoning traces. Extensive experiments on CoTIR-Bench and broad real composite degradation scenes show that CoTIR achieves stronger perceptual quality and more competitive fidelity than both all-in-one models and multi-round restoration methods. The source code is available at https://github.com/gy65896/CoTIR.

14.
bioRxiv (Bioinfo) 2026-06-17

An Integrated Framework for Transcriptomic Characterization and Lorentzian Hyperbolic Visualization of a High-Risk Topological Branch in Alzheimer's Disease

Alzheimer's disease (AD) is a highly heterogeneous brain disorder in which molecular alterations vary across brain regions, disease stages, and patient subgroups. This study introduces an integrated analytical framework for characterizing transcriptomic variation associated with a high-risk topological branch, which was identified based on Lorentz distance in postmortem Brodmann area 36 samples from the Mount Sinai Brain Bank cohort, where over 70% of samples were in Braak stages V-VI. The framework integrates weighted gene co-expression network analysis, repeated stability-based differential expression analysis, network-level gene filtering, Gene Ontology enrichment, and nested stratified cross-validation to evaluate whether topological branch-associated genes capture biologically meaningful signals and carry predictive information for high-Braak group status. The identified gene sets were functionally enriched for neuronal development, neuron projection organization, synaptic signaling, vesicle fusion, and regulated synaptic release, suggesting that the high-risk topological branch reflects biologically relevant transcriptomic programs linked to neurodegenerative progression. Nested cross-validation further showed that the selected genes achieved measurable internal predictive performance for distinguishing high-Braak samples. As a second methodological contribution, we introduced a Lorentzian hyperbolic variant of t-distributed stochastic neighbor embedding (Lorentz t-SNE) to explore latent non-Euclidean structure in transcriptomic data. This method embeds samples in hyperbolic space, providing an alternative to Euclidean embeddings for representing hierarchical or nonlinear structures. Compared with conventional Euclidean embeddings, the proposed Lorentz t-SNE revealed a more localized organization of high-Braak samples. Together, these results demonstrate the utility of the proposed analytical framework and Lorentz t-SNE for investigating heterogeneous, potentially non-Euclidean organization in AD transcriptomes.

15.
arXiv (math.PR) 2026-06-17

A note on the $\mathcal{W}_2$-convergence rate of the empirical measure of an ergodic $\mathbb{R}^d$-valued diffusion

arXiv:2502.07704v2 Announce Type: replace Abstract: In this note, we consider a Stochastic Differential Equation under a strong confluence and Lipschitz continuity assumption of the coefficients. For the unique stationary solution, we study the rate of convergence of its empirical measure toward the invariant probability measure. We provide rate for the Wasserstein distance in the mean quadratic and almost sure sense.

16.
arXiv (CS.AI) 2026-06-15

Mask, Sample, Revise: A Revisable CTMC Inference Stack for Guided Discrete Flow Matching Text-to-Speech

arXiv:2606.13989v1 Announce Type: cross Abstract: Recent alignment-free non-autoregressive (NAR) text-to-speech (TTS) models formulate synthesis as a conditional infilling task, bypassing explicit duration predictors and external aligners. When speech is represented with neural codec tokens, the infilling problem becomes discrete, making Discrete Flow Matching (DFM), a Continuous-Time Markov Chain (CTMC) framework for discrete generation, a natural fit. However, inference-time control for stable low-step conditional infilling remains underexplored. We propose Mask, Sample, Revise, an inference-time CTMC stack for alignment-free DFM-TTS. The stack combines predictor-free guidance to strengthen text conditioning, prompt-matched conditional coupling to align the probability path with the acoustic prompt, and SC-ReMask, a schedule-constrained remasking mechanism that introduces token-to-mask transitions so early de-masking decisions can be revised. These components require no post-hoc fine-tuning and operate in a single tau-leaping sampler. Controlled ablations show that this stack improves intelligibility and robustness in the low-NFE prompted setting, outperforming unguided and guidance-only samplers with substantially more steps.

17.
arXiv (CS.LG) 2026-06-16

A Conservation Law for Equilibrium Propagation and Coupled Learning

arXiv:2606.15444v1 Announce Type: cross Abstract: In this paper we show that the physical learning methods known as coupled learning (CL) and equilibrium propagation (EP) conserve a mass-like quantity in the trainable parameters in the continuous-time, small-nudging limit. We prove that this conservation holds in a broad range of physically relevant settings. We then show that the conservation law constrains the training dynamics in a way that makes convergence reliable in important settings for linear circuits. We conclude by discussing some practical implications of this conservation law.

18.
arXiv (CS.AI) 2026-06-17

StepGuard: Guarding Web Navigation via Single-Step Calibration

arXiv:2606.17871v1 Announce Type: new Abstract: Web navigation requires agents to follow natural language goals, interact with web pages, and produce accurate answers. While recent advances leverage vision-language models and reinforcement learning, existing methods still suffer from single-step fragility due to reward misalignment and error propagation. To tackle the reward entanglement, we design Dynamic Dual-Policy Optimization (DDPO), which dynamically switches between a navigation-first mode for exploration and an answer-first mode for question-answering to mitigate reward conflict. To calibrate the single-step error, we propose Confidence-Guided Adaptive Navigation Reflection (CANR), a mechanism that estimates per-step confidence, triggers reflection only when necessary, and uses contrastive rewards to encourage self-correction to calibrate the single-step inaccuracy. With the above as the main components, we finally develop our StepGuard, a new framework of Guarding Web Navigation via Single-Step Calibration. Experiments demonstrate that our approach significantly improves navigation and answer accuracy, setting new state-of-the-art performance on standard web navigation benchmarks.

19.
arXiv (quant-ph) 2026-06-17

Tungsten Germanide Superconducting Nanowire Single-Photon Detectors with Saturated Internal Detection Efficiency at Wavelengths up to 29 {\mu}m

arXiv:2511.20868v2 Announce Type: replace-cross Abstract: Superconducting nanowire single-photon detectors (SNSPDs) are among the most sensitive single-photon detectors available and have the potential to transform fields ranging from infrared astrophysics to molecular spectroscopy. However, extending their performance into the mid-infrared spectral region - crucial for applications such as exoplanet transit spectroscopy and vibrational fingerprinting of molecules - has remained a major challenge, primarily due to material limitations and scalability constraints. Here, we report on the development of SNSPDs based on tungsten germanide, a novel material system that combines high mid-infrared sensitivity with compatibility for large-scale fabrication. Our detectors exhibit saturated internal detection efficiency at wavelengths up to 29 {\mu}m, while using 2.7x thicker films (8 nm vs 3 nm) and up to 4.5x wider nanowires (360 nm vs 80 nm) compared to mid-infrared-optimized SNSPDs fabricated from tungsten silicide. This advance will enable scalable, high-performance single-photon detection in a spectral region that was previously inaccessible, opening new frontiers in remote sensing, thermal imaging, environmental monitoring, molecular physics, and astronomy.

20.
arXiv (quant-ph) 2026-06-16

Black Hole–Entropy Container or Creator

arXiv:2603.18374v3 Announce Type: replace-cross Abstract: Do black holes possess entropy or do they create it? The dominant assumption is that they possess entropy, and a they evaporate that entropy is emitted and decreases. In this paper I use a model of a linear amplifier, in which I argue that the amplifier has not entropy and yet it emits entropy in the process of it operation. This model is closely related to behaviour of black holes, resulting in answer the question of that title that black holes do not have entropy, but nevertheless them create and emit entropy with the total entropy emitted being the same as the usual expression proportional to the square of the mass of the black hole.

21.
arXiv (CS.CL) 2026-06-11

PoQ-Judge: A Multi-Architecture Evaluation Framework for Cost-Aware Proof-of-Quality in Decentralized LLM Inference

Decentralized LLM inference networks need lightweight, reference-free quality evaluation for Proof of Quality (PoQ). We present PoQ-Judge, a framework that trains dedicated judge models to score query-output pairs without ground-truth references. We study three architectures across the quality-cost tradeoff: a TextCNN judge, a MiniLM cross-encoder, and a DeBERTa judge. Using two-stage training on UltraFeedback plus GPT-labeled in-domain data, the best model reaches 0.747 Pearson correlation with the ground-truth proxy on a held-out test set, outperforming reference-based evaluators from prior work. As a reference-free component in composite scoring, it achieves 0.645 Pearson correlation, matching the best single reference-based evaluator while removing the need for reference answers. We also show that online calibration identifies semantic quality as the dominant dimension and that cascade evaluation reduces cost by 72.7 percent with only modest quality loss. Results are much stronger on QA than summarization, pointing to proxy quality as the main remaining limitation.

22.
arXiv (CS.AI) 2026-06-16

LLM4RTL: Tool-Assisted LLM for RTL Generation

arXiv:2606.15500v1 Announce Type: cross Abstract: Large language models (LLMs) have facilitated impressive progress in software engineering, code generation, tooling, and systems. Concurrently, a significant body of research has developed which explores a growing variety of methods and systems for applying LLMs to hardware and chip design (e.g., systems for RTL code generation based on functional description). However, when it comes to open Verilog/RTL code-generation, we need high-quality training samples to build specialized and more effective LLM systems through fine-tuning or low-rank adaptation. Here, we propose a ``judge-renew-check-renew-check'' (JRCRC) pipeline which updates a current public dataset using a hierarchy of state-of-the-art commercial LLM models differing in their costs and capabilities in RTL code generation. This approach achieves a cost-effective mechanism for filtering and refining code-generation samples into a higher-quality training dataset. Our experiments also identify some common weaknesses of LLMs in rule-based reasoning and logic, and consequently, in RTL code-generation. Having identified these weaknesses, we develop an architecture for incorporating pre-processing tools to dynamically assist the LLMs in inferring logical relationships from tabular data formats. With our tools-assisted architecture for RTL code generation, we achieve significant overall performance gains in the VerilogEval benchmark and outperform many state-of-the-art methods. Our LLM4RTL system achieves performance comparable to that of GPT-4O using a significantly much smaller LLM.

23.
arXiv (quant-ph) 2026-06-19

Vine Codes: Low-Overhead Quantum LDPC Codes on a Planar Square Grid

arXiv:2606.20263v1 Announce Type: new Abstract: The surface code is a promising route towards large-scale quantum computing, requiring only nearest-neighbour gates amenable to superconducting hardware. However, surface codes incur large qubit overheads. Novel quantum low-density parity check (qLDPC) codes promise to reduce overheads but require long-range connections that are difficult to achieve on superconducting platforms. Here, we introduce "Vine Codes" - qLDPC codes that are implementable on a planar square grid through nearest-neighbour, two-qubit gates native to superconducting platforms (iSWAP and CZ). Our approach generalises "Directional Codes" recently introduced by Gehér et. al. (2025) which are constrained to a torus. In contrast, vine codes have open boundary conditions constructed with the aid of routing qubits. We perform extensive numeric searches and find promising candidate vine codes, e.g. [[121,4,6]], [[221,6,7]], and [[234,9,6]] codes. We verify the circuit distances and show that data and measure qubits required can be reduced by up to ~28% relative to the surface code at a circuit distance of 7. Even including routing qubits, vine codes require fewer total qubits than the surface code (e.g. ~18% reduction at circuit distance 10) and benefits are expected to increase at higher distances. We perform circuit-level noise simulations to demonstrate that under a realistic noise model and at a near-term noise rate of $10^{-3}$, vine codes can perform better than the surface code while using fewer qubits. We give an exhaustive list of all unique vine codes up to stabiliser-weight 9. We additionally introduce "Flip-Vine Codes" which possess single-qubit transversal Clifford gates useful for fault-tolerant logic and magic state cultivation. We furthermore construct examples of generalised open boundaries for vine codes that go beyond the familiar X/Z boundaries of the surface and tile codes.

24.
arXiv (CS.CL) 2026-06-15

SIMMER: Benchmarking Latent Failures in LLM Executable Planning with a World Model

Large language models (LLMs) are increasingly deployed as planners for autonomous agents in household environments. While existing benchmarks evaluate whether LLM-generated plans execute successfully, they overlook a critical type of failure: latent failures. Unlike immediate failures that trigger instant feedback at execution time and enable timely correction, latent failures do not immediately halt plan execution but silently compromise goal achievement. In severe cases, they cause irreversible harm. To address this gap, we introduce SIMMER, a benchmark for evaluating latent failures in LLM planning through a human-curated symbolic world model grounded in the kitchen domain. SIMMER defines a world model comprising 77 actions, 262 unique objects, and approximately 46,800 possible interactions that are semantically realistic, derived from real-world cooking scripts. It then leverages a state machine executor that validates plans against the world model and detects immediate precondition violations, latent hazards, and irreversible failures. Experiments across six LLMs show that even frontier models achieve at most 17% error-free plans. Moreover, up to 56% of plans contain latent failures, the majority of which lead to irreversible consequences. We further demonstrate that explicit state reasoning via counterfactual foresight simulation can reduce latent failures by up to 72% and irreversible cases by up to 75%, suggesting a promising direction for more robust LLM planners.

25.
medRxiv (Medicine) 2026-06-12

Genetic basis of dynamic brain states reveals cellular and disease associations

Dynamic resting-state fMRI captures the time-varying patterns of brain activity that are obscured by static approaches. Hidden Markov Models (HMMs) characterise these dynamics as recurring whole-brain states and quantify their fractional occupancy (FO), the proportion of time spent in each state, yet the biological basis of inter-individual variation in FO remains unclear. Using data from 52,335 White UK Biobank participants, with replication in East and South Asian subsamples, this study examined the heritability, cellular and neurotransmitter basis of brain states, and their links with complex phenotypes. FO was significantly heritable and enriched for neuronal populations, particularly glutamatergic and GABAergic signalling. Analyses identified shared and state-specific loci and revealed genetic correlations, colocalisation, and potential causal relationships between FO and several phenotypes, including educational attainment, sleep duration, and disease risk. These findings establish dynamic brain states as biologically grounded intermediate phenotypes, linking genetic variation to neural dynamics, diseases and traits.