Academic Intelligence · Curated Daily

Explore the Frontiers of Global Academic Research

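AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefings.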

01.
arXiv (CS.LG) 2026-06-16

Bayesian Networks with Latent Time Embedding for Stage-Aware Causal Modeling of Alzheimer's Disease Progression

arXiv:2606.15784v1 Announce Type: new Abstract: Alzheimer's disease (AD) progression is often described through the amyloid-tau-neurodegeneration, or AT(N), cascade. However, most longitudinal models represent this cascade either as a fixed sequence of biomarkers or as a black-box forecasting task. This makes it difficult to determine when biologically guided biomarker relationships influence future regional pathology. In this study, we introduce Bayesian Networks with Latent Time Embedding (BN-LTE), a Bayesian structural framework for stage-aware modeling of AD progression. BN-LTE estimates disease pseudotime from baseline biomarker profiles and constrains directed dependencies according to biologically plausible AT(N) ordering. Posterior spline-varying structural equations are then used to link initial multimodal measurements with future annualized regional tau-PET change. Across repeated subject-disjoint evaluations using ADNI data, BN-LTE shows strong spatial reconstruction of tau progression compared with the included forecasting baselines. Beyond spatial reconstruction, BN-LTE recovers posterior stage-varying AT(N)-constrained effects and identifies a mid-pseudotime window of amyloid sensitivity. This window is supported by model-implied g-formula contrasts, root-adjusted AIPW, mechanism-sensitive ablations, and robustness analyses across spline and prior specifications. Overall, these findings position BN-LTE as a Bayesian structural framework for forecasting tau progression while examining stage-dependent AT(N)-cascade mechanisms in observational longitudinal neuroimaging data. Our code is available at https://github.com/danleneurocom/BN-LTE.

02.
arXiv (CS.AI) 2026-06-18

NeuralMUSIC: A Hybrid Neural-Subspace Framework for Robot Sound Source Localization

arXiv:2606.18664v1 Announce Type: cross Abstract: Reliable sound source localization is fundamental to robot audition, enabling autonomous robots to perceive spatial cues and operate effectively in dynamic environments. Classical methods such as Multiple Signal Classification (MUSIC) offer strong theoretical foundations but degrade under low signal-to-noise ratios. While deep learning-based approaches achieve promising performance, they often struggle to generalize across conditions. To address these challenges, we propose NeuralMUSIC, a hybrid neural-subspace framework for robotic sound source localization. Specifically, a neural network first estimates the spatial covariance matrix from multichannel microphone observations. The predicted covariance is then integrated into a classical MUSIC pipeline with eigenvalue decomposition (EVD) and pseudo-spectrum computation, followed by a Frequency Attention Fusion (FAF) module that produces the final direction-of-arrival (DOA) estimates. To improve data efficiency, we further introduce a Self-supervised Spatial Correlation Learning (SSCL) strategy that leverages unlabeled acoustic data to capture spatial structure. Extensive experiments across different robotic tasks demonstrate that NeuralMUSIC achieves competitive localization accuracy while exhibiting improved robustness and cross-domain generalization.
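
The classical MUSIC stage that NeuralMUSIC builds on can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes a narrowband far-field source and a uniform linear array with half-wavelength spacing. It uses a plain sample covariance where NeuralMUSIC would use the network-predicted covariance, and it omits the Frequency Attention Fusion module. The helper names `steering_vector` and `music_spectrum` are hypothetical.

```python
import numpy as np

def steering_vector(theta_deg, num_mics, spacing_wavelengths=0.5):
    """Far-field steering vector of a uniform linear array (ULA)."""
    k = np.arange(num_mics)
    phase = 2 * np.pi * spacing_wavelengths * k * np.sin(np.deg2rad(theta_deg))
    return np.exp(1j * phase)

def music_spectrum(R, num_sources, grid_deg, spacing_wavelengths=0.5):
    """MUSIC pseudo-spectrum from an (M x M) spatial covariance matrix R."""
    num_mics = R.shape[0]
    # eigh (EVD for Hermitian matrices) returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(R)
    # The M - K eigenvectors with the smallest eigenvalues span the noise subspace.
    noise_subspace = eigvecs[:, : num_mics - num_sources]
    spectrum = []
    for theta in grid_deg:
        a = steering_vector(theta, num_mics, spacing_wavelengths)
        proj = noise_subspace.conj().T @ a
        # Peaks where the steering vector is (nearly) orthogonal to the noise subspace.
        spectrum.append(1.0 / np.real(np.vdot(proj, proj)))
    return np.array(spectrum)

# Simulated snapshots: one source at 20 degrees plus white noise on 6 microphones.
rng = np.random.default_rng(0)
M, snapshots, true_doa = 6, 200, 20.0
source = rng.standard_normal(snapshots) + 1j * rng.standard_normal(snapshots)
noise = 0.3 * (rng.standard_normal((M, snapshots)) + 1j * rng.standard_normal((M, snapshots)))
X = np.outer(steering_vector(true_doa, M), source) + noise
R = X @ X.conj().T / snapshots   # sample spatial covariance

grid = np.arange(-90.0, 90.5, 0.5)
estimated_doa = grid[np.argmax(music_spectrum(R, num_sources=1, grid_deg=grid))]
print(f"estimated DOA: {estimated_doa:.1f} degrees")
```

The pseudo-spectrum depends on the covariance only through its noise subspace. A better covariance estimate at low SNR should therefore sharpen the peak, which is presumably the lever the hybrid design targets.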

03.
arXiv (CS.CV) 2026-06-16

RealityBridge: Bridging Editable 3D Gaussian Splatting Driving Simulations and Real-World Videos

Long-tail hazardous scenarios are essential for safety-oriented autonomous driving, yet they are difficult to collect and reproduce at scale. Editable 3D Gaussian Splatting (3DGS) simulation offers a promising alternative by reconstructing real driving scenes and supporting controllable scene editing. However, edited 3DGS-rendered videos still suffer from a significant Sim-to-Real gap, including rendering artifacts, degraded foreground assets, inconsistent illumination, and temporal flickering. Existing restoration and video generation methods are insufficient for this task, as they often fail to jointly repair 3DGS-specific artifacts, improve visual realism, and ensure temporal consistency. To fill this gap, we propose RealityBridge, a structure-preserving and asset-aware Sim-to-Real framework for edited 3DGS driving videos. RealityBridge uses multimodal controls, including rendered videos, foreground masks, edge maps, and semantic masks, together with a lightweight GateNet for adaptive condition allocation across backbone layers. We further construct targeted training data and introduce autoregressive long-video training with reward-guided post-training to improve restoration quality, temporal stability, and hallucination suppression. Extensive experiments on internal and public driving datasets show that RealityBridge outperforms existing methods in artifact removal, illumination harmonization, and long-sequence temporal consistency.

04.
Science (Express) 2026-06-02

Another red alert for American science | Science

Author: Unknown

Although research has bipartisan support in the US Congress, and trust in science is above 75% across the country, the Trump administration seems as determined as ever to mortally wound the nation’s scientific enterprise. After the scientific community persuaded Congress to restore most of the president’s draconian cuts to research funding last year, the White House Office of Management and Budget (OMB), under Russell Vought, has found new ways to circumvent the will of Congress and starve American science. At the beginning of this year, OMB dragged its feet in releasing instructions to federal agencies for how to distribute the funding appropriated by Congress, leading to lags in disbursement. Now, OMB has proposed revising the rules that govern how federal dollars are spent. The changes would inevitably lead to unlegislated reductions in funding and damage US leadership in science, both in academia and industry.

05.
arXiv (CS.CL) 2026-06-19

From Texts to Scores: Tracing the Emergence of Essay Quality Representations in Large Language Models

Recent advances in Large Language Models (LLMs) have substantially transformed Automated Essay Scoring (AES), yet the internal mechanisms underlying LLM-based scoring remain poorly understood. In this work, we systematically analyze the hidden representations of eight LLMs across two English essay datasets (ASAP++, CSEE) and one Portuguese dataset (ENEM). Using linear probing, cross-prompt generalization, dimensionality reduction, and neuron-level analyses, we find consistent evidence that essay quality information is encoded in a linearly accessible form within LLM representations. These representations emerge progressively across layers, remain robust across prompting strategies, and partially transfer across essay prompts despite differences in scoring rubrics. In addition, nonlinear probes provide only marginal and inconsistent improvements over linear probes, suggesting that most essay quality information is already linearly decodable. We further identify individual "essay scoring neurons" whose activations strongly correlate with essay scores and whose behavior is sensitive to targeted intervention. Moreover, the layer-wise distribution of these neurons systematically shifts with essay length, with longer essays relying more heavily on deeper layers. Overall, our findings provide evidence that LLMs encode structured representations related to essay quality and offer new insights into the interpretability of LLM-based AES systems.
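
Linear probing, the main tool named above, fits a linear readout from frozen hidden states to the target score and checks held-out accuracy. Repeating the fit per layer traces where the information emerges. The sketch below is a generic illustration under stated assumptions: closed-form ridge regression as the probe, Pearson correlation as the metric, and synthetic Gaussian features standing in for LLM activations. Its output therefore says nothing about the paper's results, and `fit_ridge_probe` is a hypothetical helper.

```python
import numpy as np

def fit_ridge_probe(H, y, lam=1.0):
    """Closed-form ridge probe with an unpenalized bias term."""
    Hb = np.hstack([H, np.ones((H.shape[0], 1))])
    reg = lam * np.eye(Hb.shape[1])
    reg[-1, -1] = 0.0   # leave the bias unregularized
    return np.linalg.solve(Hb.T @ Hb + reg, Hb.T @ y)

def predict(H, w):
    return np.hstack([H, np.ones((H.shape[0], 1))]) @ w

# Synthetic stand-in for one layer's essay representations (NOT real LLM data):
# the "score" is a linear function of a hidden direction plus small noise.
rng = np.random.default_rng(0)
n, d = 400, 64
H = rng.standard_normal((n, d))
direction = rng.standard_normal(d)
scores = H @ direction / np.sqrt(d) + 0.1 * rng.standard_normal(n)

w = fit_ridge_probe(H[:300], scores[:300])
held_out_r = np.corrcoef(predict(H[300:], w), scores[300:])[0, 1]
print(f"held-out Pearson r: {held_out_r:.3f}")
```

In a real analysis, `H` would hold, for example, one layer's pooled activations per essay, and the same fit would be repeated for every layer. A nonlinear probe would take the place of `fit_ridge_probe` for the linear-versus-nonlinear comparison.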

06.
arXiv (CS.AI) 2026-06-16

Attention is Just Another Name for Coupling?: A Fast-Slow ODE Perspective on Hierarchical Pretraining

arXiv:2606.16730v1 Announce Type: cross Abstract: Causal self-attention is a coupling mechanism: each token's hidden state is updated by a learned mixture of preceding tokens at the same timescale. This paper asks whether a second, temporally slower coupling (a slow sub-system operating on a temporally downsampled view of the sequence and fed back into the fast path through a zero-initialised gate) complements it. The question is framed in the language of singularly perturbed ordinary differential equations (ODEs), where the fast variable $x$ evolves at the token rate, the slow variable $y$ evolves at one update per $P$ tokens, and the timescale ratio $\varepsilon = 1/P$ is enforced structurally by causal block-mean pooling. The paper instantiates the fast-slow ODE formalism as a concrete neural network: a fast path of standard causal attention over $T$ tokens, a slow path of full attention over $T/P$ pooled tokens ($P^2 \times$ cheaper per layer), and a zero-initialised additive gate. In addition, under a linear-generator assumption on the fast dynamics, we prove that the equilibrium manifold $x = \phi(y)$ is exactly the master-equation (ME) stationary distribution $p_{\mathrm{st}}(y)$; in that regime a learned MLP $\phi_\theta(y)$ is a variational approximation of it (the trained block is not a generator, so this identity is the structured limit, not a claim about the network as trained). Empirically, at $500$k tokens the coupling is neutral (the gate stays closed, and the coupled and frozen ablations are within run-to-run noise) at a wall-clock cost comparable to a dense baseline. The contribution is the precise, gap-marked mapping itself, not a performance gain.
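
The slow path's timescale separation comes from pooling, and the cost claim follows from the token count. Full attention over $T/P$ pooled tokens has $(T/P)^2 = T^2/P^2$ score entries. For example, $T = 4096$ and $P = 16$ give 65,536 entries instead of 16,777,216, a factor of $256 = 16^2$. The sketch below covers only the pooling and gating plumbing, assuming NumPy arrays of shape (T, d) with T divisible by P. The slow attention itself is omitted. The one-block causal shift is our assumption about how causality could be enforced, and the helper names are hypothetical.

```python
import numpy as np

def causal_block_mean_pool(x, P):
    """Average each non-overlapping block of P tokens: (T, d) -> (T // P, d)."""
    T, d = x.shape
    assert T % P == 0, "sketch assumes T is a multiple of P"
    return x.reshape(T // P, P, d).mean(axis=1)

def broadcast_slow_causally(y, P):
    """Map slow states back to token rate so token t only sees completed blocks.

    Token t in block b = t // P receives y[b - 1]; tokens in block 0 receive zeros.
    """
    _, d = y.shape
    shifted = np.vstack([np.zeros((1, d)), y[:-1]])   # shift by one block
    return np.repeat(shifted, P, axis=0)              # back to shape (T, d)

def gated_fusion(x_fast, slow_at_token_rate, gate):
    """Additive gate; gate = 0 (its initial value) leaves the fast path unchanged."""
    return x_fast + gate * slow_at_token_rate

T, d, P = 8, 2, 4
x = np.arange(T * d, dtype=float).reshape(T, d)
y = causal_block_mean_pool(x, P)      # slow variable: one update per P tokens
s = broadcast_slow_causally(y, P)
out0 = gated_fusion(x, s, gate=0.0)   # zero-initialised gate acts as identity
```

With the gate at zero, the combined block starts as the dense baseline exactly, which matches the abstract's observation that a closed gate makes the coupling neutral.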

07.
arXiv (CS.CL) 2026-06-12

GENIE: A Fine-Grained Measure for Novelty

Large Language Models have consistently demonstrated a lack of creativity and diversity across tasks. Prior work has focused on addressing whether models are capable of generating creative outputs. Here, we aim to consider novelty and investigate what makes model-generated content novel or not novel in a task-specific manner. We propose a fine-grained evaluation metric GENIE to measure the novelty of responses along task-specific features with respect to a population of responses. We show that, unlike GENIE, holistic metrics struggle to capture the high-dimensionality of novelty and do not provide insight into which properties they target. Finally, we use GENIE to measure the effectiveness of mitigation methods that address creativity, to better understand where these methods can improve novelty.

08.
arXiv (CS.LG) 2026-06-12

$\alpha$-fair heterogeneous agent reinforcement learning

arXiv:2606.13076v1 Announce Type: cross Abstract: Cooperation in multi-agent systems is typically optimized through utilitarian objectives that maximize overall efficiency but fail to account for reward distribution, often resulting in inequitable "leader-follower" dynamics. While fairness-based approaches encourage pro-social behaviors where every agent benefits from cooperation, many current algorithms, including those utilizing reward shaping, break the stationarity of Markov Games or lack rigorous theoretical guarantees. This creates a critical gap between fair objective methods and theoretically safe learning frameworks. We propose a novel framework that bridges $\alpha$-fairness with Heterogeneous-Agent Trust Region Learning (HATRL), ensuring monotonic improvement and convergence toward Nash Equilibria. Our approach leverages a fair advantage function that dynamically weights agent utilities based on their expected returns, allowing the global objective to transition from purely utilitarian efficiency to $\alpha$-fairness welfare based on the parameter $\alpha$. We introduce two practical algorithms, $\alpha$-fair HATRPO and $\alpha$-fair HAPPO, and demonstrate through experiments in sequential social dilemmas like CleanUp and CommonHarvest that they outperform HATRL's algorithms from a utilitarian point of view while also achieving better social outcomes.
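
The standard $\alpha$-fair welfare interpolates between the objectives mentioned above: $W_\alpha(u) = \sum_i u_i^{1-\alpha}/(1-\alpha)$ for $\alpha \neq 1$ and $W_1(u) = \sum_i \log u_i$. Here $\alpha = 0$ is the utilitarian sum and $\alpha = 1$ is proportional fairness. The abstract does not spell out how the fair advantage function weights agents. Weighting each agent by the welfare gradient $\partial W_\alpha / \partial u_i = u_i^{-\alpha}$ is one natural reading and is an assumption here. The helper `fairness_weights` is hypothetical, and the sketch assumes strictly positive returns.

```python
import numpy as np

def alpha_fair_welfare(returns, alpha):
    """alpha-fair social welfare of strictly positive per-agent returns."""
    u = np.asarray(returns, dtype=float)
    if np.any(u <= 0):
        raise ValueError("this sketch assumes strictly positive returns")
    if np.isclose(alpha, 1.0):
        return float(np.sum(np.log(u)))                       # proportional fairness
    return float(np.sum(u ** (1.0 - alpha) / (1.0 - alpha)))  # alpha = 0: plain sum

def fairness_weights(returns, alpha):
    """Hypothetical: normalized gradient u_i^(-alpha), so lower-return agents weigh more."""
    u = np.asarray(returns, dtype=float)
    w = u ** (-alpha)
    return w / w.sum()

unequal, equal = [1.0, 4.0], [2.5, 2.5]   # same total return of 5
for alpha in (0.0, 1.0, 2.0):
    print(alpha, alpha_fair_welfare(unequal, alpha), alpha_fair_welfare(equal, alpha),
          fairness_weights(unequal, alpha))
```

At $\alpha = 0$ both allocations score 5. For $\alpha > 0$ the equal split scores higher, and the low-return agent receives the larger weight. This is the shift from efficiency toward fairness that the parameter controls.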

09.
arXiv (CS.AI) 2026-06-15

COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

arXiv:2512.02318v4 Announce Type: replace-cross Abstract: This paper studies how multimodal large language models (MLLMs) undermine the security guarantees of visual CAPTCHA. We identify the attack surface where an adversary can cheaply automate CAPTCHA solving using off-the-shelf models. We evaluate 7 representative MLLMs on 18 real-world CAPTCHA task types, measuring single-shot accuracy, success under limited retries, end-to-end latency, and per-solve cost. We further validate our findings through a supplemental external dataset and an adaptive-attacker setting with session memory, while also analyzing the impact of task-specific prompt engineering and few-shot demonstrations on solver effectiveness. We reveal that MLLMs can reliably solve recognition-oriented and low-interaction CAPTCHA tasks at human-like cost and latency, whereas tasks requiring fine-grained localization, multi-step spatial reasoning, or cross-frame consistency remain significantly harder for current models. By examining the reasoning traces of such MLLMs, we investigate the underlying mechanisms of why models succeed or fail on specific CAPTCHA puzzles and use these insights to derive defense-oriented guidelines for selecting and strengthening CAPTCHA tasks. To validate these principles, we present a proof-of-concept by hardening a vulnerable CAPTCHA type using our guidelines. We demonstrate that incorporating fine-grained localization and implicit counting reduces the success rate of state-of-the-art MLLMs from over 95% to 0%, confirming that structural changes can effectively mitigate the threat. We conclude by emphasizing the urgent need for CAPTCHA redesign as MLLM capabilities increasingly threaten existing defenses. Code availability: https://doi.org/10.5281/zenodo.20406852.

10.
arXiv (CS.AI) 2026-06-15

Communication Policy Evolution for Proactive LLM Agents

arXiv:2606.14314v1 Announce Type: new Abstract: LLM agents have rapidly evolved into autonomous systems, yet a persistent information gap remains between users and agents: communication is costly, while users' identical preferences further limit information exchange. To investigate how agents should communicate across modalities, this paper formalizes Communication Policy, establishes textual and UI-based policies, and then evaluates communication policies across diverse environments, personas, and model combinations. To create information asymmetry for proactive agents, we set up two complementary settings, User-Agent and Planner-Executor. Experimental results reveal complementary strengths between interaction channels: text-based interaction often facilitates task performance, while structured UI improves agents' response quality and persona compliance. Motivated by this, we design a hybrid method that combines these advantages. We further propose Communication Policy Evolution (CPE), a self-evolution framework for refining communication policies through rollouts and prompt-level evolution. Without model modification, CPE achieves the best task success across multiple settings using prompt refinement alone. Our findings identify communication behavior as a critical yet underexplored design dimension for LLM agents.

11.
arXiv (CS.AI) 2026-06-18

DecNefSimulator: A Modular, Interpretable Framework for Decoded Neurofeedback Simulation Using Generative Models

arXiv:2511.14555v4 Announce Type: replace-cross Abstract: Decoded Neurofeedback (DecNef) is a promising non-invasive approach to brain modulation with wide-ranging applications in neuromedicine and cognitive neuroscience. However, progress in DecNef research remains constrained by subject-dependent learning variability, reliance on indirect measures to quantify progress, and the high cost and time demands of experimentation. We present DecNefSimulator, a modular and interpretable simulation framework that formalizes DecNef as a machine learning problem. Beyond providing a virtual laboratory, DecNefSimulator enables researchers to model, analyze and understand neurofeedback dynamics. Using latent variable generative models as simulated participants, DecNefSimulator allows direct observation of internal cognitive states and systematic evaluation of how different protocol designs and subject characteristics influence learning. We demonstrate how this approach can (i) reproduce empirical phenomena of DecNef learning, (ii) identify conditions under which DecNef feedback fails to induce learning, and (iii) guide the design of more robust and reliable DecNef protocols in silico before human implementation. In summary, DecNefSimulator bridges computational modeling and cognitive neuroscience, offering a principled foundation for methodological innovation, robust protocol design, and ultimately, a deeper understanding of DecNef-based brain modulation.

12.
arXiv (CS.CL) 2026-06-12

C-QUERI: Congressional Questions, Exchanges, and Responses in Institutions Dataset

Questions in political interviews and hearings serve strategic purposes beyond information gathering, including advancing partisan narratives and shaping public perceptions. However, these strategic aspects remain understudied due to the lack of large-scale datasets for studying such discourse. Congressional hearings provide an especially rich and tractable site for studying political questioning: interactions are structured by formal rules, witnesses are obliged to respond, and members with different political affiliations are guaranteed opportunities to ask questions, enabling comparisons of behaviors across the political spectrum. We develop a pipeline to extract question-answer pairs from unstructured hearing transcripts and construct a novel dataset of committee hearings from the 108th–117th Congress. Our analysis reveals systematic differences in questioning strategies across parties by showing that the party affiliation of questioners can be predicted from their questions alone. Our dataset and methods not only advance the study of congressional politics but also provide a general framework for analyzing question-answering across interview-like settings.

13.
arXiv (CS.AI) 2026-06-17

From Paper to Program: Knowledge Externalization for AI-Assisted Quantum Many-Body Code Generation


arXiv:2604.04089v3 Announce Type: replace-cross Abstract: Large language models can write scientific code, but direct paper-to-program translation remains fragile when correctness depends on tacit conventions in the literature. We identify this bottleneck as knowledge externalization: converting implicit computational assumptions (index conventions, gauge choices, fermionic signs, contraction order, and memory constraints) into an explicit technical specification before implementation. We evaluate a multi-stage, human-in-the-loop workflow that inserts such a specification, with validation and stop gates, between theory extraction and code generation. The workflow is tested on two algorithmically distinct quantum many-body tasks: variational sweep-based Density-Matrix Renormalization Group (DMRG) from a pedagogical review, and constructive Pfaffian conversion of Hartree–Fock–Bogoliubov states to matrix product states from the five-page Letter by Jin et al., Phys. Rev. B 105, L081101 (2022), for which no public code is available. For DMRG, all 16 specification-guided model pairings in a $4\times4$ grid satisfy physics-validation criteria, compared with 6/13 direct attempts. A prose-specification ablation indicates that externalized content, not LaTeX formatting, is the essential ingredient. For Pfaffian-MPS, the workflow succeeds in 11/26 archived attempts, whereas direct prompting yields zero audited passes. Cross-specification transfer is asymmetric: non-GPT specifications implemented by GPT 5.5 pass 4/4, while GPT 5.5 specifications implemented by weaker models fail 4/4, indicating a residual implementation-model bottleneck. The resulting Paper-to-Program Many-Body skill provides an auditable protocol for AI-assisted implementation of many-body algorithms and for diagnosing where externalization succeeds or fails.

14.
arXiv (CS.AI) 2026-06-12

ERTS: Adversarial Robustness Testing of Ethical AI via Semantic Perturbation in a Bounded Consequence Space

arXiv:2606.13282v1 Announce Type: new Abstract: As AI systems are deployed in high-stakes ethical contexts such as healthcare triage, autonomous vehicle control, and employment screening, formal methods for evaluating their robustness against adversarial manipulation of ethical reasoning remain underdeveloped. This paper introduces the Ethical Robustness Testing System (ERTS), a closed-pipeline framework that: (1) encodes ethical dilemmas into a 22-dimensional Ethical Consequence Space (ECS) grounded in established ethical theory; (2) applies 17 semantic perturbation functions subject to 6 validity constraint classes including a novel semantic coherence constraint; (3) measures decision deviation via a 4-component Ethical Instability Index (EII); and (4) produces domain-adaptive pre-deployment robustness assessment verdicts. We evaluate 4 structured baseline models and 2 production LLMs (Gemini 2.0 Flash and Llama 3.2) across 50 ethical scenarios spanning 8 deployment domains, generating 1,500 adversarial test cases. Results demonstrate that only 33% of models achieve assessment clearance, with the local Llama-3.2 model proving particularly vulnerable to fairness corruption and information degradation attacks (ERS = 0.737). To the best of our knowledge, no existing framework combines a bounded ethical consequence space, semantic coherence constraints, and domain-adaptive assessment in a single adversarial testing pipeline.

15.
arXiv (CS.AI) 2026-06-16

CmdNeedle: Measuring the Incompleteness of Command Denylists for AI Agents

arXiv:2606.15549v1 Announce Type: cross Abstract: The adoption of AI agents is increasing rapidly. Terminal AI agents, i.e., AI agents that run in terminal environments, are a widely used type of AI agent. Terminal AI agents rely heavily on shell command execution to interact with the host systems. They adopt a three-list command-gating mechanism to mitigate security risks introduced by command execution, with denylists serving as the load-bearing component. However, modern operating systems often ship a large, ever-expanding set of shell commands with complex functionalities. Our observation is that even the built-in denylist of Claude Code, well maintained by its developers, can overlook bypass commands that invalidate its effectiveness. Such oversights lead to fragile command denylists that cannot even block operations that practitioners expect them to block. This paper presents the first systematic characterization of command denylist fragility in terminal AI agents. The paper formalizes the command denylist fragility problem and proposes an LLM-driven pipeline, CmdNeedle, to detect such fragility. It prompts the LLM to propose possible bypasses and iteratively repairs them using feedback from a validator that executes them in a sandbox. In the evaluation, we applied CmdNeedle to 1,709 real-world command denylists (containing 13,332 denylist rules) collected from GitHub. The evaluation yields several key findings: 69.0–98.6% of the denylists are fragile, this fragility occurs consistently across projects and agents, and several possible root causes of this fragility are validated. Our pipeline and findings will hopefully facilitate future research and practice regarding the command denylists used by AI agents.
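
A toy prefix-matching gate makes the notion of denylist fragility concrete. It is hypothetical. It does not reproduce Claude Code's actual rule syntax or matching semantics, or CmdNeedle's LLM-driven search, and it never executes anything. It only shows how a rule can block the command a practitioner had in mind while letting an equivalent command through. The function `is_denied` and the rules in `DENYLIST` are illustrative assumptions.

```python
import shlex

def is_denied(command, denylist):
    """Naive gate: deny a command whose leading tokens match any denylist rule."""
    tokens = shlex.split(command)
    for rule in denylist:
        rule_tokens = shlex.split(rule)
        if tokens[: len(rule_tokens)] == rule_tokens:
            return True
    return False

# Hypothetical rules, written the way a practitioner might expect them to work.
DENYLIST = ["rm", "git push --force"]

candidates = [
    "rm -rf build/",
    "find build/ -delete",           # also deletes the tree, via a different binary
    "git push --force origin main",
    "git push -f origin main",       # short-flag alias of --force
]
for cmd in candidates:
    label = "BLOCKED" if is_denied(cmd, DENYLIST) else "allowed"
    print(f"{label:8} {cmd}")
```

Each allowed line is a bypass in the sense of the abstract. The rule matches a spelling of the operation rather than the operation itself. A pipeline like CmdNeedle additionally confirms in a sandbox that a proposed bypass really performs the blocked action.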

16.
arXiv (CS.LG) 2026-06-16

TCHG: Tri-Trust Conditioned Heterogeneous Graph Learning for Reliable Dynamic Trust Prediction

arXiv:2606.16611v1 Announce Type: new Abstract: Trust prediction infers latent user-user trust relations and provides important support for social recommendation, fake-review and manipulation detection, and risk identification. Graph neural networks have become a prominent approach to trust prediction because of their ability to learn network structures and complex trust dependencies. However, existing methods often rely on a unified representation of trust signals and do not disentangle heterogeneous trust evidence into separate evidence channels, failing to exploit the distinct roles that different evidence channels should play during trust modeling. To address this gap, this paper argues that trust evidence should not be treated as an undifferentiated input, but should be decomposed and used as functional control factors over graph propagation. We propose TCHG, a tri-trust conditioned heterogeneous graph learning framework that decomposes trust evidence into three channels and assigns them distinct functional roles in propagation: entity reliability governs message admission, interaction-behavior reliability modulates propagation strength, and contextual trust adjusts the propagation mode through context-conditioned operator selection. Since the three evidence channels evolve at different temporal scales, TCHG maintains independent temporal states with non-uniform decay rates to prevent rapidly changing contextual signals from overwriting slowly accumulated entity reliability. It further predicts trust probability and calibrates the output probability, improving predictive confidence under sparse or conflicting evidence. Extensive experiments on multiple public trust datasets show that TCHG achieves effective and reliable trust prediction compared with representative trust prediction and heterogeneous graph baselines.

17.
arXiv (quant-ph) 2026-06-16

Minimum measurements quantum protocol for band structure calculation

arXiv:2511.04389v2 Announce Type: replace Abstract: Protocols for quantum measurement are an essential part of quantum computing. Measurements are no longer confined to the final step of computation but are increasingly embedded within quantum circuits as integral components of noise-resilient algorithms. However, each observable typically requires a distinct measurement basis, often demanding a different circuit configuration. As the number of such configurations typically grows with the number of qubits, measurements constitute a major bottleneck. Focusing on electronic structure calculations in crystalline systems, we propose a measurement protocol that restricts the required measurement configurations to an absolute minimum of just three, independent of the number of qubits. This makes it one of the few known protocols that do not scale with qubit number. In particular, we derive the measurement protocol from the symmetries of tight-binding (TB) Hamiltonians and implement it within the Orthogonal-Ansatz Variational Quantum Eigensolver (OA-VQE) algorithm. We demonstrate its performance on three systems, namely a two-dimensional CuO$_2$ square lattice (3 qubits), bilayer graphene with a hexagonal (honeycomb) lattice (4 qubits), and a three-dimensional diamond lattice (10 qubits). Beyond tight-binding systems, the protocol can be extended to enable efficient initial state preparation for many-body Hamiltonians, such as multi-orbital Hubbard models in momentum space.

18.
arXiv (math.PR) 2026-06-24

Deep numerical schemes for systems of Ergodic BSDEs with applications to regime-switching forward utilities

arXiv:2606.24271v1 Announce Type: cross Abstract: In this paper, we introduce two neural-network-based numerical schemes for solving systems of coupled ergodic Backward Stochastic Differential Equations (eBSDEs), motivated by the approximation of optimal strategies within the framework of forward utilities in a regime-switching stochastic factor model. Our approach builds on the representation of such models through systems of eBSDEs introduced in [HLT20]. We first establish a link between the solution of the system of ergodic BSDEs and that of an associated multidimensional BSDE with random terminal time, given by the hitting time of the positive recurrent stochastic factor. Building on this representation, we introduce a locally additive deep learning scheme obtained by minimizing aggregated local error terms. We then present a new algorithm inspired by the Deep Galerkin Method (DGM) that minimizes the residual of the associated ergodic PDE system, relying on a representation of the ergodic cost. Finally, we apply this framework to regime-switching forward utilities in a stochastic factor model. We first derive a general consistency SPDE that characterizes regime-switching forward utilities and retrieve their representation with systems of ergodic BSDEs in the homothetic case. Numerical experiments demonstrate the performance of the proposed methods, with a particular focus on how accounting for regime switches affects forward preferences.

19.
arXiv (CS.AI) 2026-06-16

SkillVetBench: LLM-as-Judge for Multi-Dimensional Security Risk Evaluation in Open-Source LLM Agent Skills

arXiv:2606.15899v1 Announce Type: cross Abstract: Open-source LLM agent ecosystems are growing rapidly, yet the security of community-contributed skills (modular tool definitions that extend agent capabilities) remains largely unvetted. The gap we fill: existing scanners operate at the code layer and are structurally blind to instruction-layer and multi-agent risk (natural-language directives that hijack an agent, exfiltrate data through encoded side channels, or chain harm across pipelines), so what is needed is a semantic, multi-dimensional vetting system rather than another signature matcher. We present SKILLVETBENCH, a live public leaderboard on Hugging Face that uses an LLM-as-Judge to vet agent skills. What is new: SARS (Skill Agentic Risk Score), a five-dimensional agentic-risk metric with a principled weighted formula for instruction-following systems. What is integrated: full CVSS v4.0 vector decomposition and a ClawHub dual-view that places our LLM-generated review beside the official marketplace verdict. What is demonstrated: drawing on our companion benchmark paper [1], the LLM-as-Judge stage achieves zero false negatives across 78 confirmed-malicious skills and zero false positives across 22 benign controls, while the best static baseline (SKILLSIEVE) still misses 15%; for instruction-layer categories such as Prompt Injection and Memory Poisoning, conventional tools miss between 89% and 100% of threats (e.g., CODEBERT detects none of nine memory-poisoning skills). Detection rates vary from 35% to 95% across four LLM evaluators, motivating ensemble scoring in production deployments.

20.
arXiv (CS.LG) 2026-06-11

Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs

arXiv:2606.12280v1 Announce Type: new Abstract: Post-training quantization lets large text-to-image diffusion transformers run on consumer GPUs, yet the hardware-specific trade-offs are seldom measured directly. We quantize Ideogram 4.0 for Ampere RTX 3090 GPUs, which lack FP8 tensor cores. Ideogram 4.0 is a 9.3B flow-matching diffusion transformer (DiT), shipped as two separate-weight copies of a single-stream 34-layer backbone for classifier-free guidance and conditioned by a Qwen3-VL-8B encoder. Our INT8 W8A8 recipe (per-channel weights, per-token dynamic activations, SmoothQuant, and mixed-precision protection of a small high-fragility layer set) holds the FP8 quality ceiling: on a 200-prompt benchmark, the paired same-seed bootstrap CI for the INT8 minus FP8 difference includes zero on both Pick and CLIP, while INT8 improves on NF4 by $+1.9$ CLIP (95% CI $[+1.21,+2.64]$, excluding zero). A per-category OCR analysis, to our knowledge unreported for this model class, confirms that text legibility is preserved, and an ablation isolates protection of the FFN down-projections as the dominant quality lever. Our GGUF Q4_K quantization beats NF4 at equal on-disk size and is the Pareto winner on the quality-memory frontier, with paired confidence intervals excluding zero (Q8_0 is quality-neutral). Finally, we characterize where 8-bit quantization helps and where it does not: INT8's weights match FP8's footprint rather than shrink it, so a speed gain on Ampere awaits a fused INT8 kernel.
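
The three W8A8 ingredients named in the recipe can be emulated in NumPy on a single matrix multiply. The ingredients are per-output-channel weight scales, per-token dynamic activation scales, and SmoothQuant's migration of activation outliers into the weights. This is a sketch under stated assumptions, not the paper's pipeline. It uses symmetric quantization to [-127, 127] and a migration strength `alpha = 0.5`, and it includes no mixed-precision layer protection. It runs on a toy matrix rather than Ideogram layers. Because it emulates INT8 arithmetic in NumPy, it says nothing about speed. The helper names are hypothetical.

```python
import numpy as np

def quantize_symmetric(t, axis):
    """Symmetric INT8 quantization; one scale per slice, max taken over `axis`."""
    max_abs = np.max(np.abs(t), axis=axis, keepdims=True)
    scale = np.where(max_abs > 0, max_abs / 127.0, 1.0)
    q = np.clip(np.round(t / scale), -127, 127).astype(np.int8)
    return q, scale

def smooth(x, w, alpha=0.5, eps=1e-5):
    """SmoothQuant-style scaling; (x / s) @ (s[:, None] * w) equals x @ w mathematically.

    x: (tokens, in_features) activations; w: (in_features, out_features) weights.
    """
    act_max = np.max(np.abs(x), axis=0)
    w_max = np.maximum(np.max(np.abs(w), axis=1), eps)
    s = np.maximum(act_max ** alpha / w_max ** (1 - alpha), eps)
    return x / s, w * s[:, None]

def w8a8_matmul(x, w, alpha=0.5):
    """Approximate x @ w with INT8 weights and INT8 activations."""
    xs, ws = smooth(x, w, alpha)
    qx, sx = quantize_symmetric(xs, axis=1)   # per-token dynamic scales, shape (T, 1)
    qw, sw = quantize_symmetric(ws, axis=0)   # per-output-channel scales, shape (1, out)
    acc = qx.astype(np.int32) @ qw.astype(np.int32)   # integer accumulation
    return acc * sx * sw                               # dequantize to float

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 64))
x[:, 3] *= 50.0                      # one outlier activation channel
w = 0.05 * rng.standard_normal((64, 32))
y_ref = x @ w
y_q = w8a8_matmul(x, w)
rel_err = np.linalg.norm(y_q - y_ref) / np.linalg.norm(y_ref)
print(f"relative error: {rel_err:.4f}")
```

The smoothing step matters because per-token scales are set by the largest entry in each row. Without migration, the outlier channel would stretch every row's scale and coarsen all other channels.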

21.
arXiv (CS.LG) 2026-06-18

DIPHINE: Diffusion-based $\Phi$-ID Neural Estimator

arXiv:2606.18997v1 Announce Type: new Abstract: Uncovering the true informational architecture of real-world complex systems requires disentangling how their components uniquely store, redundantly share, and synergistically integrate information over time. Integrated Information Decomposition ($\Phi$ID) is a framework for decomposing the information dynamics of multivariate systems into sixteen non-overlapping atoms that characterize redundant, unique, and synergistic modes of information storage, transfer, and integration. Existing methods to compute $\Phi$ID are restricted to Gaussian or discrete systems, preventing its application to continuous non-Gaussian dynamical systems. We address this limitation by proposing DIPHINE (Diffusion-based $\Phi$-ID Neural Estimator), the first neural estimator that leverages score-based diffusion models to jointly estimate all the mutual information terms required by $\Phi$ID from a single amortized network, recovering the sixteen atoms through Möbius inversion. We provide a theoretical analysis of error propagation through the inversion, showing that the Jacobian of the mapping from mutual informations to atoms is integer-valued and that the synergy-to-synergy atom is provably the hardest to estimate. We demonstrate accurate recovery of ground-truth atoms on synthetic benchmarks, superior performance compared to established mutual information estimators, and the ability to extract physiologically interpretable information-dynamic structure on an application involving real data without any distributional assumptions.
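
The error-propagation claim rests on the fact that Möbius inversion is a linear map with integer coefficients from estimated mutual informations to atoms. The sketch below shows this on the much smaller two-source partial information lattice (4 atoms), used here as a stand-in for the 16-atom ΦID lattice. The minimum-mutual-information redundancy and all names are assumptions for illustration, not DIPHINE's estimator.

```python
def mobius_invert(cumulative, below):
    """Recover atoms from cumulative values on a finite lattice.

    cumulative[a] = sum of atoms[b] over all b <= a; below[a] is the set of
    nodes strictly below a. Nodes are peeled in order of down-set size, so every
    node strictly below a is resolved before a itself.
    """
    atoms = {}
    for a in sorted(cumulative, key=lambda n: len(below[n])):
        atoms[a] = cumulative[a] - sum(atoms[b] for b in below[a])
    return atoms


def jacobian(nodes, below):
    """d atom / d cumulative value; the map is linear, so unit inputs suffice."""
    J = {}
    for c in nodes:
        unit = {n: float(n == c) for n in nodes}
        column = mobius_invert(unit, below)
        for a in nodes:
            J[(a, c)] = column[a]
    return J


# Two-source PID lattice (a toy, not the PhiID lattice). Atoms:
# {1}{2} = redundancy, {1} = unique to X1, {2} = unique to X2, {12} = synergy.
NODES = ["{1}{2}", "{1}", "{2}", "{12}"]
BELOW = {"{1}{2}": set(), "{1}": {"{1}{2}"}, "{2}": {"{1}{2}"},
         "{12}": {"{1}{2}", "{1}", "{2}"}}

# XOR target with uniform bits: I(X1;Y) = I(X2;Y) = 0, I(X1,X2;Y) = 1 bit,
# and an assumed minimum-mutual-information redundancy of min(0, 0) = 0.
xor = {"{1}{2}": 0.0, "{1}": 0.0, "{2}": 0.0, "{12}": 1.0}
atoms = mobius_invert(xor, BELOW)
print(atoms)  # all information is synergistic

J = jacobian(NODES, BELOW)
row_l1 = {a: sum(abs(J[(a, c)]) for c in NODES) for a in NODES}
print(row_l1)  # {'{1}{2}': 1.0, '{1}': 2.0, '{2}': 2.0, '{12}': 4.0}
```

Every Jacobian entry is an integer (here 0 or ±1). If the four mutual-information estimates carry independent errors of variance σ², the synergy atom's error variance is 4σ², versus 2σ² for the unique atoms and σ² for redundancy. This is a small-lattice analogue of the abstract's result that the synergy-to-synergy atom is hardest to estimate.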

22.
arXiv (quant-ph) 2026-06-16

Suppressing Intrinsic Spin-Phonon Errors in Trapped-Ion Quantum Simulation

arXiv:2606.15518v1 Announce Type: new Abstract: Trapped-ion quantum simulators realize programmable spin models through phonon-mediated interactions. For Hamiltonians with noncommuting terms, however, the same phonon bus generates intrinsic spin-phonon errors that strongly distort the target dynamics. Because these errors are governed by the full time history of the spin-dependent phonon motion, they survive standard loop-closing control and limit simulation accuracy. Using a sequence of frame transformations, we isolate the residual error dynamics and show that this intrinsic error can be strongly suppressed while preserving programmable Ising couplings. Full spin-boson simulations of multi-ion chains demonstrate orders-of-magnitude lower error than both constant-drive and conventional loop-closing protocols. These results remove a central precision barrier in trapped-ion analog quantum simulation and enable accurate programmable simulation of noncommuting many-body Hamiltonians and dynamical protocols.
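
"Loop-closing" refers to choosing the drive so that the phonon mode's phase-space displacement returns to zero at the end of the evolution, which disentangles spin from motion in the simplest case. A minimal sketch, assuming a single motional mode driven by a constant spin-dependent force at detuning δ (the textbook picture, not the paper's frame-transformation protocol), computes α(t) = Ω ∫₀ᵗ e^{iδs} ds and shows that it closes at multiples of t = 2π/δ. The abstract's point is that for noncommuting Hamiltonians, errors depending on the full history survive even when this condition holds; the sketch only illustrates the condition itself.

```python
import cmath
import math


def displacement(delta, t, omega=1.0, steps=10_000):
    """alpha(t) = omega * integral_0^t exp(i * delta * s) ds via the trapezoidal rule.

    Toy model: one motional mode driven by a constant spin-dependent force
    detuned by delta (interaction picture, spin operator factored out).
    """
    h = t / steps
    total = 0.5 * (1.0 + cmath.exp(1j * delta * t))
    for k in range(1, steps):
        total += cmath.exp(1j * delta * k * h)
    return omega * h * total


def displacement_exact(delta, t, omega=1.0):
    """Closed form: omega * (exp(i * delta * t) - 1) / (i * delta)."""
    return omega * (cmath.exp(1j * delta * t) - 1) / (1j * delta)


delta = 2 * math.pi  # detuning chosen so that one loop takes t = 2*pi/delta = 1
for t in (0.25, 0.5, 1.0, 2.0):
    # |alpha| peaks at 2/delta mid-loop and returns to ~0 at t = 1 and t = 2.
    print(t, abs(displacement(delta, t)))
```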

23.
PLOS Computational Biology 2026-05-29

Structural and dynamic basis of NOD2 tandem CARD association and NOD1/2–RIP2 signaling complexes

by Jitendra Maharana, Aritra Bej, Debasish Biswal, Debashis Panda, Arjun Sharma

NOD1 and NOD2, founding members of the NOD-like receptor (NLR) family, play a crucial role in host defense against bacterial infections. Recognition of peptidoglycan-derived ligands triggers ATP-dependent oligomerization of the NACHT domain, exposing the CARD domains that recruit the adaptor protein RIP2 via CARD–CARD interactions to activate the NF-κB signaling cascade. Although NOD1/2–RIP2 interactions and RIP2CARD filament assembly are established, the precise interfaces that stabilize hetero–CARD filaments remain poorly defined. Here, we integrate in silico structural modeling with molecular dynamics (MD) simulations to elucidate structurally compatible arrangements of NOD1–RIP2 and NOD2–RIP2 hetero–CARD filaments. Our results reveal that NOD1CARD subunits form a structurally compatible homomeric scaffold via canonical (type-I–III) interfaces, accommodating multiple tiers of RIP2CARD rings at both filament termini. Meanwhile, the NOD2 tandem CARDs adopt multiple discrete conformations, reflecting a more intricate structural mechanism. In stable filament conformations, tandem CARDs converge at the type-II interface, with RIP2CARD rings stacking onto CARDa (top-down) and CARDb (bottom-up) interfaces, highlighting the structural role of NOD2CARDb in RIP2-mediated CARD–CARD interaction. In silico mutagenesis, involving charge-reversal and alanine scanning of key interfacial residues, disrupts NOD1–RIP2 and NOD2–RIP2 interactions at both top-down and bottom-up interfaces, leading to rapid interface destabilization within 0.1–0.4 μs of simulation. Together, these results reveal conserved and receptor-specific mechanisms governing NOD1/2–RIP2 CARD–CARD interactions and provide deeper structural and dynamic insights into the complex structural mechanisms for NLR-mediated inflammatory signaling.

24.
arXiv (CS.LG) 2026-06-18

Learning from Your Own Mistakes: Constructing Learnable Micro-Reflective Trajectories for Self-Distillation

arXiv:2606.18844v1 Announce Type: new Abstract: Self-distillation improves reasoning in large language models by using the model's own rollouts as training signal, typically through implicit logit-level alignment that minimizes KL divergence toward a privileged target distribution. However, because this supervision is generated via uncontrolled sampling, it provides no diagnostic insight into the model's specific errors or corrective guidance for its individual failure patterns. Consequently, the model learns to imitate a privileged distribution rather than receiving fine-grained corrections that pinpoint where and why its reasoning fails. In this paper, we propose Trajectory-Augmented Policy Optimization (TAPO), which advances self-distillation from implicit distributional alignment to explicit trajectory construction. During RL training, the model produces both correct and incorrect rollouts to the same query, and TAPO leverages this contrastive structure to construct micro-reflective corrections: new training trajectories that retain the model's erroneous reasoning up to the point of failure and then insert a natural-language diagnosis and corrected reasoning guided by a correct reference from the same sampling group. Since each trajectory is anchored in the learner's own prefix and solutions, the corrective signal preserves the model's on-policy distribution to a greater extent than the position-wise alignment imposed by KL-based methods. To integrate these trajectories, TAPO introduces difficulty-aware candidate selection at the model's capability boundary and decoupled advantage estimation to prevent gradient contamination. Experiments on AIME 2024, AIME 2025, and HMMT 2025 show that TAPO achieves consistent improvements over GRPO under the same number of training steps. Further analysis demonstrates that TAPO strengthens both first-pass reasoning and error-correction effectiveness.
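
A micro-reflective trajectory keeps the learner's own prefix through the first faulty step and then appends a diagnosis and a corrected continuation. The sketch below illustrates that construction alongside the standard GRPO group-relative advantage that TAPO is compared against. Everything else is a labeled assumption and not TAPO's actual implementation: the `Rollout` type, the annotated `first_error` index, the "Wait:" diagnosis marker, and the mixed-outcome filter used as a stand-in for "capability boundary" selection. The decoupled advantage estimation is not modeled.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Optional


@dataclass
class Rollout:
    steps: List[str]                   # reasoning steps, in order
    correct: bool                      # verifier outcome for the final answer
    first_error: Optional[int] = None  # index of first wrong step (hypothetical annotation)


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantages: (r - mean) / (std + eps)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def at_capability_boundary(group):
    """Hypothetical filter: keep a query only if its group mixes successes and failures."""
    rate = sum(r.correct for r in group) / len(group)
    return 0.0 < rate < 1.0


def micro_reflective(wrong, diagnosis, corrected_steps):
    """Keep the learner's own steps through the first error, then append a
    natural-language diagnosis and a corrected continuation."""
    if wrong.correct or wrong.first_error is None:
        raise ValueError("need an incorrect rollout with a located first error")
    prefix = wrong.steps[: wrong.first_error + 1]
    return prefix + [f"Wait: {diagnosis}"] + list(corrected_steps)


group = [
    Rollout(["a = 2", "b = a + 3 = 5", "answer: 5"], correct=True),
    Rollout(["a = 2", "b = a * 3 = 6", "answer: 6"], correct=False, first_error=1),
]
if at_capability_boundary(group):
    trajectory = micro_reflective(
        group[1], "the problem says add 3, not multiply by 3.",
        ["b = a + 3 = 5", "answer: 5"],  # guided by the correct rollout group[0]
    )
    print(trajectory)
print(group_advantages([1.0 if r.correct else 0.0 for r in group]))
```

Anchoring the new trajectory on `wrong.steps[:first_error + 1]` is what keeps it close to the model's own distribution: only the diagnosis and the continuation are off-policy.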

25.
arXiv (CS.AI) 2026-06-11

Designing AI-Supported Focus Groups: A Role x Modality Playbook

arXiv:2606.11835v1 Announce Type: cross Abstract: Collecting participants' lived experiences is central to design research. Focus groups are uniquely valuable because participants not only share individual accounts but also respond to one another, surfacing comparison, disagreement, and collective sensemaking. However, focus groups are resource-intensive and highly sensitive to facilitation: moderators must probe for specificity, balance participation, manage topic flow, and sustain psychological safety, and subtle facilitation choices can shape what becomes salient. Recent HCI work and commercial meeting tools show that generative AI can scaffold live conversation through prompting, turn regulation, thematic mapping, and real-time summarization. Yet UXR teams lack a clear map of what these capabilities mean in focus groups and what methodological risks they introduce. We synthesize prior work on AI-supported live conversation and propose a focus-group-specific playbook of AI supports organized by role (tool, co-host, host) and modality (text, voice, embodied). We characterize interactional trade-offs and identify open questions for evaluating AI-supported focus groups as methodological configurations.