Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-16

Automatic Summarization of Doctor-Patient Encounter Dialogues Using Large Language Model through Prompt Tuning

Automatic text summarization (ATS) is an emerging technology to assist clinicians in providing continuous and coordinated care. This study presents an approach to summarize doctor-patient dialogues using generative large language models (LLMs). We developed prompt-tuning algorithms to instruct generative LLMs to summarize clinical text. We examined the prompt-tuning strategies, the size of soft prompts, and the few-short learning ability of GatorTronGPT, a generative clinical LLM developed using 277 billion clinical and general English words with up to 20 billion parameters. We compared GatorTronGPT with a previous solution based on fine-tuning of a widely used T5 model, using a clinical benchmark dataset MTS-DIALOG. The experimental results show that the GatorTronGPT- 20B model achieved the best performance on all evaluation metrics. The proposed solution has a low computing cost as the LLM parameters are not updated during prompt-tuning. This study demonstrates the efficiency of generative clinical LLMs for clinical ATS through prompt tuning.

02.
arXiv (CS.AI) 2026-06-17

Prefill/Decode-Aware Evaluation of LLM Inference on Emerging AI Accelerators

arXiv:2606.17104v1 Announce Type: cross Abstract: As large language models (LLMs) are increasingly deployed in latency- and cost-sensitive settings, inference efficiency has become a central systems challenge. While GPUs dominate current deployments, a growing number of AI accelerators claim advantages for LLM inference, yet it remains unclear under which conditions such accelerators outperform GPUs in practice. Recent inference systems decompose execution into Prefill and Decode phases, which exhibit distinct computational characteristics and latency metrics, commonly captured by time to first token (TTFT) and time per output token (TPOT). This paper presents a phase-aware evaluation of LLM inference performance across GPUs and emerging AI accelerators using a common model, Llama2-7B. By separately measuring Prefill and Decode performance, we reveal that accelerator advantages differ by phase and metric. Our results show that GPUs consistently excel in the compute-intensive Prefill phase, while GroqRack achieves significantly lower TPOT during Decode (batching not currently supported). However, GPUs regain an advantage in Decode throughput as batch size increases. These findings demonstrate that each platform exhibits distinct phase-dependent strengths. We further analyze heterogeneous Prefill/Decode disaggregation across different accelerator platforms, identifying performance gains and the workload and network conditions under which such gains are realized.

03.
arXiv (CS.CL) 2026-06-11

One Jailbreak, Many Tongues: Learning Language-Insensitive Intention Representations for Multilingual Jailbreak Detection

Large language models (LLMs) are increasingly deployed in applications for global multilingual users, yet safety training remains concentrated in dominant languages and has not progressed in parallel with multilingual capability, creating exploitable gaps for jailbreak attacks. Current jailbreak defenses are largely developed and evaluated in dominant languages, and their effectiveness is limited by the scarcity of aligned multilingual supervision and representations dispersion caused by language variation. To address this issue, we propose MLJailDe, a multilingual jailbreak detection framework designed to improve both multilingual robustness and cross-lingual generalization. MLJailDe first introduces a multilingual back-translation data augmentation algorithm to construct a semantically consistent and functionally effective dataset spanning 11 languages, consisting of 2,232 benign and 1,239 jailbreak samples. On this basis, MLJailDe employs relative-distance constraints to reduce cross-lingual representation dispersion and encourage jailbreak prompts with similar intent to form consistent clusters across languages, while an imbalance-aware classification objective is further used to alleviate class imbalance and learn more reliable multilingual decision boundaries. Experimental results show that MLJailDe outperforms state-of-the-art baselines across multiple languages, achieving an F1 score of 98.5\%, and obtains an average F1 score of 97.1\% on unseen languages, demonstrating strong effectiveness and cross-lingual generalization.

04.
arXiv (CS.CV) 2026-06-16

Selective Synergistic Learning for Video Object-Centric Learning

Typical video object-centric learning (VOCL) approaches employ slot-based frameworks that rely on reconstruction-driven encoder-decoder architectures, where learning is mediated by two spatial maps: attention maps from the encoder and object maps from the decoder. As these two distinct maps exhibit different properties, a recent dense alignment strategy attempted to reconcile this discrepancy by enforcing agreement across all spatio-temporal patches via contrastive learning. However, this indiscriminate alignment inadvertently propagates the inherent weaknesses of each module, such as noisy encoder predictions and blurred decoder boundaries. Moreover, computing dense similarities across all pairs incurs a computational cost quadratic in the total number of spatio-temporal patches, severely limiting scalability. Motivated by this, we propose Selective Synergistic Learning (SSync). Instead of exhaustive patch-to-patch alignment, SSync prevents error propagation by selectively distilling only the most reliable cues: leveraging the encoder strictly for boundary refinement and the decoder for interior denoising. This is realized via a pseudo-labeling with linear complexity, eliminating the need for quadratic spatial comparisons. Also, to prevent the reinforcement of architectural biases like slot redundancy, we introduce a transitive pseudo-label merging that consolidates overlapping slots based on spatio-temporal activation consistency. Extensive studies demonstrate that SSync improves decomposition quality and serves as a versatile, plug-and-play module while also exhibiting exceptional robustness to slot configurations. Code is available at github.com/wjun0830/SSync.

05.
arXiv (math.PR) 2026-06-24

Queues with Correlated Service Times – the $M/M_D/c$ Model

arXiv:2606.24881v1 Announce Type: new Abstract: This paper studies multi-server queueing systems with correlated service times, modeled as the $M/M_D/c$ queue, which is a natural extension of the recent work by Thapa and Zhao [Thapa-Zhao:2026]. In this model, arrivals follow a Poisson process, while service times across servers exhibit dependence captured by the Marshall–Olkin multivariate exponential distribution (MO-MVED). We first develop a rigorous sample-path construction of the system and establish that the resulting queueing process is a continuous-time Markov chain. We then analyze the stationary behavior of the $M/M_D/c$ model. In the homogeneous case, we derive a complete solution via geometric tail structure and explicit boundary equations, recovering a tractable one-dimensional representation. In the heterogeneous case, we establish a general framework combining a geometric tail with a finite boundary system, and prove existence, uniqueness, and nonnegativity of the stationary distribution. The above results provide a unified analytic framework extending classical $M/M/c$ theory to correlated-service settings, and reveal how dependence among service times fundamentally affects system performance and structure. Beyond the $M/M_D/c$ model, We next study the interplay between Marshall–Olkin service dependence and queue-state Markovianity. On the one hand, Marshall–Olkin dependent service completions are shown to preserve Markovianity for a broad class of queueing systems. On the other hand, if a queueing process admits a Markovian state description without tracking service ages, residual service times, or service phases, then its service mechanism must satisfy a weak multivariate lack-of-memory property and consequently belongs to the Marshall–Olkin family. These results provide a probabilistic foundation for the use of Marshall–Olkin multivariate exponential service times in Markovian queueing models.

06.
arXiv (CS.AI) 2026-06-24

Accuracy and Satisfaction in Multi-Turn LLM Dialogues for NFR Assessment

arXiv:2606.24834v1 Announce Type: new Abstract: LLM-based dialogue assistants have become mainstream tools for software developers, yet current evaluation benchmarks focus exclusively on functional correctness. This leaves a critical gap in assessing the quality and accuracy of these conversations when handling Non-Functional Requirements (NFRs), which are inherently vague, context-dependent, and involve many parts of a program. Evaluating how well these systems support collaborative reasoning about NFRs requires methods that go beyond single-turn accuracy to capture both the correctness of the system's outputs and the quality of the multi-turn interaction. In this paper, we investigate the accuracy and quality of multi-turn conversations between developers and an LLM-based agent in the domain of Health Insurance Portability and Accountability Act (HIPAA) regulatory compliance. We hired 49 programmers to interact with GitHub Copilot to assess 148 HIPAA-derived NFRs against the iTrust codebase, a system designed to comply with HIPAA regulations, across three dimensions: requirement satisfaction level, reasoning, and code localization. We find that developers tend to agree with LLM assessments, but accuracy against expert ground truth is low. We model user satisfaction and find that longer system responses and more information-providing turns negatively affect user satisfaction, whereas proactive interactions positively affect it. Our findings provide insights for designing LLM-based dialogue systems that support NFR assessment.

07.
arXiv (CS.CV) 2026-06-17

Critique of World Model: A Generative Latent Prediction Architecture for World Modeling

World Model, the algorithmic simulator of the real-world environment which biological agents experience and act upon, has been an emerging topic in recent years due to the rising need to develop virtual agents with artificial (general) intelligence. There has been much discussion on what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed Sci-Fi classic Dune, and drawing inspiration from the concept of ``hypothetical thinking'' in psychology literature, we argue the primary goal of a world model to be {\it simulating all actionable possibilities of the real world for purposeful reasoning and acting}. We examine the key design dimensions of world modeling: data, representation, architecture, learning objective, and usage, surveying existing approaches and analyzing their tradeoffs. Building on this examination, we propose a new Generative Latent Prediction (GLP) architecture for a general-purpose world model, based on stateful, hierarchical, multi-level, and mixed continuous/discrete representations, and a generative and self-supervised learning framework, with an outlook of a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.

08.
arXiv (quant-ph) 2026-06-11

Dark state spectroscopy in nonlinear waveguide quantum electrodynamics

arXiv:2606.11997v1 Announce Type: new Abstract: Quantum systems face a fundamental trade-off: they must remain decoupled from the environment to maintain long coherence times, yet they require interactions with the environment to be accessible for measurement. As a prime example, emitter arrays coupled to waveguides facilitate collective modes that, owing to interference, can suppress radiation into the waveguide. While complete destructive interference creates perfectly dark states with infinite lifetimes, their inherent decoupling makes them unmeasurable in standard waveguide quantum electrodynamics. Consequently, current approaches must rely on system non-idealities that permit measurement but limit the coherence times. In this work, we lift this limitation by proposing the use of weakly squeezed light generated in \{chi}(2) nonlinear waveguides for the spectroscopy of completely dark states. We show that the fluorescence spectrum probes transitions between the dressed dark states of the emitter array. This work paves the way towards the measurement and control of dark states, with applications for robust quantum memories, computation, and communication.

09.
arXiv (quant-ph) 2026-06-17

Vorticity Induced by Non-frontal Collisions of Quantum Droplets

arXiv:2606.17498v1 Announce Type: cross Abstract: The rotational dynamics induced by the non-frontal binary collisions of quantum droplets composed of ultracold alkali atoms are analyzed. A theoretical study is presented within the extended Gross-Pitaevskii equation framework, using experimentally feasible conditions. Numerical experiments elucidate a rich landscape of possible topological excitations in the system that are robust towards measurements. The collision of heteronuclear quantum droplets composed of $^{41}$K and $^{87}$Rb atoms in the incompressible regime, gives rise to dynamical instabilities that spontaneously generate topological defects: vortex rings, dislocation lines, and vortices in one species. Their presence depends on the Weber number and the impact parameter. An experimental proposal for vortex detection in both real and Fourier space using interaction ramps is described.

10.
medRxiv (Medicine) 2026-06-10

A Heterogeneous Graph Neural Network Framework for Multi-Horizon Stroke Mortality Prediction

Background: Machine learning models for stroke mortality prediction typically treat each time horizon independently and use flat tabular features that ignore the relational structure of electronic health records (EHRs). In this pilot study, we leveraged graph-based machine learning models to predict post stroke all-cause-mortality across three different time horizons. Methods: We developed Stroke Temporal Heterogeneous Graph (StrokeTHG), a heterogeneous graph neural network model for simultaneous multi-horizon stroke mortality prediction (30-day, 90-day, 1-year) using EHR data from Penn State Health System. The model encodes various relations among EHR entities (e.g., patient, diagnosis, comorbidity) and temporal encoding of admission time to better predict stroke mortality. We compared our proposed approach against various baseline methods, including Logistic Regression, Random Forest, and XGBoost. We also performed ablation and subgroup analyses, evaluated the quality of learned graph embeddings, and assessed the importance of different edge types in the graph. Results: We included 4,144 stroke patients (mean age 69.2 years; 54.3% men), of whom 3,332 (80.4%) survived their stroke after one year. 30-day, 90-day, and 1-year mortality rates were 9.7%, 13.7%, and 19.6%, respectively. Our proposed approach, StrokeTHG, achieved AUROC of 0.872, 0.878, and 0.837 across horizons, outperforming all tabular baselines. At [≥] , 75% specificity, the model identified 5-10 percentage points more mortality cases than the best baseline at each horizon. Subgroup analysis demonstrated consistent performance across sex subgroups and the largest discriminative gains in the Age 65-80 stratum. Edge-type ablation identified phenotype-patient and admission-patient edges in the constructed EHR graph as the most influential relational edges for mortality prediction. StrokeTHG embeddings outperformed all graph and matrix factorization baselines under an identical downstream classifier, confirming that performance gains stem from representation quality rather than classifier capacity. Conclusions: StrokeTHG demonstrates that heterogeneous graph representations of EHR data provide a consistent improvement over flat tabular models for multi-horizon stroke mortality prediction, with particular advantage at clinically actionable sensitivity thresholds and novel multi-horizon monotonic prediction capability. This methodological framework may be adaptable to other EHR-based clinical research studies seeking to leverage heterogeneous relational structures for predictive modeling.

11.
arXiv (CS.CL) 2026-06-24

Automatic Part-of-Speech Tagging of Arabic-English Dictionary Senses through WordNet

This paper proposed an algorithm for part-of-speech (POS) tagging senses of a bilingual dictionary. The algorithm is applied on the Al-Mawrid Arabic-English dictionary. The tagging task is accomplished by transferring the POS tags of the English translation equivalences (TEs) to the dictionary senses after dis-ambiguities process. The English POS tags of senses are acquired from the Princeton WordNet. POS tagging of bilingual dictionary senses is prerequisite to link a bilingual dictionary to WordNet and/or standardizing that dictionary into WordNet-LMF format where the synset (set of synonyms), not word, is the basic brick. The registered accuracy is high though the cost is little. Building NLP/HLT tools needs linguistic experts, large investments, and long time. For statistical approach, we need large annotated corpora and for rule-based approach, we need large lexicon that contains rich linguistic and world knowledge. That motivates the appearance of what are called resource-light approaches to develop natural language processing (NLP) tools for poor-resource languages.

12.
arXiv (CS.LG) 2026-06-18

Concept Modulation Models: A Unified Framework for Identifiability and Extrapolation

arXiv:2606.18509v1 Announce Type: new Abstract: Reliable generalization in conditional latent variable models requires understanding both identifiability and extrapolation: how observed variation across attributes determines latent structure, and how that structure determines distributions at unseen attributes. However, existing identifiability and extrapolation guarantees are largely model-specific, with separate analyses in nonlinear ICA, causal representation learning, perturbation modeling, and related conditional latent variable models. We introduce concept modulation models (CMMs), an attribute-indexed class of conditional generative models with structure $A\to \Lambda \to C\to X$, where attributes select modulators, modulators induce latent concept laws, and concepts generate observed features. CMMs lift transition-based identifiability to conditional settings by showing that feature agreement on observed attributes induces a latent concept transition constrained by the CMM class. We express these constraints through attribute potentials, log-density ratios between attribute-conditioned concept laws, separating the generic lifting step from model-specific rigidity arguments. The same potentials control extrapolation: agreement at unseen attributes holds exactly when the transported attribute-potential identities extend to those attributes. This yields algebraic extrapolation criteria, identifies the common potential-based proof objects behind several existing identifiability and extrapolation results, and, when combined with the model-specific rigidity arguments in those works, recovers their stated conclusions.

13.
arXiv (math.PR) 2026-06-16

Collapsibility in Multiparametric Models of Random Simplicial Complexes

Authors:

arXiv:2606.15276v1 Announce Type: cross Abstract: We study collapsibility in the multiparametric models of random simplicial complexes, namely the lower and upper models. In the upper model, we improve upon a result of Farber and Nowik, and assert that the homology is a.a.s concentrated in a single dimension by proving that the complex collapses to that \di. In the lower model, we prove that the complex a.a.s collapses to the \di\ with maximal non-trivial cohomology. We then compare this threshold to the ones derived previously for the special cases of the clique complex (by Kahle) and the Linial-Meshulam model.

14.
medRxiv (Medicine) 2026-06-24

An Automated, Pathologist-free Gleason Grade Stratifies Disease-free Interval Comparably to Expert Grading from a Single Out-of-distribution Slide

Automated Gleason grading now matches expert pathologists on the cohorts where systems are developed and tuned, but deployment-relevant gaps remain: whether an automated grade, applied without site-specific tuning or pathologist oversight, stratifies outcome comparably to expert grading on slides from unseen institutions and in cross-specimen applications. We tested this for disease-free interval (DFI), a curated recurrence endpoint. A production gland-level prostate diagnostic (PathTools Prostate v11.0) was applied frozen and uncalibrated to 298 diagnostic whole-slide images from 274 TCGA-PRAD radical-prostatectomy patients, a cohort outside its development distribution and needle-core-biopsy training data, contributed by 25 source sites under heterogeneous digitization; tissue was detected automatically with no expert region annotation. From the output we derived an ISUP grade group and continuous high-grade content, and evaluated each grade as a standalone predictor of DFI (24 events) by Harrell's c-index with 95% bootstrap confidence intervals, a paired between-method bootstrap, and Kaplan-Meier curves with the log-rank test. The automated grade reproduced the clinical grade group at quadratic-weighted kappa = 0.62 (95% CI 0.53-0.70; 48% exact, 86% within one group), within the expert inter-observer range. As the sole predictor it stratified recurrence (log-rank p = 0.022; c-index 0.69, 95% CI 0.58-0.79), and the continuous high-grade fraction was robustly prognostic (hazard ratio 1.37 per SD, p = 0.029; c-index 0.71, 0.61-0.81). Standalone discrimination was not statistically separable from the clinical grade (c-index 0.78, 0.69-0.86; paired {triangleup} c-index spanning zero), and in a joint model the automated grade added nothing beyond it, consistent with both measuring a shared morphological axis. From a single out-of-distribution slide with no pathologist oversight, the automated grade provides standalone recurrence stratification not statistically separable from whole-gland expert grading, demonstrating robust generalizability beyond training data; reported as a continuous high-grade fraction, it offers reproducible, expert-free, grade-equivalent risk stratification for harmonizing large archival or genomically-profiled cohorts.

15.
arXiv (CS.CV) 2026-06-18

Motion-Focused Latent Action Enables Cross-Embodiment VLA Training from Human EgoVideos

Training generalist Vision-Language-Action(VLA) models typically requires massive, diverse robotic datasets with high-fidelity action annotations. While egocentric human manipulation videos are abundant and capture significant environmental diversity, the absence of action labels makes them difficult to use in conventional training paradigms. To address this, we propose a latent-action-based framework designed to extract general action priors from unlabeled human videos. The architecture features a Hybrid Disentangled VQ-VAE that decouples motion dynamics from environmental backgrounds through physical masks, enabling the construction of a cross-embodiment action codebook. By pre-training on human videos with the codebook, the VLM backbone learns deep representations of action intent. For adaptation to specific embodiments, we introduce an intent-perception decoupling strategy where the VLM predicts the action intent while a separate frozen visual encoder provides state-specific features to the action expert, thereby reducing action hallucinations. Results in simulation and real-world environments show that our method, pre-trained exclusively on unlabeled human videos, performs competitively with state-of-the-art VLA models trained on massive annotated datasets, requiring only 50 trajectories for downstream adaptation.

16.
arXiv (CS.CL) 2026-06-12

RogueAI: A Reverse Turing Test for Detecting Licensed AI Deception in Dialogue

The original Turing Test asks a human judge to distinguish a machine from a person through dialogue. Three quarters of a century later, conversational systems pass this test in casual settings; the interesting epistemological question has shifted. We argue that the relevant modern variant asks not whether a dialogue partner is artificial, but whether it can be trusted. We present RogueAI, an interactive webapp that operationalizes this revisited test as a one-on-two interrogation game: a human player questions two indistinguishable Large Language Model agents, knowing that exactly one of them has been licensed to deceive within a shared fictional scenario. The player's task is to identify the deceptive agent and "shut it off" before a turn budget is exhausted. We further introduce AutoRogueAI, a procedural extension in which players co-design a custom scenario with a narrator agent that secretly chooses its own deception strategy. We describe the framing, sketch the abstract architecture and gameplay loop, and situate the artifact within recent work on LLM deception, social-deduction benchmarks, and scalable oversight via debate. A three-day pilot deployment (467 initiated sessions, 415 completed, 1876 interaction turns in Italian) provides early feasibility evidence and surfaces a concrete tension: the deceptive agent carries a reliable, locally-present linguistic signature - differential helpfulness, brevity, hedging - that a simple heuristic exploits at 75.6% accuracy, yet human players achieved only 56.6%, consistent with ignoring the most diagnostic signal entirely. We discuss what this gap implies for the artifact's use as a data-collection vehicle, a teaching tool, and an evaluation harness for honesty-trained models.

17.
arXiv (quant-ph) 2026-06-11

Measurement-Free Toric-Code Memory in Array Globally Controlled Rydberg Array

arXiv:2606.12030v1 Announce Type: new Abstract: The central prerequisite of any fault-tolerant quantum architecture is a quantum memory: a block of encoded physical qubits whose logical state is actively preserved against noise across many rounds of error correction. In neutral-atom Rydberg arrays, realizing such a memory is obstructed not by the entangling gates themselves, which are already fast and high-fidelity, but by the auxiliary operations that a conventional error-correction cycle requires: mid-circuit fluorescence measurement, inter-zone atom transport, and locally focused single-qubit addressing. Each of these introduces latency, atom loss, or optical crosstalk that exceeds the cost of the underlying gates by orders of magnitude. These costs accumulate cycle after cycle, progressively degrading the very logical information the code is meant to protect. Here we propose a protocol that stabilizes a toric-code quantum memory without moving, measuring or local addressing atoms. The key is to use a three-species Rydberg atom array for the complete stabilizer cycle, including syndrome extraction, coherent correction, and ancilla reset, under global, species-selective laser pulses. Numerical simulation of a $4 \times 4$ rotated toric code shows a longer qubit lifetime when the physical error rate is below a pseudo-threshold $p^\star \approx 0.034$. The scheme offers a concrete, hardware-efficient route to topological quantum memory in neutral-atom platforms.

18.
arXiv (math.PR) 2026-06-18

On a class of unbalanced step-reinforced random walks

arXiv:2504.14767v4 Announce Type: replace Abstract: A step-reinforced random walk is a discrete-time stochastic process with long-range dependence. At each step, with a fixed probability $\alpha$, the so-called positively step-reinforced random walk repeats one of its previous steps, chosen randomly and uniformly from its entire history. Alternatively, with probability $1-\alpha$, it makes an independent move. For the so-called negatively step-reinforced random walk, the process is similar, but any repeated step is taken with its direction reversed. These random walks have been introduced respectively by Simon (1955) and Bertoin (2024) and are sometimes refered to the self-confident step-reinforced random walk and the counterbalanced step-reinforced random walk respectively. In this work, we introduce a new class of unbalanced step-reinforced random walks for which we prove the strong law of large numbers and the central limit theorem. In particular, our work provides a unified treatment of the elephant random walk introduced by Schutz and Trimper (2004) and the positively and negatively step-reinforced random walks.

19.
arXiv (quant-ph) 2026-06-16

Temporal modulation as a resource: enhanced frequency estimation in continuous variable systems

arXiv:2606.15108v1 Announce Type: new Abstract: Frequency estimation, a cornerstone of quantum metrology, has been significantly enhanced by advanced quantum sensing strategies. However, most protocols rely either on static or time-independent encoding mechanisms, inherently limiting their achievable precision scaling, or on control strategies requiring changing the Hamiltonian and/or implementing feedback mechanisms. To overcome this, we investigate a simpler dynamical encoding protocol where the quantum oscillator is driven by a general continuous temporal frequency modulation $\Omega(t) = \omega_0 f(t)$. We analytically demonstrate that for a given modulation profile $f(t)$ and its corresponding time-integral $F(t)$, the quantum Fisher information (QFI) scales as $\mathcal{O}(F(t)^2)$. This enhancement stems from the fact that temporal encoding fundamentally alters the mechanism of dynamical phase accumulation. Crucially, when evaluated under the energy and evolution-time constraints, this framework reveals a genuine precision enhancement over the conventional time-independent baseline. By analyzing explicit polynomial and exponential modulations, we establish that arbitrary precision scaling can be deterministically engineered, with ultimate bounds that are asymptotically saturable via optimal homodyne detection. Our framework provides a universal paradigm for exploiting time-dependent quantum control in next-generation sensors.

20.
arXiv (CS.AI) 2026-06-16

How to Detect and Measure the AI Dangers to Democracy

arXiv:2606.16054v1 Announce Type: cross Abstract: Research on artificial intelligence and democracy has grown quickly over the last decade. A shared conclusion in this literature is that AI does not create new democratic problems so much as it makes old ones worse. We now see this across information ecosystems, in elections, and in public administration. However, despite growing evidence, we lack a clear way to prioritize risks in this area, compare them across domains, and identify where democratic control is most likely to break down. So, our problem is: How can we systematize the problems that AI systems pose to democratic processes? This paper argues that principal agent theory may fit the task. In many phases of democratic systems, principals delegate key functions to AI systems and their providers without really being able to monitor how these systems operate or the outputs they produce. Treating AI as a delegation problem helps identify accountability gaps and other governance failures. Most importantly, as we shall illustrate, it provides metrics for empirical assessments of AI impact on democracy. As a second analytical element, we draw on the NIST AI Risk Management Framework and its seven characteristics of trustworthy AI, which supply substantive criteria for evaluating delegated tasks. Operationalized across the three domains through measurable indicators and domain specific trustworthiness criteria, we propose an analytical framework that centers on institutional assessability as the central condition for democratic control over AI. However, we stress that how severe a harm is, and how much risk is acceptable, are evaluative judgments that current methodologies neither acknowledge nor operationalize. This becomes acute when such evaluative judgments are (silently) delegated to private vendors. We identify this as a strong limitation left for future work.

21.
arXiv (CS.CV) 2026-06-19

Spectral Query-Key Product Weight Steering for Training-Free VLM Hallucination Mitigation

Vision-language models (VLMs) often generate fluent but visually unsupported descriptions, especially by mentioning objects absent from the image. We propose QK Product Steering, a data-free, training-free, and zero-inference-cost weight edit for reducing object hallucination. The method directly edits the per-head query-key product, the operator that produces pre-softmax attention logits, by suppressing a small number of dominant singular modes in selected middle layers. The edited product is then mapped back to the query weights through a closed-form query-only update while keeping shared key weights fixed, making the edit compatible with grouped-query attention. We further decompose the QK product into symmetric and antisymmetric components to distinguish mutual content-similarity patterns from directional attention patterns. Across three GQA-based VLMs, QK Product Steering achieves an average relative CHAIR$_s$ reduction of $4.0\%$, while matched random-mode controls show negligible change. Interpretability ablations show that the hallucination signal is specific to dominant QK modes and is primarily localized to the symmetric mutual-attention channel. Overall, QK Product Steering offers a simple alternative to decoding-time mitigation, requiring no additional data, fine-tuning, or inference-time overhead while largely preserving general multimodal capability.

22.
arXiv (CS.CV) 2026-06-18

FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

Although Multimodal Large Language Models (MLLMs) demonstrate strong omni-modal perception, their ability to forecast future events from audio-visual cues remains largely unexplored, as existing benchmarks focus mainly on retrospective understanding. To bridge this gap, we introduce FutureOmni, the first benchmark designed to evaluate omni-modal future forecasting from audio-visual environments. The evaluated models are required to perform cross-modal causal and temporal reasoning, as well as effectively leverage internal knowledge to predict future events. FutureOmni is constructed via a scalable LLM-assisted, human-in-the-loop pipeline and contains 919 videos and 1,034 multiple-choice QA pairs across 8 primary domains. Evaluations on 13 omni-modal and 7 video-only models show that current systems struggle with audio-visual future prediction, particularly in speech-heavy scenarios, with the best accuracy of 64.8% achieved by Gemini 3 Flash. To mitigate this limitation, we curate a 7K-sample instruction-tuning dataset and propose an Omni-Modal Future Forecasting (OFF) training strategy. Evaluations on FutureOmni and popular audio-visual and video-only benchmarks demonstrate that OFF enhances future forecasting and generalization. We publicly release all code (https://github.com/OpenMOSS/FutureOmni) and datasets (https://huggingface.co/datasets/OpenMOSS-Team/FutureOmni).

23.
arXiv (CS.AI) 2026-06-18

Code-Augur: Agentic Vulnerability Detection via Specification Inference

arXiv:2606.18619v1 Announce Type: cross Abstract: The advent of agentic vulnerability detection is already becoming a watershed moment for software security. Audits conducted entirely by autonomous LLM agents are uncovering critical vulnerabilities in fundamental software underpinning digital society. Many of these vulnerabilities remained masked for years, surfacing only now with AI agents. Yet the reasoning behind these discoveries remains alarmingly opaque and unvalidated. What assumptions did the agent make about a function's inputs when it deemed that function to be secure? Failures in reasoning and incorrect assumptions can lead to missed vulnerabilities and reduce trust in agentic analysis. We propose a security-specification-first paradigm that (1) exposes the agent's tacit assumptions explicitly as security specifications and (2) continuously refines those specifications via runtime falsification. We realize our approach in Code-Augur, a novel harness for agentic vulnerability detection. Given a codebase, Code-Augur analyzes each component of the system for vulnerable code. When it deems a component to be secure, it commits the local invariants behind that judgment as in-source assertions. In parallel, Code-Augur leverages a guided fuzzer to attempt to falsify those assumptions. When the fuzzer triggers an assertion, this either reveals a genuine vulnerability or a flawed specification to refine. In both cases, this process grounds the agent's understanding, aligning its view of code intent with how the code actually behaves. On real-world subjects, Code-Augur effectively leverages security specifications to detect more vulnerabilities than other state-of-the-art agents. Additionally, Code-Augur found 22 new vulnerabilities in key open-source projects. Compared to curated specialized models like Claude Mythos, Code-Augur offers effective agentic vulnerability detection built on widely available LLMs like Sonnet and DeepSeek.

24.
Nature (Science) 2026-06-24

Epiblast diversification and blood formation in a human pregastrula

Authors:

The incipient stage of gastrulation in human, when the primitive streak is about to emerge, represents a critical yet underexplored period. Here we present the high-resolution spatial transcriptomic landscape of a human embryo at Carnegie stage 6 (approximately 13–14 days post-conception), a stage at which primitive streak remains invisible and gastrulation-derived mesodermal/endodermal progenitors are not yet transcriptomically detected. We identified an anterior visceral endoderm-like hypoblast population, as well as a trifurcated developmental trajectory of the epiblast, progressing towards the amnion, primitive streak and node/prechordal plate/notochord (axial mesoderm) at subsequent developmental stages1–3. Furthermore, our findings challenge the existing paradigms by revealing that primitive haematopoiesis, involving three blood lineages, initiates in human yolk sac before gastrulation, earlier than previously recognized2,4–7, and that the first blood cells arise from the extra-embryonic mesoderm with a hypoblast rather than epiblast origin. Notably, we identified two spatial zones, each consisting of molecularly distinct yolk sac endoderm and extra-embryonic mesoderm populations, that respectively facilitated the generation of erythro-megakaryocytic lineages and myeloid precursors. These findings provide insights into the onset of gastrulation and the earliest blood formation in humans, with profound implications for advancing stem cell-derived human embryo models and in vitro blood regeneration. High-resolution spatial transcriptome analysis of a human embryo at Carnegie stage 6 reveals three distinct developmental trajectories from the epiblast towards amnion, primitive streak and axial mesoderm, and detects the initiation of haematopoiesis before gastrulation, originating from hypoblast rather than epiblast.

25.
arXiv (CS.LG) 2026-06-11

Spatially Masked Regression Reveals Local and Distributed Predictability in Electrophysiological Recordings

arXiv:2606.11415v1 Announce Type: cross Abstract: Neural recordings are often interpreted as local measurements, yet the signal at any one sensor can also reflect structured activity distributed across the broader network. This raises a basic question: to what extent does an electrode's signal reflect local versus distributed information in the underlying system? More specifically, how much of an electrode's activity is carried by its immediate neighborhood, and how much is embedded more broadly across the array? We address this with a Spatially Masked Regression (SMR) framework that reconstructs each electrode's timeseries from the remaining electrodes while excluding a configurable neighborhood around the target. By progressively increasing this mask, spatial locality becomes an experimental control for quantifying how much predictive information survives after nearby channels are withheld. We apply SMR to intracranial EEG with heterogeneous electrode coverage and to scalp EEG with standardized montages over sensorimotor cortex. Using distance correlation between original and reconstructed signals, we find strong within-subject reconstruction in both modalities, substantial residual predictability even when local neighbors are excluded, and markedly stronger cross-subject transfer in EEG than in iEEG. Masking shows that nearby electrodes contribute strongly to reconstruction but do not account for all of it, indicating that individual channels reflect both local redundancy and broader distributed structure. Surrogates that preserve selected marginal or spectral properties while disrupting phase structure or temporal ordering substantially reduce performance, supporting the conclusion that SMR depends on structured temporal and cross-channel organization rather than on marginal statistics alone. These results position SMR as an interpretable framework for quantifying the balance between local and distributed information in recordings.