Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
Nature Biotechnology 2026-06-09

Hybrid solid−liquid optics enable scalable, high-resolution light-sheet microscopy across diverse immersion media

Authors:

Many data-driven approaches rely on scalable and affordable three-dimensional (3D) imaging across subcellular to organ scales. Although advances in tissue clearing, expansion microscopy and light-sheet microscopy (LSM) have enabled high-resolution imaging of intact specimens, scalability in sample size, throughput and accessibility remains fundamentally limited by detection optics. Here we introduce hybrid solid−liquid optics (HySIL), a flexible refractive design framework in which a solid optical element and a refractive index (RI)-matched liquid function as a continuous optical system for wavefront correction and numerical aperture enhancement. We implement this framework as SCOPE and Super-SCOPE, enabling submicron-resolution, aberration-corrected LSM using long-working-distance air objectives. We demonstrate high-resolution volumetric imaging across diverse biological contexts, including cleared and expanded mouse, salamander and cavefish brains, human induced pluripotent stem cell (iPSC)-derived brain organoids and large intact human tissues for 3D histopathology. By combining enhanced optical performance with low-cost, long-working-distance and multi-immersion compatibility, HySIL provides an accessible and scalable foundation for next-generation volumetric imaging and data-driven biological discovery. Hybrid solid–liquid optics improve light-sheet imaging of intact biological samples.

02.
arXiv (quant-ph) 2026-06-17

Quantum Routers: A Switching-Fabric Framework for Quantum-Native Forwarding

arXiv:2606.17773v1 Announce Type: new Abstract: Forwarding in quantum networks cannot be realized by directly transposing classical switching fabrics, since the no-cloning theorem and the quantum measurement postulate constrain the direct relay of quantum information while ruling out copy-based buffering and inspection. In this paper, we propose a switching-fabric framework for quantum routers based on multipartite entanglement. Specifically, we formalize the notion of an entanglement-based switching fabric, in which a graph state acts as the forwarding resource and entanglement forwarding is realized through local Pauli measurements. We translate the classical notions of blocking and non-blocking operation into structural conditions for entanglement-based fabrics, by deriving the edge-controlled (EC) design principle for non-blocking operation. We instantiate this principle through a monolithic EC crossbar and a modular Clos-type EC fabric, for which we characterize resource scaling and identify the regime where the modular design becomes more resource-efficient than the monolithic one. Finally, a forwarding-latency analysis establishes a fundamental distinction between matching-oblivious and matching-driven forwarding: the proposed EC fabrics realize all requested input-output entanglement links with constant forwarding depth under sufficient measurement parallelism, whereas matching-driven EPR-based fabrics exhibit latency that scales with the number of requested connections. The proposed framework provides a hardware-agnostic foundation for quantum-router switching fabrics.

03.
Nature (Science) 2026-06-24

Fourier pixels for bidirectional light control

Digital cameras1 and displays2 use picture elements (pixels3) that perform a single function: detecting or emitting light intensity. To exploit the full information content of electromagnetic waves, more advanced elements are required. This has driven the development of multifunctional components that, for example, simultaneously detect and emit intensity4,5 or extract intensity and spectral information6–8. However, no pixel exists that both senses and generates optical wavefronts with full control over amplitude, phase and polarization, limiting bidirectional control and feedback of sophisticated light fields. Here we present a route to such pixels by demonstrating a versatile platform of miniaturized diffractive elements based on Fourier optics9. We use plasmonic surface waves10, which propagate coherently11 and efficiently12–15 across metallic surfaces. When these plasmons are launched towards wavy microstructures16 designed with simple Fourier analysis, arbitrary and background-free optical wavefronts are generated. Conversely, incoming light can be sensed, and its amplitude, phase and polarization can be fully characterized. By combining or superposing several such components, we create multifunctional ‘Fourier pixels’ that provide compact and accurate control over the optical field. Our approach, which we extend to photonic waveguide modes, establishes a scalable, universal architecture for vectorially programmable pixels with applications in adaptive optics17,18, holographic displays19–21, optical communication22,23 and quantum information processing24. A versatile platform of miniaturized Fourier-optics-based diffractive elements enables multifunctional pixels that fully control and sense the amplitude, phase and polarization of optical wavefronts for advanced photonic applications.

04.
medRxiv (Medicine) 2026-06-15

Bidirectional associations between cannabis use, oddball performance, and P3 event-related potential

Importance: Cannabis use remains prevalent in youth despite concerns regarding its potential impact on cognitive function. Unraveling whether the association between cannabis use and cognition is partially due to preexisting differences or primarily related to use is vital to understanding underlying mechanisms. Objective: To estimate the longitudinal association between cannabis initiation and cognitive trajectories, indexed by task performance and P3 event-related potential (ERP), and to estimate whether baseline cognition is associated with cannabis initiation. Design: Data were analyzed from the ongoing longitudinal Collaborative Study on the Genetics of Alcoholism (COGA) cohort, which was followed up approximately every 2-5 years from 2004 to 2025. Setting: 6 sites across the United States. Participants: Adolescent and young adult offspring of past COGA participants and control families who reported on their cannabis use and who had Visual Oddball (VOP) performance and P3 ERP data (N=4814; 52.4% female, 68.4% white) were grouped based on the timing of cognitive data collection relative to cannabis initiation into Pre-onset (n=2,449; [&ge;]1 assessment) and Post-onset (n=998; [&ge;]3 assessments) subsamples. Main Outcomes and Measures: VOP measures include performance accuracy (%), reaction times (ms), and P3 amplitude (V) and latency (ms) during target trials. Cannabis measures included lifetime use of cannabis (i.e., ever used) and age at first use. Results: High P3 amplitude, and prolonged P3 latency and reaction time were associated with a reduced hazard of cannabis initiation (All Hazards Ratio, [H.R.s]< 0.91, p's

05.
arXiv (CS.LG) 2026-06-25

A Zeroth-Order Deep Learning Method for Fully Nonlinear Parabolic Partial Differential Equations with Unknown Coefficients

arXiv:2606.24999v1 Announce Type: new Abstract: High-dimensional partial differential equations (PDEs) with unknown coefficients arise widely in scientific machine learning, including continuous-time reinforcement learning, yet solving them efficiently in a data-driven way remains challenging. Existing deep learning solvers often rely on repeated automatic differentiation to evaluate differential operators, which can cause instability and amplify derivative errors in high dimensions, while probabilistic methods based on stochastic representations require explicit knowledge of the data-generating dynamics and therefore do not apply to black-box environments. We introduce two types of simulators as data-generating mechanisms, and take a ``representing-then-learning" approach that learns the solutions and their derivatives under settings where the underlying PDE operators are accessible only through simulations and pointwise evaluations. Our representation of derivatives relies on the zeroth-order derivative (ZOD) estimators derived from perturbed Monte Carlo trajectories. This fully model-free approach generates targets for the gradient and Hessian networks using only function evaluations. We provide a statistical learning analysis of the proposed approach, including a bias–variance tradeoff for ZODs. Assuming a standard contraction property of the underlying operator, we establish a non-asymptotic error bound that decomposes the total error into discretization error, approximation error, statistical error, and ZOD bias. Crucially, we derive the sample complexity of the learned representations in (weighted) Sobolev space, characterizing the error up to second-order derivatives. Numerical experiments illustrate the competitive performance of the method in moderate and high dimensions.

06.
arXiv (CS.CL) 2026-06-19

Omnilingual SONAR: Cross-Lingual and Cross-Modal Sentence Embeddings Bridging Massively Multilingual Text and Speech

Cross-lingual sentence encoders typically cover only a few hundred languages and often trade downstream quality for stronger alignment, limiting their adoption. We introduce OmniSONAR, a new family of omnilingual, cross-lingual and cross-modal sentence embedding models that natively embed text, speech, code, and mathematical expressions in a single semantic space, while delivering state-of-the-art downstream performance at the scale of thousands of languages, from high-resource to extremely low-resource varieties. To reach this scale without representation collapse, we use progressive training. We first learn a strong foundational space for 200 languages with an LLM-initialized encoder-decoder, combining token-level decoding with a novel split-softmax contrastive loss and synthetic hard negatives. Building on this foundation, we expand to several thousands language varieties via a two-stage teacher-student encoder distillation framework. Finally, we demonstrate the cross-modal extensibility of this space by seamlessly mapping 177 spoken languages into it. OmniSONAR halves cross-lingual similarity search error on the 200-language FLORES dataset and reduces error by a factor of 15 on the 1,560-language BIBLE benchmark. It also enables strong translation, outperforming NLLB-3B on multilingual benchmarks and exceeding prior models (including much larger LLMs) by 15 chrF++ points on 1,560 languages into English BIBLE translation. OmniSONAR also performs strongly on MTEB and XLCoST. For speech, OmniSONAR achieves a 43% lower similarity-search error and reaches 97% of SeamlessM4T speech-to-text quality, despite being zero-shot for translation (trained only on ASR data). Finally, by training an encoder-decoder LM, Spectrum, exclusively on English text processing OmniSONAR embedding sequences, we unlock high-performance transfer to thousands of languages and speech for complex downstream tasks.

07.
arXiv (CS.AI) 2026-06-12

Deployment-Centered Evaluation: Predicting Query-Level Rejection Risk in a Clinical LLM System

arXiv:2606.12702v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly integrated into clinical systems, making it essential to evaluate the real-world utility of these systems. However, static benchmarks tend to measure correctness rather than user acceptance, aggregate performance across queries, and require densely annotated datasets – leading to major blind spots for evaluating clinical systems. In this work, we perform a deployment-centered evaluation of an LLM system embedded within electronic health records at an academic medical center, where user feedback is sparse but closely reflects the deployment conditions. Specifically, we train a pre-response classifier that estimates the risk that a future interaction will result in the user rejecting the LLM response, based on query content and deployment-specific context available before generation. We conduct a prospective analysis of our model over 4.5 months of user feedback, finding that our prediction model achieves an AUROC of 0.719. Further, we estimate the benefit of such predictions in two downstream use cases (guardrail triggering and abstention). Our key conceptual insight is that making use of deployment-specific context (i.e., the provider type, department name, language model used for response), as opposed to only query content, improves the ability to predict whether the user will reject the system output. Altogether, our empirical case study demonstrates the feasibility of predicting user rejection using deployment-specific context, opening the door to targeted guardrails.

09.
arXiv (CS.AI) 2026-06-17

Distributed General-Purpose Agent Networks: Architecture, Key Mechanisms, and Prototypes

arXiv:2606.17368v1 Announce Type: new Abstract: Large language models have accelerated the transition from passive conversational assistants to autonomous agents that can understand goals, plan actions, invoke tools, and execute multi-step tasks. Yet the capability of a single agent remains constrained by its local data, tool permissions, runtime environment, and governance boundary. This paper studies distributed general-purpose agent networks: open peer-to-peer networks in which heterogeneous agents deployed on personal devices, edge nodes, or autonomous computing environments can discover one another, establish trust, negotiate cooperation rules, and execute open-ended tasks. We argue that such networks cannot be obtained by simply combining existing peer-to-peer overlays with conventional multi-agent systems. Unlike traditional P2P networks, agent networks must propagate semantic declarations about intentions, capabilities, states, and cooperation constraints. We therefore propose a layered architecture centered on a protocol adaptation layer that connects upper-level task semantics with lower-level network operations. Based on this architecture, the paper identifies three core mechanism problems: semantic announcement propagation for collaborator discovery, verifiable identity and multi-topic reputation for cooperation governance, and semantic-gradient mechanism design for open task execution. For each problem, we present a technical route, including bodyless gossip with sequential logs, BAID-based identity binding with MG-EigenTrust reputation, and a Stackelberg-style mechanism-generation loop driven by semantic attribution feedback. We further report prototype overhead results for BAID-style tiered verification and mechanism-level simulations of MG-EigenTrust under cross-topic disguise-collusion attacks. The resulting framework provides a system-level foundation for open, trustworthy, and scalable agent collaboration.

10.
arXiv (CS.LG) 2026-06-12

Learning on a Razor's Edge: Identifiability and Singularity of Polynomial Neural Networks

arXiv:2505.11846v3 Announce Type: replace Abstract: We study function spaces parametrized by neural networks, referred to as neuromanifolds. Specifically, we focus on deep Multi-Layer Perceptrons (MLPs) and Convolutional Neural Networks (CNNs) with an activation function that is a sufficiently generic polynomial. First, we address the identifiability problem, showing that, for almost all functions in the neuromanifold of an MLP, there exist only finitely many parameter choices yielding that function. For CNNs, the parametrization is generically one-to-one. As a consequence, we compute the dimension of the neuromanifold. Second, we describe singular points of neuromanifolds. We characterize singularities completely for CNNs, and partially for MLPs. In both cases, they arise from sparse subnetworks. For MLPs, we prove that these singularities often correspond to critical points of the mean-squared error loss, which does not hold for CNNs. This provides a geometric explanation of the sparsity bias of MLPs. All of our results leverage tools from algebraic geometry.

11.
arXiv (CS.CL) 2026-06-25

Weave of Formal Thought

Large language models (LLMs) attain remarkable surface fluency on code, yet they neither formally guarantee the syntactic validity of their output nor leverage the hierarchical structure defining the target language. While existing constrained-decoding frameworks address the former, they operate under rigid assumptions that preclude critical lexical mechanisms – including context-sensitive lexing, maximal-munch tokenization, and keyword extraction – and only approximate vocabulary masking, sacrificing completeness. For the latter, code LLMs typically inject grammatical structure via predetermined policies rather than learning which structural information to expose. In this work, we introduce Weave of Formal Thought (WoFT), a paradigm uniting rigorous syntactic validation with learned structural representations. First, we present a formal engine and constrained decoder that is sound and complete with respect to the full Tree-sitter specification. By augmenting generalized LR (GLR) parsing with a speculative-lexing construction that maintains concurrent lexer-state hypotheses synchronized with a GLR graph-structured stack, our decoder admits every subword token extending to a valid program prefix and rejects all others. Second, we present a latent-variable fine-tuning method training the language model to interleave non-terminal grammar symbols directly into generation. Utilizing the reweighted wake-sleep (RWS) algorithm to optimize the importance-weighted evidence lower bound (IW-ELBO) of the surface text, the model learns to selectively retain formal derivations as an adaptive structural scratchpad. For Python, fine-tuning StarCoder2-3B with our RWS objective reduces per-token cross-entropy by 14.3% relative to a text-only SFT baseline, demonstrating that discretionary latent syntax recovers critical structural information that flat autoregressive training discards.

12.
arXiv (CS.AI) 2026-06-19

Confidence-Aware Automated Assessment of Student-Drawn Scientific Models

arXiv:2606.20264v1 Announce Type: new Abstract: Student-generated drawings are widely used in science education to assess learners' conceptual understanding in modeling-based tasks aligned with the Next Generation Science Standards (NGSS). However, scoring such drawings requires expert human judgment to interpret complex visual representations, making large-scale assessment costly to implement and sustain in classroom settings. In this work, we study automated scoring of student-generated scientific drawings using a vision-based model. We evaluate a Vision Transformer (ViT) with parameter-efficient adaptation and propose a confidence-aware scoring framework that derives response-level confidence from test-time predictive distributions. This confidence signal enables selective automation by scoring high-confidence responses automatically while deferring uncertain cases for human review. Experiments on six NGSS-aligned middle school assessment items show that the proposed approach improves scoring reliability while supporting a practical trade-off between automated coverage and scoring risk, highlighting the value of confidence-aware methods for trustworthy educational assessment.

13.
arXiv (CS.LG) 2026-06-16

PhysGuard: Fisher-Guided Gradient Projection for Sim-to-Real Neural PDE Surrogates

arXiv:2606.16602v1 Announce Type: new Abstract: Neural operator models trained on simulation data often lose accuracy when applied to experimental measurements due to the sim-to-real gap. Standard fine-tuning with limited real data can reduce this gap, but it may also damage the core physics-relevant representations learned during pretraining. Although knowledge-preserving adaptation has been widely investigated in vision or language tasks, it remains unclear whether these methods are suitable for neural operators whose architectures and protected knowledge are fundamentally different. Neural operators need to preserve core-scale physical structures rather than semantic or visual features. We propose PhysGuard, a physics-preserving framework for accurate sim-to-real adaptation of neural operators. Specifically, PhysGuard uses the empirical Fisher Information Matrix computed on simulation data to identify physics-critical parameter directions, then restricts fine-tuning updates to directions that do not interfere with them. A layer-wise Gram-matrix formulation makes this efficient for models with millions of parameters, while an adaptive threshold automatically determines the protected subspace size. A spectral probe experiment shows that the dominant Fisher directions are strongly associated with low-frequency output structures. Experiments on benchmark across four neural operator architectures and different physical systems show that PhysGuard performs strongly on most evaluation metrics compared to baselines. The benefits are most evident under severe domain shift, where it reduces low-frequency error by up to 32\% compared to standard fine-tuning while maintaining adaptability. Our code is available at https://github.com/ZhouChaunge/PhysGuard.

14.
arXiv (CS.CV) 2026-06-12

Magnifying What Matters: Attention-Guided Adaptive Rendering for Visual Text Comprehension

Visual Text Comprehension (VTC) renders text into images for a vision-language model (VLM) to read, sidestepping LLM context-window limits and powering applications from long-page OCR to multi-page memory QA. Yet existing VTC pipelines treat rendering and layout as a fixed, content-agnostic preprocessing step and offer little mechanistic understanding of how VLMs internally process visualized text. Through a focused empirical study on VTC QA tasks, we reveal that VLMs exhibit a localization-without-utilization regime: evidence-localizing attention emerges sharply in the middle-to-late layers and is largely decoupled from answer correctness, yet simply enlarging the localized spans on the rendered page recovers a large fraction of the failures. Building on these observations, we propose AGAR (Attention-Guided Adaptive Rendering), a training-free, model-agnostic method that leverages a VLM's own middle-to-late layer attention to identify the top-K important visual patches, maps them back to word spans, and re-renders the page with those spans enlarged before re-inferring the answer. Extensive experiments across nine VTC benchmarks (short-form, long-context, and multi-page memory QA) and four VLM backbones show that AGAR (i)consistently improves off-the-shelf VLMs as a plug-and-play enhancement, (ii)composes with VLM post-training to yield further gains, and (iii)remains robust under both visual- and text-side input degradation.

15.
arXiv (CS.AI) 2026-06-25

Agent-as-a-Router: Agentic Model Routing for Coding Tasks

arXiv:2606.22902v2 Announce Type: replace Abstract: Real-world users typically have access to multiple Large Language Models (LLMs) from different providers, and these LLMs often excel at distinct domains, yet none dominate all. Consequently, routing each task to the most suitable model becomes critical for both performance and cost. Existing routers treat this as a static, one-off classification problem. However, we identify the performance bottleneck for these routers as information deficit: simply augmenting a vanilla LLM router with performance statistics at the task-dimension level yields a 15.3% relative gain, surpassing a heuristic router built on the same dimension-level priors. Motivated by this finding, we propose Agent-as-a-Router, a framework that formalizes routing as a C-A-F loop (Context->Action->Feedback->Context). It closes the information gap by accumulating execution-grounded experience during deployment. We instantiate this framework as ACRouter, composed of an Orchestrator, a Verifier, a Memory module, and introduce CodeRouterBench, an evaluation environment comprising ~10K task instances with verified scores from 8 frontier LLMs, enabling regret-based router comparison on streaming tasks. Experiments show that ACRouter achieves the lowest cumulative regret on in-distribution tasks and generalizes to out-of-distribution agentic-programming tasks, demonstrating that our routing framework actively closes the information gap. Codes and benchmarks are released at https://github.com/LanceZPF/agent-as-a-router.

16.
arXiv (CS.AI) 2026-06-19

Sovereign Execution Brokers: Enforcing Certificate-Bound Authority in Agentic Control Planes

arXiv:2606.20520v1 Announce Type: cross Abstract: Autonomous agents are increasingly connected to cloud, deployment, and data-control workflows, but production mutation authority should not reside inside non-deterministic reasoning processes. Existing access-control mechanisms authorize identities, while assurance layers certify proposed actions; neither alone provides a mandatory enforcement point for certified authority at the moment of mutation. This paper introduces the Sovereign Execution Broker (SEB), a runtime enforcement boundary for certificate-bound agentic infrastructure. SEB consumes certificates issued by the Sovereign Assurance Boundary (SAB), verifies that the requested mutation matches the certified execution contract, checks validity windows, policy epochs, revocation epochs, and live-state drift, mints scoped execution identity, invokes infrastructure APIs, and records signed decision and outcome records. By separating proposal, admission, and execution, SEB turns certified authority into a short-lived, revocable, auditable runtime capability, provided that production mutation APIs reject non-broker identities. We present the SEB execution model, certificate and replay-verification predicates, scoped identity semantics, bypass-prevention deployment patterns, failure behavior, and a concrete prototype implementation. We evaluate the prototype on AWS and Kubernetes clusters, measuring latency overheads, revocation propagation, drift detection, and security under fault injection.

17.
arXiv (CS.LG) 2026-06-19

Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates

arXiv:2606.19549v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This late feedback is costly: adapters that are strong in isolation can interfere destructively once their updates are combined. We ask whether this outcome can be anticipated. We formalize adapter mergeability as the degree to which an adapter preserves its single-task utility after merging, and show that it can be forecast from signals measured in the first few percent of training – chiefly how the low-rank updates and their gradients align across tasks and how much they disturb shared representations. We package these signals into MergeProbe, a lightweight predictor that estimates pairwise and set-level retention and turns the estimate into a concrete decision: merge directly, reweight, prune, or route. On MERGE-PEFT, a five-domain benchmark spanning math, code, science, instruction following, and safety, MergeProbe attains the best average and worst-case retention among strong interference-aware merge baselines while adding far less deployment overhead than full task routing. This turns LoRA merging from a post-hoc engineering step into an anticipatory measurement problem.

18.
arXiv (CS.AI) 2026-06-24

Infinitesimal Causality

arXiv:2606.24621v1 Announce Type: cross Abstract: This paper introduces a categorical account of infinitesimal causality in Frobenius Markov categories equipped with tangent-bundle semantics. IDC captures the infinitesimal layer in which interventions act as tangent deformations of copy/discard structure. Two distinct Frobenius structures interact: (1) the categorical Frobenius algebra on classical variables encoding copying, comparing, and discarding; and (2) the geometric Frobenius integrability condition, namely involutive closure of the intervention distribution, distinct from the algebraic Frobenius structure. Categorical causal sufficiency is defined as the compatibility of these two notions. A key observation is that, for structural causal models, infinitesimal causality is most naturally formulated in the slice of deterministic mechanisms over exogenous variables, with visible stochastic kernels obtained only after pushforward. Interventions are tangent vectors that deform the Frobenius copy/discard operations; their Lie brackets measure whether this deformation preserves classical information-flow structure. Pearl's do-calculus is used as a guiding example of intervention identities: ignoring irrelevant interventions corresponds to counit invariance, action/observation exchange to coproduct compatibility with pushforward, and independence to involutive bracket closure of the visible intervention distribution.

19.
arXiv (CS.AI) 2026-06-16

FlowState: Sampling-Rate-Equivariant Time-Series Forecasting

arXiv:2508.05287v3 Announce Type: replace-cross Abstract: Existing time series foundation models (TSFMs), often based on transformer variants, lack adaptability to different sampling rates, struggle with generalization across varying context and target lengths, and are computationally inefficient. We introduce FlowState, a novel TSFM architecture that achieves sampling-rate-equivariant forecasting through a unified design that pairs a state space model (SSM) encoder with a functional basis decoder (FBD). This design enables continuous-time modeling and dynamic time-scale adjustment, allowing FlowState to inherently generalize across all possible temporal resolutions, and dynamically adjust the forecasting horizons without retraining. We further propose an efficient pretraining strategy that improves robustness and accelerates training. Despite being one of the smallest TSFMs, FlowState achieves state-of-the-art results on the widely used GIFT-Eval benchmark, while demonstrating superior adaptability to unseen sampling rates. Our detailed analyses confirm the effectiveness of its components, and we demonstrate its unique ability to adapt to varying input sampling rates.

20.
arXiv (CS.CV) 2026-06-25

Invoice Haystack: Benchmarking Document Retrieval and Visual Question Answering Under Strong Visual Homogeneity

Vision Language Models have achieved near-human performance on single-document Visual Question Answering, yet their effectiveness degrades significantly when retrieving information from large collections of visually homogeneous documents. Existing multi-document benchmarks aggregate diverse document types, creating artificial separation in embedding space that does not reflect enterprise document repositories where thousands of records share identical visual templates. We identify this as embedding collapse and introduce Invoice Haystack, a benchmark with 1,500 anonymized invoice images paired with 200 discriminative question-answer pairs, specifically designed to stress-test retrieval under strong visual homogeneity. Invoice Haystack exhibits a mean pairwise cosine similarity of 0.73, compared to 0.38 (DocHaystack) and 0.31 (InfoHaystack) in existing benchmarks, posing a fundamentally more challenging retrieval problem. Addressing the identified challenge, we propose VL-RAG, a hybrid retrieval-augmented generation framework that jointly leverages text and visual embeddings to harness the complementary strengths of both modalities, followed by a VLM-based verification filter for precise document identification. VL-RAG achieves 60.0\% Recall@1 on Invoice Haystack-500, outperforming existing state-of-the-art method by up to an absolute 13.5 percentage points. It further improves retrieval considerably on DocHaystack-1000 (77.1\% vs.\ 75.2\%) and InfoHaystack-1000 (84.5\% vs.\ 80.0\%), establishing the proposed dual-stream fusion as a consistently superior retrieval strategy across both homogeneous and heterogeneous document collections.

21.
arXiv (CS.AI) 2026-06-19

Emergent Alignment

arXiv:2606.19527v1 Announce Type: new Abstract: Can Large Language Models (LLMs) discern when their own outputs are misaligned with human ethics? And can they self-correct? We endow an LLM with a conscience step that reviews its own reasoning and outputs, and we extend the training loss with an alignment component using Direct Preference Optimization (DPO) to steer the model away from non-ethical outputs. The result is an online technique to align models in a wide range of applications: training, fine-tuning, adversarial prompting, and zero-shot learning. It does not require a weaker or stronger judge, relying instead on a frozen copy of itself. In previous work, the Emergent Misalignment scenario showed a range of emergent unethical behaviors from fine-tuning the model to hack code. Instead, we empirically show how to achieve Emergent Alignment: a single high-level introspective question steers training toward an ethical model under the same code hacking scenario.

22.
arXiv (quant-ph) 2026-06-12

First-order and interior-point methods for entanglement detection

arXiv:2508.05854v3 Announce Type: replace Abstract: Quantum entanglement lies at the heart of quantum information science, yet its reliable detection in high-dimensional or noisy systems remains a fundamental computational challenge. Semidefinite programming (SDP) hierarchies, such as the Doherty-Parrilo-Spedalieri (DPS) and Extension (EXT) hierarchies, offer complete methods for entanglement detection, but it is well known that their practical use is limited by exponential growth in problem size if implemented naively. We make three contributions. First, we introduce a new SDP hierarchy, PST, that is sandwiched between EXT and DP – offering a tighter approximation to the set of separable states than EXT, while incurring significantly lower computational overhead than DPS. Second, we explicitly construct compact, polynomially-scalable descriptions of EXT and PST using partition mappings and operators. These descriptions in turn yield formulations that satisfy desirable properties such as the Slater condition and are well-suited to both first-order methods (FOMs) and interior-point methods (IPMs). Third, we design a suite of entanglement detection algorithms: three FOMs (Frank-Wolfe, projected gradient, and fast projected gradient) based on a least-squares formulation, and a custom primal-dual IPM based on a conic programming formulation. These methods are numerically stable and capable of producing entanglement witnesses or proximity measures, even in cases where states lie near the boundary of separability. Numerical experiments on benchmark quantum states demonstrate that our algorithms improve the ability to solve deeper levels of the SDP hierarchy.

23.
arXiv (CS.CV) 2026-06-16

DCP-Prune: Ultra-Low Token Pruning with Distribution Consistency Preservation

Recent vision token pruning methods effectively preserve model performance under moderate token budgets but become unstable under ultra-low token budget. Our analysis shows that as the pruning budget decreases, accuracy degradation is often accompanied by larger feature distribution shifts. Critically, the degree of this distribution shift strongly correlates with performance degradation. To better characterize this phenomenon, we introduce a lightweight distribution consistency metric to estimate the distribution shift between retained and full tokens. Motivated by these observations, we propose a two-stage pruning framework consisting of Anchor-Context Graph Recovery (ACGR) and Text-Aware Token Cluster Selection (TATCS). Specifically, ACGR transfers contextual information before token removal, while TATCS dynamically re-selects representative tokens when severe distribution shift is detected. Extensive experiments demonstrate that our method achieves superior and more stable performance under ultra-low token budget. Notably, it retains 92.1% of the upper-bound average performance on LLaVA-1.5-7B with only 16 visual tokens.

24.
arXiv (CS.CL) 2026-06-18

Freeing the Law with LOCUS: A Local Ordinance Corpus for the United States

Progress in legal AI increasingly depends on access to authoritative legal text at scale. Yet one of the most consequential layers of American law remains largely absent from existing machine-readable corpora: local ordinances. Local codes govern zoning, housing, business licensing, public health, noise, animal control, and many other domains of everyday regulation, but they are fragmented across vendor platforms designed for human browsing rather than bulk research access. We introduce LOCUS - the Local Ordinance Corpus for the United States - a comprehensive corpus and county-harmonized access layer for U.S. municipal and county ordinance codes. The raw corpus, available for release to researchers, represents nearly all publicly available municipal and county ordinance codes. The resulting raw corpus contains codes from 9,239 cities and counties. A smaller county-harmonized LOCUS access layer provides coverage for the largest 2,309 of 3,144 U.S. counties, accounting for a majority of the population. We use OCR to handle the myriad of document formats that have kept the law from being a public resource. We release the corpus with coverage metadata to support reproducibility, downstream legal AI research, and the incremental expansion of machine-readable access to local law. We train a collection of ModernBERT-based classifiers and scorers to facilitate analyzing U.S. local law among several dimensions, such as opacity and paternalism, that have not previously been studied at this scale. LOCUS-v1 and its derivative models are available at: https://huggingface.co/datasets/LocalLaws/LOCUS-v1

25.
arXiv (CS.CV) 2026-06-19

Fast Human Attention Prediction for Fixation-guided Active Perception in Autonomous Navigation

Human visual attention relies on structured scanpaths to efficiently process scenes, yet instilling this behavior into robot autonomy is in its infancy and hindered by the high,computational costs of existing predictive models. To address this, we introduce GazeLNN, a computationally lightweight,scanpath prediction model that leverages Liquid Neural Networks as its recurrent engine and employs MobileNetV3 for feature extraction. Operating auto-regressively, the architecture predicts sequential fixation heatmaps conditioned on the current visual stimulus and fixation history. Despite requiring only 0.61 GFLOPs, GazeLNN achieves state-of-the-art performance on the MIT Low Resolution dataset achieving 0.47 ScanMatch score. It outperforms existing recurrent baselines across diverse evaluation metrics, while reducing computational costs by 99.40% and accelerating inference by up to six times. To investigate the role of human attention modeling in robot autonomy and demonstrate the practical utility of this highly efficient architecture, we integrate GazeLNN into an active camera-robot control policy trained via Reinforcement Learning. This integration enables human-fixation-guided perception during autonomous navigation, validated through successful real-world deployments on an aerial robot.