Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
Nature (Science) 2026-06-10

Improved quantum processor logical error rates via correction and detection

Performing quantum algorithms for critical problems in physics and chemistry requires substantially lower error rates than the physical error rates of present quantum computers. Achieving such low logical error rates requires quantum error correction [1,2] and physical error rates below a critical threshold value [3–8]. We experimentally demonstrate on a trapped-ion quantum charge-coupled device (QCCD) [9,10] improvements in logical error rates ranging from 11× to 800× compared with several physical circuit baselines, including quantum computation on multiple qubits. Our results hinge on two quantum error correction code constructions optimized for an ion-trap processor: a 12-qubit code encoding two qubits inspired by Knill [11] and a 16-qubit tesseract colour code encoding four qubits [12,13]. These constructions are combined with a scalable method of error detection and post-selection to achieve reduced logical error rates. Our results show that state-of-the-art quantum devices are already able to make use of fault tolerance and error correction to strongly suppress errors in non-trivial quantum circuit computations. Experimental demonstration of quantum error-correcting codes combined with error detection and post-selection applied to a trapped-ion quantum processor shows improvements in logical error rates ranging from 11× to 800× compared with several physical circuit baselines.

02.
arXiv (CS.CV) 2026-06-25

CustomX: Unified Character, Action, and Scene Customization in Video World Models

Recent advances in world models have greatly enhanced interactive environment simulation. Existing methods mainly fall into two categories: (1) static world generation models, which construct 3D environments without active agents, and (2) controllable-entity models, which allow a single entity to perform limited actions in an otherwise uncontrollable environment. In this work, we introduce CustomX, leveraging the realism and structural grounding of static world generation while extending controllable-entity models to support user-specified characters capable of performing open-ended actions. Users can provide a 3DGS scene and a character, then use natural language to direct the character to perform diverse behaviors, ranging from basic locomotion to object-centric interactions, while freely exploring the environment. CustomX synthesizes temporally coherent video clips that preserve visual fidelity with the provided scene and character, formulated as a conditional autoregressive video generation problem. Built upon a pre-trained video generator, our training strategy significantly enhances motion dynamics while maintaining generalization across actions and characters. Our evaluation covers a broad range of aspects, including visual quality, character consistency, action controllability, and long-horizon coherence.

03.
arXiv (CS.AI) 2026-06-16

Imperfect Visual Verification for Code Edition: A Case Study on TikZ

arXiv:2606.15693v1 Announce Type: cross Abstract: LLMs have significantly advanced code generation, enabling the synthesis of functional programs. While recent systems achieve strong performance on many coding benchmarks, tasks involving programs such as TikZ that generate visual artifacts remain challenging, in particular visual code customization. Unlike generation from scratch, customization requires localized, semantics-preserving edits: the model must locate relevant code, modify it according to the instruction, and preserve the remaining structure and rendering. Approaches based on post-hoc iterative refinement or correction, where a verifier provides feedback to guide corrections, have shown promise. However, for programs with a visual outcome such as TikZ, where correctness is harder or likely impossible to formalize and evaluate automatically, deterministic verifiers do not exist. Hence, developers can only rely on imperfect verifiers. In this paper, we conduct an empirical study to answer: to what extent can iterative refinement remain effective when the verifier itself is unreliable? We use TikZ as a focused case study that isolates the core difficulties of the problem (weak code structure, fine-grained visual semantics, and difficult feature localization) in a controlled and challenging setting. We define visual code customization as an iterative editing problem with an imperfect oracle, and introduce a framework for analyzing such iterative refinements. We conduct a large-scale study and evaluate multiple LLM-based and tool-augmented visual verifiers within iterative refinement pipelines, and perform extensive manual annotation of refinement trajectories to assess verifier behavior and feedback quality. Our findings show that even imperfect verifiers can determine with moderate accuracy whether visual instructions are applied to code, achieving F1-scores up to 0.815. Feedback improves iterative refinement, especially for weaker models, adding 11–20 perfect customizations for Qwen3-vl-30b-a3b-Instruct, while stronger models like Gemini-3 gain fewer improvements (+5) but benefit more from accurate verification that prevents premature acceptance. Feedback is effective only when it precisely identifies image issues, provides actionable guidance, addresses all relevant problems, and remains grounded in the original instruction.

04.
arXiv (CS.CV) 2026-06-24

Towards Fast and Effective Long Video Understanding of Multimodal Large Language Models via Adaptive Quasi-Gaussian Sampling

Long video understanding remains a daunting challenge for Multimodal Large Language Models (MLLMs) due to the excessive computation and memory footprint. Keyframe selection is therefore often adopted to mitigate this shortcoming, but it still suffers from low flexibility and high noise due to its hard sampling principle. In this paper, we define video frame selection as a problem of Quasi-Gaussian Sampling, and propose an adaptive and training-free approach termed AdaQ. Inspired by the $3$-$\sigma$ rule of the Gaussian distribution, the objective of AdaQ is to achieve the optimal $3$-$\sigma$ interval for different examples, i.e., a smaller $3$-$\sigma$ interval for a local query and a larger one for a global query, thereby facilitating robust and adaptive frame sampling. To validate AdaQ, we apply it to four MLLMs with three embedding models. The extensive experimental results not only show clear performance gains over the default MLLMs and the SOTA keyframe selection methods, e.g., helping Qwen3-VL-8B outperform GPT4o by 15.8% on average using only 64 frames, but also confirm its robustness and high efficiency for long-video understanding, e.g., only one hyper-parameter needs to be set. Our code is available at https://github.com/Zkayovo-xmu/AdaQ.

05.
arXiv (CS.CL) 2026-06-17

OpenLID-v3: Improving the Precision of Closely Related Language Identification – An Experience Report

Language identification (LID) is an essential step in building high-quality multilingual datasets from web data. Existing LID tools (such as OpenLID or GlotLID) often struggle to identify closely related languages and to distinguish valid natural language from noise, which contaminates language-specific subsets, especially for low-resource languages. In this work we extend the OpenLID classifier by adding more training data, merging problematic language variant clusters, and introducing a special label for marking noise. We call this extended system OpenLID-v3 and evaluate it against GlotLID on multiple benchmarks. During development, we focus on three groups of closely related languages (Bosnian, Croatian, and Serbian; Romance varieties of Northern Italy and Southern France; and Scandinavian languages) and contribute new evaluation datasets where existing ones are inadequate. We find that ensemble approaches improve precision but also substantially reduce coverage for low-resource languages. OpenLID-v3 is available on https://huggingface.co/HPLT/OpenLID-v3.

06.
medRxiv (Medicine) 2026-06-17

Women's intentions and motivations towards health behaviour change before pregnancy: a cross-sectional survey of pregnant women in Australia

Introduction: The preconception period (i.e. the weeks and months before pregnancy) is a critical window during which parental health behaviours can influence pregnancy outcomes and the child's long-term health. Modifiable factors such as nutrition, physical activity, substance use, and environmental exposures play a key role, yet women's ability to adopt and sustain healthy behaviours is shaped by complex psychological, social and environmental influences. This study applies the Theory of Planned Behaviour to identify the beliefs underpinning women's preconception behaviours, with the aim of informing support for effective and sustained health behaviour change. Methods: An Australian national retrospective cross-sectional survey of pregnant women (18-49 years), recruited through social media platforms. The 92-item survey captured respondent socio-demographics, pregnancy status and health conditions, health behaviours, and beliefs regarding preconception health behaviours. Respondents' level of pregnancy planning was categorised using the London Measure of Unplanned Pregnancy (LMUP). Items regarding preconception beliefs were structured in accordance with the Theory of Planned Behaviour, with a focus on regular exercise, healthy diet, and alcohol avoidance. These belief variables were analysed using structural equation modelling to identify paths between latent variables and the items used to estimate each concept. Results: The study was completed by 430 pregnant women, of whom 72.7% had a planned pregnancy. Most had a partner, were university educated and in good health. Structural equation modelling showed intention strongly predicted exercise (β = 0.65), healthy diet (β = 0.54) and alcohol avoidance (β = 0.64). Perceived control and partner norms influenced intentions, whereas health professional norms had limited effect. Positive beliefs were associated with folate supplement use and smoking cessation. Conclusion: These findings highlight intention as a key driver of preconception health behaviours, with perceived control and partner influences playing a more significant role than individual beliefs or health professional input. Effective interventions should therefore address structural barriers and actively involve partners, while respecting women's autonomy. Overall, couples-focused, multi-level strategies are likely essential to support meaningful and sustained preconception health behaviour change.

07.
arXiv (CS.LG) 2026-06-15

Stability of a Generalized Debiased Lasso with Applications to Resampling-Based Variable Selection

arXiv:2405.03063v3 Announce Type: replace-cross Abstract: We propose a generalized debiased Lasso estimator based on a stability principle. When a single column of the design matrix is perturbed, the estimator admits a simple update formula that can be computed from the original solution. Under sub-Gaussian designs with well-conditioned covariance, this approximation is asymptotically accurate for all but a vanishing fraction of coordinates in the proportional growth regime. The proof relies on concentration and anti-concentration arguments to control error terms and sign changes. In contrast, establishing comparable distributional limits (e.g., Gaussianity) under similar assumptions remains open. As an application, we show that the approximation significantly reduces the computational cost of resampling-based variable selection procedures, including the conditional randomization test and a local knockoff filter.

08.
arXiv (math.PR) 2026-06-24

Genealogical processes of sequential Monte Carlo methods and other non-neutral population models under rapid mutation

arXiv:2406.16465v3 Announce Type: replace Abstract: We show that genealogical trees arising from a broad class of non-neutral models of population evolution converge to the Kingman coalescent under a suitable rescaling of time. As well as non-neutral biological evolution, our results apply to genetic algorithms encompassing the prominent class of sequential Monte Carlo (SMC) methods. The time rescaling we need differs slightly from that used in classical results for convergence to the Kingman coalescent, which has implications for the performance of different resampling schemes in SMC algorithms. In addition, our work substantially simplifies earlier proofs of convergence to the Kingman coalescent, and corrects an error common to several earlier results.

09.
arXiv (CS.AI) 2026-06-11

LLMs+Graphs: Toward Graph-Native, Synergistic AI Systems

arXiv:2606.11560v1 Announce Type: cross Abstract: Large Language Models (LLMs) have advanced rapidly, but their limitations in structured and multi-hop reasoning underscore the need for graph-native, synergistic artificial intelligence (AI) systems. Graph-structured data underpins critical applications across social, biological, financial, transportation, web, and knowledge domains, making it essential to understand how LLMs can leverage graph computation for grounded, context-rich inference. Three complementary synergies are emerging: LLMs augmented with graph computation for retrieval and reasoning; bidirectional integration between LLMs and knowledge graphs (KGs), where LLMs support KG construction and curation while KGs enforce semantic constraints and factual consistency; and AI agents strengthened by graph algorithms for planning, decision making, and multi-step reasoning. In parallel, LLMs introduce new capabilities for graph data management and graph machine learning (ML) through natural language interfaces and hybrid LLM-graph neural network (GNN) pipelines. This tutorial synthesizes the algorithms, systems, and design principles driving these converging directions, offering data science and data mining researchers a unified perspective on integrating LLMs, graph data management, graph mining, graph ML, and agentic computation into next-generation graph-native AI systems.

10.
arXiv (CS.CL) 2026-06-16

P3B3: A Multi-Turn Conversational Benchmark for Measuring European and Brazilian Portuguese Variety Bias in LLMs

As Large Language Models (LLMs) become embedded in everyday communication, capturing regional linguistic variation is essential for reliable and equitable language use. In Portuguese, the European (pt-PT) and Brazilian (pt-BR) varieties remain unevenly represented, with pt-BR dominating in data quantity, while LLM preference for Portuguese variants remains underexplored. To address this gap, we introduce P3B3, an expert-curated, variety-agnostic benchmark of conversational prompts, along with an evaluation framework for measuring variety bias and controllability. Experiments on several models show that most LLMs exhibit a strong bias toward pt-BR, with variation in controllability across models. These results highlight the need for more balanced multilingual representation across language varieties.

11.
arXiv (CS.CV) 2026-06-24

MSPL: Multi-Step Pseudo-Labeling for Open-Vocabulary Object Detection

Open-vocabulary object detection (OVD) aims to recognize and localize object categories beyond the training set. Recent approaches leverage vision-language models to generate pseudo-labels using image-text alignment, allowing detectors to generalize to unseen classes without explicit supervision. However, these methods depend heavily on single-step image-text matching, neglecting the intermediate reasoning steps crucial for interpreting semantically complex visual contexts, such as crowding or occlusion. In this paper, we introduce MSPL, a framework that incorporates multi-step visual reasoning into the pseudo-labeling process for OVD. It decomposes complex scene understanding into three interpretable steps (object localization, category recognition, and background grounding), where these intermediate reasoning states serve as rich supervision sources. Extensive experiments on standard OVD evaluation protocols demonstrate that MSPL achieves state-of-the-art performance with superior pseudo-labeling efficiency, outperforming the strong baseline by 9.4 AP50 for novel classes on OV-COCO and improving box and mask APr by 3.2 and 2.2, respectively, on OV-LVIS. Code and models are available at https://github.com/hchoi256/mspl.

12.
arXiv (CS.AI) 2026-06-24

Maestro Order: A Model-Agnostic Orchestration Harness

arXiv:2606.23983v1 Announce Type: cross Abstract: A single forward pass of a capable model is a fast, fluent, and unreliable problem-solver: it is right often enough to be useful and wrong often enough to be dangerous; in language models, such confident errors are known as hallucinations. We present Maestro Order, a model-agnostic orchestration harness that turns unreliable solvers into reliable problem-solving systems by composing them according to four structural primitives (decompose, ensemble, verify, and recurse) and a budget-aware controller that decides where to spend compute. The harness treats any model as a black-box base solver behind a uniform interface, layers a verifier ensemble whose discrimination is measured online, and allocates verification and voting to the stages with the highest marginal reliability per unit cost. We give the architecture, the message and state schema, the controller algorithm, and the engineering that makes it deterministic, observable, and fault-tolerant. We then specify an evaluation methodology (reliability at fixed cost, coverage, calibration, and ablations) and report results from a faithful Monte Carlo simulation of the harness over a parameterized solver/verifier model. The simulation reproduces the predicted laws quantitatively: verification amplifies reliability geometrically (e.g. $0.55\to0.98$ with two gates, $\to0.999$ with four), voting helps only above chance and is limited by shared errors, and a budget-aware controller reaches a target reliability at a small fraction of the cost of voting alone by selecting the cheapest mechanism for each regime. We close with failure modes (verifier gaming, correlated errors, and decomposition error compounding) and concrete guidance: build robust checkers, diversify solvers, and let the controller put compute where the information is.
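
One way to see why verification can amplify reliability geometrically is a simple Bayesian gate model. The sketch below is not the paper's simulator. It assumes the gates are independent, and the pass rates `pass_if_correct` and `pass_if_wrong` are hypothetical values chosen only for illustration. Under those assumptions, each gate multiplies the odds that a surviving answer is correct by the same likelihood ratio, so the odds grow geometrically with the number of gates.

```python
def reliability_after_gates(prior, pass_if_correct, pass_if_wrong, gates):
    """Probability that an answer is correct given it passed `gates` verifier gates.

    Assumes independent gates (an illustrative simplification): each gate
    passes a correct answer with probability `pass_if_correct` and a wrong
    answer with probability `pass_if_wrong`.
    """
    odds = prior / (1.0 - prior)                       # prior odds of being correct
    likelihood_ratio = pass_if_correct / pass_if_wrong
    odds *= likelihood_ratio ** gates                  # geometric growth in the odds
    return odds / (1.0 + odds)


if __name__ == "__main__":
    # Hypothetical solver accuracy and gate pass rates, for illustration only.
    for k in (0, 2, 4):
        r = reliability_after_gates(0.55, 0.95, 0.15, k)
        print(f"{k} gates: reliability {r:.4f}")
```

With these illustrative rates, a base reliability of 0.55 rises to roughly 0.98 after two gates. Correlated verifier errors would break the independence assumption and slow the growth, which matches the failure modes the abstract lists.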

13.
arXiv (math.PR) 2026-06-25

Breuer-Major Theorems for Hilbert Space-Valued Random Variables

arXiv:2405.11452v2 Announce Type: replace Abstract: Let $\{X_k\}_{k\in\mathbb{Z}}$ be a stationary Gaussian process with values in a separable Hilbert space $\mathcal{H}_1$, and let $G:\mathcal{H}_1\to\mathcal{H}_2$ be a measurable map into another separable Hilbert space $\mathcal{H}_2$. We derive a central limit theorem for the centered normalized partial sums of the Hilbert space-valued subordinated process $\{G[X_k]\}_{k\in\mathbb{Z}}$. Our result holds under either of two sets of sufficient conditions, formulated in terms of the transformation $G$ and the temporal and cross-sectional dependence structure of $\{X_k\}_{k\in\mathbb Z}$. These conditions coincide in finite dimensions but lead to genuinely different phenomena in the infinite-dimensional setting. The proof relies on the recently developed Fourth Moment Theorem on Hilbert spaces, leveraging tools from the infinite-dimensional Malliavin-Stein framework. We also provide continuous-time and quantitative versions of the central limit theorem. In a series of examples, we recover and strengthen limit theorems for a wide array of statistics relevant in functional data analysis, and present, as an application of our result, a novel limit theorem in the framework of neural operators.

14.
arXiv (quant-ph) 2026-06-25

Efficient Quantum Circuits for Coherent Conversion Between General First- and Second-Quantized Many-Body Representations

arXiv:2606.25029v1 Announce Type: new Abstract: Quantum simulation at fixed particle number admits two equivalent descriptions, a first-quantized (particle) representation and a second-quantized (occupation-number) representation. Their quantum resource costs differ sharply across computational tasks, so the ability to convert coherently between them is valuable. We construct an explicit unitary $Q$, with inverse $Q^\dagger$, that maps a first-quantized state to its fixed-$N$ occupation-number form while diagnosing the input's particle-exchange symmetry. The conversion is therefore symmetry-agnostic at the input yet fully resolved at the output, and it applies uniformly to bosonic, fermionic, and parastatistical sectors. At its foundation lies a structural identification that we place at the center of this work: the quantum Schur transform supplied by Schur-Weyl duality is the non-abelian Fourier transform of the commuting pair $(S_N,U(d))$, and the occupation-number representation is its weight basis, retaining only the labels shared by both factors, the irrep $\lambda$ and the $\mathfrak{u}(d)$ weight. This reduction is lossless for bosons and fermions, while a canonical Gelfand-Tsetlin promise renders it one-to-one for the remaining sectors. Algorithmically, $Q$ composes the strong Schur transform with reversible arithmetic that computes occupations as successive row-sum differences of the Gelfand-Tsetlin pattern, yielding gate complexity $\mathrm{poly}(N,d,\log(1/\epsilon))$. The converted state is prepared efficiently in quantum memory. Any classical algorithm that outputs it explicitly, however, pays a cost set by the sector dimension, which is polynomial of degree $N$ in $d$ at fixed $N$ and exponential in $N$ when $d=\Theta(N)$. Finally, an efficient classical sampler for the induced occupation-number distribution would yield one for arbitrary quantum circuits, contrary to standard complexity assumptions.
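
The phrase "occupations as successive row-sum differences of the Gelfand-Tsetlin pattern" can be illustrated classically. The sketch below is only an illustration of that arithmetic, not the paper's reversible circuit. The function name, the bottom-row-first layout, and the mode ordering are assumptions made for the example.

```python
def occupations_from_gt_pattern(rows):
    """Occupation numbers from a Gelfand-Tsetlin pattern.

    `rows[k]` is the row with k + 1 entries (bottom row first, top row last);
    this layout is an assumption of the sketch. The occupation of mode k is
    the row sum of row k minus the row sum of row k - 1.
    """
    for k, row in enumerate(rows):
        if len(row) != k + 1:
            raise ValueError("row k must have k + 1 entries")
    # Interlacing condition between consecutive rows of a valid pattern.
    for k in range(1, len(rows)):
        upper, lower = rows[k], rows[k - 1]
        for j in range(len(lower)):
            if not (upper[j] >= lower[j] >= upper[j + 1]):
                raise ValueError("rows do not interlace")
    sums = [sum(row) for row in rows]
    return [sums[0]] + [sums[k] - sums[k - 1] for k in range(1, len(sums))]


if __name__ == "__main__":
    # Two bosons in d = 2 modes: top row (2, 0), three allowed bottom rows.
    for bottom in (2, 1, 0):
        print(bottom, occupations_from_gt_pattern([[bottom], [2, 0]]))
```

For the symmetric (bosonic) sector the three patterns give three distinct occupation vectors that each sum to the particle number, consistent with the abstract's statement that the reduction is lossless for bosons.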

15.
arXiv (CS.AI) 2026-06-11

PermDoRA – Understanding Adapter Interference in Language Models: Limits of Parameter-Space Geometry

arXiv:2606.11262v1 Announce Type: cross Abstract: Access control in large language models (LLMs) requires modular mechanisms to enable domain-specific behavior without retraining or cross-domain interference. A common hypothesis is that interference during adapter composition arises from overlap in linear parameter updates, suggesting that enforcing orthogonality or directional independence should improve multi-domain performance. We test this hypothesis using DoRA-RBAC, a hierarchical adapter composition framework based on weight-decomposed low-rank adaptation. We compare conventional Euclidean merging with a geometry-aware Riemannian-inspired merging strategy that approximates the Fréchet mean via normalized directional averaging across multiple QA benchmarks (GPQA, PubMedQA, SimpleQA, WMDP) on LLaMA-3.1-8B and Mistral-7B. Our results show that while single-domain performance matches LoRA, geometry-aware merging provides no consistent advantage over standard averaging in multi-domain settings. Diagnostic analysis further reveals that angular alignment and orthogonality of adapter updates are weak predictors of composition performance. These findings suggest that adapter interference is not governed primarily by parameter-space geometry, but is instead consistent with interactions in shared nonlinear representations.

16.
arXiv (CS.AI) 2026-06-24

Scaling Laws for Task-Specific LLM Distillation

arXiv:2606.24747v1 Announce Type: new Abstract: Large Language Models (LLMs) achieve strong performance across a growing range of domains, yet their scale poses deployment challenges in applications where latency and cost constraints are critical. This paper derives empirical scaling laws for domain-specific LLM compression, quantifying how in-domain and general knowledge performance scale with dataset size, compression ratio, supervision format, and iterative pruning schedule. Using quantitative finance as our application domain, we compare logit-based and LoRA-based distillation under iterative structural pruning, introducing a blended chain-of-thought supervision loss that stabilizes KL-divergence distillation over reasoning traces. In-domain task quality degrades predictably under compression while general-knowledge benchmarks collapse well before the same point; supervision format is the key driver of this tradeoff, with chain-of-thought supervision actively recovering general knowledge that pruning erases. We release the headline dataset FinHeadlineMix, scaling law results, and practical recommendations to provide a reusable framework for domain-specific compression decisions.

17.
arXiv (quant-ph) 2026-06-24

Discovery of connectivity-trainability trade-off of IQP Circuits for Hamiltonian Optimization

arXiv:2606.24264v1 Announce Type: cross Abstract: Instantaneous Quantum Polynomial-time (IQP) circuits are promising candidates for near-term quantum advantage due to the conjectured classical hardness of their sampling task. However, their capabilities for optimization remain largely unexplored. We present a systematic investigation of the performance and trainability of IQP circuits for Hamiltonian optimization. Our results reveal a trade-off between optimization performance and circuit connectivity, demonstrating that the circuit structure plays a key role in determining the ability of IQP circuits to reach low-energy states.

18.
arXiv (CS.LG) 2026-06-16

Reinforcement Learning for LLM-based Event Forecasting

arXiv:2606.15917v1 Announce Type: new Abstract: We use Group Relative Policy Optimization (GRPO), a recently devised sample- and memory-efficient reinforcement learning method, to fine-tune pretrained LLMs in the range of 1.5B to 14B parameters. The models can retrieve current information through a Wikipedia revisions tool or news summaries, and are trained to forecast real events beyond the knowledge cutoff of the LLM, as well as problems made to simulate different aspects of the dynamics of that training. We use the results of these experiments to comment on the scaling capability of LLMs for forecasting, and to classify how judgmental forecasting fits into the verifiable/unverifiable domain taxonomy, considering the impact of the inherent aleatoric uncertainty when forecasting future events (e.g. the roll of a die). As a result of the GRPO training, we manage to bring a 1.5B parameter transformer (Qwen 2.5 1.5B) to forecasting performance superior to Claude Sonnet 3.5 over the same dataset, as measured by cross entropy from the market-agreed probabilities. We also discuss various dead ends on the path to this result.
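
The abstract gives no equations, so the following is a minimal sketch of two standard ingredients it names. The first function is the group-relative advantage at the core of GRPO, where each sampled answer is scored against the mean and spread of its own group. The second is a binary cross entropy between a forecast and a market probability. The function names, the population standard deviation, and the `eps` and `clip` constants are assumptions of this sketch, and nothing here claims to match the paper's reward design.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ on this choice
    return [(r - mu) / (sigma + eps) for r in rewards]


def cross_entropy_vs_market(market_prob, forecast_prob, clip=1e-6):
    """Binary cross entropy of a forecast against a market-implied probability."""
    p = min(max(forecast_prob, clip), 1.0 - clip)  # avoid log(0)
    q = market_prob
    return -(q * math.log(p) + (1.0 - q) * math.log(1.0 - p))


if __name__ == "__main__":
    # Four hypothetical sampled forecasts for one question, scored 1 or 0.
    print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))
    print(cross_entropy_vs_market(0.7, 0.7), cross_entropy_vs_market(0.7, 0.5))
```

Because the advantage is computed within a group of samples, no separate value network is needed, which is the usual explanation for the memory efficiency the abstract mentions. The cross entropy is smallest when the forecast equals the market probability.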

19.
arXiv (CS.AI) 2026-06-24

Repeated Shared Access Enables Grokking, but Edit Propagation Depends on an Addressable Memory

arXiv:2606.20737v2 Announce Type: replace Abstract: We study factual edit propagation in a controlled synthetic knowledge-graph QA setting using a 2x2 grid that crosses loop recurrence with shared-memory access: a dense transformer (Dense), a looped transformer (Loop), a dense backbone with shared memory (Dense+Mem), and a looped backbone with shared memory (loop-memory coupling, LMC). The two factors dissociate. For learning, both routes to repeated shared access – looped recomputation and repeated memory rereading – cross the out-of-distribution (OOD) grokking barrier that Dense fails, so repeated shared access is the behavioral regularity, not a specific architecture. For editing, the substrates split along a different axis: applying a single localized factual edit (conditioned on direct success) and measuring 2-hop propagation on a shared pre-edit-correct set, the edit propagates strongly in both memory-bearing cells (LMC 0.78-0.92, Dense+Mem 0.71-0.96) and only weakly in the memory-free ones (Loop 0.04-0.30, Dense 0.00-0.03). The split is along the memory axis, not the loop axis: every memory-bearing seed exceeds every memory-free seed, with no detectable difference between the two memory cells. Crucially Dense+Mem has no recurrence, so the propagating ingredient is an addressable site that an edit can write to and later computation rereads, not loop recomputation; Loop is at best a partial intermediate. The affordance survives coarsening the store (N=128 to N=13): propagation attenuates but the memory/no-memory split persists, so fine granularity buys precision rather than the affordance itself. These results dissociate learning competence from editing affordance – repeated shared access suffices to grok, but edit propagation depends on whether the substrate exposes an addressable memory that the forward computation can write to and later reread, an affordance that loop recurrence provides only partially.

20.
arXiv (CS.CV) 2026-06-24

Ill-Posed by Design: Probing Evidence Use in VLMs

Counterfactual analysis is widely used to study evidence use in vision-language models, but its diagnostic value is limited on well-posed tasks: when several cues independently support the same answer, removing one may not change the prediction. We propose monocular metric object-size estimation as an ill-posed diagnostic setting for evidence selection: because physical size cannot be determined from a single uncalibrated image, models must rely on imperfect cues: category priors, target appearance, local context, apparent image size, and scene geometry. We assemble Metric VQA (10,813 dimension queries from Objectron and 331 tape-measured in-the-wild scenes) and evaluate 12 open-weight VLMs (3–397B parameters) with counterfactual analysis decomposing six visual and language evidence channels. Even the largest VLMs tested (Qwen3-VL-235B, Qwen3.5-397B, InternVL3.5-241B) trail a text-only frontier LLM on the in-the-wild split. The diagnostic analysis shows: target identity is the most load-bearing cue, target pixels and local context help only some models, apparent size shifts predictions without a directional readout, and global scene geometry is largely unused. We analyze LoRA fine-tuning as an actionable intervention specific to metric estimation: while the task is learnable, the models do not learn to leverage scene geometry.

21.
arXiv (quant-ph) 2026-06-19

QPU-scale randomized benchmarking via Bell-pair injection

arXiv:2606.20123v1 Announce Type: new Abstract: Mirror randomized benchmarking (MRB) is an established technique that provides a global error metric at the scale of a whole QPU. To expand upon this, we introduce Mirror Quantum Awesomeness (MQA), a hybrid protocol that adds a structured entangling layer to MRB circuits. This enables per-edge correlation dynamics to be tracked via mutual information while preserving the MRB infidelity estimate. The resulting analysis of the injected entangled pairs locates a critical circuit depth, beyond which rudimentary error mitigation techniques can be expected to fail. A topological variant, Topological MQA, supplies a second critical depth via a decoder based on the surface-code decoding problem. Both are validated in simulation and demonstrated on the 156-qubit ibm_fez and ibm_kingston processors, where MQA closely agrees with MRB on the entanglement infidelity and the critical depth for ibm_fez is found to be approximately 50.

22.
arXiv (CS.CV) 2026-06-19

LaTtE-Flow: Layerwise Timestep-Expert Flow-based Transformer

Recent advances in multimodal foundation models unifying image understanding and generation have opened exciting avenues for tackling a wide range of vision-language tasks within a single framework. Despite progress, existing unified models typically require extensive pretraining and struggle to achieve the same level of performance compared to models dedicated to each task. Additionally, many of these models suffer from slow image generation speeds, limiting their practical deployment in real-time or resource-constrained settings. In this work, we propose Layerwise Timestep-Expert Flow-based Transformer (LaTtE-Flow), a novel and efficient architecture that unifies image understanding and generation within a single multimodal model. LaTtE-Flow builds upon powerful pretrained Vision-Language Models (VLMs) to inherit strong multimodal understanding capabilities, and extends them with a novel Layerwise Timestep Experts flow-based architecture for efficient image generation. LaTtE-Flow distributes the flow-matching process across specialized groups of Transformer layers, each responsible for a distinct subset of timesteps. This design significantly improves sampling efficiency by activating only a small subset of layers at each sampling timestep. To further enhance performance, we propose a Timestep-Conditioned Residual Attention mechanism for efficient information reuse across layers. Experiments demonstrate that LaTtE-Flow achieves strong performance on multimodal understanding tasks, while achieving competitive image generation quality with around 6x faster inference speed compared to recent unified multimodal models.

23.
arXiv (math.PR) 2026-06-24

Negative index, matchings, and nonnegative eigenvalues of tridiagonal stochastic matrices

arXiv:2606.21122v2. We study negative eigenvalues of $n\times n$ stochastic matrices whose off-diagonal support is constrained by a sparse graph. The main tool is a matching-based inertia principle: if $G$ is bipartite with matching number $\mu(G)$ and $S$ is a real symmetric matrix supported on $G$ with nonnegative diagonal entries, whose negative index (i.e., the number of negative eigenvalues counted with multiplicity) is denoted by $\nu_{-}(S)$, then \[ \nu_{-}(S)\leq \mu(G). \] In particular, every $n\times n$ nonnegative tridiagonal stochastic matrix $P$ satisfies $\nu_{-}(P)\leq \left\lfloor \frac{n}{2}\right\rfloor$. Consequently, after ordering the eigenvalues of $P$ in decreasing order, we have $\lambda_{\lceil n/2\rceil}(P)\geq 0$, and hence $\lambda_2(P)\geq 0$ for $n\geq 3$. This gives an all-dimensional strengthening of the previously known $4\times4$ tridiagonal stochastic result. Next, we show that this tridiagonal bound is sharp in every dimension in both reducible and irreducible cases. Finally, we explore some possible extensions and raise some open questions.
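The tridiagonal bound $\nu_{-}(P)\leq\lfloor n/2\rfloor$ can be checked numerically. The sketch below is not the paper's matching argument: it counts negative eigenvalues with a standard Sturm-type pivot count, using the facts that the spectrum of a tridiagonal matrix depends on its off-diagonal entries only through the products $P_{k,k-1}P_{k-1,k}$ and that, by Sylvester's law of inertia, the number of negative pivots equals the number of negative eigenvalues. The function names, the random model, and the `pivmin` safeguard are illustrative assumptions.

```python
import random


def random_tridiagonal_stochastic(n, rng):
    """Return (diag, lower, upper) for a random n x n tridiagonal stochastic matrix.

    lower[i] is P[i][i-1] (lower[0] is 0) and upper[i] is P[i][i+1]
    (upper[n-1] is 0). Every row sums to 1.
    """
    diag, lower, upper = [], [], []
    for i in range(n):
        w_left = rng.random() if i > 0 else 0.0
        w_self = rng.random() + 1e-3  # keeps the row total strictly positive
        w_right = rng.random() if i < n - 1 else 0.0
        total = w_left + w_self + w_right
        lower.append(w_left / total)
        diag.append(w_self / total)
        upper.append(w_right / total)
    return diag, lower, upper


def negative_index(diag, lower, upper, pivmin=1e-200):
    """Count negative eigenvalues via the pivots d_k = P_kk - P_{k,k-1} P_{k-1,k} / d_{k-1}.

    A pivot smaller than pivmin in magnitude is replaced by +pivmin, a common
    safeguard in Sturm counting (an assumption of this sketch).
    """
    count = 0
    pivot = 0.0
    for k in range(len(diag)):
        if k == 0:
            pivot = diag[0]
        else:
            pivot = diag[k] - lower[k] * upper[k - 1] / pivot
        if abs(pivot) < pivmin:
            pivot = pivmin
        if pivot < 0:
            count += 1
    return count


def check_bound(trials=200, max_n=9, seed=0):
    """Return True if every sampled matrix satisfies nu_-(P) <= floor(n / 2)."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, max_n)
        if negative_index(*random_tridiagonal_stochastic(n, rng)) > n // 2:
            return False
    return True


if __name__ == "__main__":
    print("bound holds on random samples:", check_bound())
    # A 4 x 4 zero-diagonal walk on a path, used here to illustrate sharpness.
    print("zero-diagonal example:",
          negative_index([0.0] * 4, [0.0, 0.5, 0.5, 1.0], [1.0, 0.5, 0.5, 0.0]))
```

The zero-diagonal example is a random walk on a path of four vertices; its spectrum is symmetric about zero and its determinant is nonzero, so it has exactly two negative eigenvalues and attains $\lfloor 4/2\rfloor$.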

24.
arXiv (CS.CV) 2026-06-12

Augmentation techniques for video surveillance in the visible and thermal spectral range

In intelligent video surveillance, cameras record image sequences during day and night. Commonly, this demands different sensors. To achieve better performance, it is not unusual to combine them. We focus on the case in which a long-wave infrared camera records continuously and, in addition, another camera records in the visible spectral range during daytime, while an intelligent algorithm supervises the captured imagery. More precisely, our task is multispectral CNN-based object detection. At first glance, images from the visible spectral range differ from thermal infrared ones in that they contain color and distinct texture information on the one hand, and lack information about the thermal radiation emitted by objects on the other. Although color can provide valuable information for classification tasks, effects such as varying illumination and the peculiarities of different sensors still represent significant problems. Moreover, obtaining sufficient and practical thermal infrared datasets for training a deep neural network still poses a challenge. For this reason, training with the help of data from the visible spectral range could be advantageous, particularly if the data to be evaluated contains both visible and infrared data. However, there is no clear evidence of how strongly variations in thermal radiation, shape, or color information influence classification accuracy. To gain deeper insight into how Convolutional Neural Networks make decisions and what they learn from different sensor input data, we investigate the suitability and robustness of different augmentation techniques...

25.
arXiv (CS.CV) 2026-06-11

STEAM: Squeeze and Transform Enhanced Attention Module

Channel and spatial attention mechanisms introduced in earlier work enhance the representational capabilities of deep convolutional neural networks (CNNs) but often increase parameter and computational costs. While recent approaches focus solely on efficient feature context modeling for channel attention, we aim to model both channel and spatial attention comprehensively with minimal parameters and reduced computation. Leveraging the principles of relational modeling in graphs, we introduce a constant-parameter module, STEAM: Squeeze and Transform Enhanced Attention Module, which integrates channel and spatial attention to enhance the representation power of CNNs. To our knowledge, we are the first to propose a graph-based approach for modeling both channel and spatial attention, utilizing concepts from multi-head graph transformers. Additionally, we introduce Output Guided Pooling (OGP), which efficiently captures spatial context to further enhance spatial attention. We extensively evaluate STEAM for large-scale image classification, object detection and instance segmentation on standard benchmark datasets. STEAM achieves a 2% increase in accuracy over the standard ResNet-50 model with only a meager increase in GFLOPs. Furthermore, STEAM outperforms the leading modules, ECA and GCT, in terms of accuracy while achieving a threefold reduction in GFLOPs. The code will be made available upon acceptance.