Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-12

Camera and LiDAR BEV Fusion for Cooperative 3D Object Detection on TUMTraf V2X

We describe a Camera and LiDAR fusion detector developed for the TUMTraf V2X cooperative 3D object detection track of the DriveX 2026 challenge. The detector fuses three roadside cameras with a fused infrastructure-plus-vehicle point cloud in a shared bird's-eye-view space and predicts boxes through a CenterPoint-style head with a generalized IoU regression loss and an IoU quality re-ranking head. Trained on the provided train and validation splits, the model reaches a 3D mAP of 0.85 on the public Codabench test split. While iterating on the system, we observed that 44 of the 50 test frames are also present in the released train (40) and validation (4) splits with their labels. We therefore conducted two additional studies to quantify how this overlap affects the final score: (1) a finetuning run that oversamples the 44 overlapping frames, reaching 0.89 mAP, and (2) a post-processing run that replaces predictions on those frames with the released ground truth, reaching 0.99 mAP (uploaded to our Codabench account for testing but not published on the leaderboard). All three configurations and their per-class results are reported.

02.
arXiv (CS.AI) 2026-06-12

Meta-Learning Transformers to Improve In-Context Generalization

arXiv:2507.05019v2 Announce Type: replace-cross Abstract: In-context learning enables transformer models to generalize to new tasks based solely on input prompts, without any need for weight updates. However, existing training paradigms typically rely on large, unstructured datasets that are costly to store, difficult to evaluate for quality and balance, and pose privacy and ethical concerns due to the inclusion of sensitive information. Motivated by these limitations and risks, we propose an alternative training strategy where we leverage a collection of multiple, small-scale, and domain-specific datasets. We empirically demonstrate that the increased quality and diversity of such data improve the generalization abilities of in-context learners beyond their training domain, while achieving comparable performance with models trained on a single large-scale dataset. We investigate this paradigm by leveraging meta-learning to train an in-context learner on the Meta-Album collection under several settings. Firstly, we show the performance in a controlled environment, where the test domain is completely excluded from the training knowledge. Secondly, we explore the robustness of these models to forgetting in a continual scenario where the information is accessible for a limited time. Finally, we explore the more challenging unsupervised scenario. Our findings demonstrate that transformers still generalize for in-context prediction when trained on a curated dataset collection while offering advantages in modularity and replaceability.

03.
arXiv (quant-ph) 2026-06-19

Hybrid VQE-CVQE algorithm using diabatic state preparation

arXiv:2512.04801v2 Announce Type: replace Abstract: We propose a hybrid variational quantum algorithm that has variational parameters used by both the quantum circuit and the subsequent classical optimization. Similar to the Variational Quantum Eigensolver (VQE), this algorithm applies a parameterized unitary operator to the qubit register. We generate this operator using diabatic state preparation. The quantum measurement results then inform the classical optimization procedure used by the Cascaded Variational Quantum Eigensolver (CVQE). We demonstrate the algorithm on a system of interacting electrons and show how it can be used on long-term error-corrected as well as short-term intermediate-scale quantum computers. Our simulations performed on IBM Brisbane produced energies well within chemical accuracy.

04.
medRxiv (Medicine) 2026-06-22

Leishmaniasis on YouTube: a critical appraisal of the quality, reliability, and transparency of educational content

Background: Leishmaniasis is a neglected tropical disease of significant global public health importance, for which accurate information is essential to support prevention and early care-seeking, particularly in endemic, resource-limited settings. YouTube is a widely used source of health information, but the quality and reliability of leishmaniasis-related content have not been evaluated. We aimed to assess the quality, reliability, and transparency of English-language YouTube videos on leishmaniasis. Methods: We conducted a cross-sectional analysis of YouTube videos retrieved via the YouTube Data API on 15 June 2026 using the terms "leishmaniasis," "cutaneous leishmaniasis," and "visceral leishmaniasis." After applying eligibility criteria and screening the 150 most-viewed eligible videos, 48 videos were included. Two reviewers independently assessed each video using the modified DISCERN (mDISCERN) tool, the Global Quality Score (GQS), and the JAMA benchmark criteria, with disagreements resolved by consensus. Inter-rater agreement was assessed using the intraclass correlation coefficient (ICC), and associations were examined using Spearman's rank correlation. Results: Of 402 videos retrieved, 48 met the inclusion criteria. The median GQS was 3.00 (IQR 2.00-4.00) and median mDISCERN was 3.00 (IQR 2.38-4.50), indicating moderate quality and reliability, while the median JAMA score was 2.00 (IQR 1.00-2.00), reflecting limited transparency; no video met all four JAMA criteria. The overwhelming majority of videos (47/48, 97.9%) were of professional or institutional origin. Inter-rater agreement was good to excellent (ICC 0.883 for GQS, 0.896 for mDISCERN, 1.000 for JAMA). The instruments were strongly inter-correlated (mDISCERN-GQS rho = 0.841, p < 0.001). Quality scores did not correlate positively with views, likes, or video duration; comments correlated weakly and negatively with mDISCERN (rho = -0.337, p = 0.031) and JAMA (rho = -0.381, p = 0.014). Conclusions: YouTube videos on leishmaniasis are of moderate quality and reliability but limited transparency, and are produced almost exclusively by professional sources. Video popularity, length, and age were not indicators of quality. There is a need for experts and institutions to produce clearly authored, well-sourced, and transparent educational content on this neglected tropical disease.

05.
arXiv (math.PR) 2026-06-18

Probabilistic representation and classical solutions of wave equations with complex polynomial nonlinearities

arXiv:2606.18919v1 Announce Type: cross Abstract: We review the probabilistic representation of solutions of wave equations with polynomial nonlinearities in spatial dimensions d=1,2,3 using stochastic branching processes. Under regularity assumptions on the initial data, we derive conditions ensuring the integrability of the corresponding Monte Carlo estimator, and the existence and smoothness of mild and classical solutions. We also present numerical results and comparisons with grid-based algorithms for the solution of nonlinear wave equations.

06.
medRxiv (Medicine) 2026-06-16

Sleep regularity outweighs sleep duration as a predictor of disease

Sleep regularity, the consistency of sleep-wake timing from one day to the next, is more strongly associated with longevity than adequate sleep duration. Whether this relationship persists across common diseases is unknown. We compared sleep regularity vs. sleep duration as risk factors for 199 diseases and disorders, using ten million hours of objective sleep-wake data (N=60,998, age[mean{+/-}SD]=62.8{+/-}7.8, 55% female). Multivariable-adjusted risks of incident diseases/disorders for regular/irregular and short/adequate sleepers were compared across 9.5 years of follow-up. Irregular sleep predicted risks for 131 diseases/disorders, more than double the number predicted by short sleep duration (63). Irregular sleep was a superior predictor than short sleep duration for 90 diseases/disorders, including circulatory, metabolic, digestive, renal, infectious, neurological, and musculoskeletal conditions, and mental disorders, whereas short sleep duration was the superior predictor for only 9 diseases/disorders. For models where short sleep duration explained disease risks, 83% were improved by adding sleep regularity. Sleep regularity was a stronger predictor of diseases/disorders than sleep duration in this cohort and should be considered an essential dimension of sleep health.

07.
arXiv (CS.CL) 2026-06-19

Reliability without Validity: A Systematic, Large-Scale Evaluation of LLM-as-a-Judge Models Across Agreement, Consistency, and Bias

LLM-as-a-Judge has become the dominant evaluation paradigm for language models, but judge validation in practice relies on exact-match agreement, a metric that does not correct for chance and systematically overstates discriminative ability. We present the largest systematic evaluation of LLM-as-a-Judge to date: 21 judges from nine providers across MT-Bench, JudgeBench, and RewardBench, evaluated under three protocols (agreement, consistency, bias audit) over 118 runs and approximately 541,000 individual judgments. Four findings emerge, consistent across the full cohort, including the April 2026 frontier: kappa deflation between exact match and Cohen's kappa is universal (33–41 pp on MT-Bench), judge rankings shift by up to 14 positions across benchmarks, high test–retest reliability (>0.95) coexists with severe position bias (>0.10) in two production-deployed judges (instantiating a consistency–bias paradox), and verbosity bias is small (

08.
arXiv (CS.CL) 2026-06-11

3-Key-Input: Exploring the Theoretical Minimum Keys for Text Entry

作者:

How far can we reduce the number of physical keys if we endow an ambiguous keyboard with modern language models? Fewer keys increase hardware design freedom in constrained settings such as assistive devices and mobile form factors. This paper systematically evaluates text entry systems using 2-5 physical keys combined with language-model-based disambiguation. On a 300-sentence English corpus (100 sentences each for Business / Conversational / Technical), we compare key counts (2-5), letter-to-key mappings (layout-based / frequency-based / intentionally worst-case), and decoders (Trie-only, GPT-2 beam search, GPT-4o selection). We find that 3 keys + GPT-4o achieves character error rate (CER) 9.46% and word error rate (WER) 12.20%, reducing CER by 59% relative to 2 keys (CER 23.3%). At 3 keys, the key-stream entropy is 1.54 bits/char; while increasing to 5 keys improves accuracy (CER 5.4%), the marginal gains diminish. Mapping choice has a small impact under standard designs ({\Delta}CER < 0.5 pp), and even an intentionally worst mapping degrades CER by only +0.5 pp, whereas Technical sentences yield roughly twice the error rate of Business. These results suggest that, in our evaluated offline setting under a strong LM prior, 3 keys are a practical minimum for general English.

09.
arXiv (CS.CV) 2026-06-19

Addressing Detail Bottlenecks in Latent Diffusion for RGB-to-SWIR Image Translation

Latent diffusion models (LDMs) enable efficient image-to-image translation but discard fine spatial details during compression, degrading downstream perception tasks. We identify two bottlenecks: the autoencoder, which loses spatial information, and the conditioning pathway, which further degrades the source signal through naive downsampling. We propose two lightweight, backbone-agnostic fixes: a Source-Conditioned Autoencoder (SCAE) that injects high-resolution source features into the decoder via skip connections, and a Learnable Guidance Encoder (LGE) that replaces naive downsampling with a learned conditioning signal. Evaluated on RGB-to-SWIR translation for driving scenes with two denoiser backbones (U-Net and DiT), our approach improves detection mAP by up to 2x over the latent diffusion baseline, with up to 3.4x gains on small objects (COCO-small,

10.
arXiv (CS.AI) 2026-06-15

A Fixed-Point Neural Operator for Size- and Functional-Transferable Hamiltonian Prediction

arXiv:2606.14498v1 Announce Type: cross Abstract: Predicting the Kohn-Sham Hamiltonian with machine learning can accelerate density functional theory while retaining access to molecular orbitals, energy levels, and electronic-structure observables that energy-only surrogates cannot resolve. Yet element-wise agreement with the converged Hamiltonian, an implicit fixed point of the self-consistent field iteration, does not determine the occupied subspace that governs orbital energies and densities. Here we present HamEvo, a neural operator that learns the single-step self-consistent update and returns the converged Hamiltonian as its fixed point. HamEvo is pre-trained on intermediate self-consistent trajectories and calibrated at equilibrium with density-matrix supervision. Across benchmarks from MD17 to drug-like QMugs, HamEvo lowers Hamiltonian errors by 35-49% over direct-regression and deep-equilibrium baselines, and predicts QMugs HOMO and LUMO energies with mean absolute errors of 0.036 and 0.053 eV, near the 1 kcal/mol chemical-accuracy scale. Few-shot fine-tuning with only 20 reference conformations extends HamEvo to molecules of up to 122 atoms, well beyond the size range covered by pre-training. With thermal molecular-dynamics sampling, HamEvo captures temperature-dependent HOMO-LUMO gap renormalization beyond the harmonic approximation. Inference is up to 242 times faster than conventional DFT.

11.
arXiv (CS.LG) 2026-06-15

On the Influence of the Feature Computation Budget on Per-Instance Algorithm Selection for Black-Box Optimization

arXiv:2605.04954v2 Announce Type: replace-cross Abstract: Per-instance algorithm selection (PIAS) takes advantage of complementarity between a set of algorithms by deciding which algorithm to run on a given instance. This decision is based on features of the instances, which, in the context of black-box optimization (BBO), require a part of the optimization budget to be computed. This raises two questions: (a) from which fraction of the budget spent on feature computation does PIAS become worth it for BBO, and (b) which fraction of the budget optimizes the tradeoff between feature accuracy and PIAS performance. To this end, we perform a broad study where PIAS with varying sampling budgets for feature computation is compared to the single best algorithm on a broad range of algorithm selection scenarios. These scenarios consist of two portfolio sizes, three problem sets, 4 dimensionalities, and 10 target budgets. We find that PIAS is viable for the majority of tested scenarios, even when as much as a quarter of the total budget is spent on feature computation. The tradeoff for the fraction of the budget spent on feature computation to maximize the benefit of PIAS is highly dependent on the specific AS scenario. Further, on average 20 percent of PIAS loss to the virtual best solver is explained by the budget spent on feature computation, highlighting the importance of properly accounting for the feature budget.

12.
arXiv (CS.LG) 2026-06-17

Amortizing Maximum Inner Product Search with Learned Support Functions

arXiv:2603.08001v2 Announce Type: replace Abstract: Maximum inner product search (MIPS) is a crucial subroutine in machine learning, requiring the identification of a vector taken within a database (the keys) that best aligns with a given query. We propose amortized MIPS: a regression-based approach that trains neural networks to directly predict MIPS solutions, amortizing the cost of repeatedly solving MIPS for queries drawn from a known distribution over a fixed key database. Our key insight is that the MIPS value function is the support function of the set of keys, a well-studied convex function whose gradient yields the optimal key. This motivates two complementary amortized models: SupportNet, an input-convex neural network trained to regress the support function, and KeyNet, a vector-valued network that directly regresses the optimal key. SupportNet can serve as a cluster router, steering queries toward relevant database partitions, while KeyNet can be used as a drop-in replacement for the original query, fed directly to off-the-shelf indexing pipelines. Our experiments on the BEIR benchmark show that, for document embeddings, learned \SupportNet{}s and \KeyNet{}s significantly improve IVF match rates when accounting for compute effort, whether measured in FLOPs, number of probes, or wall-clock time. Our code is available at: https://github.com/apple/ml-amips.

13.
arXiv (CS.AI) 2026-06-11

Federated continual learning: A comprehensive survey on lifelong and privacy-preserving learning over distributed and non-stationary data

arXiv:2606.11272v1 Announce Type: cross Abstract: Federated Learning (FL) enables collaborative and privacy-preserving model training across distributed clients, but most existing FL systems implicitly assume data stationarity. In real-world settings-such as healthcare, industrial IoT (IIOT), cybersecurity, and smart cities-data streams are inherently non-stationary, leading classical FL methods to suffer from performance degradation, instability, and catastrophic forgetting. Continual Learning (CL) addresses learning under evolving data distributions but has been largely studied in centralized settings, overlooking key constraints of federated systems, including privacy, limited communication, and client heterogeneity. Federated Continual Learning (FCL) emerges at the intersection of FL and CL, aiming to support lifelong, adaptive, and privacy-aware learning over distributed and non-stationary data. This survey provides a comprehensive and systematic overview of FCL. We first present a formal definition of the FCL problem and clarify its distinctive characteristics. We then analyze the limitations of classical FL under non-stationary conditions, highlighting how CL principles support long-term adaptation. To organize the rapidly growing literature, we propose a multi-dimensional taxonomy of FCL approaches. Furthermore, we review representative application domains and data modalities, summarize commonly used evaluation metrics, and discuss experimental perspectives for assessing long-term performance and forgetting. Finally, we highlight key open challenges, including handling extreme heterogeneity under temporal drift, designing scalable and privacy-preserving memory mechanisms, and establishing standardized benchmarks. This survey aims to serve as a reference and a roadmap for advancing FCL toward robust and deployable real-world systems.

14.
medRxiv (Medicine) 2026-06-11

Corticospinal tract risk modifies motor recovery after minimally invasive surgery for intracerebral hemorrhage: a secondary analysis of MISTIE-III

Objective: Outcome after surgical hematoma evacuation for intracerebral hemorrhage (ICH) depends on hematoma location. As corticospinal tract (CST) integrity affects motor recovery after stroke, we hypothesized that CST integrity drives heterogeneity in surgical outcomes and investigated this in a secondary analysis of MISTIE-III participants. Methods: Risk of CST injury was categorized into four levels, based on the interaction between the CST, the hematoma, and perihematomal edema (PHE) on automatically segmented stability CT: no risk, PHE infiltration, hematoma infiltration, and complete interruption of the CST. Associations with outcome were tested using multivariable linear regression for motor National Institutes of Health Stroke Scale (NIHSS) at day 180 and ordinal regression for modified Rankin Scale (mRS) at day 365, introducing an interaction term between CST risk and treatment group. Results: Day 180 motor NIHSS was significantly lower for 'no risk' ({beta}:-3.77, [95% confidence interval [CI]: -5.8 to -1.70], p=0.0003) and 'PHE infiltration' ({beta}:-2.3, [95%CI: -3.5 to -1.1]; p=0.0002) vs. 'complete interruption'. Surgery was associated with lower Day 180 motor NIHSS in participants with hematoma infiltration ({beta}:-2.07, [95%CI: -3.8 to -0.4], p=0.016). Compared to complete interruption, 'no risk' (adjusted odds ratio [aOR]:0.27, [95%CI: 0.10 to 0.74], p=0.01) and 'PHE infiltration' (aOR:0.41, [95%CI: 0.23 to 0.74]; p=0.003) were associated with lower odds of unfavorable day 365 mRS. Surgery was associated with lower mRS in participants with no risk (aOR:0.23, [95%CI: 0.05 to 0.97, p=0.045). Interpretation: Increasing CST risk is associated with worse motor recovery (day 180) and disability (day 365). CST risk modifies the effect of the MISTIE-III procedure on motor recovery and disability.

15.
arXiv (CS.AI) 2026-06-16

Bayesian 3D Steerable CNNs: Enabling Equivariance and Uncertainty Quantification Simultaneously

arXiv:2606.15479v1 Announce Type: cross Abstract: Steerable convolutional neural networks (Steerable-CNNs) guarantee SE(3)-equivariance by parameterizing kernels as linear combinations of steerable basis functions, but their deterministic nature precludes uncertainty quantification - limiting their use in settings where confidence estimates are essential. We propose a Bayesian Steerable-CNN that places posterior distributions over the basis coefficients, yielding stochastic kernels while preserving equivariance exactly. The loss function of the model is obtained via variational inference and minimized by Bayes-by-Backpropagation. The framework admits a decomposition of predictive uncertainty into epistemic and aleatoric components. Empirically, the model attains competitive classification accuracy alongside an expected calibration error of 0.0263 and outperforms its deterministic counterpart by up to 6.17% under distributional shift induced by additive Gaussian noise. Furthermore, we leverage the model's uncertainty estimates to enhance its performance significantly, achieving a notable gain - approximately 4% higher accuracy across 84% of the test dataset. A statistically significant negative correlation between epistemic uncertainty and prediction error confirms that the learned posterior variance is semantically meaningful. The framework unifies Bayesian uncertainty quantification with the inductive bias of equivariant CNNs.

16.
medRxiv (Medicine) 2026-06-15

Unveiling the Awareness of Private Health Insurance Coverage among Healthcare Professionals in Freetown, Sierra Leone: Insights Extracted from Their Perspectives.

Our study is an assessment of the knowledge, personal coverage, and related determinants of private health insurance as revealed by healthcare professionals in Freetown, the urban capital of Sierra Leone. This study stands as a precursor for Low- and Middle-Income Countries (LMICs), like Sierra Leone, seeking to establish Universal Health Coverage (UHC) to provide healthcare access and coverage through publicly arranged risk pooling, designed to help protect against unmanageable medical costs. In parallel, such countries face significant challenges with achieving sustainable universal coverage due to limited public resources, inefficient allocation systems, uneasy reliance on out-of-pocket payments, and large struggling populations. Our research sheds particular light on how healthcare professionals view their own participation with private healthcare options. A cross-sectional, analytical study was conducted, openly recruiting individuals from various facilities in Freetown. Using the Yamane Formula, a sample size of 109 participants was calculated. STATA 14.0 was used for data analysis. Our findings revealed that 96 (88.9%) participants did not have private health insurance, while 12 (11.1%) did have private coverage. However, 105 (97.2%) reported other modes of health insurance, with only 3 (2.8%) uninsured. Notably, 97.2% expressed willingness to join a private health insurance scheme. Our study found no statistically significant associations between selected indicators (demographic or socioeconomic fac tors) and current insurance coverage among study participants. These results highlight a low prevalence and understanding of private health insurance among healthcare professionals in a representative urban center in Sub-Saharan Africa (SSA), while acknowledging high willingness to enroll. The lack of any significant determinants suggests other unexamined factors, such as cost, accessibility, or awareness, capable of influencing the adoption and implementation of a universal health program.

17.
arXiv (CS.LG) 2026-06-12

Quantum Reservoir Computing for Short-Term Power Load Forecasting in Resource-Constrained Energy Systems

arXiv:2606.12806v1 Announce Type: cross Abstract: Short-term load forecasting is essential for reliable energy management, but practical deployment on edge devices requires models that remain accurate under limited memory, finite measurement budgets, and hardware noise. This work proposes a hardware-efficient Quantum Reservoir Computing (QRC) framework for energy load forecasting, where a fixed quantum reservoir transforms temporal input windows into high-dimensional features and only a classical Elastic Net readout is trained. To reduce deployment cost, the trained readout is compressed using post-training fixed-point quantization at bit widths from 8 to 2 bits. The framework is evaluated on the Tetouan and Spain energy load datasets under exact statevector simulation, 512-shot finite sampling, and realistic hardware-noise models from IBM FakeTorino and IBM FakeMarrakesh. Results show that 6-bit readout precision preserves full-precision forecasting performance while reducing readout memory by 81.2%. Below this point, degradation becomes dataset dependent, with Tetouan showing stronger sensitivity and Spain degrading more gradually. Hardware-noise validation further shows that the trained readout transfers to noisy reservoir states without retraining. These findings support quantized QRC as a resource-aware forecasting approach for near-term quantum time-series applications.

18.
arXiv (CS.AI) 2026-06-19

ELVA: Exploring Ranking-Driven Universal Multimodal Retrieval

arXiv:2606.20280v1 Announce Type: cross Abstract: Leveraging Multimodal Large Language Models (MLLMs) via contrastive learning has become a mainstream paradigm for improving the performance of Universal Multimodal Retrieval (UMR). However, previous works have ignored the grain blindness when adapting the contrastive paradigm into retrieval tasks. Grain blindness refers to the tendency of the model to overlook grain-level information contained in the query, which is crucial for effectively handling complex queries. This stems from contrastive learning treating samples as a binary classification (positive/negative), while ignoring the different information carried by each negative sample. To address this, we argue that negatives should be treated differently according to their similarity to the positive sample, enabling the model to learn distinct grain information from each negative. In this paper, we introduce a simple but effective framework, called ELVA, a novel rule-based RL framework that mitigates grain blindness through ranking-driven MLLMs. 1) Instead of relying on reward models, we extend Reinforcement Learning with Verifiable Rewards (RLVR) to retrieval tasks, allowing the model to explore new ranking behaviors without explicit ranking labels. 2) By utilizing rule-based rewards, our approach jointly optimizes the ranking of negative samples while enlarging the similarity gap between positive and negative. To more precisely measure grain blindness, we further introduce MRBench, a new benchmark specifically designed for multi-grain query scenarios. ELVA achieves state-of-the-art results across standard retrieval benchmarks, and its notable 13.1% improvement on MRBench further demonstrates its effectiveness in alleviating grain blindness.

19.
arXiv (CS.LG) 2026-06-15

Direct/adaptive-mixture phase-gradient learning for neural-network quantum states with complex phase structure

arXiv:2606.13912v1 Announce Type: cross Abstract: Neural-network quantum states (NQS) are a leading variational tool for quantum many-body physics, yet their optimization is fragile whenever the ground state carries a non-trivial sign or complex phase structure, a situation generic to gauge fields, broken time-reversal symmetry, and fermionic statistics. We trace this fragility to the stochastic estimator of the phase gradient rather than to network expressiveness. The phase sector of the Monte Carlo energy gradient is a noisy score-function estimator; differentiating the local energy instead yields a direct estimator that is unbiased for the same phase force, has far lower variance, and requires only a separated amplitude–phase ansatz. Demonstrated on a 100-site flux ladder, a small network trained this way reaches $0.89\%$ median error, where tuned standard baselines plateau at $1.8\%$ and wider or deeper standard-gradient networks degrade from $8.4\%$ to $24.6\%$. The advantage carries over to chiral XXX chains: the direct estimator again converges to a markedly lower error than the standard one, across $\alpha$ and size; it grows with flux and vanishes in zero-flux controls. An adaptive-mixture of the two estimators is provably never worse in variance than the better endpoint at the optimal mixing coefficient, with seed-resolved diagnostics tracing much of the gain to eliminating failed runs. Estimator design thus emerges as a first-class lever for complex-valued neural quantum states.

20.
arXiv (CS.CL) 2026-06-17

Algorithmic Prompt Generation for Diverse Human-like Teaming and Communication with Large Language Models

Understanding how humans collaborate and communicate in teams is essential for improving human-agent teaming and AI-assisted decision-making. However, relying solely on data from large-scale user studies is impractical due to logistical, ethical, and practical constraints, necessitating synthetic models of multiple diverse human behaviors. Recently, agents powered by Large Language Models (LLMs) have been shown to emulate human-like behavior in social settings. But, obtaining a large set of diverse behaviors requires manual effort in the form of designing prompts. On the other hand, Quality Diversity (QD) optimization has been shown to be capable of generating diverse Reinforcement Learning (RL) agent behavior. In this work, we combine QD optimization with LLM-powered agents to iteratively search for prompts that generate diverse team behavior in a long-horizon, multi-step collaborative environment. We first show, through a human-subjects experiment, that humans exhibit diverse coordination and communication behavior in this domain. We then present a series of experiments showing that our approach captures behaviors that are difficult to observe without large-scale data collection, and a follow-up user study to show that these generated behaviors are human-like. Our findings highlight the combination of QD and LLM-powered agents as an effective tool for studying teaming and communication strategies in multi-agent collaboration.

22.
arXiv (CS.AI) 2026-06-11

CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy

arXiv:2606.12352v1 Announce Type: cross Abstract: Multi-robot collaboration allows robots to efficiently take on a wide range of tasks, from moving a couch through a doorway to assembling structures on a construction site. However, achieving such coordination in mobile multi-robot settings remains challenging: centralized methods conditioned on the combined observations of a team scale poorly with team size, and decentralized methods that train one policy per robot often require explicit alignment procedures or information sharing at inference time to overcome partial observability. Our key insight is that the visuomotor priors of pretrained vision-language-action (VLA) models should enable reactive, decentralized collaboration from each robot's local observations alone, without these inference-time assumptions. We propose CHORUS, a framework that adapts a single VLA backbone to control diverse, multi-robot teams. At inference time, each robot runs an independent copy of CHORUS, conditioned only on its own observations and a robot-identifying prompt. In real-world experiments including mobile tape measurement, library book handovers, and laundry basket lifting, CHORUS achieves a 64% point improvement over decentralized, from-scratch models, improves reactivity to teammate behavior by 40% points, and outperforms centralized baselines. Together, these results show that a shared VLA backbone is capable of achieving decentralized multi-robot collaboration, without per-robot policies or inter-robot communication at inference.

23.
arXiv (CS.LG) 2026-06-11

FlexiBrain: Resolution-Agnostic Voxel-Level Encoding for Native fMRI

arXiv:2606.11500v1 Announce Type: cross Abstract: The success of large-scale deep learning models in neuroscience is fundamentally constrained by severe data heterogeneity. Native fMRI data aggregated from diverse sources exhibit substantial variation in both spatial and temporal resolutions. Consequently, most existing frameworks rely on lengthy, rigid preprocessing pipelines that enforce uniformity across datasets. This practice introduces two critical limitations: (1) potential degradation of subject-specific anatomical information; (2) significant computational overhead, often requiring hours of processing per subject. Here, we propose FlexiBrain, a resolution-agnostic voxel-level encoding framework for native fMRI based on Mamba-JEPA. FlexiBrain defines patch sizes in real-world physical units and employs a dynamic patch resizing, thereby bypassing destructive spatial standardization while enabling direct ingestion of data in native space. We instantiate the framework using an efficient Mamba-JEPA backbone to model high-dimensional 4D fMRI signals. Across five diverse downstream neuroscience tasks, FlexiBrain consistently outperforms recent state-of-the-art methods, achieving gains of up to 12 percentage points without external data augmentation. Importantly, FlexiBrain functions as a seamless plug-in module, substantially reducing preprocessing costs and accelerating the development of robust voxel-level fMRI foundation models. Code is available at https://github.com/OneMore1/FlexiBrain.

24.
arXiv (CS.CV) 2026-06-16

Multi-HMR 2: Multi-Person Camera-Centric Human Detection, Mesh Recovery and Tracking

Most advances in human mesh recovery (HMR) have focused on pelvis-centered recovery, overlooking metric 3D localization and detection accuracy in the camera coordinate system - two key factors for real-world applications such as human-robot interaction and social scene understanding. Current evaluation protocols often ignore these aspects, emphasizing per-person, root-centered recovery rather than camera-space perception. As a result, existing approaches rely on fixed camera assumptions or handcrafted post-processing, limiting their robustness and practical deployment. We introduce Multi-HMR 2, a simple yet robust DETR-based framework for Multi-person Camera-centric Human detection, mesh Recovery, and tracking. Multi-HMR 2 predicts a scene-consistent camera together with human meshes, enabling metric 3D localization without ground-truth intrinsics. Moreover, by distilling image-based memory features from SAM2, Multi-HMR 2 extends to tracking, achieving consistent identity association without video supervision. Despite its conceptual simplicity - no handcrafted components, no video input, and no ground-truth cameras - Multi-HMR 2 achieves state-of-the-art pelvis-centered performance while substantially improving detection accuracy and metric 3D localization.

25.
arXiv (CS.LG) 2026-06-12

Self-Guidance: Enhancing Neural Codecs via Decoder Manifold Alignment

arXiv:2606.12940v1 Announce Type: cross Abstract: Neural speech codecs based on Vector-Quantized VAEs (VQ-VAEs) are core audio tokenizers for speech LLMs, yet their reconstruction fidelity is bottlenecked by quantization error. Modifying the quantizer or increasing model capacity are common fixes, but they complicate downstream language modeling. Our core idea is to align the decoder's internal feature manifolds when processing both the quantized tokens and their original continuous embeddings, using a lightweight feature-mapping loss. This requires minimal training overhead and no inference-time changes. Applied to XCodec2, self-guidance improves all reconstruction metrics, achieving state-of-the-art low-bitrate performance. Notably, it enables a 4x codebook reduction without fidelity loss, which downstream TTS experiments show significantly improves LLM-based synthesis by simplifying the token modeling space. Multiple statistical observations and visualizations corroborate the enhanced internal manifold alignment in the decoder. Extensive experiments confirm its generality across various inductive biases. Self-guidance thus establishes an efficient, broadly applicable method for high-fidelity neural audio coding.