Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-16

A spectral audit framework reveals task-dependent aperiodic reliance across EEG and ECG deep learning

arXiv:2606.08583v2 Announce Type: replace Abstract: Deep learning on physiological time series is interpreted through domain-specific features – oscillatory rhythms in EEG, morphological complexes in ECG – yet these signals sit atop a broadband aperiodic 1/f-like envelope that covaries with arousal, age, and pathology. We introduce a spectral audit framework combining aperiodic/periodic decomposition, phase-preserving Fourier interventions, sham controls, and simulation validation. Aperiodic reliance was task-dependent and architecture-general: across six neural architectures, flattening drops exceeded 0.42 balanced-accuracy points for sleep-wake classification, reached 0.07-0.13 for clinical abnormality detection, and remained minimal for motor imagery. Six of seven EEG foundation models showed FDR-significant aperiodic reliance on clinical EEG; age/sex and recording-era controls reduced but did not eliminate the effect. Applying the audit to PTB-XL ECG revealed neural drops of 0.32–0.36 persisting after demographic matching, confirming this confound class extends beyond EEG. Aperiodic controls should become standard for interpretable physiological time-series deep learning.

02.
arXiv (CS.CV) 2026-06-16

Timestep Rescheduling in Diffusion Inversion

Diffusion inversion, which maps images back to the Gaussian latent space of a diffusion model, is a critical task for image reconstruction and editing. While DDIM enables fast deterministic inversion, it inherently introduces deviations that accumulate into noticeable inversion errors. Existing methods often address this by solving a fixed-point problem but largely overlook how the selection of the diffusion timestep in the noise scheduler influences inversion fidelity. In this work, we reveal that the deviation scale in diffusion inversion is strongly dependent on the timestep size, and exhibits a parabolic trend, with larger errors concentrated at both small and large timesteps. Based on this finding, we propose a simple yet effective nonuniform timestep scheduler that integrates a global rescaling with a local dynamic programming based rescheduling, enabling a strategic allocation of computational effort that minimizes the overall inversion error and preserves higher inversion accuracy. Our method serves as an off-the-shelf enhancement for existing inversion techniques and requires no extra parameters or computational overhead. Through extensive experiments, we verify that integrating our scheduler consistently boosts the performance of existing inversion methods, achieving superior results in image reconstruction and editing.

03.
arXiv (CS.CV) 2026-06-15

Encoder Winners Do Not Reliably Transfer Across VLA Backbone Scale: A Frozen-Backbone Grafting Diagnostic

Vision-language-action (VLA) policies typically inherit their vision encoder from upstream VLM releases, but it is unclear whether an encoder choice validated on a small VLA transfers to a larger backbone. We introduce a frozen-backbone grafting diagnostic: the vision tower of a released VLA is replaced by a candidate encoder under a fixed protocol (adaptive average pooling, LayerNorm, and a single trainable linear projector), with the language model and action expert frozen. Across four encoders, two LIBERO suites, two backbones (SmolVLA-450M and $\pi_{0.5}$-3.3B), and two-to-three seeds per cell (40 main grafting runs plus native, LoRA, pooling, and zero-/shuffled-image controls, all scored by offline action MSE), the small-backbone winner does not reliably select the large-backbone top tier: SigLIP is best on SmolVLA across both suites, while on $\pi_{0.5}$ DINOv2-small leads the spatial suite and the object suite is a seed-sensitive near-tie band; three of the four backbone-suite comparisons (and 11 of 12 seed-level cells) support backbone-dependent rankings. The grafting wrapper is itself non-neutral with opposite sign across backbones (+45-56% MSE on the SmolVLA native tower, -50-52% on $\pi_{0.5}$), so all conclusions are conditional on the fixed grafting protocol. We position frozen grafting as a cheap target-backbone diagnostic to run before committing to an encoder at scale, not as a closed-loop deployment claim.

04.
bioRxiv (Bioinfo) 2026-06-15

Inferring Cell Fate Trajectories in Time-Resolved Metabolic RNA Labeling data

Single-cell RNA sequencing provides high-resolution snapshots of cellular states but lacks direct information about transcriptional dynamics. Metabolic RNA labeling addresses this limitation by distinguishing newly synthesized RNA, offering insight into the direction of cell state changes, and providing valuable information when attempting to recover the underlying continuous dynamics from static snapshots of cell distributions. However, existing trajectory inference methods do not fully exploit this additional signal. Here, we propose FLOWSATATE, a framework for single-cell trajectory inference that leverages time-resolved RNA labeling within an Optimal Transport setting. We model cell dynamics as a gradient flow in an inferred potential landscape parameterized by a neural network, integrating both total and labeled RNA across time points. The learned potential enables identification of key genes and transcription factors driving cell fate decisions and supports prediction of future cellular states. We benchmark our approach on its ability to generalize unseen data and recover coherent trajectories. We also apply it to study colorectal cancer response to demethylation treatment as well as neuronal differentiation of embryonic stem cells.

05.
arXiv (CS.LG) 2026-06-15

Neither Parallel Nor Sequential: How DiffusionGemma Actually Commits Tokens

arXiv:2606.14620v1 Announce Type: new Abstract: Open diffusion language models are marketed as parallel, non-autoregressive decoders, yet the order in which a shipped checkpoint actually commits its tokens is almost never measured. We instrument DiffusionGemma 26B, a masked discrete-diffusion mixture-of-experts model built on Gemma 4, hooking its sampler's accept step to record which canvas positions commit, when, and at what confidence. Across a 686-prompt, six-regime probe suite we find that its decoding is neither parallel nor block-autoregressive: it follows a partial left-to-right commit bias whose apparent strength depends almost entirely on the granularity at which you look. Order is weak token by token and strengthens smoothly as the analysis is coarsened, so the model's "block size" turns out to be an artifact of the measuring ruler rather than the architecture. The model commits in large simultaneous batches, leaving much of the within-batch order genuinely undefined rather than merely unobserved. The behaviour is regime-dependent: structured JSON is committed in essentially arbitrary order, and a position's commit confidence tracks correctness on mathematical reasoning but carries no signal on factual recall. Commitment is aggressive, finishing in a short late burst well inside the step budget, while task accuracy matches the model's autoregressive Gemma-4 sibling. Beyond these findings, our central contribution is methodological: measuring decoding order honestly demands handling trailing-EOS padding, within-regime confounding, commit non-monotonicity, block-size sensitivity, and large commit-batch ties, each of which can otherwise manufacture a decoding-order result that is not really there.

06.
Nature (Science) 2026-06-17

A prototype differential atom interferometer for fundamental physics

Gravitational waves and ultralight dark matter are among the most compelling frontiers in fundamental physics, motivating proposals for very-long-baseline atom interferometerssuch as AION1, MAGIS2, AICE3 and AEDGE4 that aim to detect at frequencies at which ground-based5 and space-borne6 laser interferometers lose sensitivity. Very-long-baseline atom interferometers look for signals by comparing the quantum phase evolution of widely separated atomic ensembles interrogated by a common laser. However, their performance depends critically on suppressing noise sources, particularly laser phase noise. The experimental validation of such noise rejection remains an important challenge. Here we demonstrate a prototype differential atom interferometer based on the single-photon clock transition of fermionic 87Sr. Thus, we obtain a gradiometer configuration with a species intrinsically suited to kilometre-scale and space-baseline operation. The instrument operates at the standard quantum limit7 with no excess noise beyond atom shot noise. The differential configuration maintains quantum-limited sensitivity in the presence of several radians of artificially injected laser phase noise per shot, which emulates the conditions expected in a very-long-baseline atom interferometer. We also demonstrate the recovery of coherent oscillatory signals across a broad frequency range under fully phase-randomized conditions, a capability that is inaccessible to a single interferometer operating in the same regime. These results provide an experimental validation of the noise-immune measurement principle underlying very-long-baseline atom interferometers and mark an important step towards next-generation quantum sensors for gravitational-wave detection and searches for ultralight dark matter8,9. A prototype differential atom interferometer operates at the standard quantum limit with no excess noise beyond atom shot noise, achieving performance in line with the specifications for future long-baseline atom interferometers.

07.
arXiv (CS.AI) 2026-06-12

MiniMax Sparse Attention

arXiv:2606.13392v1 Announce Type: new Abstract: Ultra-long-context capability is becoming indispensable for frontier LLMs: agentic workflows, repository-scale code reasoning, and persistent memory all require the model to jointly attend over hundreds of thousands to millions of tokens, yet the quadratic cost of softmax attention makes this untenable at deployment scale. We introduce MiniMax Sparse Attention (MSA), a blockwise sparse attention built upon Grouped Query Attention (GQA). A lightweight Index Branch scores key-value blocks and independently selects a Top-k subset for each GQA group, enabling group-specific sparse retrieval while maintaining efficient block-level execution; the Main Branch then performs exact block-sparse attention over only the selected blocks. Designed around a principle of simplicity and scalability, MSA is deliberately streamlined, making it straightforward to deploy efficiently across a broad range of GPUs. To translate sparsity into practical speedups, we co-design MSA with a GPU execution path that uses exp-free Top-k selection and KV-outer sparse attention to improve tensor-core utilization under block-granular access. On a 109B-parameter model with native multimodal training, MSA performs on par with GQA while reducing per-token attention compute by 28.4x at 1M context. Paired with our co-designed kernel, MSA achieves 14.2x prefill and 7.6x decoding wall-clock speedups on H800. Our inference kernel is available at: https://github.com/MiniMax-AI/MSA. A production-grade natively multimodal model powered by MSA has been publicly released at: https://huggingface.co/MiniMaxAI/MiniMax-M3.

08.
arXiv (CS.AI) 2026-06-11

GILT: An LLM-Free, Tuning-Free Graph Foundational Model for In-Context Learning

arXiv:2510.04567v3 Announce Type: replace-cross Abstract: Graph Neural Networks (GNNs) are powerful tools for processing relational data but often struggle to generalize to unseen graphs, giving rise to the development of Graph Foundational Models (GFMs). However, current GFMs are challenged by the extreme heterogeneity of graph data, where each graph can possess a unique feature space, label set, and topology. To address this, two main paradigms have emerged. The first leverages Large Language Models (LLMs), but is fundamentally text-dependent, thus struggles to handle the numerical features in vast graphs. The second pre-trains a structure-based model, but the adaptation to new tasks typically requires a costly, per-graph tuning stage, creating a critical efficiency bottleneck. In this work, we move beyond these limitations and introduce Graph In-context Learning Transformer (GILT), a framework built on an LLM-free and tuning-free architecture. GILT introduces a novel token-based framework for in-context learning (ICL) on graphs, reframing classification tasks spanning node, edge and graph levels in a unified framework. This mechanism is the key to handling heterogeneity, as it is designed to operate on generic numerical features. Further, its ability to understand class semantics dynamically from the context enables tuning-free adaptation. Comprehensive experiments show that GILT achieves stronger few-shot performance with significantly less time than LLM-based or tuning-based baselines, validating the effectiveness of our approach. Our code is available at: https://github.com/yiming421/inductnode/.

09.
arXiv (CS.CV) 2026-06-24

Training-Time Optical Priors for Wireless Capsule Endoscopy Classification: Hemoglobin-Aware Input Fusion with Cross-Vendor Evaluation

Background. RGB-trained classifiers for wireless capsule endoscopy (WCE) conflate hemoglobin contrast with bile staining and illumination falloff, limiting sensitivity to small-vessel vascular findings such as Lymphangiectasia. We introduce a physics-informed framework that injects an analytic, Monte-Carlo-inspired hemoglobin prior into a standard classifier purely at training time – to our knowledge the first use of an explicit optical light-transport prior in WCE classification. Methods. On Kvasir-Capsule (47,238 frames, 43 patients, 11 evaluable classes; patient-disjoint split) we test, across 6 seeds against an RGB-only EfficientNet-B0 baseline: (i) a 5-channel input-fusion variant feeding the prior P_blood alongside RGB; (ii) a distillation variant that runs on plain 3-channel RGB at inference; and (iii) a three-stream extension adding a temporal Transformer and an autoencoder-residual stream. We replicate across ResNet-18 and ConvNeXt-Tiny and report cross-vendor zero-shot transfer on the public Galar cohort. Results. Input fusion lifts cross-seed macro-AUC 0.760 -> 0.783 (5/6 seeds positive); distillation reaches 0.773; the three-stream model reaches 0.804 (+0.044 over baseline, paired DeLong p < 1e-4). Lymphangiectasia AUC rises 0.238 -> 0.337, sign-consistent across all 6 seeds. A four-variant ablation reveals a parameterization-mechanism boundary: only the spatial-channel form lifts. Cross-vendor zero-shot on Galar retains ~60% of the ConvNeXt-Tiny lift.

10.
arXiv (CS.CV) 2026-06-18

Cosmos 3: Omnimodal World Models for Physical AI

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI – effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3. The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3.

11.
bioRxiv (Bioinfo) 2026-06-23

EnrichViz: An Interactive R Shiny Application for Visualization of Pathway Enrichment Results from Omics Data

Authors:

Pathway and functional enrichment analysis is a cornerstone of omics data interpretation, enabling researchers to map differentially expressed proteins or genes onto curated biological processes, signaling cascades, and molecular functions. While tools such as Ingenuity Pathway Analysis (IPA), g:Profiler, and Enrichr are widely used to generate ranked enrichment results, translating these tabular outputs into clear, publication-ready figures remains a time-consuming step that typically requires custom scripting and familiarity with visualization libraries, a significant barrier for researchers without a computational background. Here we present EnrichViz, a self-contained, browser-based R Shiny application that enables interactive, code-free visualization of pathway and functional enrichment results from quantitative proteomics, transcriptomics, and metabolomics experiments. EnrichViz accepts three standard CSV files as input, a normalized abundance matrix, a sample annotation or metadata file, and enrichment results from any platform that exports tabular output, and produces six complementary, publication-ready visualizations: bar and bubble plots for ranking enriched terms by significance, chord diagrams for exploring pathway-molecule connectivity, clustered heatmaps for displaying Z-score normalized expression patterns across experimental groups, and boxplots or violin plots for examining the abundance distribution of individual proteins, genes, or metabolites. The application supports both raw p-values and pre-transformed -log10(p) values through automatic detection, and all plot parameters are adjustable in real time through a graphical sidebar. Every figure can be exported as a high-resolution PNG file at 300 dpi. EnrichViz is implemented in R using the Shiny, ggplot2, pheatmap, and circlize packages, and is freely available at https://rgmilian.shinyapps.io/EnrichViz/

12.
arXiv (CS.AI) 2026-06-24

DTT-BSR+: A Generative-Regression Cascade for Music Source Restoration

arXiv:2606.24127v1 Announce Type: cross Abstract: Music source restoration (MSR) requires jointly addressing source unmixing and the inversion of non-linear production effects. Current methods struggle to achieve accurate target signal reconstruction while maintaining semantic consistency. To address this limitation, we propose DTT-BSR+, a two-stage cascade MSR system that decouples distribution fitting from signal reconstruction into separate stages. A generative DTT-BSR separator in the first stage produces stems matching the prior of clean sources, and a modified Demucs network in the second stage enhances the first stage output using time-domain and multi-resolution spectral losses. DTT-BSR+ improves multi-mel signal-to-noise ratio (MMSNR) over the single-stage DTT-BSR across all stems, and surpasses the state-of-the-art X-LANCE MSR system on five stems. We also reveal through Fréchet Audio Distance (FAD) decomposition an implicit trade-off between signal reconstruction accuracy and semantic distribution fitting across stems.

13.
arXiv (CS.CV) 2026-06-17

Where Should Action Generation Begin? A Learnable Source Prior for Generative Robot Policies

Generative robot policies typically begin action generation from an observation-independent standard Gaussian distribution, leaving the choice of source distribution underexplored. This work asks a simple question: where should action generation begin? We propose LeaP, a Learnable source Prior that replaces the standard Gaussian with a proprioception-conditioned diagonal Gaussian over action chunks. Parameterized by a lightweight MLP, LeaP jointly predicts the mean and state-adaptive variance of the source distribution, while keeping the downstream generator architecture and inference solver unchanged. This design provides an observation-informed yet stochastic initialization, allowing the generator to focus on precise action refinement rather than transporting samples from an uninformed noise source. On 15 RoboTwin manipulation tasks, LeaP achieves an average success rate of 81.6%, outperforming four representative baselines – including deterministic-source methods, a no-prior counterpart, and a diffusion-bridge policy – by 6.5 to 25.5 percentage points. The same prior consistently improves both flow-matching and diffusion-bridge generators, while using fewer parameters and converging faster. The advantage carries over to real-world deployment, where LeaP attains the best performance. These results suggest that the source distribution is an independent and reusable design axis for generative robot policies, complementary to the choice of generative dynamics.

14.
arXiv (CS.CL) 2026-06-16

Attention, not scale, drives human-AI alignment in multimodal language prediction

Humans routinely draw on visual context to predict upcoming words. To what extent current vision-language models produce comparable behaviour is unclear. Here we placed five state-of-the-art pretrained systems side-by-side with 600 human participants in a web-based Visual-World Paradigm. On each of 100 six-second movie clips, models and participants received either text only or synchronised video and text and judged how likely a specified target word was to appear next; human eye movements were tracked throughout. Adding visual context increased model-human alignment in predictability ratings across all architectures (average Delta r = 0.18) with no impact of parameter size. When visual context was informative, transformer attention significantly increased alignment. Attention maps from two transformer models corresponded with human gaze, explaining up to 70% of the inter-participant variance when the scene contained informative cues. Notably, cross-modal attention reliably tracked anticipatory human fixations on semantic cues. These results suggest that current transformer-based vision-language models can approximate human behaviour exploiting visual context during language prediction - and that selective attention to informative cues, not sheer model scale, is the principal driver of this alignment.

15.
arXiv (CS.CV) 2026-06-24

HANCLIP: A Family of Hyperbolic Angular Negation Vision Language Models

Vision-Language Models (VLMs) are typically pre-trained on large-scale image-text datasets to capture semantic correspondences between visual content and natural language. However, they remain surprisingly brittle to negation: models often rely on shallow word co-occurrence and are easily distracted by misleading or irrelevant textual cues, even when their overall retrieval or classification performance is strong. Moreover, directly finetuning on negation data can interfere with previously acquired knowledge, causing noticeable degradation on standard vision-language benchmarks. To tackle these issues, this work introduces HANCLIP (Hyperbolic + Angular + Negation), a family of VLMs that explicitly restructures the embedding space to encode "what an image is not" alongside "what it is." HANCLIP is trained on a compact set of 20,000 image-text quadruplets and combines a hyperbolic formulation, which models hierarchical semantic relations and asymmetries, with an angular triplet objective that drives systematic separation between negated descriptions and their corresponding positives. This geometry-aware design strengthens negation sensitivity while preserving the global structure of pretrained representations, rather than overwriting them. Extensive experiments across multiple vision-language tasks show that HANCLIP delivers consistent gains on the negation-focused NegBench benchmark, while maintaining competitive or improved performance on standard classification and image-text retrieval benchmarks. The framework is model-agnostic and can be plugged into CLIP, LongCLIP, SmartCLIP, and HiMo-CLIP without large-scale retraining, demonstrating that a carefully designed geometric objective can substantially extend the reasoning capabilities of existing VLMs using only modest additional data.

16.
arXiv (CS.LG) 2026-06-18

P$^2$CE: Model-Agnostic Plausible Pareto-Optimal Counterfactual Explanations

arXiv:2606.18418v1 Announce Type: new Abstract: The increasing use of machine learning algorithms in social applications has raised concerns about fairness and transparency, leading to the development of counterfactual explanations. These explanations supports individuals to understand and potentially alter unfavorable decisions in areas such as loan applications, job selections, and more, by providing actionable changes to input features that would lead to a desired outcome. Existing methods often struggle to balance feasibility, plausibility, and computational efficiency. To address this, we introduce P$^2$CE, an algorithm for generating plausible Pareto-optimal counterfactual explanations, offering users a diverse set of optimal trade-offs between different notions of feasibility. P$^2$CE employs an auxiliary isolation forest outlier detector to ensure that explanations are in accordance with the data distribution and leverages SHAP values to obtain optimal results with short computing times, regardless of the underlying model. Our algorithm was empirically evaluated on three datasets, demonstrating superior performance in terms of both solution quality and computational efficiency compared to related techniques.

18.
arXiv (CS.LG) 2026-06-15

Adaptive Nucleus Truncation for Long-Form Reasoning

arXiv:2606.13982v1 Announce Type: cross Abstract: Sampling plays an important role in long-form language-model reasoning. Over thousands of decoding steps, small changes in the candidate token set can compound into different reasoning trajectories, stability profiles, and final answers. Existing truncation methods such as top-$p$, min-$p$, and fixed top-$n\sigma$ sampling improve over unrestricted sampling, but they rely on fixed thresholds that cannot adapt to changes in entropy, task difficulty, training stage, or generation budget. We introduce Adaptive Nucleus Truncation Sampling (ANTS), which extends top-\(n\sigma\) sampling from a fixed decoding rule into an adaptive rollout-control mechanism for long-form generation. ANTS selects standardized neighborhoods around the maximum logit before temperature scaling, adapts the truncation width using an entropy-conditioned controller, and retains a no-truncation fallback arm to stabilize training when truncation becomes unsafe. On a 33B-total / 4B-active sparse Mixture-of-Experts reasoning model, ANTS improves average performance over percentage-based benchmarks by +1.9, +3.8, and +5.2 points at 8K, 16K, and 32K generation budgets, respectively. The strongest gains appear on instruction following and mathematical reasoning, with IFBench improving by more than 10 points at 32K and AIME 2025 improving by 7 points. Code generation reveals an important budget interaction. On Codeforces, ANTS trails the baseline at 8K, but reverses this gap and substantially improves ELO at 16K and 32K. These results suggest that sampler design should be treated not just as a decoding hyperparameter, but as part of how we stabilize and scale long-budget reasoning.

19.
medRxiv (Medicine) 2026-06-17

Perceptions of aging well among older adults with heart failure: insights from a qualitative study

Background: Heart failure (HF) is a prevalent and often debilitating cardiovascular condition among older adults, frequently accompanied by multimorbidity, functional limitations, and the need to age in place. Traditional models of successful aging emphasize disease absence and preserved function, yet most individuals with HF live with ongoing symptoms and chronic health challenges. How older adults with HF define aging well, particularly across different socioeconomic contexts, remains underexplored. Objectives: To explore how older adults with HF conceptualize aging well and to identify perceived facilitators and barriers across more and less resourced New York City neighborhoods. Methods: We conducted semi-structured interviews with 20 adults diagnosed with HF residing in Manhattan and Brooklyn neighborhoods classified by 2019 United States Census data. Interviews were guided by Rowe and Kahn's model. Transcripts were analyzed using an inductive-deductive thematic approach and interpreted in alignment with the Healthy People 2030 framework. Results: Participants had a mean age of 69 years; 50% identified as Black and 50% were women. Despite functional limitations, 65% reported aging well. Five themes emerged: maintaining physical function, maintaining cognitive function, sustaining social relationships, avoiding pain, and promoting overall well-being. Avoiding pain and promoting well-being extended beyond traditional models. Neighborhood context shaped priorities, with financial stability emphasized in more affluent areas and social cohesion prioritized in less affluent communities. Conclusions: Older adults with HF frequently perceive themselves as aging well despite chronic illness, reframing successful aging beyond disease avoidance. These findings support a patient-centered, place-informed model of aging well with implications for healthcare delivery and policy.

20.
arXiv (CS.CL) 2026-06-15

Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation

The remarkable achievements of Large Language Models (LLMs) have led to the emergence of a novel recommendation paradigm – Recommendation via LLM (RecLLM). Nevertheless, it is important to note that LLMs may contain social prejudices, and therefore, the fairness of recommendations made by RecLLM requires further investigation. To avoid the potential risks of RecLLM, it is imperative to evaluate the fairness of RecLLM with respect to various sensitive attributes on the user side. Due to the differences between the RecLLM paradigm and the traditional recommendation paradigm, it is problematic to directly use the fairness benchmark of traditional recommendation. To address the dilemma, we propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM). This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes1 in two recommendation scenarios: music and movies. By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations. Our code and dataset can be found at https://github.com/jizhi-zhang/FaiRLLM.

21.
arXiv (quant-ph) 2026-06-15

Local correlations in long-range dual-unitary kicked Hamiltonian chains

arXiv:2606.13857v1 Announce Type: new Abstract: Many-body Floquet models with exact space–time symmetry, such as the kicked Ising spin chain (KIC), provide natural examples of systems with dual-unitary dynamics. The requirement of exact space–time symmetry is, however, highly restrictive, as it permits only nearest-neighbor interactions. Based on a pair of Hadamard matrices, we construct a wide family of dual-unitary kicked spin chains with long-range interactions. We show that local two-point correlations in such models propagate along the light-cone edges \( |n| = r|t| \), where \(r\) is the interaction range, and can be derived analytically for operators with local support. This approach is illustrated using the example of a kicked Ising spin chain with next-to-next-neighbor interactions.

22.
arXiv (CS.AI) 2026-06-24

ASALT: Adaptive State Alignment for Lateral Transfer in Multi-agent Reinforcement Learning

arXiv:2606.24601v1 Announce Type: new Abstract: Multi-agent reinforcement learning (MARL) addresses the problem of training multiple agents that pursue collaborative, competitive, or mixed objectives. Prior work has investigated transfer learning between source and target domains in MARL; however, the majority of existing approaches impose the constraint that the dimensionalities of the observation space and the global state space must be identical across domains. In this paper, we introduce a method that explicitly accommodates mismatched state-space dimensionalities between source and target domains. The proposed approach, ASALT, incorporates both observation-level and state-level adapters that map the target-domain observations and global states into a shared embedding space, thereby enabling more effective transfer of knowledge across both actors and critics. These adapters can generate embeddings that support efficient strategy transfer across heterogeneous domains. Experimental results on multiple configurations in standard benchmark environments demonstrate that ASALT surpasses existing baselines in terms of sample efficiency and global return in cooperative settings, but its effectiveness depends on the degree of mismatch between source and target domains. Furthermore, our findings indicate that ASALT mitigates negative transfer, which frequently constitutes a major obstacle when transferring policies between domains with differing observation and action spaces.

23.
arXiv (CS.CL) 2026-06-19

Vero: An Open RL Recipe for General Visual Reasoning

What does it take to build a visual reasoner that works across charts, science, spatial understanding, and open-ended tasks? The strongest vision-language models (VLMs) suggest that broad visual reasoning is within reach, yet their closed data and reinforcement learning (RL) pipelines make their gains difficult to study, reproduce, or extend. We introduce Vero, a family of fully open VLMs that match or exceed existing open-weight models across diverse visual reasoning tasks. We scale RL data and rewards across six broad task categories, constructing Vero-600K, a 600K-sample dataset from 59 datasets, and designing task-routed rewards that handle heterogeneous answers. Across VeroEval, our 30-benchmark suite, Vero-600K outperforms existing RL datasets under controlled comparisons. Applied to five starting models, Vero variants gain 2.9-5.4 points on average over their initial models. Notably, Vero-Qwen3I-8B, trained on the Instruct model, surpasses Qwen3-VL-8B-Thinking by 3.8 points on average without additional distillation. Systematic ablations reveal that different task categories elicit distinct reasoning patterns and that broad gains depend on learning them jointly rather than in isolation. All data, code, and models are publicly available.

24.
arXiv (CS.CV) 2026-06-16

Open-World Video Segmentation

While video segmentation has advanced rapidly on short clips and closed-set benchmarks, open-world video segmentation remains largely unexplored. The challenge is twofold: (1) existing methods are not designed to support object discovery and identity maintenance in long videos of dynamic ego-motion, and (2) existing evaluation protocols rely on a rigid 1:1 matching that unfairly penalizes semantically valid predictions with mismatched granularity. To address both gaps, we introduce Savvy, a practical and strong system for zero-shot open-world long-horizon video segmentation. Savvy combines hierarchical mask discovery, deferred admission, and track consolidation to support persistent object discovery, safe track promotion, and stable long-range identity maintenance. We further propose OGA, a granularity-aware evaluation suite for open-world video segmentation. Built on a Granularity-Agnostic (GA) matching protocol, OGA relaxes conventional 1:1 matching to an n:1 mapping, but still enforces temporal rigor by detecting support discontinuities through sever points and scoring each reference object through its dominant coherent fragment. This prevents fragmented or flickering support from being over-rewarded while enabling GA-adapted metrics and structural diagnostics: identity persistence (IP), and identity concentration (IC). On VIPSeg, we show that standard 1:1 evaluation substantially underestimates open-world methods, whereas GA evaluation recovers much of their suppressed performance. On the more realistic long-horizon benchmarks: ScanNet and HM3D, Savvy consistently outperforms strong baselines across both classical and proposed metrics, including STQ, VPQ$_\infty$, IP and IC. Together, these results establish a practical benchmark and a strong baseline for open-world long-horizon video segmentation.

25.
arXiv (quant-ph) 2026-06-24

Syndrome aware mitigation of logical errors

arXiv:2512.23810v2 Announce Type: replace Abstract: Broad applications of quantum computers will require error correction (EC). However, hardware roadmaps indicate that physical qubit numbers will remain limited in the foreseeable future, leading to residual logical errors that constrain the size and accuracy of achievable computations. Recent work suggested logical error mitigation (LEM), which applies known error mitigation (EM) methods to logical errors, eliminating their effect at the cost of a runtime overhead. We introduce syndrome-aware logical error mitigation (SALEM), which mitigates logical errors conditioned on the error syndromes measured during error correction. The runtime overhead of SALEM is exponentially lower than that of LEM schemes which do not make use of syndrome data, enabling substantially larger circuit volumes that can be executed accurately. Compared to the routinely used combination of error correction and syndrome rejection (post-selection), SALEM increases the size of reliably executable computations by orders of magnitude. In the practical setting where space and time overheads are fixed and error reduction methods are compared by their resulting estimation errors, we observe a surprising phenomenon: SALEM, which tightly combines EC with EM, can outperform physical EM even above the standard fault-tolerance (pseudo) threshold. Thus, SALEM can make use of EC in regimes of physical error rates where EC is commonly deemed useless.