Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-19

Fidelity bounds for adiabatic gates and other quantum operations with time-dependent dissipation

arXiv:2606.20501v1 Announce Type: new Abstract: As quantum-computing platforms are susceptible to noise, the fidelity of quantum operations is limited by decoherence. Understanding this limitation is crucial for building utility-scale quantum processors. In previous works [Phys. Rev. Lett. 129, 150504 (2022); Quantum 9, 1684 (2025)], we presented analytical formulae for the average gate fidelity of multi-qubit operations under static Markovian noise processes, including operations that temporarily leave the computational subspace. However, some quantum-computing architectures dynamically modulate qubit or coupler frequencies to implement two-qubit gates, e.g., baseband flux gates; such modulation can lead to dissipation rates varying in time. In this Letter, we therefore generalize the fidelity-reduction formulae to encompass time-dependent dissipation. Applying our generalized formula, we obtain a fidelity bound for adiabatic operations and demonstrate that flux-dependent noise sensitivity, combined with qubit-coupler hybridization, significantly reduces the fidelity of adiabatic controlled-Z (CZ) gates in superconducting quantum computers. Our work thus provides essential theoretical tools for evaluating error budgets and optimizing the design of quantum operations in tunable quantum-computing architectures, and may also find applications in quantum-sensing and quantum-communication protocols that are affected by time-dependent dissipation.

02.
medRxiv (Medicine) 2026-06-15

Iron deficiency testing among people with incident heart failure in primary care

Background: Given around 50% of people with heart failure have a degree of iron deficiency, guidelines recommend screening. It is uncertain to what extent this is done in primary care and whether testing is equitable. Aim: To report the proportion of people with incident heart failure who undergo a ferritin test within 12 months. Design and setting: Retrospective primary care cohort study using Clinical Practice Research Datalink Aurum data, between 2016 and 2021. Methods: We report the proportion of adults with an incident diagnosis of heart failure who received a ferritin test within 12 months. Multivariable logistic regression was used to examine the odds of testing based on key demographic covariates and co-morbidities. Results: Among 105,749 individuals with an incident diagnosis of heart failure (mean age 71.6 years, SD 14.3), only 35,688 (33.7%) received a ferritin test within the subsequent year. Increasing age (odds ratio 1.25 per 10-year increase, 95% CI: 1.24-1.27), female sex (male sex OR 0.86, 0.84-0.89) and Asian ethnicity (OR 1.70, 1.59-1.80) were all associated with increased odds of testing as were diagnoses of coeliac disease (OR 1.86, 1.58-2.21), type 1 diabetes (OR 1.82, 1.51-2.19) and cirrhosis (OR 1.64, 1.43-1.87). There was geographic variation in testing, even in adjusted analyses. Conclusion: In a large primary care dataset, two thirds of people with incident heart failure did not receive a ferritin test for iron deficiency within a year of diagnosis demonstrating a gap in current practice and an opportunity for improvements in service delivery.

03.
arXiv (math.PR) 2026-06-18

On two overlooked stick-breaking constructions of the normalized inverse Gaussian process

arXiv:2606.19306v1 Announce Type: new Abstract: We shed light on two alternative stick-breaking constructions of the normalized inverse Gaussian (NIG) random discrete distribution which appear to have been overlooked so far in the Bayesian nonparametric setting. The first is derived from a result in Aldous and Pitman (1998) for the conditional Brownian excursion partition, mixing over the local time at zero up to time one. The second arises as a particular case of a result in James (2013) for priors obtained by a random spatial and temporal change of the normalized generalized Gamma subordinator. Both constructions are in terms of straightforward transformations of standard random variables and can be easily generalized to provide the stick-breaking construction of any element, respectively, in a) the family of mixed Poisson-Kingman models driven by the $1/2$ stable Lévy measure and b) the family of Poisson-Gamma processes driven by the Inverse Gaussian subordinator.

04.
arXiv (CS.CV) 2026-06-11

3D-CBM: A Framework for Concept-Based Interpretability in Generative 3D Modeling

This research introduces a framework for incorporating Concept Bottleneck Models (CBMs) into 3D generative architectures to address the inherent 'semantic gap' in deep geometric learning. As deep models become central to 3D content creation, explainability shifts from a peripheral feature to a fundamental requirement for trust and accountability in safety-critical domains such as healthcare and manufacturing. CBMs provide an intrinsic interpretability solution by constraining latent representations to align with human-defined concepts, yet their application to unstructured 3D data remains largely unexplored. We design, implement, and validate a formal 3D-CBM architecture that maps raw geometric inputs, including point clouds and meshes, into a multi-tiered taxonomy of interpretable primitives and functional attributes. The framework further identifies strategic datasets, such as PartNet and ShapeNet, specialized for concept-based supervision. Experimental results from a 3D part-manipulation proof-of-concept experiment demonstrate the framework's efficacy, achieving a concept prediction accuracy of 88.8\% and a Chamfer Distance of 0.0115. Critically, the model enables precise test-time intervention, allowing for the interactive correction of structural errors. This work establishes a foundation for semantically-steerable 3D generation and invites further exploration into collaborative human-in-the-loop design systems.

05.
bioRxiv (Bioinfo) 2026-06-14

Prediction of parsimonious and temporally sensitive sets of cell fate engineering transcription factors with IMCell

Transcription factor (TF) cocktails used in cell identity reprogramming protocols have largely been developed from experimental approaches. A handful of computational approaches have been reported, though have not been widely adopted by the scientific community. To standardize their use and assess their performance, we built CompForce, a platform that integrates these tools. Using CompForce, we found that existing computational methods offer modest improvements over differential expression on both synthetic and literature-curated data, and that their lackluster and inconsistent performance could be attributed to a reliance on local centrality metrics. To improve upon these methods, we developed IMCell, a prediction method that is inspired by the influence maximization problem. Unlike existing tools, IMCell returns optimized TF sets rather than ranked TF lists. We demonstrate that IMCell vastly out-performs existing tools, and further extend it to dynamic, stepwise contexts. The tools presented here are available in the R packages CompForce and IMCell.

06.
arXiv (CS.LG) 2026-06-15

Arbitrary control over multimode wave propagation for machine learning

arXiv:2402.17750v2 Announce Type: replace-cross Abstract: Controlled multimode wave propagation can enable more space-efficient photonic processors than architectures based on discrete components connected by single-mode waveguides. Instead of defining discrete elements, one can sculpt the continuous substrate of a photonic processor to perform computations through multimode interference in two dimensions. Here we designed and demonstrated a device with a refractive index that can be rapidly reprogrammed across space, allowing arbitrary control of wave propagation. The device, a two-dimensional programmable waveguide, uses parallel electro-optic modulation of the refractive index of a slab waveguide with about $10^4$ programmable spatial degrees of freedom. We implemented neural network inference on benchmark tasks with up to $49$-dimensional vectors in a single pass, without digital pre-processing or post-processing. Theoretical and numerical analyses further indicated that two-dimensional programmable waveguides may offer not only a constant-factor reduction in device area but also a scaling benefit, with the area required growing as $N^{1.5}$ rather than $N^2$.

07.
arXiv (CS.CV) 2026-06-16

HairLRM: Strand-based Hair Modeling via Large Reconstruction Models

The fundamental limitation of traditional strand-based modeling is not simply data scarcity, but the ill-posedness of inferring complex 3D fields from 2D imagery without structural constraints. This unconstrained regression leads to catastrophic failures in resolving both global occlusion (e.g., in ponytails) and local directionality (e.g., in curls), resulting in over-smoothed, plausible-but-incorrect geometries. To resolve this, we integrate the strong geometric priors of Large Reconstruction Models (LRMs) into the strand generation pipeline. Using the LRM mesh as a structural anchor, we employ a novel Dual Orientation AutoEncoder to lift coarse geometry into high-fidelity strands. By resolving vector field singularities through latent-space optimization and surface-guided refinement, our method effectively disentangles complex topological structures, setting a new benchmark for robustness and accuracy in hair reconstruction.

08.
arXiv (CS.CV) 2026-06-17

Universal Image Restoration via Internalized Chain-of-Thought Reasoning

Image restoration seeks to recover high-quality images from degraded inputs but becomes highly ill-posed under complex, mixed degradations. While unified all-in-one models are common, their performance declines as degradation complexity increases. Recent works adopt Chain-of-Thought (CoT) reasoning for multi-round restoration using specialized modules. However, this approach faces two key limitations: (i) increased computational cost due to multi-step processing, and (ii) weak modeling of interactions between degradations during stepwise inference. We introduce CoTIR, a universal image restoration framework that internalizes CoT reasoning within a single model. Concretely, we view image restoration as a specialized subtask of image editing, which implies that a large-scale pre-trained editing model provides a more favorable optimization starting point. Building on this, we fine-tune the model for restoration and further encode structured CoT-style reasoning into the learning objective via a differentiable formulation inspired by Lagrangian optimization, enabling holistic restoration without chaining specialized restorers. To facilitate training and evaluation, we further present CoTIR-Bench, a large-scale benchmark comprising 5.2 million samples with CoT-style reasoning traces. Extensive experiments on CoTIR-Bench and broad real composite degradation scenes show that CoTIR achieves stronger perceptual quality and more competitive fidelity than both all-in-one models and multi-round restoration methods. The source code is available at https://github.com/gy65896/CoTIR.

09.
arXiv (CS.AI) 2026-06-17

FinAcumen: Financial Multimodal Reasoning via Self-Evolving Experience Memory Harness

arXiv:2606.17642v1 Announce Type: new Abstract: Financial multimodal reasoning requires agents to coordinate numerical computation, retrieval, visual interpretation, and temporal grounding across heterogeneous evidence sources. Existing tool-augmented agents improve execution fidelity, yet remain largely stateless across episodes, repeatedly rediscovering reasoning strategies and failure patterns. In high-stakes financial settings, this leads to unreliable tool routing, noisy retrieval, and hallucination-prone reasoning. We present FinAcumen, a financial reasoning agent framework centered on selective experience memory for tool-augmented multimodal reasoning. FinAcumen accumulates financially grounded reasoning experience from prior trajectories, distilling successful strategies and failure-derived cautionary rules into a persistent memory bank. During inference, retrieved experiences condition reasoning only when semantic relevance exceeds a calibrated threshold, while irrelevant memory is explicitly suppressed through a fallback mechanism. A deterministic financial tool environment further grounds numerical computation, retrieval, visual decoding, and answer verification.Across four financial multimodal reasoning benchmarks, FinAcumen consistently improves a frozen 8B vision-language model over finance-specialized models and approaches leading proprietary general-purpose models. Further analysis shows that selective experience activation improves reasoning reliability under retrieval uncertainty. Our code is anonymously available at https://anonymous.4open.science/r/FinAcumen

10.
medRxiv (Medicine) 2026-06-17

Efficacy of a Gamified Digital Platform for Substance Use Education and Overdose Prevention Among College Students: a Pilot and Feasibility Study

Background: For US young adults aged 18-25 in the 2018-2024 period, fentanyl was involved in 78.2% of the 44,020 unintentional or undetermined-intent overdose deaths, most often co-involving stimulants and other non-opioid substances. While fatal overdose rates in this age group have fallen to their lowest recorded level, emergency medical services-attended non-fatal overdose events have reached record highs, shifting the decisive variable toward bystander recognition and response. College students report near-universal alcohol education but minimal education on the substances actually driving overdose mortality. Methods: We conducted a single-group pre-post evaluation of the DopaGE Portal, a gamified, mastery-based digital platform covering cocaine, MDMA, benzodiazepines, and opioid overdose response, deployed at a public university (UNL) and a multi-campus volunteer network (TACO). Paired pre/post surveys (N=42) measured self-efficacy (7 items; primary), behavioral intentions, risk perception, and knowledge/attitudes on 5-point scales, plus four factual knowledge questions. Paired t-tests, exact McNemar tests, and Benjamini-Hochberg correction across eight primary tests were applied. Institutional naloxone distribution at UNL was tracked as an ecological behavioral outcome. A mandated high-school cohort (N=94) provided supplementary acceptability data. Results: Self-efficacy increased from 2.82 to 4.46 (d=2.00, 95% CI 1.46-2.55; adjusted p

11.
arXiv (CS.CL) 2026-06-15

C2-Faith: Benchmarking LLM Judges for Causal and Coverage Faithfulness in Chain-of-Thought Reasoning

Large language models (LLMs) are increasingly used as judges of chain-of-thought (CoT) reasoning, yet it remains unclear whether they can reliably assess process faithfulness rather than merely answer plausibility. We introduce C2-Faith, a benchmark built from PRM800K that explicitly decomposes faithfulness into two complementary dimensions: causality (whether each step logically follows from prior context) and coverage (whether essential intermediate inferences are present). Using controlled perturbations, we construct examples with known causal error positions by replacing a single step with a logically inconsistent variant, and with controlled coverage deletions at varying rates, enabling direct measurement against reference labels. We evaluate three frontier LLM judges across three tasks: binary causal detection, causal step localization, and coverage scoring. Our results reveal that judge reliability is highly task-dependent, with no single model dominating across settings. While models often detect that an error exists, they struggle to accurately localize it, indicating a substantial gap between detection and attribution. Moreover, all judges systematically overestimate reasoning completeness, assigning high coverage scores even when substantial portions of intermediate reasoning are missing. These findings expose fundamental limitations of LLM judges in process-level evaluation and highlight the need for more reliable and calibrated methods when using LLMs to assess reasoning quality.

12.
arXiv (CS.LG) 2026-06-12

BrainPro: Towards Large-scale Brain State-aware EEG Representation Learning

arXiv:2509.22050v2 Announce Type: replace Abstract: Electroencephalography (EEG) reflects underlying brain states, whose activities are distributed across brain regions and manifest as spatial patterns on the scalp. Learning these spatially structured, state-related patterns requires consistent spatial representations across datasets. However, existing EEG foundation models are typically based on self-attention, which does not preserve location-specific information and struggles to align signals recorded with different channel configurations. Moreover, brain states contain both shared and state-specific regional activity, suggesting that learning neurophysiologically plausible, state-aware representations can complement the shared representations targeted by current models and improve downstream decoding. To address these limitations, we propose BrainPro, a large EEG model that combines a retrieval-based spatial learning mechanism for cross-layout spatial alignment with a brain state-decoupling module that learns both shared and state-specific representations through parallel encoders and region-aware reconstruction. Pre-trained on a large EEG corpus, BrainPro achieves state-of-the-art performance across nine public BCI datasets spanning emotion, motor, speech, stress, mental disease, and attention tasks. Analyses of spatial filters, channel-drop robustness, and encoder contributions further validate the effectiveness of its spatial alignment and state-aware pathways. These results show that BrainPro achieves improved interpretability of learned spatial patterns and produces representations that benefit diverse EEG decoding tasks.

13.
arXiv (CS.CL) 2026-06-17

Variable-Width Transformers

Scaling model size, specifically depth and width, has driven significant progress in transformer-based language models. However, most architectures maintain a constant width across all layers, allocating a fixed parameter and computation budget evenly despite different layers potentially playing distinct computational roles. In this work, we empirically investigate nonuniform capacity allocation across network depth by proposing a $\times$-shaped >

14.
arXiv (math.PR) 2026-06-18

On a class of reflected McKean-Vlasov Stochastic Differential Equations with jumps

arXiv:2606.18433v1 Announce Type: new Abstract: This paper investigates a class of reflected McKean-Vlasov Stochastic Differential Equations driven by both Brownian motion and a compensated Poisson random measure. We establish the existence and uniqueness of solutions and provide moments estimates for the state processes.

15.
Nature (Science) 2026-06-10

Building user-driven climate adaptation products

Climate adaptation products have traditionally been developed using a supply-driven model reliant on available climate information, leading to usability gaps1–4. To better meet user needs, the climate services field has recognized a need to shift towards a demand-driven model emphasizing co-production, that is, user-driven, scientifically informed products created through shared knowledge practices1–5. However, co-production can be challenging, especially for researchers unfamiliar with the approach or for digital and software-based products with complex user needs2,5–8. User-centred design, from the human–computer interaction field, offers a process that could complement co-production approaches to product development, yet its potential remains underexplored2. Here we show how user-centred design can be integrated into, and strengthen, co-production approaches for building user-driven climate adaptation products. Through a systematic review of the co-production and user-centred design literature, we identify key processes, mechanisms and best practices for both approaches. Our findings offer practical guidance for researchers and propose an integrated approach for developing climate adaptation products that are useful, usable and used. A systematic review and analysis shows how user-centred design can be integrated into, and strengthen, co-production approaches for building user-driven climate adaptation products.

16.
arXiv (CS.CL) 2026-06-19

Gender Bias in LLM Hiring Decisions: Evidence from a Japanese Context and Evaluation of Mitigation Strategies

Large language models (LLMs) are increasingly deployed in hiring workflows, yet most research on gender bias in LLM hiring decisions has focused on English-language, Western-format resumes. This study examines whether pro-female gender bias extends to a Japanese corporate context and evaluates two practical mitigation strategies. Using a counterfactual resume design with 60 Japanese rirekisho-format resumes, 12 name pairs selected on linguistically grounded gender-signal criteria, and five state-of-the-art LLMs (Claude Sonnet 4.6, GPT-4o, DeepSeek-V3, Gemini 2.5 Flash, Llama 3.3 70B), we conducted 43,200 API calls across baseline, prompt instruction, and privacy filter conditions. A crossed random-effects linear mixed model confirms a significant pro-female bias across all five models, replicating Western findings in a non-Western context. A prompt-level gender-neutrality instruction produces no meaningful reduction in bias. A name-reliance analysis formally identifies the candidate name as the primary gender channel: removing the name from the prompt reduces the female effect by nearly its full magnitude. An unexpected incompatibility between the privacy filter and GPT-4o's content safety filter, resulting in a 42% refusal rate, highlights a practical deployment challenge for name anonymization in LLM-assisted recruitment pipelines.

17.
bioRxiv (Bioinfo) 2026-06-11

SPARK: A Systems-level Computational Framework for Reconstructing Transcriptomic State Organisation in Lung Adenocarcinoma

Lung adenocarcinoma (LUAD) exhibits substantial molecular heterogeneity, which complicates tumour stratification and limits the ability of mutation-centric models to capture tumour behaviour and predict patient outcomes. This study investigates whether coordinated transcriptomic programs can provide a systems-level representation of tumour states. Bulk RNA-sequencing data from the TCGA-LUAD cohort were analysed to reconstruct pathway-level transcriptomic organisation using a stability-optimised network framework (SPARK). This analysis identified eight transcriptomic modules representing coordinated biological processes active across tumours. Module activity scores were subsequently used to derive a composite Transcriptomic Risk Score through elastic-net Cox proportional hazards modelling. The resulting risk score showed a significant association with overall survival in the discovery cohort and improved prognostic discrimination beyond clinical variables. An independent evaluation in the CPTAC-LUAD cohort confirmed the prognostic signal and preserved risk stratification across patient groups. Unsupervised clustering of module activity further revealed three transcriptomic patient groups characterised by distinct biological programs, genomic alteration patterns, and survival outcomes. Single-cell analysis also demonstrated that the identified transcriptomic modules reflect coordinated organisation of the tumour-immune-stromal ecosystem across cellular compartments. Together, these findings suggest that LUAD heterogeneity can be organised into coordinated transcriptomic programs with measurable clinical relevance, providing a systems-level framework for representing tumour molecular states.

18.
arXiv (CS.CL) 2026-06-18

Improving Medical Communication using Rubric-Guided Counterfactual Recommendations

Text-based telemedicine increasingly relies on lightweight patient feedback, however, such feedback primarily reflects perceived communication quality rather than medical accuracy. We introduce an LM-guided counterfactual recommendation pipeline that discovers and refines interpretable communication features such as tone, personalization, actionability and completeness in addressing patient concerns, without interfering with the medical content. These features are used together with patient-doctor interaction metadata to estimate positive feedback. At inference time, the system searches over low-cost ordinal feature changes and recommends minimal communication changes predicted to increase the probability of positive feedback, while independent auditor models test whether these gains generalize beyond the selection model. Across interactions, recommendations yield a mean +6.41% gain in predicted positive feedback probability under independent auditors, and are non-negative for 93.31% of recommendations. These results suggest that small, interpretable communication changes can capture most predicted gains while preserving the doctor's control over medical reasoning and final wording.

19.
arXiv (quant-ph) 2026-06-16

Arbitrarily Configurable Wavefunctions via Imaginary Gauge Phase Imprint in Non-Hermitian Lattices

arXiv:2603.28153v2 Announce Type: replace-cross Abstract: We propose a general framework, termed the imaginary gauge phase imprint (IGPI), which enables engineering arbitrarily configurable wavefunctions with exact solutions and self-organization dynamics in any-dimensional non-Hermitian lattices under imaginary gauge fields. Using this method, we uncover a novel phase with exact critical wavefunctions, dubbed the skin critical phase (SCP), which is marked by unconventional localization, topological-skin, and dynamical characteristics. Furthermore, we validate the IGPI by imprinting and visualizing complex fractal states with Sierpinski-carpet and Koch-snowflake profiles, as well as exotic super-moire and 3D-moire states in regular lattices. Our work not only offers fresh insights into non-Hermitian critical and fractal physics, but also provides a rigorous paradigm for controlling and visualizing wavefunction patterns using the IGPI in engineered non-Hermitian systems.

20.
arXiv (CS.AI) 2026-06-12

AI-Automation Tooling in Computer Engineering Education: Mixed-Methods TAM/UTAUT Evidence for a General Acceptance Attitude

作者:

arXiv:2606.12424v1 Announce Type: cross Abstract: As generative AI and low-code workflow platforms become routine in software practice, a key educational question is whether the next generation of computer engineers will accept these tools as useful, usable, and worthy of sustained engagement. This paper reports a mixed-methods, cross-sectional study of undergraduate computer engineering students' acceptance of AI automation tooling, instantiated through the open-source platform n8n across three identically scripted workshops in Thailand (n = 103). A 12-item, five-point Likert instrument mapped to six TAM/UTAUT constructs - Performance Expectancy (PE), Effort Expectancy (EE), Behavioral Intention (BI), Self-Efficacy (SE), Hedonic Motivation (HM), and Output Quality (OQ) - was complemented by inductive thematic analysis of open-ended feedback. Analyses combined ordinal reliability estimation, bootstrap confidence intervals, non-parametric tests, multiple-comparison-controlled correlations, polychoric dimensionality diagnostics, a common-method-bias check, and between-session comparisons. Acceptance was favorable across all six constructs with large effect sizes, with PE emerging as the strongest construct and HM as the weakest. Dimensionality diagnostics further revealed that canonical TAM/UTAUT sub-facets collapsed into a single general acceptance factor in this short-form post-workshop context, a finding with important methodological and theoretical implications. Qualitative themes converged with the quantitative profile regarding usefulness and enthusiasm but diverged on output quality, revealing a small yet articulate reliability-skeptical minority. The findings support the curricular adoption of AI automation tooling in undergraduate computing education and identify three theory-grounded instructional levers: instruction-sequencing scaffolds, self-efficacy supports, and trust-calibration interventions.

21.
arXiv (quant-ph) 2026-06-16

Analytical solution of the Schr\"{o}dinger equation with $1/r^3$ and attractive $1/r^2$ potentials: Universal three-body parameter of mixed-dimensional Efimov states

arXiv:2601.19517v2 Announce Type: replace-cross Abstract: We study the Schr\"{o}dinger equation with $1/r^3$ and attractive $1/r^2$ potentials. Using the quantum defect theory, we obtain analytical solutions for both repulsive and attractive $1/r^3$ interactions. The obtained discrete-scale-invariant energies and wave functions, validated by excellent agreement with numerical results, provide a natural framework for describing the universality of Efimov states in mixed dimension. Specifically, we consider a three-body system consisting of two heavy particles with large dipole moments confined to a quasi-one-dimensional geometry and resonantly interacting with an unconfined light particle. With the Born-Oppenheimer approximation, this system is effectively reduced to the Schr\"{o}dinger equation with $1/r^3$ and $1/r^2$ potentials, and manifests the Efimov effect. Our analytical solution suggests that, for repulsive dipole interactions, the three-body parameter of the mixed-dimensional Efimov states is universally set by the dipolar length scale, whereas for attractive interactions it explicitly depends on the short-range phase. We also investigate the effects of finite transverse confinement and find that our analytical results are useful for describing the Efimov states composed of two polar molecules and a light atom.

22.
arXiv (CS.CV) 2026-06-18

Reasoning as Intersection: Consensus-Frame Alignment for Visual Focus in Video-MLLMs

Reinforcement learning has improved the reasoning ability of large language models, but applying outcome-only rewards to video multimodal large language models (Video-MLLMs) provides limited guidance on which visual evidence should support the answer. Inspired by multisensory integration, where consistent cues can enhance the salience and reliability of perceptual estimates, we introduce Consensus Frame GRPO (CF-GRPO), a temporal-annotation-free process-level reward framework for evidence-aware video reasoning. CF-GRPO constructs a consensus frame prior from intrinsic video cues, including temporal coverage, scene-transition cues, and query-conditioned visual relevance. It then computes a model-side frame-use score from visual and response representations and optimizes their agreement through the Consensus Frame Reward (CFR). With salience-aware sparse aggregation and distribution sharpening, CFR provides a high-contrast reward signal without requiring human temporal annotations. Experiments show that VideoCFR achieves competitive performance across complex video reasoning benchmarks and improves several metrics over representative Video-MLLM and RL baselines, while the consensus prior provides an interpretable view of the evidence frames emphasized during training. The implementation is available at https://github.com/1Pansy/VideoCFR.

23.
arXiv (CS.CV) 2026-06-24

Dual-Branch Cross-Projection Debiasing through Diffusion-based Disentanglement

Foundation models trained on biased datasets often rely on spurious correlations between target labels and non-causal attributes, resulting in poor generalization on minority groups. Bias mitigation remains challenging due to two fundamental issues. First, when group labels are unavailable, existing group-unsupervised methods typically infer spurious attributes implicitly from model behavior, making it difficult to identify spurious factors that are semantically aligned with real-world biases. Second, even with pseudo spurious supervision, most existing debiasing methods follow a single-branch design that operates within a single shared feature space, where target and spurious attributes are intrinsically entangled. To address the first challenge, we introduce Confidence-guided Bias Concept Mining (CBCM), which leverages diffusion-disentangled, semantically grounded concept representations to identify reliable spurious attributes without attribute annotations. To address the second challenge, we propose Dual-branch Cross-projection Debiasing (DCD), a prompt-tuning framework that separates target and spurious representations into two branches and explicitly removes spurious information through cross null-space projection while preserving target-relevant semantics. Extensive experiments on four benchmark datasets show that our method achieves state-of-the-art worst group accuracy among group-unsupervised approaches, while tuning at most 0.22% of the model parameters. The source code is available in the supplementary materials.

24.
arXiv (CS.CV) 2026-06-24

Modality-Aware Out-of-Distribution Detection for Multi-Modal Action Recognition

The incorporation of additional modalities into action recognition models increases their performance across a wide range of settings. However, how this additional information can contribute to making the models more robust remains underexplored, particularly for the case of multi-modal out-of-distribution (OOD) detection. While methods exist that regularize the multi-modal training process with OOD detection in mind, they still apply off-the-shelf OOD detectors designed for the uni-modal case during inference, discarding important information. Based on an interesting relationship we find between the multi-modal and uni-modal predictions, we propose to use this signal to build a post-hoc detector explicitly designed for the multi-modal scenario. We combine this new source of information with a feature-space score, which detects off-manifold samples in the multi-modal space, and normalize them by the multi-modal logits. In doing so, the proposed hybrid detector is compatible with existing training-time approaches and consistently improves performance. Experiments on a wide range of established datasets from the MultiOOD benchmark show that, on average, our approach outperforms the state of the art. Our results show the importance of explicitly considering the different modalities at inference time for multi-modal OOD detection.

25.
arXiv (CS.CV) 2026-06-16

LUCID: Learned Undersampling-Adaptive Consistency-Guided Inference with Deterministic Flow Matching for Sparse-View CT Reconstruction

Sparse-view CT reduces radiation dose and scanning time by acquiring fewer projection views, but angular undersampling makes reconstruction severely ill-posed, causing streak artifacts, structural blurring, and loss of fine details. Existing supervised methods are often tied to specific sampling settings, whereas generative methods may introduce anatomically inconsistent hallucination-like structures under severe undersampling. We propose Lucid, a sparsity-adaptive, consistency-guided reconstruction framework based on a Flow Matching generative prior for sparse-view CT. Lucid is trained only on high-quality CT images to learn a continuous transport between a Gaussian distribution and the high-quality CT image distribution, independent of view sampling. During inference, the sampling sparsity level is explicitly incorporated to adapt the generative trajectory of a single pretrained model. Specifically, Lucid constructs a degradation-matched initial state by sparsity-weighted fusion of the sparse-view FBP image and Gaussian noise, performs sparsity-modulated Flow Matching updates, and applies projection-domain data-consistency correction after each prior update. Experiments under multiple sparse-view settings show that Lucid achieves stable reconstruction performance across different sampling densities, improves image quality and structural fidelity, and reduces the risk of hallucination-like structures in generative sparse-view CT reconstruction.