Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
medRxiv (Medicine) 2026-06-16

Risk beliefs, intensive digital information and demand for a new preventative health product in public clinics: Evidence from an experiment in Zimbabwe.

Demand for preventative health care is weak in low-income settings. In a field experiment in a low-income, high-risk setting, we evaluated whether demand for a new bio-medical preventative health product, offered free at public health clinics, responds to digital feedback-based intensive information on health risks and benefits of prevention along with a clinic referral enabling access to the product. In our sample of women aged 18-24 years, we find a large correction in risk beliefs sustained six months after the intervention. Against a background of very low baseline usage, within six months we find a 5.8 percentage point increase in take up of the prevention method, a level of uptake which is very large relative to the control group. Reassuringly, there is no meaningful difference in up-take amongst baseline high- risk and low-risk individuals.

02.
arXiv (CS.CL) 2026-06-12

AI SciBrief as a Gateway to Research: A Framework for Onboarding Students into New Research Areas

Students at all levels of higher education face a significant barrier in the form of information overload, which often paralyzes the initial stages of the research process and suppresses motivation. In response, this article introduces a pedagogical framework that leverages AI SciBrief, a platform powered by a Large Language Model (LLM) designed to automatically generate digests of scientific trends. We describe how this multidisciplinary tool - with initial coverage in finance, medicine, and education - can be integrated into the curriculum to overcome this "entry barrier." The framework provides concrete methodologies for utilizing these digests to facilitate topic selection for term papers, accelerate literature reviews for dissertations, and enable postgraduate students to continuously monitor emerging trends. We conclude that AI SciBrief functions as a "gateway to research" effectively reducing students' cognitive load and empowering them to transition more rapidly from information searching to knowledge creation.

03.
arXiv (CS.LG) 2026-06-16

Mean-Field Parallel Decoding for Discrete Diffusion Language Models

arXiv:2606.15805v1 Announce Type: new Abstract: Discrete diffusion language models enable parallel token generation, offering a pathway to low-latency decoding. However, selecting tokens independently by marginal confidence limits effective parallelism: tokens that appear reliable in isolation can form incompatible configurations when several positions are updated at once. We introduce a training-free decoding framework that coordinates these parallel updates. At each forward pass, the method assigns a commit score to each masked position and refines these scores using pairwise interactions derived from the model's predictive distributions. A variational relaxation yields a simple fixed-point update that suppresses conflicting simultaneous commitments within a single forward pass. This mechanism allows the decoder to commit more tokens in parallel while maintaining competitive generation quality. The method is lightweight, requires no auxiliary model or retraining, and drops into existing diffusion decoding pipelines without modification. Experiments on reasoning and code-generation benchmarks show consistent improvements in the quality-latency trade-off.

04.
arXiv (CS.LG) 2026-06-17

Conditional Attribution for Root Cause Analysis in Time-Series Anomaly Detection

arXiv:2604.17616v3 Announce Type: replace Abstract: Root cause analysis (RCA) for time-series anomaly detection is critical for the reliable operation of complex real-world systems. Existing explanation methods often rely on unrealistic feature perturbations and ignore temporal and cross-feature dependencies, leading to unreliable attributions. We propose a conditional attribution framework that explains anomalies relative to contextually similar normal system states. Instead of using marginal or randomly sampled baselines, our method retrieves representative normal instances conditioned on the anomalous observation, enabling dependency-preserving and operationally meaningful explanations. To support high-dimensional time-series data, contextual retrieval is performed in learned low-dimensional representations using both variational autoencoder latent spaces and UMAP manifold embeddings. By grounding the retrieval process in the system's learned manifold, this strategy avoids out-of-distribution artifacts and ensures attribution fidelity while maintaining computational efficiency. We further introduce confidence-aware and temporal evaluation metrics for assessing explanation reliability and responsiveness. Experiments on the SWaT and MSDS benchmarks demonstrate that the proposed approach consistently improves root-cause identification accuracy, temporal localization, and robustness across multiple anomaly detection models. These results highlight the practical utility of conditional attribution for explainable anomaly diagnosis in complex time-series systems. Code and models are available at: https://github.com/dfki-av/Conditional-Attribution-for-Root-Cause-Analysis-in-Time-Series-Anomaly-Detection.

05.
arXiv (CS.CV) 2026-06-16

Towards Next-Generation Healthcare: A Survey of Medical Embodied AI for Perception, Decision-Making, and Action

Foundation models have demonstrated impressive performance in enhancing healthcare efficiency across a wide range of medical applications. Nevertheless, their limited ability to perceive, understand, and interact with the physical world significantly constrains their effectiveness in real-world clinical workflows, where safety-critical decision-making and physical execution are tightly coupled. Recently, embodied artificial intelligence (AI) has emerged as a promising physical-interactive paradigm for intelligent healthcare, enabling agents to operate in complex medical environments. As research in this area rapidly expands, understanding how intelligent agents function as integrated, end-to-end systems in clinical environments becomes increasingly critical. However, existing surveys on medical embodied AI largely emphasize individual aspects or functional components, lacking a unified system-level organization of the field. To support and consolidate recent advances, we systematically survey the core components of medical embodied AI, with a particular emphasis on the coordinated integration of perception, decision-making, and action. We further review representative medical applications and relevant datasets, and we analyze the major challenges encountered in real-world clinical practice. Finally, we discuss key directions for future research in this rapidly evolving field. The associated project can be found at https://github.com/VMVLab/Medical_Embodied_AI_Paper_List.

06.
arXiv (CS.AI) 2026-06-19

Protein Representation Learning with Secondary-Structure and Energy-Filtered Hydrogen-Bond Graphs

arXiv:2606.19374v1 Announce Type: cross Abstract: Graph-based representations are widely used in protein modeling, yet many existing approaches rely primarily on sequence adjacency or geometric proximity, which only partially reflect the principles governing protein folding. Proteins instead adopt complex three-dimensional conformations organized around secondary structure elements, such as $\alpha$-helices and $\beta$-sheets, which encode recurring local motifs and stabilizing hydrogen-bond interactions. In this work, we introduce a secondary-structure-aware graph neural network for protein representation learning. Residue-level node representations are augmented with secondary structure assignments, and graph edges are constructed from hydrogen-bond interactions filtered by their energetic strength. This design enables the model to capture both local structural context and long-range couplings that are central to protein stability and function. We evaluate the proposed approach on commonly used protein benchmarks and observe consistent improvements over existing graph-based methods. In addition, the resulting graph representations offer enhanced biological interpretability, as the learned connectivity aligns with established structural motifs. These findings suggest that incorporating secondary structure and energy-filtered hydrogen-bond topology provides an effective inductive bias for protein representation learning. The code is released at https://github.com/mohamedmohamed2021/SSProNet

07.
arXiv (CS.LG) 2026-06-15

On the Geometry and Optimization of Polynomial Convolutional Networks

arXiv:2410.00722v3 Announce Type: replace Abstract: We study convolutional neural networks with monomial activation functions. Specifically, we prove that their parameterization map is regular and is an isomorphism almost everywhere, up to rescaling the filters. By leveraging on tools from algebraic geometry, we explore the geometric properties of the image in function space of this map - typically referred to as neuromanifold. In particular, we compute the dimension and the degree of the neuromanifold, which measure the expressivity of the model, and describe its singularities. Moreover, for a generic large dataset, we derive an explicit formula that quantifies the number of critical points arising in the optimization of a regression loss.

08.
arXiv (CS.CL) 2026-06-16

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

We introduce Nemotron 3 Ultra, a 550 billion total and 55 billion active parameter Mixture-of-Experts Hybrid Mamba-Attention language model. We pre-trained Nemotron 3 Ultra on 20 trillion text tokens, then extended the context length to 1M tokens, and post-trained using Supervised Fine Tuning (SFT), Reinforcement Learning (RL), and Multi-teacher On-Policy Distillation (MOPD). Nemotron 3 Ultra is our most capable model yet, employing multiple key technologies - LatentMoE, Multi Token Prediction (MTP), NVFP4 pre-training, multi-environment RLVR, MOPD, and reasoning budget control. Nemotron 3 Ultra achieves up to ~6x higher inference throughput as compared to state-of-the-art publicly available LLMs while attaining on-par accuracy. The state-of-the-art accuracy, high inference throughput, and 1M token context length make Nemotron 3 Ultra ideal for long-running autonomous agentic tasks. We open-source the base, post-trained, and quantized checkpoints, along with the training data and recipe on HuggingFace.

09.
bioRxiv (Bioinfo) 2026-06-22

PhaseWY: A pipeline for haplotype phasing, sex chromosome identification and extraction of sex-limited sequences

Sex chromosomes are central to many ecological and evolutionary processes. Evidence has accumulated that sex chromosome systems vary extensively in age, turnover and transitions, motivating renewed efforts to study the diversity of sex chromosome systems across the tree of life. However, successful genomic detection of sex chromosomes depends on several factors, including the size and divergence time, background genetic diversity, and the number of sequenced females and males. In addition, technical challenges associated with sequencing and analysing the sex-limited Y/W chromosome remain. Here, we present PhaseWY, an automated Snakemake pipeline that uses whole-genome sequencing data from multiple female and male individuals to identify sex-chromosomal regions and extract the corresponding Y/W sequences. PhaseWY (i) detects sex differences in alignment depth, (ii) applies read-based and statistical haplotype phasing, (iii) identifies sex-linked regions using haplotype clustering, and (iv) subsets autosomal, X/Z- and Y/W-linked variants for downstream analyses. We applied PhaseWY to simulated data to benchmark factors influencing sex-linkage detection and successful extraction of Y/W-linked variants. To demonstrate its practical utility, we further applied PhaseWY to the neo-sex chromosome system in Alauda larks (Alaudidae) and performed a range of downstream analyses demonstrating the scope of applications of the PhaseWY output. We conclude that PhaseWY provides an easy-to-use and reproducible tool for population-genomic analyses in non-model organisms, with particular importance for advancing our understanding of sex-chromosome evolution.

10.
medRxiv (Medicine) 2026-06-15

Primary care practitioners preconception health literacy and information-seeking: A cross-sectional survey.

Background Parental health before pregnancy influences maternal and child outcomes. Primary care professionals, including general practitioners [GPs], midwives, and naturopaths, can provide preconception care, yet many report limited knowledge and difficulty accessing relevant information. This study described Australian GPs, midwives, and naturopaths preconception health literacy, including knowledge and ability to access information. Methods Between July and September 2022, Australian GPs, midwives, and naturopaths completed a 32-item online cross-sectional survey. Participants were recruited through professional associations, and data were analysed using descriptive and inferential statistics Results Participants (N=373) included naturopaths (40.7%), GPs (32.4%), and midwives (26.8%). Reported barriers to clinician health literacy including lack of preconception care resources (25.5%), and limited clinician knowledge (23.6%). The proportion identifying limited clinician knowledge differed significantly between professions (GP: 31.4%; midwives: 23.0%; naturopaths: 17.8%; p=0.030). The highest level of accurate knowledge regarding preconception exposures was for pre-pregnancy obesity (82.7%), while low birth weight was the most accurately identified preconception outcomes (83.7%). Incorrect responses were most common for maternal multivitamin use as an exposure (28.3%) and childhood leukaemia as an outcome (26.3%). Differences between professions were strongest for infant outcomes, with moderate associations observed for shoulder dystocia (V=.2355), precipitous labour (V=.2173), macrosomia (V=.2060), labour dystocia (V=.2018) and cryptorchidism (V=.2018). Discussion Preconception health literacy varies across primary care professions. Clinicians require greater access to targeted resources and education tailored to their differing scopes of practice and experience. Improving clinician preconception health literacy may strengthen consistent evidence-based care and support better maternal, child, and long-term family health outcomes.

11.
arXiv (CS.AI) 2026-06-16

Running hardware-aware neural architecture search on embedded devices under 512MB of RAM

arXiv:2606.14824v1 Announce Type: cross Abstract: This document proposes a novel approach to hardware-aware neural architecture search (HW NAS) that considers the resources available on the computing platform running it, enabling its execution on various embedded devices. The presented HW NAS produces tiny convolutional neural networks (CNNs) targeting low-end microcontroller units (MCUs), typically involved in the Internet of Things (IoT) or wearable robotics, opening new use cases. A gateway could run it to tailor CNNs' architecture on the acquired data without using external servers, ensuring privacy. The proposed technique achieves state-of-the-art results in the human-recognition tasks on the Visual Wake Word dataset, a standard TinyML benchmark, on several embedded devices.

12.
arXiv (CS.CL) 2026-06-16

Dealing with Annotator Disagreement in Hate Speech Classification

Hate speech detection is a crucial task, especially on social media where harmful content can spread quickly. Collecting social media content (tweets etc.) to train machine learning models is easy, but detecting and categorizing hate speech can be difficult due to the inherently subjective nature. This subjectivity leads to frequent disagreement among annotators, particularly for subtle or borderline content. Traditional approaches either discard non-consensus samples or force a ''gold standard'' through expert adjudication, ignoring valuable information about uncertainty and diverse human perspectives. We examine the largely overlooked problem of annotator disagreement in hate speech classification and evaluate a range of aggregation methods, including majority voting, ordinal strategies (minimum, maximum, and mean), and analyze their impact across binary, 4-class, and 6-class classification tasks. In addition, we leverage annotators' perceived hate speech strength scores to explore regression-based and hybrid modeling approaches. Among others, we show that filtering non-consensus samples results in over-optimistic results and that the perceived strength provides a complementary signal that enhance classification performance. Finally, we establish new state-of-the-art results for hate speech detection in Turkish tweets, and demonstrate that annotator disagreement, when properly modeled, is a valuable resource for building more robust and reliable systems.

13.
arXiv (CS.LG) 2026-06-16

The Reverse Telescoping Coordinate System for Positive Definite Matrices: Geometry, Computation, and Generative Modeling

arXiv:2606.15442v1 Announce Type: cross Abstract: We design a new unconstrained coordinate system where a $p\times p$ symmetric positive definite (SPD) matrix $\Theta$ is represented by a reverse telescoping map $\Theta(x)=\rm{RT}(x)$, with $x=(v,d,r)\in\mathbb{R}\times\mathbb{R}^{(p-1)}\times\mathbb{R}^{p(p-1)/2}$, representing respectively the log volume or log determinant; and the shape, as encoded by log relative diagonal scales and partial covariances among the nodes. This construction results in important properties not available in other charts, e.g., matrix logarithm, such as Jacobian depending on only the log-determinant. A useful feature of our construction is $x$ contains a lossless symbolic representation of both the matrix and its inverse. Many important computations involving a matrix and its inverse can be performed in $O(p^2)$ in the transformed domain, while it is the rendering of results in matrix forms (on demand) that must incur an $O(p^3)$ cost. Moreover, two unit-determinant matrices in the transformed domain can be joined by a straight line with pathwise unit determinant. For generative modeling, this allows designing a split volume-shape flow model trained by conditional flow matching for transporting the shape over the unit-determinant path, with a separate one-dimensional flow for transporting the volume or the determinant. The forbidding SPD constraint, tamed thus into a powerful guiding force, leads to the surprising insight that it is in some sense easier to design a volume-normalized shape flow for SPD compared to the unconstrained $\mathbb{R}^{p\times p}$, with no intrinsic notion of volume to aid normalization, unlike the determinant of SPD matrices. We apply our construction for up to $p=200$ in generative modeling of SPD matrices on a difficult synthetic bimodal target, and in generating brain connectivity networks by models trained on fMRI data; as well as in intrinsic diffusion on the SPD manifold.

14.
bioRxiv (Bioinfo) 2026-06-21

ReSeT: a taxonomy-aware reference genome selection tool

Motivation: Reference genome composition determines which taxa a profiling pipeline can detect and distinguish, and becomes of critical importance for high-resolution profiling where taxonomic boundaries begin to blur. Existing selection tools optimize within-taxon representativeness but disregard discrimination across taxa, leaving open whether explicitly accounting for inter-taxon discrimination during selection improves profiling. Results: Here we present ReSeT, a facility-location-based reference genome selection tool that operates on arbitrary pairwise distance matrices, extended with a tunable inter-taxon discrimination term and per-genome selection cost, and solved by local search. We benchmark ReSeT against established selection methods on three viral datasets spanning varying degrees of taxonomic ambiguity. On the high-ambiguity SARS-CoV-2 datasets, appropriately tuned ReSeT selections matched or exceeded the strongest alternatives in terms of profiling accuracy, whereas on the low ambiguity IAV dataset VSEARCH remained dominant. Interestingly, we find that the novel inter-taxon discrimination term contributed weakly, indicating that ReSeT's facility-location formulation and selection cost drives ReSeT's performance. We further propose a novel taxonomic ambiguity index, computable from ReSeT's inputs, that summarizes the taxonomic ambiguity of reference genomes and aligns with where ReSeT improves over existing selection methods. Availability and implementation: ReSeT is implemented in Python ([≥]3.10) and is freely available under the MIT license. The source code is available on GitHub at https://github.com/JaspervB-tud/ReSeT and ReSeT can also be installed directly from the Python Package Index (PyPI) via pip install reset-bio.

15.
arXiv (CS.AI) 2026-06-18

Why SWAVE May Not Be All You Need:A Concept-Evolution Retrospective on Complex-Valued Recurrent Language Models

arXiv:2606.18324v1 Announce Type: cross Abstract: SWave is a complex-valued recurrent language model (169.26M parameters, D=384, L=16, T=2048) trained on FineWeb-Edu using 2xH100 NVL. It was designed around three founding premises: that representing language as complex waves rather than real-valued numbers enables richer information encoding; that a Cayley-parameterised unitary transition provides a mathematical guarantee against state decay or explosion; and that a hidden state which rotates rather than shrinks preserves signal integrity over arbitrarily long contexts. The core of SWave evolved substantially across three development phases. The Resonance Head was found to structurally admit imaginary-channel collapse as a global loss minimum (a failure mode we term cos-domination collapse) and was superseded by an untied head with independent real and imaginary embedding tables from the Phase-Associative Memory (PAM) architecture. This resolved the degenerate minimum and enabled stable 200,000-step training (best-step PPL 22.0 at step 89,861). ComplexNorm and the Wave Propagation Scan proved load-bearing throughout all three phases and were retained to the final architecture. ProtectGatedScan was reframed as a structural prior rather than a learned behaviour. The four multi-scale retention concepts showed no measurable improvement under controlled evaluation and were found non-load-bearing. The ComplexGatedUnit was superseded by a real-valued squared-ReLU channel mixer with fewer parameters. The auxiliary training objectives showed no benefit once structural constraints were resolved. The investigation yields a formal characterisation of cos-domination collapse, a parallel scan with a log-space backward pass for numerical stability, six transferable engineering principles for complex-valued recurrent training, and a plan-to-code traceability methodology for catching structural divergences that conventional test suites miss.

16.
arXiv (CS.LG) 2026-06-16

Priority-Aware Shapley Value

arXiv:2602.09326v2 Announce Type: replace Abstract: Shapley values are widely used for model-agnostic data valuation and feature attribution, yet they implicitly assume contributors are interchangeable. This can be problematic when contributors are dependent (e.g., reused/augmented data or causal feature orderings) or when contributions should be adjusted by factors such as trust or risk. We propose Priority-Aware Shapley Value (PASV), which incorporates both hard precedence constraints and soft, contributor-specific priority weights. PASV is applicable to general precedence structures, recovers precedence-only and weight-only Shapley variants as special cases, and is uniquely characterized by natural axioms. We develop an efficient adjacent-swap Metropolis-Hastings sampler for scalable Monte Carlo estimation and analyze limiting regimes induced by extreme priority weights. Experiments on data valuation (MNIST/CIFAR10) and feature attribution (Census Income) demonstrate more structure-faithful allocations and a practical sensitivity analysis via our proposed "priority sweeping".

17.
arXiv (CS.AI) 2026-06-19

Cross-Dataset, Age, and Gender Generalization: A Comprehensive Analysis of Fine-Tuning Strategies for Low-Resource Children's ASR

arXiv:2606.19791v1 Announce Type: cross Abstract: The challenge associated with recognizing dysarthric speech primarily arises from pronounced acoustic variability attributed to impaired articulatory precision. Past research has demonstrated improved recognition through the use of hybrid DNN/HMM sequence discriminative training. This paper presents a comprehensive investigation of various combinations of acoustic features tailored to different Acoustic Models, offering suitable feature selections for each. The incorporation of Pitch features notably improved recognition performance, especially for sentence recognition tasks involving dysarthric speech. Through a systematic examination of the TORGO database, we have demonstrated the potential to enhance the performance of the state-of-the-art Factorized Time Delay Neural Network (F-TDNN) model for recognizing dysarthric speech. Our methods, implemented with the F-TDNN model, resulted in a 4.65\% relative improvement in isolated word recognition and a 4.63\% relative improvement in sentence recognition for dysarthric speech, compared to previous research. This improvement effectively compensates for speech variability, attributable to our deliberate selection of the number of overlapping frames between consecutive training example chunks.

18.
medRxiv (Medicine) 2026-06-15

Nocturnal Respiratory Rate and Variability Predict Long-term Mortality in Stable Outpatients with Cardiovascular Disease

Background: Respiratory rate (RR) predicts short-term mortality in acute care settings, yet its prognostic significance in clinically stable outpatients remains poorly defined. Objectives: To determine whether the median and variability of nocturnal respiratory rate (NRR) are independently associated with long-term cardiovascular and all-cause mortality in outpatients with cardiovascular disease. Methods: We analyzed overnight chest belt waveforms from elective polysomnography in 5,679 older adults with cardiovascular disease enrolled in the Sleep Heart Health Study (SHHS). NRR was quantified at 30-second resolution, and per-subject median NRR and within-night variability (standard deviation) were derived. Kaplan-Meier survival analysis and Cox proportional hazards models were used to evaluate associations with cardiovascular and all-cause mortality over 3-year and 15-year follow-up periods, adjusting for demographic characteristics, cardiopulmonary comorbidities, and sleep apnea severity. Results: Higher median NRR and greater NRR variability were each associated with increased cardiovascular and all-cause mortality. Combining these metrics identified a high-risk group characterized by elevated median and high variability of NRR, with approximately five-fold higher 3-year all-cause mortality compared with a low-risk group; this association remained significant in Cox models (unadjusted HR: 2.61; 95% CI: 1.65, 4.14; p

19.
bioRxiv (Bioinfo) 2026-06-13

PertDiffBench: Benchmarking Diffusion Models for Single-Cell Perturbation Response Prediction

Diffusion models are increasingly used to predict transcriptional responses to perturbations, but whether they improve on simpler generative and representation-based baselines remains unclear. Existing evaluations often do not separate the effects of model architecture, input representation, biological context and metric choice, making it difficult to determine where diffusion-based methods are useful. Here we introduce PertDiffBench, a standardized benchmark for diffusion-based transcriptomic perturbation prediction across single-cell and bulk RNA-seq datasets. PertDiffBench evaluates diffusion-based models across three complementary evaluation settings: standard prediction in known single-cell contexts and bulk perturbation conditions, generalization to unseen cell types, species, drugs and intermediate time points, and stress tests of feature dimensionality, input representation, noise type and gene ordering. Across these settings, diffusion models did not show a consistent advantage. scGen remained a strong baseline in common prediction tasks, whereas scDiffusion was the most competitive diffusion-based method in several generalization settings. Temporal imputation showed a different pattern, with a simple DDPM operating directly in expression space outperforming more specialized models. Stress tests showed that performance was model dependent and sensitive to feature dimensionality, encoder choice, noise type and gene ordering. Pretrained encoders did not consistently improve performance, with the classical scVI representation slightly exceeding STATE in seen-condition and unseen-cell-type settings. These results indicate that diffusion-model performance in perturbation response prediction depends strongly on task design and representation choice. PertDiffBench provides a practical framework for evaluating these models under biologically varied and stress-tested conditions.

20.
arXiv (CS.CV) 2026-06-16

Towards Global AI-Driven Cervical Cancer Screening

The global elimination of cervical cancer is a key public health goal set by the World Health Organization (WHO), with screening programs reducing mortality by up to 80%. However, access to experts and biopsy services is limited in low- to middle-income countries (LMICs). Deep learning (DL)-based algorithms offer promising support for screening, but most existing approaches have been developed and validated on private datasets from single countries. We present the first DL-based approach to cervical cancer screening validated on data from multiple countries. Technically, we phrase the problem of detecting and classifying lesions in colposcopy images as a multi-task learning problem, in which we simultaneously perform image-level classification and lesion segmentation. Our model was trained on a private data set of acid stain colposcopy images with manually generated lesion segmentation masks and corresponding histopathological results, employing extensive data augmentation to address image variability. In an in-distribution validation with pathology results serving as ground truth, our algorithm outperformed medical experts (Balanced Accuracy: 0.68 vs 0.64) in CIN1- (Cervical intraepithelial neoplasia grade 1 or lower) versus CIN2+ (grade 2 or higher) classification. External validation on four colposcopy data sets from four countries featuring radical differences in prevalence and patient characteristics yielded superior performance of our method compared to baseline methods. Performance variability across countries was high with AUC values ranging from 0.54 - 0.80. Overall, algorithm performance varied with age, transformation zone (cervical area most prone to lesion development), presence of comorbidities and pathognomonic signs, with comorbidities having by far the largest negative effect. Future work should focus on improving model robustness and generalizability.

21.
arXiv (CS.AI) 2026-06-18

Learning from Own Solutions: Self-Conditioned Credit Assignment for Reinforcement Learning with Verifiable Rewards

arXiv:2606.18810v1 Announce Type: cross Abstract: Reinforcement learning with verifiable rewards (RLVR) has driven substantial progress in training LLMs for reasoning tasks, but representative methods such as GRPO assign uniform credit across all tokens, wasting gradient on routine tokens while under-crediting pivotal reasoning steps. Existing token-level credit assignment methods require resources beyond the model's own rollouts. GRPO variants rely on process reward models or ground-truth answers. Knowledge distillation assigns credit through per-token divergence but requires external teachers (On-Policy Distillation) or privileged information (On-Policy Self Distillation). However, these dependencies limit applicability in the pure RLVR setting. We observe that conditioning the model on its own verified trajectories induces a measurable per-token KL divergence between the original and conditioned distributions, and prove that distilling from a self-teacher constructed by verified trajectories leads to infeasible weighted-average solutions when multiple verified trajectories exist. We propose SC-GRPO (Self-Conditioned GRPO), which uses KL divergence mentioned before as a multiplicative weight on GRPO gradients. Across five benchmarks spanning math, code, and agentic tasks, SC-GRPO consistently outperforms 8.1% over GRPO and 5.9% over DAPO with stronger OOD performance. Moreover, SC-GRPO achieves higher performance than OPD.

22.
arXiv (CS.AI) 2026-06-16

A comparative and critical study of EEGNet for fNIRS-driven cognitive load classification

arXiv:2606.16160v1 Announce Type: cross Abstract: Accurately classifying cognitive load from functional near-infrared spectroscopy (fNIRS) signals remains a significant challenge due to temporal variability, inter-subject differences, and sensitivity to preprocessing choices. This study provides a comprehensive evaluation of EEGNet for fNIRS-based cognitive load classification by systematically examining the effects of temporal segmentation strategies (overlapping vs. non-overlapping), window lengths (10s, 20s, 30s), feature extraction methods (Analysis of Variance (ANOVA), Principal Component Analysis (PCA), Fast Independent Component Analysis (FastICA)), learning rate configurations (fixed and adaptive), and evaluation protocols (random split vs. subject-independent (SI)). Results from random-split experiments show that overlapping segmentation, combined with smaller fixed learning rates (0.01-0.001), yields the highest accuracies, due to temporal redundancy and dense sampling of hemodynamic transitions. However, SI evaluation reveals a substantial drop in accuracy, demonstrating limited generalization to unseen participants. Under SI evaluation, non-overlapping segmentation outperformed overlapping windows, with the best accuracy of 56.11% achieved using PCA features with a 20-second window and a 0.1 learning rate. These findings indicate that eliminating temporal redundancy helps the model learn more robust and generalizable representations of cognitive load across individuals. Although adaptive learning rate strategy improved training stability, it did not surpass the performance of optimally selected fixed learning rates. The study highlights the critical role of segmentation strategy and learning rate selection in improving model generalization and identifies methodological considerations essential for developing reliable, real-time, and SI cognitive load classification systems using fNIRS.

23.
arXiv (CS.CL) 2026-06-19

StylisticBias: A Few Human Visual Cues Drive Most Social Biases in MLLMs

Multimodal large language models (MLLMs) are increasingly deployed in personally and societally consequential settings, yet the visual cues that shape how these models judge people remain poorly understood. Prior work often compares different (groups of) individuals, making it difficult to separate appearance effects from identity differences. We introduce StylisticBias, a controlled benchmark for evaluating attribute-level social bias in MLLMs. We generate 500 photorealistic base faces and create about 50 single-attribute variations per face, producing about 25K images. This design keeps identity fixed and changes one visual attribute at a time. It lets us measure how specific cues shift model judgments. We evaluate six MLLMs across 25 binary social judgment scenarios. We find that age and body type dominate identity-level effects, while fashion style and other visual cues drive the largest attribute-level shifts. We further find that about 15 attributes account for nearly 80\% of the total variation, showing that bias is concentrated in a small set of visual cues. Sensitivity is strongest in judgments that are semantically aligned with appearance, especially socioeconomic and style-related judgments. We release StylisticBias as a benchmark for fine-grained bias evaluation in multimodal models. Code and dataset: https://github.com/timo-cavelius/StylisticBias and https://hf.co/datasets/shaghayegh/stylistic-bias-dataset.

24.
arXiv (CS.CL) 2026-06-18

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with execution complexity, by distributing subtasks across cooperative agents. However, this divide-and-conquer paradigm falls short on decision-making tasks that are also prevalent in the real world. These tasks require simultaneous reasoning from the stances of all involved stakeholders whose decisions are mutually dependent and thus cannot be solved in isolation. We characterize this challenge as stance entanglement, a form of decision complexity distinct from execution complexity. To address it, we propose Multi-Agent Fictitious Play (MAFP), a novel MAS paradigm that represents stakeholder stances as agents and formulates decision-making as an equilibrium-seeking process. Built on the game-theoretic principle of fictitious play, MAFP iteratively updates each agent's decision by best responding to the empirical mixture of other agents' past decisions. This enables agents to expose and address one another's weaknesses, progressively improving decision quality and robustness. We evaluate MAFP on challenging decision-making tasks that test the capability of deciding strategies for competitive scenarios prior to acting. MAFP outperforms both single-round and multi-round baselines on two complementary metrics, tournament strength and robustness, demonstrating its effectiveness in addressing stance entanglement.