Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-24

Solving Semi-Supervised Few-Shot Learning from an Auto-Annotation Perspective

Semi-supervised few-shot learning (SSFSL) resembles real-world applications such as auto-annotation, as it aims to learn a model from a few labeled and abundant unlabeled task-specific examples to annotate the unlabeled ones. Despite the availability of powerful open-source Vision-Language Models (VLMs) and open-world data, existing SSFSL literature largely neglects these resources. In contrast, the related area few-shot learning (FSL) has already exploited them to boost performance. Arguably, to solve real-world auto-annotation, SSFSL should leverage such open resources. To bridge this gap, we explore established SSL methods to finetune a VLM. Unexpectedly, they significantly underperform FSL baselines that do not use unlabeled data. Our in-depth analysis reveals the root cause of failure: VLMs produce flat distributions of softmax probabilities, resulting in zero utilization of unlabeled data and weak supervision signals. To address this challenge, we propose an embarrassingly simple solution that uses temperatures to sharpen the softmax output, which not only increases the confidence scores of pseudo-labels to improve the utilization of unlabeled data, but also strengthens training supervision for effective finetuning. Furthermore, we exploit task-relevant open data, e.g., those retrieved from VLMs' publicly available pretraining set. To mitigate the imbalance and domain gaps in retrieved data, we employ a stage-wise training strategy. Building on the successful finetuning of VLMs and the exploitation of open data, we present a simple yet effective SSFSL method, Stage-Wise Finetuning with Temperatures (SWIFT). Across five benchmarks, SWIFT outperforms recent FSL and SSL methods by $\sim$5 accuracy points. SWIFT even rivals supervised learning, which finetunes a VLM assuming unlabeled data having ground-truth labels!

02.
arXiv (CS.AI) 2026-06-19

Token Factory: Efficiently Integrating Diverse Signals into Large Recommendation Models

arXiv:2606.19635v1 Announce Type: cross Abstract: Large Recommendation Models (LRMs) have demonstrated promising capabilities in industry-scale recommendation tasks. However, holistically integrating traditional signals into these transformer-based architectures effectively and efficiently remains a major challenge. Conventional approaches that "textualize" these signals directly or create discrete item representations often lead to excessively long prompts, substantial memory footprints, and high computational overhead. To overcome these limitations, we propose "Token Factory", a framework designed to transform traditional signals into "soft tokens" that can be directly processed by LRMs. This approach enables efficient integration and compression of heterogeneous input features, preventing prompt length explosion while enhancing model performance. We detail the architecture of Token Factory and present experimental results validating its effectiveness in a production-scale recommendation environment.

03.
arXiv (CS.CV) 2026-06-25

Counterfeit Answers: Adversarial Forgery against OCR-Free Document Visual Question Answering

Document Visual Question Answering (DocVQA) enables end-to-end reasoning grounded on information present in a document input. While recent models have shown impressive capabilities, they remain vulnerable to adversarial attacks. In this work, we introduce a novel attack scenario that aims to forge document content in a visually imperceptible yet semantically targeted manner, allowing an adversary to induce specific or generally incorrect answers from a DocVQA model. We develop specialized attack algorithms that can produce adversarially forged documents tailored to different attackers' goals, ranging from targeted misinformation to systematic model failure scenarios. We demonstrate the effectiveness of our approach against two end-to-end state-of-the-art models: Pix2Struct, a vision-language transformer that jointly processes image and text through sequence-to-sequence modeling, and Donut, a transformer-based model that directly extracts text and answers questions from document images. Our findings highlight critical vulnerabilities in current DocVQA systems and call for the development of more robust defenses. We release our open source code at https://github.com/pralab/adv-docVQA.

04.
arXiv (CS.AI) 2026-06-12

Emotional regulation improves deep learning-based image classification

arXiv:2606.13081v1 Announce Type: cross Abstract: Emotion significantly influences cognition, enhancing memory and learning under certain conditions. Drawing on this principle, emotion-augmented deep learning investigates how affective states can improve neural network architectures and learning paradigms, achieving better generalization than non-emotional models. However, existing methods often rely solely on objective neurophysiological factors, neglecting the role of subjectivity in emotion. To bridge this gap, the present study introduces Emotional Regulation, a novel framework for modeling emotion in deep learning through artificial subjective experience. The method employs pre-training based on affective stimuli, balancing non-emotional and emotionally-influenced responses in downstream task optimization. Extensive experimentation was conducted in image classification, pre-training ResNet and ViT architectures on four emotional datasets, using CIFAR-10 and -100 as target benchmarks. Results reveal improvements over the aforementioned backbones, providing evidence of Emotional Regulation as a promising method for defining emotion-augmented deep learning through artificial subjective experience. Furthermore, the proposed approach overcomes the related work in image classification based on CIFAR, revealing Emotional Regulation as the new state-of-the-art in emotion-augmented deep learning for large-scale vision datasets. The study also enforces evidence of the impact of affective states in improving machine learning tasks' optimization, encouraging further investigation on emotion-inspired architectures.

05.
arXiv (CS.LG) 2026-06-17

From Compression to Deployment: Real-Time and Energy-Efficient FastGRNN on Ultra-Constrained Microcontrollers

arXiv:2606.17249v1 Announce Type: cross Abstract: The dominant trajectory of modern machine learning has been to scale up: larger models, larger accelerators, larger memory budgets. Yet a multi-year global semiconductor supply constraint and the growing energy and carbon cost of always-online inference expose the fragility of this trajectory and motivate the opposite direction: refactoring AI and ML algorithms to fit the small, ubiquitous microcontrollers already in mass production in wearables, sensors, and edge appliances. We present an end-to-end open-source reproduction of FastGRNN, a compact gated recurrent cell, deployed on two bare-metal targets: the 8-bit Arduino (ATmega328P) and the 16-bit MSP430 (no hardware multiplier; 16 KB Flash; 512 B SRAM). Our compression pipeline combines low-rank weight factorization, iterative hard-thresholding sparsity, and per-tensor Q15 post-training quantization with explicit activation calibration. The deployed model occupies 566 bytes of weights and achieves macro F1 = 0.918 (seed 0; five-seed Q15 mean 0.853+-0.107) on the HAPT test set. It matches a PyTorch reference at 100% prediction agreement across 3,399 test windows (MCU seed 0; 99.91-100% C-equivalent across five seeds). Both platforms sustain real-time 50 Hz streaming inference (9.21 ms per sample on Arduino; 13 ms on MSP430), where a 256-entry sigmoid/tanh look-up table delivers a 30.5x speedup on the multiplier-less MSP430. Four contributions extend the original FastGRNN paper: (i) cross-platform bit-equivalent deterministic inference; (ii) characterization of recurrent warm-up latency (median 74 samples, 1.48 s; worst-case 125 samples, 2.50 s over 100 test windows); (iii) a deployable look-up-table recipe for multiplier-less embedded targets; and (iv) hardware energy characterization showing 17.7 mW active inference power,

06.
arXiv (CS.CV) 2026-06-19

CoMo: Learning Continuous Latent Motion from Internet Videos for Scalable Robot Learning

Unsupervised learning of latent motion from Internet videos is crucial for robot learning. Existing discrete methods generally mitigate the shortcut learning caused by extracting excessive static backgrounds through vector quantization with a small codebook size. However, they suffer from information loss and struggle to capture more complex and fine-grained dynamics. Moreover, there is an inherent gap between the distribution of discrete latent motion and continuous robot action, which hinders the joint learning of a unified policy. We propose CoMo, which aims to learn more precise continuous latent motion from internet-scale videos. CoMo employs an early temporal difference (Td) mechanism to increase the shortcut learning difficulty and explicitly enhance motion cues. Additionally, to ensure latent motion better captures meaningful foregrounds, we further propose a temporal contrastive learning (Tcl) scheme. Specifically, positive pairs are constructed with a small future frame temporal offset, while negative pairs are formed by directly reversing the temporal direction. The proposed Td and Tcl work synergistically and effectively ensure that the latent motion focuses better on the foreground and reinforces motion cues. Critically, CoMo exhibits strong zeroshot generalization, enabling it to generate effective pseudo action labels for unseen videos. Extensive simulated and real-world experiments show that policies co-trained with CoMo pseudo action labels achieve superior performance with both diffusion and auto-regressive architectures.

07.
arXiv (CS.CL) 2026-06-19

From Construction to Injection: Edit-Based Fingerprints for Large Language Models

Reliable model fingerprints are essential for protecting large language models (LLMs) against unauthorized redistribution and commercial misuse. In black-box deployment, verification is hindered by defensive filtering of suspected fingerprint queries, as well as by downstream model modifications that may weaken embedded ownership evidence. These risks require fingerprints to be robust in both construction and injection. For construction, prior paradigms face an imperceptibility trade-off: natural-language fingerprints may be accidentally activated, whereas garbled fingerprints are statistically exposed and easier to filter. For injection, existing methods struggle to preserve persistent trigger–target behaviors under model modification. We propose an end-to-end injected fingerprinting framework to address these challenges. Code-mixing Fingerprints (CF) use lowest-perplexity code-mixing under a high-complexity constraint to mitigate this two-sided imperceptibility trade-off. Multi-Candidate Editing (MCEdit) constructs structurally redundant, margin-separated trigger–target mappings to enable graceful degradation under model modification. Extensive evaluations on imperceptibility, detectability, and harmlessness demonstrate robust ownership verification with negligible impact on utility.

08.
arXiv (CS.CV) 2026-06-15

WAM4D: Fast 4D World Action Model via Spatial Register Tokens

World action models (WAMs) have recently shown promise in jointly modeling future observations and executable robot actions. However, most existing WAMs still operate in 2D video or latent spaces, where visually plausible rollouts miss the 3D spatial constraints and occluded contact geometry required for precise manipulation. While geometric foundation models offer strong priors for recovering dense 3D structure and motion from visual observations, forcing WAMs to predict the dense 4D representation introduces costly geometric decoding and slows down causal action generation. To address the trade-off, we present WAM4D, a fast 4D world action model that uses lightweight spatial register tokens as training-time future-depth readouts to transfer pretrained geometric priors into a causal video-action transformer, then removes the register branch for lightweight action inference. To prevent non-causal shortcuts, we further design causal mixture attention for the Mixture-of-Transformers (MoT) WAM backbone, defining modality-specific visibility among video, action, and geometry tokens. Comprehensive experiments on RoboTwin 2.0 and challenging real-world manipulation tasks show that WAM4D improves spatial consistency and achieves competitive action prediction while maintaining efficient inference.

09.
arXiv (CS.CV) 2026-06-19

Learning When to Denoise: Optimizing Asynchronous Schedules for Latent Diffusion

Multi-representation diffusion models can improve visual synthesis by denoising complementary views of an image, but their performance depends critically on the asynchronous schedule that determines when each representation is denoised. We propose to learn this schedule. Our method formulates asynchronous flow matching over multiple representation spaces and uses a schedule-corrected objective that keeps each representation's local noising-time weights fixed as the schedule changes. We instantiate the schedule with a flexible parametric class that is convex and monotone by construction, and learn it using a fast joint probe with less than 1% additional training compute. On ImageNet 256x256, the learned schedule substantially improves both convergence speed and final quality under a matched 675M-parameter XL backbone. With AutoGuidance, our 200-epoch model reaches FID 1.05, matching the 800-epoch SFD-XL baseline with 4x less training. Training to 600 epochs further improves to FID 1.02, outperforming the 1B-parameter SFD-XXL result of FID 1.04 while using a smaller model. In the unguided setting, our 200-epoch model reaches FID 2.37, already below the best 800-epoch SFD-XL result (2.54) at 4x less training, and improves to FID 2.14 at 600 epochs. Code is available at https://github.com/bsq532087/LWD

10.
arXiv (quant-ph) 2026-06-25

Polymer quantum mechanics on compact configuration spaces

arXiv:2606.06019v2 Announce Type: replace Abstract: "Polymer quantum mechanics" is the name given to a quantization scheme inspired by loop quantum gravity in which the configuration space of the theory is chosen to have a discrete topology. Polymer quantization yields a representation of the canonical commutation relations that is genuinely distinct from the conventional "Schrödinger" representation. In this paper, we summarize the main features of polymer quantum mechanics and investigate in detail the polymer quantization of systems with configuration spaces that are classically compact. We show explicitly how using the standard construction of polymer states leads to a Hilbert space of states defined on a finite graph of points. By way of example, we find the exact energy eigenvalues and eigenfunctions for a particle on a ring and a particle in a box defined on such lattices, and discuss similarities and differences from standard Schrödinger quantum mechanics. We also explore the continuum limit of states in these systems, and demonstrate in detail how the exact eigenfunctions in the position representation approach their continuum counterparts.

11.
bioRxiv (Bioinfo) 2026-06-11

Calibrated Uncertainty Quantification for Patient-Level AML Drug Sensitivity Prediction Using Split Conformal Prediction

Accurate prediction of ex vivo drug sensitivity in acute myeloid leukemia (AML) patients from transcriptomic data is a critical challenge for precision oncology. Existing computational approaches have explored uncertainty quantification in cancer drug response prediction primarily using cell line data, while patient-level AML models typically rely on heuristic confidence measures rather than statistically calibrated uncertainty estimates. Here, we present a framework applying split conformal prediction to patient-level AML drug response modeling using the BeatAML 2.0 cohort. We trained Elastic Net and XGBoost regressors on bulk RNA-seq gene expression profiles from 318 AML patients, analyzing 34,764 patient-drug observations across 122 compounds. Baseline models achieved median Pearson R values of 0.291 (Elastic Net) and 0.281 (XGBoost) across 122 drugs. Wrapping these models with split conformal prediction yielded well-calibrated prediction intervals across three confidence levels: empirical coverages of 81.4%, 90.7%, and 95.5% against nominal targets of 80%, 90%, and 95%, respectively. Analysis of prediction interval widths revealed substantial drug-class-specific uncertainty patterns, with HDAC and BCL-2 inhibitors exhibiting markedly higher uncertainty than MDM2 inhibitors, suggesting a potential association between transcriptomic predictability and drug mechanism of action, although several drug classes were represented by only a small number of compounds. Predictive uncertainty was not significantly associated with ELN2017 molecular risk classification (Kruskal-Wallis p=0.395) or NPM1 mutation status (p=0.788). These results demonstrate that statistically valid uncertainty quantification can be achieved for patient-level AML drug response prediction despite substantial biological heterogeneity. to the best of our knowledge, no published study has applied split conformal prediction to patient-level ex vivo drug sensitivity prediction in the BeatAML cohort, providing a principled alternative to heuristic confidence scoring approaches. Keywords: Acute myeloid leukemia (AML); Ex vivo drug sensitivity; Conformal prediction; Uncertainty quantification; Precision oncology; BeatAML; Transcriptomic biomarkers; Machine learning.

12.
PLOS Medicine 2026-06-09

Prediction of hospitalisation in young children with pneumonia in Malawi: A machine learning-based approach

by Patrick Staunton, Mohammad Adib Makrooni, Master Chisale, Billy Nyambolo, Joseph Wu, Damien McCarthy, Mark Ledwidge, Yasir Bin Nisar, Chris Watson, Balwani Mbakaya, Cathal Seoighe, Joe Gallagher Background Globally, pneumonia remains the single biggest cause of mortality in children under 5 years of age. This study sought to train and test a prediction model for hospitalisation within 7 days after initial presentation in 2- to 59-month-old Malawian children with WHO-defined pneumonia in primary care and compare its performance to existing risk prediction models. Methods and findings BIOTOPE is a cohort study of children with pneumonia in a primary healthcare setting in Malawi. The training cohort involved nine primary care centres and the testing cohort involved two primary care centres in Northern Malawi. The training cohort was recruited between December 2022 and April 2023 while the testing cohort was recruited in 2016. Participants were consecutive children aged 2–59 months presenting with cough and/or difficulty breathing and who were diagnosed as WHO-defined pneumonia in primary care of any severity. The training cohort was used to train and validate a machine learning model with a prespecified primary outcome defined as hospitalisation and/or death within 7 days as the outcome. This model was then further evaluated in the testing cohort.Median age was 15 months (interquartile range 8−27) in the training and 17 months (interquartile range 9−29) in the external testing cohort (52.1% and 54.4% male, respectively). Hospitalisation occurred in 14.3% (294) of the training cohort and 12.1% (55) of the testing cohort. There was one death in the training cohort only. WHO danger signs were present in 17.6% (360) and 15.9% (70) of children in the training and testing cohorts, respectively. The optimal machine learning model achieved an area under the receiver operating characteristic and precision recall curves of 0.87 and 0.57, respectively, in the testing cohort outperforming existing risk prediction models; furthermore, this model produced an expected calibration error of 0.16 (a logistic regression model using severity status as the response variable and the log odds of the machine learning model’s calibrated probabilities produced an intercept estimate of −0.32 and a slope estimate of 1.13). Key limitations include the use of hospitalisation and/or death as a severity outcome, which may reflect health system factors rather than true disease severity, that mortality-based comparisons were not possible due to low mortality in these primary care cohorts, and that comparator tools were developed for hospital populations rather than primary care populations. Conclusion This machine learning score outperformed traditional pneumonia risk scores in predicting hospitalisation within 7 days in Malawian children presenting to primary care. Traditional pneumonia risk scores diminish in performance when externally applied to new datasets suggesting they may not generalise well beyond their original derivation settings. Mortality-related findings are not applicable as there was only one death in this cohort. Overall these findings support the potential of machine learning to meaningfully improve early identification of children at risk of severe pneumonia in low-resource primary care settings. Further external validation and clinical impact studies are needed to confirm these results.

13.
arXiv (CS.LG) 2026-06-24

Computational references are not experiments: pre-registered validation of machine-learned sodium-cathode voltages

arXiv:2606.23725v1 Announce Type: cross Abstract: Machine-learning screens for battery materials are trained and judged almost entirely against computed reference voltages, and those references carry their own systematic errors. We report a case in which this matters quantitatively: our own screening stack (a graph-network voltage screen, a prior-art triage layer, and a local PBE+U bench) fails pre-registered validation against experiment-anchored literature values. Verdict thresholds, failure modes, and the primary metric were committed before analysis. On an operator-audited set of known Na-ion cathodes (n = 6 after one documented exclusion; verdict unchanged at n = 7), the raw held-out mean absolute error was 0.67 V, the pre-registered conservative metric, the upper 95% confidence bound of the cross-validated bias-corrected error, was 1.09 V, and the residual was strongly voltage-dependent (r = -0.94), so no additive calibration is valid. On the two compounds where prediction, database reference, and experiment could all be compared, the Materials Project PBE+U reference sat about 0.54 V below measurement: the reference, not the model, dominated the error. A prior-art screen found at least 70% of the targeted Na substitution space already published. We retire the screen, bound what "verified" means for our DFT ledger, and pre-register a calibration audit of it against four benchmark Li couples.

14.
bioRxiv (Bioinfo) 2026-06-19

Children's DNA Methylation and Family Dynamics in a Congo Basin Subsistence Community: Links with Parental Conflict and Fathers' Caregiving

Family environments may contribute to children's long-term health through biological processes, including epigenetic regulation such as DNA methylation (DNAm). However, most studies in this area focus on Euro-American populations while also rarely including fathering data. The current study investigated children's blood DNAm associations with positive (father caregiving) and negative (parental conflict) family dynamics in a smaller-scale subsistence society living in the Congo Basin rainforest. We measured DNAm from dried blood spots of 54 children (mean age=8.48 years) and conducted three epigenome-wide association studies aimed at discovering differential co-methylated regions (CMRs) associated with family dynamics. Via path models, we investigated the health implications and shared contribution of family factors of the identified CMRs. Differential DNAm associated with family dynamics was localized to genes related to stress, immunology, development, and aging, thus possibly linking to children's physical health and were simultaneously connected to other family factors such as number of siblings. Our findings suggested similarities in biological embedding of family factors across socio-ecologically diverse contexts.

15.
arXiv (CS.AI) 2026-06-18

From Memorization to Parameter Interference: How Overtraining Experts Harms Model Merging

arXiv:2506.14126v2 Announce Type: replace-cross Abstract: Modern deep learning is increasingly characterized by the use of open-weight foundation models that can be fine-tuned on specialized datasets. This has led to a proliferation of expert models and adapters, often shared via platforms like HuggingFace and AdapterHub. Model merging has recently emerged as an effective way to leverage these existing resources, enabling the composition of capabilities from different model checkpoints. A natural pipeline has thus formed to harness the benefits of transfer learning and amortize sunk training costs: models are pre-trained on general data, fine-tuned on specific tasks, and then multiple checkpoints are merged to obtain a more capable model. A prevailing assumption is that improvements at one stage of this pipeline propagate downstream, leading to gains at subsequent steps. In this work, we challenge that assumption by examining how expert fine-tuning affects model merging. We show that long fine-tuning of experts that optimizes for their individual performance leads to degraded merging performance across vision and language modalities, multiple model scales, and both fully fine-tuned and LoRA-adapted models. We trace this degradation to the memorization of a small set of difficult examples that dominate late fine-tuning steps. This causes negative parameter interference and encodes knowledge that is forgotten during merging. Finally, we demonstrate that task-dependent aggressive early stopping strategies can significantly improve model merging performance.

16.
medRxiv (Medicine) 2026-06-22

Sex-specific multimorbidity clusters and all-cause mortality in relatively healthy older adults: findings from the ASPREE cohort

Background: Multimorbidity is common in older adults, but sex differences in chronic condition clustering remain unclear. This study explored multimorbidity clusters and their associations with all-cause mortality among community-dwelling adults aged 70 years and over. Methods: This was a secondary analysis of data from 16,095 Australian ASPREE participants aged at least 70 years without prior dementia or cardiovascular disease. Fifteen baseline chronic conditions were grouped using latent class analysis (LCA). Observed-to-expected (O/E) ratios characterised conditions over-represented within clusters, and Cox proportional hazards models assessed associations with all-cause mortality. Results: Among 16,095 participants (mean age 74 years), 88.3% had multimorbidity at baseline; 4,217 deaths occurred over a median follow-up of 10.85 years. Five clusters were identified overall: hypertension and dyslipidemia (52.1%), gout and metabolic (14.4%), depressive symptoms, osteoporosis and frailty (10.0%), anaemia and kidney disease (10.2%), and hypotension, thyroid disorder and past cancer (13.3%). Sex-stratified analyses revealed three clusters in males and four in females. The frailty, depressive symptoms and osteoporosis cluster was associated with higher mortality in both sexes (aHR 1.56 [95% CI 1.40-1.73] in males; 1.68 [1.49-1.89] in females). Higher mortality was also observed for the metabolic, gout and kidney disease cluster in males (aHR 1.63 [1.47-1.81]) and the gout, anaemia and kidney disease cluster in females (aHR 1.96 [1.74-2.21]). Conclusions: Distinct multimorbidity clusters differed by sex and were associated with increased all-cause mortality. These findings may support risk stratification, targeted screening, and more person-centred management of older adults with multimorbidity.

17.
arXiv (CS.AI) 2026-06-18

HAARES Half-Split Residual Basis Routing for Deep Transformers

Authors:

arXiv:2606.06564v2 Announce Type: replace-cross Abstract: Block-level residual routing makes learned residual aggregation practical by routing over block summaries, but each summary compresses an ordered sequence of attention and MLP updates into one cumulative vector. We propose \method{}, a lightweight residual basis router that keeps the cumulative block source and adds one half-split detail basis, computed as the difference between first-half and second-half residual updates. The detail basis is RMS-matched and updated online, exposing coarse intra-block trajectory information without dense sublayer-level routing. Across OpenWebText, cross-domain character-level benchmarks, and BPE-tokenized OpenWebText, the empirical pattern is depth-dependent: gains are small or mixed at shallow depth and most reliable in 48-layer models. In the 201M 48-layer setting, \method{} improves over Block AttnRes across all three seeds, while a 453M two-seed probe shows the same direction. Ablations rule out source duplication, random signed details, fixed detail-source biases, or block-count changes alone. Cost analysis shows that the method is FLOP-light but not wall-clock-free: it adds memory and routing overhead, yet its relative arithmetic cost is amortized as width grows and earlier convergence can reduce time-to-target.

18.
medRxiv (Medicine) 2026-06-18

Guiding the development of climate counterfactuals for health impact attribution studies

Climate change detection and attribution (D&A) methods have become vital for quantifying the influence of anthropogenic forcing on the Earth's systems, including human health. Health impact attribution (HIA) studies seek to disentangle climate-driven health effects from natural variability yet are often constrained by the availability of accessible counterfactual climate scenarios. This tutorial paper presents a flexible, reproducible framework for developing counterfactual climates without reliance on computationally intensive global circulation models. We provide practical, R-based methodologies for constructing both trend-based (temperature and non-temperature) and event-based counterfactual, using a variety of techniques including model residual detrending, data-driven decomposition (e.g., Singular Spectrum Analysis and Empirical Mode Decomposition) and stochastic weather generators. The tutorial also explores the incorporation of greenhouse gas concentrations as forcing variables, rather than global mean temperature anomalies. By operationalising these methods through worked examples and an open code repository, this paper aims to build capacity within the HIA community, enhance methodological transparency, and foster interdisciplinary collaboration between climate and health researchers.

19.
arXiv (CS.AI) 2026-06-18

R2BC: Multi-Agent Imitation Learning from Single-Agent Demonstrations

arXiv:2510.18085v2 Announce Type: replace-cross Abstract: Imitation Learning (IL) is a natural way for humans to teach robots, particularly when high-quality demonstrations are easy to obtain. While IL has been widely applied to single-robot settings, relatively few studies have addressed the extension of these methods to multi-agent systems, especially in settings where a single human must provide demonstrations to a team of collaborating robots. In this paper, we introduce and study Round-Robin Behavior Cloning (R2BC), a method that enables a single human operator to effectively train multi-robot systems through sequential, single-agent demonstrations. Our approach allows the human to teleoperate one agent at a time and incrementally teach multi-agent behavior to the entire system, without requiring demonstrations in the joint multi-agent action space. We show that R2BC methods match, and in some cases surpass, the performance of an oracle behavior cloning approach trained on privileged synchronized demonstrations across four multi-agent simulated tasks. Finally, we deploy R2BC on two physical robot tasks trained using real human demonstrations.

20.
bioRxiv (Bioinfo) 2026-06-16

MetaPilot: genome-aware adaptive search-space refinement for unified DDA and DIA metaproteomics

Metaproteomic peptide identification is constrained by the structure and size of the protein search space. Pooled gene catalogues provide coverage but obscure genome-level evidence, and current workflows for data-dependent (DDA) and data-independent (DIA) acquisition diverge in their database strategies. We present MetaPilot, a genome-aware workflow that uses conserved marker-protein evidence to rank candidate genomes from MGnify catalogues and construct adaptive, sample-specific search spaces. Applied to paired DDA/DIA datasets of defined mixtures and fecal samples, MetaPilot adapted genome selection to community complexity and reproduced published peptide evidence while expanding the detectable peptide space. In DDA-independent reanalysis of Orbitrap human gut DIA data, MetaPilot identified 24.4% more peptides than the published DDA-derived library and 2.06-fold more than the matched DDA-assisted DIA search. On timsTOF DIA-PASEF mouse intestinal data, it outperformed uMetaP by 41.8~119.7%, enabling genome-resolved functional interpretation without DDA-PASEF input.

21.
arXiv (quant-ph) 2026-06-19

Strain- and Electric-Field-Tunable Valley Polarization in Mo0.75V0.25Te2(Mo3VTe8) for Valleytronic Application

arXiv:2606.19954v1 Announce Type: cross Abstract: Valley polarization in 2D TMDs is promising for low-power valleytronic and spin-valley information processing, but time-reversal symmetry in pristine nonmagnetic TMDs keeps the K+ and K- valleys degenerate, limiting device applications. In this work, we investigated the structural stability, electronic properties, and tunable valley polarization of V-alloyed MoTe2 monolayer, Mo0.75V0.25Te2, using first-principles density functional theory (DFT) calculations. Substitutional alloying of MoTe2 with V introduced magnetic exchange interaction, which, together with spin-orbit coupling (SOC), lifted the valley degeneracy at the unequal valleys. The alloyed structure was found to be energetically and dynamically stable due to the absence of imaginary phonon modes. In pristine MoTe2, SOC produced spin splittings of 34.0 meV and 218.9 meV in the conduction bands and valence bands, respectively, but no valley polarization was observed. In contrast, Mo0.75V0.25Te2 exhibited spontaneous valley polarization of 37.3 meV in the conduction band and 78.2 meV in the valence band. The valley polarization was further enhanced by external electric fields and biaxial strain. A transverse electric field along the crystal c axis produced the maximum valley splitting of 132.8 meV in the valence band, whereas biaxial tensile strain increased the valence band valley splitting up to 160.8 meV. The maximum conduction band valley splitting reached 54.4 meV under 2% biaxial compressive strain. These results demonstrated that V alloying, combined with electric-field and strain engineering, provides an effective strategy for achieving large and tunable valley polarization in MoTe2. Thus, Mo0.75V0.25Te2 can be considered a promising 2D platform for tunable valleytronic device applications, such as transistors and sensors.

22.
arXiv (CS.AI) 2026-06-25

CausalRAG2: Hierarchical Causal Knowledge Graph Design for RAG

arXiv:2602.05143v2 Announce Type: replace Abstract: Retrieval augmented generation (RAG) has enhanced large language models by enabling access to external knowledge, with graph-based RAG emerging as a powerful paradigm for structured retrieval and reasoning. However, existing graph-based methods often over-rely on entity-centric node matching and lack explicit causal modeling, leading to unfaithful or spurious answers. Prior attempts to incorporate causality are typically limited to local or single-document contexts and also suffer from information isolation that arises from modular graph structures, which hinders scalability and cross-module causal reasoning. To address these challenges, we propose CausalRAG2, a framework that rethinks knowledge organization for graph-based RAG through causal gating across hierarchical modules. CausalRAG2 explicitly models causal relationships to suppress spurious correlations while enabling scalable reasoning over large-scale knowledge graphs. We also introduce HolisQA, a benchmark for holistic comprehension beyond entity-centric matching. Extensive experiments demonstrate that CausalRAG2 consistently outperforms competitive graph-based RAG baselines across multiple datasets and evaluation metrics. Our work establishes a principled foundation for structured, scalable, and causally grounded RAG systems. Our code and HolisQA benchmark are available at https://github.com/Pwnb/CausalRAG2.

23.
arXiv (CS.AI) 2026-06-25

CauScale: Neural Causal Discovery at Scale

arXiv:2602.08629v2 Announce Type: replace-cross Abstract: Causal discovery is essential for advancing data-driven fields such as scientific AI and data analysis, yet existing approaches face significant time- and space-efficiency bottlenecks when scaling to large graphs. To address this challenge, we present CauScale, a neural architecture designed for efficient causal discovery that scales inference to graphs with up to 1000 nodes. CauScale improves time efficiency via a reduction unit that compresses data embeddings and improves space efficiency by adopting tied attention weights to avoid maintaining axis-specific attention maps. To keep high causal discovery accuracy, CauScale adopts a two-stream design: a data stream extracts relational evidence from high-dimensional observations, while a graph stream integrates statistical graph priors and preserves key structural signals. CauScale successfully scales to 500-node graphs during training, where prior work fails due to space limitations. Across testing data with varying graph scales and causal mechanisms, CauScale achieves 99.6% mAP on in-distribution data and 84.4% on out-of-distribution data, while delivering 4-13,000 times inference speedups over prior methods. Our project page is at https://github.com/OpenCausaLab/CauScale.

24.
arXiv (CS.CL) 2026-06-16

A Large-Scale Multi-Dimensional Empirical Study of LLMs for Conversation Summarization

Despite the significant advancement of LLMs in conversation summarization, their evaluation remains limited by insufficient scenarios, input lengths, and sample sizes. Furthermore, existing benchmarks often omit frontier reasoning systems and efficient small models, or lack fine-grained, multi-dimensional assessments. To bridge these gaps, we propose OmniCSEval, a unified benchmark comprising 1,800 diverse conversations across six real-world scenarios, featuring context lengths ranging from 128 to 32k tokens. For fine-grained evaluation, we employ a bidirectional fact-checking framework that integrates key fact matching to assess completeness and conciseness, alongside summary fact verification to evaluate faithfulness. To ensure reliable assessment, we establish a human-LLM collaborative pipeline for key fact extraction and a multi-LLM consensus verifier for summary fact decomposition. Leveraging this framework, we evaluate 28 LLMs across four distinct categories grouped by reasoning capability and model scale. Our extensive empirical study reveals critical insights regarding the cross-scenario challenges current LLMs continue to face, the impacts of reasoning and scale, and the efficiency and adaptability of reasoning models. We also provide guidance for system selection in real-world deployments.

25.
arXiv (CS.CL) 2026-06-25

Beyond Next-Observation Prediction: Agent-Authored World Modeling for Sequential Decision Making

Recent studies on world modeling for Large Language Model (LLM) agents typically formulate the learning objective as next-observation prediction. However, this objective ties supervision to what a transition happens to reveal, which may omit the dynamics most relevant to the agent's current decision. To bridge this gap, we propose Agent-Authored World Modeling (AAWM), a training procedure that constructs supervision from the policy's own decision needs. Specifically, at each state, the agent identifies what it needs to understand about the environment before acting. These needs drive the retrieval of relevant transition evidence across trajectories, which is then synthesized into training targets that capture decision-oriented dynamics instead of reconstructing the next observation. This aligns the training objective with the dynamics the policy needs before acting, not with the contents of the next observation. Experimental results validate the effectiveness of AAWM across multiple environments and training settings. These results show that decision-aware world-model targets provide a more effective learning signal than next-observation prediction.