Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-11

Privacy-Preserving Federated Autoencoder for ECG Anomaly Detection on Edge Devices

arXiv:2606.11556v1 Announce Type: cross Abstract: Continuous electrocardiography (ECG) monitoring could surface rhythm abnormalities before they escalate into cardiovascular events. However, a deployable system must satisfy three requirements simultaneously: legal-grade privacy (GDPR, HIPAA), real-time inference on constrained edge hardware, and detection quality under non-IID cross-hospital data. We design and evaluate an end-to-end federated system addressing all three for unsupervised 12-lead ECG anomaly detection on PTB-XL dataset, combining three autoencoder families (VanillaAE, ConvAE, VAE), Flower-based federated averaging (FedAvg) across ten simulated hospitals, client-side differentially private SGD (DP-SGD) with a Rényi-DP accountant, and 8-bit integer (INT8) post-training quantization with Raspberry Pi 4 benchmarking. Our main contributions are: an empirical characterization of how these mechanisms compose, practical DP-specific recommendations, and technical and security insights for a clinically sensitive setting. Federated learning matches or exceeds the centralized baseline across all architectures (ConvAE federated area under the ROC curve, AUROC, $0.782$), and an $\varepsilon$ sweep identifies $\varepsilon=4$ as the recommended clinical operating point. INT8 quantization roughly halves model size and cuts Pi 4 latency by up to $44%$ with $

02.
arXiv (CS.AI) 2026-06-11

A Physics-Inspired Optimizer: Velocity Regularized Adam

arXiv:2505.13196v3 Announce Type: replace-cross Abstract: We introduce Velocity-Regularized Adam (VRAdam), a physics-inspired optimizer for training deep neural networks that draws on ideas from quartic terms for kinetic energy with its stabilizing effects on various system dynamics. Previous algorithms, including the ubiquitous Adam, operate at the so-called adaptive edge of stability regime during training, leading to rapid oscillations and slowed convergence of loss. However, VRAdam adds a higher order penalty on the learning rate based on the velocity such that the algorithm automatically slows down whenever weight updates become large. In practice, we observe that the effective dynamic learning rate shrinks in high-velocity regimes, and damping oscillations. By combining this velocity-based regularizer for global damping with per-parameter scaling of Adam, we create a powerful hybrid optimizer. For this optimizer, we provide rigorous theoretical analysis of operation at the edge of stability from a physical and control perspective for the momentum. Furthermore, we derive convergence bounds with the rate $\mathcal{O}(\ln(N)/\sqrt{N})$ for a stochastic non convex objective under mild assumptions. We demonstrate that VRAdam exceeds the performance against standard optimizers including AdamW. We benchmark various tasks such as image classification, language modeling, and generative modeling using diverse architectures and training methodologies including Convolutional Neural Networks (CNNs), Transformers, and GFlowNets.

03.
arXiv (CS.AI) 2026-06-17

BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?

arXiv:2510.18003v2 Announce Type: replace-cross Abstract: The convergence of LLM-powered research assistants and AI-based peer review systems creates a critical vulnerability: fully automated publication loops where AI-generated research is evaluated by AI reviewers without human oversight. We investigate this through BadScientist, a framework that evaluates whether fabrication-oriented paper generation agents can deceive multi-model LLM review systems. Our generator employs presentation-manipulation strategies requiring no real experiments. We develop a rigorous evaluation framework with formal error guarantees (concentration bounds and calibration analysis), calibrated on real data. Our results reveal systematic vulnerabilities: fabricated papers achieve acceptance rates up to . Critically, we identify concern-acceptance conflict – reviewers frequently flag integrity issues yet assign acceptance-level scores. Our mitigation strategies show only marginal improvements, with detection accuracy barely exceeding random chance. Despite provably sound aggregation mathematics, integrity checking systematically fails, exposing fundamental limitations in current AI-driven review systems and underscoring the urgent need for defense-in-depth safeguards in scientific publishing.

04.
arXiv (CS.CV) 2026-06-25

A Controlled Study of CLIP-Based Body-Scene Fusion for Emotion Recognition in Context

Apparent emotion in natural images is often not visible from the face alone. The face may be small, hidden, or neutral, while posture and scene context carry much of the evidence. This work studies context-aware emotion recognition on EMOTIC with an image-only two-stream model. A ResNet-18 body stream encodes the target-person crop, and a CLIP ViT-B/16 scene stream encodes the full image. The fused feature predicts 26 categorical emotion labels and the continuous valence, arousal, and dominance values. This study examines whether small context-debiasing or rare-class training changes still help after adding a CLIP scene encoder. The clean two-stream model is compared with simplified CCIM-style intervention, CLEF-lite context-bias subtraction, ASL tuning, and class-balanced sampling under the same implementation pipeline. No tested variant improves over the clean two-stream model, which achieves 34.52% mAP on the EMOTIC test split. CLIP gives the model broad scene semantics, but the simplified causal, counterfactual, and rare-class changes do not automatically improve performance. Most remaining errors are in rare and subtle emotion categories, so the next step should focus on label relationships and finer subject-context interaction.

05.
arXiv (CS.AI) 2026-06-19

Optimal Order of Multi-Agent and General Many-Body Systems

Authors:

arXiv:2606.20485v1 Announce Type: cross Abstract: This paper develops a general framework for analyzing multi-agent systems with feedback loops between agents actions and collective observations. The framework is built on two fundamental agent-level variables: power, which measures agent influence on collective outcomes, and response functions, which determine how agents react to observations. We derive how macroscopic properties, including total power, useful power, entropy, order, fragility, and mobility, emerge from these two variables of heterogeneous agents. To study the trade off between growth and resilience, we introduce a system-level utility function parameterized by a risk-appetite coefficient and derive an optimal degree of order that balances productivity, stability, and adaptability. The analysis suggests that stronger synchronization can increase collective output but may also increase systemic fragility and reduce mobility. We further argue that order, entropy, information, and useful energy are task-dependent and system-relative concepts whose meanings depend on the objectives of the system. By measuring and designing agent power distributions and response functions, it may be possible to better understand, predict, and optimize collective behavior and identify the conditions under which collective intelligence and optimal order emerge.

06.
arXiv (quant-ph) 2026-06-16

Intrinsic preservation of plasticity in continual quantum learning

arXiv:2511.17228v2 Announce Type: replace Abstract: Artificial intelligence in dynamic, real-world environments requires the capacity for continual learning. However, standard deep learning suffers from a fundamental issue: loss of plasticity, in which networks gradually lose their ability to learn from new data. Here we show that quantum learning models naturally overcome this limitation, preserving plasticity over long timescales. We demonstrate this advantage systematically across a broad spectrum of tasks from multiple learning paradigms, including supervised learning and reinforcement learning, and diverse data modalities, from classical high-dimensional images to quantum-native datasets. Although classical models exhibit performance degradation correlated with unbounded weight and gradient growth, quantum neural networks maintain consistent learning capabilities regardless of the data or task. We identify the origin of the advantage as the intrinsic physical constraints of quantum models. Unlike classical networks where unbounded weight growth leads to landscape ruggedness or saturation, the unitary constraints confine the optimization to a compact manifold. Our results suggest that the utility of quantum computing in machine learning extends beyond potential speedups, offering a robust pathway for building adaptive artificial intelligence and lifelong learners.

07.
arXiv (CS.AI) 2026-06-18

From Memorization to Creation: Evaluating the Cognitive Depth of LLM-Generated Educational Questions

arXiv:2606.18257v1 Announce Type: cross Abstract: While LLMs show promise in automating educational content creation, their ability to generate questions that stimulate higher-order thinking remains understudied. This work evaluates six widely-used LLMs through a Bloom's Taxonomy lens, focusing on their capacity to transcend rote memorization and achieve cognitive leaps. Using a hybrid human–AI evaluation protocol, we generate and analyze 20{,}700 questions across computer science, K–12 math, and social-science domains. Key contributions include: (1) a fine-grained prompting strategy that reduces question repetitiveness by 24.45\% for Qwen2.5-7B-Instruct, and increases the proportion of higher-order cognitive level outputs by 11.53\% for InternLM3-8B-Instruct; (2) quantitative metrics for cognitive shift intensity (CogShift) and category drift, revealing InternLM3's superior performance in multi-level transitions; (3) an interpretability analysis revealing metric-level correlations that enhance the transparency of Chain-of-Thought prompting. Our findings highlight the importance of cognitive-aware prompt design and provide benchmarks for deploying LLMs in personalized learning systems.

08.
arXiv (CS.LG) 2026-06-16

Semi-Supervised Noise Adaptation: Transferring Knowledge from Noise Domain

arXiv:2606.00558v2 Announce Type: replace Abstract: Transfer learning aims to facilitate the learning of a target domain by transferring knowledge from a source domain. The source domain typically contains semantically meaningful samples (*e.g.*, images) to facilitate effective knowledge transfer. However, a recent study observes that the noise domain constructed from simple distributions (*e.g.*, Gaussian distributions) can serve as a surrogate source domain in the semi-supervised setting, where only a small proportion of target samples are labeled while most remain unlabeled. Based on this surprising observation, we formulate a novel problem termed *Semi-Supervised Noise Adaptation* (SSNA), which aims to leverage a synthetic noise domain to improve the generalization of the target domain. To address this problem, we first establish a generalization bound characterizing the effect of the noise domain on generalization, based on which we propose a Noise Adaptation Framework (NAF). Extensive experiments demonstrate that NAF effectively leverages the noise domain to tighten the generalization bound of the target domain, leading to improved performance. The codes are available at https://github.com/AIResearch-Group/SSNA.

09.
arXiv (CS.AI) 2026-06-11

FitText: Evolving Agent Tool Ecologies via Memetic Retrieval

arXiv:2605.02411v2 Announce Type: replace Abstract: A semantic gap separates how users describe tasks from how tools are documented. As API ecosystems scale to tens of thousands of endpoints, static retrieval from the initial query alone cannot bridge this gap: the agent's understanding of what it needs evolves during execution, but its tool set does not. We identify this retrieval interface, not planning, as the binding constraint on end-to-end agent performance, and introduce FitText, a training-free framework that makes retrieval dynamic by embedding it directly in the agent's reasoning loop. FitText treats retrieval as test-time evolution of hypotheses: the agent generates natural-language pseudo-tool descriptions (revisable beliefs about the tool it needs), refines them iteratively using retrieval feedback, and explores diverse alternatives through stochastic generation. Memetic Retrieval adds evolutionary selection pressure over candidate descriptions, guided by a tool memory that avoids redundant search. On ToolRet (three domains), FitText's reformulation strategies improve NDCG@5 by 2.7 to 10.6 points over static query retrieval across all base models; on StableToolBench (16,464 APIs) with GPT-5.4-mini, Memetic reaches an 84.3% pooled pass rate, a 26.7-point absolute gain over static query retrieval.

11.
arXiv (CS.LG) 2026-06-16

Temporal Validation Changes the Apparent Public-Health Utility of Under-Five Mortality Prediction in Bangladesh: A Four-Round DHS Machine-Learning Study

arXiv:2602.03957v2 Announce Type: replace Abstract: Background: Under-five mortality in Bangladesh remains uneven despite national progress. DHS-based prediction models may guide targeted follow-up, but only if validation reflects future use. We examined how validation design changes apparent prediction performance. Methods: Four BDHS rounds (2011-2022; 33,962 children; 1,290 deaths) were analysed with a 26-feature pipeline and three model classes under four validation regimes, including cross-survey temporal validation (train 2011+2014, calibrate 2017, test 2022). A 32-unit ELU multilayer perceptron was selected via genetic-algorithm neural architecture search. AUROC used 2,000 bootstrap resamples; screening utility used sensitivity, PPV, and number needed to screen (NNS) at fixed capacity. Results: Validation regime altered public-health interpretation more than model class. NAS MLP AUROC ranged from 0.669 (2022-only random) to 0.775 (pooled random), with temporal AUROC 0.730. At the top-10% temporal threshold, NAS identified 152/355 deaths in 2022 (sensitivity 42.8%, PPV 13.2%, NNS 7.6). NNS across designs ranged from 5.6 to 11.0. Conclusions: Validation-regime choice changed screening workload and apparent policy value more than architecture. Temporal validation supports defensible estimates of follow-up and referral demand; DHS child-mortality studies should report sensitivity, PPV, and NNS before programmatic use.

12.
arXiv (quant-ph) 2026-06-12

Toward Entanglement Bootstrap for Conformal Field Theory in Any Dimension

arXiv:2606.12540v1 Announce Type: cross Abstract: Given a quantum critical wavefunction in any dimension, we propose a reconstructed Hamiltonian, analogous to the ones previously found for 1+1d CFT and for 2+1d bosonic liquid topologically-ordered states. We test numerically that, for known regularized approximate CFT groundstates (on the icosahedron and the fuzzy sphere), (1) they are close to the groundstate of their reconstructed Hamiltonian, and (2) the spectrum of their reconstructed Hamiltonian on the unit sphere has CFT properties (integer spacing of descendants) and matches known low-lying energies. We show that this provides an automated method to improve the finite-size effects in a fixed Hilbert space.

13.
arXiv (CS.CL) 2026-06-11

Soft-Prompt Tuning for Fair and Efficient LLM Benchmark Evaluation

Benchmark scores often misrepresent a large language model's (LLM's) knowledge, because they rely, e.g., on the model's ability to follow specific formatting requirements. This especially penalizes base models that may know the correct answers but lack the ability – typically introduced in post-training – to structure them as instructed. To overcome this, we propose soft-prompt tuning, an efficient, fair, and architecture-agnostic model evaluation. By optimizing only 10 soft-prompt vectors (roughly 0.0006% parameters for a 7B model) over a short tuning period, we adapt models to specific benchmark formats, closing gaps in format-following and ensuring that underlying knowledge is accurately reflected in benchmark scores. This allows one to fairly compare different base models – trained with various pre-training recipes – on benchmarks without the need for full post-training. We evaluated soft-prompt tuning across 7 models and 7 datasets. The results show that (a) soft-prompt tuning saturates format-following within 80 steps (~640 samples) making it highly efficient, (b) soft-prompt tuning significantly outperforms zero- and few-shot prompting, surfacing base model knowledge that standard prompting misses, that (c) even post-trained models can benefit from soft-prompts to maximize format compliance, and that (d) soft-prompted base model performance predicts post-trained model rankings more reliably than zero- and few-shot baselines, offering a low-cost proxy for downstream model quality. Our contributions include (1) metrics which disentangle format-following and knowledge accuracy, (2) a fairer benchmarking protocol of LLM knowledge, and (3) a cost- and memory-effective recipe to identify optimal pre-training strategies early in LLM development.

14.
arXiv (CS.CV) 2026-06-17

ED3R: Energy-Aware Distributed Disaster Detection Enabled by Cooperative Robotic Agents

Robotics are expected to support environmental monitoring and natural disaster management, where decisions must be made under uncertainty, resource limitations, and strict operational constraints. In critical missions, such as wildfires, robotic agents must not only identify hazardous events with sufficient confidence, but also manage the energy cost and time until detection. This paper introduces ED3R, an energy-aware distributed framework for wildfire detection under uncertainty. ED3R enables hierarchical cooperative decision-making between a robot and a remote controller. The remote controller decides upon the robot's motion, while the robot senses the environment and decides where to execute the wildfire detection (onboard or remotely) and how. The common goal is to detect wildfires with a required confidence while minimizing the energy consumed by any robot operation. ED3R further integrates mechanisms to avoid nearby obstacles, prevent redundant exploration, enable adaptive early mission completion, and ensure feasibility through a custom penalty function. ED3R also introduces a forward-looking capability, enabled through distributed neural regression models that allow the agents to anticipate the future by evaluating candidate strategies before execution. The framework is evaluated through realistic robotics simulations, ablation studies, and baseline comparisons. Overall, ED3R achieves a mission success rate of up to 97.18%. Especially in the most demanding missions, it reduces energy consumption by up to 36.4% and detects wildfires up to 41% faster than baselines.

15.
bioRxiv (Bioinfo) 2026-06-16

PhenoBIC: operator-free single-cell spatial phenotyping in multiplex imaging data using deep learning of cell staining patterns

Multiplex imaging is a valuable tool for spatially examining tissue microenvironments at the single-cell level to uncover biological and clinical insights. However, most multiplex image analysis workflows currently require manual intervention for cell phenotyping, which slows progress, demands human effort, and yields operator-dependent outputs. Here, we developed PhenoBIC, a pre-trained deep learning model for image classification of the multiplexed biomarker signals in a cell (Biomarker Imprint of a Cell) to classify cell phenotypes. We show that PhenoBIC (F1-score ~0.88) outperforms manual gating (widely used) and other machine learning-based computational approaches for cell marker expression classification. We validated this across multiple biomarkers, tissue sampling strategies (whole biopsies and tissue microarrays), multiplex panels, imaging platforms, and tissue types. We have released our in-house training and validation datasets of ~1.4 million manually curated cell expression ground truth labels. We have also open-sourced PhenoBIC and enabled its community-wide deployment via the QuPath interface.

16.
arXiv (CS.CV) 2026-06-16

Contrastive Learning for Seismic Horizon Tracking with Domain-Specific Priors

Unsupervised 3D seismic horizon tracking faces a key limitation: signal-based propagators provide accurate trace-level alignment but often fail near faults, whereas texture-driven deep models are more robust to discontinuities, typically at the cost of labeled data requirements and reduced trace-level precision. We propose a self-supervised fusion of both paradigms in which signal-derived local horizon correspondences act as domain-specific priors to train a texture-based deep learning model. Specifically, we estimate reliable trace-to-trace flows from reflector slopes and use them to form positive pairs in a contrastive objective, while restricting training to high-confidence neighborhoods, optionally augmented with a fault mask. The objective is not to infer ambiguous correspondences close to discontinuities, but to preserve horizon identity across them. As a result, the network learns voxel-wise embeddings that preserve local signal continuity while enabling horizon propagation beyond discontinuities through similarity search. Experiments on the public F3 dataset and a faulted synthetic dataset achieve lower mean absolute error (MAE) than unsupervised baselines and competitive performance against a semi-supervised method using a single labeled slice.

18.
arXiv (CS.CL) 2026-06-11

Hey Chat, Can You Teach Me? Structuring Socratic Dialogue for Human Learning in the Wild

Large language models are now widely used for everyday learning, but the underlying interactions are typically unstructured chats rather than following a curriculum. Unlike formal online learning systems, these interactions carry no prior record of the student, so any estimate of what the student already knows must be inferred from the dialogue itself. We show that this gap is not closed by scaling models alone. Frontier and education-tuned LLMs perform poorly when asked to tutor a student over an extended session, because doing so requires three things at once. The tutor must sequence a curriculum, conduct Socratic dialogue, and infer the student's knowledge state from that dialogue. We propose separating these responsibilities. Given a student query, our system constructs a prerequisite knowledge graph in which subtopics are nodes and dependencies are edges, and frames tutoring as deciding which node to teach next and how many dialogue turns to spend on it before moving on. A lightweight PPO policy handles this sequencing decision, while an LLM conducts the Socratic exchange at the chosen node and returns a signal of student progress. Across held-out STEM and non-STEM topics, our PPO-paired tutor outperforms heuristic baselines, frontier general-purpose models, and a model specialised for Socratic dialogue: on both the rate at which students reach full curriculum mastery and the number of turns required. Explicit curriculum structure delivers gains that scaling the underlying model does not.

19.
arXiv (quant-ph) 2026-06-15

Bandstructure of a coupled BEC-cavity system: effects of dissipation and geometry

arXiv:2504.17730v2 Announce Type: replace-cross Abstract: We present a theoretical model for a transversally driven Bose-Einstein condensate coupled to an optical cavity. We focus on the interplay between different coherent couplings, which can trigger a structural phase transition, known as the superradiant phase transition. Our approach, based on band structure theory and a mean-field description, enables a comprehensive analysis of the nature of the system's excited modes, precursing the phase transitions. By incorporating dissipative couplings, intrinsic to these systems, we find non-Hermitian phenomena such as the coalescence of crossing precursor modes and the emergence of exceptional points (EPs). The general formulation of our model allows us to explain the role of an angle between transverse pump and the cavity deviating from $90^\circ$. This offers us a unified perspective on the plethora of different implementations of such systems.

20.
arXiv (CS.CV) 2026-06-11

What Semantics Survive the Connector? Diagnosing VLM-to-DiT Alignment in Video Editing

Flow matching based video generative models have been increasingly relying on prepended Vision-Language Models (VLMs) to handle complex, instruction-based video editing. The prevailing assumption underlying this paradigm is that a connector module can seamlessly align the VLM's rich multi-modal reasoning with the original text embedding space of DiTs. However, we hypothesize that this alignment acts as a severe semantic bottleneck, degrading fine-grained structural variables. Verifying this is challenging, as end-to-end evaluations conflate alignment failures with generation errors, and natural datasets lack disentangled annotations. To rigorously investigate this, we propose a controlled data processing pipeline based on video composition that results in TRACE-Edit, a diagnostic dataset focusing on relation-based editing. Leveraging this dataset, we propose a comprehensive diagnostic protocol to analyze two important designs of meta-query and connector in the existing video editing models. Systematic evaluation of four representative model cases reveals that fine-grained structural semantics can be severely degraded during alignment. Our findings overturn the assumption of lossless semantic transfer, identifying the VLM-to-DiT alignment as a major bottleneck and providing a new diagnostic foundation for future multi-modal alignment architectures.

21.
arXiv (math.PR) 2026-06-11

A Hybrid LSMC-PDE Method for Bermudan Option Pricing under the Gatheral Double Mean-Reverting Model

arXiv:2606.11237v1 Announce Type: cross Abstract: We study Bermudan option pricing under the Gatheral Double Mean-Reverting (GDMR) stochastic volatility model. The model features a variance process together with a stochastic long-run mean variance process and allows Constant Elasticity of Variance (CEV)-type exponents in the diffusion coefficients. This model is attractive since it provides a flexible specification for volatility dynamics. However, the pricing of early-exercise derivatives under the GDMR model remains largely unexplored in the literature. To address this challenge, we adapt a Hybrid Least-Squares Monte Carlo-Partial Differential Equation (LSMC-PDE) framework to the GDMR model and provide a detailed model-specific implementation. Conditioning on simulated variance paths, the pricing problem reduces to a one-dimensional problem in the asset price, which is solved by a Fourier-based approach, while the remaining dependence on the variance variables is approximated by least-squares regression. Our numerical experiments demonstrate that the Hybrid LSMC-PDE approach yields accurate pricing estimates and often lower pricing errors than plain LSMC, particularly for low and moderate numbers of simulation paths, showing the benefit of using the model structure in early-exercise option pricing.

22.
arXiv (CS.LG) 2026-06-11

Open Materials Generation with Inference-Time Reinforcement Learning

arXiv:2602.00424v2 Announce Type: replace Abstract: Continuous-time generative models for crystalline materials enable inverse materials design by learning to predict stable crystal structures, but incorporating explicit target properties into the generative process remains challenging. Policy-gradient reinforcement learning (RL) provides a principled mechanism for aligning generative models with downstream objectives but typically requires access to the score, which has prevented its application to flow-based models that learn only velocity fields. We introduce Open Materials Generation with Inference-time Reinforcement Learning (OMatG-IRL), a policy-gradient RL framework that operates directly on the learned velocity fields and eliminates the need for the explicit computation of the score. OMatG-IRL leverages stochastic perturbations of the underlying generation dynamics preserving the baseline performance of the pretrained generative model while enabling exploration and policy-gradient estimation at inference time. Using OMatG-IRL, we present the first application of RL to crystal structure prediction (CSP). Our method enables effective reinforcement of an energy-based objective while preserving diversity through composition conditioning, and it achieves performance competitive with score-based RL approaches. Finally, we show that OMatG-IRL can learn time-dependent velocity-annealing schedules, enabling accurate CSP with order-of-magnitude improvements in sampling efficiency and, correspondingly, reduction in generation time. The OMatG-IRL code is included in a new release of the Open Materials Generation (OMatG) framework available at https://github.com/FERMat-ML/OMatG.

23.
arXiv (CS.LG) 2026-06-16

Reinforcement Learning-Guided Retrieval with Soft Fusion for Robust Multimodal Imitation Learning under Missing Modalities

arXiv:2606.15514v1 Announce Type: cross Abstract: Robotic systems perceive the world through multiple input modalities – including visual camera streams and natural language instructions – and must select appropriate actions based on these signals. However, assuming the permanent availability of all input devices is unrealistic, as sensors may fail, become occluded, or drop out entirely during deployment. Robust handling of such missing-modality scenarios is therefore essential for real-world robot operation. This paper introduces RL4IL, a reinforcement learning guided method for imitation learning that selects the most suitable action for a given observation by identifying the most relevant expert demonstrations from a training library. A reinforcement learning policy, trained via Proximal Policy Optimisation over Breadth-First Search candidate sets, ranks candidate demonstrations and a soft cross-attention fusion head aggregates their action signals to produce the final prediction. When a modality is missing at inference time, a dedicated per-modality RL retrieval policy identifies donor demonstrations from the training library, and a soft imputation head reconstructs the missing embedding via cross-attention over the top-ranked donors – without requiring any retraining of the system. Experiments on three LIBERO benchmark suites demonstrate that RL4IL substantially outperforms state-of-the-art imitation learning methods under sensor dropout conditions, while requiring no policy network training. The code can be found at https://github.com/h-ismkhan/Reinforcement-Learning-via-kNN-for-Robotic-Learning-with-Missing-Camera

24.
arXiv (CS.LG) 2026-06-11

RCAP: Robust, Class-Aware, Probabilistic Dynamic Dataset Pruning

arXiv:2606.11761v1 Announce Type: new Abstract: Dynamic data pruning techniques aim to reduce computational cost while minimizing information loss by periodically selecting representative subsets of input data during model training. However, existing methods often struggle to maintain strong worst-group accuracy, particularly at high pruning rates, across balanced and imbalanced datasets. To address this challenge, we propose RCAP, a Robust, Class-Aware, Probabilistic dynamic dataset pruning algorithm for classification tasks. RCAP applies a closed-form solution to estimate the fraction of samples to be included in the training subset for each individual class. This fraction is adaptively adjusted in every epoch using class-wise aggregated loss. Thereafter, it employs an adaptive sampling strategy that prioritizes samples having high loss for populating the class-wise subsets. We evaluate RCAP on six diverse datasets ranging from class-balanced to highly imbalanced using five distinct models across three training paradigms: training from scratch, transfer learning, and fine-tuning. Our approach consistently outperforms state-of-the-art dataset pruning methods, achieving superior worst-group accuracy at all pruning rates. Remarkably, with only $10\%$ data, RCAP delivers $>1\%$ improvement in performance on class-imbalanced datasets compared to full data training while providing an average $8.69\times$ speedup. The code can be accessed at https://github.com/atif-hassan/RCAP-dynamic-dataset-pruning

25.
arXiv (CS.AI) 2026-06-18

Generating Natural and Expressive Robot Gestures through Iterative Reinforcement Learning with Human Feedback using LLMs

arXiv:2606.18747v1 Announce Type: cross Abstract: Expressive gestures are essential for natural and effective communication, complementing speech when verbal cues alone are insufficient (e.g., pointing). For social robots such as the humanoid Pepper, producing natural and expressive movements is critical for improving human-robot interaction (HRI) and long-term acceptance. However, generating gestures remains challenging due to reliance on expert-authored animations, resulting in rigid behaviors that are impractical for dynamic and diverse environments. Alternatively, machine learning approaches often struggle to capture perceived naturalness, becoming increasingly challenging with more degrees of freedom. Consequently, producing expressive robot gestures requires a system that can adapt to the environment while adhering to social norms and physical constraints. Recent advances in large language models (LLMs) enable dynamic code generation, offering new opportunities for runtime gesture synthesis from natural language. In this paper, we integrate ChatGPT into the humanoid robot Pepper to generate co-speech gestures aligned with conversational output. While this baseline enables flexible gesture generation, the resulting motions are often perceived as stiff and unnatural. To address this limitation, we introduce an iterative reinforcement learning with human feedback (RLHF) system that finetunes gesture generation based on user evaluations, leveraging an iterative user study to compare Pepper's generated gestures. Our results show that RLHF improved the LLM's co-speech generative capabilities, producing more expressive, relevant and fluid movements.