Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-19

Indexed Bellman Information Complexity

arXiv:2606.11171v2 (announce type: replace)

We develop indexed Bellman information complexity, a representation-level theory of interactive decision making centered on information indices and reference histories. The representation strips away problem-specific syntax and retains only the ingredients needed for dynamic programming and information accounting, thereby unifying the earlier framework of indexed algorithmic information ratios (AIR). On the upper-bound side, regret is controlled by Bellman supersolutions or potential identities whose gradient bracket is paid for by indexed information. Upper-confidence-bound (UCB), estimation-to-decision/decision-estimation-coefficient (E2D/DEC), and adaptive-minimax-sampling or exploration-by-optimization (AMS/EBO) methods appear as three relaxations of this same identity. On the lower-bound side, the posterior-reference trajectory supplies both the information telescope and the ghost quantile of small-regret trajectories. The resulting critical radius in the lower bound is an effective-dimension-scale quantity, as in Fano and local-prior-mass lower bounds, rather than the constant radius of a two-point Le Cam argument. The examples show that DEC is best viewed as a one-step relaxation of indexed Bellman information complexity, not as a universally tight conversion mechanism. We illustrate the framework through several applications, with particular emphasis on kernel bandits. In this setting, the active action marginal provides a concrete basis for comparing UCB, E2D, and AMS/EBO.

02.
arXiv (CS.LG) 2026-06-11

Open Materials Generation with Inference-Time Reinforcement Learning

arXiv:2602.00424v2 (announce type: replace)

Continuous-time generative models for crystalline materials enable inverse materials design by learning to predict stable crystal structures, but incorporating explicit target properties into the generative process remains challenging. Policy-gradient reinforcement learning (RL) provides a principled mechanism for aligning generative models with downstream objectives but typically requires access to the score, which has prevented its application to flow-based models that learn only velocity fields. We introduce Open Materials Generation with Inference-time Reinforcement Learning (OMatG-IRL), a policy-gradient RL framework that operates directly on the learned velocity fields and eliminates the need for the explicit computation of the score. OMatG-IRL leverages stochastic perturbations of the underlying generation dynamics, preserving the baseline performance of the pretrained generative model while enabling exploration and policy-gradient estimation at inference time. Using OMatG-IRL, we present the first application of RL to crystal structure prediction (CSP). Our method enables effective reinforcement of an energy-based objective while preserving diversity through composition conditioning, and it achieves performance competitive with score-based RL approaches. Finally, we show that OMatG-IRL can learn time-dependent velocity-annealing schedules, enabling accurate CSP with order-of-magnitude improvements in sampling efficiency and, correspondingly, reduction in generation time. The OMatG-IRL code is included in a new release of the Open Materials Generation (OMatG) framework available at https://github.com/FERMat-ML/OMatG.

03.
arXiv (CS.CV) 2026-06-16

Keep It in Mind: User Centric Continual Spatial Intelligence Reasoning in Egocentric Video Streams

We introduce UCS-Bench, a dataset spanning 170+ hours of egocentric visual observations with 8.1K+ timestamped questions for diagnosing User-Centric Continual Spatial intelligence in egocentric video streams. UCS-Bench targets a new problem that emphasizes dynamic spatial reasoning, long-term memory, and their alignment with users' real-time locations. We propose DirectMe, a framework that incrementally constructs and maintains a structured spatial memory from streaming egocentric observations. DirectMe enables robust tracking and recall of object locations, all relative to the user's movement over time. By tightly coupling visual perception with memory updates and spatial reasoning, our approach supports long-horizon queries that require recalling interactions, resolving viewpoint-induced ambiguities, and adapting to dynamic scenes. Our experiments show that DirectMe significantly improves the spatial reasoning of leading multimodal LLMs; it also surpasses many spatially aware and long-form streaming video models. We hope our benchmark and solution will advance spatial intelligence research for egocentric AI assistants. Data and code are available at https://github.com/cocowy1/UCS-Bench.

04.
arXiv (CS.CV) 2026-06-17

Evaluating Synthetic Data Generation for Domain Generalization in Fetal Brain MRI Segmentation

Fetal brain tissue segmentation from magnetic resonance imaging (MRI) is crucial for studying neurodevelopment, but remains challenging due to data heterogeneity and limited annotations. Domain randomization (DR) has recently emerged as a promising strategy for single-source domain generalization by synthesizing training images with randomized artifacts, contrast, and resolution. In this work, we investigate how to maximize the out-of-domain (OOD) generalization of DR-based methods. We evaluate several synthetic data generation strategies for DR, with a particular focus on our recently proposed framework, FetalSynthSeg. We show that simple Gaussian mixture-based intensity modeling outperforms more complex physics-based simulations, and that intensity clustering (subdividing tissue classes based on intensity) improves OOD robustness. Evaluated on 348 fetal subjects from four sites spanning 0.55-3T and both T1w and T2w contrasts, FetalSynthSeg reaches state-of-the-art performance on several FeTA 2024 testing datasets (80-85 Dice score) and, for the first time, offers robust segmentation on modalities other than T2w for fetal brain segmentation (80 Dice on dHCP-T1w dataset). Compared with state-of-the-art methods such as BOUNTI, nnU-Net ensemble, and the FeTA 2024 winner, FetalSynthSeg delivers comparable or superior accuracy while maintaining strong robustness across domain shifts. Our code, model weights, and Docker image ready for easy inference are available at https://hub.docker.com/r/vzalevskyi/fetalsynthseg.

05.
arXiv (CS.CV) 2026-06-16

Chroma-gated, differentiable OKLCH interpolation: Continuous Oklab fallback for color-cast reduction

OKLCH – the cylindrical (lightness, chroma, hue) form of Ottosson's Oklab color space – is the interpolation space recommended by CSS Color 4 for gradients and color-mix(), and it is now broadly deployed. Its polar parameterization, however, casts color near the neutral axis in two ways: (1) an inter-hue detour between two chromatic endpoints that sweeps through an unintended hue (blue to yellow visibly passing through green), and (2) an off-line bow when one endpoint is achromatic. Existing remedies are uniformly two-valued – a threshold switch that fires only at an achromatic endpoint – so they address only (2); on chromatic pairs every one of them reduces to raw OKLCH, leaving the (1) inter-hue cast untreated. We introduce Continuous Oklab fallback (COFb), a one-parameter, differentiable chroma gate $w(C)=C^n/(C^n+\sigma^n)$ that continuously blends the OKLCH path toward the linear Oklab path as chroma falls. A single gate reduces the (1) cast that the two-valued family leaves untreated and unifies the handling of (1) and (2) without any endpoint test. We characterize a cast-hue trade-off frontier, adopt a default ($n=1$, the rational Michaelis-Menten form; $\sigma\approx0.19$ for a typical sRGB palette, from a normalization-independent cast-half criterion), and verify the gate's properties symbolically. At the default, COFb halves the inter-hue path detour (mean lateral deviation -49.5%, chroma-weighted hue excursion -35.5%). We also state the method's limits: on (2) alone the two-valued switch remains better, and like any Cartesian blend COFb does not preserve chroma. In deployment, COFb runs entirely in plain Oklab (a,b) to sRGB, so it serves as a fallback that delivers the same cast-reduced gradients where modern CSS color interpolation (color-mix(in oklch) and the like) is unavailable – older engines, image and video pipelines, or GPU shaders.
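
As a rough illustration of the gate described in the abstract above, the following minimal Python sketch evaluates w(C) = C^n / (C^n + sigma^n) with the stated defaults (n = 1, sigma of about 0.19). The function names `chroma_gate` and `blend_ab` are hypothetical. Which chroma value feeds the gate, and the linear mixing of Oklab (a, b) coordinates, are assumptions made for illustration and may differ from the paper's actual method.

```python
def chroma_gate(chroma: float, sigma: float = 0.19, n: float = 1.0) -> float:
    """Gate w(C) = C^n / (C^n + sigma^n) from the abstract.

    Returns a weight in [0, 1): 0 at zero chroma, 0.5 at C == sigma,
    approaching 1 as chroma grows.
    """
    if chroma <= 0.0:
        return 0.0
    cn = chroma ** n
    return cn / (cn + sigma ** n)


def blend_ab(oklch_ab, oklab_ab, chroma, sigma=0.19, n=1.0):
    """Hypothetical blend of two candidate points in Oklab (a, b).

    oklch_ab: point on the OKLCH interpolation path, converted to (a, b).
    oklab_ab: point on the straight-line Oklab path at the same t.
    chroma:   chroma value passed to the gate (ASSUMPTION: the caller
              decides which chroma to use; the abstract does not say).
    """
    w = chroma_gate(chroma, sigma, n)
    # High chroma -> follow the OKLCH path; low chroma -> fall back to Oklab.
    return tuple(w * p + (1.0 - w) * q for p, q in zip(oklch_ab, oklab_ab))
```

With n = 1 the gate equals one half exactly when the chroma equals sigma, which is one way to read sigma as the chroma level at which the two paths are weighted equally.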

06.
arXiv (CS.CL) 2026-06-16

EHRNote-ChatQA: A Benchmark for Evidence-Grounded Multi-Turn Clinical Question Answering over Longitudinal Discharge Summaries

Discharge summaries are crucial clinical documents containing the context of a patient's overall hospital stay, and are routinely reviewed by medical experts for patient readmission, ongoing care, and diagnostic decision-making. When reviewing them, medical experts often must iteratively synthesize information across multiple summaries while verifying the evidence supporting each answer. Although large language models (LLMs) are increasingly explored for clinical question answering, existing benchmarks do not sufficiently reflect this setting: they often evaluate exam-style medical knowledge or focus on single-turn question answering with limited evidence-grounding evaluation. We introduce EHRNote-ChatQA, the first benchmark for evidence-grounded multi-turn clinical question answering over patients' multiple discharge summaries. Built from de-identified MIMIC-IV discharge summaries, EHRNote-ChatQA contains 967 patient-level multi-turn samples spanning one to five notes and 16,072 medical-expert-verified QA pairs (8,036 content questions, each paired with an evidence-grounding question) across eight clinical categories. The benchmark is constructed through an expert-informed pipeline combining discharge-summary structuring schema, expert-curated multi-turn QA templates, and LLM-based generation, followed by review and revision of every single QA sample by 11 medical experts. Benchmarking 22 open- and closed-source LLMs reveals several challenges, including that LLMs struggle more with evidence grounding than content answering, multi-turn errors compound across turns, and single-turn clinical QA performance does not reliably transfer to this setting. These findings establish EHRNote-ChatQA as a rigorous and practical benchmark for evaluating clinical QA systems. The dataset will be made publicly available through PhysioNet credentialed access.

07.
arXiv (CS.CL) 2026-06-16

Revisiting the Systematicity in Negation in the Era of In-Context Learning

Understanding the meaning of negated sentences remains one of the challenges for language models, even in the era of large language models (LLMs). We analyze systematicity regarding LLM understanding of negation from two perspectives: behavioral systematicity and representational systematicity. For behavioral systematicity, we confirm that through demonstrations and in-context learning, LLMs can recognize negation expressions and scope within sentences to some extent, but they fail to achieve perfect performance. In particular, the difficulty of negation scope recognition varies with the output format. For representational systematicity, we analyze the extent to which function vectors can be robustly constructed from in-context examples for tasks that are essential to understanding negation. The experiments suggest that while function vectors can be composed for negation cue extraction tasks, extracting function vectors for recognizing scope is more challenging.

08.
medRxiv (Medicine) 2026-06-11

Impact of Out-Migration and Remittances on Food Consumption Outcomes among Rural Households in Tigray, Ethiopia

This study examines the effects of rural out-migration and remittance inflows on food consumption outcomes among rural households in the Tigray region of Ethiopia. The analysis uses household survey data collected from 521 rural households across three Weredas (districts): Tahtay Maichew, Kola Tembien, and Kilte-awlaelo. A Binary Probit model was employed to identify factors influencing migration decisions, while an Endogenous Switching Regression (ESR) model was used to estimate the impact of migration on food consumption outcomes while controlling for selection bias and unobserved heterogeneity. Food security was measured using the Food Consumption Score (FCS) and dietary diversity indicators. The empirical results reveal that severe food insecurity is widespread, with over 60% of all surveyed households falling into the "Poor" food consumption category. Descriptive baseline comparisons show that migration and remittance transfers marginally shift the raw average FCS upward from 23.86 to 25.48. However, this impact is profoundly nuanced: remittances serve as an immediate consumption-smoothing safety net but run parallel to a "labor-lost" constraint that reduces own-production capacities, forcing households to rely increasingly on market purchases for staple foods. The findings reveal that migration creates short-term labor shortages in agricultural production; however, remittance inflows substantially improve household food consumption frequencies, particularly for pulses, vegetables, and other nutrient-rich foods. After accounting for self-selection bias and unobserved traits, the ESR estimates indicate that migration increases the Food Consumption Score of participating households by 10.75 points on average (the Average Treatment Effect on the Treated, ATT), shifting them into more secure dietary tiers. Moreover, remittances help households mitigate the adverse effects of drought and other shocks by relaxing liquidity constraints and supporting both food purchases and agricultural investments. The study recommends establishing targeted food security safety nets for non-remittance households, promoting scale-appropriate labor-saving agricultural technologies, expanding traditional communal labor-sharing innovations, and boosting irrigation and agricultural input support programs to enhance rural food security and livelihood resilience.

09.
arXiv (CS.LG) 2026-06-11

Program Evaluation with Remotely Sensed Outcomes

arXiv:2411.10959v5 (announce type: replace-cross)

We study causal inference in experiments and quasi-experiments, where the economic outcome is imperfectly measured by a remotely sensed variable. The remotely sensed variable is low-cost, scalable, and predictive of the economic outcome in observational data; examples include satellite imagery and mobile phone activity. We model the remotely sensed variable as post-outcome: variation in the economic outcome causes variation in the remotely sensed variable. For example, changes in environmental quality cause changes in satellite imagery, not vice versa. Under this assumption, we propose a formula to nonparametrically identify the causal parameter by combining experimental and observational data. We develop a method for $n^{-1/2}$ inference that is robust to misspecification and that does not restrict the algorithms used to process remotely sensed variables.

10.
arXiv (CS.CV) 2026-06-12

IterCAD: An Iterative Multimodal Agent for Visually-Grounded CAD Generation and Editing

Computer-Aided Design is pivotal in modern manufacturing, yet existing automated methods predominantly rely on open-loop, one-shot generation, creating a mismatch with iterative real-world practices. In this paper, we present IterCAD, a unified multimodal agent framework for closed-loop, interactive CAD generation and editing. We formulate the task as a multi-turn interaction between a multimodal agent and an executable CAD sandbox, covering three tasks: Drawing-to-Code, Text-to-Code, and Interactive Editing. To support this, we develop a data synthesis pipeline incorporating advanced industrial manufacturing features to generate standard-compliant multi-view engineering drawings, complex code-editing tasks, and high-fidelity interaction trajectories. We optimize the agent via progressive SFT followed by geometry-aware reinforcement learning with viable-prefix masking to enhance code executability and geometric fidelity. Finally, we introduce the IterCAD-Bench evaluation suite and propose the Chamfer Distance Tolerance-Recall (CD-TR) curve alongside its AUC-TR metric, establishing a survivor-bias-free standard that unifies code validity and geometric precision. Extensive experiments demonstrate that IterCAD achieves highly competitive performance across multiple benchmarks, significantly outperforming existing approaches in both code executability and geometric precision, while exhibiting superior capabilities in closed-loop iterative refinement.

11.
arXiv (CS.CL) 2026-06-17

Environment-Grounded Automated Prompt Optimization for LLM Game Agents

LLM agents in interactive environments are highly sensitive to their prompts, yet prompt engineering remains a manual, task-specific process. We introduce an automated prompt optimization framework for LLM agents that decomposes the observation-to-action pipeline into a goal-conditioned descriptor agent and an action selection agent, and iteratively refines each module's prompt through an LLM-driven evolutionary loop guided by environment returns. We propose a behavior analyzer to attribute episode outcomes to specific prompt components, and a mutator to propose targeted revisions to the prompt, before validating them through environment rollouts. We evaluate on all five BabyAI tasks in the BALROG benchmark, comparing our pipeline against BALROG's RobustCoTAgent under both plain and guided prompt initializations. Optimization improves performance consistently across tasks and conditions, without requiring updates to the model weights. On PutNext, a multi-step coordination task where the RobustCoTAgent achieves 0% success, our framework reaches up to 72.5% success rate using the same underlying LLM with optimized prompts. These results suggest that a multi-agent framework, combined with automatic prompt optimization, enhances LLMs without the need for fine-tuning or extensive human supervision.

12.
arXiv (CS.CV) 2026-06-16

Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling

Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal understanding. We argue that the field should move beyond appearance synthesis toward intelligent visual generation: plausible visuals grounded in structure, dynamics, domain knowledge, and causal relations. To frame this shift, we introduce a five-level taxonomy: Atomic Generation, Conditional Generation, In-Context Generation, Agentic Generation, and World-Modeling Generation, progressing from passive renderers to interactive, agentic, world-aware generators. We analyze key technical drivers, including flow matching, unified understanding-and-generation models, improved visual representations, post-training, reward modeling, data curation, synthetic data distillation, and sampling acceleration. We further show that current evaluations often overestimate progress by emphasizing perceptual quality while missing structural, temporal, and causal failures. By combining benchmark review, in-the-wild stress tests, and expert-constrained case studies, this roadmap offers a capability-centered lens for understanding, evaluating, and advancing the next generation of intelligent visual generation systems.

13.
medRxiv (Medicine) 2026-06-12

Estimating the effectiveness of syndromic screening at airports for Bundibugyo ebolavirus disease

We used a stochastic simulation model to estimate the effectiveness of combined exit and entry airport screening for Bundibugyo ebolavirus disease (BVD), using natural-history parameters from a Bayesian re-analysis of the 2012 Isiro outbreak. For a 12-hour international flight from DRC or Uganda at 86% screening sensitivity, we estimate 65% of infected travellers would arrive undetected (95% CrI: 38-76%). The main driver of this outcome is the relative duration of the incubation period (approximately 7.7 days) and the onset-to-severe-disease interval (approximately 4 days): most infected travellers board before symptom onset and are undetectable by any syndromic screen, whilst those who are symptomatic progress rapidly to illness severe enough to preclude travel. This is compounded during active epidemic growth, when recently exposed (and therefore pre-symptomatic) cases are overrepresented among travellers. Syndromic airport screening offers limited protection against BVD spread via air travel, and should be complemented by outbreak control at source and strengthened clinical surveillance in receiving countries with high travel connectivity to affected areas.

14.
arXiv (CS.CV) 2026-06-16

LentiAvatar: Pseudo-Multiview Reconstruction and Subpixel Prism Rendering for Real-Time Stereoscopic Communication

Real-time stereoscopic video communication has long been a goal of immersive telepresence, yet practical systems still require specialized capture rigs or reduce remote users to a single portrait view. We present LentiAvatar, a Gaussian head-avatar system that connects monocular avatar capture with subpixel-encoded glasses-free lenticular display for real-time autostereoscopic communication. From a monocular portrait video, LentiAvatar reconstructs a controllable head avatar and optimizes it for the lateral viewing zones induced by the display. The method uses natural head turns as pseudo-multiview (PMV) supervision to constrain regions that are otherwise weakly observed in monocular training, including hair, ears, jaw contours, and neck boundaries. Reliable side frames are yaw-binned, aligned to virtual cameras, and supervised within a strict head-and-hair domain; contour-aware losses and staged regularization further suppress ghosting, alpha leakage, and depth instability while preserving lateral detail. At runtime, LentiAvatar renders 32 virtual views and encodes them into a 4K lenticular raster with calibrated subpixel-routing masks. The live-tracker prototype sustains 10.65 FPS, and a subject-specific distilled driver raises the same display pipeline to 38.49 FPS.

15.
arXiv (CS.AI) 2026-06-12

Rarity-Gated Context Conditioning for Offline Imitation Learning-Based Maritime Anomaly Detection

arXiv:2606.13311v1 (announce type: cross)

Contextual anomaly detection aims to identify abnormal behavior conditional on context variables, but practical deployments often face highly imbalanced context distributions where rare regimes can be critical information. Under such frequency bias, context-conditioned models can produce unstable decisions and excessive false alarms in rare contexts. We propose Rarity-Gated Feature-wise Linear Modulation (RGFiLM), a rarity-aware conditioning module that combines feature-wise modulation (i.e., context-conditioned scaling and shifting of hidden features) with a gate controlled by a data-driven rarity score. The rarity score is estimated from the empirical distribution of context variables and regulates how strongly context modulates intermediate representations: the gate becomes more decisive under rare contexts while remaining conservative under frequent contexts. We evaluate RGFiLM on maritime trajectory anomaly detection using AIS motion sequences with ERA5 environmental context in an environment-sensitive detour scenario. When instantiated in a sequential anomaly scoring pipeline, RGFiLM achieves the best mean F1–False Positive Rate (FPR) trade-off among the compared context-agnostic and context-conditioned methods. These results suggest that explicitly accounting for context rarity is an effective approach for reducing false alarms in context-sensitive anomaly detection.

16.
arXiv (CS.LG) 2026-06-16

Incentives and Evidence in Learned Service Orchestration

arXiv:2606.16555v1 (announce type: cross)

Reinforcement learning for service orchestration has been the subject of sustained research for over a decade, yet it is not used in production at scale. The usual explanation is that learned controllers degrade under delayed and noisy telemetry, workload shifts, and uncontrolled tenants. We test whether existing evidence supports that explanation. We evaluate three highly influential RL-based orchestration systems spanning resource allocation, DAG scheduling, and autoscaling, using pre-registered predictions about comparative degradation under production-relevant perturbations and paired inference with family-wise error correction. Across the tests, most predicted performance reversals do not occur. Diagnostic analyses show that these outcomes often reflect comparator collapse, artefact limitations, or evaluation choices rather than evidence that learned controllers tolerate the perturbations. One apparent advantage under observation lag is roughly fortyfold compared to a Kubernetes HPA-equivalent controller. Another widely cited result cannot be reconstructed from its released artefact, and the strongest reproducible margin is far smaller than the published results. Conclusions also reverse under changes in perturbation magnitude and evaluation mode. Based on these results and broader patterns in the literature, we identify an institutional problem. Publication and review incentives favour benchmark gains against convenient comparators, even when those gains provide little evidence of deployment performance. We argue that the problem is not solely technical. Rather, it is institutional, so learned orchestration needs production-grade comparators, registered perturbation models, separate operational metrics, and publication criteria that reward reproducible operational evidence. Without these changes, the literature can grow without establishing whether learning improves orchestration.

17.
arXiv (quant-ph) 2026-06-17

Projected logical ensembles in surface codes via the random-matrix theory of quantum dots

arXiv:2606.17140v1 (announce type: new)

Measurements underpin active quantum error correction (QEC) and have been recognized as a source of novel measurement-induced many-body phenomena. Here, we study the statistical properties of post-measurement logical states arising in QEC on topological codes subject to deterministic transversal unitary gates. Upon syndrome extraction followed by maximum-likelihood decoding, a Born-weighted ensemble arises which we dub the "projected logical ensemble" (PLE). Focusing on surface codes subject to uniform single-qubit Pauli-$X$ rotations, we characterize the measurement-induced randomness of the PLE. To this end, we show that for a code with a single logical qubit, the PLE is isomorphic to an ensemble of scattering matrices describing mesoscopic quantum dots obtained from a 2D Majorana network model with suitable boundary conditions. We uncover regimes where these quantum dots are chaotic such that their scattering matrices are well-described by random matrix theory. In these regimes, the PLE approaches a universal ensemble that is maximally random up to symmetry and decoder-induced constraints. The symmetry constraints, set by stabilizer and logical operator weights, realize Altland-Zirnbauer classes D or DIII, both of which we illustrate. Our results establish a fundamental connection between emergent universality concepts in mesoscopic physics, quantum many-body systems, and QEC.

18.
arXiv (CS.CL) 2026-06-18

Efficient Financial Language Understanding via Distillation with Synthetic Data

Large instruction-following models are powerful but costly to deploy, particularly in finance, where labelled data are limited by confidentiality and expert annotation cost. We present an efficient framework for financial sentiment analysis through distillation with synthetic data, transferring knowledge from a large instruction-tuned teacher to compact student models. The framework is designed for low-resource conditions, where a small set of real examples is collected and labelled by hand. The framework then clusters the examples and uses the clusters to select seeds for generating synthetic examples via structured few-shot prompting. Experiments show that clustering-based seed selection yields more representative synthetic data than random sampling, enabling compact models to achieve strong performance with minimal supervision. Notably, on a more complex and noisy text domain, the compact model trained on the complete synthetic-seed corpus even outperforms the teacher model, while remaining competitive on formal text. The framework provides a practical route toward resource-efficient domain adaptation in financial NLP with minimal human labelling effort.

19.
PLOS Computational Biology 2026-06-01

A statistical framework for comparing epidemic forests

Authors: Cyril Geismar, Peter J. White, Anne Cori, Thibaut Jombart

Inferring who infected whom in an outbreak is essential for characterising transmission dynamics and guiding public health interventions. However, this task is challenging due to limited surveillance data and the complexity of immunological and social interactions. Instead of a single definitive transmission tree, epidemiologists often consider multiple plausible trees forming epidemic forests. Various inference methods and assumptions can yield different epidemic forests, yet no formal test exists to assess whether these differences are statistically significant. We propose such a framework using a chi-square test and permutational multivariate analysis of variance (PERMANOVA). We assessed each method’s ability to distinguish simulated epidemic forests generated under different offspring distributions. While both methods achieved perfect specificity for forests with 100+ trees, PERMANOVA consistently outperformed the chi-square test in sensitivity across all epidemic and forest sizes. Our approach, implemented in the R package mixtree, is the first statistical framework to robustly compare epidemic forests.

20.
arXiv (CS.CL) 2026-06-16

StagePilot: Stage-Level Planning for Long-Horizon Dialogue Simulation in Cybergrooming

Cybergrooming is an evolving threat to youth, requiring proactive educational interventions. We address this by modeling dialogue progression as a structured planning problem over stage-wise interactions. We propose StagePilot, a dialogue framework that separates stage-level planning from response generation, in which the model selects the next stage under constrained transitions and generates responses conditioned on it, enabling coherent and realistic progression. Reinforcement learning is used to learn stage-level policies from offline data, optimizing for both emotional alignment and goal-consistent progression. Our empirical experiments show that StagePilot generates more structured, coherent dialogue trajectories and reduces conversational stagnation compared to baselines; notably, the IQL+AWAC variant reaches the final stage more often while maintaining over 70% positive or neutral responses, yielding a 43% relative improvement.

21.
arXiv (CS.AI) 2026-06-15

From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing

arXiv:2606.14639v1 (announce type: cross)

Recent advances in speech generation have significantly improved the naturalness of synthetic speech, making spoofing detection increasingly challenging. A key limitation of current anti-spoofing systems is their limited robustness to unseen synthesis methods. In this work, we transform a self-supervised speech representation model into a Mixture-of-Experts (MoE) architecture to improve generalization. Feed-forward blocks in selected encoder layers are replaced by multiple expert networks controlled by a layer-wise gating mechanism, allowing experts to capture complementary acoustic patterns while preserving the representations learned during self-supervised pretraining. We further analyze the architectural choices affecting the performance of this MoE conversion and investigate the activation behavior of the experts. The proposed approach is evaluated on 14 spoofing datasets and reduces the macro EER from 5.46% to 4.81%, corresponding to 11.9% relative improvement over the baseline.
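
The relative-improvement figure quoted in the abstract above can be checked with a few lines of arithmetic. The helper name `relative_reduction` is hypothetical, and the sketch assumes the usual definition (baseline minus new, divided by baseline).

```python
def relative_reduction(baseline: float, new: float) -> float:
    """Fractional reduction of an error rate relative to its baseline."""
    return (baseline - new) / baseline


# Macro EER values quoted in the abstract, in percent.
print(f"{relative_reduction(5.46, 4.81):.1%}")  # prints 11.9%
```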

22.
medRxiv (Medicine) 2026-06-22

Accounting for uncertainty in the expected treatment effect substantially increases the sample size required for randomised trials: implications for the feasibility of clinical trials in anaesthesia and critical care

Background: Multicentre trials in anaesthesia and critical care report low rates of statistically significant differences. This finding may partly reflect conventional sample size methods, which assume a fixed treatment effect. Assurance methods use a design prior to represent uncertainty in the expected treatment effect, which may provide a more realistic way of estimating sample sizes. Methods: We calculated power curves across a range of effect sizes, design priors, and sample sizes using frequentist and Bayesian assurance methods and compared the sample sizes required to achieve 80% and 90% power with those required by the conventional method. We standardised the design priors across effect sizes using the coefficient of variation. We derived a theoretical limit for achievable power. We validated a normal approximation to the Bayesian posterior distribution. Results: Frequentist and Bayesian assurance methods produced similar power curves across all scenarios. At a coefficient of variation of 0.5, reflecting realistic prior uncertainty in the expected effect size, both methods required sample sizes approximately 1.5 to 3.5 times larger than those required by the conventional method. The theoretical power limit depends only on the coefficient of variation of the design prior and holds across all effect sizes. The normal approximation to the Bayesian posterior distribution matched the results obtained from Markov chain Monte Carlo sampling. Conclusions: Incorporating clinical uncertainty in the expected effect size substantially increases the sample size required to achieve adequate power, which has important implications for the feasibility of randomised trials in anaesthesia and critical care.
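The contrast between conventional power and assurance can be illustrated with a small Python sketch. It is not the authors' code: it assumes a two-arm trial with a normally distributed outcome, known standard deviation, equal allocation, a two-sided 5% test counted in the beneficial direction only, and a normal design prior whose standard deviation is the coefficient of variation times the prior mean. Under those assumptions the frequentist assurance has a closed form, and its large-sample limit is Phi(1 / CV), which depends only on the coefficient of variation.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def conventional_power(n_per_arm, effect, sigma=1.0, alpha=0.05):
    """Power when the true effect is assumed to be exactly `effect`."""
    se = sigma * sqrt(2.0 / n_per_arm)
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf(effect / se - z_crit)


def assurance(n_per_arm, prior_mean, cv, sigma=1.0, alpha=0.05):
    """Power averaged over a normal design prior N(prior_mean, (cv*prior_mean)^2)."""
    se = sigma * sqrt(2.0 / n_per_arm)
    tau = cv * prior_mean
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return STD_NORMAL.cdf((prior_mean - z_crit * se) / sqrt(se**2 + tau**2))


def assurance_limit(cv):
    """Limit of the assurance as n grows: prior probability that the effect is positive."""
    return STD_NORMAL.cdf(1.0 / cv)


def smallest_n(power_fn, target, n_max=100_000):
    """Smallest per-arm sample size reaching `target`, or None if never reached."""
    for n in range(2, n_max + 1):
        if power_fn(n) >= target:
            return n
    return None


# Hypothetical scenario: standardised effect 0.5, coefficient of variation 0.5.
effect, cv = 0.5, 0.5
for target in (0.8, 0.9):
    n_conv = smallest_n(lambda n: conventional_power(n, effect), target)
    n_assur = smallest_n(lambda n: assurance(n, effect, cv), target)
    print(f"target {target:.0%}: conventional n={n_conv}, assurance n={n_assur}, "
          f"ratio={n_assur / n_conv:.2f}")
print(f"assurance ceiling at CV={cv}: {assurance_limit(cv):.3f}")
```

In this simplified setting, any target above the ceiling cannot be reached at any sample size, so `smallest_n` returns `None`.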

23.
arXiv (CS.AI) 2026-06-18

Bayesian Anytime Pareto Set Identification for Multi-Objective Multi-Armed Bandits

arXiv:2606.18785v1 Announce Type: cross Abstract: Identifying Pareto optimal solutions is critical to support multi-objective decision-making. We introduce the first anytime Multi-Objective Multi-Armed Bandit algorithm for the Pareto Set Identification problem, taking a Bayesian approach: Top-Two Pareto Front Thompson Sampling (TTPFTS). We benchmark TTPFTS against state-of-the-art fixed-budget Pareto Set Identification algorithms on synthetic environments. Next, we demonstrate its practical utility in a challenging multi-objective molecular discovery setting by efficiently exploring an ultra-large synthesis-on-demand molecular library. Furthermore, we introduce a novel uncertainty quantification metric that estimates our algorithm's confidence in the predicted Pareto set. We demonstrate that this metric effectively proxies true performance, yielding a robust methodology for monitoring learning progress in complex settings. Finally, we complement these empirical findings with a theoretical proof of the algorithm's asymptotic correctness.
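For readers unfamiliar with the target of Pareto Set Identification, the following Python sketch computes the Pareto set from a table of mean reward vectors. It illustrates only the problem definition, not the TTPFTS algorithm, which the abstract does not describe in detail; maximisation of every objective and the example values are assumptions.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_set(means):
    """Indices of arms whose mean vector is not dominated by any other arm."""
    return [
        i
        for i, candidate in enumerate(means)
        if not any(dominates(other, candidate)
                   for j, other in enumerate(means) if j != i)
    ]


# Hypothetical two-objective means for five arms.
arm_means = [(1.0, 0.2), (0.6, 0.6), (0.2, 1.0), (0.5, 0.5), (0.1, 0.1)]
print(pareto_set(arm_means))
```

In a bandit setting the true means are unknown, so an algorithm must estimate this set from noisy samples; a Bayesian method would apply the same dominance check to posterior draws rather than to fixed means.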

24.
arXiv (quant-ph) 2026-06-16

Boson Sampling as a Probe of Chaotic and Integrable Quantum Dynamics in a Photonic Chip

arXiv:2605.25398v2 Announce Type: replace Abstract: Quantum chaos plays a key role in understanding complex quantum dynamics, while integrated photonics offers unique advantages for quantum applications, including high-speed operation, scalability, and programmable unitary transformations. However, integrated photonic approaches to probing quantum chaos remain largely unexplored, owing to the absence of a clear connection between programmable photonic dynamics and established chaos diagnostics. In this work, we establish Fock-state boson sampling as a practical probe of quantum chaos by exploiting the sensitivity of multiphoton interference to the random-matrix properties of underlying single-particle unitary dynamics. More importantly, we design and fabricate a programmable quantum photonic chip to experimentally implement this framework, achieving the first integrated-photonic demonstration of quantum-chaos probes based on boson sampling. Experimental results show that the three complementary probes proposed in this work, namely the distance to Porter–Thomas statistics, Shannon entropy, and Out-of-Time-Ordered-Correlator-equivalent observables, exhibit close agreement with theoretical predictions and consistently distinguish chaotic and integrable dynamics. Our work provides a scalable route for investigating complex quantum dynamics on programmable photonic platforms while leveraging the intrinsic advantages of boson sampling through multiphoton interference and complex output statistics.
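The link between multiphoton interference and the single-particle unitary can be made concrete with a short Python sketch. It uses the standard permanent formula for Fock-state boson sampling probabilities and computes the Shannon entropy of the resulting output distribution. It is a toy illustration, not the authors' analysis pipeline: the brute-force permanent only suits a handful of photons, and the 50:50 beam splitter example is an assumption chosen because its result (Hong-Ou-Mandel bunching) is well known.

```python
from itertools import combinations_with_replacement, permutations
from math import factorial, log2, prod, sqrt


def permanent(matrix):
    """Brute-force permanent; cost grows factorially, so keep matrices small."""
    n = len(matrix)
    total = 0
    for perm in permutations(range(n)):
        term = 1
        for row, col in enumerate(perm):
            term *= matrix[row][col]
        total += term
    return total


def output_distribution(unitary, input_modes):
    """Probability of each output configuration.

    `input_modes` lists one mode index per photon (repeats allowed).
    Keys of the result are sorted tuples of output mode indices.
    """
    n_modes, n_photons = len(unitary), len(input_modes)
    in_norm = prod(factorial(input_modes.count(k)) for k in set(input_modes))
    dist = {}
    for out_modes in combinations_with_replacement(range(n_modes), n_photons):
        # Rows follow the output modes, columns follow the input modes.
        sub = [[unitary[r][c] for c in input_modes] for r in out_modes]
        out_norm = prod(factorial(out_modes.count(k)) for k in set(out_modes))
        dist[out_modes] = abs(permanent(sub)) ** 2 / (in_norm * out_norm)
    return dist


def shannon_entropy(dist):
    """Entropy in bits of a probability dictionary."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


# Two photons entering the two ports of a 50:50 beam splitter.
h = 1 / sqrt(2)
beam_splitter = [[h, h], [h, -h]]
dist = output_distribution(beam_splitter, [0, 1])
for modes, prob in dist.items():
    print(modes, round(prob, 6))
print("entropy (bits):", round(shannon_entropy(dist), 6))
```

The coincidence outcome `(0, 1)` is suppressed and the photons leave together in one mode or the other. Larger programmable unitaries would produce richer output statistics, which is what entropy-type probes are sensitive to.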

25.
arXiv (quant-ph) 2026-06-12

Quantum Reference Fields Transformations in Linearized Quantum Gravity

arXiv:2606.09344v1 Announce Type: cross Abstract: Diffeomorphism invariance is a central feature of general relativity. Without external reference structures, matter and geometry must be specified relationally, with respect to internal subsystems serving as reference frames. In quantum gravity, these reference systems must themselves be treated as quantum, motivating the use of quantum reference frames. In this work, we address how such a relational description could be formulated within linearized quantum gravity. To this purpose, we introduce quantum reference fields, i.e. sets of four dynamical scalar fields whose stress-energy tensors enter the gravitational constraints. These fields extend the notion of quantum reference frames to local field-theoretic reference systems, allowing matter and gravitational degrees of freedom to be described relationally with respect to physical quantum systems. By generalizing the perspective-neutral construction of quantum reference frames, we show that relational, gauge invariant observables admit reduced descriptions in the perspective of each quantum reference field, and we derive the unitary transformations relating them. The resulting unitary maps implement local quantum coordinate changes between different internal perspectives, and act on the linearized gravitational field with an analogous structure to a linearized diffeomorphism, but with the classical gauge parameter replaced by a physical quantum field. Finally, we construct a relational von Neumann-type measurement scheme, showing how the corresponding reduced observables can be accessed operationally from the perspective of a quantum reference field.