Paper Plaza - AcademicHub

01.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2605.21850

ACC: Compiling Agent Trajectories for Long-Context Training

Authors:

Qisheng Su ↗Zhen Fang ↗Shiting Huang ↗Yu Zeng ↗Yiming Zhao ↗Kou Shi ↗Ziao Zhang ↗Lin Chen ↗Zehui Chen ↗Lijun Wu ↗Feng Zhao ↗

Recent development of agents has renewed demand for long-context reasoning capacity of LLMs. However, training LLMs for this capacity requires costly long-document curation or heuristic context synthesis. We observe that agents produce massive trajectories when solving problems, invoking tools and receiving environment observations across many turns. The evidence needed to answer the original question is thus scattered throughout these turns, requiring integration of distant context segments. Nevertheless, standard agent SFT masks tool responses and only trains turn-level tool selection, creating a supervision blind spot where these scattered signals go unused. We propose Agent Context Compilation (ACC), which converts trajectories from search, software engineering, and database querying agents into long-context QA pairs that combine the original question with tool responses and environment observations gathered across multiple turns, training the model to answer directly without tool use. This makes the dependencies between the question and the evidence explicit, enabling direct supervision of long-context reasoning over distant segments without additional annotation. ACC is a simple but effective approach that can be combined with any existing long-context extension or training method, providing scalable supervised fine-tuning data. We validate ACC on long-range dependency modeling tasks through MRCR and GraphWalks, challenging benchmarks requiring cross-turn coreference resolution and graph traversal over extended contexts. Training Qwen3-30B-A3B with ACC achieves 68.3 on MRCR (+18.1) and 77.5 on GraphWalks (+7.6), results comparable to Qwen3-235B-A22B, while preserving general capabilities on GPQA, MMLU-Pro, AIME, and IFEval. Further mechanism analysis reveals that the ACC-trained model exhibits task-adaptive attention restructuring and expert specialization.

Read & Discuss → View Source →

02.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2510.06596

SDQM: Synthetic Data Quality Metric for Object Detection Dataset Evaluation

Authors:

Ayush Zenith ↗Arnold Zumbrun ↗Neel Raut ↗Jing Lin ↗

The performance of machine learning models depends heavily on training data. The scarcity of large-scale, well-annotated datasets poses significant challenges in creating robust models. To address this, synthetic data generated through simulations and generative models has emerged as a promising solution, enhancing dataset diversity and improving the performance, reliability, and resilience of models. However, evaluating the quality of this generated data requires an effective metric. We introduce the Synthetic Dataset Quality Metric (SDQM) to assess data quality for object detection tasks without requiring model training to converge. This metric enables more efficient generation and selection of synthetic datasets, addressing a key challenge in resource-constrained object detection tasks. In our experiments, SDQM demonstrated a strong correlation with the mean average precision (mAP) scores of YOLO11, a leading object detection model, whereas previous metrics only exhibited moderate or weak correlations. In addition, it provides actionable insights into improving dataset quality, minimizing the need for costly iterative training. This scalable and efficient metric sets a new standard for evaluating synthetic data. The code for SDQM is available at https://github.com/ayushzenith/SDQM

Read & Discuss → View Source →

03.

medRxiv (Medicine) 2026-06-12 DOI: HASH:6d1904bb416e4372a23a95056eb16e50

Sociodemographic and health correlates of reimbursement authorizations for cannabis for medical purposes in Canadian veterans: A cross-sectional study linking the Life After Services Studies 2019 and Health Administrative Databases

Authors:

Kendzerska ↗Reyes ↗Poirier ↗Cull ↗Murkar ↗Saymeh ↗Belanger ↗Williams ↗Shlik ↗Jetly ↗Robillard ↗

Background Evidence on factors associated with cannabis for medical purposes (CMP) authorizations among Veterans Affairs Canada (VAC) clients remains limited and inconsistent, particularly concerning mental health and posttraumatic stress disorder (PTSD), a leading indication for use. We investigated demographic, clinical and service characteristics associated with VAC authorizations for CMP reimbursement. Method We linked VAC administrative CMP program data with responses from the 2019 Life After Services Studies cross-sectional survey of Regular Force veterans released between 1998 and 2018. Multivariable logistic regressions examined associations between CMP reimbursement (yes/no) and demographic, clinical and well-being factors, with analyses stratified by PTSD status. Results Among 1,289 respondents (weighted n=33,131), 18.4% were authorized for CMP reimbursement. Younger age (

Read & Discuss → View Source →

04.

arXiv (CS.LG) 2026-06-19 DOI: arXiv:2508.13313

Flow Matching for Efficient and Scalable Data Assimilation

Authors:

Taos Transue ↗Bohan Chen ↗So Takao ↗Bao Wang ↗

arXiv:2508.13313v4 Announce Type: replace-cross Abstract: Data assimilation (DA) estimates a dynamical system's state from noisy observations. Recent generative models like the ensemble score filter (EnSF) improve DA in high-dimensional nonlinear settings but are computationally expensive. We introduce the ensemble flow filter (EnFF), a training-free, flow matching (FM)-based framework that accelerates sampling and offers flexibility in flow design. EnFF uses Monte Carlo estimators for the marginal flow field, localized guidance for observation assimilation, and utilizes a novel flow path that exploits the Bayesian DA formulation. It generalizes classical filters such as the bootstrap particle filter and ensemble Kalman filter. Experiments on high-dimensional benchmarks demonstrate EnFF's improved cost-accuracy tradeoffs and scalability, highlighting FM's potential for efficient, scalable DA. Code is available at https://github.com/Utah-Math-Data-Science/Data-Assimilation-Flow-Matching.

Read & Discuss → View Source →

05.

arXiv (CS.LG) 2026-06-15 DOI: arXiv:2606.14139

Decoupled Latent Optimization of Diffusion Models for Full Waveform Inversion

Authors:

Chen Min ↗Zheng Ma ↗

arXiv:2606.14139v1 Announce Type: new Abstract: Full waveform inversion (FWI) recovers subsurface velocity from seismic recordings by solving a severely ill-posed, nonconvex PDE-constrained optimization. Classical regularizers stabilize the inversion but fail to reproduce realistic geological structures; recent diffusion-prior methods improve realism at the cost of a fragile trade-off between data fidelity and prior consistency. We propose Decoupled Latent Optimization (DLO), which relaxes the standard latent-optimization formulation into a quadratic-penalty objective over an auxiliary physical variable and a latent variable. The data-fidelity gradient acts in physical space, the diffusion sampler contributes only through a decoded prior sample, and the standard smoothed-velocity initialization of classical FWI is preserved. On the OpenFWI benchmark, DLO outperforms classical regularizers and existing diffusion-based methods under clean, noisy, and missing-trace acquisitions. The prior, trained on 70*70 OpenFWI models, transfers directly to the Marmousi and Overthrust benchmarks, where DLO recovers intricate fault structures and remains robust to initialization smoothing and measurement noise.

Read & Discuss → View Source →

06.

arXiv (CS.LG) 2026-06-15 DOI: arXiv:2606.13955

Smoothing Dark Areas in Molecular Latent Diffusion

Authors:

Xi Wang ↗Jiahan Li ↗Yuxuan Xia ↗Yingcheng Wu ↗Shaoyi Zheng ↗Shengjie Wang ↗

arXiv:2606.13955v1 Announce Type: new Abstract: Latent diffusion is a promising framework for scalable 3D molecular generation, but it requires a latent space that remains smooth, valid, and navigable beyond posterior samples. Existing molecular VAEs, however, are typically learned through reconstruction-based objectives, which do not guarantee such a latent space. We show that this leads to dark areas: regions of latent space that are reachable during diffusion sampling but decode to disconnected or chemically invalid molecules. Unlike in image generation, molecular decoding requires strict structural and chemical precision, so even small latent perturbations can produce catastrophic failures. We therefore propose TopVAE, a topology-optimized VAE that reduces dark areas by making the decoder internalize structural and chemical constraints during training, eliminating the need for test-time chemical correction. TopVAE greatly improves off-posterior robustness, and when paired with a standard DiT, achieves $77\%$ lower FCD-3D on QM9, the highest V&C, $52\%$ lower FCD-3D on GEOM-Drugs, and $1.29{\times}$ more stable and connected molecules on zero-shot scaffold inpainting.

Read & Discuss → View Source →

07.

arXiv (CS.LG) 2026-06-11 DOI: arXiv:2606.11490

OmniLoc: A Geometry-Aware Foundation Model for Anchor-Free UE Localization Across Diverse Indoor Environments

Authors:

Lei Chu ↗Yuning Zhang ↗Omer Gokalp Serbetci ↗Anushka Katiyar ↗Bassel Abou Ali Modad ↗Andreas F. Molisch ↗

arXiv:2606.11490v1 Announce Type: new Abstract: Indoor localization from wireless measurements remains challenging in large-scale deployments due to substantial variation in building geometry, the set of detectable access points (APs), and the heterogeneity of received signals. Existing learning-based methods often perform well only in limited settings and degrade under environmental shifts, making robust anchor-free localization across diverse indoor environments notoriously difficult. In this paper, we present OmniLoc, an environment-interactive foundation model for anchor-free user equipment localization across diverse indoor environments. To the best of our knowledge, OmniLoc is the first foundation-model-based approach built directly on wireless measurements for this task. OmniLoc is built on three key designs. First, a unified input tokenization module converts heterogeneous wireless measurements into a common representation that is more amenable to learning. Second, a geometry-aware Transformer performs AP-aware feature extraction by emphasizing dominant APs while aggregating complementary evidence from supporting APs. Third, a geometry-aware location estimation module conditions regression on geometric embeddings to produce geometrically consistent location predictions. We evaluate OmniLoc on both a large-scale in-house dataset and a public benchmark dataset. Results show that OmniLoc significantly outperforms existing methods, consistently improves existing backbones when its design components are integrated, and demonstrates strong generalization in cross-environment evaluations.

Read & Discuss → View Source →

08.

arXiv (CS.CV) 2026-06-19 DOI: arXiv:2606.19961

Addressing Detail Bottlenecks in Latent Diffusion for RGB-to-SWIR Image Translation

Authors:

Kaili Wang ↗Martin Dimitrievski ↗Jose Maria Salvador ↗Ben Stoffelen ↗David Van Hamme ↗Lore Goetschalckx ↗

Latent diffusion models (LDMs) enable efficient image-to-image translation but discard fine spatial details during compression, degrading downstream perception tasks. We identify two bottlenecks: the autoencoder, which loses spatial information, and the conditioning pathway, which further degrades the source signal through naive downsampling. We propose two lightweight, backbone-agnostic fixes: a Source-Conditioned Autoencoder (SCAE) that injects high-resolution source features into the decoder via skip connections, and a Learnable Guidance Encoder (LGE) that replaces naive downsampling with a learned conditioning signal. Evaluated on RGB-to-SWIR translation for driving scenes with two denoiser backbones (U-Net and DiT), our approach improves detection mAP by up to 2x over the latent diffusion baseline, with up to 3.4x gains on small objects (COCO-small,

Read & Discuss → View Source →

09.

arXiv (CS.LG) 2026-06-15 DOI: arXiv:2602.09258

Generalizing GNNs with Tokenized Mixture of Experts

Authors:

Xiaoguang Guo ↗Zehong Wang ↗Jiazheng Li ↗Shawn Spitzel ↗Qi Yang ↗Kaize Ding ↗Jundong Li ↗Chuxu Zhang ↗

arXiv:2602.09258v2 Announce Type: replace Abstract: Deployed graph neural networks (GNNs) are frozen at deployment yet must fit clean data, generalize under distribution shifts, and remain stable to perturbations. We show that static inference induces a fundamental tradeoff: improving stability requires reducing reliance on shift-sensitive features, leaving an irreducible worst-case generalization floor. Instance-conditional routing can break this ceiling, but is fragile because shifts can mislead routing and perturbations can make routing fluctuate. We capture these effects via two decompositions separating coverage vs selection, and base sensitivity vs fluctuation amplification. Based on these insights, we propose STEM-GNN, a pretrain-then-finetune framework with a mixture-of-experts encoder for diverse computation paths, a vector-quantized token interface to stabilize encoder-to-head signals, and a Lipschitz-regularized head to bound output amplification. Across nine node, link, and graph benchmarks, STEM-GNN achieves a stronger three-way balance, improving robustness to degree/homophily shifts and to feature/edge corruptions while remaining competitive on clean graphs.

Read & Discuss → View Source →

10.

arXiv (quant-ph) 2026-06-16 DOI: arXiv:2211.05142

Non-Markovianity-based ultrasensitive parameter estimation

Authors:

Olli Siltanen ↗

arXiv:2211.05142v2 Announce Type: replace Abstract: Accurate parameter estimation is a central task in quantum metrology and sensing, where quantum resources can provide precision beyond classical limits. In realistic settings, however, system-environment interactions lead to decoherence, reducing these strategies to their classical counterparts. Noise is typically classified as Markovian or non-Markovian, with the latter often preserving quantum coherence longer and thus supporting better metrological performance. Still, the absence of noise is generally considered ideal. In this work, we uncover a striking reversal: certain non-Markovian environments not only outperform Markovian ones - including their quantum Cramér-Rao bounds - but can also surpass the entirely noiseless case. We demonstrate these findings numerically for an all-optical setup, which is experimentally feasible and can be extended to other physical platforms. In general, our results open new avenues for noise-assisted quantum metrology beyond conventional limits.

Read & Discuss → View Source →

11.

arXiv (CS.CV) 2026-06-11 DOI: arXiv:2606.12378

Illumination-Robust Camera-Based Heart-Rate Estimation for Physiological Sensing in Robots

Authors:

Zhi Wei Xu ↗Torbj\"orn E. M. Nordling ↗

Physiological awareness is important for service, social, and assistive robots that interact with humans in everyday environments. Remote photoplethysmography (rPPG) enables non-contact heart-rate (HR) estimation from an RGB camera, making it a promising sensing modality for robot-mounted vision systems. However, illumination variation remains a major barrier to robust deployment. This paper presents an end-to-end spatial-temporal transformer framework for remote HR estimation on a new dataset with varied illumination. Our estimator integrates PRNet-based 3D face alignment, clip-level illumination augmentation, the Residual Temporal Standardization Module, and controlled hybrid temporal-frequency supervision. The training objective combines a Soft-Shifted Pearson waveform loss with a spectral Kullback-Leibler divergence loss, where a tuned weight ($\mathbf{\beta}$) controls the contribution of frequency-domain heart-rate guidance. Experiments on a static all-level mix protocol covering three illumination levels show that $\mathbf{\beta}=5$ provides the strongest result among the tested beta settings, achieving a best-run HR mean absolute error (MAE) of 0.79 bpm and an HR correlation of 0.982. Compared with the PhysFormer baseline evaluated on our dataset, our estimator reduces HR MAE by 93.6 %, while increasing HR correlation from 0.088 to 0.982, making it usable when illumination varies.

Read & Discuss → View Source →

12.

arXiv (CS.CV) 2026-06-19 DOI: arXiv:2606.19584

Language-Instructed Vision Embeddings for Controllable and Generalizable Perception

Authors:

Chengzhi Mao ↗Xudong Lin ↗Wen-Sheng Chu ↗

Vision foundation models are typically trained as static feature extractors, placing the burden of task adaptation onto large downstream models. We propose an alternative paradigm: instead of solely feeding visual features into language models, we use language itself to dynamically guide the vision encoder. Our method, Language-Instructed Vision Embeddings (LIVE), leverages language as high-level guidance to produce task-centric embeddings at inference time, removing the need for task-specific retraining. This enables the encoder to focus on contextually relevant aspects of the input, yielding more controllable and generalizable representations. Empirically, LIVE reduces visual hallucinations (+34 points on MMVP), surpasses vision-language models with orders of magnitude more parameters on visual question answering, and generalizes to unseen instructions and tasks – offering a direct path toward adaptive, instruction-driven visual intelligence.

Read & Discuss → View Source →

13.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2606.18273

Continuous Audio Thinking for Large Audio Language Models

Authors:

Gyojin Han ↗Dong-Jae Lee ↗Changho Choi ↗Jongsuk Kim ↗Junmo Kim ↗

Large audio language models (LALMs) have shown impressive capabilities on diverse audio understanding tasks, ranging from speech transcription to music analysis. However, because LALMs are typically trained to produce text-aligned responses, their hidden states are progressively shaped for text generation rather than for preserving acoustic information. As a result, the diverse acoustic content that audio carries, such as phonetic detail, prosody, sound events, affect, and pitch, is lost along the way and difficult to leverage in the response. We introduce Continuous Audio Thinking (CoAT), a framework that equips audio language models with a continuous latent workspace for organizing acoustic information prior to response generation, grounded by distillation from audio experts. Within the thinking space, the model can utilize the rich acoustic information provided by expert distillation when generating its response. Furthermore, the proposed continuous thinking block can be processed in a single prefill, so CoAT does not require additional autoregressive decoding cost over the baseline. Across three LALMs, Qwen2-Audio, Qwen2.5-Omni-7B, and Audio Flamingo~3, performance gains on a broad benchmark suite spanning audio reasoning, audio understanding, music classification, speech emotion, and speech transcription demonstrate the effectiveness of CoAT. Further analysis confirms that the auxiliary supervision propagates from the thinking positions to the model's textual responses.

Read & Discuss → View Source →

14.

arXiv (quant-ph) 2026-06-15 DOI: arXiv:2604.17583

Electromagnetic Wightman functions and vacuum densities for a brane intersecting the AdS boundary

Authors:

A. A. Saharian ↗R. M. Avagyan ↗V. F. Manukyan ↗

arXiv:2604.17583v2 Announce Type: replace-cross Abstract: We investigate the combined effects of a brane intersecting the AdS boundary and background gravitational field on the local characteristics of the electromagnetic vacuum. Two types of boundary conditions on the brane are considered, which are higher-dimensional generalizations of the perfect electric (PEC) and perfect magnetic (PMC) boundary conditions in Maxwell's electrodynamics. The brane-induced contributions to the Wightman functions of the vector potential and field tensor are explicitly extracted. Simple expressions in terms of elementary functions are provided. The behavior of the vacuum expectation values (VEVs) is mimicked by a scalar field with a negative effective mass squared determined by the radius of the AdS spacetime. The expectation values of the electric and magnetic fields squares and of the energy-momentum tensor are investigated as local characteristics of the vacuum state. The brane-induced contributions to these VEVs have opposite signs for the PEC and PMC conditions. For the PMC condition, this contribution is negative for the electric field squared and positive for the magnetic field squared. The VEV of the energy-momentum tensor has a nonzero off-diagonal component. The brane-induced vacuum energy density is positive for PMC condition, whereas the normal and parallel stresses change sign as functions of the distance from the brane. Unlike the problem involving a planar boundary in the Minkowski bulk, the vacuum energy-momentum tensor does not vanish in (3+1)-dimensional AdS spacetime.

Read & Discuss → View Source →

15.

arXiv (CS.AI) 2026-06-18 DOI: arXiv:2606.19220

Machine Unlearning for the XGBoost Model with Network Intrusion Datasets

Authors:

Diana Magalh\~aes ↗Eva Maia ↗Jo\~ao Vitorino ↗Isabel Pra\c{c}a ↗

arXiv:2606.19220v1 Announce Type: cross Abstract: Machine Unlearning (MU) has emerged as an important technique for removing specific data points from trained models without requiring full retraining. However, most existing MU research focuses on deep learning and image data, leaving a gap in the domain of network intrusion detection, which relies heavily on tabular data. This work introduces XGBoost-Forget, an unlearning approach for the XGBoost model, to address this gap. The approach is evaluated on two tabular Network Intrusion (NI) datasets, IoT-23 and GeNIS, using multiple metrics to assess model performance, unlearning efficiency, and forgetting quality. The results show that XGBoost-Forget maintains predictive performance close to the original model while providing significantly faster unlearning, demonstrating its potential for MU in tabular NI settings.

Read & Discuss → View Source →

16.

medRxiv (Medicine) 2026-06-15 DOI: HASH:986b1b38db71ba63b8d3cc4e1cee9b3c

Fanconi Anemia as a Window into Premalignant Field Cancerization of the Oral Mucosa

Authors:

Berger ↗Donovan ↗F. X ↗Lin ↗Y.-C ↗Soma ↗May ↗Bhadresha ↗Krieg ↗Giri ↗McReynolds ↗L. J ↗…

Head and neck squamous cell carcinoma (HNSCC) evolves through stepwise clonal expansion within genetically altered mucosa fields, yet actionable biomarkers remain undefined. Leveraging Fanconi anemia (FA), a cancer predisposition syndrome with extreme HNSCC risk due to defective DNA interstrand crosslink repair, we profiled premalignant changes in the oral cavity using noninvasive brush biopsies. Consistent with our prior demonstration of genomic instability in FA-associated SCCs, we detected pathogenic TP53 variants in 26% and copy number alterations in 60.5% in clinically normal-appearing oral mucosa of individuals with FA. These subclinical clonal expansions define candidate biomarkers of early clonal evolution amenable to serial sampling for risk stratification and prevention studies. Since FA-associated SCCs share genomic features with sporadic HNSCC, these findings may extend to the broader population. We also identify somatic reversion of a pathogenic FANCB variant, providing evidence of genomic self-correction and suggesting a potential avenue for gene-based cancer prevention in FA.

Read & Discuss → View Source →

17.

arXiv (CS.LG) 2026-06-19 DOI: arXiv:2507.22524

HGCN(O): A Self-Tuning GCN HyperModel Toolkit for Outcome Prediction in Event-Sequence Data

Authors:

Fang Wang ↗Paolo Ceravolo ↗Ernesto Damiani ↗

arXiv:2507.22524v3 Announce Type: replace Abstract: We propose HGCN(O), a self-tuning toolkit using Graph Convolutional Network (GCN) models for event sequence prediction. Featuring four GCN architectures (O-GCN, T-GCN, TP-GCN, TE-GCN) across the GCNConv and GraphConv layers, our toolkit integrates multiple graph representations of event sequences with different choices of node- and graph-level attributes and in temporal dependencies via edge weights, optimising prediction accuracy and stability for balanced and unbalanced datasets. Extensive experiments show that GCNConv models excel on unbalanced data, while all models perform consistently on balanced data. Experiments also confirm the superior performance of HGCN(O) over traditional approaches. Applications include Predictive Business Process Monitoring (PBPM), which predicts future events or states of a business process based on event logs.

Read & Discuss → View Source →

18.

arXiv (quant-ph) 2026-06-17 DOI: arXiv:2409.09033

Independent Chiral Control in Theory-Space Models:A Rank-Preserving Framework and Its Application to Neutrino Mass Generation

Authors:

Aadarsh Singh ↗

arXiv:2409.09033v3 Announce Type: replace-cross Abstract: We develop a general framework of rank-preserving, element-wise matrix transformations for engineering fermion mass hierarchies in theory-space constructions. We prove that preservation of massless modes requires the transformation function to be separable, $g_f(i,j)=g^{(L)}_f(i)g^{(R)}_f(j)$, which in turn enables independent control of left- and right-chiral zero-mode profiles directly at the level of the theory-space mass matrix. This formalism unifies and extends the clockwork mechanism, permits controlled deformation of Kaluza–Klein spectra, and enhances hierarchy generation in GIM-like fine-cancellation scenarios. As a concrete application, we show that in theory-space models for neutrino masses, suitable transformations allow sub-eV light neutrinos to arise from TeV-scale new physics with only $\mathcal{O}(40)$ additional fermionic sites, while remaining consistent with charged-lepton flavor-violation bounds. In contrast, the corresponding untransformed models asymptote at the MeV scale and cannot access the phenomenologically required regime without extreme field multiplicities or hierarchical parameters.

Read & Discuss → View Source →

19.

arXiv (CS.AI) 2026-06-19 DOI: arXiv:2509.15927

Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search

Authors:

Zhiyu Mou ↗Yiqin Lv ↗Miao Xu ↗Qi Wang ↗Yixiu Mao ↗Jinghao Chen ↗Qichen Ye ↗Chao Li ↗Rongquan Bai ↗Chuan Yu ↗Jian Xu ↗Bo Zheng ↗…

arXiv:2509.15927v5 Announce Type: replace-cross Abstract: Auto-bidding is a critical tool for advertisers to improve advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still face a performance bottleneck due to their inherent inability to explore beyond the static dataset with feedback. To address this, we propose AIGB-Pearl (Planning with \textbf{EvaluAtor via RL}), a novel method that integrates generative planning and policy optimization. The core of AIGB-Pearl lies in constructing a trajectory evaluator to assess the quality of generated scores and designing a provably sound KL-Lipschitz-constrained score-maximization scheme to ensure safe and efficient exploration beyond the offline dataset. A practical algorithm that incorporates the synchronous coupling technique is further developed to ensure the model regularity required by the proposed scheme. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.

Read & Discuss → View Source →

20.

arXiv (CS.CV) 2026-06-18 DOI: arXiv:2606.18960

Mem-World: Memory-Augmented Action-Conditioned World Models for Persistent Robot Manipulation

Authors:

Zirui Zheng ↗Jiaqian Yu ↗Xiongfeng Peng ↗jun shi ↗Mingyi Li ↗Chao Zhang ↗Weiming Li ↗Dong Wang ↗Huchuan Lu ↗Xu Jia ↗

Action-conditioned world models have emerged as a promising paradigm for robot learning, offering a scalable alternative to costly real-world experimentation by generating action-consistent video rollouts. However, persistent world modeling remains challenging in manipulation: frequent end-effector occlusions and rapid wrist-camera motion make the current observation insufficient for predicting future views, causing models to forget or hallucinate scene details seen in earlier frames. Existing memory retrieval strategies often fail to identify informative history in dynamic manipulation scenarios. To address this limitation, we propose Mem-World, a memory-augmented multi-view action-conditioned world model. At its core, we present W-VMem, a 4D wrist-view-centered surfel-indexed memory that anchors historical observations to temporally evolving surface elements. By explicitly modeling when and where scene elements are observed, W-VMem enables geometry-aware retrieval of relevant history frames conditioned on future actions. During generation, relevant history frames are selected via surfel-based rendering and scoring, providing informative and non-redundant context for prediction. Extensive experiments show that Mem-World generates persistent rollouts in complex manipulation scenarios, enables more reliable policy evaluation than Ctrl-World, improving the Pearson correlation with real-world performance by 14.5\%, and supports effective policy improvement through synthetic data generation, increasing success rates from 58\% to 72\% on long-horizon tasks.

Read & Discuss → View Source →

21.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.16583

Uncertainty Is Not a Safety Net for Clinical VQA, but Can It Anticipate Model Failure?

Authors:

Arnisa Fazla ↗Alberto Testoni ↗Ameen Abu-Hanna ↗Barbara Plank ↗Iacer Calixto ↗

Safe deployment of clinical vision-language models (VLMs) requires reliable uncertainty estimation (UE): a signal indicating when predictions should be trusted or escalated to a clinician. We test whether current UE methods actually deliver this signal. Benchmarking 8 methods across 12 VLMs on clinical visual question-answering (VQA), we find that UE quality is not an intrinsic property of the UE method: it tracks model accuracy, degrading precisely where the model performance is weakest, and therefore where reliability is most needed. When we stress-test models by hiding the correct option among the multiple-choice answers (NOTA perturbations), accuracy collapses while uncertainty barely changes, leaving models systematically miscalibrated. Yet, we find that uncertainty on the unperturbed input reliably anticipates which predictions will collapse under NOTA, indicating that UE in current VLMs carries diagnostic information about model fragility. Our results position UE as a diagnostic tool for identifying fragile predictions and motivate perturbation-based evaluation as a path toward safe clinical deployment.

Read & Discuss → View Source →

22.

arXiv (CS.AI) 2026-06-15 DOI: arXiv:2505.17961

Federated Causal Inference from Multi-Site Observational Data via Propensity Score Aggregation

Authors:

R\'emi Khellaf ↗Aur\'elien Bellet ↗Julie Josse ↗

arXiv:2505.17961v4 Announce Type: replace-cross Abstract: Causal inference typically assumes centralized access to individual-level data. Yet, in practice, data are often decentralized across multiple sites, making centralization infeasible due to privacy, logistical, or legal constraints. We address this problem by estimating the Average Treatment Effect (ATE) from decentralized observational data via a Federated Learning (FL) approach, allowing inference through the exchange of aggregate statistics rather than individual-level data. We propose a novel method to estimate propensity scores via a federated weighted average of local scores using Membership Weights (MW), defined as probabilities of site membership conditional on covariates. MW can be flexibly estimated with parametric or non-parametric classification models using standard FL algorithms. The resulting propensity scores are used to construct Federated Inverse Propensity Weighting (Fed-IPW) and Augmented IPW (Fed-AIPW) estimators. In contrast to meta-analysis methods, which fail when any site violates positivity, our approach exploits heterogeneity in treatment assignment across sites to improve overlap. We show that Fed-IPW and Fed-AIPW perform well under site-level heterogeneity in sample sizes, treatment mechanisms, and covariate distributions. Theoretical analysis and experiments on simulated and real-world data demonstrate clear advantages over meta-analysis and related approaches.

Read & Discuss → View Source →

23.

medRxiv (Medicine) 2026-06-15 DOI: HASH:a48ce293a7fd4e190e47329e4e2336b5

Differential DNA Methylation and Delirium After Anesthesia and Surgery

Authors:

Hogan ↗Berger ↗Kolstad ↗Madrid ↗Hsia ↗Wright ↗Devinney ↗Smith ↗Aisch ↗

Background: DNA methylation is an epigenetic modification that regulates gene expression in response to environmental exposures. We measured differential DNA methylation levels in blood before after general anesthesia and surgery in participants with and without postoperative delirium (POD) and postoperative neurocognitive disorder (PNCD). Methods: Blood sampling, delirium assessment and cognitive testing were prospectively performed at baseline before non-cardiac, non-neurologic surgery, and at 24 hours (24h) and 6 weeks (6wk) thereafter in 94 participants comprising 13 with POD and 81 without POD, and 40 with PNCD and 54 without PNCD 6wk after surgery who were matched for age and sex in the INTUIT and MADCO cohorts. DNA methylation was assessed using the Illumina Infinium MethylationEPIC Beadchip. Results: 132 differentially methylated positions (DMPs) annotated to 198 differentially methylated genes (DMGs) were identified in 94 participants 24h after surgery compared to baseline with a local false discovery rate (LFDR)

Read & Discuss → View Source →

24.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.15355

Sustainable Face Recognition on Low-Power Devices with VQ-VAE Embeddings

Authors:

Christos Chronis ↗Georgios Th. Papadopoulos ↗Iraklis Varlamis ↗

Face recognition has become a cornerstone of modern AI applications, yet conventional approaches often rely on computationally intensive models deployed in cloud environments, leading to increased network traffic, high energy consumption, and a heavy carbon footprint. This work introduces a sustainable, edge-deployable face recognition framework based on Vector-Quantized Variational Autoencoders (VQ-VAE), which generates compact and semantically rich latent representations of facial images. By leveraging the compression capacity and reconstruction quality of VQ-VAE embeddings on the edge and combining them with the power of pre-trained face embeddings in a knowledge distillation setup, our system achieves comparable accuracy to state-of-the-art face embedding models while significantly reducing memory and computation requirements on the edge, making it suitable for low-power edge devices. The integration of VQ-VAE compression minimizes network overhead while keeping the matching accuracy high by retaining only the most informative facial features in the latent space. As a result, the reconstructed images preserve the key identity characteristics, improving the robustness and overall performance of the face embeddings.

Read & Discuss → View Source →

25.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2510.09905

The Personalization Trap: How User Memory Alters Emotional Reasoning in LLMs

Authors:

Xi Fang ↗Weijie Xu ↗Yuchong Zhang ↗Stephanie Eckman ↗Scott Nickleach ↗Chandan K. Reddy ↗

When an AI assistant remembers that Sarah is a single mother working two jobs, does it interpret her stress differently than if she were a wealthy executive? As personalized AI systems increasingly incorporate long-term user memory, understanding how this memory shapes emotional reasoning is critical. We investigate how user memory affects emotional intelligence in large language models (LLMs) by evaluating 15 models on human-validated emotional intelligence tests. We find that identical scenarios paired with different user profiles produce systematically divergent emotional interpretations. Across validated user-independent emotional scenarios and diverse user profiles, systematic biases emerged in several high-performing LLMs where advantaged profiles received more accurate emotional interpretations. Moreover, LLMs demonstrate significant disparities across demographic factors in emotion reasoning and supportive recommendations tasks, indicating that personalization mechanisms can embed social hierarchies into models' emotional reasoning. These results highlight a key challenge for memory-enhanced AI: systems designed for personalization may reinforce social inequalities. To mitigate these disparities, we curate a general-purpose preference dataset designed to reduce demographic profiles' influence on emotional understanding.

Read & Discuss → View Source →

Explore the Frontier of Global Academia