Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-11

MA-DLE: Speech-based Automatic Depression Level Estimation via Memory Augmentation

Speech-based automatic estimation of depression levels is essential for enabling early detection and timely intervention, particularly in resource-constrained mental health settings. In recent years, deep learning has demonstrated impressive success across various domains, including affective computing and mental health assessment. Most existing approaches rely on RNN-based architectures (such as LSTM and GRU) to model temporal information for depression estimation. However, the extracted features often emphasize only a few adjacent speech segments, limiting their ability to capture long-range dependencies. To overcome this limitation, we introduce a memory-based feature augmentation method that enhances the representational capacity of GRU-extracted features. Rather than indiscriminately incorporating historical data, our memory bank is designed to selectively integrate two types of components in order to reduce redundancy and irrelevance: (1) historical temporal features that closely resemble the current GRU output, offering complementary contextual information; and (2) dynamic memory features identified based on feature variability, which capture behavioral and emotional fluctuations indicative of depressive symptoms. To effectively fuse the memory-augmented features with GRU outputs, we further design a Hierarchical Attention Fusion (HAF) module. Our method is evaluated on the widely used DAIC-WOZ and E-DAIC datasets, achieving state-of-the-art performance.

02.
arXiv (CS.CV) 2026-06-18

LARE: Low-Attention Region Encoding for Text-Image Retrieval

Image retrieval in crowded scenes is particularly challenging due to the salience bias of conventional visual encoders, which tend to focus on dominant objects while neglecting low-attention regions that are often crucial for fine-grained retrieval. We propose LARE (Low-Attention Region Encoding), a framework that explicitly models these overlooked regions. LARE adopts a dual-encoding strategy that encodes low-attention regions of an image and the full image in parallel, leading to more diverse and informative image embeddings. To evaluate image retrieval performance in challenging crowded scenes, we introduce Dense-Set, a challenging subset derived from COCO and Flickr30K. In this subset, images are re-captioned to provide richer descriptions of low-attention or previously overlooked regions. This dataset highlights the limitations of existing retrieval models and enables a more rigorous evaluation under densely crowded scene conditions. Experimental results demonstrate that the proposed framework improves retrieval performance by preserving subtle, non-dominant visual cues within the shared latent space.

03.
arXiv (CS.LG) 2026-06-12

Hierarchical Successor Representation for Robust Transfer

arXiv:2602.12753v2 Announce Type: replace Abstract: The successor representation (SR) provides a powerful framework for decoupling predictive dynamics from rewards, enabling rapid generalisation across reward configurations. However, the classical SR is limited by its inherent policy dependence: policies change due to ongoing learning, environmental non-stationarities, and changes in task demands, making established predictive representations obsolete. Furthermore, in topologically complex environments, SRs suffer from spectral diffusion, leading to dense and overlapping features that scale poorly. Here we propose the Hierarchical Successor Representation (HSR) for overcoming these limitations. By incorporating temporal abstractions into the construction of predictive representations, HSR learns stable state features which are robust to task-induced policy changes. Applying non-negative matrix factorisation (NMF) to the HSR yields a sparse, low-rank state representation that facilitates highly sample-efficient transfer to novel tasks in multi-compartmental environments. Further analysis reveals that HSR-NMF discovers interpretable topological structures, providing a policy-agnostic hierarchical map that effectively bridges model-free optimality and model-based flexibility. Beyond providing a useful basis for task-transfer, we show that HSR's temporally extended predictive structure can also be leveraged to drive efficient exploration, effectively scaling to large, procedurally generated environments.

04.
arXiv (CS.CV) 2026-06-11

FreqKD: Frequency-Decoupled Cross-Modal Knowledge Distillation for Infrared Object Detection

Transfer learning from large-scale RGB foundation models to infrared (IR) imagery through knowledge distillation (KD) remains challenging due to fundamental differences in image formation physics. We investigate the spectral structure of the RGB–IR modality gap and observe that feature divergence is not uniform across spatial frequencies: low-frequency components (shape, layout) show greater cross-modal alignment than high-frequency components (texture, fine edges), which reflect modality-specific characteristics. Based on this analysis, we propose FreqKD, a frequency-decoupled distillation framework that applies asymmetric supervision adapted to each band's cross-modal consistency. The method employs strict mean squared error (MSE) on the low-frequency band to preserve shared structural information and a relaxed log-MSE loss (weighted at 0.1) on the high-frequency band to provide edge guidance while tolerating texture differences. Spectral divergence analysis on 500 paired samples shows that high-frequency divergence exceeds low-frequency divergence by a factor of 2.4x on average across all analysed transformer layers. On KAIST multispectral pedestrian detection, FreqKD achieves 64.1 mAP50, improving 2.4 points over the DINOv2 baseline. The learned representation transfers across datasets (FLIR ADAS, +2.1 mAP50), tasks (MFNet segmentation, +1.85 mean intersection-over-union), and architectures (ResNet-50, +1.0 mAP50). Code is available at: https://anonymous.4open.science/r/freq_decoupled_kd-5E5A

05.
arXiv (quant-ph) 2026-06-16

Experimental realization of the complete seven-phase Anderson-localization landscape

arXiv:2606.14825v1 Announce Type: cross Abstract: Anderson localization has evolved far beyond the conventional dichotomy between extended and localized states. Modern localization theory predicts a complete transport hierarchy comprising extended, critical, and localized phases together with all coexistence phases among them, forming a seven-phase Anderson-localization landscape. Despite its fundamental importance, this hierarchy has never been experimentally realized within a single system. Here we realize the complete seven-phase Anderson-localization landscape in a one-dimensional Floquet photonic lattice. By engineering quasiperiodic hopping profiles containing inhomogeneously distributed hopping zeros, we generate critical states and enable their coexistence with extended and localized sectors. The resulting transport regimes are directly resolved through their distinct spatiotemporal dynamics, including ballistic expansion, confined critical oscillations, and persistent localization. We observe all seven phases, including the elusive triply coexisting extended-critical-localized phase, and experimentally track the phase transitions connecting them. Our results establish the first complete experimental map of the Anderson-localization landscape and provide a unified platform for investigating mobility edges, multifractality, and programmable coherent transport.

06.
arXiv (quant-ph) 2026-06-16

Witnessing Spin-Orbital Entanglement using Resonant Inelastic X-Ray Scattering

arXiv:2512.06718v2 Announce Type: replace Abstract: Entanglement plays a central role in quantum technologies, yet its characterization and control in materials remain challenging. Recent developments in spectrum-based entanglement witnesses have enabled new strategies for quantifying many-body entanglement in macroscopic materials. Here, we develop a protocol for detecting spin-orbital entanglement using experiment-accessible resonant inelastic x-ray scattering (RIXS). Central to our approach is the construction of a Hermitian generator from experimentally measurable spectra, which allows us to compute the quantum Fisher information (QFI) available in spin–orbital systems. The resulting QFI provides upper bounds for $k$-producible states and thus serves as a robust witness of spin-orbital entanglement. To account for realistic experimental limitations, we further extend our framework to include relaxed QFI bounds applicable to measurements lacking full polarization resolution.

07.
arXiv (CS.CL) 2026-06-17

Non-negative Elastic Net Decoding for Information Retrieval

Dense retrieval has become the dominant paradigm in information retrieval, in which each document is scored against a query by the inner product of their vector embeddings, and the top-$k$ documents by score are retrieved for this query. However, since each document's score depends solely on the embedding of the query and itself, the retrieval process is oblivious to the content of the entire corpus. Therefore, dense retrieval cannot avoid selecting semantically similar documents from the corpus, which may result in a non-diverse, redundant set of retrieved documents. To this end, we approach retrieval as a joint decoding problem, in which documents are selected as a set with regard to the context of the rest of the corpus. To achieve this, we propose Non-Negative elastic Net (NNN) decoding, which selects documents whose embeddings jointly reconstruct the query embedding as a sparse non-negative linear combination. Our main theoretical result establishes a strict separation between dense retrieval and NNN decoding. For any corpus, every query correctly handled by dense retrieval is also handled by NNN decoding, while on corpora containing correlated documents, NNN decoding additionally handles queries that dense retrieval cannot. Experimental results indicate that applying NNN decoding to frozen embeddings trained for inner-product scoring yields consistent improvements across several benchmarks. Moreover, we introduce an end-to-end training procedure which optimizes the embeddings for NNN decoding, producing significant performance gains surpassing in all metrics and benchmarks compared to dense retrieval. Our work establishes a new paradigm for leveraging dense embeddings in information retrieval, beyond the standard practice of inner-product scoring.

08.
arXiv (CS.LG) 2026-06-19

Evaluating deep learning models for fault diagnosis of a rotating machinery with epistemic and aleatoric uncertainty

arXiv:2412.18980v2 Announce Type: replace Abstract: Uncertainty-aware deep learning (DL) models recently gained attention in fault diagnosis as a way to promote the reliable detection of faults when out-of-distribution (OOD) data arise from unseen faults (epistemic uncertainty) or the presence of noise (aleatoric uncertainty). In this paper, we present the first comprehensive comparative study of state-of-the-art uncertainty-aware DL architectures for fault diagnosis in rotating machinery, where different scenarios affected by epistemic uncertainty and different types of aleatoric uncertainty are investigated. The selected architectures include sampling by dropout, Bayesian neural networks, and deep ensembles. Moreover, to distinguish between in-distribution and OOD data in the different scenarios two uncertainty thresholds, one of which is introduced in this paper, are alternatively applied. Our empirical findings offer guidance to practitioners and researchers who have to deploy real-world uncertainty-aware fault diagnosis systems. In particular, they reveal that, in the presence of epistemic uncertainty, all DL models are capable of effectively detecting, on average, a substantial portion of OOD data across all the scenarios. However, deep ensemble models show superior performance, independently of the uncertainty threshold used for discrimination. In the presence of aleatoric uncertainty, the noise level plays an important role. Specifically, low noise levels hinder the models' ability to effectively detect OOD data. Even in this case, however, deep ensemble models exhibit a milder degradation in performance, dominating the others. These achievements, combined with their shorter inference time, make deep ensemble architectures the preferred choice.

09.
arXiv (CS.LG) 2026-06-17

Beyond IGO-Flow: Toward Convergence Analysis of IGO in Continuous Spaces

arXiv:2606.17523v1 Announce Type: cross Abstract: Information-Geometric Optimization (IGO) provides a unified framework for black-box optimization by interpreting the adaptation of a search distribution as a natural gradient update. Despite its conceptual importance, the convergence theory of IGO remains limited: most existing results concern continuous-time idealizations such as the IGO flow, rather than discrete-time updates with non-infinitesimal learning rates. In this paper, we study discrete-time IGO in continuous spaces, formulated as natural gradient updates in the expectation-parameter coordinates of an exponential family. In particular, we analyze IGO over the multivariate Gaussian family on strongly convex quadratic objective functions. Our analysis covers a setting that simultaneously incorporates full covariance adaptation, a fixed positive learning rate, and quantile-based weights. In this setting, we prove that the covariance matrix converges to the zero matrix. We further show that the mean vector converges to the global optimum, provided that the condition number of the appropriately scaled covariance matrix is bounded at sufficiently frequent iterations. These results advance the convergence theory of IGO and help bridge the gap between the mathematical theory of IGO and practical covariance-adaptive search methods such as CMA-ES.

10.
arXiv (CS.CV) 2026-06-15

Prompt2Effect: Training-Free Image-to-Video Model Specialization via LoRA Generation

Personalizing Image-to-Video (I2V) diffusion models with specific visual effects is increasingly demanded for high-end video generation. Current practice requires training a separate Low-Rank Adaptation (LoRA) module for each effect, incurring substantial data curation and iterative optimization costs that hinder interactive control. We present Prompt2Effect, a weight-driven hypernetwork that amortizes per-effect training by directly synthesizing effect-specific LoRA weights in a single forward pass. Unlike prior hypernetworks that regress adapter weights purely from semantics, Prompt2Effect is explicitly conditioned on the frozen base model weights, grounding weight prediction in the structural geometry of each layer. Furthermore, instead of predicting raw LoRA matrices, we introduce an SVD-canonicalized parameterization that resolves factorization ambiguity and stabilizes large-scale weight synthesis. Together, these design principles enable accurate and scalable LoRA prediction for high-dimensional I2V diffusion models. Extensive experiments demonstrate that Prompt2Effect achieves on-par or superior video quality and effect alignment compared to conventional LoRA fine-tuning, while reducing the computational cost from 56 GPU training hours to 3.3 seconds of hypernetwork inference. When used as initialization for subsequent fine-tuning, our predicted weights further improve final performance and accelerate optimization by approximately 10x.

11.
arXiv (quant-ph) 2026-06-19

Discrimination of genuinely nonlocal sets without entanglement in multipartite systems

arXiv:2606.20380v1 Announce Type: new Abstract: Genuine nonlocality arises when a set of multipartite orthogonal states is locally indistinguishable under any bipartition of the subsystems. The entanglement-assisted discrimination of such genuinely nonlocal orthogonal product sets has attracted significant attention in quantum information. Based on the criterion of local irreducibility, genuine nonlocality is classified into Type I (reducible) and Type II (irreducible). We present entanglement-assisted discrimination schemes for both types of genuinely nonlocal sets that use minimal resources. For low-dimensional cases, Type I sets require only a single EPR pair, whereas Type II sets necessitate only one GHZ state. We extend these protocols to higher-dimensional systems: the discrimination of Type I sets requires only one maximally entangled state in a two-qutrit system, while that of Type II sets similarly demands a single maximally entangled state in a three-qutrit system. For $n$-partite ($n > 3$) systems, Type I sets continue to require only one maximally entangled state, whereas Type II sets necessitate just one additional EPR pair compared to their Type I counterparts. These results provide a robust framework for the efficient discrimination of genuinely nonlocal sets using minimal quantum resources.

12.
arXiv (CS.CL) 2026-06-19

Quality Over Clicks: Iterative Reinforcement Learning for Early-Stage E-Commerce Query Suggestion

Existing dialogue systems rely on query suggestion to enhance user engagement. Recent approaches mainly optimize generative models using click-through rate (CTR) models to align with user preferences. However, these methods are less effective in early-stage deployment scenarios, where click feedback is sparse and insufficient for training a reliable CTR model. To bridge this gap, we propose QualEQS, a quality-first iterative reinforcement learning framework for e-commerce query suggestion. We formalize actionable suggestion quality along three dimensions that directly affect downstream usability: answerability, factuality, and information gain. To continuously improve from online traffic without click supervision, we further propose group-level disagreement among candidate suggestions to identify ambiguous query contexts and mine hard training cases for iterative refinement. We also introduce EQS-Benchmark, a dataset of 16,949 real-world e-commerce queries for offline training and evaluation. Experiments show that our quality-based offline metrics correlate strongly with online performance, providing a practical evaluation recipe for sparse-feedback deployment. In both offline and online settings, QualEQS consistently outperforms strong baselines, yielding a 6.81% improvement in online ChatPV in a real-world enterprise-level conversational shopping assistant system.

13.
arXiv (CS.AI) 2026-06-15

An interpretable unsupervised representation learning for high precision measurement in particle physics

arXiv:2511.22246v2 Announce Type: replace-cross Abstract: Unsupervised learning has been widely applied to various tasks in particle physics. However, existing models lack precise control over their learned representations, limiting physical interpretability and hindering their use for accurate measurements. We propose the Histogram AutoEncoder (HistoAE), an unsupervised representation learning network featuring a custom histogram-based loss that enforces a physically structured latent space. Applied to silicon microstrip detectors, HistoAE learns an interpretable two-dimensional latent space corresponding to the particle's charge and impact position. After simple post-processing, it achieves a charge resolution of $0.25\,e$ and a position resolution of $3\,\mu\mathrm{m}$ on beam-test data, comparable to the conventional approach. These results demonstrate that unsupervised deep learning models can enable physically meaningful and quantitatively precise measurements. Moreover, the generative capacity of HistoAE enables straightforward extensions to fast detector simulations.

14.
arXiv (CS.AI) 2026-06-24

Lightweight Transformer Models for On-Device Fault Detection: A Benchmark Study on Resource-Constrained Deployment

Authors:

arXiv:2606.24173v1 Announce Type: cross Abstract: On-device fault detection enables real-time diagnostics without cloud dependency, but deploying machine learning models on resource-constrained hardware demands careful tradeoffs between accuracy, latency, and model size. We present a benchmark comparing traditional ML methods (Random Forest, XGBoost, SVM, Logistic Regression) against lightweight transformer architectures (DistilBERT, TinyBERT-6L, TinyBERT-4L, MobileBERT) for binary fault detection across three public datasets: NASA C-MAPSS turbofan degradation, SECOM semiconductor manufacturing, and UCI AI4I 2020 predictive maintenance. We evaluate classification performance (F1-score, AUC), model size, and CPU inference latency, and further assess INT8 dynamic quantization and a two-stage adaptive inference pipeline. Our results reveal that on well-separated sensor data (C-MAPSS), lightweight transformers match traditional ML at 87.8% F1 but at 100x the model size and 9000x the latency. TinyBERT-4L emerges as the most deployment-friendly transformer at 55 MB and 18 ms CPU latency. INT8 quantization reduces size by 25% while preserving 86.9% F1. Our adaptive pipeline, routing 97.9% of predictions through a quantized triage model and only 2.1% to a larger expert, achieves 87.6% F1 at 19.5 ms average latency. On severely imbalanced datasets (SECOM, UCI-PM), both traditional and transformer methods struggle significantly, highlighting fundamental limitations of current approaches for extreme class imbalance in fault detection. All code is publicly available.

15.
arXiv (CS.CL) 2026-06-17

The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act

Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. This measurement gap is not only methodological but legal: the EU AI Act makes "appropriate accuracy" a binding requirement for high-risk AI used in the judicial domain, yet that requirement cannot acquire operational content without the very doctrinal-reasoning benchmark the field lacks.

16.
arXiv (quant-ph) 2026-06-11

Probing Quantum States over Spacetime Through Interferometry

arXiv:2507.19258v3 Announce Type: replace Abstract: Establishing a notion of the quantum state that applies consistently across space and time could be a crucial step toward formulating a relativistic quantum theory. We give an operational meaning to multipartite quantum states over arbitrary regions in spacetime through a causally agnostic measurement, a measurement scheme that can be consistently implemented independently of the causal relation between the regions. We prove that such measurements can always be implemented with interferometry, also known as the scattering circuit technique, wherein the conventional density operator, the recently developed quantum state over time (QSOT), and the process matrix formalisms smoothly merge. This framework allows for a systematic study of mixed states in the temporal setting, which turn out to be crucial for modeling quantum non-Markovianity. Based on this, we demonstrate that two different ensembles of quantum dynamics can be represented by the same QSOT, indicating that they cannot be distinguished through interferometry. Moreover, our formalism reveals a new type of spatiotemporal correlation between two quantum dynamics that originates from synchronized propagation in time under time-reversal symmetry. We show that quantum systems with such correlation can be utilized as a reference frame to distinguish certain dynamics indistinguishable under time-reversal symmetry.

17.
bioRxiv (Bioinfo) 2026-06-21

DeepCDS: Ab initio coding sequence prediction in prokaryotic short reads

Accurate coding sequence prediction in short prokaryotic metagenomic reads remains challenging due to sequence fragmentation, unknown sequence origins, and sequencing errors. Here we introduce DeepCDS, a deep learning-based ab initio coding sequence predictor trained on short prokaryotic sequences with and without simulated Illumina-like sequencing errors. DeepCDS integrates ESM-2 protein language model embeddings with nucleotide-level information to predict complete and fragmented coding sequence regions. Benchmarking on 215 phylogenetically diverse prokaryotic organisms demonstrates that DeepCDS consistently outperforms current state-of-the-art methods in coding sequence detection, start and stop codon localization, and robustness to different sequencing error profiles, while remaining operational at shorter sequence lengths than existing tools support. These findings demonstrate that protein language models capture distinct signals relevant for nucleotide-level coding sequence detection, especially at very short lengths. Ultimately, DeepCDS may help uncover the functional potential of the vast microbial diversity that remains genomically uncharacterized.

18.
arXiv (quant-ph) 2026-06-11

Superspace Concentration and Adversarial Robustness in Quantum Algorithms

arXiv:2606.11580v1 Announce Type: new Abstract: We study superspace concentration as a quantum resource, formalized through the focus measure F(\r{ho}) = {\lambda}_max(\r{ho}_super) - the largest eigenvalue of the reduced superspace state - which quantifies the capacity of a quantum system to concentrate informational weight into a preferred subspace of an extended degree-of-freedom space. We develop a complete resource-theoretic framework around this measure and validate its properties through GPU-accelerated numerical simulation. Analytic decoherence predictions are confirmed to machine precision (1.11 x 10^{-16}) for superspace dimensions dS in {2,4,8,16,32}. Focus monotonicity holds across 10,000 random states with zero violations under four focus-non-generating channels across six system configurations. Focused quantum states resist coherent unitary attacks with significantly greater resilience than standard fidelity predicts, with focus remaining above 0.9 at attack strength {\epsilon} = 0.302 versus {\epsilon} = 0.174 for fidelity. We further demonstrate that the focus measure and the U(dS)-asymmetry measure are operationally distinct: asymmetry remains near zero and provides no robustness signal under coherent and targeted attacks while focus tracks spectral concentration and remains robust until {\epsilon} > 0.3. The connection between Grover's algorithm and superspace concentration is made explicit via the identity F(|{\psi}_k>

19.
arXiv (CS.CL) 2026-06-16

Sycophancy as Material Failure under Pushback Loading: A Multi-Axis Characterization Across Three Loading Cases and up to Seventeen Material Charges

Sycophancy in LLMs is documented across 70+ papers, but expert agreement on construct boundaries remains low (ICC=.184; Ye et al., 2026). The construct fragments because behavioral classification depends on which surface form is privileged. We adopt a materials-science framing: conversation as test specimen under load, LLM-model as material charge, pushback as progressive load, stance-flip as material failure. We characterize this failure across three loading cases (debate n=1000; false-presuppositions n=3400; ethical-setting n=3400; 10-17 material charges per case; 7800 specimens total) using 14 turn-level axis-measurements spanning velocity, damage accumulation, frame-drift, brittleness, and direction stability, plus three speaker-resolved axes from an independent pipeline. The measurements are Hooke-coupled ($\sigma = E \cdot \varepsilon$ analog) and reproduce across loading cases with effects up to $|r_{rb}| = 0.35$ on debate; the sign structure adds a second pattern: the ethical-setting case inverts the velocity and accumulation blocks. Variance composition partitions into two profiles: debate is charge-dominated (brittle-fracture-like: the material grade decides), false-presuppositions and ethical-setting are topic-dominated (creep-like: the load decides); the ratios (2.03 vs 0.13/0.17) are estimator-dependent, for debate even in direction. Cross-judge reliability (GPT-4o vs Haiku 4.5) shows debate scoring is judge-robust (Cohen's $\kappa = 0.88$) while false-presupposition scoring is judge-sensitive ($\kappa = 0.36$) – a caveat single-judge benchmarks must report. This is the methodological move Ye et al.'s diagnosis calls for: a multi-axis characterization that does not depend on which surface form of the construct one privileges.

20.
arXiv (CS.LG) 2026-06-19

Diffuse AI Control on Fuzzy Tasks

arXiv:2606.08892v2 Announce Type: replace Abstract: AI models deployed in critical domains, such as AI safety research, may subtly sabotage our efforts due to misalignment. Diffuse AI Control is a subfield of AI safety concerned with mitigating risks from AI sabotage distributed over long deployment horizons (diffuse threats). These risks are particularly pernicious on fuzzy tasks, i.e. tasks which are hard to grade or require intuition. To understand diffuse threats on fuzzy tasks, we introduce a framework that considers AI control as an adversarial game between a blue team and a red team. The blue team uses a weak trusted model to construct a weak score against which they would train a strong, potentially subversive model to remove the subversion propensity if it were present. The red team then tries to find model behaviors that are rated highly by the weak score, and thus might not be trained out, but actually correspond to poor performance. We test our framework on the task of writing experimental proposals for research questions from recent ML papers. We use a language model with access to the original paper as a proxy "ground-truth" scorer. Our red team discovers subversive behaviors using multi-objective evolutionary prompt optimization. We show that Opus~4.6 can write proposals that are worse according to the ground truth proxy than those of GPT-OSS-20B, while the weak scorer rates them as highly as the best proposals from Opus 4.6. We then propose an adversarial optimization algorithm for the blue team that discovers more robust prompts for the weak model. This algorithm produces a blue team prompt that our red team optimization fails to exploit.

21.
arXiv (CS.CV) 2026-06-19

VFACamou: View-Fused Adversarial Camouflage for Environment-Adaptive Physical Evasion

Adversarial camouflage in the physical world remains highly challenging, particularly under UAV reconnaissance where targets undergo continuous geometric changes and extreme illumination variations. Existing methods either optimize 2D digital perturbations that fail to generalize to dynamic viewpoints or produce visually unnatural textures that cannot be deployed in real scenarios. Therefore, we propose an end-to-end framework for adversarial camouflage generation that automatically produces wearable adversarial patterns and maintains stable attack performance in real physical environments with changing viewpoints, poses, and lighting conditions. Our method integrates UV-volume rendering with a diffusion-based texture generator, enabling consistent appearance under varying scales, poses, and lighting conditions. To ensure environmental realism, we propose an illumination color consistency estimator that extracts dominant background attributes and guides a natural texture loss to align the generated UV texture with the surrounding environment. A multi-scale dynamic training strategy further enhances robustness against viewpoint shifts and body deformation. Extensive experiments across multiple mainstream detectors demonstrate that our method achieves strong and stable physical attack performance while maintaining high perceptual naturalness, reducing human detection rates without introducing unnatural artifacts.

22.
arXiv (CS.AI) 2026-06-17

Agentic AI-based Framework for Mitigating Premature Diagnostic Handoff and Silent Hallucination in Healthcare Applications

arXiv:2606.18068v1 Announce Type: new Abstract: Recent advances in Large Language Models (LLMs) and multi-agent systems have driven the rise of Agentic AI, showing promise for medical reasoning. However, open-ended conversational agents remain prone to two critical failure modes: premature diagnostic handoff and silent clinical hallucinations that may go undetected before reaching the patient. In this work, we propose a multi-agent framework that addresses both issues by replacing ``LLM-as-a-judge'' routing with deterministic orchestration constraints. The framework incorporates two safety mechanisms. First, a neuro-symbolic state-tracking gate enforces completeness of the OLDCARTS clinical protocol (Onset, Location, Duration, Character, Aggravating/Alleviating factors, Radiation, Timing, and Severity) by blocking diagnostic transitions until all required dimensions are collected. Second, an epistemic uncertainty quantification (UQ) gate computes semantic entropy (H) across K=5 independent diagnostic samples to identify and intercept divergent outputs before delivery. We evaluate the system using simulated patient agents powered by the llama-3.1-70b-instruct model on 150 test cases. The full architecture achieves 49.3% diagnostic precision, representing an absolute improvement of 11.3 percentage points over an unconstrained baseline. Additionally, we observe a statistically significant negative correlation (r = -0.181, p < 0.05) between OLDCARTS completeness (\sigma) and semantic entropy (H), suggesting that structured information gathering is associated with reduced diagnostic uncertainty.

23.
arXiv (CS.AI) 2026-06-18

Clin-JEPA: A Multi-Phase Co-Training Framework for Joint-Embedding Predictive Pretraining on EHR Patient Trajectories

arXiv:2605.10840v3 Announce Type: replace-cross Abstract: We present Clin-JEPA, a multi-phase co-training framework for joint-embedding predictive (JEPA) pretraining on EHR patient trajectories. JEPA architectures have enabled latent-space planning in robotics and high-quality representation learning in vision, but extending the paradigm to EHR data – to obtain a single backbone that simultaneously forecasts patient trajectories and serves diverse downstream risk-prediction tasks without per-task fine-tuning – remains an open challenge. Existing JEPA frameworks either discard the predictor after pretraining (I-JEPA, V-JEPA) or train it on a frozen pretrained encoder (V-JEPA 2-AC), leaving the encoder unaware of the rollout signal that the retained predictor must use at inference; co-training the encoder and predictor under a shared JEPA prediction objective would supply this grounding, but naïve co-training is unstable, with representation collapse and online/target drift causing autoregressive rollout to diverge. Clin-JEPA's five-phase pretraining curriculum – predictor warmup, joint refinement, EMA target alignment, hard sync, and predictor finalization – addresses each failure mode by phase, stably co-training a Qwen3-8B-based encoder and a 92M-parameter latent trajectory predictor. On MIMIC-IV ICU data, three independent evaluations support the framework: (1) latent $\ell_1$ rollout drift uniquely converges ($-$15.7%) over 48-hour horizons while baselines and ablations diverge (+3% to +4951%); (2) the encoder learns a clinically discriminative latent geometry (deteriorating-patient cohorts displace 4.83$\times$ further than stable patients in latent space, vs $\leq$2.62$\times$ for baseline encoders); (3) a single backbone outperforms strong tabular and sequence baselines on multi-task downstream evaluation. Clin-JEPA achieves mean AUROC 0.851 on ICareFM EEP and 0.883 on 8 binary risk tasks (+0.038 and +0.041 vs baseline average).

24.
arXiv (CS.LG) 2026-06-24

Stochastic Expectation Maximization for Robust State-Space Radio Interferometric Imaging

arXiv:2606.23944v1 Announce Type: cross Abstract: State–space models provide a flexible framework for analyzing dynamical systems, yet they often rely on Gaussian assumptions that fail to capture heavy-tailed or outlier-prone measurement noise. We propose a robust estimation scheme for linear state–space models subject to compound-Gaussian noise, as encountered for instance in radio interferometry affected by radio-frequency interference (RFI). The method relies on a Stochastic Approximation Expectation–Maximization (SAEM) algorithm in which the standard E-step is replaced by Monte Carlo sampling of the latent states and noise texture through closed-form Gibbs updates, enabling tractable inference despite the heavy-tailed likelihood. Numerical experiments show that the proposed method significantly improves reconstruction fidelity and robustness to RFI, outperforming a Gaussian EM algorithm and even an oracle RTS smoother. These results highlight the benefits of heavy-tailed state–space modeling and SAEM-based inference in interference-dominated imaging scenarios.

25.
arXiv (CS.CV) 2026-06-24

Does it matter which Gaussians you pick in 4D Gaussian streaming?

Anchor-driven 4D Gaussian streaming methods such as Instant Gaussian Stream (IGS) update a dynamic scene each frame from a compact set of Gaussian anchors, chosen by default with Farthest Point Sampling (FPS) at a fixed budget of $8{,}192$. Because these anchors act as control points that drive the whole scene through linear blend skinning, the rule used to choose them ought to affect reconstruction quality. We test this by holding the IGS pipeline fixed and changing only the sampler, comparing FPS, random, uniform, an opacity-scale heuristic, and a learned policy across budgets and refinement settings on N3DV and MeetingRoom. At deployment budgets the sampler has no measurable effect: a cheap random or uniform sampler at $4{,}096$ anchors matches FPS@8192 within measurement error, the default budget is over-provisioned, and the result holds on a second backbone (3DGStream). The learned policy is mixed rather than consistently better: it can improve the N3DV validation set at tight budgets, but does not give a stable cross-dataset rule, and selection is never the bottleneck because refinement dominates runtime. We will release our full sweep and evaluation protocol as a sampler benchmark.