Academic Intelligence · Curated Daily

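Explore the Frontiers of Global Academic Research

AcademicHub brings together real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefs across interdisciplinary fields.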

01.
medRxiv (Medicine) 2026-06-22

Longitudinal multi-omics characterization of the malignant evolution in multirelapsing glioblastoma

Linking glioblastoma (GBM) evolution to clinical progression is challenged by multiple factors, including tumor location for repeated sample collection, and short patient survival. In a single individual, we collected and analysed samples from 11 operations distributed across 31 months of multi-relapsing and multifocal GBM, including terminal leptomeningeal progression. All samples shared genomic ancestry of the retinoblastoma protein 1 (RB1) and neurofibromin 1 (NF1) mutations while advanced progression and extracranial metastases featured mutations of tuberous sclerosis complex 2 (TSC2), PBRM1, CD22 and Fanconi anemia complementation group I (FANCI), correlated with clinical resistance to immunotherapies and DNA-damaging agents. Single-cell analytics revealed distinct yet reversible shifts in response to the precision medicine arsenal. GBM parenchymal dissemination and extracranial progression were associated with strengthening of neuron-like cell phenotypes. Our multidimensional study describes GBM evolution over a rarely reported time scale, and provides a valuable resource linking genetic, molecular, cellular and clinical progressions.

02.
arXiv (quant-ph) 2026-06-24

Active interference suppression in frequency-division-multiplexed quantum gates via off-resonant microwave tones

arXiv:2601.14547v3 (announce type: replace). The increasing number of control lines connecting quantum processors to external electronics constitutes a major bottleneck in the realization of large-scale quantum computers. Frequency-division multiplexing is expected to enable control of multiple qubits through a single microwave cable; however, interference from off-resonant microwave tones hinders precise qubit control. Here, we propose an active interference suppression method for frequency-division-multiplexed simultaneous gates on microwave-controlled qubits. We demonstrate that the deliberate incorporation of off-resonant microwave tones improves single-qubit gate fidelity. In particular, the gate infidelity scales inversely with the square of the number of microwave tones when off-resonant orthogonal or quasi-orthogonal tones are incorporated. Furthermore, we show that fast oscillations, neglected under the rotating wave approximation, degrade the gate fidelity, and that this degradation can be mitigated through optimized frequency allocation. The proposed approach is simple and effective for improving the performance of frequency-division-multiplexed quantum gates.

03.
arXiv (quant-ph) 2026-06-12

Non-Hermitian skin effect induced by spatial noncommutativity

arXiv:2606.12961v1 (announce type: new). In all known schemes for the non-Hermitian skin effect, the non-Hermitian ingredient that drives the skin localization, whether asymmetric hopping or gain and loss, is invariably introduced by hand as an independent model parameter along the skin direction. Here we show that when two spatial coordinates do not commute, the skin effect can break free of this paradigm: a gain-loss potential applied along one coordinate automatically generates non-reciprocity along the other through the coordinate noncommutativity, driving all eigenstates to pile up exponentially at a boundary. We term this phenomenon the noncommutative skin effect. The inverse skin length is proportional to the noncommutativity parameter and is given by an analytic formula, exact in the thermodynamic limit and verified by exact diagonalization of lattice models; the reflection symmetry of the imaginary potential furnishes an exact criterion for the presence or absence of the effect, valid rigorously for finite-size systems. For a sinusoidal imaginary potential, the skin direction of all eigenstates flips collectively at parameter points fixed purely by geometry. Because the flip point is independent of the potential strength, the reversal constitutes a zero-crossing measurement scheme intrinsically robust against systematic errors, from which the noncommutativity parameter can be extracted directly. The qualitative transition of the eigenstates from uniform to exponentially localized renders the effect a nonperturbative probe of spatial noncommutativity, and the Peierls-phase structure of its lattice model is in principle accessible to cold-atom synthetic dimensions, photonic resonators, and topolectrical circuits.

04.
arXiv (quant-ph) 2026-06-15

Tensor network manifolds and Riemannian fundamental theorem for tensor networks

arXiv:2606.14613v1 (announce type: cross). Tensor networks provide a powerful framework for efficiently representing high-dimensional data and many-body quantum states. Endowing tensor networks with a Riemannian manifold structure provides a natural setting for numerical optimization and analysis. A central feature of tensor networks is their gauge freedom, whose characterisation (captured by so-called fundamental theorems) underlies both their intrinsic structure and the design of numerical algorithms. In this work, we study the interaction between the Riemannian manifold structure and the gauge freedom for several families of tensor networks. Using group actions and Riemannian submersions, we establish a Riemannian fundamental theorem for the tensor network families studied.

05.
arXiv (CS.CV) 2026-06-17

FUSER: Feed-Forward MUltiview 3D Registration Transformer and SE(3)$^N$ Diffusion Refinement

Registration of multiview point clouds conventionally relies on extensive pairwise matching to build a pose graph for global synchronization, which is computationally expensive and inherently ill-posed without holistic geometric constraints. This paper proposes FUSER, the first feed-forward multiview registration transformer that jointly processes all scans in a unified, compact latent space to directly predict global poses without any pairwise estimation. To maintain tractability, FUSER encodes each scan into low-resolution superpoint features via a sparse 3D CNN that preserves absolute translation cues, and performs efficient intra- and inter-scan reasoning through a Geometric Alternating Attention module. In particular, we transfer 2D attention priors from off-the-shelf foundation models to enhance 3D feature interaction and geometric consistency. Building upon FUSER, we further introduce FUSER-DF, an SE(3)$^N$ diffusion refinement framework to correct FUSER's estimates via denoising in the joint SE(3)$^N$ space. FUSER acts as a surrogate multiview registration model to construct the denoiser, and a prior-conditioned SE(3)$^N$ variational lower bound is derived for denoising supervision. Extensive experiments on 3DMatch, ScanNet and ARKitScenes demonstrate that our approach achieves superior registration accuracy and outstanding computational efficiency.

06.
arXiv (CS.LG) 2026-06-11

RCAP: Robust, Class-Aware, Probabilistic Dynamic Dataset Pruning

arXiv:2606.11761v1 (announce type: new). Dynamic data pruning techniques aim to reduce computational cost while minimizing information loss by periodically selecting representative subsets of input data during model training. However, existing methods often struggle to maintain strong worst-group accuracy, particularly at high pruning rates, across balanced and imbalanced datasets. To address this challenge, we propose RCAP, a Robust, Class-Aware, Probabilistic dynamic dataset pruning algorithm for classification tasks. RCAP applies a closed-form solution to estimate the fraction of samples to be included in the training subset for each individual class. This fraction is adaptively adjusted in every epoch using class-wise aggregated loss. Thereafter, it employs an adaptive sampling strategy that prioritizes samples having high loss for populating the class-wise subsets. We evaluate RCAP on six diverse datasets ranging from class-balanced to highly imbalanced using five distinct models across three training paradigms: training from scratch, transfer learning, and fine-tuning. Our approach consistently outperforms state-of-the-art dataset pruning methods, achieving superior worst-group accuracy at all pruning rates. Remarkably, with only $10\%$ data, RCAP delivers $>1\%$ improvement in performance on class-imbalanced datasets compared to full data training while providing an average $8.69\times$ speedup. The code can be accessed at https://github.com/atif-hassan/RCAP-dynamic-dataset-pruning
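
The two-stage idea in this abstract (a per-class quota driven by class-wise loss, then loss-prioritized sampling inside each class) can be illustrated with a minimal sketch. This is not RCAP's closed-form solution, which the abstract does not state: the allocation rule here (quota proportional to a class's summed loss) and the function name `class_aware_prune` are assumptions made only for illustration. The within-class step uses Efraimidis-Spirakis weighted sampling without replacement, a standard technique swapped in as a stand-in for the paper's sampling strategy.

```python
import random
from collections import defaultdict

def class_aware_prune(labels, losses, keep_fraction, seed=0):
    """Return sorted indices of samples kept for the next epoch (illustrative only)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    budget = round(keep_fraction * len(labels))
    total_loss = sum(losses)
    selected = []
    for label, idxs in by_class.items():
        class_loss = sum(losses[i] for i in idxs)
        # Hypothetical allocation rule (NOT the paper's closed form):
        # quota proportional to summed class loss, at least 1, at most the class size.
        quota = min(len(idxs), max(1, round(budget * class_loss / total_loss)))
        # Efraimidis-Spirakis keys: u ** (1 / weight); the largest keys are kept,
        # so high-loss samples are more likely to be selected.
        keys = {i: rng.random() ** (1.0 / max(losses[i], 1e-12)) for i in idxs}
        selected.extend(sorted(idxs, key=keys.get, reverse=True)[:quota])
    return sorted(selected)

# Toy data: a large low-loss class (0) and a small high-loss class (1).
labels = [0] * 8 + [1] * 2
losses = [0.1] * 8 + [2.0] * 2
kept = class_aware_prune(labels, losses, keep_fraction=0.5)
print(kept)
```

Because quotas are clipped to the class size, the number of kept samples can fall below the nominal budget, as it does in this toy case where the small class is exhausted.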

07.
arXiv (CS.AI) 2026-06-11

From Architecture to Output: Structural Origins of Hallucination in Large Language Models and the Amplifying Role of Data

arXiv:2606.07537v1 (announce type: cross). Large language models hallucinate (producing fluent, confident, factually wrong outputs) with a consistency that persists across generations and scales. Existing taxonomies classify hallucination by output type, distinguishing intrinsic from extrinsic failures and faithfulness from factuality divergence. These frameworks are descriptively rigorous but do not identify which internal mechanism produced a given instance. This paper analyses hallucination as a structural consequence of three architectural decisions that together form a compound failure system. Self-attention's co-occurrence learning substitutes statistical proximity for semantic meaning and produces entity confusion, fact misattribution, and semantic drift. The maximum likelihood estimation training objective optimises next-token probability without factual constraint, rewarding statistically plausible outputs regardless of their truth value. Autoregressive decoding's permanent left-to-right commitment under exposure bias ensures that a single wrong token cascades forward through the entire output sequence without revision. Dataset pathologies (long-tail deficiencies, training bias, and synthetic pollution) amplify these vulnerabilities but do not independently cause them. We make three contributions. First, we map each mechanism to a specific output category in the Alansari and Luqman taxonomy, locating intrinsic hallucination in self-attention, extrinsic hallucination in MLE, and logical inconsistency in autoregressive decoding. Second, we show that each commonly cited dataset pathology exploits one of these mechanisms rather than originating hallucination independently. Third, we identify the diagnostic limitation of output-type-only classification and contrast it with inference-layer mitigation approaches.

08.
arXiv (CS.LG) 2026-06-19

On the QUEST for Uncertainty Quantification via Highest Density Regions

arXiv:2606.19569v1 (announce type: new). Uncertainty quantification (UQ) is essential for reliable decision-making in safety-critical applications in probabilistic machine learning. For regression problems, dominant scalar UQ approaches, notably those based on proper scoring rules, measure uncertainty via pointwise predictive risk. This can lead to counterintuitive results when the target statistic is not the conditional expectation. We propose an alternative framework, in which uncertainty is characterised by the volume of the most probable subset of a distribution's support. QUEST (Quantifying Uncertainty via highest dEnSiTy regions) is a novel approach to UQ based on the concentration of Lebesgue measure at a distribution's peak(s), evaluated at one or more values of a robustness parameter $\alpha$. We establish connections between our measures and classical statistics from information theory and economics. We show that, unlike popular alternatives based on proper scoring rules, QUEST measures of epistemic and aleatoric uncertainty satisfy a set of axioms adapted from the UQ literature, including monotonicity under distributional spread and invariance to location shifts. Selective prediction benchmarks confirm that QUEST performs favourably against standard measures such as variance and differential entropy.
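
To make the notion of "volume of the most probable subset" concrete, here is a minimal numerical sketch. It assumes the standard definition of a highest density region (the smallest set holding probability mass $1-\alpha$) and approximates its length on a one-dimensional grid; whether QUEST's measures are defined exactly this way is not stated in the abstract, and the helper names `gaussian_pdf` and `hdr_volume` are hypothetical.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def hdr_volume(density, lo, hi, alpha, n=20001):
    """Approximate length of the (1 - alpha) highest density region on [lo, hi].

    Grid cells are taken in order of decreasing density until they hold
    at least 1 - alpha of the probability mass.
    """
    dx = (hi - lo) / (n - 1)
    values = sorted((density(lo + i * dx) for i in range(n)), reverse=True)
    mass = 0.0
    for count, value in enumerate(values, start=1):
        mass += value * dx
        if mass >= 1.0 - alpha:
            return count * dx
    return n * dx  # the grid did not capture enough mass

narrow = hdr_volume(lambda x: gaussian_pdf(x, 0.0, 1.0), -12.0, 12.0, alpha=0.1)
wide = hdr_volume(lambda x: gaussian_pdf(x, 0.0, 2.0), -12.0, 12.0, alpha=0.1)
shifted = hdr_volume(lambda x: gaussian_pdf(x, 3.0, 1.0), -12.0, 12.0, alpha=0.1)
print(narrow, wide, shifted)
```

On these Gaussians the sketch reproduces the two axioms named in the abstract: the wider distribution has the larger region, and shifting the mean leaves the volume essentially unchanged. Because cells are ranked by density rather than by position, the same routine also handles multimodal densities, where the region is a union of intervals.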

09.
arXiv (quant-ph) 2026-06-19

Quantum models with the Yang-Lee phase transition

arXiv:2606.19732v1 (announce type: cross). In this article, we present four different $1+1$D quantum models that realize the Yang-Lee (YL) phase transition under a deformation that preserves $PT$ symmetry. These are the antiferromagnetic Ising spin chain in transverse and longitudinal magnetic fields, the massive Schwinger model, the Blume-Capel model, and the three-state quantum clock model. Using the state-operator correspondence, we identify the YL critical point, compute the scaling dimensions of the lowest operators in each model, and find perfect agreement with the exact results for the YL criticality in two dimensions. Using bosonization for the Schwinger model and the Polyakov-Hubbard transformation for the other models, we show that in all of these quantum models the YL critical point is described, as expected, by a massless bosonic field with an $i \phi^3$ interaction. In the quantum clock model, this critical field interacts with a massive bosonic field, and we identify the massless and massive states in the Hamiltonian spectrum. In addition, we numerically compute the two-point function of $\phi$ at the Yang-Lee critical point and show that it grows with distance, in agreement with theoretical expectations.

10.
arXiv (CS.LG) 2026-06-15

Arbitrary control over multimode wave propagation for machine learning

arXiv:2402.17750v2 (announce type: replace-cross). Controlled multimode wave propagation can enable more space-efficient photonic processors than architectures based on discrete components connected by single-mode waveguides. Instead of defining discrete elements, one can sculpt the continuous substrate of a photonic processor to perform computations through multimode interference in two dimensions. Here we designed and demonstrated a device with a refractive index that can be rapidly reprogrammed across space, allowing arbitrary control of wave propagation. The device, a two-dimensional programmable waveguide, uses parallel electro-optic modulation of the refractive index of a slab waveguide with about $10^4$ programmable spatial degrees of freedom. We implemented neural network inference on benchmark tasks with up to $49$-dimensional vectors in a single pass, without digital pre-processing or post-processing. Theoretical and numerical analyses further indicated that two-dimensional programmable waveguides may offer not only a constant-factor reduction in device area but also a scaling benefit, with the area required growing as $N^{1.5}$ rather than $N^2$.

11.
arXiv (math.PR) 2026-06-12

Symmetric Cooperative Motion in Higher Dimensions

arXiv:2606.13459v1 (announce type: new). We prove a distributional convergence result for a multidimensional version of symmetric cooperative motion which was introduced and studied in one dimension in [HRW, SCM1]. Our approach relies on framing the associated recursive distributional equation as a discretization of the porous medium equation. A major challenge is to analyze the behaviour of finite difference schemes which approximate weak solutions of the porous medium equation with unbounded initial data. In overcoming this difficulty, we perform a detailed analysis of the probability mass function of symmetric cooperative motion, in which we introduce several new comparison arguments for the discrete process. Consequently, along the way, we establish a novel multidimensional convergence result for a finite difference scheme approximating the ZKB/Barenblatt solution of the porous medium equation, which is of independent interest.

12.
arXiv (CS.CV) 2026-06-19

Linear Recurrent Unit with Semantic Modulation for Image Super-Resolution

The linear recurrent unit (LRU), designed with a principled formulation for stable linear recurrence, has demonstrated promising accuracy and robustness on long-range dependency tasks. However, its static parameterization and single-scan method limit its applicability to 2D vision tasks. In this study, we propose an LRU-based restoration network with a semantic modulating unit (SMU) to achieve a harmonious balance between performance and efficiency in single-image super-resolution. The SMU plays three key roles: LRU modulation, spatial categorization, and feature enhancement through learned prototypes. Extensive experiments demonstrate that our method quantitatively and qualitatively surpasses recent state-of-the-art methods. Notably, our approach achieves superior performance with computational complexity on par with existing methods. The source code and models are available at https://github.com/MingyuChoi-run/LSM

13.
arXiv (CS.AI) 2026-06-15

The Journal of Prompt-Engineered (Moral) Philosophy Or: Why AI-Assisted Ethics Research Requires Process Transparency

arXiv:2511.08639v4 (announce type: replace-cross). Existing AI disclosure mandates in scholarship require that AI assistance be reported but leave transparency philosophically unspecified: they fix the duty without explaining what the duty serves. We argue that ethical inquiry is essentially contested at two independent levels – about what it is, and about what it demands of the inquirer – defeating output-only evaluation and welfare-economic dismissal of the transparency question, and, by extension, reproducibility framings imported from the empirical sciences. The transparency duty is grounded instead in agent-integrity: the legibility, before a community of inquiry, of the identity-constituting commitments that the author's mode of philosophising expresses. Because the standards for evaluating such work are not communally settled, the achievable goal for transparency is not evaluation against agreed criteria but tracking – accumulating the evidentiary record that lets each tradition assess the work on its own terms and makes future normative judgments possible. We develop a documentation-adequacy framework that operationalises Meaningful Human Control through five transparency elements – declaration, navigation, documentation account, process documentation, and development records – demonstrated by the paper itself, whose full documentation record is archived at a persistent identifier. The framework is a first iteration subject to revision, not a settled standard.

14.
arXiv (CS.CV) 2026-06-25

Reflective VLA: In-Context Action Consequences Make VLAs Generalize

Most vision-language-action (VLA) models are reactive: they predict the next action from the current instruction and observation, implicitly assuming that the current observation fully specifies the action-relevant state. In embodied control, however, embodiment-specific factors such as camera-to-robot geometry, robot calibration, or systematic actuation bias are often hard to identify from a single observation. As a result, reactive policies cannot reliably disambiguate these factors in general, overfitting to training environments and generalizing poorly at deployment. We propose Reflective VLA, which conditions each decision on a context of observation-action-consequence triplets. Each triplet records not only what the robot observed and executed, but also how the scene changed afterward, exposing the deployment-specific mapping from actions to observed effects. Architecturally, Reflective VLA routes all observation modalities through the VLM under shared attention, so the action expert reasons directly over past triplets and the current observation. A block-causal mask enables parallel multi-frame training without leakage and supports KV-cached real-time inference. On standard LIBERO and SimplerEnv-Bridge, Reflective VLA preserves strong in-distribution performance. Under distribution shift on LIBERO-Plus and the harder LIBERO-Plus-Hard, it improves average success rate by 5.4 and 4.2 percentage points over a matched reactive baseline. Ablations with a matched history-only baseline further show that action consequences – rather than additional context length alone – are the key to cross-environment generalization. Project page: https://lianqing11.github.io/reflective-vla-page/
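
The block-causal mask mentioned in this abstract can be sketched as follows. The abstract does not give the mask layout, so this assumes one common construction: tokens are grouped into blocks (here, one block per frame or triplet), each token attends to every token in its own block and in all earlier blocks, and never to later blocks. The function name `block_causal_mask` is hypothetical.

```python
def block_causal_mask(block_lengths):
    """Build a boolean attention mask; mask[q][k] is True when query q may attend to key k.

    Attention is unrestricted inside a block and causal across blocks.
    """
    # Map every token position to the index of the block it belongs to.
    block_of = [b for b, length in enumerate(block_lengths) for _ in range(length)]
    n = len(block_of)
    return [[block_of[k] <= block_of[q] for k in range(n)] for q in range(n)]

# Three blocks holding 2, 3, and 2 tokens respectively.
mask = block_causal_mask([2, 3, 2])
for row in mask:
    print("".join("x" if allowed else "." for allowed in row))
```

Because no token can see a later block, all blocks can be trained in one parallel pass without leaking future information, and keys and values of earlier blocks never change, which is the property that makes KV caching at inference time possible.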

15.
arXiv (CS.AI) 2026-06-24

Multimedia and Visual Analytics in the Agentic Era

arXiv:2504.06138v3 (announce type: replace-cross). Professional users need tools to help them gain actionable insights from large multimedia collections. Foundation models and AI agents have rapidly changed the playing field, and improving their accuracy, trustworthiness, and reasoning capabilities are active topics in the computer vision, machine learning, and multimedia communities. Most current research focuses on benchmark-driven algorithmic improvements. The multimedia community is the place to go beyond algorithms and consider complete multimedia analytics systems that support professional users in their complex tasks and achieve a true teaming of humans and AI. Supporting users with machine learning and visualizations has been studied for decades in the visual analytics field. In this paper, we propose a framework to bring multimedia and visual analytics together and indicate how it could impact current and new multimedia analytics solutions. Additional information can be found at https://staff.fnwi.uva.nl/m.worring/analytics-model.html

16.
arXiv (quant-ph) 2026-06-15

Efimov Effect in Ultracold Microwave-Shielded Polar Molecules

arXiv:2602.21433v2 (announce type: replace-cross). A quantum-mechanical description is presented for the three-body physics of shielded dipolar molecules, including a prediction of observable Efimov physics. Despite the anisotropic and long-range nature of the interaction, shielding enables a regime in which universality emerges already at the two-body level and extends to the three-body sector, where Efimov physics emerges. On the negative side of the scattering-length resonance, computed trimer binding energies display the characteristic scaling expected for Efimov resonances. Finally, the sudden approximation can be used to create trimer bound states, starting from positive energy trap states as a way to create or detect these molecular trimers. Moreover, the three-body parameter expressed in dipolar units is found to be universal.

17.
arXiv (CS.CL) 2026-06-19

CogniFold: Always-On Proactive Memory via Cognitive Folding

Existing agent memory remains predominantly reactive and retrieval-based, lacking the capacity to autonomously organize experience into persistent cognitive structure. Toward genuinely autonomous agents, we introduce CogniFold, a brain-inspired "always-on" agent memory designed for the next generation of proactive assistants. CogniFold continuously folds fragmented event streams into self-emerging cognitive structures, bootstrapping progressively higher-level cognition from incoming events and accumulated knowledge. We ground this by extending Complementary Learning Systems (CLS) theory from two layers (hippocampus, neocortex) to three, adding a prefrontal intent layer. Emulating the prefrontal cortex as the locus of intentional control and decision-making, CogniFold achieves this through graph-topology self-organization: cognitive structures proactively assemble under the stream, merge when semantically similar, decay when stale, relink through associative recall, and surface intents when concept-cluster density crosses a threshold. We evaluate structural formation using CogEval-Bench, demonstrating that CogniFold uniquely produces memory structures that match cognitive expectations and concept emergence. Furthermore, across eight downstream benchmarks – two probing long-term conversational memory (LoCoMo, LongMemEval) and six spanning other cognitive domains – we validate that CogniFold simultaneously performs robustly on conventional memory tasks. Our code is available at https://github.com/OpenNorve/CogniFold.

18.
arXiv (CS.CV) 2026-06-16

Is My Vision-Language Data in Your AI? Membership Inference Test (MINT) Demo 2

We present the Membership Inference Test (MINT) Demo 2, a framework designed to improve transparency in machine learning training processes. MINT is a technique for experimentally determining whether specific data were used during machine learning model training. We establish the theoretical framework and propose multiple architectures for MINT depending on the amount of information known about the models that are being audited. Experimental results using a popular face recognition model, 4 state-of-the-art LLMs, and multiple, diverse, and large-scale public image and text databases achieve promising accuracy levels in the detection of training data of up to 90%. Building on these results, we introduce a comprehensive web platform that expands these capabilities to image and text modalities. The platform integrates a diverse technological stack, including MINT, aMINT, and gMINT, allowing users to audit a wide range of models. This demonstrator aims to promote AI transparency and provides a practical tool to foster compliance with emerging AI regulations.

19.
arXiv (CS.LG) 2026-06-16

GPT-Based Fast Simulation of CLAS12 Detector Hits via Conditional Autoregressive Generation

arXiv:2606.16035v1 (announce type: cross). Modern particle physics experiments have demonstrated an increasing need for fast, high-fidelity detector simulation as detector components have improved and subsequent computational requirements approach the limits of available resources. Recently, deep generative models have emerged as a promising alternative to traditional Monte-Carlo methods, with recent works drawing inspiration from large language models (LLMs) and self-supervised next-token prediction methods. In this work, we present an application of a GPT-style autoregressive transformer as a fast surrogate model for the calorimeter inside the CLAS12 experiment at the Thomas Jefferson National Accelerator Facility. The model is conditioned on incident momentum and generates realistic detector hits autoregressively across all nine calorimeter layers as sequences of strip, ADC, and TDC tokens. We demonstrate that the model faithfully reproduces hit multiplicity, spatial distributions, energy deposits, and the energy-momentum response of the electromagnetic calorimeter. The generator achieves inference rates exceeding 700 events per second on a single GPU, providing a substantial speedup over traditional Geant4-based simulations while maintaining physics fidelity essential for high-luminosity experimental programs.

20.
arXiv (quant-ph) 2026-06-12

Towards Geostrategic Critical Minerals and Materials Resilience: Secure Supply-Chain and Criticality Analyses for Quantum Technologies in Arctic and Space Environments

arXiv:2605.02926v2 (announce type: replace-cross). This manuscript maps secure-supply and criticality risks for quantum technologies deployed in extreme environments, linking upstream critical minerals and materials (CMMs) to downstream system performance, continuity of security, and mission assurance. It adopts a reproducible "Critical Level I" screening method to identify materials whose supply concentration, essentiality, and limited mitigatability can create bottlenecks for quantum deployment. The analysis is structured around two use cases: (i) niobium as a key input for superconducting quantum computing and related manufacturing and toolchain dependencies; and (ii) space-qualified superconducting nanowire single-photon detectors (SNSPDs), alongside adjacent single-photon detector platforms such as SPADs, where radiation, thermal cycling, vibration, and electromagnetic interference can degrade device metrics and, in communications settings, threaten continuity of security. The manuscript further situates these dependencies within U.S.-China strategic competition over critical materials, refining capacity, export controls, and overseas mineral acquisitions, while also connecting them to standards-first governance, post-quantum cryptography migration, and the emerging security logic of quantum networking. It argues that static national critical-minerals lists are insufficient for mission-relevant quantum technology and proposes a dedicated Quantum Criticality and Critical Minerals (QCCM) dashboard as a living decision-support tool for tracking concentration, substitutability, qualification bottlenecks, stockpiling gaps, and geopolitical stress signals across quantum platforms. The paper concludes with implications for substitution, diversification, stockpiling, shielding, qualification-by-design, and standards-aligned governance to support secure, sustained, and mission-relevant quantum deployment.

21.
arXiv (CS.CV) 2026-06-11

World Model Self-Distillation: Training World Models to Solve General Tasks

Pretrained video generators are promising visual world models that exhibit emergent task-solving abilities; however, their reliance on detailed textual descriptions limits their direct use for planning and decision-making. Existing approaches either outsource this reasoning to language or vision-language models, or rely on supervised fine-tuning with paired task-execution videos, which are costly to collect and difficult to scale. We propose a scalable framework that elicits task-solving ability in such models by combining self-distillation with reinforcement learning. Given an unlabeled scene image, a vision-language model generates a candidate task and a detailed step-by-step solution. The solution conditions a pretrained video diffusion model, the Demonstrator; we distill its behavior into an Executor conditioned only on the image and a short task prompt. This transfers execution knowledge from caption-guided generation to instruction-conditioned task solving without curated task-video supervision. We further improve the Executor with reinforcement learning from VLM feedback, exploiting the asymmetry between judging whether a sampled video satisfies a task and generating the solution. Experiments on our proposed WorldTasks-Benchmark and the DreamGen robotics benchmark show that the Executor surpasses the Demonstrator under our VLM-based evaluation protocol and transfers competitively to robotic tasks.

22.
arXiv (CS.CL) 2026-06-17

OpenLID-v3: Improving the Precision of Closely Related Language Identification – An Experience Report

Language identification (LID) is an essential step in building high-quality multilingual datasets from web data. Existing LID tools (such as OpenLID or GlotLID) often struggle to identify closely related languages and to distinguish valid natural language from noise, which contaminates language-specific subsets, especially for low-resource languages. In this work we extend the OpenLID classifier by adding more training data, merging problematic language variant clusters, and introducing a special label for marking noise. We call this extended system OpenLID-v3 and evaluate it against GlotLID on multiple benchmarks. During development, we focus on three groups of closely related languages (Bosnian, Croatian, and Serbian; Romance varieties of Northern Italy and Southern France; and Scandinavian languages) and contribute new evaluation datasets where existing ones are inadequate. We find that ensemble approaches improve precision but also substantially reduce coverage for low-resource languages. OpenLID-v3 is available on https://huggingface.co/HPLT/OpenLID-v3.

23.
arXiv (CS.CV) 2026-06-18

Forged Calamity: Benchmark for Cross-Domain Synthetic Disaster Detection in the Age of Diffusion

The rapid advancement of text-to-image diffusion models has enabled the creation of highly photorealistic synthetic images that closely resemble real photographs, making it increasingly difficult to distinguish authentic content from AI-generated fabrications. This poses challenges for cybersecurity, digital forensics, and disaster response, where fake imagery of floods, fires, or earthquakes can spread misinformation or disrupt emergency operations. To address this, we introduce Forged Calamity, a benchmark dataset for synthetic disaster detection containing 30,000 images, including 6,000 real and 24,000 synthetic samples generated by four diffusion models. Comprehensive experiments across fine-tuned and zero-shot settings reveal consistent weaknesses in current forensic approaches. Fine-tuned detectors perform well in-distribution but lose up to 50% accuracy on unseen generators or disaster types, showing overfitting to model-specific artifacts. Zero-shot generalized detectors also struggle to maintain stable accuracy, with only limited resilience in a few representation-robust models. These findings highlight persistent generalization gaps and the urgent need for domain- and model-agnostic detection methods to ensure visual authenticity in the diffusion era.

24.
medRxiv (Medicine) 2026-06-22

Sequential Deep Learning to Predict Non-Central to Central Geographic Atrophy Progression from OCT Imaging

Purpose: To develop and validate a temporal deep learning framework for predicting geographic atrophy (GA) progression across multi-year horizons using longitudinal optical coherence tomography (OCT) sequences. Design: Retrospective longitudinal cohort study. Subjects, Participants, and/or Controls: A total of 91 patients with dry age-related macular degeneration (AMD) were identified from Wake Forest University School of Medicine (2013-2023), yielding 455 OCT volumes. Two prediction cohorts were defined: 32 patients with no GA (NGA) at baseline who subsequently developed GA, and 35 patients whose earliest GA manifestation was non-central GA (NCGA). Non-progressing patients served as negative controls. Methods: OCT B-scan volumes were encoded into visit-level feature representations using three pretrained architectures (ResNet-18, ResNet-50, ViT-B/16). Chronologically ordered visit embeddings, optionally augmented with inter-visit time intervals (Δt), were processed through recurrent neural networks (RNN), long short-term memory networks (LSTM), and Transformer encoders to model longitudinal disease trajectories. Models were trained and evaluated independently for prediction horizons of 2, 3, 4, 5, and 6 years using patient-level stratified splits (80/20). Performance was assessed across five random seeds. Main Outcome Measures: Area under the receiver operating characteristic curve (ROC-AUC), F1-score, and accuracy for predicting two clinically critical transitions: NGA to GA onset and NCGA to central GA (CGA) involvement. Results: For NGA to GA prediction, models achieved ROC-AUC of 0.84-0.94 at 2-4 years and 1.00 at 5-6 years. For NCGA to CGA prediction, Transformer-based models achieved peak AUC of 0.95 at 4 years and 0.96 at 5 years. Longer input sequences (8 visits vs. 4 visits) consistently improved NCGA to CGA performance at extended horizons. Temporal interval encoding improved stability in several LSTM configurations.
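The Methods describe ordering visit embeddings chronologically and optionally appending inter-visit time intervals before the sequence model. A minimal sketch of one way this could look is given below. The function name `add_time_intervals`, the choice of years as the unit, and appending the interval as one extra feature are all assumptions for illustration; the abstract does not say how the authors encode the interval.

```python
from datetime import date

def add_time_intervals(visit_dates, embeddings):
    """Sort visits by date and append the gap since the previous visit.

    Hypothetical helper: the interval is expressed in years and appended
    as one extra feature per visit; the first visit gets an interval of 0.0.
    """
    # Indices of visits in chronological order.
    order = sorted(range(len(visit_dates)), key=lambda i: visit_dates[i])
    sequence = []
    previous = None
    for i in order:
        if previous is None:
            dt_years = 0.0
        else:
            dt_years = (visit_dates[i] - previous).days / 365.25
        sequence.append(list(embeddings[i]) + [dt_years])
        previous = visit_dates[i]
    return sequence

# Toy example with 1-dimensional embeddings given out of order.
dates = [date(2020, 1, 1), date(2019, 1, 1)]
feats = [[1.0], [2.0]]
augmented = add_time_intervals(dates, feats)
```

The resulting list of per-visit vectors could then be fed to an RNN, LSTM, or Transformer encoder as one sequence per patient.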

25.
arXiv (CS.AI) 2026-06-25

Power-Budgeted Underwater Vehicle Control via Constrained Reinforcement Learning

arXiv:2606.25680v1 Announce Type: cross Abstract: Underwater vehicles operate from a fixed onboard energy budget that propulsion rapidly depletes, so a controller that completes its task while drawing less thruster power directly extends mission range and endurance. Reinforcement learning yields capable model-free controllers for station-keeping and trajectory tracking, but optimizing task accuracy alone drives the policy toward oscillatory, energy-wasting actuation. The established remedy subtracts an energy penalty from the reward, yet this sets the task-power trade-off through a single weight with no physical units: a target power level cannot be specified, the weight must be re-tuned for every vehicle and task, and a mismatched weight can even raise power. This paper instead formulates energy-efficient underwater control as a constrained Markov decision process in which average thruster power is subject to an explicit budget, solved with a PPO-Lagrangian algorithm. The power level is set by declaring a budget in physical units, and a single dual variable is updated online to meet it for each vehicle and task, without manual weight search. Across three vehicles and four tasks in the MarineGym simulator, the energy-constrained policy draws the least power in all twelve settings, reducing it by 14–65% (up to 64.9%) over a task-only baseline and below an energy-reward baseline everywhere, while remaining the smoothest in ten settings and preserving task accuracy except in one deliberately power-limited regime. Imposing energy as an explicit constraint thus offers a tuning-free route to energy-efficient underwater control that needs no per-vehicle, per-task weight search.
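The abstract's central mechanism, a single dual variable updated online so that average power meets a declared budget, can be illustrated with the generic Lagrangian dual-ascent step used in constrained reinforcement learning. This is a minimal sketch and not the paper's implementation: the function names, the step size `lr`, and the numeric values are assumptions, and the PPO policy update itself is omitted.

```python
def update_dual(lmbda, avg_power, budget, lr=0.05):
    """One projected gradient-ascent step on the dual variable.

    The multiplier grows while measured average power exceeds the budget,
    shrinks when power is under budget, and is clipped at zero.
    """
    return max(0.0, lmbda + lr * (avg_power - budget))

def lagrangian_reward(task_reward, power, lmbda):
    """Reward seen by the policy: task reward minus the weighted power cost."""
    return task_reward - lmbda * power

# Illustrative values only (hypothetical units of watts).
budget = 10.0
over = update_dual(0.0, avg_power=12.0, budget=budget, lr=0.5)   # power above budget
under = update_dual(0.1, avg_power=5.0, budget=budget, lr=0.5)   # power below budget
shaped = lagrangian_reward(task_reward=1.0, power=2.0, lmbda=0.5)
```

In contrast to a hand-tuned penalty weight, the multiplier here is driven by the gap between measured power and the budget, which is what lets the target be stated in physical units.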