Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
medRxiv (Medicine) 2026-06-22

Spatial Analysis and Multilevel Determinants of Hypertension in Zambia: Analysis of the 2017 WHO STEPS Survey

Background: Hypertension is the leading modifiable cardiovascular risk factor globally, with the fastest-growing burden in low- and middle-income countries. This study aimed to estimate national hypertension prevalence, map provincial patterns, assess spatial clustering, and identify individual and community-level determinants among Zambian adults using the 2017 WHO STEPS survey. Methods: This cross-sectional study used data from the 2017 WHO STEPS survey, a nationally representative sample of 4,301 adults aged 18-69 years. Hypertension was defined as systolic BP [&ge;]140 mmHg, diastolic BP [&ge;]90 mmHg, or current antihypertensive use. Spatial autocorrelation was assessed via Moran's I and LISA. Four nested generalised linear mixed models with PSU-level random intercepts identified individual and community-level determinants. Results: Overall weighted hypertension prevalence was 24.0%. Lusaka recorded the highest prevalence (30.2%), followed by Southern (29.9%) and Muchinga (28.3%) provinces; Western Province had the lowest (12.4%). Spatial clustering was statistically significant but modest (Moran's I = 0.0247, p < 0.001). Between-cluster variation reduced from ICC = 5.9% to 1.8% in the full model, indicating geographic differences were largely explained by individual characteristics. Age was the strongest predictor; adults aged 60-69 had nearly sevenfold higher odds than those aged 18-29 (AOR 6.92, 95% CI: 4.95-9.66). Women had lower odds than men (AOR 0.64, 95% CI: 0.52-0.79). Obesity (AOR 2.34), overweight (AOR 1.65), high cholesterol (AOR 1.40), diabetes (AOR 1.35), and single marital status (AOR 1.34) were independently significant. Western Province showed consistently lower odds than Central Province (AOR 0.48). Conclusion: Hypertension affects one in four Zambian adults, driven primarily by age, sex, obesity, dyslipidaemia, and diabetes. Geographically prioritised interventions, including community health worker-led screening programmes in Lusaka and Southern Province, would maximise population-level impact. Population-level salt reduction and alcohol policies represent cost-effective complementary strategies. Longitudinal studies with finer spatial resolution are needed to clarify causal pathways underlying observed geographic clustering and inform SDG Target 3.4 progress.

02.
arXiv (CS.AI) 2026-06-16

Distilling Drifting Transformers with Representation Autoencoders

arXiv:2606.15553v1 Announce Type: cross Abstract: Representation Autoencoders (RAEs) have improved diffusion and flow models by semantically richer latent space owing to the strongly label-wise clustered DINO features in the pretrained encoders. Yet in the distillation stage, the severe anisotropy and large curvatures caused by the rich semantic representations would hinder the convergence and performance, making the trajectory-based distillation unstable. In this work, we argue that the RAE latent space is compatible with distillation via the newly proposed Drifting Models. We first quantitatively study the curvatures and isotropy statistics across different autoencoders, and theoretically reveal that Drifting Model itself is highly likely to fail on extremely scattered spaces like reconstruction-based VAEs. These motivate us to apply the drifting paradigm directly to representation autoencoders. Our proposed method, Drift-RAE, distills pretrained flow models in RAE latent spaces using Drifting, together with insightful modifications that improve training stability by thereotically aligning drifting fields with other frameworks. Regarding the experimental evidences, we achieve 1.77 FID on ImageNet 256 dataset using only 10k distillation steps, surpassing state-of-the-art RAE distillation methods and appearing comparative with the original Drifting Model without requiring an auxiliary MAE feature extractor. The code will be made publicly available.

03.
bioRxiv (Bioinfo) 2026-06-18

A data-driven rediscovery of the specificity-conferring code of adenylation domains in nonribosomal peptide synthetases

Nonribosomal peptide synthetases (NRPSs) are large modular enzymes that assemble structurally diverse peptides, many of pharmacological importance, including antibiotics and immunosuppressants. Within each NRPS module, the adenylation (A) domain selects the substrate to be incorporated, a choice governed by a small set of residues lining the binding pocket. For two decades, computational prediction of A-domain substrate specificity has relied on residue sets - most prominently the Stachelhaus code and the 34-residue "8 Angstrom code" - that were defined by spatial proximity to the substrate rather than by demonstrated predictive value. Here we revisit which residues govern substrate specificity from a purely data-driven perspective. We assembled a non-redundant dataset of 5,366 A-domain sequences (4,693 bacterial and 673 fungal) and used information-theoretic measures to rank alignment positions by their statistical association with substrate identity, without restricting candidate positions to any predefined structural shell. This procedure yielded two compact, kingdom-specific codes: IG15B (15 positions) for bacterial and IG13F (13 positions) for fungal A-domains. Both match or exceed the predictive accuracy of the 34-residue 8 Angstrom code while using fewer than half its positions, and both independently recover the majority of the classical Stachelhaus positions. Notably, our analysis identifies four positions (242, 280, 281, and 284) that lie outside all conventional codes yet carry non-redundant specificity information and co-localize with classical determinants on two helices flanking the binding pocket. These positions provide new candidate sites for the rational engineering of A-domain specificity.

04.
arXiv (CS.LG) 2026-06-19

UltraEP: Unleash MoE Training and Inference on Rack-Scale Nodes with Near-Optimal Load Balancing

arXiv:2606.04101v3 Announce Type: replace-cross Abstract: Large-scale expert parallelism (EP) is becoming pivotal for training and serving frontier MoE models, but it also amplifies device-level expert load imbalance into compute stragglers, token all-to-all bottlenecks, and activation-memory spikes. Existing balancers redistribute experts periodically based on historical load, which becomes unreliable for production deployments with non-stationary load patterns. We present UltraEP, the first exact-load, real-time balancer for large-EP MoE training and serving prefill on rack-scale nodes (RSNs). Leveraging the extended scale-up connectivity among dozens of GPUs within RSNs, UltraEP rebalances every microbatch and layer on critical paths, which requires nontrivial co-design of plan solving and expert replication communication to minimize exposed overhead. To this end, UltraEP eagerly reacts to post-gating load with an efficient quota-driven planner, and executes the resulting irregular expert-state transfers with RSN-native persistent tile streaming and relay-based fan-out mitigation. We evaluate UltraEP in a multi-RSN deployment of up to 256 GPUs, using cutting-edge MoE models from 106B to 671B parameters. Averaged across training and serving, UltraEP achieves 94.3% of the force-balanced ideal throughput, delivering 1.49$\times$ improvement over no-balancing, while reducing the final inter-rank imbalance from 1.30$-$4.01 to 1.01$-$1.04.

05.
arXiv (CS.AI) 2026-06-19

Calibration Without Comprehension: Diagnosing the Limits of Fine-Tuning LLMs for Vulnerability Detection in Systems Software

arXiv:2606.20502v1 Announce Type: cross Abstract: Whether LLMs scoring well on vulnerability benchmarks genuinely reason about security or merely pattern-match on contaminated data remains unresolved. We present CWE-Trace, a framework for LLM vulnerability detection built from 834 manually curated Linux kernel samples spanning 74 CWEs. The framework enforces a strict temporal split (pre-2025 historical set / post-cutoff leakage-free set), preserves context-aware vulnerable–patched pairs, and introduces two diagnostic metrics: the Directional Failure Index (DFI) and Hierarchical Distance and Direction (HDD). We evaluate eight vanilla LLMs and 15 LoRA fine-tuned variants across non-targeted detection, targeted detection, and CWE classification. Our analysis yields two key results. First, data contamination provides no measurable advantage. Function-level analysis shows that 84% of nominally contaminated samples carry no usable memorization signal: vulnerable functions are absent or cross-mapped across datasets, and ~31% of contaminated samples carry CWE misclassification. Second, backbone directional priors dominate fine-tuning. Models exhibit stable, systematic failure modes (DFI ranging from -85.5 to +94.8 pp) that persist from historical to post-cutoff data and resist correction. Fine-tuning shifts the output threshold without changing the decision policy. This is calibration without comprehension: output distributions adapt to training data while the underlying security reasoning remains absent. The weakest backbone at binary detection (DeepSeek-R1) gains the most in coarse CWE classification, revealing that detection and understanding are decoupled capabilities. The best detection score reaches only 52.1% (+2.1 pp above chance); exact CWE ranking remains below 1.3% Top-1 accuracy, confirming that current LLMs lack reliable security reasoning for systems software, regardless of fine-tuning strategy.

06.
arXiv (quant-ph) 2026-06-16

A Gauge-Covariant Geometric Framework for Non-Hermitian Quantum Systems

arXiv:2606.15922v1 Announce Type: new Abstract: We develop a comprehensive, gauge-covariant geometric framework for non-Hermitian quantum systems in the quasi-Hermitian regime, that is, the region of parameter space where the non-Hermitian Hamiltonian admits a real spectrum and a positive-definite metric operator. We build this framework by elevating the Dyson map to a central geometric object. This map is the transformation that converts a non-Hermitian Hamiltonian into an equivalent Hermitian one. From it we construct the Dyson connection and decompose it into Hermitian and anti-Hermitian parts, identified respectively as {\it stretching } and {\it rotation } components. This decomposition cleanly separates the genuine physical metric deformations from the unitary gauge redundancies. Working with manifestly gauge-covariant states, we then derive the complex non-Hermitian Berry phase and the quantum geometric tensor (QGT), and show that the non-Hermitian geometric curvature originates from the non-commutativity of the stretching components at the operator level. We further analyse the geometric singularities near an exceptional point (EP) and uncover a distinct hierarchy of divergences. For a general two-level non-Hermitian model, the quantum metric tensor (QMT) exhibits a leading-order divergence $\sim |\epsilon_\mu|^{-2}$, while the Berry curvature shows a weaker, subleading divergence $\sim |\epsilon_\mu|^{-3/2}$, with $\epsilon_\mu$ denoting the parameter displacement from the EP along an individual parameter axis $\mu$. Finally, we examine physical realizations of this model, including the non-Hermitian Su–Schrieffer–Heeger (SSH) and Hatano–Nelson (HN) models, where exact analytical results confirm the predicted critical scaling laws and illustrate the metric-deformation-driven non-Hermitian geometries.

07.
medRxiv (Medicine) 2026-06-22

UKBAnalytica: an integrated R package for scalable phenotyping and reproducible epidemiological analysis within the UK Biobank Research Analysis Platform

Authors:

UK Biobank provides longitudinal health-related data for approximately 500,000 participants, and its Research Analysis Platform (RAP) has shifted large-scale analyses toward secure cloud-based computation. However, many existing tools address only specific steps of the analytical workflow, leaving a need for an integrated framework that connects multi-source disease phenotyping, survival-ready cohort construction, and downstream analysis on the RAP. Here, we present UKBAnalytica, an extensible R package for scalable phenotyping and integrated analysis of UK Biobank data within the RAP environment. It currently includes 52 predefined baseline variables and a built-in library of 331 curated disease definitions. These definitions are based on multiple UK Biobank data sources, including ICD-10, ICD-9, self-reported conditions, death registry records, algorithmically defined outcomes, and OPCS-4 procedure codes. UKBAnalytica distinguishes prevalent and incident cases, constructs follow-up time, generates analysis-ready survival datasets, and summarizes participant flow. Beyond phenotype construction, UKBAnalytica provides integrated modules for epidemiological analysis, omics analysis, and machine-learning-based modeling and interpretation. By linking endpoint definition with downstream modeling under a consistent data structure, UKBAnalytica reduces repetitive scripting and improves analytical transparency. Furthermore, we demonstrate the package's practical utility through a case study on chronic obstructive pulmonary disease (COPD) proteomics. The findings align closely with previously reported conclusions, underscoring the robustness and reliability of our analytical framework. This phenotype-centered framework complements existing UK Biobank tools and facilitates reproducible RAP-based biomedical research. UKBAnalytica is freely available at https://github.com/Hinna0818/UKBAnalytica.

08.
arXiv (CS.AI) 2026-06-11

Multi-Rate Mixture of Experts for Accelerating Liquid Neural Network Training

arXiv:2606.12240v1 Announce Type: cross Abstract: Multivariate time-series data often exhibit complex temporal dependencies, irregular sampling, and heterogeneous dynamics across multiple time scales, making accurate sequence modeling particularly challenging. Traditional recurrent neural networks (RNNs), such as Long Short-Term Memory (LSTM) networks, operate in discrete time and may struggle to effectively capture continuous and irregular temporal behaviors. Liquid Neural Networks (LNNs) address some of these limitations through continuous-time dynamics, but standard LNN architectures typically rely on a single dynamical system, limiting their ability to model heterogeneous temporal patterns. To address these challenges, we propose a Multi-Rate Mixture-of-Experts (MR-MoE) framework built on top of Liquid Neural Networks. In the proposed architecture, multiple LNN-based experts operate at distinct time scales, enabling the model to explicitly separate fast-changing dynamics from slow-evolving temporal trends. A gating network further enables adaptive expert specialization based on input conditions. In addition, we incorporate both feature-level and temporal attention mechanisms to improve robustness, interpretability, and long-range dependency modeling. Feature-level attention suppresses noisy or irrelevant variables, while temporal attention selectively focuses on informative historical states. We evaluate the proposed framework on a complex multivariate time-series prediction task and compare it against strong baselines, including LSTM, monolithic LNN, and standard MoE models. Experimental results demonstrate that the proposed MR-MoE framework consistently achieves improved AUROC and AUPRC performance while maintaining favorable computational efficiency. These results highlight the effectiveness of combining continuous-time dynamics, multi-scale expert decomposition, and adaptive attention mechanisms for time-series modeling.

09.
arXiv (CS.LG) 2026-06-18

Anti-causal domain generalization: Leveraging unlabeled data

arXiv:2602.17187v2 Announce Type: replace-cross Abstract: The problem of domain generalization concerns learning predictive models that are robust to distribution shifts when deployed in new, previously unseen environments. Existing methods typically require labeled data from multiple training environments, limiting their applicability when labeled data are scarce. In this work, we study domain generalization in an anti-causal setting, where the outcome causes the observed covariates. Under this structure, environment perturbations that affect the covariates do not propagate to the outcome, which motivates regularizing the model's sensitivity to these perturbations. Crucially, estimating these perturbation directions does not require labels, enabling us to leverage unlabeled data from multiple environments. We propose two methods that penalize the model's sensitivity to variations in the mean and covariance of the covariates across environments, respectively, and prove that these methods have worst-case optimality guarantees under certain classes of environments. Finally, we demonstrate the empirical performance of our approach on a controlled physical system and a physiological signal dataset.

10.
arXiv (CS.CL) 2026-06-16

Measuring Whether LLM Tutors Teach or Solve: A Diagnostic for Educational Impact

Large language models are increasingly proposed as educational tutors, yet stronger task-solving ability does not necessarily imply stronger learning support. Motivated by recent calls to measure the social impact of NLP systems in practice, we study whether public LLM tutoring benchmarks distinguish learning-supportive behavior from mere answer production. We propose a lightweight diagnostic based on the gap between solving-oriented and pedagogy-oriented benchmark performance. Using public MathTutorBench leaderboard results, we show that these dimensions are only partially aligned: across eight publicly reported models, the correlation between solving and pedagogy composites is 0.421, and several models shift meaningfully in rank when evaluation moves from solving to pedagogy. We then analyze the public TutorBench sample and show that agency-relevant behaviors are explicitly encoded in benchmark rubrics, especially in active-learning settings that reward guiding questions, calibrated hints, and non-disclosive scaffolding. Together, these findings suggest that educational-impact evaluation should not treat task success as a sufficient proxy for learning support. We argue that public tutoring benchmarks can better support positive-impact evaluation by reporting solving-oriented and pedagogy-oriented scores separately and by making disclosure-sensitive, student-agency-preserving criteria more explicit.

11.
medRxiv (Medicine) 2026-06-18

Cost-effectiveness of a virtual fracture clinic versus traditional in-person fracture clinic care for adults with acute simple fractures: a protocol for a health economic evaluation within the RECITAL trial

ABSTRACT Introduction Traditional in-person fracture clinics are often overcrowded and inconvenient for patients. Virtual fracture clinics aim to address some of these concerns by improving the efficiency of the orthopaedic service and reducing unnecessary interventions while maintaining safety and quality of care. The RECITAL trial is a non-inferiority randomised controlled trial comparing follow-up care provided at a virtual fracture clinic for people with acute simple fractures to follow-up care provided at an in-person fracture clinic. This study describes the protocol for an economic evaluation of RECITAL where the primary aim is to investigate the cost-effectiveness of a virtual fracture clinic compared with traditional in-person fracture clinic care from a health system perspective. Methods and analysis The RECITAL trial recruited 312 participants with acute simple fractures and randomised them to receive follow-up care provided at a virtual fracture clinic or follow-up care provided at an in-person fracture clinic. We will conduct a within-trial analysis from a health system perspective (primary analysis), as well as a health service, patient and societal perspective. The economic evaluation will estimate the difference in the cost of resource inputs on an intention to treat basis used by participants in the two arms of the trial, allowing comparisons to be made between the in-person and virtual fracture clinics. Data for intervention costs and healthcare utilisation will be collected from trial records, hospital electronic medical records and district performance units. The results of the economic evaluation will be expressed in terms of incremental cost per utility weight gained at 12 weeks and will be plotted on a cost-effectiveness plane. Bootstrapping by resampling will be used to estimate 95% confidence intervals around costs and outcomes, and to calculate the confidence intervals around the incremental cost-effectiveness ratio. A cost-effectiveness acceptability curve (CEAC) will be plotted, which will provide information about the probability that an intervention is cost-effective, given the level of a decision makers willingness to pay for each additional outcome. Ethics and Dissemination The trail was approved by the SLHD Ethics Review Committee (RPAH Zone) (X23-0200 and 2023/ETH01038). The findings will be disseminated through a peer-reviewed journal and conference presentations. Trial registration number The trial was prospectively registered on the Australian New Zealand Clinical Trials Registry (ANZCTR; 12623000934640)

13.
arXiv (CS.LG) 2026-06-18

MOLAR: Learning Multimodal Molecular Representations from Noisy Labels

arXiv:2606.18390v1 Announce Type: new Abstract: Motivation: Noisy labels are a common challenge in molecular property prediction because molecular annotations are often obtained from assays, curated databases, or weak annotation pipelines rather than directly observed clean biological states. Treating recorded labels as reliable supervision can cause models to memorize corrupted observations and learn misleading molecular evidence. In multimodal molecular representation learning, this issue can be amplified by graph-text fusion or alignment, which may propagate label-induced errors across modalities. Results: We propose MOLAR, a noise-aware framework for learning multimodal molecular representations from noisy labels. MOLAR separates latent clean-property inference from recorded-label observation: graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to recorded labels for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model. Experiments on naturally noisy molecular benchmarks and controlled label-flipping benchmarks show that MOLAR consistently outperforms representative baselines. Visualization analyses further show that MOLAR provides interpretable reliability and modality-evidence diagnostics.

14.
medRxiv (Medicine) 2026-06-10

Optimisation of steatotic liver disease screening algorithm for resource-poor settings using machine learning

Background The European Association for the Study of the Liver (ESAL) - Steatotic Liver Disease (SLD) screening algorithm involves two steps; initial screening with FIB-4 followed by referral for vibration-controlled transient elastography (VCTE) in patients likely to have significant fibrosis (SF). However, VCTE is not widely available in resource-limited settings. Aim To optimise the EASL SLD screening algorithm for resource-poor settings using machine learning (ML). Methods We analysed data from 964 adults aged [&ge;]35 years who underwent VCTE at a tertiary referral centre in Sri Lanka between November 2024 and 2025. Multiple ML models using different methods and variable combinations were trained on 80% of the dataset and tested on the remaining 20%. Best models were selected based on performance and externally validated using data from 430 patients who underwent VCTE before November 2024. Model performance was compared with the FIB-4 using confusion matrices. Results A Random Forest model incorporating age, AST, ALT, and platelet count separately, rather than using FIB-4, outperformed. The all-variable ML model showed the best predictive performance for SF, with accuracy of 77.2%, recall of 0.762, precision of 0.778, and AUC-ROC of 0.818. The variables used in the model, in descending order of feature importance, were AST, platelet count, BMI, ALT, age, diabetes mellitus, hypertension, dyslipidaemia, sex, family history, hypothyroidism, diabetes complication and smoking. External validation demonstrated 75.1% accuracy and an AUC of 0.779. When used as the first step of the SLD screening algorithm, the all-variable ML model identified 37 (17.1%) additional true positives and reduced false-negative diagnoses by 50% compared with FIB-4. Conclusions ML-based models were more effective than the FIB-4 score as the first-line screening tool for VCTE referral, substantially improving the identification of patients with significant fibrosis in this South Asian cohort.

15.
arXiv (CS.CV) 2026-06-16

Toward the Whole Picture: Accumulative Fingerprint Mapping and Reconstruction for Small-Area Mobile Sensors

Small-area fingerprint sensing on mobile devices creates a fundamental mismatch between acquisition and recognition: each touch captures only a tiny, pose-varying local patch, while reliable biometric matching ultimately requires a stable and sufficiently complete fingerprint representation. Existing pipelines largely cope with this mismatch by treating repeated touches as independent partial templates, which leads to repeated registration, repeated matching, and no guarantee of adequate global coverage. In this paper, we advocate a different formulation, namely accumulative fingerprint mapping and reconstruction for small-area mobile sensing. Rather than matching every partial patch separately, the proposed perspective converts a sequence of local observations into a unified fingerprint state that is progressively refined as new touches arrive and can be matched only once after consolidation. As a concrete baseline, we present a classical pipeline that performs patch-wise structural feature extraction, feature-level registration and fusion, fingerprint map construction, and phase-based ridge reconstruction. More importantly, we position this baseline within a broader mobile fingerprint framework that integrates structured token learning, two-stage pose reasoning, and diffusion-based generative reconstruction. This viewpoint reframes mobile fingerprint recognition from multi-capture multi-match processing to accumulative map building, state refinement, and one-shot matching, offering a principled route toward efficient, pose-robust, and deployment-friendly biometrics for small-area mobile platforms. The baseline implementation has been publicly released at https://github.com/XiongjunGuan/FpReconstruction.

16.
arXiv (CS.CL) 2026-06-15

Hybrid Classical-Quantum Variational Autoencoder for Neural Topic Modeling

Authors:

Neural topic models enable scalable semantic discovery, but their integration with quantum hardware remains largely unexplored. We present a proof-of-concept hybrid classical-quantum variational autoencoder (VAE) for topic modeling, embedding parameterized quantum circuits within the VAE inference network while retaining a classical topic-word decoder. To address the resource constraints of quantum hardware, we propose a modified Gaussian Softmax posterior that decouples latent space dimensionality from the number of topics to be extracted, enabling the model to operate with a low-resource 10-qubit quantum device. On the AgNews dataset, the hybrid VAE outperforms state-of-the-art neural topic models (NTMs), reaching a $C_v$ coherence score of 0.71 and an NPMI score of 0.20 while preserving high topic diversity. For comparison, we also construct a fully classical variant, which also outperforms state-of-the-art models on AgNews and exhibits clear class separation in the latent space. These results demonstrate that hybrid VAEs are computationally viable even on NISQ-era devices and represent a promising direction for quantum-enhanced topic modeling.

17.
arXiv (CS.LG) 2026-06-15

Side-Channel Attacks Bypass Protection in 3D Printers

arXiv:2606.13952v1 Announce Type: cross Abstract: Active Motor Noise Cancellation (AMNC) ships in commercial fused deposition modeling (FDM) 3D printers as a hardware countermeasure against acoustic side-channel attacks that target intellectual property (IP). We present the first empirical evaluation of a deployed AMNC countermeasure, using a public dataset of synchronized acoustic and vibration recordings from two AMNC-equipped Bambu Lab printers across 12 object classes. AMNC fully neutralizes the acoustic channel: classification accuracy is indistinguishable from the 8.33% random baseline. The vibration channel, which AMNC does not target, still leaks. With summary statistics the leak is coarse and amplitude-driven (vibration accuracy approximately 31% pooled, 36-47% within-printer), while the waveform shape carries essentially nothing (frequency-only features at chance). A full-sequence temporal model that ingests the ordered evolution of the print raises accuracy to approximately 61%, and an order-shuffling control (approximately 33%) shows that a substantial component is genuinely sequential and tied to print progression. The leak is device-specific: a classifier trained on one printer transfers near chance to the other. We conclude that AMNC is an acoustic-only defense: vibration remains a partial, geometry-correlated side channel it does not address, but one that does not, on this dataset, support full geometric reconstruction; reconstruction-grade attacks would require the magnetic or power channels AMNC also leaves untouched. We release all code.

18.
arXiv (CS.CV) 2026-06-12

DIMOS: Disentangling Instance-level Moving Object Segmentation

Moving instance segmentation (MIS) attracts increasing attention due to its broad applications in traffic surveillance, autonomous driving, and animal tracking. Event cameras record asynchronous brightness changes, providing high temporal resolution and dynamic range, which makes them highly sensitive to motion information. By fusing event and image features, motion cues from events can complement spatial details from images, enhancing the performance of MIS. However, current multimodal MIS methods still struggle to segment small moving instances, as event cameras often yield sparse features under limited resolution. Moreover, event features entangle appearance attributes with motion cues, which further restricts effective cross-modal fusion. To address these challenges, we first propose a dual-disentangling feature extraction framework that separates and extracts appearance and motion information within both image and event modalities, thereby improving feature density. Subsequently, a multi-granularity cross-modal alignment is introduced to align distributionally and semantically consistent features across modalities, enabling more effective fusion with rich spatial and temporal details. The experiment results demonstrate that our method achieves state-of-the-art performance in multimodal MIS, especially for small instances under challenging conditions such as fast motion and low-light settings.

19.
arXiv (CS.LG) 2026-06-15

When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing

Authors:

arXiv:2606.14668v1 Announce Type: new Abstract: Knowledge editing systems must update selected facts while preserving nearby but irrelevant behavior. This paper studies this problem in a memory-assisted setting where an edit memory is retrieved at inference time and a parameter-efficient adapter corrects the model's object preference. We argue that the central design question is not only how to write an edit, but also when to suppress it. We introduce \method{}, a route-specialized dual-adapter editor. A relevance router first decides whether a prompt should receive an edit memory. Routed prompts use an edit adapter trained to prefer the new object over the original object; unrouted non-direct prompts use a separate locality adapter trained to preserve or restore the original-object preference. We evaluate \method{} on three 1,000-case protocols, \cf{}, \zsre{}, and \mquake{}, under the same memory protocol and two 7B/8B base models. On Llama-3.1-8B-Instruct, \method{} obtains the best overall probability-preference accuracy on all three benchmarks: 0.8180 on \cf{}, 0.8946 on \zsre{}, and 0.9922 on \mquake{}. The same trend holds on Qwen3-8B. Router ablations show that the relevant memory boundary differs across datasets: a lexical neural router is safest on \cf{}, while BGE embedding routing is better on \zsre{} and \mquake{}. Component and module ablations show that the gain mainly comes from separating edit injection from off-route suppression rather than from simply increasing LoRA capacity.

20.
arXiv (CS.AI) 2026-06-15

FactoryLLM: A Safe and Open-Source AI Playground for Evaluating LLMs in Smart Factories

arXiv:2606.14119v1 Announce Type: new Abstract: Fault diagnostics and recovery in smart factories is challenging because critical information is dispersed across manuals of multiple machines which are interconnected through the manufacturing process. Large Language Models (LLMs) can provide a promising approach. In this paper, we propose FactoryLLM, a safe and open-source AI playground designed for evaluating different LLM-based retrieval-augmented generation (RAG) models by analysing documents from multiple machines across the manufacturing process. FactoryLLM enables the user to configure the LLM, and assess performance when reasoning over multiple documents, through a dual evaluation setup using both RAGAS and NVIDIA's LLM-as-a-Judge metrics. FactoryLLM is safe because it allows users to run local or open-source LLMs without sharing sensitive industrial data, providing a controlled environment for experimentation. We demonstrate the efficacy of FactoryLLM through a case study which involves an Autonomous Intelligent Vehicle and its Mobile Planner software, evaluating three LLMs across 30 maintenance queries derived from approximately 600 pages of cross-machine documentation. The results suggest that FactoryLLM is effective in cross-machine document reasoning: every model achieved a groundedness score above 0.88. The full code and documentation for community to test FactoryLLM with their manufacturing specific scenarios are publicly available.

21.
arXiv (CS.CL) 2026-06-19

The Almost Intelligent Revolution: Options for Scaling Up Deliberation and Empowering People with AI

Authors:

The increasing prominence of Large Language Models (LLMs) in public discourse presents both opportunities and challenges for democratic deliberation. While red teaming strategies help mitigate specific risks, broader concerns persist regarding linguistic constraints, biases, and the sycophantic tendencies of LLMs. This chapter explores how LLMs can be used to significantly scale up and democratise deliberation, particularly in fostering inclusivity and empowering traditionally marginalised groups. Drawing on concepts from Systemic-Functional Linguistics, the chapter examines how variations across language users (for example, with respect to socio-demographic groups) and across language use (for example, with respect to communicative functions) shape participation in AI-supported deliberation. The chapter presents AI-driven deliberation studies and assesses their potential to scaffold argumentation, enhance access, and reduce the influence of exclusionary linguistic norms and biases which are embedded in prestigious registers. At the same time, the chapter cautions against both overclaiming, which leads to unrealistic expectations, and underclaiming, which risks missed opportunities for AI-assisted engagement. The chapter concludes by identifying future research directions to maximise the democratic potential of AI-assisted participation while embedding ethical safeguards to counteract the reproduction of linguistic inequalities.

22.
arXiv (CS.LG) 2026-06-18

Quantifying and Auditing LLM Evaluation via Positive–Unlabeled Learning

arXiv:2606.19057v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM–as–a–Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive–unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human–verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human–consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM–as–a–judge pipelines.

23.
arXiv (CS.CV) 2026-06-17

Theoretical Grounding of Out-Of-Distribution Detection With Reinforcement Learning Optimizer

Out-of-distribution (OOD) detection in dynamic open-world environments requires a model to continually adapt to evolving data distributions while generalizing to covariate-shifted inputs and rejecting semantic-shifted OOD examples. Most existing OOD detection methods optimize only the current-step objective and do not explicitly account for how post-deployment environment changes affect future OOD behavior. In this paper, we establish a theoretical grounding for dynamic OOD detection using a reinforcement learning (RL)-guided optimizer that explicitly favors updates that reduce the semantic OOD false positive rate over time. We develop a novel augmented optimizer that uses an RL-guided correction term on top of standard gradient descent (GD) and show its improvement over both future-domain generalization and semantic-OOD rejection. We analyze temporal error decomposition in terms of model-change and environment-change generalization errors and develop a new theoretical framework for comparing the generalization errors under both GD and RL-guided optimizers.

24.
arXiv (CS.CV) 2026-06-18

Benchmarking Physics-Informed Time-Series Models for Operational Global Station Weather Forecasting

The development of Time-Series Forecasting (TSF) models is often constrained by the lack of comprehensive datasets, especially in Global Station Weather Forecasting (GSWF), where existing datasets are small, temporally short, and spatially sparse. To address this, we introduce WEATHER-5K, a large-scale observational weather dataset that better reflects real-world conditions, supporting improved model training and evaluation. While recent TSF methods perform well on benchmarks, they lag behind operational Numerical Weather Prediction systems in capturing complex weather dynamics and extreme events. We propose PhysicsFormer, a physics-informed forecasting model combining a dynamic core with a Transformer residual to predict future weather states. Physical consistency is enforced via pressure-wind alignment and energy-aware smoothness losses, ensuring plausible dynamics while capturing complex temporal patterns. We benchmark PhysicsFormer and other TSF models against operational systems across several weather variables, extreme event prediction, and model complexity, providing a comprehensive assessment of the gap between academic TSF models and operational forecasting. The dataset and benchmark implementation are available at: https://github.com/taohan10200/WEATHER-5K.

25.
arXiv (CS.CL) 2026-06-11

Agentic Environment Engineering for Large Language Models: A Survey of Environment Modeling, Synthesis, Evaluation, and Application

Environments serve as interactive systems for large language model (LLM) based agents across diverse scenarios and play a crucial role in driving the continual evolution of model capabilities. Despite this importance, existing work lacks a systematic categorization and deep analysis. This paper systematically studies current researches on agentic environments from the perspective of the environment engineering lifecycle, covering their modeling, synthesis, evaluation and application. Specifically, the paper first introduces representative environments from the perspectives of eight attributes and eight domains, providing detailed analyses of their development paths and highlighting their core capabilities. Second, for automated environment synthesis, two paradigms are introduced, such as symbolic synthesis and neural synthesis. This paper also shows different environment evaluation methods in each paradigm. Thirdly, the corresponding environment applications from the perspective of agent-environment co-evolution are discussed. In specific, the paper characterizes the primary pathways for agent evolution in dynamic environments from four complementary perspectives: memory-centric experience evolution, orchestration-centric workflow evolution, trajectory-centric offline evolution, and exploration-centric online evolution. And three paradigms of environment evolution are identified, namely neural-driven, difficulty-driven, and scaling-driven approaches. At last, several promising future directions are discussed, including Environment-as-a-Service, Multi-agent Environments, and Neural-Symbolic Environments.