Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-17

Retrofitters, pragmatists and activists: Public interest litigation for accountable automated decision-making

arXiv:2511.03211v4 Announce Type: replace-cross Abstract: This paper examines the role of public interest litigation in promoting accountability for AI and automated decision-making (ADM) in Australia. Since ADM regulation faces political and geopolitical headwinds, effective governance will have to rely on the enforcement of existing laws. Drawing on interviews with Australian public interest litigators, technology policy activists, and technology law scholars, the paper positions public interest litigation as part of a larger ecosystem for transparency, accountability and justice with respect to ADM. The paper explores the tactics and strategies of what one participant described as 'retrofitting' old laws to ADM. These go beyond creative legal argumentation, to encompass practices of community-building, collaboration on theories of change, canny selection of clients and causes of action, and the alignment of the interests of stakeholders in litigation. Naturally, the paper also contends with the limits of these strategies, and of the Australian legal system. Where limits are, however, capable of being overcome, the paper presents findings on urgent needs: the enabling institutional arrangements without which effective litigation and accountability will falter. The paper is relevant to law and technology scholars; individuals and groups harmed by ADM; public interest litigators and technology lawyers; civil society and advocacy organisations; and policymakers.

02.
arXiv (CS.CV) 2026-06-11

Contactless 3D Human Body Measurement Using Depth Cameras for Smart Health Monitoring

Contactless body measurement technologies are becoming increasingly significant for smart health monitoring, digital health applications, and remote patient assessment. Traditional anthropometric measurements typically necessitate physical contact and trained personnel, which may constrain scalability in remote healthcare settings. In this study, we introduce a depth camera-based framework for estimating human body measurements utilizing 3D point cloud data. An Orbbec Astra 2 depth camera was employed to capture RGB images, depth maps, and 3D point clouds of participants. The captured point cloud was processed using Python-based tools, including Open3D, NumPy, and OpenCV, to segment the human body from the background. Key anthropometric measurements, such as height and arm span, were computed. The measurements were obtained through a combination of spatial filtering and landmark selection on the 3D point cloud, followed by the projection of the computed measurements onto the corresponding RGB image using camera intrinsic parameters. In addition to linear measurements, the approximate body volume and visible surface area were estimated using voxel-based occupancy analysis and mesh-based surface reconstruction methods. The experimental results from a single depth capture demonstrated that accurate body measurements and geometric estimates could be obtained from depth camera data without physical contact. This study provides a foundation for future real-time systems that integrate depth sensing with intelligent health monitoring and generative AI models for smart healthcare applications.

03.
arXiv (CS.CL) 2026-06-18

The Wrong Kind of Right: Quantifying and Localizing Misfired Alignment in LLMs

Warning: This paper studies stereotypes and biases, and contains potentially disturbing examples, used for illustration purposes only. Our findings should not be interpreted as an argument against alignment. Instead, this paper highlights the need for principled approaches to more advanced alignment. Alignment aims to ensure that large language models (LLMs) behave safely and reliably, including by avoiding unsafe inferences. However, we show that such safety-oriented behaviors can misfire: models may reject warranted conclusions even when they are explicitly supported by context. We call this failure mode misfired alignment, where alignment-induced changes cause LLMs to override explicit evidence. To quantify this phenomenon, specifically on stereotype-related alignment, we introduce VETO, a benchmark consisting of 2,032 BBQ-derived contrastive pairs, and define a new metric, Misfired Alignment Rate (MAR), which measures on a 0 to 100 scale how often a model fails on a stereotype-related question but succeeds on its contrastive counterpart. We benchmark 25 LLMs on VETO, and show that all LLMs, including the most recent ones, exhibit non-trivial (4.7 to 18.9%) MARs while all human participants achieve 0.0% MAR. Controlled priming experiments further show that alignment-induced cues can substantially amplify MAR across LLMs, indicating that these failures are not merely artifacts of individual examples but can be induced by safety-related framing. Mechanistic analyses on open-weight LLMs reveal late-layer suppression of evidence-supported answers, and comparisons between instruct and base LLMs suggest that this suppression emerges after instruction training. These findings show that current alignment methods can overgeneralize surface-level safety cues, to the point of overriding objective evidence, motivating more work on alignment objectives that better preserve contextual grounding.

04.
arXiv (CS.CV) 2026-06-17

Robustness of Similarity-based Positional Encoding Under Rotations: Theoretical Analysis and Experimental Validation

Positional encoding is a fundamental component of Transformer architectures, as it injects information about the spatial or sequential arrangement of inputs. Among recent alternatives to standard absolute and sinusoidal encodings, similarity-based positional encoding (simPE) has emerged as a flexible framework for representing positional structure through pairwise relations. simPE was originally designed for medical imaging applications, where geometric robustness is especially relevant: small rotations naturally arise during image acquisition, induced by imaging instruments, patient positioning, or slight acquisition misalignments. Despite its empirical promise, the theoretical behavior of simPE under geometric perturbations has not been fully characterized. In this paper, we study the robustness of simPE with respect to rotations, combining formal theoretical analysis with experimental validation. We first show that simPE is generally not rotation-invariant. We then prove that, under mild Lipschitz assumptions on the elementary components, simPE is stable under rotational perturbations and derive explicit perturbation bounds in Frobenius norm. We validate these findings experimentally on four controlled datasets–a synthetic Arrow dataset, a synthetic Shapes dataset (four geometric shape categories), a synthetic Digits dataset, and a benchmark image classification dataset (FashionMNIST)–in which training and validation images are kept in a fixed canonical orientation while test images are subjected to increasing rotation angles. Across all datasets, simPE consistently outperforms standard learned positional encoding in terms of accuracy, F1 score, precision, and recall under rotation, particularly in the small-to-moderate angle regime, corroborating the theoretical stability guarantees.

05.
arXiv (CS.AI) 2026-06-15

FAConformer: Frequency-Aware Convolutional Transformer for Auditory Attention Decoding

arXiv:2606.14120v1 Announce Type: cross Abstract: Auditory attention decoding (AAD) aims to infer the attended speaker from neural responses in multi-speaker acoustic environments and is a key problem for neuro-steered hearing systems. Although recent studies have achieved encouraging progress, existing AAD models still do not fully exploit frequency domain electroencephalography (EEG) information. In particular, most approaches introduce multi-band information through handcrafted feature extraction or direct cross-band feature concatenation, which mainly exploit frequency information at a shallow level and may overlook band-specific patterns and cross-band interactions. To address these limitations, this paper proposes FAConformer, a frequency-aware CNN-Transformer framework for AAD that explicitly integrates band-specific encoding and adaptive cross-band interaction. Specifically, FAConformer first decomposes EEG signals into multiple frequency bands and assigns each band to an independent CNN-Transformer encoder for band-specific modeling. The resulting band-wise features are then adaptively fused by a carefully designed frequency-aware attention (FAA) module that models cross-band dependencies by treating band-wise features as tokens. Further, band-wise auxiliary supervision (BAS) is introduced to prevent weakly contributing branches from being under-optimized during joint training. In this way, FAConformer performs frequency-aware modeling that more effectively exploits frequency domain information. Extensive experiments on two public AAD datasets with three decision-window lengths demonstrated that FAConformer consistently outperformed 12 competitive baselines, surpassing the current state-of-the-art model by 4.9%. Further analyses of band importance, ablation, and parameter sensitivity verify the effectiveness, robustness, and interpretability of the proposed framework. Code is available at https://github.com/wzwvv/FAConformer.

06.
arXiv (CS.CV) 2026-06-12

RGB-S: Image-Aligned Tactile Saliency for Robust Dexterous Manipulation

Effective visuo-tactile integration is critical for robotic dexterous manipulation, especially when visual observations are unreliable or occluded. However, robustly aligning sparse, heterogeneous tactile measurements with dense visual representations remains a fundamental challenge. Most existing approaches require policies to learn cross-modal correspondences implicitly from limited demonstrations, without leveraging geometric priors. As a result, they are often data-inefficient and generalize poorly when visual observations are degraded. To address this limitation, we propose a framework that explicitly grounds physical contacts in the image domain. Using robot forward kinematics and camera calibration, we project tactile sensor locations directly onto the RGB image plane. We then render force-modulated Gaussian saliency maps to model spatial uncertainty arising from kinematic and calibration errors. By integrating these 2D spatial anchors through a zero-initialized conditioning architecture, our method injects physical contact priors into standard visual backbones while preserving pre-trained visual representations. We evaluate our method on six dexterous manipulation tasks in both simulation and the real world under severe visual occlusions. Real-world experiments show that explicit RGB-S grounding in the image domain improves real-world occluded manipulation success rates by $26.7$ percentage points over the strongest implicit visuo-tactile baseline, suggesting its improved spatial reasoning and robustness to occlusion. Project page: touch-as-saliency.github.io

07.
arXiv (CS.AI) 2026-06-18

MIDS: Detecting Stealthy Masquerade and Tampering Attacks on CAN Bus via Bidirectional Mamba

arXiv:2606.18599v1 Announce Type: cross Abstract: The Controller Area Network (CAN) protocol is the primary communication standard for Electronic Control Units (ECUs) in modern vehicles, but its lack of encryption and authentication exposes it to a range of security threats. Existing intrusion detection systems are largely tuned to fabrication-style attacks (DoS, fuzzing, ID spoofing realised by frame injection), in which detection signals such as per-ID inter-arrival statistics are readily available. We instead address the harder masquerade setting[b37], in which an internal adversary substitutes a legitimate frame in-situ at its original transmission slot, preserving traffic periodicity and rendering traffic-statistic defences ineffective. We propose the Mamba Intrusion Detection System (MIDS), an innovative dual-stream framework that processes CAN identifiers and payloads in parallel and reconstructs their joint temporal semantics through bidirectional selective state-space modelling. To evaluate MIDS, we collected over 100 million CAN frames from a physical Tesla Model 3 across three driving regimes and synthesised 54 masquerade attack variants spanning ID-only, data-only, and combined modifications. MIDS attains an F1 of 96.94\% on this dataset, exceeding the strongest reproducible baseline by more than 8 percentage points, while sustaining a 1.147~ms single-window inference latency – ample headroom for real-time onboard deployment. To verify generalisation, we further evaluate MIDS on four public benchmarks (ROAD, CrySyS, OTIDS, CT\&T) covering both masquerade and injection scenarios; MIDS attains F1 from 93.70\% to 99.61\%, outperforming the strongest of eight reproduced baselines by up to 13.94 percentage points under a unified 5-fold protocol.

08.
arXiv (quant-ph) 2026-06-11

Towards the implementation of a quantum classifier

arXiv:2606.10150v2 Announce Type: replace Abstract: In this work, we investigate the use of a quantum circuit as a binary classification model in the context of quantum machine learning. We call this model, binary quantum classifier. First, we describe fundamental concepts of quantum computing and introduce the computational tool used: Qibo, an open-source framework for efficient quantum simulations and quantum hardware control. Then, we describe how to design a binary quantum classifier for the classification of images and small arrays of variables by showing how to input data in the circuit, defining a quantum circuit model Ansatz with trainable parameters and a loss function, and implementing multiple minimizers. We test our quantum classifier with two data sets. The first one is the MNIST data set which is composed of handwritten digits (reduced to only handwritten zeros and handwritten ones for binary classification). We study the behavior of different minimizers by increasing the number of layers of the Ansatz. The second data set represents two different high energy collisions that can occur at colliders such as LHC (CERN). Due to in-time proton-proton interactions known as pile-up, we distinguish two different data sets: "without pile-up" and "with pile-up". These collisions can be represented by images of size 32x32 or by six high-level variables that we call features. By increasing the size of the training data set and the number of layers of the Ansatz, we search for the best minimizer. Splitting the data set in training set and test set, we compute: ROC curve, AUC score, confusion matrices and test set accuracy. For "with pile-up" images, we compare the results obtained with the quantum classifier with a small convolutional neural network. We conclude that is possible to build a binary quantum classifier with a quantum circuit and we highlight its performances and limitations in comparison with classical technologies.

09.
arXiv (CS.AI) 2026-06-11

Quality Adaptive Angular Margin Learning for Respiratory Sound Classification

arXiv:2606.11915v1 Announce Type: cross Abstract: We present a quality-adaptive angular-margin learning framework that improves feature generalization by enforcing intra-class compactness and inter-class separability. Our framework, titled QLung, introduces a no-reference audio quality margin derived from spectral entropy and root-mean-square energy, which adaptively scales angular margins based on recording quality. To this end, we propose a log-scaled angular margin that stabilizes training under severe class imbalance. We also use an angular classifier that normalizes features and class weights, ensuring margin penalties are applied consistently on the unit hypersphere. Our approach improves in-distribution performance on the ICBHI dataset by 2.46\% over the cross-entropy baseline, and most significantly, achieves the strongest out-of-distribution performance on the SPRSound dataset compared to prior state-of-the-art methods. Code is available at https://github.com/RSC-Toolkit/QLung.

10.
arXiv (CS.LG) 2026-06-16

IBAD: Interpretable Behavioral Anomaly Detection on Human Mobility Data

arXiv:2606.16023v1 Announce Type: new Abstract: Human mobility appears highly diverse, yet much of a person's daily mobility can be explained by a small set of recurring behavioral templates, such as commuting, school-centered activities, caregiving, nightlife, or errand patterns. We present \texttt{IBAD} (\underline{I}nterpretable \underline{B}ehavioral \underline{A}nomaly \underline{D}etection), a framework that learns interpretable daily mobility templates and represents each individual as a distribution over mixtures of these templates. Rather than focusing on specific locations, IBAD characterizes activities that individuals perform across locations. This approach first discovers global behavioral templates using Latent Dirichlet Allocation (LDA), then employs a hierarchical self-supervised model to learn normal behavior of individuals from their soft behavioral templates. We also introduce a splicing benchmark that creates controlled behavioral mismatches between an individual's historical profile and injected mobility patterns. Experiments on real-world and synthetic datasets show that daily behavior can be effectively decomposed into a small number of interpretable templates. Crucially, we show that the learned behavioral archetypes transfer across distinct geographic and demographic contexts. Furthermore, IBAD maintains a robust competitive performance across all settings. For reproducibility purposes, the code is accessible at ~\href{https://github.com/USC-InfoLab/IBAD}{https://github.com/USC-InfoLab/IBAD}.

11.
arXiv (quant-ph) 2026-06-16

Communication Complexity of Distributed Unitary Synthesis

arXiv:2511.04250v2 Announce Type: replace Abstract: We study space-bounded communication complexity for unitary implementation in distributed quantum processors, where we restrict the number of qubits per processor to ensure practical relevance and technical non-triviality. We model distributed quantum processors using distributed quantum circuits with nonlocal two-qubit gates, defining the distributed communication complexity of a unitary as the minimum number of such nonlocal gates required for its realization, up to permutations of data qubit positions. Our contributions are twofold. First, for general $n$-qubit unitaries, we improve upon the trivial $O(4^n)$ communication bound. Considering $k$ pairwise-connected processors (each with $n/k$ data qubits and $m$ ancillas), we prove the communication complexity satisfies $O\left(\max\{4^{(1-1/k)n - m}, n\}\right)$ – for example, $O(2^n)$ when $m=0$ and $k=2$ – and establish the tightness of this upper bound. We further extend the analysis to approximation models and general network topologies. Second, for special unitaries, we show that both the Quantum Fourier Transform (QFT) and Clifford circuits admit linear upper bounds on communication complexity in the exact model, outperforming the trivial quadratic bounds applicable to these cases. In the approximation model, QFT's communication complexity reduces drastically from linear to logarithmic, while Clifford circuits retain a linear lower bound. These results offer fundamental insights for optimizing communication in distributed quantum unitary implementation, advancing the feasibility of large-scale DQC systems.

12.
arXiv (CS.AI) 2026-06-12

LLM-as-an-Investigator: Evidence-First Reasoning for Robust Interactive Problem Diagnosis

arXiv:2606.13220v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used as interactive assistants for technical problem solving. However, when users provide incomplete descriptions or plausible but unverified explanations, LLMs may prematurely align with these assumptions and propose solutions before collecting sufficient evidence. We refer to this behavior as user-driven sycophancy: the tendency of an LLM to reinforce a user-provided hypothesis instead of testing alternative explanations. This paper introduces LLM-as-an-Investigator, an evidence-first agentic AI methodology for robust problem diagnosis. The approach is implemented through a Solution Investigator Agent, which estimates the ambiguity of an initial problem description, generates candidate hypotheses, asks targeted clarification questions, and updates hypothesis probabilities after each answer. Rather than producing an immediate response, the agent continues the investigation until the evidence makes one candidate explanation stronger than the alternatives. To evaluate the approach, we build a benchmark from solved technical forum threads in mechanical, electrical, and hydraulic domains. We use a three-agent evaluation pipeline in which a Problem-Solution Extractor Agent converts solved threads into structured cases, a Ground-Truth Evaluator Agent simulates the user while hiding the known solution, and the tested assistant attempts to recover the solution through dialogue. The experiments compare standard assistants, reasoning-oriented LLMs, and the proposed investigator-based model across LLM backbones. In addition to diagnostic accuracy, we analyze how standard assistants follow misleading user hypotheses in diagnostic cases. The results show that the proposed approach identifies the problem more accurately than direct prompting and reasoning-only baselines, while its evidence-first protocol helps reduce user-induced conversational bias.

13.
bioRxiv (Bioinfo) 2026-06-13

ADMETron: An AI-driven SaaS platform for comprehensive ADMET prediction and compound prioritisation

ONTOSIGHT(R) ADMETron is an AI-driven platform designed for rapid prediction and visualization of Absorption, Distribution, Metabolism, Excretion, and Toxicity (ADMET) properties to support modern drug discovery. The platform integrates an interactive web interface with a scalable predictive engine, enabling high-throughput virtual screening and batch analysis of chemical compounds. Its core architecture combines recurrent neural network (RNN)-derived molecular embeddings from SMILES representations with physicochemical descriptors, which are subsequently modeled using gradient boosting machines (GBMs). This framework provides predictions across 34 ADMET endpoints, including physicochemical properties, absorption, CYP450 interactions, hERG liability, and mutagenicity. The predictive performance of ADMETron was evaluated using benchmark datasets from the Therapeutics Data Commons (TDC), demonstrating strong performance and generalizability across both classification and regression tasks. Beyond predictive modeling, the platform introduces an interactive radar graph-based structure-activity relationship (SAR) visualization framework that enables real-time comparison of multiple compounds and reference drugs across selected ADMET parameters. This feature facilitates intuitive interpretation of multidimensional molecular profiles and supports lead optimization and compound prioritization. Comparative assessment against widely used online ADMET tools further demonstrated broad endpoint coverage spanning pharmacokinetic, physicochemical, toxicity, and medicinal chemistry properties within a unified environment. Together, these capabilities establish ADMETron as a comprehensive platform for ADMET assessment and data-driven decision-making in drug discovery. (https://admetron.partex.ai/).

14.
arXiv (CS.CL) 2026-06-17

Smarter edits? Post-editing with error highlights and translation suggestions

As MT quality increases, interest in enhanced post-editing features such as QE-derived error highlights is growing, yet evidence for their usefulness remains limited. In this work, we explore the usefulness of LLM-derived error highlights and correction suggestions based on automatic post-editing (APE). We conduct a study where professional translators (En-Nl) post-edit translations using APE error highlights and correction suggestions and compare productivity, quality and user experience to regular PE and PE with QE-derived highlights. While no condition yielded productivity or quality gains compared to regular PE, APE highlights were better received than QE-derived highlights, and correction suggestions improved overall user experience.

15.
arXiv (CS.AI) 2026-06-11

Reliability-Calibrated Edge-IoT Early Fault Warning for Rotating Machinery with a Physics-Guided Tiny-Mamba Transformer

arXiv:2601.21293v3 Announce Type: replace-cross Abstract: Industrial Internet of Things (IIoT) systems increasingly rely on distributed vibration sensing to support predictive maintenance of rotating machinery. In practical deployments, however, raw signal upload is costly and alarm decisions must be made locally under limited computation, changing operating conditions, and strict nuisance-alarm budgets. This paper presents a reliability-calibrated edge-IoT early-warning framework, in which a compact Physics-Guided Tiny-Mamba Transformer (PG-TMT) acts as the representation module and an extreme value theory (EVT) layer converts streaming anomaly scores into event-level alarm episodes. PG-TMT combines a depthwise-separable convolutional stem, a Tiny-Mamba state-space branch, and a lightweight local Transformer to capture transient, long-horizon, and multichannel degradation cues under batch-size-one inference. To improve auditability, temporal attention is projected to the frequency domain and softly aligned with analytical bearing fault-order bands. EVT calibration, dual-threshold hysteresis, and trimmed-tail fitting provide controllable false-alarm intensity even when healthy calibration data are imperfect. Experiments on CWRU, Paderborn, XJTU-SY, and an industrial pilot demonstrate that the proposed framework improves PR-AUC, reduces detection delay under a controlled nuisance-alarm budget, and remains robust to structured interference, metadata uncertainty, compound fault mixtures, and domain transfer. With a sub-1 MB footprint and Jetson p99 latency below 7 ms, the framework supports calibrated and interpretable early warnings for IIoT predictive maintenance.

16.
arXiv (CS.CV) 2026-06-17

RAIGen: Rare Attribute Identification in Text-to-Image Generative Models

Text-to-image diffusion models achieve impressive generation quality but inherit and amplify training-data biases, skewing coverage of semantic attributes. Prior work addresses this in two ways. Closed-set approaches mitigate biases in predefined fairness categories (e.g., gender, race), assuming socially salient minority attributes are known a priori. Open-set approaches frame the task as bias identification, highlighting majority attributes that dominate outputs. Both overlook a complementary task: uncovering rare or minority features underrepresented in the data distribution (social, cultural, or stylistic) yet still encoded in model representations. We introduce RAIGen, the first framework, to our knowledge, for label-free rare-attribute discovery in diffusion models, requiring no predefined minority categories. RAIGen leverages Matryoshka Sparse Autoencoders and a novel minority metric combining neuron activation frequency with semantic distinctiveness to identify interpretable neurons whose top-activating images reveal underrepresented attributes. Experiments show RAIGen discovers attributes beyond fixed fairness categories in Stable Diffusion, scales to larger models such as SDXL, supports systematic auditing across architectures, and enables targeted amplification of rare attributes during generation. The project page is available at https://vssilpa.github.io/RAIGen_webpage/ .

17.
arXiv (CS.CV) 2026-06-16

Post-Launch Capability Expansion of Vision-Language Models via Prompting for On-Orbit Spacecraft Inspection

Spaceborne inspection systems often deploy perception models prior to launch, after which updating model weights or expanding fixed label sets becomes operationally impractical. While supervised models can be integrated pre-flight, adding new semantic capabilities in orbit requires retraining and re-uploading parameters. We investigate whether prompt-driven vision–language models can enable post-launch semantic expansion, allowing new spacecraft components to be specified via natural-language prompts without modifying onboard weights. We evaluate zero-shot instance segmentation of spacecraft components under a strictly frozen, single-pass inference protocol on a test set of $129$ images of previously unseen satellites. Under fixed global thresholds and no post-processing, SAM3 achieves $0.385$ mAP@$0.5$ and $0.267$ mAP@$0.5{:}0.95$. Performance is strongly scale-dependent: large structural elements like spacecraft bodies ($0.639$ AP@$0.50$) and solar arrays ($0.598$ AP@$0.5$) localize reliably, while relatively small appendages like antennas ($0.221$ AP@$0.5$) and thrusters ($0.081$ AP@$0.5$) remain difficult. Prompt formulation influences performance, with structured prompts incorporating spatial and geometric descriptors yielding up to $82%$ improvement over short category-name prompts. The model operates within the memory and compute envelope of contemporary embedded GPUs, suggesting prompt-driven grounding can provide a practical mechanism for post-launch semantic extension of dominant spacecraft structures while highlighting limitations of zero-shot localization for fine-scale components under orbital domain shift.

18.
arXiv (CS.CL) 2026-06-19

IHUBERT: Vector-Based Semantic Deduplication and Domain-Balanced Pretraining for Persian Resources

Persian pretrained language models (PLMs) are still limited by the scarcity of large-scale, high-quality pretraining corpora and by insufficient evaluation beyond standard classification and NER tasks. We present IHUBERT, a monolingual Persian PLM trained from scratch with the RoBERTa-base encoder (125M parameters) on a 45 GB curated subset of the Sepahr-Danesh collection (about 7-8B tokens). To improve corpus quality and reduce redundancy, we employ a multi-stage preprocessing pipeline that includes normalization, exact and near-duplicate removal, anonymization, and vector-database-based semantic deduplication for distribution balancing control across domains and registers. We additionally train a 139k-vocabulary BPE tokenizer on the full pretraining corpus to better capture Persian morphology and orthographic variation. IHUBERT is evaluated on seven Persian NLU benchmarks covering NER, sentiment analysis, topic classification, NLI, extractive question answering, and relation extraction, using task-standard metrics (entity-level F1, Macro-F1, EM/F1). IHUBERT achieves its strongest gains on extractive QA, ranking first on both PQuAD (F1 88.3542) and ParsiNLU-RC (F1 49.0987), and attains the best result on FarsTail (Macro-F1 0.8350). On NER and topic classification, it remains competitive (e.g., 0.8308 F1 on ParsTwiNER; 0.7953 Macro-F1 on DigiMag), while relation extraction remains the main remaining gap (0.6684 Macro-F1 on PERLEX). A controlled tokenizer ablation on the IHUBERT pretraining corpus shows that BPE yields slightly lower subword fragmentation than WordPiece at matched vocabulary size, supporting our tokenization design. Overall, IHUBERT advances Persian language modeling through semantically curated large-scale pretraining and broad evaluation across both classification and comprehension-oriented tasks.

19.
arXiv (CS.CV) 2026-06-16

Light Forcing: Accelerating Autoregressive Video Diffusion via Sparse Attention

Advanced autoregressive (AR) video generation models have improved visual fidelity and interactivity, but the quadratic complexity of attention remains a primary bottleneck for efficient deployment. While existing sparse attention solutions have shown promise on bidirectional models, we identify that applying these solutions to AR models leads to considerable performance degradation for two reasons: isolated consideration of chunk generation and insufficient utilization of past informative context. Motivated by these observations, we propose \textsc{Light Forcing}, the first sparse attention solution tailored for AR video generation models. It incorporates a Chunk-Aware Growth mechanism to quantitatively estimate the contribution of each chunk, which determines their sparsity allocation. This progressive sparsity increase strategy enables the current chunk to inherit prior knowledge in earlier chunks during generation. Additionally, we introduce a Hierarchical Sparse Attention to capture informative historical and local context in a coarse-to-fine manner. Such two-level mask selection strategy (i.e., frame and block level) can adaptively handle diverse attention patterns. Extensive experiments demonstrate that our method outperforms existing sparse attention in quality (e.g., 84.5 on VBench) and efficiency (e.g., $1.2{\sim}1.3\times$ end-to-end speedup). Combined with other efficient solutions, \textsc{Light Forcing} further achieves a $2.0{\sim}3.0\times$ end-to-end speedup across diverse GPUs (e.g., 27.4\,FPS on RTX 5090 and 33.9\,FPS on H100). Code is released via this \href{https://github.com/chengtao-lv/LightForcing}{link}.

20.
arXiv (CS.LG) 2026-06-18

UST-GNN: A Unified Spatial–Topological Graph Neural Network Framework for Urban Analytics–Demonstrated through a Case Study on Urban Health Prediction

arXiv:2504.04739v3 Announce Type: replace Abstract: Understanding how social, demographic, environmental, and spatial factors jointly shape urban outcomes is essential for sustainable urban development and evidence-based policy. Traditional statistical approaches often struggle to capture complex non-linear relationships, while many machine learning methods overlook the joint roles of spatial autocorrelation and network topology in urban systems. Recent advances in GeoAI have addressed these challenges only partially, often treating spatial effects, graph structure, evaluation, and interpretability separately. We present UST-GNN, a unified spatial–topological graph neural network framework that integrates neighbourhood connectivity, heterogeneous urban features, and positional/locational embeddings into a single representation. Using the MedSAT dataset, which contains over 150 environmental and socio-demographic variables and six prescription outcomes across 4,835 neighbourhoods in Greater London, UST-GNN outperforms strong statistical, geographically enhanced, and graph Machine Learning baselines, improving out-of-sample $R^2$ by 8.4–13.2\% under strict spatial cross-validation. We further introduce a lightweight principal-component module to interpret learned node embeddings geographically and relate them to policy-relevant covariates. The resulting analyses recover established patterns, offer new perspectives on debated associations, and reveal novel predictors warranting further causal investigation. Together, these findings demonstrate the value of graph-based spatial machine learning for urban health analytics, environmental inequality assessment, and evidence-based urban policy. Beyond predictive gains, UST-GNN provides a unified GeoAI analytical pipeline that can be embedded into urban digital twin workflows for scenario testing, monitoring, and data-informed decision-making for healthier, more sustainable cities.

21.
arXiv (CS.CV) 2026-06-16

Open-World Video Segmentation

While video segmentation has advanced rapidly on short clips and closed-set benchmarks, open-world video segmentation remains largely unexplored. The challenge is twofold: (1) existing methods are not designed to support object discovery and identity maintenance in long videos of dynamic ego-motion, and (2) existing evaluation protocols rely on a rigid 1:1 matching that unfairly penalizes semantically valid predictions with mismatched granularity. To address both gaps, we introduce Savvy, a practical and strong system for zero-shot open-world long-horizon video segmentation. Savvy combines hierarchical mask discovery, deferred admission, and track consolidation to support persistent object discovery, safe track promotion, and stable long-range identity maintenance. We further propose OGA, a granularity-aware evaluation suite for open-world video segmentation. Built on a Granularity-Agnostic (GA) matching protocol, OGA relaxes conventional 1:1 matching to an n:1 mapping, but still enforces temporal rigor by detecting support discontinuities through sever points and scoring each reference object through its dominant coherent fragment. This prevents fragmented or flickering support from being over-rewarded while enabling GA-adapted metrics and structural diagnostics: identity persistence (IP), and identity concentration (IC). On VIPSeg, we show that standard 1:1 evaluation substantially underestimates open-world methods, whereas GA evaluation recovers much of their suppressed performance. On the more realistic long-horizon benchmarks: ScanNet and HM3D, Savvy consistently outperforms strong baselines across both classical and proposed metrics, including STQ, VPQ$_\infty$, IP and IC. Together, these results establish a practical benchmark and a strong baseline for open-world long-horizon video segmentation.

22.
arXiv (quant-ph) 2026-06-11

Measurement-Free Toric-Code Memory in Array Globally Controlled Rydberg Array

arXiv:2606.12030v1 Announce Type: new Abstract: The central prerequisite of any fault-tolerant quantum architecture is a quantum memory: a block of encoded physical qubits whose logical state is actively preserved against noise across many rounds of error correction. In neutral-atom Rydberg arrays, realizing such a memory is obstructed not by the entangling gates themselves, which are already fast and high-fidelity, but by the auxiliary operations that a conventional error-correction cycle requires: mid-circuit fluorescence measurement, inter-zone atom transport, and locally focused single-qubit addressing. Each of these introduces latency, atom loss, or optical crosstalk that exceeds the cost of the underlying gates by orders of magnitude. These costs accumulate cycle after cycle, progressively degrading the very logical information the code is meant to protect. Here we propose a protocol that stabilizes a toric-code quantum memory without moving, measuring or local addressing atoms. The key is to use a three-species Rydberg atom array for the complete stabilizer cycle, including syndrome extraction, coherent correction, and ancilla reset, under global, species-selective laser pulses. Numerical simulation of a $4 \times 4$ rotated toric code shows a longer qubit lifetime when the physical error rate is below a pseudo-threshold $p^\star \approx 0.034$. The scheme offers a concrete, hardware-efficient route to topological quantum memory in neutral-atom platforms.

23.
arXiv (quant-ph) 2026-06-16

Interaction and non-Hermiticity controlled transmission in extended Su-Schrieffer-Heeger models

arXiv:2606.15245v1 Announce Type: cross Abstract: We study the transport characteristics of an extended version of the Su-Schrieffer-Heeger (SSH) model with next-nearest-neighbor (NNN) interactions and non-Hermitian onsite energies. We observed that transport in such a system is significantly modified by the NNN interaction and the non-Hermitian terms. The transmission coefficient exhibits oscillatory behavior as the strength of the NNN interaction varies in a fixed-length chain. Moreover, the transmission coefficient also shows oscillation with system size for a fixed strength of the NNN interaction. We find that novel oscillatory behavior of the transmission coefficient, arising form the NNN interaction, is a unique feature of such a model and has not been reported previously. The presence of the non-Hermitian terms also enhances/reduces the transmission coefficient depending on the values of the other system parameters like intra-, inter- and NNN hopping. It appears from our study that both the NNN interaction and the non-Hermiticity introduce significant changes in the transport properties of the extended SSH chain, which are not observed in the standard Hermitian nearest-neighbour variant of the SSH model.

24.
arXiv (CS.AI) 2026-06-11

AI Researchers Must Help Lead Arms Control to Mitigate Military AI Risks

arXiv:2606.11533v1 Announce Type: cross Abstract: The advancement of AI capabilities compels researchers and the public to be more aware of its potential worldwide impact. A pressing near-term concern is the regulation of military AI applications. Armament manufacturers and defense contractors are increasingly investing in AI capabilities and forging partnerships with AI companies, creating a burgeoning coalition that demands military leaders, arms control diplomacy experts, and AI researchers collaborate to ensure a safer future. While AI researchers often focus on the long-term implications of superintelligent AI, this approach may not adequately address the immediate challenges posed by AI in military applications. Success requires acknowledging and mitigating the emerging risks of frontier AI models that plan to be integrated into defense applications, like military AI systems. Arms control has reduced past catastrophic risks, so lessons learned from nuclear deterrence can guide AI safety and security research towards innovations in verification and diplomacy. AI researchers, however, must assist in leading the technical research that clearly defines and alleviates instability in military settings. Given these new responsibilities and the lack of sufficiently reliable solutions, we argue that AI researchers must take a leading role in advancing arms control research to minimize risk in military AI applications.

25.
arXiv (CS.AI) 2026-06-16

From Agent Traces to Trust: A Survey of Evidence Tracing and Execution Provenance in LLM Agents

arXiv:2606.04990v2 Announce Type: replace-cross Abstract: Large language model (LLM)-based agents are evolving from passive text generators into autonomous systems capable of planning, tool use, retrieval, memory access, environmental interaction, and multi-agent collaboration. These capabilities expand agent autonomy, but also make agent behavior harder to verify, debug, and audit. Final-answer accuracy alone cannot explain how an output was produced, which evidence supported each claim, whether tool calls were justified, how memory influenced later decisions, or where failures originated. This survey examines evidence tracing and execution provenance as foundations for process-level accountability in trustworthy LLM agents. We define execution provenance as the typed graph of an agent execution and evidence tracing as its projection onto evidence-support relations. This perspective connects retrieval grounding, claim support, tool-use safety, memory lineage, observability, debugging, audit, and recovery within a unified framework. We introduce a taxonomy covering trace sources, evidence and execution units, provenance relations, tracing granularity and timing, representation forms, and trust functions. We then review key methodological directions, including provenance representation, evidence attribution, tool-use provenance, runtime guardrails, provenance-bearing memory, observability, and failure diagnosis. Finally, we discuss benchmarks, datasets, metrics, and open challenges for building provenance-aware, auditable, and recoverable agent systems.