Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-11

AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents

Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on pre-defined command sequences or task-specific pipelines, where developers manually connect perception, planning, flight control, simulation, logging, and safety modules. This limits the flexibility, reproducibility, and extensibility of autonomous aerial systems. This paper presents AerialClaw, an open-source software framework that enables UAVs to operate as decision-making aerial agents rather than merely command-following platforms. Given a natural-language mission, AerialClaw allows an LLM-based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop. The framework adopts a modular brain-skill-runtime architecture, combining hard skills for atomic UAV operations, Markdown-based soft skills for reusable task strategies, document-driven agent state and capability boundaries, memory-driven reflection, safety-oriented runtime validation, and platform-agnostic execution adapters. AerialClaw supports lightweight mock execution, PX4 SITL with Gazebo, and AirSim-based simulation, together with a web console, pluggable model backends, example missions, simulation assets, and staged deployment scripts. By combining standardized aerial skills, document-driven agent state, memory, and closed-loop LLM decision-making, AerialClaw provides a reproducible and extensible open-source framework for building UAV systems that can interpret missions, make decisions, execute skills, and adapt their behavior from feedback.

02.
arXiv (CS.AI) 2026-06-15

DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation

arXiv:2506.14202v4 Announce Type: replace-cross Abstract: End-to-end backpropagation requires storing activations throughout all layers, creating memory bottlenecks that limit model scalability. Existing block-wise training methods offer means to alleviate this problem, but they rely on ad-hoc local objectives and remain largely unexplored beyond classification tasks. We propose $DiffusionBlocks$, a principled framework for transforming transformer-based networks into genuinely independent trainable blocks that maintain competitive performance with end-to-end training. Our key insight leverages the fact that residual connections naturally correspond to updates in a dynamical system. With minimal modifications to this system, we can convert the updates to those of a denoising process, where each block can be learned independently by leveraging the score matching objective. This independence enables training with gradients for only one block at a time, thereby reducing memory requirements in proportion to the number of blocks. Our experiments on a range of transformer architectures (vision, diffusion, autoregressive, recurrent-depth, and masked diffusion) demonstrate that DiffusionBlocks training matches the performance of end-to-end training while enabling scalable block-wise training on practical tasks beyond small-scale classification. DiffusionBlocks provides a theoretically grounded approach that successfully scales to modern generative tasks across diverse architectures. Code is available at https://github.com/SakanaAI/DiffusionBlocks .

03.
medRxiv (Medicine) 2026-06-12

Crimean-Congo haemorrhagic fever virus transmission: exploring perceptions of human-animal-tick interactions across six districts in Uganda

Crimean-Congo haemorrhagic fever virus (CCHFV) causes a viral zoonotic disease transmitted through tick bites and direct contact with infected blood or tissue of infected animals. Socio-ecological and behavioural risk factors for CCHFV exposure in Uganda remain poorly understood, which can lead to the omission of key risk factors in quantitative survey design and limit our wider understanding. In this study, we explored human-animal-tick interaction transmission risks in Uganda. We conducted 24 focus group discussions (FGDs) and 31 key-informant interviews (KIIs) across six environmentally and socio-ecologically diverse districts, between October 2023 and March 2024. Study sites were selected using K-prototype analysis, which combined environmental and socio-ecological variables to identify distinct clusters within Uganda. FGDs were conducted separately with groups of community leaders, men, women and teenagers with stratified purposive sampling. Medical doctors, veterinarians, traditional healers, district surveillance officers, and herdsmen were individually interviewed as key informants and purposively sampled. Data were transcribed and translated into English, and analysed thematically using iterative categorisation in NVivo 14. Most participants reported tick bites, some as frequently as every day. Close contact with animals was common, including sleeping next to them in the same building, largely due to concerns about animal theft. Less frequent but notable practices included slaughtering animals for consumption or sacrifice and interactions with wild animals during hunting. Slaughtering and butchering an animal which was sick or had died was reportedly performed by participants in most districts. Plucking and roasting engorged ticks was a practice described in the Kaabong and Arua districts of Northern Uganda. These practices and behaviours highlight potential key risks of CCHFV transmission and underscore the need for future studies to address specific behaviours, to quantify if, and to what extent, they present an exposure risk. Further work should include underlying reasons for the behaviours, which would help ensure that culturally appropriate interventions are targeted.

04.
arXiv (CS.CV) 2026-06-16

Effective and Low-cost Lane-based Map Localization for Vehicle-Centric Route Generation

Driver-centric route representation plays a vital role in intuitive driving guidance systems. This paper presents OLRA, a low-cost, map-localization-based framework that derives driver-view-aligned routes by matching map-based navigation routes with camera-detected lane markings. This alignment process mutually enhances vehicle localization accuracy and visual route consistency. To bridge the evaluation gap across different paradigms, we introduce practical route evaluation metrics and benchmark OLRA against OpenPilot, a representative direct-generation approach. Experimental results on the nuScenes dataset demonstrate that OLRA outperforms OpenPilot in complex road segments and in route estimation at distance beyond 20 meters, achieving lower overall Euclidean error. This study is expected to promote future research in low-cost, maplocalization-based route generation methods.

05.
arXiv (CS.CL) 2026-06-18

JetFlow: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

Speculative decoding (SD) accelerates autoregressive Large Language Models (LLMs) by drafting multiple tokens and verifying them in parallel, but it faces a scaling limitation: increasing the draft budget improves speed only when acceptance remains high and drafting overhead stays low. This ceiling has been difficult to break because prior head-based SD methods face a causality-efficiency dilemma. Autoregressive drafters produce path-conditioned candidates that are effective for tree speculative decoding with higher acceptance length, but their drafting cost grows with tree depth. Bidirectional block-diffusion drafters generate all positions in one pass, but their branch-agnostic marginals can form individually plausible yet mutually inconsistent trees, wasting budget and reducing acceptance. We propose JetFlow, a head-based SD framework that combines one-forward drafting efficiency with branch-wise causal conditioning. JetFlow trains a causal parallel draft head over fused hidden states from the frozen target model, producing candidate trees whose scores align with the target model's autoregressive factorization. This enables JetFlow to convert larger draft budgets into longer accepted prefixes and higher end-to-end speedup. Across math, coding, and chat benchmarks on dense and MoE Qwen3 models, JetFlow consistently outperforms bidirectional-head and tree-based SD baselines. On H100 GPUs, JetFlow achieves up to 9.64x speedup on MATH-500 and 4.58x on open-ended conversational workloads, with further latency gains demonstrated through vLLM integration under realistic serving loads. Our code and models are available at https://github.com/hao-ai-lab/JetFlow.

06.
bioRxiv (Bioinfo) 2026-06-11

GeroEngine: Generative single-cell aging trajectories reveal a bidirectionally traversable identity core and direction-specific inflammatory remodeling

作者:

Single-cell RNA sequencing (scRNA-seq) maps aging tissues at high resolution but is destructive, preventing longitudinal tracking; dropout and zero-inflation artifacts, amplified by shift-invariant linear simulations, confound age-associated variability. We developed GeroEngine, a technical-artifact-aware framework combining VAE-based trajectory simulation, LOPO cross-validation, linear baselines, reverse traversal, and reverse-directed network inference. In microglia and HSCs, the VAE reduced technical-artifact carryover while preserving trajectory heterogeneity and improving alignment to artifact-reduced reference manifolds. Consensus GeroTargets and GeroRegulators defined tissue-specific GeroNetworks organized into three pillars: lineage/replication identity collapse, a sex-dimorphic endocrine/stress core, and inflammatory remodeling. Forward and reverse simulations aligned to the common young[->]old aging axis revealed a sign-coherent, direction-specific program: identity/replication targets were bidirectionally recovered, whereas MHC/NF-{kappa}B inflammatory programs were preferentially forward-recovered. These results support identity collapse as a deep traversable core of aging and nominate upstream homeostatic restoration over downstream inflammatory suppression.

07.
medRxiv (Medicine) 2026-06-22

Knowledge, Attitudes, and Practices Regarding Maternal Nutrition Counselling Among Frontline Health Workers in Udupi, Karnataka, India: A Sequential Explanatory Mixed-Methods Study

Background Indias maternal nutrition profile is undergoing a dual-direction shift, with persistent undernutrition coexisting alongside rising overweight and micronutrient deficiencies. Despite national efforts through Integrated Child Development Services (ICDS) and the National Health Mission (NHM), maternal dietary diversity remains suboptimal in India. Frontline health workers (FLWs) play a central role in delivering nutrition counselling; however, gaps remain between knowledge and its translation into practice, highlighting the need to strengthen training, applied competencies, and health system support within primary care settings. Objective To assess knowledge, attitudes, and practices (KAP) regarding maternal nutrition counselling among FLWs and to explore contextual factors influencing counselling delivery. Methods A sequential explanatory mixed-methods study was conducted in Udupi, Karnataka, India. In phase one, 46 FLWs- Accredited Social Health Activists (ASHA), Community Health Officers (CHO), and Primary Health Care Officers (PHCO) completed a validated Knowledge, Attitudes, and Practices (KAP) questionnaire. Data were analysed using descriptive statistics, Kruskal-Wallis test, Spearman correlation, and exploratory multiple linear regression. In phase two, one focus group discussion with 21 participants was conducted and analysed using reflexive thematic analysis. Results FLWs demonstrated moderate KAP scores (37.50 {+/-} 5.09), with lower scores observed in dietary diversity knowledge and counselling practices. CHOs and PHCOs had significantly higher knowledge (p < 0.001) and practice scores (p = 0.002) compared to ASHAs, while attitudes were similar across cadres. Knowledge was positively associated with practice ({rho} = 0.389, p = 0.008). Exploratory regression indicated that cadre and knowledge were associated with practice, while attitude was not statistically significant. Qualitative findings suggested that counselling was largely protocol-based and constrained by workload, limited counselling tools, economic barriers, and cultural food practices. Conclusion Despite positive attitudes towards maternal nutrition counselling, frontline health workers demonstrated gaps in knowledge and counselling practices. Mixed-methods findings suggest that counselling delivery is shaped by both provider competencies and health-system constraints, highlighting the need for implementation-focused strategies to strengthen maternal nutrition counselling in routine antenatal care.

08.
arXiv (CS.CL) 2026-06-11

Gumbel-BEARD: Automatic Layer Selection for Self-Supervised Adaptation of Whisper in Low-Resource Domains

Speech foundation models often struggle in low-resource domains due to domain mismatch and data scarcity. We propose Gumbel-BEARD, a domain adaptation framework that automates Whisper encoder layer selection via an end-to-end trainable hard Gumbel-Softmax selector. It enables self-supervised adaptation with a BEST-RQ objective that dynamically adapts to target acoustic characteristics without manual tuning. Experiments on the MyST child speech corpus demonstrate efficiency and scalability: with 10 h of labeled data for fine-tuning, our method matches a fully supervised baseline trained on the complete 133 h labeled set. We establish new state-of-the-art word error rates (WERs) of 8.21% using Whisper-medium on MyST and 11.06% using Whisper-small on the OGI Spontaneous dataset. Evaluation on CORAAL further confirms robustness to adult dialectal domain shifts, with up to 6% relative WER reduction, highlighting the generalizability of our approach to diverse low-resource conditions.

09.
arXiv (CS.CV) 2026-06-16

A Human-in-the-Loop Label Error Detection Framework Applied to Arabic-Script HTR Datasets

Despite recent advances, Handwritten Text Recognition (HTR) for Arabic-script languages still lags behind Latin-script HTR. Part of the problem is dataset quality. To help closing this gap, we propose a two-stage framework (CER-HV) for detecting label errors. Stage 1 (CER) is a Character-Error-Rate-based noise detector built on a Convolutional Recurrent Neural Network (CRNN) architecture. Stage 2 (HV) is the Human-In-The-Loop (HITL) Verification of noisy samples detected by the first stage. Applying the CER-HV framework on multiple Arabic-script datasets can identify samples with label errors including transcription, segmentation, orientation, and non-text content errors that can markedly affect HTR performance. These errors were identified by the first stage of the framework with up to 90percent (top-50) precision. We also show that our CRNN achieves state-of-the-art performance across five of the six evaluated datasets, reaching 8.46 percent Character Error Rate (CER) on KHATT (Arabic), 8.22 percent on PHTI (Pashto), 10.59 percent on Ajami, and 10.11% on Muharaf (Arabic), all without any data cleaning. We establish a new baseline of 11.3 percent CER on the PHTD (Persian) dataset. Applying CER-HV improves evaluation CER by up to 1.8 percentage points after dataset cleaning and retraining. Although our experiments focus on documents written in an Arabic-script language, the framework is general and can be applied to other text recognition datasets

10.
arXiv (CS.CV) 2026-06-12

TetherCache: Stabilizing Autoregressive Long-Form Video Generation with Gated Recall and Trusted Alignment

Autoregressive video diffusion models provide a natural formulation for streaming and variable-length video generation by conditioning newly generated frames on previously generated content. However, extending these models to minute-level generation remains challenging: the limited KV-cache budget prevents the model from retaining the full history, while repeatedly conditioning on self-generated frames induces a context distribution shift that accumulates over time, leading to visual artifacts, quality degradation, and temporal drift. In this paper, we propose TetherCache, a training-free and plug-and-play cache management strategy for drift-resistant long video generation. TetherCache organizes the cache into sink, memory, and recent regions, and introduces two complementary mechanisms. First, GRAB (Gated Recall with Attention-Diversity Balancing) selects long-range memory frames using a gated score that combines attention-based relevance with temporal diversity, preserving informative yet diverse historical context under a fixed cache budget. Second, TAME (Trusted Alignment via Memory Editing) lightly edits newly recalled memory tokens by aligning their statistics to a trusted context distribution, reducing the pollution caused by drifted historical features. Built on Self-Forcing, TetherCache consistently improves long-video generation quality on VBench-Long across 30s, 60s, and 240s settings. In particular, for 240s generation, it substantially improves overall and semantic scores while reducing quality drift from 7.84 to 1.33, demonstrating its effectiveness for stable long-horizon autoregressive video diffusion.

11.
medRxiv (Medicine) 2026-06-17

Trends in Suicide Mortality by Method among US Individuals aged 10-24 Years from 1999 to 2024

Background: Suicide is the second leading cause of death in US adolescents aged 10-24. Method use strongly influences lethality and design of prevention strategies, but recent trends remain unclear. We therefore aimed to investigate trends in suicide mortality rates by method, age group, and sex. Methods: This cross-sectional study used suicide mortality data from the National Center for Health Statistics for a quarter-century period, between 1999 and 2024. All individuals aged 10-24 years at the time of death, with suicide as the underlying cause, were included. We estimated suicide mortality rates (i.e., the number of suicide deaths per 100,000 people) and annual percent change by method (firearm, asphyxiation, poisoning, other), age group (10-14, 15-19, 20-24), and sex. Changing trend time points were determined using Joinpoint regression models Results: From 1999 to 2024, 159,241 suicide deaths occurred among individuals aged 10-24. While suicide rates declined across all age groups between 2017 and 2024, the male-to-female gap narrowed by 18.9%. Among 10-14-year-olds, declining rates among males masked a consistent increase in female suicide rates since 2011. Although asphyxiation-related suicides decreased across all groups since 2018, firearm suicide rates increased for females in the 10-14 and 20-24 age groups. Albeit not as common as firearms or asphyxiation, poisoning suicide rates increased in the 15-19 and 20-24 age groups. Since 1999, suicide rates by other less common methods (e.g., jumping) showed significant increases, for both sexes, especially among individuals aged 20-24. Suicide rates were consistently highest in the 20-24 age group across all study years. Conclusion: The decrease in suicide mortality rates among individuals aged 10-24 was largely driven by declines in males and reductions in asphyxiation-related suicides. However, increasing female suicide rates in the 10-14 age group, as well as increasing rates of death by less common means, warrant close attention. While suicide prevention efforts like structural interventions and means restriction have shown effectiveness among male adolescents, priority should now be given to adapting these approaches for female adolescents, particularly those aged 10-14.

12.
arXiv (quant-ph) 2026-06-11

Strong-field control of the $Z$-boson resonance in $e^+e^-$ collisions

arXiv:2606.09394v2 Announce Type: replace-cross Abstract: Resonant $Z$-boson production is a cornerstone of precision electroweak physics, with its vacuum line shape set by the $Z$ mass, width, and collision kinematics. We show that a strong laser field can significantly alter this picture. By treating the field nonperturbatively, we find that laser dressing of the incoming fermions alters the effective collision kinematics and opens laser-photon exchange channels, including multiphoton processes, in $e^{+}e^{-}$ collisions. As a result, the $Z$-resonance profile develops distinct intensity-dependent regimes, evolving from the vacuum limit to saturation at intermediate field strengths and to an approximately quadratic enhancement at higher intensities. Additionally, the polarization composition of the produced $Z$ bosons is redistributed. In particular, at high intensities the laser-induced contribution can compensate the intrinsic chiral asymmetry of the electroweak interaction, leading to nearly parity-balanced $Z$-boson production. Our results identify that strong classical fields can dynamically control electroweak resonance phenomena, opening a bridge between strong-field QED and high-energy collider physics.

13.
arXiv (CS.AI) 2026-06-24

GUI vs. CLI: Execution Bottlenecks in Screen-Only and Skill-Mediated Computer-Use Agents

arXiv:2606.24551v1 Announce Type: new Abstract: Computer-use agents can execute software tasks through either graphical interfaces or programmatic command interfaces, but existing evaluations confound interaction modality with differences in tasks, initial states, verifiers, and permitted actions. We introduce a matched execution-layer benchmark of 440 desktop tasks across 18 applications and 12 workflow categories, where screen-only GUI agents and skill-mediated CLI agents receive identical goals, states, and final-state verifiers while being restricted to modality-native actions. In this controlled setting, the strongest GUI agent reaches a 59.1% full pass rate, outperforming the strongest original-skill CLI agent at 48.2%; however, verifier-guided skill augmentation raises CLI success to 69.3%, showing that much of the CLI deficit comes from incomplete skill coverage rather than model capability alone. These results suggest that GUI and CLI expose different execution bottlenecks: GUI agents are limited by reliable grounded interaction over long-horizon workflows, whereas CLI agents are limited by the coverage and scalability of their skill interfaces.

14.
arXiv (CS.CV) 2026-06-18

From Bounding Boxes to Visual Reasoning: An On-Policy Data Annotation Tool for Vision-Language Models

Vision-language models (VLMs) are rapidly advancing toward sophisticated grounded structured visual reasoning. Training models for such advanced capabilities demands a new genre of data that seamlessly unifies spatial coordinates, open-vocabulary descriptions, structured attributes, and topological relationships into a singular representation. However, existing data annotation tools fundamentally fail to meet these intricate demands, suffering from three systematic bottlenecks: limited expressiveness, severe annotation-training decoupling, and poor data reusability. To bridge this infrastructure gap, we introduce an open-source annotation tool, ScreenAnnotator. First, we define a unified annotation atom schema that binds spatial, semantic, and structural primitives into a single unit. Second, we implement an on-policy annotation loop embedded with a Bayesian Annotation Verifier (BAV). Finally, we design a template-driven multi-task data synthesis process dynamically transforms static atoms into diverse multi-dimensional reasoning tasks, eliminating redundant re-annotation. The on-policy loop drives the annotation accept rate to nearly 100% on flowcharts and 77% on GUI screenshots, while steadily reducing per-image annotation time as labeled data accumulate. In the flowchart scenario, fine-tuning a VLM yields 76.1% average accuracy, which is a 35.1% point absolute gain. Our code is available at: https://github.com/WnQinm/Annotator.

15.
arXiv (quant-ph) 2026-06-15

Spin-orbit coupling by design in quantum state engineering of atomically defined quantum dots

arXiv:2606.14487v1 Announce Type: cross Abstract: Tuning spin-orbit coupling is essential in controlling both spin and charge in confined semiconductor nanostructures, yet it is rarely a truly controllable parameter. Here, we show control over the spin-orbit Hamiltonian in quantum dots and the resulting quantum states by tailoring the confinement potential with atomic-scale precision. Using scanning tunnelling microscopy and spectroscopy, we pattern individual Cs ions into designer quantum dot structures on the surface of indium antimonide, in which electrons from a two-dimensional electron gas are confined with chosen in-plane electric-field gradients. We then quantify the atomic level structure, both spatially resolving the orbital character of the electronic states and their magnetic-field evolution. We demonstrate that the level structure, including the induced zero-field splitting, can be tailored by the designed geometry of the local electric fields. These effects can be described using a Hamiltonian that allows consistent treatment of the confinement-induced spin-orbit coupling beyond the conventional Bychkov-Rashba description. This Hamiltonian is derived from a multiband k.p model and takes the energy dependence of the relevant physical parameters into account. Such precise control of spin-orbit coupling in semiconductor quantum dots is relevant to quantum and spintronic technologies.

16.
arXiv (CS.CL) 2026-06-17

When English Isn't the Best Teacher: Source Language Effects in Cross-Lingual In-Context Learning

Cross-lingual transfer in multilingual NLP has been widely explored in supervised fine-tuning contexts, where factors like data availability and linguistic similarity largely determine transfer quality. As the field shifts toward few-shot In-Context Learning (ICL), it is often presumed that insights from fine-tuning carry over unchanged. Yet this assumption has not been rigorously evaluated, leaving open the question of how to choose source languages for cross-lingual ICL. We conduct a broad empirical study of cross-lingual transfer in ICL spanning seven tasks, six models, and a typologically diverse set of languages. We further analyze language confusion, a key obstacle for generative tasks in cross-lingual ICL. Our results show that conventional fine-tuning-based expectations do not consistently apply in the ICL regime and point to alternative heuristics for selecting source languages effectively.

17.
arXiv (CS.LG) 2026-06-15

Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning

arXiv:2606.14187v1 Announce Type: new Abstract: Large-scale neural network training increasingly relies on matrix-aware optimizers that exploit the structure of weight parameters beyond element-wise adaptation. However, existing matrix-aware methods such as Muon have an underappreciated vulnerability: their core operation, Newton-Schulz iteration, depends critically on input conditioning, yet the raw momentum matrices exhibit severe coordinate-wise scale heterogeneity. In this paper, we first verify this scale heterogeneity through a chi-square uniformity test, showing that intra-matrix scale imbalance is prevalent across Transformer layers and that coordinate whitening effectively corrects it. Motivated by this finding, we propose Zeta, a dual whitening optimizer that applies coordinate whitening and spectral whitening in a strictly ordered pipeline. The ordering is not a tunable choice but follows from a mathematical dependency: coordinate whitening establishes the statistical isotropy that spectral whitening requires to function reliably. We further prove that this dual pipeline strictly reduces orthogonalization error relative to pure spectral methods by improving the condition number of the input. Empirically, Zeta matches or surpasses strong baselines across language modeling (0.6B to 8B parameters), mixture-of-experts architectures, and vision tasks, demonstrating that resolving scale imbalance before orthogonalization leads to faster convergence and better generalization. Code is available at https://gitcode.com/kevin259/MindSpeed.

18.
arXiv (CS.AI) 2026-06-16

The Reservoir Attention Network: Cross-Pass State in Pretrained Transformers via Content-Addressable Reservoir Injection

arXiv:2606.15678v1 Announce Type: cross Abstract: A feasibility and dynamics study of the Reservoir Attention Network (RAN), an architecture that injects a fixed, randomly-initialized reservoir into the mid-layer attention of a pretrained transformer to carry state across forward passes. Experiments span GPT-2 (124M, 355M) to Qwen2.5 (0.5B, 1.5B) on a single consumer GPU. The tasks are minimal probes chosen to isolate individual mechanisms; the broader always-alive agent vision is treated throughout as compute-limited future work, not a claim of this paper. The reservoir is left untrained (fixed random) by design: this isolates whether untrained recurrent dynamics alone suffice to carry usable cross-pass state, leaving trained recurrence as a complementary, more expensive direction.

19.
arXiv (CS.AI) 2026-06-18

Enhancing Fatigue Detection through Heterogeneous Multi-Source Data Integration and Cross-Domain Modality Imputation

arXiv:2507.16859v5 Announce Type: replace-cross Abstract: Fatigue detection for human operators is important in safety-related applications such as aviation, mining, and long-haul transport. Reliable estimation of operator fatigue can support timely warnings, adaptive task scheduling, takeover reminders, and other safety-management decisions in human-machine systems. However, the effectiveness of these functions depends on whether fatigue-related signals can be reliably captured in the deployment environment. While many studies have shown the value of high-fidelity sensors in controlled laboratory environments, their performance often degrades when used in real-world settings because of noise, lighting conditions, and field-of-view constraints, thereby limiting their practical use. This paper formalizes a deployment-oriented setting for real-world fatigue detection, where high-quality sensors are often unavailable in practical applications. To address this issue, we use knowledge from heterogeneous source domains, including high-fidelity sensors that are difficult to deploy in the field but commonly used in controlled environments, to assist fatigue detection in the real-world target domain. Based on this idea, we design a heterogeneous and multi-source fatigue-detection framework that uses the available modalities in the target domain while leveraging diverse configurations in the source domains through cross-domain modality imputation based on shared modalities.

20.
arXiv (CS.CV) 2026-06-11

On Aligning Hierarchical Standardized Embedding for Audio-visual Generalized Zero-shot Learning

Audio-visual Generalized Zero-shot Learning (AV-GZSL) is a challenging task that aims to classify both seen and unseen objects or scenes by integrating data from audio and visual modalities. Recent studies primarily focus on fusing or aligning audio and visual features to generate more informative audio-visual embeddings. Also, aligning the audio-visual and textual features of most existing methods relies solely on the optimization objectives. However, those methods neglect the inherent distributional and structural differences between audio-visual and textual modalities. To address this limitation, we propose a method termed Aligning Hierarchical Standardized Embedding (AHSE), which enables hierarchical alignment of standardized audio-visual and textual embeddings within a shared embedding space. Specifically, we first apply Z-score standardization to the fused audio-visual and textual embeddings to reduce distributional mismatches. We then introduce a hierarchical alignment strategy that minimizes discrepancies at the semantic, class, and batch levels, thereby constructing a more robust and well-structured embedding space. This strategy not only preserves semantic and inter-class relationships but also maintains spatial consistency within each batch. Extensive experiments on three benchmark datasets: VGGSound-GZSL, UCF-GZSL, and ActivityNet-GZSL, demonstrate that AHSE achieves competitive performance in zero-shot learning.

21.
arXiv (quant-ph) 2026-06-16

Rigorous extension of semilocal collinear functionals to noncollinear DFT using $SU(2)$ rotations

arXiv:2605.31203v2 Announce Type: replace-cross Abstract: In the presence of spin-orbit coupling and in geometrically frustrated materials, a noncollinear treatment the magnetization density is essential. However, in density functional theory most exchange–correlation functional approximations were originally developed for locally collinear magnetization. Many practical approaches to noncollinear DFT have emerged over the past decade. However, a first-principles connection between widely used semilocal collinear functionals and their noncollinear generalizations remains lacking. In this work, a locally exact relation between collinear and noncollinear exchange–correlation functionals is derived at the level of gradient expansions within a $u(2)$ matrix representation of the energy functional. Within this framework, collinear semilocal variables naturally acquire distinct dependencies on transverse and longitudinal magnetization gradient components. The widely used Scalmani–Frisch scheme emerges as a first-order approximation. The transformation of collinear functional derivatives to noncollinear space is implemented through numerically robust $SU(2)$ rotations. A consistent description of local magnetic torques is demonstrated for the prototypical spin-frustrated Cr$_3$ cluster. The approach further extends to fully nonlocal functionals and provides a direct route towards numerically stable relativistic response calculations. The influence on magnetic properties in presence of spin-orbit coupling is illustrated through calculations of hyperfine couplings in the high-spin ground states of uranium and the uranium ion.

22.
arXiv (CS.AI) 2026-06-15

Hyperdimensional computing for structured querying on tabular data embeddings

arXiv:2606.13871v1 Announce Type: new Abstract: Tabular data embeddings have become a cornerstone of data profiling and data integration pipelines, enabling tasks such as entity annotation and resolution; schema matching; column type detection; and table search, among others. Existing approaches embed rows, columns, or entire tables into a vector space and rely on nearest-neighbor search to retrieve candidate matches. A fundamental limitation of current embedding methods is the lack of interpretable similarity scores: the concrete similarity value between a query and its nearest neighbour carries no intrinsic meaning, making it impossible to determine whether that neighbour is a true match or simply the least-dissimilar item in a corpus that contains no valid answer. This inability to set principled thresholds for retrieval undermines practical deployment, particularly for zero-match detection. We investigate the use of HyperDimensional Computing (HDC), specifically the Holographic Reduced Representations (HRR) model, as a framework for tabular row embeddings when the retrieval task corresponds to answering structured select-project queries in vector space. Exploiting the algebraic properties of HDC operations, we derive closed-form expected similarity values for both equality and non-equality retrieval predicates, which converge to interpretable values as dimensionality increases, and use these to identify suitable retrieval thresholds. We evaluate HDC against EmbDI, a graph-based baseline, on two real-world datasets across varying table sizes and predicate lengths. Our results show that HDC matches or outperforms EmbDI for row retrieval across all configurations, handles non-equality predicates more robustly, and achieves perfect attribute projection accuracy at sufficient dimensionality – while uniquely enabling reliable identification of zero-match predicates through its principled thresholds.

23.
arXiv (CS.CV) 2026-06-18

DART: A design-aware microfluidic chip paradigm for real-time live-cell image analysis

High-throughput microfluidic live-cell imaging generates rich single-cell data. Yet semi-automated procedures for locating regions of interest (RoIs), each containing one cell population, and removing surrounding microfluidic structures from recorded images, scale with the number of RoIs. This prevents real-time image analysis and delays time-to-insight by hours to days. We introduce the Design-Aware and Real-Time capable (DART) paradigm for microfluidic cultivation chips, which aligns the CAD blueprint with the physical chip and thereby enables throughput-independent localization of all RoIs and fully automated image processing across diverse RoI geometries and chip layouts. DART establishes this alignment through embedded fiducial markers and deep-learning-based marker detection. We validate DART using the Swiss Army Knife chip, which combines eight structurally distinct RoI designs across 1164 RoI locations. DART localizes all RoIs in five minutes, removes microfluidic structures from raw microscopy images in 40 ms, and performs fully automated image analysis, including cell segmentation, in under 1.1 s per image. Together, these capabilities establish DART as an end-to-end hardware-software paradigm with real-time-capable analysis that paves the way toward closed-loop and outcome-driven smart microscopy.

24.
arXiv (CS.AI) 2026-06-11

SirenFNO: Efficient and Full Frequency Learning of Fourier Neural Operators

arXiv:2606.11518v1 Announce Type: cross Abstract: Fourier neural operators (FNOs) are effective and efficient surrogates for approximating solutions of PDEs and generalize across discretizations. However, owing to the reliance on frequency truncation to maintain learning efficiency of FNOs, empirical studies suggest that FNOs exhibit spectral bias toward low-frequency information, which may hinder the learning capability especially for certain PDEs with strong high-frequency oscillations. To address this limitation, we propose SirenFNO, a novel framework that leverages sinusoidal representation networks (SIRENs) to learn implicit neural representations and performs mode-wise kernel parameterization. Our SIREN parameterization learns a full-grid spectrum with a constant and discretization-independent parameter count, thereby eliminating the need for frequency truncation. We further extend SirenFNO with functional tensor decompositions to enhance parameter and learning efficiency. Empirical results show that our SirenFNO consistently outperforms FNO with approximately $4$ to $15$ times parameter reductions with preserved discretization invariance, and our functional decomposition variants obtain performance improvements with a maximum of $73$ times fewer parameters across multiple PDE benchmarks.

25.
arXiv (CS.LG) 2026-06-16

Escaping the Cognitive Well: Efficient Competition Math with Off-the-Shelf Models

arXiv:2602.16793v2 Announce Type: replace Abstract: In the past year, custom and unreleased math reasoning models reached gold medal performance on the International Mathematical Olympiad (IMO). Similar performance was then reported using large-scale inference on publicly available models but at prohibitive costs (e.g., 3000 USD per problem). In this work, we present an inference pipeline that attains best-in-class performance on IMO-style math problems at an average inference cost orders of magnitude below competing methods while using only general-purpose off-the-shelf models. Our method relies on insights about grader failure in solver-grader pipelines, which we call the Cognitive Well (iterative refinement converging to a wrong solution that the solver as well as the pipeline's internal grader consider to be basically correct). Our pipeline addresses these failure modes through conjecture extraction, wherein candidate lemmas are isolated from generated solutions and independently verified alongside their negations in a fresh environment (context detachment). On IMO-ProofBench Advanced (PB-Adv), our pipeline achieves 67.1 percent performance using Gemini 3.0 Pro with an average cost per question of approximately 31 USD. At the time of evaluation, this represented the state-of-the-art on PB-Adv among both public and unreleased models, and more than doubles the success rate of the next best publicly accessible pipeline, all at a fraction of the cost.