Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-24

Fine-Grained Open-Vocabulary Object Detection with Fined-Grained Prompts: Task, Dataset and Benchmark

Open-vocabulary detectors are proposed to locate and recognize objects in novel classes. However, variations in vision-aware language vocabulary data used for open-vocabulary learning can lead to unfair and unreliable evaluations. Recent evaluation methods have attempted to address this issue by incorporating object properties or adding locations and characteristics to the captions. Nevertheless, since these properties and locations depend on the specific details of the images instead of classes, detectors can not make accurate predictions without precise descriptions provided through human annotation. This paper introduces 3F-OVD, a novel task that extends supervised fine-grained object detection to the open-vocabulary setting. Our task is intuitive and challenging, requiring a deep understanding of Fine-grained captions and careful attention to Fine-grained details in images in order to accurately detect Fine-grained objects. Additionally, due to the scarcity of qualified fine-grained object detection datasets, we have created a new dataset, NEU-171K, tailored for both supervised and open-vocabulary settings. We benchmark state-of-the-art object detectors on our dataset for both settings. Furthermore, we propose a simple yet effective post-processing technique. Our data, annotations and codes are available at https://github.com/tengerye/3FOVD.

02.
arXiv (quant-ph) 2026-06-11

An iterative Ising decoder for quantum error correction codes

arXiv:2606.12301v1 Announce Type: new Abstract: The Ising framework maps the decoding problem in quantum error correction onto ground-state optimization of a classical Hamiltonian, in which $X$-$Z$ error correlations enter as cross terms. Under phenomenological depolarizing noise, the exact joint formulation contains up to 8-body interactions for the toric code and 10-body for the $6.6.6$ color code. These high-order terms degrade solver convergence, inflate runtime, and raise the auxiliary spin overhead when embedding into native 2-body Ising hardware. In this work, we propose the iterative low-order decoding (ILOD) algorithm, which alternates between $X$- and $Z$-type sub-Hamiltonians, approximating cross-type correlations through Bayesian priors that reweight each type's couplings using the other type's inferred error configuration. This halves the maximum body count of interaction terms in the Hamiltonian, accelerating the solver, restoring convergence at larger code distances, and reducing the total spin count for 2-body embedding by a factor of $2.5$. For the toric code, ILOD attains a threshold of $4.73%$ versus $4.83%$ for the joint formulation, with the empirical runtime ratio scaling as $(0.81)^d$. For the $6.6.6$ color code, their thresholds agree within statistical uncertainty for small code distances, and ILOD remains convergent for larger distances where the joint formulation fails to converge despite a larger annealing budget.

03.
arXiv (CS.AI) 2026-06-11

Power Term Polynomial Algebra for Boolean Logic

arXiv:2603.13854v2 Announce Type: replace-cross Abstract: We introduce power term polynomial algebra, a representation language for Boolean formulae designed to bridge conjunctive normal form (CNF) and algebraic normal form (ANF). The language is motivated by the tiling mismatch between these representations: direct CNFANF conversion may cause exponential blowup unless formulas are decomposed into smaller fragments, typically through auxiliary variables and side constraints. In contrast, our framework addresses this mismatch within the representation itself, compactly encoding structured families of monomials while representing CNF clauses directly, thereby avoiding auxiliary variables and constraints at the abstraction level. We formalize the language through power terms and power term polynomials, define their semantics, and show that they admit algebraic operations corresponding to Boolean polynomial addition and multiplication. We prove several key properties of the language: disjunctive clauses admit compact canonical representations; power terms support local shortening and expansion rewrite rules; and products of atomic terms can be systematically rewritten within the language. Together, these results yield a symbolic calculus that enables direct manipulation of formulas without expanding them into ordinary ANF. The resulting framework provides a new intermediate representation and rewriting calculus that bridges clause-based and algebraic reasoning and suggests new directions for structure-aware CNFANF conversion and hybrid reasoning methods.

04.
arXiv (CS.LG) 2026-06-24

EnerInfer: Energy-Aware On-Device LLM Inference

arXiv:2606.23001v1 Announce Type: cross Abstract: On-device LLM inference is increasingly attractive for privacy-preserving, reliable, and cost-effective deployment, yet its energy and thermal costs remain a critical bottleneck. Existing systems primarily optimize for decoding speed, implicitly assuming that faster execution is always preferable. We show instead that on-device LLM inference often has exploitable configuration slack: modestly lowering NPU and memory frequencies preserves quality of experience (QoE) while substantially improving energy efficiency and reducing heat. Realizing this opportunity in production is challenging. The most energy-efficient NPU/DDR setting varies with the model, inference engine, platform, and runtime conditions, with no stable ranking across configurations. Commercial devices further lack component-level power sensing, and shell temperature evolves with request arrivals, response lengths, and thermal history. To address these challenges, we propose EnerInfer, the first on-device LLM inference framework that jointly manages energy efficiency, throughput, and thermal comfort for LLM workloads. EnerInfer replaces per-model profiling and sensor-heavy control with disaggregated, model-structure-aware prediction and ranking-driven online feedback. It predicts throughput and power for unseen LLMs across NPU/DDR frequency settings, selects QoE-satisfying efficient configurations under runtime interference, and uses lightweight limited-horizon thermal prediction to dynamically switch between energy-optimized and thermally constrained inference. Evaluations on real-world LLMs show that EnerInfer improves energy efficiency by up to 65%, 12%, and 24% on phones, a laptop, and a development board, respectively, without QoE violation.

05.
arXiv (CS.LG) 2026-06-15

Deep Spectral Learning of Embedded Latent Transfer Operators for Stochastic Dynamical Systems

arXiv:2606.14079v1 Announce Type: new Abstract: We propose a spectral learning method for stochastic nonlinear dynamical systems represented with embedded latent transfer operators in deep feature spaces. We instantiate the method as Deep Spectral Encoder (DSE), an operator-based latent state-space model in which a time-invariant neural encoder implements learnable nonlinear feature maps from observations, and these features define Markovian latent states whose temporal evolution and observation mapping are described by the transfer and observation operators, respectively. Functional canonical correlation analysis in a learnable Galerkin-projected feature space provides state coordinates from past and future observations, and the two linear operators are estimated on the state coordinates as ridge-regularized closed-form solutions that coincide with Galerkin projections of the associated covariance operators. On this representation, we generalize sequential Bayesian filtering and Koopman spectral mode decomposition in feature space. Experiments on several scenarios show stable and superior performance with sequential Bayesian filtering and dynamic mode decomposition baselines even under noise and partial observability.

06.
arXiv (CS.CV) 2026-06-12

OR-Action: Multi-Role Video Understanding with Fine-Grained Actions

Fine-grained understanding of operating room (OR) activity could enable workflow-aware assistance, yet remains difficult due to clutter, occlusions, and limited sensing. The prevailing approach to model this environment is scene graphs as an interpretable representation of OR interactions. Converting their frame-wise relational predictions into temporally extended, fine-grained actions however, is challenging without explicit temporal modeling. To enable a principled temporal evaluation of current OR understanding methods, we introduce the first action-centric benchmark built on a publicly available ego-exocentric OR dataset by defining a fine-grained, multi-role action taxonomy and generating dense action segments via distillation from ground-truth scene graph state changes. Experiments on this benchmark show that current scene graph prediction methods struggle to model temporal structure, even when adding explicit modeling through Graph Neural Networks. We therefore introduce a vision-only temporal model that outperforms graph-based methods significantly when using all available egocentric video as input. Building on this model we also introduce a novel multi- to single-view feature alignment strategy that improves single-view performance on multi-role action recognition, mitigating the need for extensive egocentric video capture. Benchmark and code will be released upon acceptance.

07.
arXiv (CS.CV) 2026-06-19

Spectral Query-Key Product Weight Steering for Training-Free VLM Hallucination Mitigation

Vision-language models (VLMs) often generate fluent but visually unsupported descriptions, especially by mentioning objects absent from the image. We propose QK Product Steering, a data-free, training-free, and zero-inference-cost weight edit for reducing object hallucination. The method directly edits the per-head query-key product, the operator that produces pre-softmax attention logits, by suppressing a small number of dominant singular modes in selected middle layers. The edited product is then mapped back to the query weights through a closed-form query-only update while keeping shared key weights fixed, making the edit compatible with grouped-query attention. We further decompose the QK product into symmetric and antisymmetric components to distinguish mutual content-similarity patterns from directional attention patterns. Across three GQA-based VLMs, QK Product Steering achieves an average relative CHAIR$_s$ reduction of $4.0\%$, while matched random-mode controls show negligible change. Interpretability ablations show that the hallucination signal is specific to dominant QK modes and is primarily localized to the symmetric mutual-attention channel. Overall, QK Product Steering offers a simple alternative to decoding-time mitigation, requiring no additional data, fine-tuning, or inference-time overhead while largely preserving general multimodal capability.

08.
arXiv (quant-ph) 2026-06-11

Controlled ion-ion interactions and cavity-enhanced emission of a coherent dinuclear Eu$^{3+}$ complex

arXiv:2606.11947v1 Announce Type: new Abstract: Molecular rare-earth-ion complexes offer unique opportunities for quantum technologies by combining the intrinsic coherence properties of rare-earth ions with chemically tunable molecular environments. A crucial capability is the realization of multi-qubit architectures with defined qubit couplings to enable two-qubit quantum gates. Here, we investigate the optical coherence properties and excitation-induced interactions of two Eu$^{3+}$-based molecular complexes, comparing a mononuclear reference system with a dinuclear analogue in which two Eu$^{3+}$ ions are positioned at a well-defined intramolecular distance of about 7 Angstrom. Using cryogenic ensemble spectroscopy, including spectral hole burning, free-induction decay, and photon echo measurements at temperatures down to 100 mK, we demonstrate long optical coherence times $T_{2,o}$ of up to 9 $\mu$s. As a key step toward scalable multi-qubit architectures, a control-target sequence was implemented to probe conditional ion-ion interactions, revealing a stronger interaction-induced dephasing in the dinuclear complex. Finally, we show the integration of the dinuclear complex into a fiber-based optical microcavity, and observe an 380-fold emission enhancement of the $\mathrm{}^5\mathrm{D}_0\rightarrow\mathrm{}^7\mathrm{F}_0$ transition. Together, these results position molecular rare-earth complexes as versatile and chemically tunable building blocks for scalable quantum technologies.

09.
arXiv (CS.LG) 2026-06-15

A General Framework for Decision Trees via Bregman Divergences

arXiv:2606.13984v1 Announce Type: cross Abstract: Decision trees are one of the fundamental tools in statistical learning due to their interpretability, flexibility, and their ability to adapt to nonlinear structures. Among them, the Classification and Regression Trees, introduced by Breiman, Friedman, Olshen, and Stone in 1984, became one of the most influential algorithms and remains one of the most widely used methods for classification and regression problems. On the other hand, Bregman divergences, introduced by Lev Bregman in 1967 in the context of convex optimization, provide a broad family of loss functions that naturally generalize the squared Euclidean distance. This family includes, among others, the Kullback-Leibler divergence, the Poisson divergence, and the Itakura-Saito divergence, as well as several losses associated with distributions belonging to the exponential family. Moreover, Bregman divergences possess a rich geometric structure and deep connections with convex analysis and information geometry. In this work, we propose a generalization of the CART paradigm based on Bregman divergences, thereby obtaining a broader family of decision trees adapted to different statistical models and underlying geometries. Although algorithms such as CART or classical implementations such as rpart incorporate different impurity criteria, these are usually introduced in an ad hoc manner for each specific model. In contrast, the Bregman divergence approach provides a unified framework that allows these criteria to be derived and interpreted from common convex and geometric principles. Beyond the algorithmic construction, we also investigate theoretical properties of these trees. In particular, we study how properties of the generating convex function – such as strong convexity or smoothness – influence impurity gains between parent and child nodes, as well as stability and consistency properties of the estimator.

10.
medRxiv (Medicine) 2026-06-23

Social networks and their association with quality of life among older adults in rural Burkina Faso

Objective: This study aimed to identify the types of social networks present among older adults in a rural, low-income country setting and describe their association with quality of life (QoL). Methods: A population-representative, cross-sectional survey was conducted in 60 villages around Nouna in Burkina Faso from July to August 2021. Data were collected from resident adults aged 40 years and older. Variables captured were sociodemographic status; social network characteristics (using the Practitioner Assessment of Network Typology (PANT)); quality of life (using the EuroHIS-8 tool); presence of non-communicable diseases, mental health conditions, and disability. Additionally, social networks were broadly categorised as aggregated integrated and aggregated less-integrated groups. Social network types and the groups were described separately, and a multivariable linear regression model was used to understand the association between social network types and QoL, adjusted for sociodemographic and morbidity factors. Results: Among the 2390 respondents, median age was 55 yrs (IQR: 47-64 yrs) and 55.8% were female. Locally Integrated (35.4%) or Family Dependent (30.3%) were the most common PANT social network types, followed by a mixed group (having characteristics of two or more social network types) (30.5%). Private Restricted (2.1%), Locally Self-Contained (1.2%), and Wider Community-Focussed (0.4%) types were uncommon. Adults with aggregated integrated network groups (36.1%) and aggregated less-integrated group (36.0%) were near equal, while others were non-aggregable. Although Wider Community-Focused type showed a significantly better QoL ({beta}= 8.69, 95%CI: 4.10 to 13.27), the association between social networks and QoL were subdued when controlled for morbidity factors, and hence no significant associations were observed between other types or the aggregated groups. Conclusion: Although having integrated social networks lead to a better QoL, morbidity has a greater effect on the QoL among older adults in Nouna and hence, investing more on improving the physical and mental health needs appears more beneficial.

11.
arXiv (CS.AI) 2026-06-11

Erased but Not Forgotten: How Backdoors Compromise Concept Erasure

arXiv:2504.21072v3 Announce Type: replace-cross Abstract: The expansion of text-to-image diffusion models has raised concerns about harmful outputs, from fabricated depictions of public figures to sexually explicit imagery. To mitigate such risks, prior work has proposed concept erasure methods that aim to sever unwanted concepts from the model via fine-tuning, yet it remains unclear whether these approaches truly remove all links to the harmful concept or merely conceal superficial connections. In this work, we reveal a critical vulnerability, the Erasure Evasion Backdoor (EEB): an adversary binds a backdoor trigger to a concept slated for removal, and this malicious link survives subsequent erasure. We show that both black-box and white-box adversaries can instantiate this threat. Across six state-of-the-art erasure methods, including robust ones that explicitly search for alternative representations of the target concept, EEB consistently exposes harmful content: up to 82% success against celebrity-identity unlearning, up to 94% for object erasure, and up to 16 times amplification of explicit-content exposure. While EEB uncovers a blind spot in current erasure methods, it also provides a diagnostic tool for stress-testing future concept erasure techniques.

12.
arXiv (CS.CL) 2026-06-18

Montreal Forced Aligner and the state of speech-to-text alignment in 2026

The Montreal Forced Aligner (MFA) was released in 2016 and has since become the most widely used tool for forced alignment in research and industry. In the decade since, MFA has undergone substantial development, including expanded coverage across more languages and dialects using larger open-source datasets, harmonized IPA dictionaries, model adaptation, cross-language phone remapping, and support utilities. This paper documents MFA 3.0's developments since version 1.0 and evaluates MFA's performance across English, Japanese, and Korean, benchmarked against classic and neural forced aligners. MFA 3.0 achieves state-of-the-art or near state-of-the-art performance across all four benchmark datasets with mean boundary errors below 15 ms. Adaptation and cross-language remapping are effective for languages outside MFA's training distribution, and pronunciation probability modeling and phonological rules provide gains in specific conditions.

13.
arXiv (CS.AI) 2026-06-16

Trust-Region Diffusion Policies for Massively Parallel On-Policy RL

arXiv:2606.15260v1 Announce Type: cross Abstract: Reinforcement learning with massively parallel simulations has become a standard framework for developing robust, deployable policies; however, most existing approaches still rely on simple Gaussian policy parameterizations. Diffusion models provide a more expressive policy class and have shown strong performance on challenging control problems, yet most diffusion-based RL methods are designed for offline or off-policy training. In this work, we ask whether diffusion policies can be trained effectively in the massively parallel, on-policy regime. To this end, we introduce Trust-region Diffusion Policies (TruDi), which enables diffusion policies for on-policy RL with massively parallel simulations. This setting is particularly challenging because the data distribution changes quickly across updates, making stable training with complex policies difficult. TruDi addresses this by integrating a trust-region optimization rule to enforce a KL-divergence constraint over the entire diffusion trajectory. Empirically, we evaluate TruDi on a diverse set of 4 massively parallel RL benchmarks comprising a total of 73 tasks. Across these tasks, TruDi consistently outperforms or is on-par with strong baselines on standard tasks and achieves clear gains on more challenging humanoid control tasks, establishing a strong new baseline for massively parallel on-policy RL.

14.
arXiv (CS.LG) 2026-06-17

Conditional Attribution for Root Cause Analysis in Time-Series Anomaly Detection

arXiv:2604.17616v3 Announce Type: replace Abstract: Root cause analysis (RCA) for time-series anomaly detection is critical for the reliable operation of complex real-world systems. Existing explanation methods often rely on unrealistic feature perturbations and ignore temporal and cross-feature dependencies, leading to unreliable attributions. We propose a conditional attribution framework that explains anomalies relative to contextually similar normal system states. Instead of using marginal or randomly sampled baselines, our method retrieves representative normal instances conditioned on the anomalous observation, enabling dependency-preserving and operationally meaningful explanations. To support high-dimensional time-series data, contextual retrieval is performed in learned low-dimensional representations using both variational autoencoder latent spaces and UMAP manifold embeddings. By grounding the retrieval process in the system's learned manifold, this strategy avoids out-of-distribution artifacts and ensures attribution fidelity while maintaining computational efficiency. We further introduce confidence-aware and temporal evaluation metrics for assessing explanation reliability and responsiveness. Experiments on the SWaT and MSDS benchmarks demonstrate that the proposed approach consistently improves root-cause identification accuracy, temporal localization, and robustness across multiple anomaly detection models. These results highlight the practical utility of conditional attribution for explainable anomaly diagnosis in complex time-series systems. Code and models are available at: https://github.com/dfki-av/Conditional-Attribution-for-Root-Cause-Analysis-in-Time-Series-Anomaly-Detection.

15.
arXiv (CS.AI) 2026-06-12

ERTS: Adversarial Robustness Testing of Ethical AI via Semantic Perturbation in a Bounded Consequence Space

arXiv:2606.13282v1 Announce Type: new Abstract: As AI systems are deployed in high-stakes ethical contexts such as healthcare triage, autonomous vehicle control, and employment screening, formal methods for evaluating their robustness against adversarial manipulation of ethical reasoning remain underdeveloped. This paper introduces the Ethical Robustness Testing System (ERTS), a closed-pipeline framework that: (1) encodes ethical dilemmas into a 22-dimensional Ethical Consequence Space (ECS) grounded in established ethical theory; (2) applies 17 semantic perturbation functions subject to 6 validity constraint classes including a novel semantic coherence constraint; (3) measures decision deviation via a 4-component Ethical Instability Index (EII); and (4) produces domain-adaptive pre-deployment robustness assessment verdicts. We evaluate 4 structured baseline models and 2 production LLMs (Gemini 2.0 Flash and Llama 3.2) across 50 ethical scenarios spanning 8 deployment domains, generating 1,500 adversarial test cases. Results demonstrate that only 33% of models achieve assessment clearance, with the local Llama-3.2 model proving particularly vulnerable to fairness corruption and information degradation attacks (ERS = 0.737). To the best of our knowledge, no existing framework combines a bounded ethical consequence space, semantic coherence constraints, and domain-adaptive assessment in a single adversarial testing pipeline.

17.
arXiv (CS.CL) 2026-06-16

Islamic Large Language Models: From Knowledge Acquisition to Trustworthy and Hallucination-Resistant AI

Large language models (LLMs) are increasingly used for knowledge-intensive question answering, including religious and legal questions. Islamic knowledge is a particularly demanding setting: answers are expected to be grounded in authoritative sources, citations must be exact, Arabic varieties differ substantially from the language of classical sources, and legitimate jurisprudential disagreement must be represented rather than collapsed into a single answer. This survey reviews the emerging field of Islamic LLMs and trustworthy Islamic AI. We organize the literature around Arabic NLP and Arabic-centric LLMs, Islamic NLP resources, Qur'anic question answering, Islamic knowledge benchmarks, retrieval-augmented generation, Islamic legal reasoning, inheritance reasoning, hallucination evaluation, and trustworthiness. We argue that fluency in Arabic is not sufficient for Islamic AI. Reliable systems require curated sources, retrieval and verification modules, citation-aware generation, madhhab-aware reasoning, human expert evaluation, and benchmarks that measure not only answer accuracy but also faithfulness, source validity, and reasoning quality. The survey concludes with a research agenda for hallucination-resistant Islamic AI systems.

18.
PLOS Medicine 2026-06-23

Prevalence and epidemiological patterns of <i>Neisseria gonorrhoeae</i> infection in sub-Saharan Africa, 1964–2025: Systematic review, meta-analyses, and meta-regressions

Authors:

by Aisha Osman, Hina Akram, Bayan Alemrayat, Sumaya Al-Maraghi, Manale Harfouche, Laith J. Abu-Raddad Background Neisseria gonorrhoeae (NG) infection is a global health concern because of its morbidity and increasing antimicrobial resistance. Sub-Saharan Africa is believed to carry a disproportionately high burden of NG infection, but the epidemiology of NG infection in this region has not been comprehensively synthesized. This study systematically reviewed and analyzed NG prevalence in sub-Saharan Africa to characterize prevalence patterns and identify populations at risk. Methods and findings A systematic review was conducted and reported following PRISMA guidelines. Embase, PubMed, Scopus, and Web of Science were searched from inception to June 4, 2025. Eligible studies reported NG prevalence in sub-Saharan Africa. Random-effects meta-analyses generated pooled prevalence estimates, and random-effects meta-regression analyses identified associations and sources of heterogeneity.Nine hundred fifty publications contributed 1,604 prevalence measures spanning 1964–2025. In the general population, pooled urogenital prevalence was 3.2% (95% confidence interval (CI): 2.9–3.5), with substantial between-study heterogeneity and a wide prediction interval, indicating considerable variation in prevalence across settings. Prevalence was high in key populations: among female sex workers, 11.5% (95% CI: 9.9–13.2) for urogenital and 2.0% (95% CI: 0.4–4.5) for anorectal infection; and among men who have sex with men, 2.8% (95% CI: 2.4–3.3) for urogenital, 8.3% (95% CI: 5.8–11.0) for anorectal, and 5.7% (95% CI: 3.6–8.3) for oropharyngeal infection. Symptomatic men exhibited high urogenital prevalence (51.5%; 95% CI: 47.5–55.5), and symptomatic women showed 9.0% (95% CI: 7.7–10.4). Among women with adverse pregnancy or birth outcomes, urogenital prevalence was 8.6% (95% CI: 5.3–12.6). Meta-regression analyses explained over half of the variability in prevalence, showing a long-term decline of 1% per year, a clear population type gradient, subregional differences, and decreasing prevalence with increasing age, but no variation by sex. These findings may be affected by variability in data availability across countries, anatomical sites, and population groups, as well as heterogeneity across included studies. Conclusions NG prevalence remains markedly high in this region but has declined over time. These findings highlight the need for strengthened surveillance, expanded prevention and diagnostic strategies, and continued monitoring of gonococcal antimicrobial resistance to support effective control efforts in sub-Saharan Africa.

19.
Nature (Science) 2026-06-17

Optical fibre gripper for high-performance 3D micromanipulation

Authors:

Optical tweezers offer precise, non-contact control, but operate in a limited force regime and impose strict requirements on the characteristics of the targets as well as the environmental conditions1–4. Millimetre-scale mechanical tweezers can offer higher gripping force but are not suitable for precise manipulations5–11. Integrating microgrippers directly at the optical fibres provides a new approach for precise micromanipulation. However, existing fibre-integrated tweezers still face challenges in achieving high-performance manipulation of micro-objects (for example, single cells) within narrow spaces, mainly due to simplified architectures, constrained designs and millimetre-scale footprints12–14. Here we report a three-dimensional (3D) optical fibre gripper (OFG), which is fabricated by two-step, two-photon polymerization. The OFG consists of rigid photoresist microclaws and soft thermoresponsive hydrogel muscle doped with silver nanoparticles, and its size is only 38 × 38 × 61 μm3. The OFG exhibits a force-to-mass ratio of about 340 μN mg−1, outperforming previously reported fibre-integrated tweezers by one to two orders of magnitude. The OFG can manipulate opaque particles, irregular micromechanical components and diverse single-cell types. We further demonstrated its potential in 3D microassembly of complex microdevices (bearings, shafts and gearboxes) and biomimetic sampling in the narrow environment (&lt;300 μm). These results position the OFG as a compact fibre-tip manipulator for 3D micromanipulation, offering reversible and tunable gripping in an intermediate force regime between optical field trapping and millimetre-scale mechanical tweezers. A miniature three-dimensional optical fibre gripper enables powerful, precise micromanipulation of particles and single cells in confined spaces, bridging the gap between optical and mechanical tweezers.

20.
arXiv (CS.LG) 2026-06-15

Nonlocal Bayesian Modeling of Continuous Spatio-Temporal Dynamics

arXiv:2606.14313v1 Announce Type: cross Abstract: Real-world spatio-temporal forecasting must handle irregular time points, spatially sparse observations, and the need for uncertainty quantification. This setting is often further compounded by nonlocal interactions (long-range spatial coupling). Modeling continuous-space, continuous-time nonlocal dynamics naturally leads to infinite-dimensional integro-differential equations (IDEs), making principled Bayesian inference intractable. We propose the NonLocal Bayesian Spatio-Temporal model (NLBST), a hierarchical Bayesian framework for continuous spatio-temporal fields that learns explicit nonlocal coupling while retaining tractable inference. NLBST represents the latent field via a coordinate-based spatial basis expansion and models the coefficient process with a continuous-time ODE whose learnable linear operator corresponds to a Galerkin reduction of a nonlocal IDE; a Neural ODE residual captures additional nonlinear dynamics. A linear-Gaussian observation model enables Kalman-style sequential updates under missing and irregular observations, while the spatial basis representation enables inductive prediction at unmeasured locations without retraining. Global parameters are learned via variational inference, and uncertainty is handled through a Bayesian hierarchy. Experiments on synthetic and real-world datasets demonstrate strong forecasting and spatial generalization with well-calibrated uncertainty, yielding substantial gains over baselines in strongly nonlocal and partially observed regimes.

21.
medRxiv (Medicine) 2026-06-16

Using visual biofeedback to reduce step length error at fast walking speeds is feasible after stroke

Background and Purpose: Walking after stroke is often characterized by persistent biomechanical impairments and reduced walking capacity. While visual biofeedback can improve gait mechanics and fast walking can enhance capacity, it is unclear whether individuals post-stroke can effectively use biofeedback at higher walking speeds to address both deficits simultaneously. This study examined the effects of walking speed on the ability of participants with chronic stroke to reduce step length (SL) errors using visual biofeedback. Methods: Sixteen individuals with chronic stroke walked on a treadmill at slow, self-selected, and fast speeds with and without visual SL biofeedback. Absolute SL error relative to individualized targets was calculated for paretic and non-paretic limbs. Linear mixed-effects models with piecewise linear splines assessed the effects of speed, limb, and feedback condition. Post hoc comparisons were performed for significant interactions. Results: At lower speeds, increasing speed reduced SL error in both limbs (p < 0.001). At higher speeds, the effects of speed were dependent on limb and condition (p < 0.001). Paretic SL error increased with speed without feedback but remained stable with feedback (p < 0.001). Non-paretic SL error decreased with speed regardless of condition. SL error was greater in the paretic limb overall (p < 0.001). Discussion and Conclusions: Fast walking alone did not reduce paretic SL errors. Participants with chronic stroke can effectively use visual biofeedback to reduce paretic SL errors at higher speeds, supporting its integration into high-intensity gait training to simultaneously treat biomechanical impairments and walking capacity deficits after stroke.

22.
bioRxiv (Bioinfo) 2026-06-12

A Graph-based QSAR Modeling Pipeline for Predicting In vitro PubChem Assays and In vivo Human Hepatotoxicity: Mechanistic Analysis of Caspase-3/7 Activation

Background: Caspase-3 and -7 are key effector caspases in the apoptotic pathway, a form of programmed cell death, and their activities serve as a well-established biomarker for evaluating environmental chemical toxicity and informing chemical risk assessment. Loss of mitochondrial membrane potential is a key event in the activation of Caspase-3/7 signaling and the subsequent induction of apoptosis. Therefore, simultaneous assessment of mitochondrial membrane potential and Caspase-3/7 activity enables elucidation of the mechanisms and pathways through which apoptosis is initiated. Rapid and accurate assessment of the potential toxicity of environmental chemicals and drugs remains a major challenge. Quantitative Structure Activity Relationship (QSAR) modeling have been widely used for toxicity prediction. Graph-based approaches encode compounds directly as molecular graphs, allowing structure-activity relationships to be learnt from molecular topology without the information loss in binary fingerprints. While advanced graph models such as graph transformers (GTs) have shown outstanding performance in many domains, they have not been fully leveraged in QSAR modeling on Caspase and mitochondrial toxicity. Methods: We propose a QSAR modeling pipeline that encompasses assay data preprocessing, feature representations (fingerprints and molecular graphs), and benchmarking machine learning (ML) models, including classic ML models, graph neural networks (GNNs), GTs, and their consensus ensembles. Based on in vitro Caspase and mitochondrial assays in PubChem, we applied the pipeline to predict Caspase-3/7 activation and mitochondrial membrane potential (MMP). Beyond in vitro assays, we also built in vivo QSAR modeling for FDA Drug-Induced Liver Injury (DILI) gold standard on human hepatotoxicity. Moreover, mechanistic analysis on Caspase-3/7 activation was conducted by comparing with MMP disruption to identify chemical substructures that may be responsible for dual activations. We also investigated cell-line-specific responses by identifying structural motifs that selectively induce Caspase-3/7 activation in individual cell lines.Results:Experimental evaluations show that GTs and GNNs outperformed classic ML models when the number of active compounds is large, such as MMP disruption, while classic ML models and GTs performed good for highly imbalance data with limited active compounds, such as Caspase-3/7 activation. For DILI prediction, the full consensus model achieved the highest AUC 0.69 and Graphormer had the highest F1 score 0.79, both surpassing the previous best model with AUC 0.63 and F1 0.65 with a large margin.Our mechanistic analysis shows that phenolic compounds bearing a para-hydroxyphenyl motif, as well as members of the lipophilic chain family with long alkyl chains can trigger the collapse of MMP, leading to the activation of caspases-3 and -7. Human embryonic kidney (HEK293) was the only cell line with a distinct structural motif: 1,1-dichloroethane and chlorobenzene. Human neuroblastoma (SK-N-SH) is uniquely impacted by an epoxide fragment and rat hepatoma (H-4-II-E) is uniquely impacted by a tetramethylcyclohexene motif and an acetaldehyde fragment.Conclusions:The proposed pipeline for QSAR modeling, including data preprocessing, feature representations, and incorporation of advanced graph ML approaches, is highly effective in predicting not only on Caspase-3/7 activation and membrane potential collapse, but also on FDA DILI human hetatotoxicity. As future research directions, we will leverage extra information, e.g., biological activity and findings in existing toxicity literature, and recent advances in large language models and agentic AI to further improve the predictive performance and enable a sensitive and specific framework for assessing human hepatotoxicity of environmental compounds.

23.
arXiv (CS.AI) 2026-06-18

NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation

arXiv:2606.18271v1 Announce Type: new Abstract: As Earth Observation data generation outpaces downlink bandwidth and human-in-the-loop processing, a widening gap has emerged between onboard collection and actionable ground intelligence. This paper presents NAVI-Orbital, a software system deployed on a Low Earth Orbit (LEO) spacecraft. On April 16, 2026, NAVI-Orbital achieved what is, to the authors' knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. NAVI-Orbital uses a local vision-language model (Gemma 3) to classify each captured scene, produce a text description of its content and the relationships between its features, and respond to operator follow-up via natural-language dialogue. The system is re-tasked through plain-English prompts in place of conventional command sequences, and is orchestrated by a graph-based state machine (LangGraph) coordinating dedicated agents for detection and dialogue. Results across ground benchmarking (88.16% accuracy on the 7,960-image curated AID benchmark), Flatsat validation, and live in-orbit captures of newly acquired, previously unseen Earth imagery (including uncorrected YAM-9 imagery, processed onboard with hardware-accelerated GPU inference and no fine-tuning for the flight instrument) demonstrate the feasibility of running foundation models on satellite-class edge computers to invert the conventional acquire-then-downlink-everything bandwidth profile through semantic compression of Earth observations in-orbit.

24.
medRxiv (Medicine) 2026-06-15

High Demand, Low Possession: Dilemmas and Strategies for Research Capability Cultivation in Clinical Medicine Postgraduates

Most previous studies have examined medical postgraduate research training from a single dimension, lacking a full-chain analysis that integrates capability demand, actual possession, obstacles, and output. Consequently, the measurement of capability gaps and the analysis of underlying training model deficiencies remain insufficient. To address this gap, we administered a self-designed multidimensional questionnaire to 86 clinical medicine postgraduates at a medical school, covering research cognition, interest, capability demand and possession, participation pathways, difficulties, and outputs. The aim was to systematically characterize the current situation, identify problems, and propose optimization strategies. Over 90% of participants expressed interest in research, yet only 1.16% self-rated as very knowledgeable. The largest demand-possess gap was for writing and publication (86.05% vs. 16.28%), followed by independent research capability (75.58% vs. 11.63%). A total of 59.30% cited lack of foundational knowledge, making experiments very difficult, as the greatest challenge, and 66.28% had no research achievements. The primary source of research topics was supervisor assignment (54.65%), with only 4.65% choosing topics independently. No statistically significant differences were found across grades or training types (P > 0.05). These findings reveal a structural high demand, low possession gap in medical postgraduate research training, with early research experience deficit and a passive research model as key constraining factors. Accordingly, an integrated bachelor-postgraduate progressive research competency training system is proposed.

25.
arXiv (CS.CL) 2026-06-12

Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review

The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks for adversarial manipulation, especially given the multimodal nature of scientific papers where figures, not just text, convey core evidence. This creates a significant gap: current robustness studies on AI peer-review are overwhelmingly text-only. Moreover, the problem is distinct from standard jailbreaking, as a peer-review attack seeks to induce a domain-specific, targeted failure (e.g., "inflate this score") rather than a general safety policy violation, for which no practical defenses exist. To address this, we introduce PaperGuard, the first comprehensive benchmark designed to systematically evaluate and defend AI-generated peer-review against these domain-specific, cross-modal attacks. Our framework is built on three pillars: (1) a new multimodal peer-review dataset spanning multiple scientific domains; (2) a unified suite of attacks, including black-box prompt injections and white-box perturbations, specifically designed to target both text (GCG) and figures (PGD); and (3) a practical defense, motivated by the long-context challenge of academic papers, that uses chunk-based embedding search to efficiently localize and mitigate harmful instructions. Our extensive experiments, conducted across state-of-the-art models, confirm that AI reviewers are pervasively vulnerable. PaperGuard establishes the foundational benchmark, protocols, and actionable defense necessary to pioneer trustworthy, attack-resilient AI-assisted scholarly reviewing.