Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-16

PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents

Multi-turn tool-use agents must reason, call tools, and adapt to observations across several interaction turns. Post-training such agents is challenging, as reinforcement learning often suffers from sparse rewards and weak credit assignment despite matching the prompt-only inference setting, while supervised fine-tuning on expert traces provides dense process supervision but can over-constrain the model to fixed trajectories. To tackle this, we propose PACT, a Privileged trAce Co-Training framework for multi-turn tool-use agents. The key idea is to use expert traces only as training-time optimization signals rather than rollout-time hints. PACT keeps rollout generation prompt-only, then uses expert traces to guide optimization through two complementary signals: a trace-conditioned RL surrogate that evaluates prompt-only rollouts under expert-trace context, and a component-aware SFT loss that supervises reasoning prefixes and tool-calls with annealed strength. To reduce over-reliance on the training-only trace context, PACT further introduces a prompt-only anchoring. We also provide a latent-trace view that connects the two trace-based objectives and explains how expert traces can guide optimization without being used during rollout generation. Experiments on FTRL, BFCL, and ToolHop show that PACT consistently improves over strong SFT- and RL-based baselines, highlighting the value of privileged trace co-training for multi-turn tool-use learning.

02.
medRxiv (Medicine) 2026-06-24

Repetitive Transcranial Magnetic Stimulation over Primary Somatosensory Cortex for Upper Limb Function in Stroke: An Exploratory Randomized Controlled Trial

Background: Stroke often causes Upper Limb (UL) functional impairments. The Primary Somatosensory Cortex (S1) plays an important role in motor learning. Repetitive Transcranial Magnetic Stimulation (rTMS) over S1 could enhance UL recovery. We aimed to explore its preliminary effects on UL motor activity and function post-stroke. Methods: An exploratory parallel-group randomized controlled trial in people with chronic stroke (>3 months) and moderate hemiparesis was conducted. Participants received 20 sessions of active or sham 5Hz rTMS over affected S1, with Robot-Assisted Therapy and Task-Oriented Training, 5 days/week for 4 weeks. The primary endpoint was UL motor activity (Action Research Arm Test, ARAT). Secondary measures were the UL Fugl-Meyer Assessment (UL-FMA) and sensory outcomes. Results: The baseline-adjusted mean difference (MD) in ARAT was 4.05 points [0.78, 7.33], favoring active stimulation. Secondary measures did not favor active stimulation (UL-FMA: MD = 2.62 [-1.51, 6.76]; sensory outcomes showed no between-group differences). Conclusion: High-frequency rTMS over S1 may enhance UL motor activity (ARAT), but no evidence for motor impairment (UL-FMA) or sensory domains was found. Compensation rather than restoration may underlie this improvement. Stimulation targets should match the intended recovery domain, although larger trials are needed to confirm these preliminary findings.

03.
arXiv (CS.CL) 2026-06-19

MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization

Real-world clinical decision support requires reasoning over heterogeneous and longitudinal patient information rather than answering isolated medical questions. However, current medical large language models and retrieval-augmented generation systems often rely on single-step prompting or retrieval, which can be fragile when clinical evidence is distributed across long electronic health records, medical images, sensor streams, guidelines, and referral constraints. This paper proposes MedRLM, a Recursive Multimodal Health Intelligence framework for long-context clinical reasoning, sensor-guided screening, and community-to-tertiary referral support. Instead of compressing all patient information into one prompt, MedRLM treats the patient case as an external clinical environment that can be recursively inspected, decomposed, retrieved, verified, and synthesized. The framework coordinates specialized agents for clinical text, longitudinal EHR, medical imaging, physiological sensor signals, guideline retrieval, uncertainty auditing, and referral planning. It further introduces a Clinical Evidence Graph Memory to connect patient-specific observations with retrieved evidence, standardized definitions, sensor-derived biomarkers, and referral criteria. A sensor-guided recursive triggering mechanism activates deeper reasoning when abnormal physiological or behavioral patterns are detected, while uncertainty-gated refinement supports clinician review for high-risk or low-confidence cases. We also outline a real-data evaluation design using public and credentialed clinical datasets spanning EHR, radiology, ECG, ICU time series, and referral-proxy outcomes. MedRLM aims to move medical AI from static question answering toward auditable, multimodal, and workflow-aware clinical decision support.

04.
Nature (Science) 2026-06-15

Daily briefing: Iron-Age human bones were made into tools before interment

Authors:

Newly uncovered bones hint at how Iron Age Britons treated their dead. Plus, AI models have failed to beat human mathematicians at research-level problems and the everyday items that make great scientific tools. Newly uncovered bones hint at how Iron Age Britons treated their dead. Plus, AI models have failed to beat human mathematicians at research-level problems and the everyday items that make great scientific tools.

05.
arXiv (CS.LG) 2026-06-15

DTVEM-RE: A Hierarchical Random-Effects Extension of the Differential Time-Varying Effect Model for Person-Specific Multi-Lag Estimation in Intensive Longitudinal Data

arXiv:2606.14116v1 Announce Type: new Abstract: The Differential Time-Varying Effect Model (DTVEM) of Jacobson et al. (2019) is a popular tool for finding the best time lag in intensive longitudinal data, but it assumes everyone shares the same lag structure. The original authors named fixing this as future work, and it clashes with the premise of modern clinical research, which is that people differ. We present DTVEM-RE, an extension that lets each person have their own lag coefficients, with two versions of the confirmatory step: a discrete-time hierarchical Bayesian VAR in Stan, which pools across people and gives calibrated uncertainty, and a continuous-time per-person Ornstein-Uhlenbeck model in ctsem, which handles unevenly spaced beeps directly. We report four results. A simulation shows the Bayesian version recovers the between-person spread tau_a with bias below 0.01 and coverage of 90 to 93 percent. On the Fisher et al. (2017) EMA dataset (N=40), person-specific lag-1 effects vary by an order of magnitude across three mood items, the Bayesian and GAMM estimates agree closely (r=0.87 to 0.92), and DTVEM-RE gives the best one-step-ahead prediction among four discrete-time methods. A multi-lag version shows all nine tau_k values have credible intervals excluding zero, and the lag where people differ most changes across items, something lag-1-only methods like mlVAR cannot detect. Finally, the two versions agree almost exactly on person-specific lag-1 estimates (r >= 0.995), differing only as shrinkage predicts. DTVEM-RE is, to our knowledge, the first person-specific implementation of DTVEM-style lag detection, and it contains standard DTVEM as a special case.

06.
arXiv (CS.AI) 2026-06-18

From Specification to Execution: AI Assisted Scientific Workflow Management

arXiv:2606.18425v1 Announce Type: cross Abstract: Scientific workflow management systems (WMS) support scalable and reproducible execution of complex pipelines, but workflow design, implementation, and debugging remain largely manual and require significant expertise. Recent approaches using large language models (LLMs) show promise for workflow generation from natural language, but often rely on direct code synthesis, which limits transparency, reproducibility, and integration with workflow systems. We present an AI-assisted approach to scientific workflow management that combines specification-driven workflow generation, automated debugging, and distributed execution. The method introduces a structured specification phase that separates workflow intent, design, and implementation, allowing validation prior to code generation. We also develop an LLM-based debugging agent that diagnoses and resolves failures across multiple system layers. To support distributed execution and user interaction, we integrate Pegasus, a widely used WMS, with a Model Context Protocol (MCP) layer, providing a unified interface for workflow submission, monitoring, and control. We evaluate the approach using a federated learning workflow for medical imaging, chosen for its parallel, iterative, and dependency-intensive structure. The system generated and executed large-scale workflows with thousands of jobs, reduced debugging effort, and allowed non-expert users to construct workflows with expert-level design patterns. These results indicate that end-to-end AI-assisted workflow generation and execution is feasible, and point toward AI-driven platforms for managing the scientific workflow lifecycle.

07.
arXiv (CS.CL) 2026-06-17

Guidelines for the Annotation and Visualization of Legal Argumentation Structures in Chinese Judicial Decisions

This Guideline presents a systematic and operationalizable annotation framework for representing legal argumentation structures in judicial decisions. Grounded in theories of legal reasoning and argumentation, the framework aims to reveal the logical organization of judicial reasoning and provide a reliable foundation for computational analysis. At the element level, the Guideline distinguishes between the non-propositional layer and the propositional layer. The non-propositional layer consists of two elements: Issue and Non-argumentative Component. At the propositional level, the Guideline defines four proposition types: General Normative Judgment, Particular Normative Judgment, General Factual Judgment, and Particular Factual Judgment. At the relational level, five relation types are defined to represent argumentative structures: Support, Attack, Joint, Match, and Identity. These relations capture positive and negative argumentative connections, conjunctive reasoning structures, correspondences between legal norms and case facts, and identity or semantic equivalence between propositions. The Guideline further specifies formal representation rules and visualization conventions for both basic and nested structures, enabling consistent visualization of complex argumentation patterns. In addition, it establishes a standardized annotation workflow and consistency control mechanisms to ensure the reproducibility and reliability of annotated data. By providing a clear conceptual model, formal representation rules, and practical annotation procedures, this Guideline supports large-scale analysis of judicial reasoning and future research in legal argument mining, computational modeling of legal reasoning, and AI-assisted legal analysis.

08.
arXiv (CS.AI) 2026-06-15

Learning High Coverage Discriminative Parsimonious Rulesets

arXiv:2606.14156v1 Announce Type: cross Abstract: Learning systems based on IF-THEN rule representations readily offer interpretability, making them a crucial focus in contemporary AI research. A key objective for such rule sets is to achieve both high discriminative power and interpretability. While existing state-of-the-art algorithms implicitly prioritize predictive accuracy, they often fall short on one or more quality metrics that ensure interpretability, such as coverage and parsimony of rule sets. Motivated by this, this paper propose the development of CDPR, which aims to create highly accurate and interpretable rule sets for classification problems. To the best of our knowledge, this represents the first attempt to establish such an approach. In this study, we introduce two algorithms rooted in submodular maximization, which not only provide provable guarantees on coverage but also yield rule sets that are both discriminative and parsimonious. We empirically demonstrate that rule sets learned through our approaches achieve higher accuracy and interpretability and has more than a 2.5-fold improvement in average coverage rates when compared to the next best algorithm.

09.
arXiv (CS.LG) 2026-06-19

Multi-Task Bayesian In-Context Learning

arXiv:2606.20538v1 Announce Type: new Abstract: Bayesian predictive inference provides a principled framework for uncertainty quantification, data efficiency, and robust generalization. However, exact inference is often intractable, and scalable approximations may remain computationally expensive or require restrictive modeling assumptions that degrade predictive performance. Prior-Data Fitted and in-context models have recently emerged as an amortized alternative by learning to map datasets directly to predictive distributions, but existing approaches are tightly coupled to the support of the training prior and lack explicit mechanisms for adapting to new priors at test time, resulting in limited robustness under distribution shift. We introduce a multi-task in-context learning framework for amortized hierarchical Bayesian predictive inference that explicitly represents prior information as a prefix of in-context datasets. A transformer trained on sequences of prior and target tasks learns to adapt its predictions across families of priors. On a suite of evaluations with increasing difficulty, including out-of-meta-distribution priors and priors with high-dimensional latent structures, our method matches oracle Bayesian predictors while being orders of magnitude faster. We further demonstrate its practical relevance on a real-world spatiotemporal temperature prediction benchmark. Code is available at https://github.com/martianmartina/multi-task-bayesian-icl/.

10.
arXiv (CS.CL) 2026-06-19

HydraHead: From Head-Level Functional Heterogeneity to Specialized Attention Hybridization

The quadratic complexity of attention poses a critical bottleneck for long-context processing, spurring interest in hybrid attention designs. Most open-source hybrid models adopt a layer-wise strategy. Yet, prior work has noted the inherent difficulty of integrating Linear Attention (LA) with Full Attention (FA), suggesting that the design space of attention hybridization remains underexplored. To probe this space, we conduct interpretability analysis and observe that layers exhibit block-wise functional similarity, while individual heads within the same layer display distinct functional specialization despite sharing input features. This head-level heterogeneity suggests that the head dimension provides a natural and principled granularity for fusing heterogeneous attention signals. Building on this insight, we introduce HydraHead, a novel architecture that hybridizes FA and LA along the head axis. HydraHead features two key innovations: (1) an interpretability-driven selection strategy that identifies retrieval-critical heads and preserves FA only for them, and (2) a scale-normalized fusion module that reconciles the distributional gap between FA and LA head outputs. By leveraging a three-stage transfer pipeline with parameter reuse and distillation, we achieve high-performance hybrid models with minimal training overhead. Under a unified training setup, HydraHead outperforms other hybrid designs in long-context tasks while maintaining strong general reasoning. With interpretability-driven head selection, it matches a 3:1 layer-wise hybrid's long-context performance at a 7:1 LA-to-FA ratio. Crucially, trained on only 15B tokens, HydraHead achieves over 69% improvement over the baseline at 512K context length, approaching Qwen3.5, a leading model of comparable size with a native context length of 256K. This highlights the significant scaling potential of head-level hybridization.

11.
bioRxiv (Bioinfo) 2026-06-18

novelBGC: An interactive dual-score framework for biosynthetic gene cluster novelty assessment and candidate prioritisation

Genome mining now yields tens of thousands of putative biosynthetic gene clusters (BGCs) per project, yet, separating genuinely novel candidates from rediscoveries of known compounds remains the rate-limiting step before experimental validation. Single-axis prioritisation tools, antiSMASH similarity, BiG-FAM GCF distance, and self-resistance-enzyme (SRE) filters such as ARTS, each surface a different facet of evidence, yet their isolated use systematically over-ranks rediscovery-prone BGCs and overlooks genuinely orphan clusters. We present novelBGC, a web-hosted framework that converts these disparate outputs into two deliberately non-inverse continuous metrics per BGC, a Novelty (N) and a Reference Similarity (RS) score which together define a 2D decision plane that resolves rediscoveries, divergent family members, contig-edge artefacts, and uncharted chemistry with interactive visualisations, with all component weights user-tuneable at submission. Retrospective validation across three independent experimental datasets demonstrates the utility of the framework for candidate prioritization. Within the first 186-BGC SRE-guided cloning study, every confirmed bioactive product fell within the low-to-mid N band whereas 55 high-N (N [≥] 0.50) BGCs were never selected. Moreover, in the other two studies, it correctly prioritised the fully orphan lariocidin BGC of Paenibacillus sp. M2 and the divergent within-family indanopyrrole-A idp BGC of Streptomyces sp. CNX-425. Together, these case studies demonstrate that the joint (N, RS) space facilitates prioritization decisions that are difficult to achieve using any single criterion alone. from identical input data. novelBGC requires no command-line expertise, no local tool installation, and no manual integration of intermediate output formats, addressing a well-documented accessibility barrier for wet-laboratory researchers engaging with genome-mining workflows. novelBGC is freely available at https://project.iith.ac.in/sharmaglab/novelbgc/.

12.
arXiv (CS.AI) 2026-06-16

RollArt: Disaggregated Multi-Task Agentic RL Training at Scale

arXiv:2512.22560v2 Announce Type: replace-cross Abstract: Agentic Reinforcement Learning (RL) trains LLMs through multi-turn interactions with environments, producing workloads that mix compute-bound prefill, bandwidth-bound decoding, CPU-heavy environment execution, and bursty reward evaluation. Existing systems either colocate all stages on a single GPU cluster or decouple them only at a coarse granularity, overlooking hardware heterogeneity and incurring substantial synchronization overhead across stages. We present ROLLART, a system for multi-task agentic RL on disaggregated infrastructure. ROLLART maps each pipeline stage to best-fit hardware, routing prefill-heavy tasks to compute-optimized GPUs, decode-heavy tasks to bandwidth-optimized GPUs, and environments to CPU clusters. It decouples rollout at the trajectory level, allowing generation, environment interaction, and reward scoring to proceed independently, so that slow or failed environments never block the others. ROLLART offloads stateless reward computation to serverless infrastructure and overlaps rollout with training via staleness-bounded asynchronous weight synchronization. Our results demonstrate that ROLLART effectively improves training throughput and achieves 1.31–2.05 \(\times\) training time reduction compared to various RL systems. We also evaluated ROLLART by training a hundreds-of-billions-parameter MoE model for Qoder product on an Alibaba cluster with above 3,000 GPUs, demonstrating its stability and scalability.

13.
bioRxiv (Bioinfo) 2026-06-11

SPARK: A Systems-level Computational Framework for Reconstructing Transcriptomic State Organisation in Lung Adenocarcinoma

Lung adenocarcinoma (LUAD) exhibits substantial molecular heterogeneity, which complicates tumour stratification and limits the ability of mutation-centric models to capture tumour behaviour and predict patient outcomes. This study investigates whether coordinated transcriptomic programs can provide a systems-level representation of tumour states. Bulk RNA-sequencing data from the TCGA-LUAD cohort were analysed to reconstruct pathway-level transcriptomic organisation using a stability-optimised network framework (SPARK). This analysis identified eight transcriptomic modules representing coordinated biological processes active across tumours. Module activity scores were subsequently used to derive a composite Transcriptomic Risk Score through elastic-net Cox proportional hazards modelling. The resulting risk score showed a significant association with overall survival in the discovery cohort and improved prognostic discrimination beyond clinical variables. An independent evaluation in the CPTAC-LUAD cohort confirmed the prognostic signal and preserved risk stratification across patient groups. Unsupervised clustering of module activity further revealed three transcriptomic patient groups characterised by distinct biological programs, genomic alteration patterns, and survival outcomes. Single-cell analysis also demonstrated that the identified transcriptomic modules reflect coordinated organisation of the tumour-immune-stromal ecosystem across cellular compartments. Together, these findings suggest that LUAD heterogeneity can be organised into coordinated transcriptomic programs with measurable clinical relevance, providing a systems-level framework for representing tumour molecular states.

14.
arXiv (CS.AI) 2026-06-18

Scaling Learning-based AEB with Massive Unlabeled Data

arXiv:2606.18864v1 Announce Type: cross Abstract: This paper studies how to scale learning-based automatic emergency braking (AEB) with massive unlabeled fleet data under production constraints. Our approach is based on meta-feedback semi-supervised learning (MF-SSL), where a teacher generates pseudo labels for unlabeled driving data and is updated using a small labeled anchor set as safety-critical feedback. In production, anchor ambiguity and labeled-unlabeled mismatch can amplify systematic pseudo-label errors, leading to spurious triggers. We propose a stabilized MF-SSL framework with (i) Noise-Aware Decoupling, which removes ambiguity-prone anchors from the teacher's supervised update path, and (ii) kinematics-gated pseudo-labeling with a teacher conflict penalty to suppress mismatch-induced risk hallucinations on unlabeled data while maintaining broad coverage. Extensive experiments show consistent gains as unlabeled data scale from 1M to 1B windows, improving safety while keeping comfort stable. The 1B-trained student model is deployed to hundreds of thousands of vehicles and validated over \$10^9$ km of driving, achieving a positive-to-false activation ratio exceeding 100:1 and a 35% improvement in accident-free driving mileage over a production rule-only baseline.

15.
arXiv (math.PR) 2026-06-11

Numerical simulations of the spread from the mean of the SLE and Multiple SLE dynamics

arXiv:2606.11254v1 Announce Type: cross Abstract: The Schramm-Loewner Evolution (SLE) describes a family of fractal curves that arise in the study of the scaling limits of many planar Statistical Physics models. These curves are modeled using the Loewner Differential Equation for the conformal maps $g_t(z)$ with a Brownian motion driver. Using Euler's Method, in the current work we performed numerical experiments to study at a fixed time the quantities $|g_t(z) - \overline{g_t(z)}|$ and $Re(g_t(z)) - Re(\overline{g_t(z)})$, where $Re$ denotes the real part and $\overline{g_t(z)}$ refers to the sample average. These random variables measure the 'spread' of the dynamics from the average behavior at fixed time. One of the scopes of this work is to give numerical predictions for future theoretical investigations on these quantities. When investigating these quantities in the SLE case our experiments predict that the distribution is bimodal when the dynamics started close to the origin, and it can become bell-shaped if the dynamics is started further from the origin. In the second part, we performed experiments for a Multiple SLE model whose driver is Dyson Brownian Motion. Due to singularity in the dynamics of the drivers and the many data points needed, this part is challenging from a computational perspective. In the multiple SLE case, our experiments predict that the distribution is bell-shaped in all cases. In addition, we check the changes in the distributions as we vary the parameter $\kappa$ in the SLE case and $\beta$ in the Multiple SLE case.

16.
arXiv (CS.CL) 2026-06-16

T-Mem: Memory That Anticipates, Not Archives

Long-term memory is essential for conversational agents to remain coherent across extended dialogues, follow through on commitments made many sessions earlier, and adapt their behaviour to each user. Current LLM-backed long-term conversational memory, however, is reachability-bounded by the similarity between a query and stored content, both lexical and dense-vector. The approach is effective when query and memory share surface features such as wording or named entities (we call this descriptive). But it misses another, equally valuable class of cases, where query and memory do not share surface features and are tied only by a latent semantic arc (associative). On this regime prevailing long-term memory systems collectively fail. Covering this other half is what allows an assistant, for the first time, to actively draw on past dialogue as a semantic asset. On the memory side, this is the engineering counterpart of what cognitive science calls episodic future thinking: rehearsing past experience for the future contexts under which it will need to be found. We call these write-time rehearsals triggers. We propose T-Mem, the first long-term conversational memory architecture that covers both descriptive and associative recall. At each of two evidence granularities, single facts and full exchanges, T-Mem instantiates one descriptive trigger family and one associative trigger family, so that every memory remains reachable from both surface-similar and relevance-bound queries. As empirical validation, T-Mem reaches state-of-the-art on both LoCoMo and LoCoMo-Plus.

17.
arXiv (CS.CV) 2026-06-24

Trustworthy Image Authentication using Forensic Knowledge Graphs

Advances in generative AI have made image falsification highly realistic, demanding trustworthy authentication systems. Existing forensic detectors can target certain forgery types but lack interpretability, while vision-language models (VLMs) provide explanations but cannot exploit forensic traces for reliable detection. We propose Forensic Knowledge Graphs (FKGs), a unified framework that integrates forensic evidence extraction, structured reasoning, and human-interpretable explanation. Our FKG structure encodes forensic traces along with their causal dependencies and links to scene content. To generate accurate FKGs, we introduce a novel forensic authentication network and an Iterative Context Refinement strategy that guides VLMs to produce faithful, grounded explanations. We also present FKG-50K, a dataset of 50,000 realistic forgeries with ground-truth FKGs. Experiments demonstrate that FKG outperforms both forensic detectors and VLMs in detection, forgery identification and localization, and forensic justification.

18.
arXiv (quant-ph) 2026-06-17

Matrix Product States for Modulated Symmetries: SPT, LSM, and Beyond

arXiv:2603.19189v2 Announce Type: replace-cross Abstract: Matrix product states (MPS) provide a powerful framework for characterizing one-dimensional symmetry-protected topological (SPT) phases of matter and for formulating Lieb-Schultz-Mattis (LSM)-type constraints. Here we generalize the MPS formalism to translationally invariant systems with general modulated symmetries. We show that the standard symmetry "push-through" condition for conventional global symmetry must be revised to account for symmetry modulation, and we derive the appropriate generalized condition. Using this generalized push-through structure, we classify one-dimensional SPT phases with modulated symmetries and formulate LSM-type constraints within the same MPS-based framework.

19.
arXiv (CS.CV) 2026-06-12

Bounding Boxes as Goals: Language-Conditioned Grasping via Neuro-Symbolic Planning

For robotics to be effectively integrated into household or industrial environments, machines must adapt to natural-language prompts in real time. Although Vision-Language Models (VLMs) have enabled zero-shot generalization in robot task and motion planning (TAMP), current state-of-the-art approaches often remain computationally "heavyweight" or require extensive training on thousands of demonstrations. We present GRASP (Grounded Reasoning and Symbolic Planning), a framework designed as a step toward open-vocabulary tabletop manipulation. Our approach leverages a pretrained VLM to translate natural-language queries into neuro-symbolic goal states, grounded in the physical world via a bounding-box detection pipeline. Unlike methods that rely on fixed color lists or hard-coded coordinates, GRASP enables robots to interpret abstract spatial concepts such as "top shelf" and execute tasks without additional fine-tuning. We achieve 73.3% overall success across 90 real-robot trials at three difficulty levels, requiring no task-specific training.

20.
arXiv (CS.CV) 2026-06-12

Goal2Pixel: Grounding Goals to Pixels for Vision-Language Navigation

Vision-language models (VLMs) have become a common foundation for vision-and-language navigation in continuous environments (VLN-CE). Yet most VLM-based methods cast navigation as low-level action prediction, an interface that is ambiguous, tied to short-horizon motion primitives, and inefficient due to repeated VLM querying. We propose Goal2Pixel, a pure pixel-based paradigm that reformulates VLN-CE as navigable pixel grounding. Rather than predicting actions, Goal2Pixel uses the image plane as a unified spatial interface between VLM reasoning and robot motion: the model predicts a visible navigable pixel to the agent, which is back-projected into a 3D waypoint for forward navigation. For non-forward actions, we append auxiliary directive regions to the image plane, where the left/right/bottom regions are interpreted as turning left, turning right, and stopping, respectively. To enable long-horizon navigation, we propose a visibility-aware keyframe memory for compact and informative history representation. To adapt pretrained VLMs to navigable pixel grounding, we introduce semantic embeddings and coordinate-aware auxiliary losses. Goal2Pixel achieves competitive state-of-the-art performance while requiring fewer VLM inference calls than prior methods. On R2R-CE Val-Unseen it achieves 54.1% SR and 52.5% SPL with just 7.75 VLM calls per episode, 6x fewer than the 46.62 required by direct action prediction at 32.9% SR. The same trend holds on RxR-CE.Project Page: https://baobao0926.github.io/Goal2Pixel/.

22.
PLOS Medicine 2026-05-06

Pathways of emergency care for severely ill children in Nigerian and Ugandan hospitals: A process mapping study

Authors:

by Rami Subhi, Abiodun Sogbesan, Dan Muramuzi, Mikael Burhin, Ayobami A. Bakare, Adegoke G. Falade, Freddy E. Kitutu, Freddie Ssengooba, Carina King, Sumit Kane, Belinda Dawson-McClaren, Hamish R. Graham, the MOXY-Implementation Research Collaboration Background Child mortality remains high in countries with weak emergency care systems. Facility organisation for paediatric emergency care is heterogeneous and under-described. We examined how hospitals in Uganda and Nigeria are organised to deliver emergency care for neonates and children. Methods and findings We conducted a qualitative, multi-method study in 26 purposively selected secondary and tertiary facilities in Uganda and Nigeria from October 2023 to December 2024. Embedded researchers documented patient pathways, resources for care, and care processes for severely ill children (

23.
arXiv (CS.CV) 2026-06-16

HMR-Net: Hierarchical Modular Routing for Cross-Domain Object Detection in Aerial Images

Despite advances in object detection, aerial imagery remains a challenging domain, as models often fail to generalize across variations in spatial resolution, scene composition, and semantic label coverage. Differences in geographic context, sensor characteristics, and object distributions across datasets limit the capacity of conventional models to learn consistent and transferable representations. Shared methods trained on such data tend to impose a unified representation across fundamentally different domains, resulting in poor performance on region-specific content and less flexibility when dealing with novel object categories. To address this, we propose a novel modular learning framework that enables structured specialization in aerial detection. Our method introduces a hierarchical routing mechanism with two levels of modularity: a domain routing layer that uses latent geographic embeddings to assign inputs to domain-specialized expert modules, and a scene routing mechanism that allocates image subregions to scene-specific expert modules. This allows our method to specialize across datasets and within complex scenes. Additionally, the framework contains a conditional expert module that uses external semantic information (e.g., category names or textual descriptions) to enable detection of novel object categories during inference, without the need for retraining or fine-tuning. By moving beyond monolithic representations, our method provides an adaptive framework for remote sensing object detection. Comprehensive evaluations on four datasets highlight improvements in multi-dataset generalization, region-level specialization, and open-category detection.

24.
arXiv (CS.CV) 2026-06-12

GEASS: Gated Evidence-Adaptive Selective Caption Trust for Vision-Language Models

Vision-Language Models (VLMs) hallucinate objects that are not present, and a growing line of work tries to curb this by feeding the model its own generated caption as auxiliary evidence – assuming that a caption, once available, is something to consume. We show this fails: naively appending a caption can lower accuracy rather than raise it, dropping Qwen2.5-VL-3B$^\dagger$ on HallusionBench by nearly ten points. To understand why, we build GD-Probe, a diagnostic set that pairs a global and a detail question on the same image, so that any difference in caption effect is attributable to the question alone. Caption utility proves to be a per-query property: the same caption helps global questions and harms detail ones, through a single mechanism – an embedded caption competes with the image for attention and pulls the model's evidence onto its own text – whose sign is set by whether the caption covers the queried content. Crucially, this regime is readable from quantities the decoder already emits, with no attention access or grounding. We turn this into GEASS (Gated Evidence-Adaptive Selective Caption Trust), a training-free, logit-level module that decides per query how much of the caption to trust, gating it by the clean path's confidence, weighting it by the entropy reduction it induces, and raising the evidence bar when the two pathways disagree. Across four VLMs and two benchmarks (POPE and HallusionBench), GEASS improves over both vanilla inference and contrastive decoding under a single fixed setting, adding only two forward passes and no parameters.

25.
arXiv (quant-ph) 2026-06-24

Gate-Controlled Spin Qubits in Confined Altermagnets

Authors:

arXiv:2606.24150v1 Announce Type: cross Abstract: We propose gate-defined spin qubits in electrostatically confined altermagnetic quantum dots. Elliptical confinement of the $d$-wave altermagnetic structure produces a low-energy doublet with opposite spin polarization. For the range of parameters used here, the qubit states energy gap lies in the microwave range while the leakage gap remains in the meV range. Even without spin-orbit coupling, time-dependent simulations show that a phase-controlled quadrupolar gate drive about a fixed bias point implements $X_{\pi/2}$ and $X_\pi$ rotations by resonantly modulating the confinement anisotropy. We extend the study to two-qubits using a double quantum dot. We show that the double quantum dot spectrum can be cleanly projected onto isolated quantum dot product states with a nonzero nonlocal Pauli block in the effective logical two-qubit Hamiltonian. Resonant central-barrier modulation then drives the logical two-qubit component close to a maximally entangled state. These calculations show anisotropic altermagnetic quantum dots as a route to locally gate-controlled spin qubits without requiring spin-orbit coupling.