Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-11

Learning to Inject: Automated Prompt Injection via Reinforcement Learning

arXiv:2602.05746v2 Announce Type: replace-cross Abstract: Prompt injection is a critical vulnerability in LLM agents, yet the strongest methods still rely on human red-teamers and hand-crafted prompts. Adapting automated jailbreak optimizers does not close this gap: jailbreaks shape models toward generic compliance, while prompt injection requires emitting specific tool calls with correct parameters. The success signal is binary, and randomly sampled suffixes almost never trigger it, so standard optimizers have no gradient to follow. We present AutoInject, a black-box reinforcement learning (RL) framework that learns adversarial suffixes for prompt injection. A learned comparison-based reward scores each candidate against the best suffix seen so far, turning the binary signal into a dense reward suitable for RL optimization. The framework supports both online query-based attacks and offline-trained transferable suffixes that need no utility access at deployment, and incorporates a utility objective when task-completion feedback is available. On AgentDojo, AutoInject outperforms template attacks, GCG, TAP, and adaptive attack across production models, with statistically significant improvements under McNemar's test with p

02.
arXiv (quant-ph) 2026-06-16

Non-Hermitian Crystalline Braid Topology from Hermitian Projection: A Zero-Mode Resonance Mechanism

arXiv:2606.06626v2 Announce Type: replace-cross Abstract: Non-Hermitian topological phases are typically engineered through gain and loss, nonreciprocity, or interaction with an environment. Here we show that they can instead emerge purely by projecting a fully Hermitian, topologically trivial parent lattice onto an embedded subsystem. The mechanism is general: when a zero mode of the eliminated degrees of freedom couples to the retained subsystem, the embedding self-energy develops a pole, the zero-frequency description becomes singular, and topology is carried by the finite-frequency projected Green's function. We realize the mechanism exactly in a trivial nearest-neighbor square lattice with an embedded one-dimensional zig-zag brane. In the periodic transverse geometry, the parity of the eliminated complement selects the outcome: even sectors reduce to a regular Schur complement and yield conventional SSH-type descendants, whereas odd sectors host a sublattice-imbalance zero mode and follow the resonant route. There, the complex bands braid through isolated finite-frequency exceptional points (EPs), while a parity symmetry inherited from the embedding, together with $\mathrm{TRS}^{\dagger}$, induces conjugated pseudo-Hermiticity and quantizes the complex Berry phase. The stable bulk invariant of the nondegenerate phases is this quantized complex Berry phase; adjacent sectors are separated by parity-paired exceptional points whose half-integer vorticities encode the local exchange of complex-energy strands.The absence of the non-Hermitian skin effect ensures that the invariant is defined directly on the ordinary Brillouin zone. A topolectrical implementation of the projected response predicts momentum-resolved transmission minima at the exceptional-point transition frequencies together with a characteristic low-frequency resonant admittance, providing an experimentally testable signature of the mechanism.

03.
arXiv (CS.CL) 2026-06-16

Surpassing Scale by Efficiency: A Compact 135M Parameter Foundational LLM Natively Adapted for the Bangla Language

While the NLP landscape is dominated by multi-billion parameter architectures, their deployment in low-resource, non-Latin scripts remains computationally prohibitive for edge configurations, mobile systems, and decentralized local hardware. This paper presents bangla-smollm-135m, a highly compact 135-million parameter decoder-only foundational model engineered explicitly for high-efficiency language modeling in the Bangla script. By leveraging a deterministic intersect-and-append token merging strategy between TituLLMs and SmolLM2-135M, the model overcomes subword script fragmentation without destabilizing early pretrained parameter states. In zero-shot multi-task benchmark evaluations (PIQA_bn, OpenBookQA_bn, CommonsenseQA_bn, and Bangla_MMLU), bangla-smollm-135m matches or outperforms models twice its size (Gemma-3-270m) and achieves parity with models in the 1B parameter tier. The model is available at rnnandi/bangla-smollm-135m

04.
arXiv (CS.AI) 2026-06-16

Co-Scraper: query-aware DOM Pruning and Reusable Scraper Synthesis for Lightweight Web Data Extraction

arXiv:2606.14821v1 Announce Type: cross Abstract: The abundant and heterogeneous nature of web content necessitates automated information extraction, and generating scrapers that can be reused across similar web pages offers an effective solution for scalable data extraction. In this work, we propose Co-Scraper, a two-stage framework capable of handling the hierarchical complexity of long HTML documents. By integrating a query-aware DOM pruning mechanism with stable extraction strategy induction, Co-Scraper can effectively transforms web content into executable programmatic wrappers using a fine-tuned Qwen3-8B model. On the test set of SWDE, Co-Scraper achieves state-of-the-art performance with an F1 score of 94.78% and a reuse success rate of 90.39%. This framework significantly enhances the accuracy and resilience of data extraction, providing a highly efficient approach for web data acquisition tasks.

05.
arXiv (CS.CL) 2026-06-11

Evolving Agents in the Dark: Retrospective Harness Optimization via Self-Preference

AI agents rely on a harness of skills, tools, and workflows to solve complex problems. Continually improving this harness is essential for adapting to new tasks. However, existing optimization methods typically require ground-truth validation sets, yet such labeled data is difficult to acquire in practical deployment settings. To address this problem, we introduce Retrospective Harness Optimization (RHO), a self-supervised method that optimizes the agent harness using only past trajectories. Specifically, RHO selects a diverse coreset of challenging tasks from past trajectories and re-solves them in parallel. The agent analyzes these rollouts using self-validation and self-consistency, then generates candidate harness updates and selects the most effective one by its own pairwise self-preference. We evaluate RHO across three diverse domains, spanning software engineering, technical work, and knowledge work. Notably, a single optimization round improves the pass rate on SWE-Bench Pro from 59% to 78% without any external grading. Furthermore, our analysis demonstrates that RHO effectively targets prior failure modes. As a result, the optimized harness alters the agent's behavior patterns and sustains higher accuracy during long-horizon sessions.

06.
bioRxiv (Bioinfo) 2026-06-10

A Unified Spatial AI Framework for Cross-Domain Tissue-State Analysis in Trauma, Oral, and Cardiovascular Pathology

Authors:

Objective: To develop a cross-domain spatial AI framework for identifying conserved tissue-state organisation across trauma, oral disease, and cardiovascular tissue using spatial transcriptomic data. Methods: Four public spatial transcriptomic datasets spanning wound healing, periodontitis, oral squamous cell carcinoma, and cardiac tissue were integrated using recurrence modelling, graph-based spatial learning, fuzzy tissue-state analysis, and tensor decomposition. Cross-domain coupling, spatial fragmentation, recurrence structure, and permutation-based topological validation were evaluated. Results: Six conserved fuzzy tissue states were identified, dominated by extracellular matrix remodelling, fibroblast/stromal activation, endothelial signalling, and inflammatory pathways. Latent embedding analysis demonstrated strong overlap between trauma and oral domains, while cardiovascular tissue exhibited more compact spatial organisation. Oral inflammatory tissue showed the highest fragmentation, whereas cardiovascular tissue demonstrated greater recurrence coherence. Tensor decomposition identified conserved stromal-remodelling programmes across domains. Permutation testing confirmed significantly elevated graph modularity and reduced spatial entropy relative to null distributions. Conclusion: The proposed framework identified conserved spatial tissue-state architecture linking wound healing, oral pathology, and cardiovascular tissue despite differences in tissue origin, pathology, and acquisition technology. Significance: These findings demonstrate the potential of spatial AI for investigating conserved stromal and inflammatory microenvironmental organisation across clinically related disease systems and may support spatial biology research in trauma–oral–systemic health.

07.
arXiv (CS.CL) 2026-06-15

SIMMER: Benchmarking Latent Failures in LLM Executable Planning with a World Model

Large language models (LLMs) are increasingly deployed as planners for autonomous agents in household environments. While existing benchmarks evaluate whether LLM-generated plans execute successfully, they overlook a critical type of failure: latent failures. Unlike immediate failures that trigger instant feedback at execution time and enable timely correction, latent failures do not immediately halt plan execution but silently compromise goal achievement. In severe cases, they cause irreversible harm. To address this gap, we introduce SIMMER, a benchmark for evaluating latent failures in LLM planning through a human-curated symbolic world model grounded in the kitchen domain. SIMMER defines a world model comprising 77 actions, 262 unique objects, and approximately 46,800 possible interactions that are semantically realistic, derived from real-world cooking scripts. It then leverages a state machine executor that validates plans against the world model and detects immediate precondition violations, latent hazards, and irreversible failures. Experiments across six LLMs show that even frontier models achieve at most 17% error-free plans. Moreover, up to 56% of plans contain latent failures, the majority of which lead to irreversible consequences. We further demonstrate that explicit state reasoning via counterfactual foresight simulation can reduce latent failures by up to 72% and irreversible cases by up to 75%, suggesting a promising direction for more robust LLM planners.

08.
arXiv (CS.CV) 2026-06-16

FrameOracle: Learning What to See and How Much to See in Videos

Vision-language models (VLMs) advance video understanding but operate under tight computational budgets, making performance dependent on selecting a small, high-quality subset of frames. Existing frame sampling strategies, such as uniform or fixed-budget selection, fail to adapt to variations in content density or task complexity. To address this, we present FrameOracle, a lightweight, plug-and-play module that predicts both (1) which frames are most relevant to a given query and (2) how many frames are needed. FrameOracle is trained via a curriculum that progresses from weak proxy signals, such as cross-modal similarity, to stronger supervision with FrameOracle-41K, the first large-scale VideoQA dataset with validated keyframe annotations specifying minimal sufficient frames per question. Extensive experiments across five VLMs and six benchmarks show that FrameOracle reduces 16-frame inputs to an average of 10.4 frames without accuracy loss. When starting from 64-frame candidates, it reduces inputs to 13.9 frames on average while improving accuracy by 1.5%, achieving state-of-the-art efficiency-accuracy trade-offs for scalable video understanding.

09.
arXiv (CS.AI) 2026-06-11

Forecasting Future Behavior as a Learning Task

arXiv:2606.11445v1 Announce Type: new Abstract: Trust in an AI system is often anchored by explanations of how it works, which one then uses to forecast its behavior on new inputs. For large reasoning models (LRMs), this conventional route is particularly difficult to follow: explanation methods for single token generations do not naturally generalize to long trajectories, and the trajectories themselves are often not faithful when read as natural language. We propose an alternative that bypasses the explanation step: treat behavior forecasting as a learnable task and train Behavior Forecasters that operates on a single reasoning trajectory to make the same forecasts one would typically seek from an explanation. The forecaster's training data is obtained by querying the LRM with no human annotation, and its inference is done in a single forward pass. We instantiate this approach on two tasks: how likely the LRM is to repeat its answer on re-runs, and how removing parts of the input changes its answer. We evaluate this approach on both tasks across three diverse reasoning datasets and find that trained Behavior Forecasters are more accurate than GPT-5.4 and Claude Opus-4.6 reading the same trajectories as naive readers, at a small fraction of their inference cost. We find that fine-tuning the backbone end-to-end and initializing it from the target LRM are each necessary for strong performance. These results show that the reasoning trajectory carries information about the LRM's future behavior that goes beyond what naive reading conveys.

10.
arXiv (CS.AI) 2026-06-15

Efficient Temporal Modeling for Mobile Sleep Staging via Lightweight Random Attention

arXiv:2606.13694v1 Announce Type: cross Abstract: Mobile sleep staging serves as a foundational infrastructure for in-home sleep monitoring and closed-loop modulation. But existing sequential models such as RNNs and Transformers are computationally expensive for mobile deployment. In this paper, we propose Random Attention (RA), a lightweight temporal modeling module based on fixed random projections, which replaces learnable sequence modeling with similarity-based aggregation. RA introduces little additional parameters beyond the epoch encoder while enabling effective temporal smoothing. We further provide a theoretical interpretation via the Random Attention Prior Kernel (RAPK), which decomposes RA into a global smoothing term and a feature similarity term, offering an interpretable view of temporal sleep structure. Experiments on Sleep-EDF-20 and Sleep-EDF-78 show that RA consistently improves epoch-wise baselines by 1-3\% in accuracy and F1 score, while achieving competitive performance compared with LSTM, GRU, and Transformer models. RA also demonstrates strong generalization across different backbone encoders and improved robustness over conventional temporal smoothing methods. These results indicate that efficient sleep staging can be achieved through lightweight similarity-based temporal aggregation, making RA suitable for real-time wearable applications.

11.
arXiv (CS.AI) 2026-06-17

Querying an astronomical database using large language models: the ALeRCE text-to-SQL system

arXiv:2606.18108v1 Announce Type: cross Abstract: We develop a text-to-SQL (structured query language) system based on large language models (LLMs) using in-context learning and apply it to the Automatic Learning for the Rapid Classification of Events (ALeRCE) astronomical database. ALeRCE is a community broker for the Zwicky Transient Facility and the Vera C. Rubin Observatory. The system enables users to query the database in natural language (NL) and generates executable SQL queries. To develop and evaluate the system, we constructed a dataset of 110 NL/SQL pairs. We propose a step-by-step generation framework comprising four modules: schema linking, query classification, prompt decomposition, and self-correction. The performance of thirteen LLMs is evaluated using in-context learning and prompt engineering techniques. Text-to-SQL performance is assessed using the perfect-match (PM) rate for row identifiers (e.g., object identifiers) and column identifiers (i.e., column names). The proposed step-by-step framework consistently outperforms a direct-inference baseline, while the self-correction module consistently reduces execution errors. For Claude Opus 4.6, PM performance on row (column) identifiers is high for simple queries, reaching 0.97 (0.94), and decreases with query complexity to 0.44 (0.72) for medium queries and 0.59 (0.49) for hard queries. Among the thirteen evaluated models, the best-performing LLMs for the text-to-SQL task are Claude Opus 4.6, Gemini 2.5 Pro, Gemini 3 Flash, and GPT-5.2-Codex.

12.
arXiv (CS.AI) 2026-06-16

MASCOT-Android: A Curated Dataset and Automated Collection Pipeline for Android Malware Source Code Specimens

arXiv:2606.16072v1 Announce Type: cross Abstract: Compared with binaries and decompiled code, malware source code more directly reflects the attackers' original intent. However, the scarcity of source code and the high cost of manual review make such datasets difficult to build and maintain. We propose MASCOT-Android, a curated dataset of Android malware source code and an automated collection framework for scalable malware source code discovery on GitHub. A key finding of our work is that repository-level documentation alone provides a strong signal for malware source code collection. Our model extracts character-level TF-IDF features from 8,772 malware and 25,747 benign README documents and trains a LinearSVC classifier to distinguish malware repositories. This README-only model achieves an accuracy of 96.28\% and an FPR of 1.06\% in local evaluation. In addition, the model outputs confidence scores, allowing users to adjust the decision threshold to balance FPR and coverage, which is practical in real-world malware source code collection.

13.
medRxiv (Medicine) 2026-06-11

Advancing Clinical Implementation of Cardiovascular Polygenic Risk Scores Through Patient-Level Robustness Assessment

Background and Aims: Polygenic risk scores (PRSs) for atherosclerotic cardiovascular disease (ASCVD) can perform equivalently at the population level yet disagree for individual patients. We examined whether such intra-individual variability reflects genuinely complementary risk information or mainly statistical and methodological uncertainty, and whether it affects clinical classification once PRSs are integrated into SCORE2-OP. Methods: In 4,137 ASCVD-free participants of the CoLaus|PsyCoLaus cohort (478 incident events over a median 14.4 years), we identified 16 ASCVD-PRSs with practically equivalent population-level performance using Bayesian equivalence testing. We quantified intra-individual variability (standard deviation, coefficient of variation, intraclass correlation, Cohen's kappa, extreme discordance), tested whether discordance exceeded chance, decomposed scores into shared and unique genetic components, and assessed variability after integration into SCORE2-OP, benchmarked against perturbation of systolic blood pressure. Results: For a typical individual, risk estimates varied by 18 percentile points across PRSs. Discordance matched chance expectations under a shared-signal model, with no distinct phenotypic profile among discordant individuals, and predictive power resided overwhelmingly in the shared genetic component. Variability tracked PRS size and weighting rather than distinct variants. After integration into SCORE2-OP, 75.6% of participants were placed in different categories by at least one model and 54.6% as both low and high risk; instability was concentrated near guideline thresholds and far exceeded that from blood-pressure measurement error. Conclusions: Equivalent population-level performance is not sufficient to treat PRSs as interchangeable at the individual level, and methodological standardisation and pragmatic clinical trials remain necessary to determine whether PRS integration improves long-term cardiovascular outcomes.

14.
arXiv (CS.AI) 2026-06-15

HarnessX: A Composable, Adaptive, and Evolvable Agent Harness Foundry

arXiv:2606.14249v1 Announce Type: new Abstract: AI agent performance depends critically on the runtime harness, comprising the prompts, tools, memory, and control flow that mediate how a model observes, reasons, and acts. Yet today's harnesses remain largely hand-crafted and static: each new model or task still demands bespoke scaffolding, and the rich traces produced during execution are rarely distilled back into systematic improvement. We introduce HarnessX, a foundry for composable, adaptive, and evolvable agent harnesses. HarnessX assembles typed harness primitives via a substitution algebra, adapts them through AEGIS, a trace-driven multi-agent evolution engine grounded in an operational mirror between symbolic adaptation and reinforcement learning, and closes the harness-model loop by turning trajectories into both harness updates and model training signal. Across five benchmarks (ALFWorld, GAIA, WebShop, tau^3-Bench, and SWE-bench Verified), HarnessX yields an average gain of +14.5% (up to +44.0%), with gains largest where baselines are lowest. These results suggest that agent progress need not come from model scaling alone: composing and evolving runtime interfaces from execution feedback is an actionable and complementary lever. The complete codebase will be open-sourced in a future release.

15.
arXiv (CS.CL) 2026-06-16

SkillWiki: A Living Knowledge Infrastructure for Agent Skills

While knowledge is managed through Wikipedia and software through GitHub, agent skills still lack an infrastructure for large-scale production, governance, and evolution. SkillWiki is a living knowledge infrastructure that supports the organization, grounding, and continuous evolution of agent skills by transforming heterogeneous knowledge into reusable skill assets linked to their originating evidence. Our demonstration presents the complete skill lifecycle, from knowledge ingestion and skill production to provenance-aware exploration, governance, and execution-driven evolution. SkillWiki highlights a future in which knowledge, skills, and execution experience co-evolve within a shared infrastructure. The live demonstration and source code are publicly available at https://github.com/Huangdingcheng/SkillWiki.

16.
PLOS Computational Biology 2026-06-22

GrassSV – hybrid method to detect structural variants in high throughput DNA-seq data

by Dominik Witczak, Krzysztof Sychla, Julia Wysocka, Artur Laskowski, Wojciech Frohmberg, Marta Glowacka, Alicja Dzik, Piotr Lukasiak, Jacek Blazewicz, Aleksandra Swiercz Genetic diversity is crucial for populations to adapt and survive in dynamic environments. This diversity arises from genetic mutations, which manifest in the genome as structural variants (SVs). Several types of SVs exist, but not all are equally easy to detect. Current SV detection tools tend to specialize in certain SV types or require the use of multiple tools to obtain a comprehensive variant profile, which increases computational cost and complexity. While some methods excel at identifying breakpoints, they often struggle with accurately classifying variant types, and their precision depends strongly on data quality and sequencing technology. At present, the majority of available genomic data originates from high-quality short reads, which remain the most affordable sequencing technology. In this manuscript, we introduce GrassSV, a novel and computationally efficient method that employs a hybrid pattern-matching approach to detect all major classes of structural variants using short-read sequencing data. GrassSV integrates depth-of-coverage analysis with contig-based pattern recognition to ensure both sensitivity and precision while minimizing false positives and runtime. Its robustness was demonstrated on the human Genome in a Bottle dataset, as well as on synthetic data derived from the yeast genome, where it achieved high accuracy across all SV types at a lower computational cost compared to existing methods. This makes GrassSV a practical alternative to multi-tool pipelines typically required for comprehensive SV detection. GrassSV is available at https://github.com/Domomod/GrassSV under GPL-3.0 license and the benchmark at: https://github.com/Domomod/GrassBenchmark.

17.
bioRxiv (Bioinfo) 2026-06-11

A Deep Hypergraph Learning Model for Predicting Antimicrobial Combination Effects Across Bacterial Targets

Antimicrobial resistance (AMR) creates an urgent need for efficient strategies to identify effective antibacterial combinations. Combination therapy, including antimicrobial peptides (AMPs) paired with conventional antibiotics, is a promising approach, but exhaustive experimental screening across drug pairs and bacterial targets is impractical. This study introduces a hybrid GCN-based hypergraph neural network (HGNN) for predicting antimicrobial-agent combination outcomes against bacterial targets. Each antimicrobial-agent-antimicrobial-agent-bacterium triplet is represented as a ternary hyperedge, enabling the model to learn context-dependent interaction patterns. The framework integrates SMILES-derived molecular graph embeddings for antimicrobial agents, including conventional antibiotics and AMPs, with taxonomy-derived bacterial representations. The prediction task was formulated as a three-class classification problem: synergy, antagonism, and non-interaction. The non-interaction class included experimentally verified indifferent records and synthetic presumed non-interaction triplets generated by negative sampling. Model development used drug-pair-grouped splitting, five-fold grouped cross-validation within the training/validation partition, and final evaluation on a held-out test set. On the held-out three-class test set, the selected GCN-based HGNN achieved an accuracy of 0.83, weighted F1-score of 0.84, macro F1-score of 0.80, and ROC-AUC of 0.95. Per-class evaluation showed accuracies of 0.80 for synergy, 0.92 for antagonism, and 0.85 for non-interaction. Pair-type analysis showed strong performance across AMP-AMP, AMP-conventional antibiotic, and conventional antibiotic-conventional antibiotic combinations. These findings suggest that hypergraph-based representation learning can support computational prioritization of antimicrobial combinations for experimental follow-up. Further studies will be needed to improve model interpretability and to perform prospective validation of predicted synergistic combinations.

18.
Nature (Science) 2026-06-22

Stereoretentive decarbonylative C(sp<sup>3</sup>)-C(sp<sup>3</sup>) cross-coupling

Authors:

While C(sp3)–C(sp3) bond-forming cross-coupling methods have become more common, stereocontrolled bond-formation remains a challenge,1 despite its importance for drug discovery, where there is a emerging demand for molecules with increased sp3 character.2-4 Enantiospecific cross-coupling approaches would complement advances in enantioselective coupling,5-8 but have been limited to specialized substrates with lower availability5,9 because stereospecific oxidative addition of more abundant chiral alkyl electrophiles is unknown.10 Inspired by the classic, stereoretentive Curtius rearrangement,11 herein we disclose a catalytic strategy that proceeds by an analogous stereoretentive decarbonylation step to form a versatile chiral alkylnickel intermediate from easily-available chiral amino-acid and α-hydroxy-acid derivatives. The chiral alkylnickel intermediates decompose and/or racemize on the order of minutes, but are sufficiently stable to enable stereoretentive cross-electrophile coupling12 with alkyl radicals (derived from alkyl iodides) at relatively low temperature (22-40 °C). This mechanistic strategy provides a straightforward approach to stereocontrolled C(sp3)–C(sp3) bond formation, including diastereomers that are inaccessible by stereoselective radical mechanisms. The “metallo-Curtius” strategy described in this study lays a mechanistic foundation for the development many new stereospecific cross-coupling reactions.

19.
arXiv (CS.AI) 2026-06-16

EChO-Agent: Evidence Chain Orchestration Agent for Audio Reasoning

arXiv:2606.15141v1 Announce Type: cross Abstract: While LALMs show promise on audio question answering, they fail to focus on question-relevant segments of audio and provide a clear, checkable reasoning process when dealing with complex audio reasoning. Reinforcement learning and tool-augmented prompting can help models better relate questions to audio but lack a reliable way to understand, integrate, and self-verify audio segments. To address this gap, we present EChO-Agent, a modular agent framework that reformulates complex audio QA as a planning, tool execution, evidence integration, and answer verification workflow. Experiments on MMAR benchmark show EChO-Agent improves both accuracy and rubric scores over baseline and ablation studies show evidence integration is the key factor.

20.
arXiv (CS.CV) 2026-06-16

SceneCraft: Interactive System for Image Editing via Scene Graph

Recent advances in generative AI have enabled natural language-driven image editing, yet existing systems often fail in complex scenes with multiple interacting objects because they rely heavily on users crafting precise text prompts. To address the absence of structured control, we propose SceneCraft, a novel interactive framework that bridges user intent and model execution by representing images as editable scene graphs. Instead of guessing text prompts through trial and error, users interact directly with a visual graph to perform complex spatial and relational operations. These graph modifications are automatically translated into precise, context-aware editing prompts, effectively eliminating linguistic ambiguity. To ensure robust and diverse results, structured prompts are dispatched to multiple state-of-the-art generative models. Evaluations across diverse editing scenarios show that SceneCraft provides a more intuitive control mechanism, significantly reducing the cognitive burden of manual prompt engineering while generating outputs that users consistently rate as higher in quality and fidelity.

21.
arXiv (CS.LG) 2026-06-16

LoComposition: Terrain-Adaptive Energy-Efficient Quadruped Locomotion without Gait Priors

arXiv:2606.15896v1 Announce Type: cross Abstract: Learning-based quadrupedal locomotion typically relies on complex reward formulations that entangle task specification, operational limits, gait preference, and terrain adaptation within a single optimization objective. We instead treat these functions through distinct mechanisms: rewards for task specification, constraints for operational limits, energy minimization for gait preference, and exteroceptive perception for adapting energy use to terrain difficulty. We show that these components jointly enable efficient, terrain-adaptive locomotion, and that removing each component exposes a distinct failure mode. Our formulation removes explicit gait priors (including air-time, contact-count, and foot-clearance targets) in favor of emergent behavior. Compared to a conventional complex-reward baseline, our formulation achieves comparable terrain traversal while reducing cost of transport by 56% and operational-limit violations by 96%. The resulting policies transfer zero-shot to a physical Unitree Go2 using LiDAR-based elevation mapping. Project website with videos: https://tinyurl.com/locomposition.

22.
medRxiv (Medicine) 2026-06-10

A Heterogeneous Graph Neural Network Framework for Multi-Horizon Stroke Mortality Prediction

Background: Machine learning models for stroke mortality prediction typically treat each time horizon independently and use flat tabular features that ignore the relational structure of electronic health records (EHRs). In this pilot study, we leveraged graph-based machine learning models to predict post stroke all-cause-mortality across three different time horizons. Methods: We developed Stroke Temporal Heterogeneous Graph (StrokeTHG), a heterogeneous graph neural network model for simultaneous multi-horizon stroke mortality prediction (30-day, 90-day, 1-year) using EHR data from Penn State Health System. The model encodes various relations among EHR entities (e.g., patient, diagnosis, comorbidity) and temporal encoding of admission time to better predict stroke mortality. We compared our proposed approach against various baseline methods, including Logistic Regression, Random Forest, and XGBoost. We also performed ablation and subgroup analyses, evaluated the quality of learned graph embeddings, and assessed the importance of different edge types in the graph. Results: We included 4,144 stroke patients (mean age 69.2 years; 54.3% men), of whom 3,332 (80.4%) survived their stroke after one year. 30-day, 90-day, and 1-year mortality rates were 9.7%, 13.7%, and 19.6%, respectively. Our proposed approach, StrokeTHG, achieved AUROC of 0.872, 0.878, and 0.837 across horizons, outperforming all tabular baselines. At [&ge;] , 75% specificity, the model identified 5-10 percentage points more mortality cases than the best baseline at each horizon. Subgroup analysis demonstrated consistent performance across sex subgroups and the largest discriminative gains in the Age 65-80 stratum. Edge-type ablation identified phenotype-patient and admission-patient edges in the constructed EHR graph as the most influential relational edges for mortality prediction. StrokeTHG embeddings outperformed all graph and matrix factorization baselines under an identical downstream classifier, confirming that performance gains stem from representation quality rather than classifier capacity. Conclusions: StrokeTHG demonstrates that heterogeneous graph representations of EHR data provide a consistent improvement over flat tabular models for multi-horizon stroke mortality prediction, with particular advantage at clinically actionable sensitivity thresholds and novel multi-horizon monotonic prediction capability. This methodological framework may be adaptable to other EHR-based clinical research studies seeking to leverage heterogeneous relational structures for predictive modeling.

23.
arXiv (CS.LG) 2026-06-16

M-CTX: Exact and Scalable Spatial Context Retrieval for Trajectory Analytics

arXiv:2606.15244v1 Announce Type: new Abstract: Modern trajectory predictors increasingly condition on external spatial context, such as map geometry, signed distance fields (SDFs), and nearby moving agents. While this context improves prediction quality, constructing it for every training anchor has become a hidden systems bottleneck. In a representative maritime AIS pipeline, spatial context construction requires roughly 17 CPU-days for a 5.48M-anchor corpus, dominating the cost of the downstream predictor. We present M-CTX, an exact and scalable spatial context-retrieval framework for trajectory analytics. M-CTX recasts context construction as an ingest-once, query-many spatial database workload and replaces three brute-force stages – OSM range retrieval, SDF computation, and moving-vessel neighbour lookup – with composable, index-backed operators. Its learned range-index backend, BR-LZ, provides recall-complete MBR-overlap range retrieval and reduces candidate amplification by 1.1x–2.7x relative to global-expansion one-curve baselines. Across four maritime regions, eight baseline systems, synthetic workloads with up to 40M spatial features, and 10^7-record AIS streams, M-CTX reproduces the reference context exactly. On the 5.48M-anchor corpus, it reduces context construction from about 17 CPU-days to 1.8 hours, a measured 226x end-to-end speed-up. An optional storage mode further compresses SDF context by 64x with only a 0.04 m ADE change. These results establish exact spatial context retrieval as a first-class database problem in modern trajectory analytics. Code and datasets are publicly available at https://github.com/mark000071/M-CTX-Traj.

25.
arXiv (CS.CL) 2026-06-12

Causal Inference with Generative Artificial Intelligence: Application to Texts as Treatments

In this paper, we demonstrate how to enhance the validity of causal inference with unstructured high-dimensional treatments like texts, by leveraging the power of generative Artificial Intelligence (GenAI). Specifically, we propose to use a deep generative model such as large language models (LLMs) to efficiently generate treatments and use their internal representation for subsequent causal effect estimation. We show that the knowledge of this true internal representation helps disentangle the treatment features of interest, such as specific sentiments and certain topics, from other possibly unknown confounding features. Unlike existing methods, the proposed GenAI-Powered Inference (GPI) methodology eliminates the need to learn causal representation from the data, and hence produces more accurate and efficient estimates. We formally establish the conditions required for the nonparametric identification of the average treatment effect, propose an estimation strategy that avoids the violation of the overlap assumption, and derive the asymptotic properties of the proposed estimator through the application of double machine learning. Finally, using an instrumental variables approach, we extend the proposed GPI methodology to the settings in which the treatment feature is based on human perception. The GPI is also applicable to text reuse where an LLM is used to regenerate existing texts. We conduct simulation and empirical studies, using the generated text data from an open-source LLM, Llama 3, to illustrate the advantages of our estimator over state-of-the-art causal representation learning algorithms.