Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-12

Helping Figures Tell their Story! Paper-Grounded Video Generation Explaining Complex Scientific Figures

Scientific figures compress complex pipelines into a single canvas, yet understanding them requires paper-grounded, step-by-step narration aligned with visual highlights a capability missing from current video generation systems and benchmarks. To address this, we introduce paper-grounded figure-to-video generation: generating narrated, region-grounded walkthrough videos from a figure and its paper. We propose MINARD (Multimodal Interpretation of Narrated Architecture via Region Decomposition), a pipeline that generates paper-grounded narrations and sequentially grounds them to figure regions. We also release FigTalk, a benchmark with new sequential and component-level grounding metrics derived. On FigTalk, MINARD generates humanlike, paper-faithful narrations and outperforms narration-conditioned figure spatial grounding compared to existing approaches in both automatic and human evaluation

02.
arXiv (CS.CV) 2026-06-16

Analyzing Visual Aircraft Representations with Sparse Autoencoders

Vision models can achieve strong performance on classification tasks, but the internal representations supporting their predictions are often difficult to interpret. This work investigates whether sparse autoencoders can decompose intermediate representations of a vision model into interpretable features. We train a ConvNeXt classifier on the FGVC-Aircraft dataset, extract spatial activations from its final feature stage, and train a sparse autoencoder on these activations. The learned sparse features are analyzed using top-activating image patches, activation strength, and class selectivity. Qualitative visual inspection reveals that several features correspond to recognizable aircraft structures and visual patterns. We evaluate a subset of selected features using input-space and feature-space ablations, measuring how blurring image patches and suppressing sparse features affect class logits, classification margins, and prediction confidence. The results suggest that sparse autoencoders can reveal partially interpretable, class-relevant visual features associated with aircraft recognition, while also exposing limitations such as polysemanticity and coarse spatial localization.

03.
arXiv (CS.AI) 2026-06-18

Mitigating Anchoring Bias in LLM-Based Agents for Energy-Efficient 6G Autonomous Networks

arXiv:2606.18272v1 Announce Type: cross Abstract: This paper presents an autonomous agentic resource negotiation framework designed to enable zero-touch network slicing in 6G architectures using Large Language Model (LLM) agents. While LLMs offer powerful reasoning capabilities, we demonstrate that such agents inherently suffer from anchoring bias, rigidly adhering to initial heuristic proposals and causing severe network over-provisioning. To systematically mitigate this cognitive bias, we propose a novel randomized anchoring strategy modeled via a Truncated 3-Parameter Weibull distribution. This mathematically bounded approach seamlessly integrates with burst-aware Digital Twins (DTs) employing Conditional Value at Risk (CVaR) to rigorously guarantee strict Service Level Agreement (SLA) tail-latencies. To validate our methodology, we introduce and prove the Bimodal Constraint-Avoidance Utility Theorem, demonstrating that while feasible negotiations follow classical convex bounds, highly constrained scenarios undergo a phase transition governed by an inverse rational decay envelope. Empirical results generated using a locally hosted 1B-parameter model (\texttt{otel-llm-1b-it}) confirm these dual-regime bounds. Our cognitive de-biasing successfully dismantles rigid negotiation patterns, forcing agents into active exploration to safely ride SLA boundaries and boost system energy savings up to 25\%. Crucially, the lightweight 1B LLM achieves sub-second inference latencies (0.95s mean), ensuring our multi-agent framework is compatible with the operational timescales of the O-RAN non-Real-Time RAN Intelligent Controller (non-RT RIC)\footnote{Our source code is available for non-commercial use at https://github.com/HatimChergui.

04.
PLOS Computational Biology 2026-06-01

BeetleAtlas 2: An enhanced <i>Tribolium castaneum</i> web resource for tissue and developmental transcriptomics allowing refinement of gene predictions

by David P. Leader, Muhammad T. Naseem, Janina L. Rinke, Kenneth Veland Halberg BeetleAtlas is an online resource for tissue- and stage-specific transcriptomics in the red flour beetle, Tribolium castaneum. On updating from the original Tcas5.2 genome assembly to the more recent improved icTriCast1.1 genome assembly it became evident that there were major discrepancies between the gene models of the two genome annotations in use: the OGS3 and the NCBI gene sets. As neither was clearly superior we implemented a new design in BeetleAtlas 2 (beetleatlas.org) comprising two parallel ‘modes’ — one incorporating results using the NCBI gene models and a second incorporating those using the OGS3 gene models. This allows direct comparison where equivalent gene models exist: 50–57% of cases. To aid resolution of discrepancies between the two gene model sets and verification of results, gene models are linked to a custom visualization of RNA-seq read coverage of the genome in the UCSC Genome Browser. This displays reads from 22 tissues and life stages superimposed on the icTriCast1.1 genome assembly. Reference tracks show the NCBI gene models, the OGS3 gene models after translation of their coordinates from the Tcas5.2 assembly, and 1050 discontinued NCBI gene models from the previous assembly after a similar transfer of coordinates. We document various situations in which distinct patterns of expression of the tissues can be used to confirm and extend correlations between the two gene sets, resolve discrepancies between them, make corrections and identify putative genes or exons absent from the current gene sets. BeetleAtlas 2 allows those involved in Tribolium research to avoid the pitfalls inherent in incorrect gene models when planning experiments on specific genes and interpreting the results. It also demonstrates how BeetleAtlas 2 might play an important role in establishing a revised gene set for Tribolium castaneum in the future.

05.
arXiv (CS.CV) 2026-06-11

Towards Conditional Feature Alignment for Cross-Domain Counting

Object counting models often degrade under cross-domain deployment because density composition varies across domains and is itself task-relevant. Standard feature alignment methods tend to suppress such variation by encouraging global domain invariance, which can be harmful when source and target domains contain different proportions of background, sparse foreground, and dense foreground. We propose Conditional Feature Alignment (CFA), a cross-domain counting framework that aligns representations within label-induced conditions rather than across full marginal feature distributions. Given density annotations or pseudo-density predictions, CFA constructs foreground/background or density-level conditions and aligns only features belonging to matching conditions. We formalise this idea through a conditional divergence perspective, showing that conditional alignment removes within-condition discrepancy while preserving condition-marginal density shift. For unsupervised domain adaptation, CFA estimates source conditions from annotations and target conditions from detached pseudo-density maps, then performs condition-wise adversarial alignment with full-image consistency regularisation. For source-domain generalisation, we instantiate the same principle with MPCount by enforcing condition-wise memory-consistency between generated source-domain views. Experiments on crowd and cell counting benchmarks show competitive or improved performance across diverse UDA and DG settings. For example, on JHU-CROWD++ FH$\rightarrow$SN, CFA-DG reduces MAE/RMSE from MPCount's 216.3/421.4 to 90.5/169.9, indicating that condition-wise alignment is especially effective under large weather- and density-induced shifts. These results suggest that condition-wise alignment is a promising design principle for domain-adaptive counting.

06.
arXiv (CS.LG) 2026-06-16

EnvShip-Bench: An Environment-Enhanced Benchmark for Short-Term Vessel Trajectory Prediction

arXiv:2606.15240v1 Announce Type: new Abstract: Vessel trajectory prediction is important for intelligent shipping, maritime surveillance, and navigation safety. However, existing public maritime AIS resources are often limited by inconsistent forecasting protocols, uneven data quality, and the lack of benchmark-ready contextual annotations, which hinder fair comparison and context-aware modeling. To address this gap, we present EnvShip-Bench, a unified benchmark for short-term vessel trajectory prediction built from large-scale raw AIS data from the Danish Maritime Authority (DMA) and NOAA through a common processing pipeline. EnvShip-Bench adopts a standardized forecasting protocol with 10 minutes of observation, 10 minutes of prediction, and 20-second sampling in vessel-centric local metric coordinates. Beyond the large-scale core benchmark, it provides a quality-first compact subset for efficient and reproducible experimentation, together with synchronized environmental and nearby-vessel context extensions. As a result, EnvShip-Bench supports trajectory-only, environment-aware, and interaction-aware forecasting under a unified evaluation framework. Extensive benchmark statistics and analysis demonstrate that EnvShip-Bench offers a standardized, extensible, and context-aware foundation for maritime trajectory forecasting research.

07.
arXiv (CS.CL) 2026-06-15

Efficient Rationale-based Retrieval: On-policy Distillation from Generative Rerankers based on JEPA

Unlike traditional fact-based retrieval, rationale-based retrieval typically necessitates cross-encoding of query-document pairs using large language models, incurring substantial computational costs. To address this limitation, we propose Rabtriever, which independently encodes queries and documents, while providing comparable cross query-document comprehension capabilities to rerankers. We start from training a LLM-based generative reranker, which puts the document prior to the query and prompts the LLM to generate the relevance score by log probabilities. We then employ it as the teacher of an on-policy distillation framework, with Rabtriever as the student to reconstruct the teacher's contextual-aware query embedding. To achieve this effect, Rabtriever is first initialized from the teacher, with parameters frozen. The Joint-Embedding Predictive Architecture (JEPA) paradigm is then adopted, which integrates a lightweight, trainable predictor between LLM layers and heads, projecting the query embedding into a new hidden space, with the document embedding as the latent vector. JEPA then minimizes the distribution difference between this projected embedding and the teacher embedding. To strengthen the sampling efficiency of on-policy distillation, we also add an auxiliary loss on the reverse KL of LLM logits, to reshape the student's logit distribution. Rabtriever optimizes the teacher's quadratic complexity on the document length to linear, verified both theoretically and empirically. Experiments show that Rabtriever outperforms different retriever baselines across diverse rationale-based tasks, including empathetic conversations and robotic manipulations, with minor accuracy degradation from the reranker. Rabtriever also generalizes well on traditional retrieval benchmarks such as MS MARCO and BEIR, with comparable performance to the best retriever baseline.

08.
arXiv (CS.LG) 2026-06-11

Bypassing Prompt Guards in Production with Controlled-Release Prompting

arXiv:2510.01529v4 Announce Type: replace Abstract: Ball et al. recently established that prompt filtering for AI alignment faces a fundamental barrier: under standard cryptographic assumptions, no filter running significantly faster than the protected model can universally distinguish adversarial prompts from benign ones. We investigate whether this impossibility result translates to real-world vulnerabilities in deployed large language model (LLM) systems. We answer affirmatively by introducing controlled-release prompting, a practical instantiation of the theoretical framework that exploits the resource asymmetry between lightweight input filters and the main models they protect. Unlike the theoretical construction, our attack does not require model modification: it generates malicious prompts that are indecipherable by any bounded filter yet remain tractable to the target LLM. We find our attack to be successful on four major chat platforms (Google Gemini, DeepSeek Chat, xAI Grok, and Mistral Le Chat) where baseline methods fail. Additionally, we apply our attack to extract copyrighted data from Gemini. Finally, we provide a systematic evaluation of 14 open-weight prompt guard models, revealing that even reasoning-capable filters cannot reliably detect our attack without incurring prohibitive resource overhead.

09.
arXiv (CS.CV) 2026-06-11

NSVQ: Mitigating Codebook Collapse by Stabilizing Encoder Drift in Vector Quantization

Vector quantization is central to modern generative modeling pipelines, but large-codebook VQ models often suffer from codebook collapse. We identify encoder drift as a key driver of this failure: as the encoder moves the latent distribution, sparsely updated code vectors can lag behind, lose assignments, and increase quantization error, creating a feedback loop through the straight-through estimator. We propose NSVQ, a non-stationary-aware VQ training strategy that combines a dense non-stationary embedding loss, codebook replacement, and stage-wise encoder freezing. NSVQ first helps the codebook track encoder drift during early training, then freezes the encoder to consolidate the codebook under a fixed latent geometry, and finally reintroduces adversarial refinement. Experiments on ImageNet-1k show that NSVQ improves reconstruction quality while maintaining full codebook utilization. On ImageNet-1k at 128$\times$128 with 65,536 codes, NSVQ reduces rFID from 2.39 to 2.10 compared with SimVQ, while both methods maintain 100\% utilization. Additional latent diffusion experiments show that NSVQ also improves downstream ImageNet generation FID.

10.
arXiv (quant-ph) 2026-06-17

Cavity-enhanced superconducting response in an underdoped cuprate

arXiv:2606.18084v1 Announce Type: cross Abstract: Superconductors carry electrical current without resistance when paired electrons condense into a coherent macroscopic quantum state. In underdoped cuprates, evidence suggests that pairing-related correlations and superconducting fluctuations can survive above the temperature at which global coherence is lost, pointing to phase fluctuations as a key limitation on superconductivity in this regime. Motivated by recent demonstrations of cavity-modified collective states in quantum materials, we investigate whether superconducting coherence can be stabilized by engineering the electromagnetic environment of the superconductor. We study an underdoped YBa$_2$Cu$_3$O$_{7-\delta}$ thin film in a tunable terahertz cavity formed with a semi-transparent gold mirror. From temperature-dependent terahertz transmission measurements, we find that the cavity enhances the superconducting response below the critical temperature, with an increase of the inferred superfluid weight. The effect becomes more pronounced at smaller cavity lengths and is accompanied by an upward shift of the superconducting onset temperature. Calculations based on a cavity-coupled model for phase-fluctuating superconductors capture these trends and support an interpretation in terms of cavity-enhanced phase stiffness. These results showcase the potential of cavity engineering for designing emergent functionalities in correlated systems.

11.
arXiv (CS.CV) 2026-06-12

DiskChunGS: Large-Scale 3D Gaussian SLAM Through Chunk-Based Memory Management

Recent advances in 3D Gaussian Splatting (3DGS) have demonstrated impressive results for novel view synthesis with real-time rendering capabilities. However, integrating 3DGS with SLAM systems faces a fundamental scalability limitation: methods are constrained by GPU memory capacity, restricting reconstruction to small-scale environments. We present DiskChunGS, a scalable 3DGS SLAM system that overcomes this bottleneck through an out-of-core approach that partitions scenes into spatial chunks and maintains only active regions in GPU memory while storing inactive areas on disk. Our architecture integrates seamlessly with existing SLAM frameworks for pose estimation and loop closure, enabling globally consistent reconstruction at scale. We validate DiskChunGS on indoor scenes (Replica, TUM-RGBD), urban driving scenarios (KITTI), and resource-constrained Nvidia Jetson platforms. Our method uniquely completes all 11 KITTI sequences without memory failures while achieving superior visual quality, demonstrating that algorithmic innovation can overcome the memory constraints that have limited previous 3DGS SLAM methods.

12.
arXiv (CS.AI) 2026-06-18

Correcting Sensor-Induced Distribution Drift with Wasserstein Adversarial Learning

arXiv:2606.18561v1 Announce Type: cross Abstract: The quality of recorded data depends on the stability of the sensor system that acquires it. Sensor motion and aging can degrade the performance and stability of downstream data-driven methods. We present a Wasserstein-GAN-inspired approach for unsupervised inference of physically interpretable transformation parameters that map a changed detector response distribution back to a nominal reference distribution. In contrast to standard generative modeling, the generator is used as a learnable calibration transformation whose trainable weights represent the sought parameters, while the critic provides a distributional distance signal via the Wasserstein objective. We validate the approach on a tracking-detector toy model with controlled layer shifts and demonstrate its application on high-granularity Geant4-simulated calorimeter data with cell-wise aging effects. The method recovers aging coefficients for individual cells with correlation to ground truth and improves agreement between calibrated and reference energy-sum distributions, while exhibiting the expected degradation at increasing channel-to-channel noise levels. These results indicate that adversarial distribution matching can serve as a data-driven component of calibration strategies in settings where direct labels for degradation parameters are unavailable.

13.
arXiv (CS.LG) 2026-06-15

FlowMo-WM: A World Model with Object Momentum and Hidden Ambient Drift

arXiv:2606.13817v1 Announce Type: cross Abstract: World models in robot learning predict future states from visual observations and actions, enabling agents to reason about the consequences of their controls. However, many action-conditioned models are evaluated in settings where motion is dominated by immediate control, whereas aquatic surface vehicles and other real-world objects continue moving under inertia and are displaced by hidden ambient drift, such as water currents or wind. We propose FlowMo-WM, an end-to-end trainable visual world model that infers object-centric motion state and a predictive long-history context associated with hidden drift from image-action histories without direct supervision of flow fields. FlowMo-WM factorizes image-action history into a short-history latent state, trained to summarize object-centric motion, and a longer-history context, trained to summarize slowly varying exogenous influences. A zero-context residual transition separates action-conditioned base dynamics from context-dependent drift effects during latent rollout. In simulated aquatic surface-vehicle environments with diverse hidden flows, disturbances, and randomized vehicle dynamics, FlowMo-WM improves long-horizon rollout accuracy over representative action-conditioned latent world models. Prediction-time context ablations, in which the inferred context is zeroed or shuffled during rollout, show that the ambient context is important for stable prediction under hidden drift, while frozen linear probes characterize information encoded in the learned factors.

14.
arXiv (CS.CV) 2026-06-18

Cosmos 3: Omnimodal World Models for Physical AI

We introduce Cosmos 3, a family of omnimodal world models designed to jointly process and generate language, image, video, audio, and action sequences within a unified mixture-of-transformers architecture. By supporting highly flexible input-output configurations, Cosmos 3 seamlessly unifies critical modalities for Physical AI – effectively subsuming vision-language models, video generators, world simulators, and world-action models into a single framework. Our evaluation demonstrates that Cosmos 3 establishes a new state-of-the-art across a diverse suite of understanding and generation tasks, demonstrating omnimodal world models as scalable, general-purpose backbones for embodied agents. Our post-trained Cosmos 3 models were ranked as the best open-source Text-to-Image and Image-to-Video models by Artificial Analysis, and the best policy model by RoboArena at the time the technical report was written. To accelerate open research and deployment in Physical AI, we make our code, model checkpoints, curated synthetic datasets, and evaluation benchmark available under the Linux Foundation's OpenMDW-1.1 License at https://github.com/nvidia/cosmos and https://huggingface.co/collections/nvidia/cosmos3. The project website is available at https://research.nvidia.com/labs/cosmos-lab/cosmos3.

15.
arXiv (CS.AI) 2026-06-17

FllumaOne: A Code-Native Multimodal CAD Dataset with Executable Programs and Kernel-Validated Feature Histories

Authors:

arXiv:2606.17696v1 Announce Type: new Abstract: Parametric computer-aided design records both final geometry and the ordered construction history that determines how a part can be edited. Datasets for editable CAD research should therefore expose modeling operations, parameters, and feature dependencies together with validated geometry. We introduce FllumaOne, a code-native multimodal CAD dataset whose models are generated by executable Python programs in Flluma, a Qt/C++ OpenCASCADE-based CAD system. Each sample aligns its program with a structured feature tree, a training-oriented intermediate representation, STEP geometry, a surface point cloud, natural-language descriptions, metadata, and eight canonical visible-edge renderings. The primary release, FllumaOne-100K, contains 100,000 accepted samples across four template-level complexity regimes. Programs are executed and retained only after kernel geometry, solid validity, and export checks; release reports also record modality completeness and split-level duplicate tests. A Qwen2.5-Coder-1.5B LoRA baseline trained on 80,000 samples achieves 99.98% Python syntax validity, 99.97% Flluma build success, and 99.14% STEP-export validity on the held-out 10,000-sample test split. For the 9,909 predictions converted to surface point clouds, the mean normalized Chamfer Distance is 0.002124. The dataset supports conditioned CAD reconstruction, executable program synthesis, feature-tree prediction, B-Rep analysis, retrieval, design completion, and editable reverse engineering.

16.
arXiv (CS.LG) 2026-06-16

Test-Time Compute Scaling for ASR with Depth-Conditioned Looped Transformers

arXiv:2606.04678v2 Announce Type: replace Abstract: End-to-end ASR systems typically use fixed-depth acoustic encoders at inference, making it difficult to trade additional test-time computation for improved recognition without training a larger model. A natural approach is to reuse a shared Transformer block recurrently, but we find that naive looping does not fully exploit additional recurrent compute. We introduce LARM, a depth-conditioned looped Transformer that turns recurrent encoder depth into a controllable test-time compute axis. LARM combines sparse CTC checkpoints, supervision-clock embeddings, FiLM depth conditioning, and delayed soft-posterior feedback. These components structure the loop into recognition checkpoints separated by latent refinement phases and allow shared weights to specialize across recurrent steps. On LibriSpeech, LARM improves WER as the number of inference loops increases and achieves performance competitive with deeper unshared-parameter baselines. Our results show that test-time compute scaling can extend beyond autoregressive language-model reasoning to continuous non-autoregressive speech recognition.

17.
arXiv (CS.AI) 2026-06-16

Poster: EdgeCitadel – Hybrid NATS-MQTT Orchestration for Edge Multi-Agent Systems

arXiv:2606.14710v1 Announce Type: cross Abstract: Edge-resident AI agents increasingly span home servers, IoT hubs, laptops, and phones, yet their coordination stacks still assume cloud-style transports or a central relay. We present EdgeCitadel, an edge multi-agent orchestration platform built around a single NATS 2.10 server with the built-in MQTT adapter. The design combines MQTT connectivity for heterogeneous agents, JetStream-backed persistence and replay for backend services, direct peer delegation over a shared subject namespace, and a passive aggregator that visualizes and stores traffic without sitting on the delivery path. Our poster highlights the migration from MQTT relay prototypes (common in IoT communication) to the current hybrid architecture and demonstrates a working cross-device testbed spanning ARM64, x64, and Android clients.

18.
arXiv (CS.CL) 2026-06-16

Formalize Once, Edit the Rest: Efficient Lean-Based Answer Selection for Math Reasoning

With large language models (LLMs) increasingly applied to mathematical reasoning, formal proof assistants such as Lean can be leveraged to verify reasoning outputs with machine-checkable rigor, enabling use cases such as answer selection in test-time scaling with K sampled candidate answers. However, employing Lean requires that LLM outputs, originally in natural language, first be formalized. Existing Lean-based answer-selection work uses an autoformalization model to generate a formal statement in Lean for each candidate answer independently, incurring a significant computational cost. We propose BASE, a base-and-edit pipeline that formalizes a single base candidate per problem and derives the remaining K-1 statements by editing the answer expression in place. To facilitate this, we train a rewriter model LEANSCRIBE to localize the answer in the base formalization and generate a reusable edit function for the other K-1 candidates. BASE simultaneously improves selection accuracy and reduces formalization cost - a Pareto improvement that holds on all 12 (dataset, solver) configurations across four benchmarks and three solvers, cutting autoformalizer calls by about 5x at K=8, with the reduction expected to become larger as K grows. Code is available at https://github.com/ucr-rai/base-and-edit.

19.
arXiv (quant-ph) 2026-06-11

Diffusive Relaxation of Participation Entropy in U(1)-symmetric Dynamics

arXiv:2606.11561v1 Announce Type: new Abstract: Participation entropy (PE) quantifies the spread of a many-body wavefunction across configuration space. While PE relaxes rapidly in generic chaotic systems, we show that $\mathrm{U}(1)$ conservation laws slow it down by imprinting with the slow hydrodynamic modes. Using a cluster expansion around equilibrium, we show that, after local density inhomogeneities decay, the leading PE deficit is dominated by squared connected density correlations. The long time relaxation is therefore controlled by diffusive correlation spreading, giving $\Delta S(t)\sim t^{-1/2}$ in the hydrodynamic regime and crossing over to $\sim \exp[-O(t/L^2)]$ when $t\geq L^2$. We confirm this entropy correlation relation using exact computation and infinite system tensor network simulations in various quantum $\mathrm{U}(1)$ conserving circuits. Our results establish PE as a sensitive probe of hydrodynamic memory and suggest that slow relaxation is a generic consequence of conservation laws.

20.
arXiv (CS.CV) 2026-06-16

MolSight: Molecular Property Prediction with Images

Every molecule ever synthesised can be drawn as a 2D skeletal diagram, yet in modern property prediction this universally available representation has received less focus in favour of molecular graphs, 3D conformers, or billion-parameter language models, each imposing its own computational and data-engineering overhead. We present $MolSight$, the first systematic large-scale study of vision-based Molecular Property Prediction (MPP). Using 10 vision architectures, 7 pre-training strategies, and $2\,M$ molecule images, we evaluate performance across 10 downstream tasks spanning physical-property regression, drug-discovery classification, and quantum-chemistry prediction. To account for the wide variation in structural complexity across pre-training molecules, we further propose a $chemistry-informed curriculum$: five structural complexity descriptors partition the corpus into five tiers of increasing chemical difficulty, consistently outperforming non-curriculum baselines. We show that a single rendered bond-line image, processed by a vision encoder, is sufficient for competitive molecular property prediction, i.e. $chemical insight from sight alone$. The best curriculum-trained configuration achieves the top result on $5 of 10$ benchmarks and top two on $all 10$, at $$80$\times$ lower$$ FLOPs than the nearest multi-modal competitor.

21.
arXiv (CS.CL) 2026-06-19

Prompt, Plan, Extract: Zero-Shot Agentic LLMs Workflows for Lung Pathology Extraction from Clinical Narratives

Information extraction from pathology reports is essential for cancer staging, tumor registry population. Yet key data remains embedded in narrative reports, making manual extraction labor-intensive and error-prone. Traditional supervised Natural Language Processing pipelines address this through fully supervised Named Entity Recognition and Relation Extraction, but require expensive manual annotation and suffer cascading failures when upstream entities are missed. In this study, we developed a zero-shot, agentic workflow, and evaluated five open-source generative Large Language Models (LLMs) to populate 13 College of American Pathologists synoptic fields from lung resection pathology reports. We compared them against a state-of-the-art supervised GatorTron NER-RE baseline using a novel, registry-aligned evaluation framework. The baseline achieved Micro-F1of 0.960, while the best zero-shot model (GPT-OSS-20B) achieved Micro-F1 of 0.893 (recall: 0.949), accurately extracting complex relations like Pathologic Stage without task-specific training. These results suggest that open-source, zero-shot agentic LLMs are a low-cost solution for extracting lung pathology information.

22.
arXiv (CS.LG) 2026-06-15

Lower Complexity Bounds for Nonconvex-Strongly-Convex Bilevel Optimization with First-Order Oracles

Authors:

arXiv:2511.19656v3 Announce Type: replace Abstract: Although upper bound guarantees for bilevel optimization have been widely studied, progress on lower bounds has been limited due to the complexity of the bilevel structure. In this work, we focus on the smooth nonconvex-strongly-convex setting and develop new hard instances that yield nontrivial lower bounds under deterministic and stochastic first-order oracle models. In the deterministic case, we prove that any first-order zero-respecting algorithm requires at least $\Omega(\kappa^{3/2}\epsilon^{-2})$ oracle calls to find an $\epsilon$-accurate stationary point, improving the optimal lower bounds known for single-level nonconvex optimization and for nonconvex-strongly-convex min-max problems. In the stochastic case, we show that at least $\Omega(\kappa^{5/2}\epsilon^{-4})$ stochastic oracle calls are necessary, again strengthening the best known bounds in related settings. Our results expose substantial gaps between current upper and lower bounds for bilevel optimization and suggest that even simplified regimes, such as those with quadratic lower-level objectives, warrant further investigation toward understanding the optimal complexity of bilevel optimization under standard first-order oracles.

23.
arXiv (CS.CL) 2026-06-17

EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning

Reinforcement learning (RL) has emerged as a powerful paradigm for training Large Language Models (LLMs) as agents. However, conventional RL methods for long-horizon agentic tasks often struggle with sparse outcome rewards. Intuitively, this overlooks the rich environment dynamics information contained in rollout interaction trajectories. We argue that the interaction experience inherently serves as an implicit supervision signal, reveals the underlying transition mechanisms of the environment, and enables the agent to construct a more accurate internal model of the environment.. Therefore, in this work, we investigate how to leverage this additional signal to improve policy learning. Specifically, we propose EnvRL, a framework that incorporates environment dynamics learning into agentic RL via two auxiliary objectives: state prediction and inverse dynamics. By jointly optimizing with the primary RL objective, we encourage the agent to internalize environment dynamics from its own interaction experience. Extensive experiments on two long-horizon agentic benchmarks demonstrate that EnvRL achieves significant improvements on success-rates over RL-only baselines, e.g., when trained with GRPO, lifting Qwen-2.5-1.5B-Instruct from 72.8% to 77.4% on ALFWorld, and from 56.8% to 67.0% on WebShop.

24.
bioRxiv (Bioinfo) 2026-06-23

Model-based inference of gene expression noise from single-cell RNA-sequencing data

The heterogeneity of expression levels among genetically identical cells, termed gene expression noise, is a property of the gene expression process whose importance in the biology of organisms and their evolution is increasingly recognized. Measuring gene expression noise requires single-cell expression data, as obtained from single-cell RNA sequencing (scRNASeq). Its estimation, however, is challenging owing to (i) the presence of technical noise in addition to biological noise, and (ii) the heterogeneity of cell types in the sampled population. We propose a maximum-likelihood framework to infer biological noise from scRNASeq data, while accounting for technical noise, dropout probabilities, and distinct cell sequencing depths. We demonstrate the parameter identifiability using simulations and that the resulting noise estimates are uncorrelated from the mean gene expression, and therefore do not need extra correction in downstream analyses, easing intra- and inter- genome comparisons. Using two technical replicates of scRNASeq data from the wild yeast *Saccharomyces paradoxus*, we show that expression noise can be inferred in a reproducible manner.

25.
arXiv (CS.AI) 2026-06-11

PROJECTMEM: A Local-First, Event-Sourced Memory and Judgment Layer for AI Coding Agents

arXiv:2606.12329v1 Announce Type: new Abstract: AI coding assistants now support a growing share of software work, from quick scripts to production applications. Yet these agents remain largely stateless: each new session re-reads project files, re-derives prior decisions, and - most costly - may repeat debugging attempts that already failed. Reconstructing this context can consume an estimated 5,000-20,000 tokens per session; the bottleneck is often not model capability but missing project memory. We present projectmem, an open-source, local-first memory and judgment layer for AI coding agents. projectmem records development as an append-only, plain-text event log of typed events - issues, attempts, fixes, decisions, and notes - and deterministically projects that log into compact, AI-readable summaries served through the Model Context Protocol (MCP). Beyond storage, projectmem adds a deterministic pre-action gate that warns an agent before it repeats a previously failed fix or edits a known-fragile file. We frame this as Memory-as-Governance: memory that does not merely answer the agent but acts on its next action. The system runs fully offline with no telemetry; its immutable log also serves as a provenance trail for reproducible, auditable AI-assisted development. projectmem ships as a three-dependency Python package (14 MCP tools, 19 CLI commands, 37 automated tests) and is evaluated through a two-month self-study across 10 projects comprising 207 logged events. Source code: https://github.com/riponcm/projectmem.