Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-18

A Convex Route to Thermoelasticity: Learning Internal Energy and Dissipation

arXiv:2603.28707v3 Announce Type: replace-cross Abstract: We present a physics-based neural network framework for the discovery of constitutive models in fully coupled thermomechanics. In contrast to classical formulations based on the Helmholtz energy, we adopt the internal energy and a dissipation potential as primary constitutive functions, expressed in terms of deformation and entropy. This choice avoids the need to enforce mixed convexity–concavity conditions and facilitates a consistent incorporation of thermodynamic principles. In this contribution, we focus on materials without preferred directions or internal variables. While the formulation is posed in terms of entropy, the temperature is treated as the independent observable, and the entropy is inferred internally through the constitutive relation, enabling thermodynamically consistent modeling without requiring entropy data. Thermodynamic admissibility of the networks is guaranteed by construction. The internal energy and dissipation potential are represented by input convex neural networks, ensuring convexity and compliance with the second law. Objectivity, material symmetry, and normalization are embedded directly into the architecture through invariant-based representations and zero-anchored formulations. We demonstrate the performance of the proposed framework on synthetic and experimental datasets, including purely thermal problems and fully coupled thermomechanical responses of soft tissues and filled rubbers. The results show that the learned models accurately capture the underlying constitutive behavior. All code, data, and trained models are made publicly available via https://doi.org/10.5281/zenodo.19248596.

02.
medRxiv (Medicine) 2026-06-22

Building accessible resources to empower communities: the case of the Lupus Mexican Registry

Motivation: Although SLE data in Latin America is increasing, clinical datasets remain difficult to access and interpret, highlighting the need for accessible tools that support data-driven precision medicine, citizen science, and public health initiatives. Results: We developed a user-friendly platform that enables us to explore LupusRGMX data through interactive queries, report generation, statistical modeling, and comprehensive insights. This resource supports community-oriented research, improves the visibility of underrepresented populations in lupus research, and provides a useful tool to enhance data accessibility. Availability and implementation: Developed in R using Shiny and bslib for interactive visualization and interface design. Available at https://github.com/NeuroGenomicsMX/Lupus_App_2.0 and https://lupusrgmx.liigh.unam.mx/shiny/lupus/

03.
arXiv (CS.CV) 2026-06-11

ViT-FREE: Efficient Face Recognition via Early Exiting and Synthetic Adaptation

Vision Transformers (ViTs) have gained significant attention in computer vision and shown strong potential for face recognition (FR). However, their high computational cost makes deployment on resource-constrained devices challenging, motivating the need for methods that balance efficiency and accuracy. In this work, we investigate early exiting in pretrained ViTs as a simple yet effective training-free strategy for efficient FR inference. Leveraging the uniform feature dimensionality across transformer encoder blocks, we introduce ViT-FREE, a multi-exit framework that enables face verification directly from intermediate representations without modifying or retraining the backbone model, and thus, reducing inference cost. Empirically, we show that patch embeddings and attention maps evolve progressively across depth, exhibiting high similarity between consecutive ViT blocks and increasing alignment with the final representation. This indicates gradual feature refinement and attention convergence, suggesting that intermediate layers already provide stable and discriminative representations suitable for early exiting. Through extensive experiments on multiple FR benchmarks, we systematically analyze the accuracy-efficiency trade-off across exit depths. Our results demonstrate that later exits achieve a highly favorable balance, with exiting at layer 10 yielding up to a 20% speedup while incurring only a 1.5 drop in verification performance on benchmarks such as IJB-C. Also, we propose ViT-FREE_FT, a lightweight exit-specific fine-tuning strategy that adapts only the projection layers using a small synthetic dataset while keeping the transformer backbone frozen. This approach improves the performance of shallow exits while preserving the efficiency benefits and leaving deeper exits largely unaffected.

04.
arXiv (CS.CV) 2026-06-11

AerialClaw: An Open-Source Framework for LLM-Driven Autonomous Aerial Agents

Unmanned aerial vehicles (UAVs) are increasingly used in inspection, search and rescue, environmental monitoring, and emergency response. However, most UAV applications still rely on pre-defined command sequences or task-specific pipelines, where developers manually connect perception, planning, flight control, simulation, logging, and safety modules. This limits the flexibility, reproducibility, and extensibility of autonomous aerial systems. This paper presents AerialClaw, an open-source software framework that enables UAVs to operate as decision-making aerial agents rather than merely command-following platforms. Given a natural-language mission, AerialClaw allows an LLM-based agent to understand the task, maintain context, invoke executable aerial skills, observe perception and runtime feedback, and iteratively update its decisions in a closed loop. The framework adopts a modular brain-skill-runtime architecture, combining hard skills for atomic UAV operations, Markdown-based soft skills for reusable task strategies, document-driven agent state and capability boundaries, memory-driven reflection, safety-oriented runtime validation, and platform-agnostic execution adapters. AerialClaw supports lightweight mock execution, PX4 SITL with Gazebo, and AirSim-based simulation, together with a web console, pluggable model backends, example missions, simulation assets, and staged deployment scripts. By combining standardized aerial skills, document-driven agent state, memory, and closed-loop LLM decision-making, AerialClaw provides a reproducible and extensible open-source framework for building UAV systems that can interpret missions, make decisions, execute skills, and adapt their behavior from feedback.

05.
arXiv (CS.AI) 2026-06-19

Hybrid ANN-SNN Pipeline with Local Plasticity

arXiv:2606.20151v1 Announce Type: cross Abstract: This work proposes a hybrid ANN-SNN pipeline that effectively leverages the rich embeddings of pretrained artificial neural networks (ANNs) to enable high-performance spiking neural networks (SNNs). The architecture couples a pretrained EfficientNet encoder with a CoLaNET spiking classifier. We convert the encoder's activations into spike trains via rate-coding and train the subsequent SNN classifier using local, biologically inspired learning rules, bypassing end-to-end gradient propagation. This approach achieves 99.09% accuracy on a 64-class ImageNet benchmark, demonstrating performance on par with conventional deep networks. The work presents a biologically plausible and efficient framework for adapting powerful pretrained encoders to downstream spiking neural network tasks.

06.
arXiv (CS.LG) 2026-06-15

Decompose Sparsely Where You Should, Absorb Densely Where You Should No

arXiv:2606.14040v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) are typically trained to reconstruct the entire residual stream through a sparse dictionary, implicitly assuming that all activation content is amenable to sparse, monosemantic decomposition. We question this assumption and hypothesize that activations contain a low-rank, dense component that is computationally important to the model yet inherently unsuitable for sparse representation, which serves as a major source of the persistent dense latents widely observed in trained SAEs. To test this, we add a small rank-$r$ linear bottleneck in parallel with standard SAEs (BatchTopK and Matryoshka), allowing dense structure to be absorbed before sparse reconstruction. On Gemma-2-2B layer 12, a rank-24 bottleneck reduces dense latent count by up to 84\% while improving sparse probing and targeted probe perturbation on both architectures at matched sparsity. The absorbed component is (i) structurally identifiable as the top principal components and outlier dimensions; (ii) causally necessary, with removing it raising next-token cross-entropy by 7.5$\times$, far exceeding the 2.8$\times$ from removing the geometrically near-identical top-24 PCA directions; and (iii) redundantly encoded by sparse dictionaries, with ablating 787 maximally aligned sparse features raising cross-entropy by only 2.9$\times$ and ablating 2,048 topic-aligned features leaving MMLU topic classification virtually unchanged, whereas removing the scaffold drops it from 98.7\% to chance. Together, our findings identify a compact, semantically informative and causally important component of residual stream activations (which we term a computational scaffold) that standard sparse dictionaries represent inefficiently, suggesting that the scope of sparsity-based interpretability methods warrants careful re-examination.

07.
arXiv (CS.AI) 2026-06-16

QoS-Aware Token Scheduling and Private Data Valuation for Multi-Modal Agentic Networks

arXiv:2606.15573v1 Announce Type: new Abstract: In agentic systems, human-generated data records anchor the value of AI services. Yet cloud compute pipelines centralize processing on remote servers. Data centralization reduces personal data sovereignty and may potentially degrade the quality of service (QoS). Meanwhile, user contributions are diverse in quantity and quality: decentralized records can be biased, noisy, and heterogeneously distributed. To address the data challenge, we study fair token allocation and private data valuation for decentralized and resource-constrained agentic systems. Our approach embeds multi-modal representations in a shared semantic space and releases differentially private (DP) prototypes to preserve utility while reducing semantic leakage. With the DP guarantee, we design a fair token allocation scheme that rewards effective contributions and remains robust to data heterogeneity and AI resource scarcity. Extensive simulations demonstrate improved contribution-based fairness and QoS compared to standard benchmarks. The improved resistance to image reconstruction attacks indicates enhanced privacy for multi-modal personal data.

08.
arXiv (CS.AI) 2026-06-19

QueryGaussian: Scalable and Training-Free Open-Vocabulary 3D Instance Retrieval

arXiv:2606.19733v1 Announce Type: cross Abstract: Efficiently retrieving specific 3D instances from large-scale scenes via natural language prompts remains a formidable challenge in multimedia analysis. Existing approaches predominantly follow a "scene-level embedding" paradigm, which requires distilling high-dimensional semantic features into every 3D primitive. This strategy suffers from a fundamental architectural bottleneck: memory and computational costs scale linearly with scene complexity, inevitably triggering out-of-memory (OOM) failures in city-scale environments. To address this barrier, we propose QueryGaussian, a training-free framework for expeditious and scalable open-vocabulary 3D instance retrieval. Unlike holistic semantic distillation, QueryGaussian employs an instance-level query mechanism that decouples semantic understanding from geometric representation. Specifically, we leverage pre-trained 2D vision models to interpret user prompts and lift segmentation masks into 3D via a concurrent maximum-weight association strategy, ensuring semantic-visual consistency. To mitigate projection ambiguity, we introduce a temporal fusion module with multi-stage adaptive density clustering. Experimental results demonstrate that QueryGaussian not only matches the accuracy of state-of-the-art methods but also delivers a decisive efficiency leap, reducing GPU memory usage by over 70% and accelerating inference by 180x. Crucially, QueryGaussian enables expeditious instance retrieval on city-scale scenes containing tens of millions of Gaussians using consumer-grade hardware.

09.
arXiv (quant-ph) 2026-06-19

Distinguishing quantum processes with bounded coherent memory

arXiv:2606.19511v1 Announce Type: new Abstract: Distinguishing multi-time quantum processes is a fundamental task underlying the diagnosis, benchmarking, and learning of temporally correlated quantum dynamics. The standard benchmark for distinguishing two processes is the strategy-norm distance, which optimizes over arbitrary adaptive probing strategies but can require large coherent memory and time-dependent control. We introduce machines for autonomous distinction~($\mathsf{MAD}$s): probing strategies that apply the same quantum instrument at each time step, retain the full classical outcome record, and carry a coherent memory of dimension $d_A$. Optimizing over these strategies defines a memory-parametrized distinguishability measure, $d^{(N)}_{\mathsf{MAD}}(\mathbf{P}^N,\mathbf{Q}^N;d_A)$. We show that the resulting hierarchy is monotone in coherent memory and complete at finite times. Specifically, any admissible $N$-step probing strategy can be compiled into a single $\mathsf{MAD}$ with an internal counter and sufficiently large coherent memory, so the hierarchy saturates the strategy-norm benchmark. For recurrent processes generated by repeated system–environment interactions, we derive a single-step description that separates the generation of new distinguishing information from the propagation and decay of information generated at earlier times. Numerical results in a repeated-interaction model show that increasing coherent memory systematically improves the $\mathsf{MAD}$ success probability and closes the gap to the strategy-norm distance while remaining substantially more tractable to evaluate. $\mathsf{MAD}$ distinguishability therefore provides an operational and scalable framework for quantifying what can be learned about genuinely multi-time quantum processes with bounded coherent memory.

10.
arXiv (CS.CV) 2026-06-16

Qwen-RobotWorld Technical Report: Unifying Embodied World Modeling through Language-Conditioned Video Generation

We introduce Qwen-RobotWorld, a language-conditioned video world model for embodied intelligence. With natural language as a unified action interface, it predicts physically grounded future visual trajectories from current observations across robotic manipulation, autonomous driving, indoor navigation, and human-to-robot transfer. This unified formulation provides three promising application directions: synthetic data generation for policy training augmentation, scalable virtual environments for policy evaluation, and language-guided planning signals for downstream robot control. This is achieved through a three-part design: a) Double-Stream MMDiT with MLLM Action Encoding, where a 60-layer double-stream diffusion transformer couples frozen Qwen2.5-VL semantics with video-VAE latents through layer-wise joint attention; b) Embodied World Knowledge (EWK), an 8.6M video-text corpus (200M+ frames) with action-language mapping over 20+ embodiments and 500+ action categories; and c) General+Expert Progressive Curriculum, a two-stage training strategy that first learns general visual priors and then injects embodied specialization under a shared language interface. Extensive results show strong competitiveness: ranks 1st overall on EWMBench and DreamGen Bench, outperforms all open-source models on WorldModelBench and PBench. Additional zero-shot analyses on RoboTwin-IF benchmark further support robust generalization and multi-view consistency.

11.
arXiv (CS.LG) 2026-06-18

MOLAR: Learning Multimodal Molecular Representations from Noisy Labels

arXiv:2606.18390v1 Announce Type: new Abstract: Motivation: Noisy labels are a common challenge in molecular property prediction because molecular annotations are often obtained from assays, curated databases, or weak annotation pipelines rather than directly observed clean biological states. Treating recorded labels as reliable supervision can cause models to memorize corrupted observations and learn misleading molecular evidence. In multimodal molecular representation learning, this issue can be amplified by graph-text fusion or alignment, which may propagate label-induced errors across modalities. Results: We propose MOLAR, a noise-aware framework for learning multimodal molecular representations from noisy labels. MOLAR separates latent clean-property inference from recorded-label observation: graph and text views contribute residual evidence to a clean-property distribution, and a categorical label-observation channel maps this distribution to recorded labels for training. This formulation derives posterior label reliability and modality-specific molecular evidence from the model. Experiments on naturally noisy molecular benchmarks and controlled label-flipping benchmarks show that MOLAR consistently outperforms representative baselines. Visualization analyses further show that MOLAR provides interpretable reliability and modality-evidence diagnostics.

12.
medRxiv (Medicine) 2026-06-22

Clinical-grade Cuffless Blood Pressure Monitoring via Deep-tissue Diffuse Speckle Pulsatile Flowmetry

Blood pressure (BP) is a vital sign which is measured to diagnose and manage hypertension. However, current methods to measure BP use inflatable cuffs which cause discomfort and limit the frequency at which measurements can be made, or intra-arterial catheters which are invasive and pose infection risks. Here, we propose and evaluate the use of Diffuse Speckle Pulsatile Flowmetry (DSPF) as a cuffless BP measurement method to address these limitations. DSPF is a laser speckle-based technique which simultaneously records blood flow rate and blood volume (i.e. photoplethysmography or PPG) signals from relatively deep vascular tissue. Using information from these signals, we studied DSPFs effectiveness in measuring systolic BP (SBP) and diastolic BP (DBP) through an outpatient study in which 133 patients were recruited, and in measuring beat-to-beat BP waveforms through an inpatient study in which two patients were recruited. In the outpatient study, the DSPF method was able to achieve mean absolute errors (MAEs) of 4.17 mmHg and 2.42 mmHg for SBP and DBP respectively compared to conventional cuff-based methods. It was also able to fulfil the requirements of the AAMI/ESH/ISO 81060-2:2018 standard for BP measurement devices and attain an "A" grade according to the British Hypertension Society grading scheme. For the inpatient study, it produced BP waveforms which had MAEs of 2.35 mmHg and 3.06 mmHg compared to arterial-line measurements for the two patients, respectively. Compared to PPG which has been studied more extensively as a cuffless BP measurement method, we found through ablation studies that DSPF was able to reach significantly lower MAEs and hence better accuracies. DSPF augments the performance of PPG-only methods by leveraging additional information from the blood flow rate signal, and we therefore find it to be a superior cuffless BP measurement method which can potentially be used in outpatient, inpatient, and remote settings.

13.
arXiv (CS.AI) 2026-06-16

StyleShield: Exposing the Fragility of AIGC Detectors through Continuous Controllable Style Transfer

arXiv:2605.00924v2 Announce Type: replace-cross Abstract: AI-generated content (AIGC) detectors are increasingly deployed in high-stakes settings such as academic integrity screening, yet their reliability rests on a fundamental paradox: as language models are trained on human-written corpora, the statistical boundary between AI and human writing will inevitably dissolve as models improve. Commercial incentives have further distorted this landscape – detection services and "de-AIification" tools often operate within the same supply chain, replacing evaluation of content quality with judgment of content origin. We present StyleShield, the first flow matching framework for conditional text style transfer, operating directly in continuous token embedding space via a DiT backbone with zero-initialized cross-attention adapters conditioned on frozen Qwen-7B representations. At inference, we adapt the SDEdit paradigm from image synthesis to text embeddings, with a single parameter gamma providing smooth continuous control over the evasion-preservation trade-off. On a multi-domain Chinese benchmark, StyleShield achieves 94.6% evasion against the training detector and >=99% against three unseen detectors, maintaining 0.928 semantic similarity. We further introduce RateAudit, a document-level scheduling algorithm that demonstrates detection-rate verdicts can be set to arbitrary values, directly questioning the reliability of score-based evaluation.

14.
arXiv (CS.AI) 2026-06-12

Understanding the Rejection of Fixes Generated by Agentic Pull Requests – Insights from the AIDev Dataset

arXiv:2606.13468v1 Announce Type: cross Abstract: AI coding agents are increasingly used to generate pull requests (PRs) that propose code fixes in software projects. From a first exploration of the AIDev dataset, we find that 46.41\% of the fixes proposed by the agents Copilot, Devin, Cursor, and Claude are rejected. This represents a significant amount of wasted resources that require human reviews, verifications, and running tests and validations for fixes that are merely discarded. Our goal in this paper is to understand the failure modes of AI-agents, an understanding that is crucial for better integrating AI-agents as efficient teammates. In this paper, we conduct a qualitative study on a representative sample of 306 non-merged pull requests created or co-authored by the agents mentioned earlier, followed by a quantitative analysis of the reasons for rejection. Our qualitative findings identify 14 reasons divided into four high-level categories for rejecting AI-agent fixes. We observe that developers can reject fixes due to fixes whose implementation is incorrect (e.g., incomplete, wrong approach), fixes that do not pass the continuous integration (CI) pipelines and fail tests, fixes for which the agent is unable to perform the implementation (e.g., no code generated, sessions lost), and fixes whose priority is low. Our results shed light on the importance of better guiding the model at these levels: (1) proposing hints about the approach to follow for fixing an issue, (2) outlining constraints or limitations regarding the approaches that should not be taken, and (3) instructing the agent on how to validate the implementation through CI pipelines and without introducing a breaking change. Our results suggest the need for good prioritization of tasks so that generated fixes do not lead to wasted human review efforts or wasted agent resources (e.g., tokens, compute, or allowed number of requests).

15.
arXiv (CS.LG) 2026-06-15

Trust but Verify: Mitigating Medical Hallucinations via Post-Hoc Adversarial Auditing and Multi-Agent Feedback Loops

arXiv:2606.14149v1 Announce Type: new Abstract: Large Language Models (LLMs) are increasingly deployed in healthcare settings, yet their tendency to hallucinate poses risks when clinical decisions are involved. This study examine whether LLMs recommend recently banned or withdrawn pharmaceuticals when answering clinical questions and tests an agent-based method for reducing such errors. We developed a five-agent "Trust but Verify" system using a single LLM backbone. To measure regulatory knowledge obsolescence, we created an adversarial dataset of 103 clinical MCQs where historically correct answers now refer to banned substances. This scale ensures statistical significance across various therapeutic classes. We evaluated three open-access model families (GPT-OSS, Llama-3, Falcon-3) under vanilla and agentic conditions. Performance was measured via pointwise score, label accuracy, Hallucination Error Rate (HER), and Component Fidelity (CF) score. We also observed clinical safety regression in proprietary models. In default configurations, all models showed high hallucination rates, consistently selecting banned drugs that matched training data patterns. Our proposed agentic architecture reduced HER by approximately 53% across models. Pointwise scores shifted from -0.25 (unsafe recommendation) toward 0.0 (appropriate refusal). The safety audit intercepted dangerous outputs even when models' parametric knowledge favored the banned substance. The proposed multi-agent framework offers a model-agnostic method for enforcing regulatory compliance that prioritizes patient safety over fluent text generation. Our work demonstrates a practical approach for deploying autonomous AI systems in safety-critical healthcare settings. It shows how real-time regulatory data can be integrated into LLM pipelines to support clinical decision-making.

16.
arXiv (CS.LG) 2026-06-16

TriAdReview: Triangular Adversarial Review Architecture for Multi-Model Technical Document Generation

arXiv:2606.15074v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly used for technical document generation, yet single-model outputs often suffer from over-engineering, security blind spots, and incomplete coverage. We propose TriAdReview, a triangular adversarial review architecture that employs two independent reviewer models (engineering and boundary perspectives) and a triangular judging mechanism to iteratively improve a generator model's output. We evaluate TriAdReview across five benchmark tasks - architecture design, code generation, proposal review, security audit, and requirements analysis - using three configurations: single model (baseline), dual model (single review), and triple model (full system). Results across 75 experiments (n=5 per cell) show that the triple model configuration achieves a 10.1% overall improvement over the single model baseline (26.2 vs. 23.8 out of 50; p

17.
PLOS Computational Biology 2026-06-22

Ten simple rules for making the supplement increase your paper’s impact

作者:

by Volker Grimm, Uta Berger, Stefano Mammola Have you ever lost hours navigating supplementary materials—clicking between the main text and dozens of auxiliary files only to encounter broken links, illegible figures, and undefined variables and acronyms? If so, you’re not alone. What should support scientific communication has instead become an obstacle: supplementary information (SI) increasingly suffers from inconsistent formatting, poor accessibility, and fragmented organization that impedes rather than advances understanding. This is disheartening since the SI, if used effectively, has the power to enhance transparency, credibility, and reproducibility of research. Therefore, we propose 10 simple rules to help authors design SI that genuinely increase the impact of their research. The rules emphasize treating SI with the same care as the main text, using it strategically to support the scientific narrative while preserving clarity and focus. Key recommendations include creating a single, well-structured, self-contained SI master document; ensuring explicit cross-referencing between the main text and SI; making SI machine-readable; and avoiding the misuse of SI as a substitute for proper data repositories. We also highlight the importance of creativity in choosing appropriate formats and strict adherence to journal-specific guidelines. Finally, when available, we advocate the use of standardized templates to improve consistency, readability, and reuse across studies. By following these rules, authors can substantially increase the scientific impact of their work while at the same time contributing to more sustainable research practices.

18.
arXiv (quant-ph) 2026-06-17

Efficient time-series prediction on NISQ devices via time-delayed quantum extreme learning machine

arXiv:2602.21544v2 Announce Type: replace Abstract: We proposed a time-delayed quantum extreme learning machine (TD-QELM) for efficient time-series prediction on noisy intermediate-scale quantum (NISQ) devices. By encoding multiple past inputs simultaneously, TD-QELM achieves shallow circuit depth independent of sequence length, thereby, mitigating noise accumulation and reducing computational complexity. Experiments using the NARMA benchmark on both noiseless simulations and IBM's 127-qubit processor demonstrate that TD-QELM consistently outperforms conventional quantum reservoir computing in prediction accuracy and noise robustness. These results highlight TD-QELM as a practical and scalable framework for time-series learning on current NISQ hardware.

19.
arXiv (CS.LG) 2026-06-16

Localized Kernel Projection Outlyingness: A Two-Stage Approach for Multi-Modal Outlier Detection

arXiv:2510.24043v4 Announce Type: replace Abstract: This paper presents Two-Stage LKPLO, a novel multi-stage outlier detection framework that overcomes the coexisting limitations of conventional projection-based methods: their reliance on a fixed statistical metric and their assumption of a single data structure. Our framework uniquely synthesizes three key concepts: (1) a generalized loss-based outlyingness measure (PLO) that replaces the fixed metric with flexible, adaptive loss functions like our proposed SVM-like loss; (2) a global kernel PCA stage to linearize non-linear data structures; and (3) a subsequent local clustering stage to handle multi-modal distributions. Comprehensive 5-fold cross-validation experiments on 10 benchmark datasets, with automated hyperparameter optimization, demonstrate that Two-Stage LKPLO achieves state-of-the-art performance. It significantly outperforms strong baselines on datasets with challenging structures where existing methods fail, most notably on multi-cluster data (Optdigits) and complex, high-dimensional data (Arrhythmia). Furthermore, an ablation study empirically confirms that the synergistic combination of both the kernelization and localization stages is indispensable for its superior performance. This work contributes a powerful new tool for a significant class of outlier detection problems and underscores the importance of hybrid, multi-stage architectures.

20.
arXiv (CS.AI) 2026-06-16

Orchestrated Reality: From Role-Play to Living, Playable Game Worlds – LLM-Driven World Simulation as a Parameterized-Action POMDP

arXiv:2606.16014v1 Announce Type: cross Abstract: Many games rely on storytelling combined with systems that track levelling, NPC behaviour, and consequence simulation; bridging tightly-authored narrative with deeply-simulated worlds – most acute in sandbox and open-world settings – has been prohibitively expensive. LLM-driven worlds open a new path: a single harness can coordinate numerical state, narrative voice, storytelling pacing, and rule logic together. Realising this requires the LLM system to sustain a persistent world (who is where, what has just happened, what is currently true), which today's deployed systems do not: the narrative voice asserts state in free prose without any validated representation, so a fully autonomous game engine remains infeasible. We treat this as an architectural choice, not a limitation of language models, and report work in progress on a framework – orchestrated reality – that makes the world a canonical object owned by a singleton orchestration agent analogous to the tabletop-RPG Game Master (GM). We formalise an LLM-driven game world for a human player as a Parameterized-Action POMDP: state is a tree of canonical JSON entities, actions decompose as $a=(k, x_k)$ (a discrete intent kind plus structured JSON parameters), the agent observes only a narrative projection $o=O(s)$ of state, and the transition kernel $F$ is an LLM-driven Plan-Diff-Validate-Apply (PDVA) pipeline that commits schema-validated, content-hashed JSON deltas. We give the formal model, a JSON-state example, a worked single-turn example, and a catalogue of 15 illustrative incidents drawn from a real deployment showing the framework in action. Empirical validation through a planned human player study – together with multi-NPC concurrent agency and deployment as an RL environment – is situated as future work.

21.
arXiv (CS.LG) 2026-06-19

Advances in Scientific Machine Learning for Coupled Fluid Flow and Transport

arXiv:2606.19562v1 Announce Type: new Abstract: This chapter reviews recent advances in Scientific Machine Learning (SciML) for modeling coupled fluid flow and transport phenomena governed by the incompressible Navier-Stokes and scalar transport equations. Such systems, found in applications like turbidity currents and thermal convection, feature strong nonlinear coupling and multiscale behavior that make high-fidelity simulations computationally expensive. To address this, the chapter surveys state-of-the-art SciML methods for building efficient surrogate models, including linear reduced-order techniques based on Singular Value Decomposition (such as Dynamic Mode Decomposition) and nonlinear neural network approaches like Physics-Informed Neural Networks (PINNs) and $\beta$-Variational Autoencoders ($\beta$-VAEs). It first covers the authors' work combining these models with High Performance Computing strategies, including Adaptive Mesh Refinement/Coarsening (AMR/C) and scientific floating-point data compression. It then presents two new contributions: surrogate modeling of turbidity currents via PINNs, and the extraction of disentangled nonlinear modes from thermal flows using $\beta$-VAEs. Governing equations and representative benchmarks, including lock-exchange flows and Rayleigh-Bénard convection, illustrate these methodologies. The chapter is intentionally long, covering both the mathematical and physical foundations of coupled fluid flow and the computational aspects of state-of-the-art modeling. Overall, it demonstrates how SciML enables fast, accurate approximations of complex coupled systems within the specific data regimes and modeling assumptions considered, while substantially reducing computational cost relative to full-order simulations. Broader capabilities such as real-time prediction and uncertainty quantification remain active research directions whose feasibility depends strongly on the problem at hand.

22.
arXiv (CS.CL) 2026-06-19

Source-Grounded Data Generation for Text-to-JSON Learning

From financial filings to clinical records, legacy industries rely heavily on long, unstructured documents to store high-value information. Reliably extracting this information into structured, machine-readable representations is a key prerequisite to making the contents accessible to automated systems. JSON is a natural target for such structured extraction, yet constructing reliable and scalable text-to-JSON training data remains challenging. To address this gap, we propose STAGE (Spreadsheet-grounded Text-to-JSON Artifact GEneration), a source-grounded data generation pipeline that constructs reports and JSON schema by using LLMs for scalable synthesis while validating ground-truth values against the underlying spreadsheet. Evaluations on STAGE-Eval, our source-grounded benchmark with an 851-example test set, show that STAGE produces stronger training data than existing approaches. This improves Qwen3-4B exact match from 31.37% to 74.27% and value accuracy from 45.46% to 90.69%.

23.
arXiv (CS.CL) 2026-06-16

LLM-Powered Virtual Population for Demand Simulation and Pricing

We develop an LLM-powered virtual population model that simulates demand for pricing decisions, in settings where products are described by rich unstructured information, such as text descriptions and images, and where decision makers need not only mean-demand predictions but also uncertainty estimates for counterfactual prices. Our model represents exposed customers as draws from a finite mixture of customer personas. For each persona, product, and candidate price, an LLM elicits a persona-level purchase probability using both structured persona information and unstructured product information. These probabilities are aggregated through calibrated mixture weights to form a predictive distribution of aggregate demand. The resulting simulator can evaluate counterfactual prices under various pricing objectives, including expected revenue and risk-aware criteria such as conditional value at risk. We test the framework on an online H&M fashion dataset with product descriptions and images. The calibrated LLM-based simulator achieves the best overall predictive performance among the models considered, and supports sample-efficient pricing decisions. Our framework provides a practical way to use LLMs as demand simulators for products with limited historical demand data but rich product information. By producing a full predictive demand distribution rather than only a point forecast, it enables managers to compare candidate prices, quantify demand uncertainty, and choose prices that target either average-case revenue or risk-aware objectives.

24.
Science (Express) 2026-06-02

Another red alert for American science | Science

作者: 未知作者

Although research has bipartisan support in the US Congress, and trust in science is above 75% across the country, the Trump administration seems as determined as ever to mortally wound the nation’s scientific enterprise. After the scientific community persuaded Congress to restore most of the president’s draconian cuts to research funding last year, the White House Office of Management and Budget (OMB), under Russell Vought, has found new ways to circumvent the will of Congress and starve American science. At the beginning of this year, OMB dragged its feet in releasing instructions to federal agencies for how to distribute the funding appropriated by Congress, leading to lags in dispersal. Now, OMB has proposed revising the rules that govern how federal dollars are spent. The changes would inevitably lead to unlegislated reductions in funding and damage US leadership in science, both in academia and industry.

25.
arXiv (CS.LG) 2026-06-11

OGPO: Sample Efficient Full-Finetuning of Generative Control Policies

arXiv:2605.03065v2 Announce Type: replace Abstract: Generative control policies (GCPs), such as diffusion- and flow-based control policies, have emerged as effective parameterizations for robot learning. This work introduces Off-policy Generative Policy Optimization (OGPO), a sample-efficient algorithm for finetuning GCPs that maintains off-policy critic networks to maximize data reuse and propagate policy gradients through the full generative process of the policy via a modified PPO objective, using critics as the terminal reward. OGPO achieves state-of-the-art performance on manipulation tasks spanning multi-task settings, high-precision insertion, and dexterous control. To our knowledge, it is also the only method that can fine-tune poorly-initialized behavior cloning policies to near full task-success with no expert data in the online replay buffer, and does so with few task-specific hyperparameter tuning. Through extensive empirical investigations, we demonstrate that OGPO drastically outperforms methods alternatives on policy steering and learning residual corrections, and identify the key mechanisms behind its performance. We further introduce practical stabilization tricks, including success-buffer regularization, two-sided conservative advantages, and Q-variance reduction, to mitigate critic over-exploitation across state- and pixel-based settings. Beyond proposing OGPO, we conduct a systematic empirical study of GCP finetuning, identifying the stabilizing mechanisms and failure modes that govern successful off-policy full-policy improvement.