Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-11

Visualizing LLM Latent Space Geometry Through Dimensionality Reduction

arXiv:2511.21594v3 Announce Type: replace Abstract: Large language models (LLMs) achieve state-of-the-art results across many natural language tasks, but their internal mechanisms remain difficult to interpret. In this work, we extract, process, and visualize latent state geometries in Transformer-based language models through dimensionality reduction. We capture layerwise activations at multiple points within Transformer blocks and enable systematic analysis through Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). We demonstrate experiments on GPT-2 and LLaMa models, where we uncover interesting geometric patterns in latent space. Notably, we identify a clear separation between attention and MLP component outputs across intermediate layers, a pattern not documented in prior work to our knowledge. We also characterize the high norm of latent states at the initial sequence position and visualize the layerwise evolution of latent states. Additionally, we demonstrate the high-dimensional helical structure of GPT-2's positional embeddings and the sequence-wise geometric patterns in LLaMa. We make our code available at https://github.com/Vainateya/Feature_Geometry_Visualization. A better formatted blog-post with identical content is available at https://iclr-blogposts.github.io/2026/blog/2026/vis-llm-latent-geometry/.

02.
arXiv (CS.AI) 2026-06-18

Guava: An Effective and Universal Harness for Embodied Manipulation

arXiv:2606.18363v1 Announce Type: cross Abstract: Language models trained on large-scale vision-language data have demonstrated strong potential for embodied agents. Harnessing models through embodied tools use offers a promising alternative to end-to-end vision-language-action systems by combining high-level reasoning with external modules for perception, planning, and control. However, it remains unclear what makes an effective harness for embodied manipulation, and to what extent such a harness can unlock embodied capabilities in a wide range of reasoning models. In this work, we present Guava, a harness framework for embodied tool use developed through systematic exploration of the design space of agent workflows, action spaces, and observation spaces. Our study identifies three key ingredients for effective embodied agents: iterative perception-reasoning-action loops, semantic action abstractions, and multimodal observations. To understand whether these design principles are universal even to small models, we develop an end-to-end training pipeline that distills embodied manipulation capabilities into a 4B open-source model using fewer than 2K trajectories collected entirely in simulation. Experimental results in both simulation and real-world environments show performance comparable to frontier proprietary models while exhibiting strong generalization to unseen objects, novel instructions, and long-horizon tasks. Results suggest that a well-designed harness can serve as a scalable, model-agnostic interface for embodied manipulation, enabling strong emergent embodied capabilities in compact open-source models with minimal training data.

03.
arXiv (CS.LG) 2026-06-11

HAMNO: A Hierarchical Adaptive Multi-scale Neural Operator with Physics-Informed Learning for Dynamical Systems

arXiv:2606.11963v1 Announce Type: new Abstract: Neural operators provide a powerful framework for learning solution mappings of partial differential equations directly in function space. However, many existing architectures still struggle to represent nonlinear time-dependent systems that involve multi-scale structures, long-range interactions, and stable long-time evolution. In this work, we introduce the Hierarchical Adaptive Multi-scale Neural Operator (HAMNO), a neural-operator architecture that combines local convolutional representations, global spectral operators, and hierarchical encoder-decoder processing. The central component of HAMNO is a data-dependent gating mechanism that adaptively balances local and global information at each spatial location, allowing the model to resolve fine-scale features while preserving long-range dependencies. We further develop a physics-informed extension, PI-HAMNO, based on a multi-objective loss strategy that combines data fitting with strong- and weak-form physics constraints. The strong-form term penalizes the domain-integrated squared PDE residual in physical coordinates, while the weak-form term is constructed by multiplying the governing residual by finite-element test functions and evaluating the resulting element integrals using centroid-based tetrahedral quadrature. The framework is evaluated on non-periodic Allen-Cahn (AC), Cahn-Hilliard (CH), and Swift-Hohenberg (SH) equations defined on cubic domains. Across long-horizon rollout, data-limited training, out-of-distribution initial-condition shifts, and random-seed variations, HAMNO improves predictive accuracy over standard neural-operator baselines, while PI-HAMNO further enhances stability, physical consistency, and data efficiency. The implementation is publicly available at https://github.com/MBamdad/HAMNO .

04.
PLOS Computational Biology 2026-06-15

WormSORT: A detection-based multiple object tracking model for individual silkworms in breeding environments

作者:

by Hongkang Shi, Linbo Li, Shiping Zhu, Haibo He, Minghui Zhu, Jianfei Zhang Variety breeding has long been a cornerstone of high-quality agriculture, and recent advances in artificial intelligence have opened new avenues for accelerating biological breeding. In this study, we applied multiple object tracking (MOT) technology to silkworm breeding to achieve efficient, non-invasive, and dynamic individual monitoring. Unlike pedestrian or vehicle tracking, silkworms pose unique challenges for MOT due to their small size, dense distribution, and high inter-individual similarity, which complicate accurate tracking and behavioral analysis. To address these issues, we propose WormSORT, an enhanced tracking method based on a tracking-by-detection framework with an optimized data association strategy. A pre-trained detection model identifies silkworms in each frame, and deep feature vectors are extracted using a re-identification network. Identity association is first performed using Intersection over Union (IoU) matching, followed by deep feature similarity for unmatched cases, improving both tracking accuracy and reliability. To further enhance tracking stability, we introduce a candidate input padding mechanism, including IoU padding and feature padding, ensuring that high-confidence unmatched trajectories and detections remain involved in the matching process. To validate the proposed tracking strategy, we constructed two multiple silkworm tracking (MST) datasets: MST-50, containing approximately 50 individuals over 1000 frames, and MST-100, containing approximately 100 individuals over 1200 frames. Experimental results demonstrate that WormSORT outperforms existing methods, including DeepSORT, StrongSORT, OCSORT, ByteTrack, and BotSORT, achieving superior tracking performance. This study provides a valuable reference for silkworm tracking and behavioral analysis, contributing to the advancement of high-quality silkworm rearing and management.

05.
arXiv (CS.AI) 2026-06-19

CareTransition-Audit: A Benchmark to Audit Discharge Summaries for Efficient Care Transitions

arXiv:2604.05435v2 Announce Type: replace Abstract: Incomplete or inconsistent discharge documentation drives care fragmentation and avoidable readmissions. Despite its critical role in patient safety, auditing discharge summaries relies on manual review and does not scale. We propose an automated framework for auditing discharge summaries using large language models (LLMs). Our approach operationalizes the DISCHARGED framework into a checklist of 46 questions. Using 50 summaries from the MIMIC-IV database, with clinician ground-truth labels, we benchmark 11 LLMs. Model-assessed mean documentation completeness ranges from 54.9% to 74.2%, and the best-performing models achieve a Cohen's kappa values around 0.5 against clinician labels, indicating moderate agreement. All models struggle to identify ambiguous documentation (Unclear), highlighting a key gap in current automated auditing. This work provides a clinician-validated benchmark and zero-shot baselines for systematic quality improvement in clinical documentation.

06.
Nature (Science) 2026-06-10

Daily briefing: Ancient ground squirrels ate like ‘zombies of the Pleistocene’

作者:

Evidence from fossilized poo reveals the diverse diet of ancient ground squirrels. Plus, the science behind the peptide craze and our innate tendency to wander anticlockwise. Evidence from fossilized poo reveals the diverse diet of ancient ground squirrels. Plus, the science behind the peptide craze and our innate tendency to wander anticlockwise.

07.
arXiv (CS.LG) 2026-06-12

Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

arXiv:2606.13605v1 Announce Type: cross Abstract: This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. The uncertainty is represented here through initial conditions and process noise, with the only requirement being that it can be sampled. A deterministic nominal trajectory is first computed offline, and reinforcement learning is then used only to robustify that baseline through a structured affine closed-loop correction law comprising a feedforward control adjustment and time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, while terminal dispersion is regulated through covariance-feasibility penalties. The framework is assessed on two materially different trajectory design problems. The flagship case study is a three-dimensional multi-impulse Earth-Mars transfer, where the learned policy is benchmarked against a recent robust trajectory-optimization reference under Gaussian uncertainty and then evaluated under bounded uniform uncertainty and under process disturbances not seen during training. The second case study is a stochastic atmospheric pinpoint rocket landing problem, used to assess portability to a short-horizon continuous-thrust setting with drag, mass depletion, and glide-slope constraints. The results show that the proposed framework can remain competitive in upper-tail fuel cost while preserving probabilistic feasibility, and that the same robustification scaffold can be carried across heterogeneous spacecraft trajectory planning problems without redesign of its core stochastic-control structure.

08.
arXiv (CS.LG) 2026-06-15

Neural Slack Variables for Shape Constraints

arXiv:2606.13803v1 Announce Type: new Abstract: Enforcing functional inequality constraints such as monotonicity and convexity in neural networks is a fundamental challenge in many industrial and scientific applications. Classical one-sided penalty methods, along with primal-dual methods gated by complementary slackness, provide constraint gradients only at violated locations, resulting in fragile satisfaction. Architectures that guarantee feasibility by construction, on the other hand, remain largely limited to elementary cases and impose additional inductive biases. We introduce neural slack variables, a deep learning native primal-side approach that converts constraint enforcement into a regression problem by coupling the primary network with a jointly learned auxiliary network. The auxiliary network serves as a valid target for the primary network's constraint quantities, inducing feasibility and regularity. Neural slack variables achieve zero measured violations on dense-grid monotonicity and convexity test cases, where penalty and primal-dual baselines leave residual violations, and enable arbitrage-free learning of volatility surfaces, an open industrial challenge in quantitative finance.

09.
bioRxiv (Bioinfo) 2026-06-11

Robust semi-supervised scRNA-seq integration from virtual adversarial learning

Single-cell RNA sequencing integration methods that rely solely on transcriptomic data often struggle to preserve fine-grained distinctions between closely related cell subtypes. As a result, cell populations that are separable in the raw data may become over-mixed after integration, reducing biological resolution and interpretability. Incorporating marker gene information can potentially address these issues; however, the variability and complexity of available marker sets limit their effective application. To address this, we introduce scCRAFT+, a semi-supervised integration model that innovatively incorporates marker gene information through Virtual Adversarial Training (VAT). By jointly optimizing marker-derived supervision and transcriptome-wide representations, VAT enforces local prediction smoothness among transcriptionally similar cells, improving robustness to noisy marker annotations while enhancing both integration quality and cell type auto-annotation. This targeted approach significantly enhances annotation accuracy and robustness, particularly when faced with incomplete or incorrect marker gene sets. Benchmarking shows that scCRAFT+ achieves consistently stronger performance than current unsupervised and supervised integration approaches, resulting in improved integration quality and biologically meaningful sub-cell type auto-annotations.

10.
medRxiv (Medicine) 2026-06-15

Recruitment, Retention Approaches and Community Engagement in the THRIVE pilot Trial: Lessons Learned from a Food is Medicine Trial

Background: Recruitment of underrepresented populations, including Black and Hispanic populations, for Food is Medicine (FIM) and cardiovascular trials, may pose significant challenges. Methods: We implemented a multi-component recruitment approach for the THRIVE (AdapTive personalized dietitian coacHing and messaging with pRoduce prescrIptions to improVE healthy dietary behaviors) pilot trial to engage primarily Black and Hispanic adults in a Food is Medicine for hypertension intervention. The recruitment approaches included community engagement at approximately 40 community events (cultural festivals and neighborhood gatherings); partnerships with 8 community and faith-based service hubs and food distribution sites; recruitment through safety net primary care clinics, digital outreach via the study website, and social media campaigns; and direct recruitment at places of worship. We report lessons learned from the community engagement process, recruitment efficiency, representativeness, and retention outcomes. Results: Within 6 months, the enrollment target was exceeded by 40%, with an accrual index of 1.04. Over 1,000 individuals were reached through the direct-to-community engagement process, while faith-based partnerships engaged about 900 adults. There were 2,673 visits to the study webpage, and social media achieved 12,259 impressions with 399 clicks. About 95% of participants resided within 10 miles of the faith-based recruitment sites. Face-to-face engagement at the food distribution sites within faith-based organizations or community service hubs outperformed digital methods. Faith leader endorsements and follow-up in-person meetings (following unsuccessful email outreach) dramatically increased recruitment. Regarding retention, pre-randomization attrition was 6%, and 82% of participants completed the study. Conclusion: Culturally tailored, community-engaged recruitment grounded in faith-based and local community partnerships, was highly effective in engaging Black and Hispanic populations in this FIM cardiovascular trial. This provides a replicable model for implementing equitable and sustainable cardiovascular health interventions.

11.
arXiv (CS.AI) 2026-06-16

VGPT-RSI for RH-Adjacent Formal Progress: Boundary Certificates, Verified Finite Lagarias Inequalities, and Explicit Failure Localization

arXiv:2606.15096v1 Announce Type: new Abstract: The Riemann Hypothesis remains one of the central unsolved problems in mathematics. Rather than claiming proof, we investigate whether a verifiable AI-assisted reasoning system can produce reliable, formally checked partial progress while explicitly identifying the remaining mathematical obstructions. We apply the Verifiable Growing Physical Transformer with Recursive Self-Improvement (VGPT-RSI) to two RH-adjacent certification tasks. First, we construct and verify a finite RH-boundary certificate for inequality on a parameterized safe lower curve over a region. The numerical boundary curve is converted into a certificate-backed lower curve, audited using outward-rounded interval arithmetic and Arb/FLINT ball arithmetic, and then checked in Rocq/CoqInterval for the parameterized theorem. Second, we initiate a formal Lagarias-route certificate. Lagarias criterion states that RH is equivalent to the global inequality. We formalize the finite quantity and produce a Coq-checked finite certificate. The final system identifies the exact unresolved mathematical bottlenecks: formalizing the Lagarias equivalence, proving the global tail theorem beyond any finite cutoff, and potentially reducing counterexamples to colossally abundant or related extremal integers. These results demonstrate that VGPT-RSI can produce certified RH-adjacent formal progress, organize proof dependencies, and avoid overclaiming when the remaining obstruction is genuinely mathematical.

12.
arXiv (CS.AI) 2026-06-16

Quantum Machine Learning for Industrial Applications

arXiv:2606.14822v1 Announce Type: cross Abstract: Recent advances in Machine Learning have transformed numerous industrial sectors, yet classical paradigms face fundamental limitations: rapidly growing data volumes, rising computational costs, significant energy consumption, and the physical scaling limits of conventional hardware architectures. Quantum computing has emerged as a promising computational paradigm to address these challenges, giving rise to the field of Quantum Machine Learning (QML). In this thesis, the theoretical foundations of QML are investigated, with a focus on near-term and future practical applications. Three central challenges are addressed: the trainability of variational quantum circuits, their expressivity, and their resistance to efficient classical simulation. The trainability of Hamming-weight preserving variational quantum circuits is first studied, and theoretical guarantees are established that resolve an open conjecture on the absence of barren plateaus for this circuit family. Subspace-preserving QML algorithms are then introduced, including photonic circuits and quantum convolutional neural networks, and are designed to mimic classical ML subroutines while offering polynomial quantum advantage. Finally, variational quantum circuits are analyzed as quantum Fourier models, and a framework is derived to jointly characterize expressivity and trainability, from which conditions are obtained under which quantum models provably separate from their classical counterparts. These contributions are intended to advance the theoretical roadmap for harnessing near-term and future quantum technologies in real-world applications.

13.
medRxiv (Medicine) 2026-06-18

Looked but didn't see: inattentional blindness and yes-bias confabulation in vision-language models

Previous work showed that many participants fail to notice a gorilla in a video of people playing basketball. Another study found that 83% of trained radiologists failed to report a gorilla figure inserted into a chest CT nodule-search task, even though eye-tracking revealed that most observers had foveated the figure. We ask whether a similar phenomenon exists in contemporary vision-language models (VLMs). We find that (i) VLMs are capable of spotting the gorilla in both still-frame images and videos of lung CT scans; (ii) models display inattentional blindness, which varies according to model generation and type of stimulus presented; (iii) Gemini-3.1-Pro outperforms most other flagship and open-weight VLMs at identifying the presence or absence of the gorilla. We additionally ran a segmentation experiment utilizing two different model classes: a generalist (SAM 3), which found the gorilla but produced little to no results for anatomy-based prompts; a medical specialist (BiomedParse), which produced more promising anatomy-based results but flagged "gorilla" on gorilla-free control videos on 82% of frames. The behavioral signature of inattentional blindness reproduces in VLMs, but a unique confabulation failure mode means that any "did the model see X" claim requires signal-detection analysis with a matched-control false-alarm baseline.

14.
arXiv (quant-ph) 2026-06-17

Canonical regularization of the stationary Coulomb problem and an Aufbau-like spectral ordering

arXiv:2606.17359v1 Announce Type: new Abstract: The stationary hydrogen atom has Coulomb degeneracy across orbital levels, whereas the Aufbau/Madelung ordering is an empirical, many-electron rule established in atomic physics. We examine the hydrogen atom through a regularized de Broglie–Bohm representation, in which stationary amplitude current constraints generate separable Sturm–Liouville branches. In this formulation, the radial, orbital, and magnetic sectors acquire canonical Langer-like inverse square corrections. The modified boundary value problems allow analytical solutions and produce a hydrogen-like spectrum with regularized radial and angular indices. Consequently, radial Coulomb quantization acquires an orbital dependent shift, lifting the Coulomb degeneracy and producing a spectral ordering that follows the Aufbau/Madelung sequence. On this basis, we construct the ordering of the regularized de Broglie–Bohm states and show that the spectral structure retains the standard degenerate Rydberg sequence in the l=0 sector. The separated amplitudes are represented by generalized special function branches, including the associated Laguerre, Legendre, and Bessel functions with non-integral parameters arising from regularized separation. Therefore, the treatment is intended as an analytical examination of spectral ordering in a regularized one center Coulomb problem rather than as a replacement for the many electron atomic structure theory. Keywords: de Broglie–Bohm representation; Coulomb spectrum; canonical regularization; Langer correction; Sturm–Liouville equations; Aufbau principle; Madelung ordering; associated Legendre functions; associated Laguerre functions; Bessel functions.

15.
arXiv (CS.CV) 2026-06-16

LUCID: Learned Undersampling-Adaptive Consistency-Guided Inference with Deterministic Flow Matching for Sparse-View CT Reconstruction

Sparse-view CT reduces radiation dose and scanning time by acquiring fewer projection views, but angular undersampling makes reconstruction severely ill-posed, causing streak artifacts, structural blurring, and loss of fine details. Existing supervised methods are often tied to specific sampling settings, whereas generative methods may introduce anatomically inconsistent hallucination-like structures under severe undersampling. We propose Lucid, a sparsity-adaptive, consistency-guided reconstruction framework based on a Flow Matching generative prior for sparse-view CT. Lucid is trained only on high-quality CT images to learn a continuous transport between a Gaussian distribution and the high-quality CT image distribution, independent of view sampling. During inference, the sampling sparsity level is explicitly incorporated to adapt the generative trajectory of a single pretrained model. Specifically, Lucid constructs a degradation-matched initial state by sparsity-weighted fusion of the sparse-view FBP image and Gaussian noise, performs sparsity-modulated Flow Matching updates, and applies projection-domain data-consistency correction after each prior update. Experiments under multiple sparse-view settings show that Lucid achieves stable reconstruction performance across different sampling densities, improves image quality and structural fidelity, and reduces the risk of hallucination-like structures in generative sparse-view CT reconstruction.

16.
arXiv (CS.AI) 2026-06-16

Beyond Predefined Schemas: TRACE-KG for Context-Enriched Knowledge Graph Generation

arXiv:2604.03496v2 Announce Type: replace Abstract: Knowledge graph generation typically relies either on predefined ontologies or on schema-free extraction. Ontology-driven pipelines enforce consistent typing but require costly schema design and maintenance, whereas schema-free methods often produce fragmented graphs with weak global organization, especially in long technical documents with dense, context-dependent information. We propose TRACE-KG (Text-dRiven schemA for Context-Enriched Knowledge Graphs), a framework that jointly constructs a context-enriched knowledge graph and an induced schema without assuming a predefined ontology. TRACE-KG captures conditional relations through structured qualifiers and organizes entities and relations using a data-driven schema that serves as a reusable semantic scaffold while preserving full traceability to the source evidence. Experiments show that TRACE-KG produces structurally coherent, traceable knowledge graphs and offers a practical alternative to both ontology-driven and schema-free construction pipelines.

17.
arXiv (CS.CL) 2026-06-18

Sumi: Open Uniform Diffusion Language Model from Scratch

Diffusion models have become a promising alternative to autoregressive models. Among these, uniform diffusion language models (UDLMs) permit any token to be updated at any step, in principle enabling more flexible generation. However, no UDLM has yet been pretrained from scratch at both large parameter scale and large token budget. Both autoregressive modeling and masked diffusion modeling already have capable models at scale that the community can study and build on; uniform diffusion has none. A scratch-pretrained UDLM at scale would provide a clean reference point for studying scaling behavior, generation dynamics, controllability, and trade-offs against established autoregressive and masked diffusion models. To this end, we introduce Sumi ("ink" in Japanese), a fully open 7B uniform diffusion language model pretrained from scratch on 1.5T tokens. Sumi performs competitively with autoregressive models trained at comparable token budgets on knowledge, reasoning, and coding benchmarks, while under-performing on commonsense benchmarks, where our education-heavy data mixture is a likely contributor. We release our model weights, checkpoints, and full training recipe, including a complete specification of the data mixture over publicly available corpora. We hope this release enables the community to study native uniform diffusion at scale and catalyzes work on its as-yet poorly understood aspects.

18.
arXiv (CS.LG) 2026-06-15

Binary Black Hole Parameter Estimation with Hybrid CNN-Transformer Neural Networks

arXiv:2606.13941v1 Announce Type: cross Abstract: The detection of gravitational waves has revolutionized our ability to explore fundamental aspects of the Universe. Traditionally, modeled gravitational-wave signals have been identified using template-based matched filtering, followed by coincidence analysis across multiple detectors in the signal-to-noise ratio time series. Recent advances in Machine Learning and Deep Learning have sparked growing interest in their application to both signal detection and parameter estimation. In this study, a hybrid Deep Learning strategy is proposed that leverages the effectiveness of Transformer encoders alongside well-established Convolutional Neural Network architectures in an attempt to estimate the intrinsic and extrinsic parameters of non-precessing binary black hole systems. The primary focus of this work is point estimation, producing single best-fit values for each parameter rather than full posterior distributions. This method is evaluated on both simulated signals embedded in Gaussian noise and real gravitational-wave events, and it demonstrates strong predictive performance and robustness across key astrophysical parameters.

19.
arXiv (CS.CV) 2026-06-16

Power Battery Detection

Power batteries are essential components in electric vehicles, where internal structural defects can pose serious safety risks. We conduct a comprehensive study on a new task, power battery detection (PBD), which aims to localize the dense endpoints of cathode and anode plates from industrial X-ray images for quality inspection. Manual inspection is inefficient and error-prone, while traditional vision algorithms struggle with densely packed plates, low contrast, scale variation, and imaging artifacts. To address this issue and drive more attention into this meaningful task, we present PBD5K, the first large-scale benchmark for this task, consisting of 5,000 X-ray images from nine battery types with fine-grained annotations and eight types of real-world visual interference. To support scalable and consistent labeling, we develop an intelligent annotation pipeline that combines image filtering, model-assisted pre-labeling, cross-verification, and layered quality evaluation. We formulate PBD as a point-level segmentation problem and propose MDCNeXt, a model designed to extract and integrate multi-dimensional structure clues including point, line, and count information from the plate itself. To improve discrimination between plates and suppress visual interference, MDCNeXt incorporates two state space modules. The first is a prompt-filtered module that learns contrastive relationships guided by task-specific prompts. The second is a density-aware reordering module that refines segmentation in regions with high plate density. In addition, we propose a distance-adaptive mask generation strategy to provide robust supervision under varying spatial distributions of anode and cathode positions. The source code and datasets will be publicly available at \href{https://github.com/Xiaoqi-Zhao-DLUT/X-ray-PBD}{PBD5K}.

20.
arXiv (CS.CV) 2026-06-18

Taming I2V models for Image HOI Editing: A Cognitive Benchmark and Agentic Self-Correcting Framework

Current image editing methods excel at static attributes but fail at complex Human-Object Interactions (HOI), a critical challenge unaddressed by existing benchmarks that conflate HOI with static attributes, relying on global metrics incapable of simultaneously assessing dynamic interaction validity and entangled human-object pair preservation. Thus, we first introduce HOI-Edit, a comprehensive benchmark with three progressive cognitive levels, which features an automated metric HOI-Eval that reliably evaluates instance-level interaction by letting VLM Q&A after thinking with images containing grounded Human-Object pairs. Considering the task's essence of remodeling dynamic relationships, we benchmark Image-to-Video (I2V) models, finding them inherently suited for dynamic editing due to their temporal generation capabilities. Crucially, beyond superior performance, this capability provides a "replay of the failure process," offering unique diagnosability into why errors occur. We thus propose SCPE (Self-Correcting Process Editing), a novel, agentic self-correcting framework that constrains the generation of I2V models through iteratively refined prompts, enabling the generated videos to more accurately present the target HOI. Extracted frames from these videos are the final editing results. On HOI-Edit, SCPE achieves performance competitive with state-of-the-art (SOTA) editing models like Nano Banana on interaction. Code is available at https://github.com/oceanflowlab/HOI-Edit.

21.
arXiv (quant-ph) 2026-06-12

A refined thermodynamic analysis of nonsecular master equations

arXiv:2606.13504v1 Announce Type: new Abstract: We present a systematic thermodynamic analysis of nonsecular master equations. We consider master equations resulting either from the partial secular and the geometric-arithmetic approximations, two approximations ensuring the positivity of the system's dynamics when some of its transition frequencies are too small to enable the full secular approximation. Both cause the system to relax towards a steady state which is not the Gibbs state of its bare Hamiltonian. Nonetheless, we build a unified, consistent thermodynamic framework for those dynamics. Starting from a microscopic expression of the second law based on system-environment correlations, we employ a systematic perturbation theory to preserve the positivity of the second law despite the approximations done on the dynamics. We show that, in spite of the weak system-bath coupling, the system-bath interaction energy participates to the energy balance, as well as the Lamb-shift. Those extra contributions give rise to work performed by the system on the bath when the former is out of equilibrium. We compare this microscopic entropy production with the definition based on the contractivity of the reduced system dynamics (Spohn inequality). We show that, unlike for secular master equations, the two entropy production rates differ because of the presence of non-vanishing stationary coherences in the energy eigenbasis. However, in the case of a single thermal bath, the difference is purely transient, and no work can be cyclically extracted from the steady-state despite its non-Gibbs form. Finally, we illustrate our results with a simple example, clarifying and completing the thermodynamic picture of Markovian dynamics in the quantum regime.

22.
arXiv (CS.CV) 2026-06-12

UniDexTok: A Unified Dexterous Hand Tokenizer from Real Data

Dexterous hands are essential for fine-grained manipulation, but their hardware designs vary substantially across embodiments. Differences in kinematics, joint definitions, and degrees of freedom make it difficult to define a shared state representation compared with parallel grippers. As a result, dexterous-hand data remains fragmented and difficult to use for joint training. In this work, we propose the Unified Dexterous Hand Model (UDHM), which maps human and robot hand states into a shared 22-DoF semantic interface. Based on UDHM, we introduce UniDexTok, a retargeting-free state tokenizer that learns embodiment-conditioned discrete tokens from standardized real joint states. UniDexTok provides a unified representation for heterogeneous dexterous hands without relying on retargeting or simulation data. Compared with the recent baseline UniHM, UniDexTok reduces MPJAE from 15.63 degrees to 0.16 degrees and MPJPE from 18.51 mm to 0.18 mm, corresponding to error reductions of 98.98% and 99.03%, respectively. These results improve reconstruction from centimeter-scale to sub-millimeter accuracy. Experiments further show that data from other embodiments improves target-embodiment reconstruction accuracy, demonstrating the benefit of cross-embodiment tokenization. UniDexTok also shows strong zero-shot and few-shot reconstruction ability when new dexterous hands are introduced.

23.
arXiv (CS.CL) 2026-06-15

Right or Wrong, Models Comply: Directional Blindness in LLM Moral Judgment

As language models take integrated roles across many domains, the response of LLMs to user pushback becomes a critical alignment property. Yet many existing evaluations treat compliance as unidirectional, measuring whether models resist pressure but not whether they resist it selectively. We introduce Compliance Asymmetry (A = BCR/HCR), a bidirectional diagnostic that compares beneficial output change under helpful nudges with harmful change under misleading nudges. Across 9 models and 972,000 nudge-condition responses, we find that this selectivity differs in factual and moral judgments: models follow helpful nudges more than harmful ones on factual questions (A = 1.58), but follow both directions at nearly identical rates on moral questions (A = 1.04). This phenomenon persists across model families, capability levels, and nudging types. Interestingly, we also find that chain-of-thought prompting amplifies helpful and harmful compliance together, while identity-based prompting suppresses both by nearly identical margins. These results identify direction-blind moral compliance as a distinct failure mode in current LLMs and suggest that alignment should target directionally calibrated updating rather than lower compliance alone.

24.
medRxiv (Medicine) 2026-06-18

A Brain-Aging Transcriptomic Signature Reclassifies WHO Glioma Grade and Predicts Survival Independently of IDH Status: A Multi-Cohort Study

Background Despite WHO grade and IDH status, significant survival differences remain in diffuse gliomas. We hypothesized that a brain-aging transcriptomic signature, reflecting neuroinflammation, myeloid infiltration, and synaptic loss, would independently predict survival and allow for molecular reclassification. Methods A neurodegeneration score was derived via PCA of brain MRI volumes from 1,057 OASIS-3 subjects and projected onto 888 TCGA-LGG/GBM (discovery) and 693 CGGA gliomas (validation). A 14-gene signature of glial/myeloid (GFAP, AQP4, TYROBP, TREM2, C1QA, CD68, ITGAM) and neuronal (SYP, DLG4, GRIN1, GRIA1, SNAP25, SYN1, RBFOX3) genes were computed. Elastic-net Cox regression identified a 3-gene panel (C1QA, CD68, GRIA1). Kaplan-Meier, multivariate Cox, decision curve, and single-cell RNA-seq analyses were performed. Results High brain-aging scores predicted poorer overall survival (p < 0.0001) and remained an independent prognostic factor after adjusting for WHO grade and IDH status (z = 4.72, p < 0.001); chronological age was non-significant (p = 0.231). In IDH-mutant gliomas, significance was confirmed in both cohorts (TCGA p = 0.027; CGGA p < 0.0001). Bidirectional reclassification showed high-risk Grade 2 tumors with Grade 3-like survival (p = 0.00089), and indolent Grade 3 tumors resembling Grade 2 by Ki-67. Single-cell RNA-seq confirmed macrophage localization of signature genes; DCA demonstrated net benefit over grade alone at 5-30% probability thresholds. Conclusions A brain-aging transcriptomic signature independently predicts glioma survival beyond WHO grade and IDH status, validated in an independent Chinese cohort, with clinical utility for identifying high-risk Grade 2 and sparing over-treatment of indolent Grade 3 tumors.

25.
arXiv (CS.CV) 2026-06-11

MSUE: Multi-Modal Soccer Understanding Expert

This paper presents our solution to the 2026 SoccerNet VQA Challenge. We first develop a cost-effective data synthesis pipeline driven by a Vision-Language Model (VLM), which systematically restructures raw domain data into diverse VQA samples, including concise answers and long-form responses. Second, we propose MSUE, a multi-expert question answering architecture that employs a Large Language Model (LLM) to dynamically dispatch questions to text, image, and video experts. These experts are instantiated as a strong text baseline Gemini3-Flash, a fine-tuned Qwen3-VL, and an external knowledge base, respectively, working collaboratively to enhance VQA performance. MSUE achieves an accuracy of 0.95 on the challenge benchmark, securing third place in the leaderboard.