Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-15

Running the Gauntlet: Re-evaluating the Capabilities of Agents Beyond Familiar Environments

arXiv:2606.14397v1 Announce Type: new Abstract: As agentic systems continue to evolve and are widely deployed in real-world scenarios, there is a growing demand to faithfully evaluate their capabilities. However, current benchmarks are typically built on popular applications with relatively simple tasks and focus on a narrow set of capabilities while overlooking broader dimensions, resulting in saturated performance on modern agents and failing to probe their limitations. To this end, we introduce GauntletBench, a web-based benchmark for evaluating agent generalisation in challenging scenarios, focusing on three underexplored capabilities (temporal perception, graphical understanding, and 3D reasoning), across five less-covered professional applications (Video Editor, Workflow Builder, 3D Modeller, Flight Analyser, and Circuit Designer), each with 20 vision-intensive tasks (100 in total). Our benchmark provides a modular pipeline that comprises an environment compatible with both open- and closed-source agent frameworks, a controlled web-based application, a well-structured task suite, and an automated evaluation engine with diverse metrics. Contrary to widespread expectations, our empirical results reveal that frontier agentic systems remain far from achieving human-level performance. Even the state-of-the-art agent achieves only a 19.1% success rate on our GauntletBench, highlighting the limitations in these overlooked capabilities and generalisation. By comparison, non-expert human annotators achieve over 80% success on our challenging yet feasible tasks, revealing the substantial gap between current agent capabilities and those required for complex real-world scenarios.

02.
arXiv (CS.AI) 2026-06-12

Creating and Evaluating K-12 GenAI Assessment Graders Through Context Engineering

arXiv:2606.12422v1 Announce Type: cross Abstract: The integration of large language models (LLMs) into educational assessment represents a transformative shift in classroom grading practices. While automated scoring systems and machine learning techniques have existed for decades, generative AI (GenAI) now enables educators to implement standards-based grading (SBG) with unprecedented efficiency and scale. This paper examines the theoretical foundations and evaluates an LLM grader that uses commercially available foundation models with context and prompt engineering to score student work against a rubric. Drawing on an empirical interrater agreement study using Massachusetts Comprehensive Assessment System (MCAS) data, we observed the Quadratic Weighted Kappa (QWK) and Proportional Reduction in Mean-Squared Error (PRMSE) across mathematics, science, and ELA, using Claude Sonnet 4, Haiku 4.5, GPT-5, and GPT-5 Mini. The results demonstrate that LLM graders, especially when based on foundational models with more parameters, achieve substantial agreement with human raters in mathematics and science assessments, while the performances vary in ELA, suggesting generic foundation models can be effective at scoring in given contexts. Additional analysis of teacher and student feedback reveals strong acceptance of AI-generated narrative feedback but skepticism toward numerical scores, suggesting that LLMs function most effectively as formative tools rather than summative evaluators. Our findings indicate that thoughtfully designed hybrid models that combine AI efficiency with teacher judgment can reduce workload, enhance feedback quality, and support equitable assessment practices without displacing professional expertise.

03.
arXiv (quant-ph) 2026-06-19

Impossibility of superluminal signalling rules out causal loops in conical spacetimes

arXiv:2606.20476v1 Announce Type: cross Abstract: In PRL 129, 110401 it was shown that it is theoretically possible to have operationally detectable causal loops without violating the principle of no superluminal signalling (NSS) in (1+1)-Minkowski spacetime. Whether or not such causal loops are also possible in $d > 1$ spatial dimensions, has remained a key open question. We resolve this question by showing that in a wide class of "conical" spacetimes, including Minkowski with d > 1, NSS does rule out all operationally detectable causal loops, in classical, quantum and post-quantum theories. This establishes that the relationship between the relativistic principles of NSS and no causal loops depends inherently on the geometry of spacetime.

04.
arXiv (CS.CL) 2026-06-19

MedRLM: Recursive Multimodal Health Intelligence for Long-Context Clinical Reasoning, Sensor-Guided Screening, Evidence-Grounded Decision Support, and Community-to-Tertiary Referral Optimization

Real-world clinical decision support requires reasoning over heterogeneous and longitudinal patient information rather than answering isolated medical questions. However, current medical large language models and retrieval-augmented generation systems often rely on single-step prompting or retrieval, which can be fragile when clinical evidence is distributed across long electronic health records, medical images, sensor streams, guidelines, and referral constraints. This paper proposes MedRLM, a Recursive Multimodal Health Intelligence framework for long-context clinical reasoning, sensor-guided screening, and community-to-tertiary referral support. Instead of compressing all patient information into one prompt, MedRLM treats the patient case as an external clinical environment that can be recursively inspected, decomposed, retrieved, verified, and synthesized. The framework coordinates specialized agents for clinical text, longitudinal EHR, medical imaging, physiological sensor signals, guideline retrieval, uncertainty auditing, and referral planning. It further introduces a Clinical Evidence Graph Memory to connect patient-specific observations with retrieved evidence, standardized definitions, sensor-derived biomarkers, and referral criteria. A sensor-guided recursive triggering mechanism activates deeper reasoning when abnormal physiological or behavioral patterns are detected, while uncertainty-gated refinement supports clinician review for high-risk or low-confidence cases. We also outline a real-data evaluation design using public and credentialed clinical datasets spanning EHR, radiology, ECG, ICU time series, and referral-proxy outcomes. MedRLM aims to move medical AI from static question answering toward auditable, multimodal, and workflow-aware clinical decision support.

05.
arXiv (CS.LG) 2026-06-18

Optimal scenario design for climate emulation

arXiv:2606.19302v1 Announce Type: cross Abstract: As deep learning for physical systems continues to grow in popularity, efforts to improve generalizability have primarily focused on designing architectures that embed physical constraints. However, for machine-learning surrogate climate models (emulators), we show that the low structural diversity in existing scenarios commonly used to generate training data places a ceiling on predictive skill. Here, we examine whether training datasets themselves can be optimized to improve generalization. We introduce a method to create datasets that produce emulators capable of generalizing to new, structurally different scenarios absent from the training data. We use a differentiable Simple Climate Model (SCM) to calculate the sensitivity of emulator loss to perturbations in the training data, iteratively updating the training data to maximize emulator skill. For an SCM, training on one scenario optimized in this fashion outperforms an emulator trained on six standard ScenarioMIP pathways. We achieve this higher predictive skill despite training on a smaller dataset, finding that our emulator successfully isolates distinct physical behaviors of different climate forcing agents (e.g., greenhouse gases vs. aerosols) without single-forcing runs. We then demonstrate that scenarios optimized using an SCM, when used to drive an intermediate-complexity climate model, produce a training dataset that yields a more skillful emulator than training on ScenarioMIP outputs. Our results suggest that, in the compute-constrained environment of running full-scale climate models, generating a small number of dynamically rich scenarios provides greater marginal value for emulation and characterizing system responses than expanding the suite of traditional emissions pathways.

06.
arXiv (CS.AI) 2026-06-12

Reducing the Complexity of Deep Learning Models for EEG Analysis on Wearable Devices

arXiv:2606.12742v1 Announce Type: new Abstract: Wearable healthcare devices are the fastest-growing Internet of Things (IoT) sector. Many automated healthcare services rely on two crucial biological signals, namely ECG and EEG, which reflect the activity of the heart and brain, respectively. Although deep neural networks are considered the primary way to process and analyze these signals, the very tight energy and computational power constraints in wearable devices are far below the computational, energy, and memory bandwidth demands of DNN models, thereby impeding the deployment of deep learning in many practical wearable services. This paper investigates the feasibility of deploying state-of-the-art DNN models in resource-constrained wearable devices. Notably, we explore the trade-off between accuracy and computational complexity of DNNs when parameter quantization and electrode reduction methods are used. Our investigation centers on several state-of-the-art DNN models designed for EEG signal analysis, specifically for detecting epileptic seizures. Our findings demonstrate that, when applied judiciously, these techniques can significantly reduce the complexity of the DNNs under consideration with minimal adverse effects on accuracy. These results reveal the explicit trade-offs between accuracy and complexity reduction encountered when adapting DNN-based online EEG analysis for wearable devices.

07.
arXiv (CS.LG) 2026-06-16

High-Dimensional Random Projection for Activation Steering in Language Models

arXiv:2606.15092v1 Announce Type: new Abstract: Activation steering has emerged as a key methodology for controlling the behavior of large language models (LLMs). Existing difference-in-means based methods, however, are fundamentally limited: they capture only mean differences between class activations and fail to recover discriminative signals that naturally exist in the nonlinear feature subspace under the superposition hypothesis. Motivated by that, we propose High-Dimensional Random-projection for Activation Steering (HiDRA), a training-free approach that integrates seamlessly with existing activation steering methods. By performing activation addition in the projected high-dimensional space, HiDRA can provably capture a better discriminative structure beyond the reach of linear methods. Experiments across diverse LLM families and benchmarks demonstrate that HiDRA consistently outperforms baseline counterparts, achieving stronger behavioral control without significant computational overhead.

08.
arXiv (CS.AI) 2026-06-17

Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents

arXiv:2606.18223v1 Announce Type: cross Abstract: With sophisticated cyber-attacks becoming increasingly prevalent, modern networks require intelligent autonomous cyber-defense agents trained via Reinforcement Learning (RL). These agents employ neurosymbolic approaches such as behavior trees with learning-enabled components (LECs) to learn, reason, adapt, and implement security rules while maintaining critical operations. However, these autonomous networks are partially observable systems, i.e., the cyber-attacker's (red agent's) actions are not observable, making it difficult for the defender to predict red actions, learn red policies, or assess the attacker's intrusion levels. To address this, we propose a Policy Learning Technique using imitation learning to learn policies for partially observable RL agents with discrete states and discrete actions. We apply this technique in an autonomous cyber environment to predict red agent's actions from network observations and defender actions. Integrated with a neurosymbolic cyber-defense agent, our method effectively handles different red policies and achieves high prediction accuracy across diverse simulated scenarios.

09.
arXiv (CS.LG) 2026-06-16

Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion

arXiv:2312.06173v2 Announce Type: replace Abstract: Merging models fine-tuned from a common, extensively pre-trained large model but specialized for different tasks has been demonstrated as a cheap and scalable strategy to construct a multi-task model that performs well across diverse tasks. Recent research, exemplified by task arithmetic, highlights that this multi-task model can be derived through arithmetic operations on task vectors. Nevertheless, current merging techniques frequently resolve potential conflicts among parameters from task-specific models by evaluating individual attributes, such as the parameters' magnitude or sign, overlooking their collective impact on the overall functionality of the model. In this work, we propose the CONtinuous relaxation of disCRETE (Concrete) subspace learning method to identify a common low-dimensional subspace and utilize its shared information to track the interference problem without sacrificing much performance. Specifically, we model the problem as a bi-level optimization problem and introduce a meta-learning framework to find the Concrete subspace mask through gradient-based techniques. At the upper level, we focus on learning a shared Concrete mask to identify the subspace, while at the inner level, model merging is performed to maximize the performance of the merged model. We conduct extensive experiments on both vision domain and language domain, and the results demonstrate the effectiveness of our method. The code is available at https://github.com/tanganke/subspace_fusion

10.
arXiv (CS.LG) 2026-06-12

When to Align, When to Predict: A Phase Diagram for Multimodal Learning

arXiv:2606.11190v2 Announce Type: replace Abstract: Cross-modal alignment (CA) and cross-modal prediction (CP) are the dominant paradigms for multimodal representation learning, yet there is no systematic understanding of when each succeeds, when each fails, and when cross-modal training helps at all – a gap that leaves practitioners, especially in scientific domains like biomedicine or astrophysics, with heterogeneous instruments and multiple levels of organization and measurement, unable to diagnose why standard methods underperform the best single modality. We develop a unified linear framework that addresses both questions. Under a spiked signal-plus-noise model with structured cross-modal nuisance correlation, we derive separation ratios for both objectives that expose complementary failure modes: alignment whitens each modality and fails when nuisance is strongly correlated across views; prediction encodes whatever is cross-predictable through a one-sided whitening, with recovery governed by source-modality quality. The resulting phase diagram partitions multimodal problems into four regimes: Both, CA only, CP only, and Neither. We present a data-driven procedure to locate real-world datasets in this diagram using a small labeled subsample, identifying the preferred objective and prediction direction before any cross-modal training. Experiments on synthetic data, stereo-vision benchmarks, image-caption pairs, and real astrophysical data validate the predictions in the nonlinear regime, including the Neither regime where cross-modal training is actively harmful. Our framework lets practitioners diagnose their multimodal problem and choose the right objective before committing to training. Code to reproduce the results is available at https://github.com/IlayMalinyak/mm_align_vs_pred.

11.
Nature (Science) 2026-06-10

Structural basis for chaperone-guided assembly of RNA-induced silencing complex

The RNA-induced silencing complex (RISC), comprising an Argonaute (AGO) protein and a small RNA, is the central effector in RNA silencing. Small RNAs are loaded onto AGO as bulky duplexes in an HSP70- and HSP90-dependent process1–3, but the molecular mechanism remains poorly understood. Here we identify the human AGO–HSP90–p23 complex, which captures AGO in an RNA-free state, termed the AGO maturation complex (AMC). The purified AMC enables RNA loading and AGO folding, faithfully recapitulating de novo RISC assembly. Using cryogenic electron microscopy, we determined the structure of AMC bound to a microRNA duplex. In contrast to its conformation in the RISC, AGO adopts a highly open conformation in the AMC: the N domain and the RNA-binding module (PAZ–MID–PIWI) are fully detached and anchored to opposite sides of the HSP90 dimer, connected solely by the unfolded L1 linker. This arrangement exposes a positively charged cleft that accommodates an RNA duplex. AGO folding is facilitated by a small RNA duplex containing a 5′-terminal phosphate—but not by single-stranded RNAs—revealing a role for the RNA duplex as a chaperone-like cofactor that directs AGO domain assembly. These findings elucidate the RISC assembly mechanism and establish the AMC as a molecular tool for probing optimal RNA features and chemical modifications for the rational design of small interfering RNA therapeutics. Our study also sheds light on how chaperones, together with ligands, can guide the folding of client proteins. Structures of the AGO maturation complex reveal how chaperones and an RNA duplex drive assembly of the RNA-induced silencing complex.

12.
arXiv (CS.AI) 2026-06-11

PermDoRA – Understanding Adapter Interference in Language Models: Limits of Parameter-Space Geometry

arXiv:2606.11262v1 Announce Type: cross Abstract: Access control in large language models (LLMs) requires modular mechanisms to enable domain-specific behavior without retraining or cross-domain interference. A common hypothesis is that interference during adapter composition arises from overlap in linear parameter updates, suggesting that enforcing orthogonality or directional independence should improve multi-domain performance. We test this hypothesis using DoRA-RBAC, a hierarchical adapter composition framework based on weight-decomposed low-rank adaptation. We compare conventional Euclidean merging with a geometry-aware Riemannian-inspired merging strategy that approximates the Frechet mean via normalized directional averaging across multiple QA benchmarks (GPQA, PubMedQA, SimpleQA, WMDP) on LLaMA-3.1-8B and Mistral-7B. Our results show that while single-domain performance matches LoRA, geometry-aware merging provides no consistent advantage over standard averaging in multi-domain settings.Diagnostic analysis further reveals that angular alignment and orthogonality of adapter updates are weak predictors of composition performance. These findings suggest that adapter interference is not governed primarily by parameter-space geometry, but is instead consistent with interactions in shared nonlinear representations.

13.
arXiv (CS.CV) 2026-06-16

A Comprehensive Survey of Knowledge-Based Vision Question Answering Systems: The Lifecycle of Knowledge in Visual Reasoning Task

Knowledge-based Vision Question Answering (KB-VQA) extends general Vision Question Answering (VQA) by not only requiring the understanding of visual and textual inputs but also extensive range of knowledge, enabling significant advancements across various real-world applications. KB-VQA introduces unique challenges, including the alignment of heterogeneous information from diverse modalities and sources, the retrieval of relevant knowledge from noisy or large-scale repositories, and the execution of complex reasoning to infer answers from the combined context. With the advancement of Large Language Models (LLMs), KB-VQA systems have also undergone a notable transformation, where LLMs serve as powerful knowledge repositories, retrieval-augmented generators and strong reasoners. Despite substantial progress, no comprehensive survey currently exists that systematically organizes and reviews the existing KB-VQA methods. This survey aims to fill this gap by establishing a structured taxonomy of KB-VQA approaches, and categorizing the systems into main stages: knowledge representation, knowledge retrieval, and knowledge reasoning. By exploring various knowledge integration techniques and identifying persistent challenges, this work also outlines promising future research directions, providing a foundation for advancing KB-VQA models and their applications.

14.
arXiv (CS.CV) 2026-06-19

Training-Free Metrics for Synthetic Object Detection Data: A Proxy for Detector Performance

With the recent advent of image generative models, synthetic data are increasingly being used to supplement limited real datasets for training computer vision models. However, not all synthetic datasets improve performance equally, and their effectiveness can only be assessed by training a downstream model, which is computationally expensive and time-consuming. This problem is pronounced in the task of object detection, where the required annotations are much more dense due to bounding boxes. In this paper, we propose a pre-computable metric family, dubbed Conditional-Composition Domain Match (CCDM), which serves as a proxy for the relative utility of candidate synthetic training sets for downstream detection. Experiments on the VisDrone-DET dataset show that the CCDM metric families achieve a Spearman correlation of 1.0 with the downstream performance of YOLOv8, clearly outperforming existing metrics for synthetic image evaluation.

15.
arXiv (CS.CL) 2026-06-17

ZeroSyl: Simple Zero-Resource Syllable Tokenization for Spoken Language Modeling

Pure speech language models aim to learn language directly from raw audio without textual resources. A key challenge is that discrete tokens from self-supervised speech encoders result in excessively long sequences, motivating recent work on syllable-like units. However, methods like Sylber and SyllableLM rely on intricate multi-stage training pipelines. We propose ZeroSyl, a simple training-free method to extract syllable boundaries and embeddings directly from a frozen WavLM model. Using L2 norms of features in WavLM's intermediate layers, ZeroSyl achieves competitive syllable segmentation performance. The resulting segments are mean-pooled, discretized using K-means, and used to train a language model. ZeroSyl outperforms prior syllabic tokenizers across lexical, syntactic, and narrative benchmarks. Scaling experiments show that while finer-grained units are beneficial for lexical tasks, our discovered syllabic units exhibit better scaling behavior for syntactic modeling.

16.
arXiv (CS.LG) 2026-06-16

Design and Scheduling of an AI-based Queueing System

arXiv:2406.06855v3 Announce Type: replace-cross Abstract: To leverage prediction models to make optimal scheduling decisions in service systems, we must understand how predictive errors impact congestion due to externalities on the delay of other jobs. Motivated by applications where prediction models interact with human servers (e.g., content moderation), we consider a large queueing system comprising of many single server queues where the class of a job is estimated using a prediction model. By characterizing the impact of mispredictions on congestion cost in heavy traffic, we design an index-based policy that incorporates the predicted class information in a near-optimal manner. Our theoretical results guide the design of predictive models by providing a simple model selection procedure with downstream queueing performance as a central concern, and offer novel insights on how to design queueing systems with AI-based triage. We illustrate our framework on a content moderation task based on real online comments, where we construct toxicity classifiers by finetuning large language models.

17.
arXiv (CS.AI) 2026-06-19

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

arXiv:2606.20474v1 Announce Type: cross Abstract: Context-heavy agents place unusual pressure on the key-value (KV) cache: long prefixes are reused across many short turns, while concurrency determines whether the serving system can keep GPUs utilized. We study 4-bit KV-cache compression for this setting, using TurboQuant-style rotation and codebook quantization as a quality anchor and vLLM FP8 KV caching as the deployment anchor. We report three contributions. First, we frame 4-bit KV caching around multi-round agent workloads where task quality, cache residency, and serving throughput must be measured jointly. Second, we describe the practical design choices needed to make the 4-bit path robust, including asymmetric K/V treatment, Walsh-Hadamard rotation, QJL removal, and block-scale variants. Third, we present serving optimizations on AMD GPUs, including optimized decode-attention kernels and UltraQuant, an FP4 approximation path that uses FP8 queries, FP4 KV tensors, UE8M0 group scales, and native scaled-MFMA support on CDNA4. On a long-context, multi-turn agentic workload, UltraQuant cuts P50 time-to-first-token by 3.47x in the cache-pressured late rounds (2.3x across all rounds) and raises output throughput by 1.63x over the FP8 KV baseline.

18.
arXiv (CS.CL) 2026-06-18

CoreMem: Riemannian Retrieval and Fisher-Guided Distillation for Long-Term Memory in Dialogue Agents

Personalized dialogue agents require continuous long-term memory to maintain coherent interactions across multiple sessions. However, deploying these capabilities on consumer-grade hardware (e.g., 8 GB VRAM edge devices) introduces severe memory and compute bottlenecks. Existing systems typically rely on isotropic cosine similarity for retrieval and heuristic rules for context compression. These approaches lack a unified theoretical foundation, frequently suffering from the hubness problem in high-dimensional retrieval and syntactic fragmentation during compression. To overcome these limitations, we propose CoreMem, a resource-efficient edge-cloud memory architecture fundamentally unified by information geometry. First, Riemannian retrieval replaces cosine matching with a locally adaptive Fisher-Rao metric, effectively penalizing hub memories via Mahalanobis distance with O(Ndr) Woodbury acceleration for real-time search. Second, Fisher-guided discrete token distillation (FDTD) introduces a hierarchical sentence-to-token compression mechanism. It derives sensitivity scores from Fisher information traces, providing a principled compression-KL tradeoff augmented with explicit structural syntax protection. Evaluated on the LOCOMO and LongMemEval-S benchmarks, CoreMem achieves strong accuracy improvements, yielding substantial gains in Open-domain (+4.51 pp) and Temporal (+4.17 pp) reasoning. Extensive profiling confirms that CoreMem operates seamlessly within a strict 8 GB VRAM budget, successfully bridging the gap between resource-constrained edge devices and the demand for theoretically grounded, lifelong memory agents.

19.
medRxiv (Medicine) 2026-06-16

Re-evaluating the Cross-Sectional Prevalence of Severe Age-Related Hearing Loss Using Extreme Value Statistics

作者:

Standard demographic models of age-related hearing loss (presbycusis) predominantly utilize symmetric functions, such as log-normal distributions for age-binned thresholds and 4-parameter logistic curves for prevalence estimates. While these models capture early-to-moderate degradation effectively, they structurally struggle to characterize the heavy tails associated with severe clinical impairment. In this study, we present a statistical critique using a secondary analysis of the historical Medical Research Council (MRC) National Study of Hearing (1980-1986) dataset. By applying Generalized Extreme Value (GEV) distribution theory, we demonstrate that as severity increases, the underlying statistical geometry of hearing loss shifts. The asymmetric, heavy-tailed GEV distribution provides a parsimonious description of severe impairment, requiring fewer parameters than standard symmetric models. However, we explicitly acknowledge that utilizing static population data to infer progression introduces an ecological fallacy. Furthermore, the dataset's historical nature embeds unquantified generational cohort effects. We conclude that while extreme value statistics offer a compelling mathematical framework for modeling the variance of severe presbycusis, true longitudinal datasets are required to isolate physiological degradation from historical cohort variance.

20.
arXiv (CS.CV) 2026-06-11

VOID: Defeating Unauthorized Mimicry in Latent Diffusion Models

While Latent Diffusion Models (LDMs) have revolutionized visual synthesis, they are increasingly exploited for unauthorized mimicry of individuals. Existing defenses inject deceptive perturbations to steer the generated images toward irrelevant targets. However, this approach hinges on an ungrounded assumption: subtle perturbations can maintain their deceptive efficacy throughout an LDM's extensive generation process. In reality, the model's innate restoration mechanism will remove such perturbations and cause individual identities to re-emerge in the images generated. We propose VOID, a defense framework that overcomes this conundrum by manipulating an LDM's intrinsic stochasticity. VOID perturbs the diffusion pipeline in two novel ways: 1) amplifying the latent encoding errors to shatter an image's semantic structure, and 2) counteracting the target guidance signals to suppress the model's restoration capabilities. This results in a semantic corruption that thwarts any unauthorized mimicry. Notably, the security gain does not come at the price of visual utility, as VOID simultaneously manages to confine perturbations to human-imperceptible regions of protected images. Our comprehensive evaluation of 24 state-of-the-art defenses against 10 mimicry attacks on 5 datasets demonstrates VOID's unprecedented protection power: it increases the average Frechet Inception Distance (FID) from 113 to 365, a 223% improvement over the strongest defense to date.

21.
bioRxiv (Bioinfo) 2026-06-10

Folding the unfoldable 2: using AlphaFold and ESMFold to explore spurious proteins

Motivation: Spurious protein sequences, resulting from gene prediction errors, theoretically should not yield folded structures. AlphaFold2 was previously shown to predict short spurious sequences with high pLDDT scores and was therefore unlikely to distinguish between real proteins and spurious proteins which are usually short. We evaluate whether newer structure prediction methods (ESMFold and AlphaFold3) similarly predict short sequences with high pLDDT or if they better discriminate between spurious and real proteins. Results: All three structure prediction methods (ESMFold, AlphaFold2, and AlphaFold3) predict short spurious sequences from AntiFam with unexpectedly high pLDDT scores, however the discrimination between spurious and real proteins improves beyond 100 amino acids. By analysing sequences with disparate pTM and pLDDT scores, we identified two likely spurious shadow ORFs in Swiss-Prot and one potentially non-spurious AntiFam entry. Using the structure prediction scores, we developed a Gaussian Process Model and evaluated its performance on AlphaFold DB, identifying potential spurious proteins at scale. While limited on its own, this model can increase confidence in spurious protein identification when combined with other methods.

22.
arXiv (CS.CL) 2026-06-16

Exploring Extrinsic and Intrinsic Properties for Effective Reasoning with Code Interpreter

Reasoning with a Code Interpreter (CI) has emerged as an effective paradigm for enhancing the reasoning capabilities of large language models (LLMs) through executable computation and iterative verification. Despite its growing adoption, the behavioral properties underlying effective code reasoning remain largely underexplored. In this work, we investigate code reasoning from two distinct perspectives inspired by prior studies of natural language reasoning: extrinsic properties, represented by crucial tokens, and intrinsic properties, represented by code-specific cognitive behaviors. Across multiple LLMs, we find that stronger CI reasoning models consistently exhibit a higher prevalence of crucial tokens and cognitive behaviors, particularly verification, backtracking, and backward chaining. Building on these observations, we examine how these properties can be leveraged during both inference and training. At inference time, appending code-specific crucial tokens improves performance on several reasoning capabilities, including mathematical, ordering, and optimization, while yielding limited benefits elsewhere. At training time, augmenting a state-of-the-art framework with code-specific cognitive behaviors improves supervised fine-tuning and reinforcement learning performance in two of three evaluated models. Further analysis shows that these behaviors reduce overthinking in incorrect responses and improve token efficiency, while also revealing factors that limit gains in a certain model. Our findings provide the first systematic characterization of effective reasoning with CI and demonstrate both the potential and limitations of leveraging key properties to improve CI-based reasoning.

23.
arXiv (CS.CV) 2026-06-11

TopoHR: Hierarchical Centerline Representation for Cyclic Topology Reasoning in Driving Scenes with Point-to-Instance Relations

Topology reasoning is crucial for autonomous driving. Current methods primarily focus on instance-level learning for centerline detection, followed by a sequential module for topology reasoning that relies on simplified MLP layers. Moreover, they often neglect the importance of point-to-instance (P2I) relationships in topology reasoning. To address these limitations, we present TopoHR (Topological Hierarchical Representation), a novel end-to-end framework that establishes cyclic interaction between centerline detection and topology reasoning, allowing them to iteratively enhance each other. Specifically, we introduce a hierarchical centerline representation including point queries, instance queries, and semantic representations. These multi-level features are seamlessly integrated and fused within a hierarchical centerline decoder. Furthermore, we design a hierarchical topology reasoning module that captures both fine-grained P2I relationships and global instance-to-instance (I2I) connections within a unified architecture. With these novel components, TopoHR ensures accurate and robust topology reasoning. On the OpenLane-V2 benchmark, TopoHR refreshes state-of-the-art performance with significant improvements. Notably, compared with previous best results, TopoHR achieves +3.8 in $\mathrm{DET}_{l}$, +5.4 in $\mathrm{TOP}_{ll}$ on $subset_A$ and +11.0 in $\mathrm{DET}_{l}$, +7.9 in $\mathrm{TOP}_{ll}$ on $subset_B$, validating the effectiveness of the proposed components. The code will be shared publicly at https://github.com/Yifeng-Bai/TopoHR.git.

24.
arXiv (quant-ph) 2026-06-16

What does measuring one qubit reveal about another? $K$-networks as a directed diagnostic for quantum circuits

arXiv:2606.16549v1 Announce Type: new Abstract: Many-qubit circuit states are hard to inspect directly, so they are often summarized by pairwise graph weights. Common pairwise weights report symmetric correlations, while many circuit questions are directed and basis-specific: if qubit $i$ is measured in a given basis, how strongly does the outcome reshape the conditional state of qubit $j$? We define $K_{i\to j}$, a directed, basis-conditioned edge weight for this question. It is large when the two measurement outcomes occur with comparable probability and leave qubit $j$ in clearly different conditional states; it is zero when the source outcome is deterministic or the target states are indistinguishable. The scalar uses standard binary-ensemble distinguishability; the paper's contribution is to turn this conditional comparison into a directed network layer for circuit states. The resulting networks are computable from two-qubit reduced density matrices. They are diagnostic (not entanglement measures): for pure two-qubit states $K$ reduces to the tangle $C^2$ (squared concurrence)[WoottersConcurrence,CKWTangle], while separable mixed states can reach $K=1$. Examples on teleportation, Grover, QAOA, and random circuit families show the intended use: $K$-networks map feed-forward, phase, and interaction-graph structure that symmetric or computational-basis summaries can leave weak or absent.

25.
arXiv (CS.CV) 2026-06-16

High-Fidelity 4D Hand-Object Capture via Multi-View Spatiotemporal Tracking and Physics-Aware Gaussians

The growing demand for high-fidelity 4D hand-object interaction (HOI) data in embodied AI and spatial computing is currently bottlenecked by the reliance on pre-scanned object templates and physical markers. While recent methods have demonstrated promising results in reconstructing 4D hand-object interaction from videos, they are highly sensitive to initial estimates of hand and object poses. Yet, estimating these poses from images is challenging, in particular under severe occlusion which is inherent in hand-object interaction scenarios. We propose a novel system for the robust and accurate reconstruction of hands and objects from synchronized and calibrated multi-view videos without requiring any templates or markers. Our system consists of two main components with key innovations: (1) a multi-view feed-forward transformer model that aggregates cross-view geometry and temporal cues to provide a reliable, metric-consistent initialization for both poses and dense object geometry, and (2) a hand-object physics-aware Gaussian-based optimization framework to refine the initial estimates, integrating tetrahedral constraints, collision refinement, and appearance decomposition to produce physically plausible and visually accurate reconstruction. Validated on public benchmarks and an extensive internal dataset, our pipeline achieves highly robust, artifact-free reconstruction, providing an efficient foundation for automated 4D asset generation. Our project page are available at https://zyshen021.github.io/HOSTPG/.