Academic Intelligence · Curated Daily

Explore the frontiers of global research

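AcademicHub aggregates real-time literature from leading journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature briefings.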

01.
arXiv (CS.CL) 2026-06-15

SuperThoughts: Reasoning Tokens in Superposition

Long chain-of-thought (CoT) reasoning improves LLM problem-solving but is computationally expensive because tokens are generated sequentially. Recent works explore reasoning in continuous latent spaces to bypass discrete token generation, but they often struggle with training stability and fail to scale to complex, long-horizon tasks because they lack a supervision signal. We propose SuperThoughts, which compresses pairs of consecutive CoT tokens into single latent representations and decodes two tokens per step via a lightweight Multi-Token Prediction (MTP) module. This preserves discrete token supervision at training time while doubling throughput at inference time. We finetune Qwen2.5-Math-1.5B-Instruct, Qwen2.5-Math-7B-Instruct, and Qwen2.5-Math-14B-Instruct, and evaluate on MATH500, AMC, OlympiadBench, and GPQA-Diamond. With a confidence-based adaptive mechanism that falls back to standard decoding when uncertain, SuperThoughts reduces CoT length by roughly 20–30% with minimal accuracy degradation (a drop of 1–2 points on most tasks).
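The confidence-based fallback can be pictured as a decoding loop. The sketch below is a minimal illustration, not the SuperThoughts implementation: `two_step` stands in for the latent step plus MTP head and `one_step` for standard decoding; both are hypothetical callables, and the simple threshold gate is an assumption about how the confidence check might work.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (not from the paper):
# a latent step proposes two tokens plus a confidence score,
# a standard step proposes exactly one token.
TwoStep = Callable[[List[int]], Tuple[int, int, float]]
OneStep = Callable[[List[int]], int]


def adaptive_decode(prefix: List[int], two_step: TwoStep, one_step: OneStep,
                    threshold: float, max_new: int, eos: int) -> Tuple[List[int], int]:
    """Emit two tokens per step when confident, otherwise fall back to one.

    Returns (generated tokens, number of decoding steps used).
    """
    out = list(prefix)
    steps = 0
    generated = 0
    while generated < max_new:
        steps += 1
        a, b, conf = two_step(out)
        # Confident: accept both tokens; uncertain: standard one-token decoding.
        new = [a, b] if conf >= threshold else [one_step(out)]
        for tok in new:
            out.append(tok)
            generated += 1
            if tok == eos or generated >= max_new:
                return out[len(prefix):], steps
    return out[len(prefix):], steps
```

With toy callables that always report confidence 1.0, six tokens take three steps; when confidence is always 0.0, the same six tokens take six steps, which is the throughput trade-off the gate controls.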

02.
medRxiv (Medicine) 2026-06-10

Assessment of the accuracy of lung lesions diagnosis in adolescents with osteosarcoma using artificial intelligence

Background. Lung metastases are the main cause of death in osteosarcoma (OS). Accurate diagnosis of lung nodules on computed tomography (CT) is critical for determining whether the disease has disseminated and for planning surgical treatment. Using artificial intelligence (AI) to search for lung nodules increases diagnostic accuracy and reduces the chance of missing metastases. Objective. To evaluate the accuracy of lung nodule diagnosis in adolescents with OS using AI. Methods. CT scans of adolescents with OS were assessed retrospectively. A pathological nodule with an average size of ≥4 mm was considered a target finding. The diagnostic accuracy of an AI algorithm previously trained on an adult dataset was evaluated, and the numbers of false positives (FP) and false negatives (FN) were determined. Sensitivity, specificity, accuracy, area under the ROC curve (AUC), positive predictive value, negative predictive value, and F1 score were calculated, and the effectiveness of the algorithm was assessed on this basis. Results. A total of 248 CT scans of adolescents with OS were evaluated. The AI algorithm produced an FP result in 5 cases (2.02%), an FN result in 34 cases (13.71%), and a correct result (true positive or true negative) in 209 cases (84.27%). The diagnostic accuracy of the algorithm was 0.843 (95% CI 0.794–0.887). Applying the AI algorithm in a radiologist's practice for this clinical task would increase sensitivity from 0.805 to 0.891, reducing the number of FN results by 8.59% in absolute terms and by 44% in relative terms. Conclusion. The results confirm the practical value of the AI algorithm and justify implementing AI-assisted systems in diagnostic protocols for lung metastases in adolescents with OS.
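The reported percentages and the false-negative reduction can be reproduced from the counts in the abstract. The snippet only re-derives that arithmetic; the miss (false-negative) rate is 1 − sensitivity by definition, and the small gap to the reported 8.59% presumably reflects the sensitivities being rounded to three decimals.

```python
def pct(count: int, total: int) -> float:
    """Share of the total as a percentage, rounded to two decimals."""
    return round(100 * count / total, 2)


total, fp, fn, correct = 248, 5, 34, 209
assert fp + fn + correct == total  # every scan falls in exactly one bucket

fp_pct, fn_pct, correct_pct = pct(fp, total), pct(fn, total), pct(correct, total)
accuracy = correct / total

# Miss (false-negative) rate = 1 - sensitivity
sens_without_ai, sens_with_ai = 0.805, 0.891
miss_without, miss_with = 1 - sens_without_ai, 1 - sens_with_ai
absolute_drop = miss_without - miss_with   # proportion of positives no longer missed
relative_drop = absolute_drop / miss_without

print(fp_pct, fn_pct, correct_pct, round(accuracy, 3))    # 2.02 13.71 84.27 0.843
print(round(100 * absolute_drop, 1), round(100 * relative_drop))  # 8.6 44
```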

03.
arXiv (CS.CL) 2026-06-15

MET-Bench: Multimodal Entity Tracking for Evaluating the Limitations of Vision-Language and Reasoning Models

Entity state tracking is a necessary component of world modeling that requires maintaining coherent representations of entities over time. Previous work has benchmarked entity tracking performance in purely text-based tasks. We introduce MET-Bench, a multimodal entity tracking benchmark designed to evaluate the ability of vision-language models to track entity states across modalities. Using three domains, we assess how effectively current models integrate textual and image-based state updates. Our findings reveal a significant performance gap between text-based and image-based entity tracking. We empirically show that this discrepancy stems primarily from deficits in visual reasoning rather than perception. We further show that explicit text-based reasoning strategies improve performance, yet limitations remain, especially in long-horizon multimodal tasks. We apply reinforcement learning to improve entity tracking in open-source VLMs. This yields substantial in-modality gains, but does not transfer robustly across input modalities. Our results highlight the need for improved multimodal representations and reasoning techniques to bridge the gap between textual and visual entity tracking.
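To make the task concrete: the ground-truth state after a sequence of updates is easy to compute symbolically, and the model must reproduce it implicitly from text or images. The sketch uses a hypothetical container-and-item domain (the abstract does not specify MET-Bench's three domains or their update format) to show how reference answers for such a benchmark could be generated.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical update format: move `item` from container `src` to container `dst`.
Update = Tuple[str, str, str]


def track(initial: Dict[str, Set[str]], updates: List[Update]) -> Dict[str, Set[str]]:
    """Apply updates in order and return the final contents of every container."""
    state = {name: set(items) for name, items in initial.items()}
    for item, src, dst in updates:
        if item not in state.get(src, set()):
            raise ValueError(f"{item!r} is not in {src!r}")
        state[src].remove(item)
        state.setdefault(dst, set()).add(item)
    return state


final = track({"box1": {"apple", "key"}, "box2": set()},
              [("apple", "box1", "box2"), ("key", "box1", "box3"), ("apple", "box2", "box1")])
print(sorted(final["box1"]), sorted(final["box3"]))  # ['apple'] ['key']
```

Longer update sequences give the long-horizon setting the abstract reports as hardest.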

04.
arXiv (CS.AI) 2026-06-11

Lung-R1: A Knowledge Graph-Guided LLM for Pulmonary Diagnostic Reasoning

arXiv:2606.11675v1

Diagnosing pulmonary diseases requires integrating heterogeneous evidence amid phenotypic variability and cross-disease overlap. Although large language models (LLMs) have shown progress on pulmonary knowledge question answering (QA) and information-processing tasks, reliable pulmonary diagnosis requires patient-specific, relation-aware reasoning over electronic medical record (EMR) evidence rather than isolated knowledge recall. We define this gap between pulmonary knowledge and case-level diagnostic reasoning as the Pulmonary Knowledge-to-Diagnosis Gap. To address it, we introduce LungKG, the first structured pulmonary knowledge graph for diagnostic knowledge organization and record-grounded reasoning. LungKG contains 59,038 nodes and 164,308 edges across 15 entity types and 112 relation types, serving as both a reusable pulmonary knowledge resource and the foundation for LungKG-guided model adaptation. Built on LungKG, we propose Lung-R1, a LungKG-guided pulmonary LLM trained through KG-constrained reasoning-chain construction and KG-guided reinforcement learning. In a 20-system evaluation, Lung-R1-14B achieves state-of-the-art performance across Choice, Pulmonary-QA, and EMR Diagnosis, reaching an EMR Diagnosis score of 4.3583 and surpassing the strongest non-Lung-R1 baseline by 0.1476 points. These results demonstrate the value of LungKG-guided training for EMR-based pulmonary diagnosis.
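One way to read "KG-constrained reasoning-chain construction" is that every step of a candidate chain must correspond to an edge in the graph. The sketch below is an assumption about that constraint, not Lung-R1's procedure; LungKG's actual entity and relation names are not given here, so the toy graph uses hypothetical ones.

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


def chain_is_grounded(chain: List[Triple], kg: Set[Triple]) -> bool:
    """True if every step is an edge of the KG and consecutive steps connect
    (the tail of one step is the head of the next)."""
    if not chain:
        return False
    for i, step in enumerate(chain):
        if step not in kg:
            return False  # step not supported by the graph
        if i > 0 and chain[i - 1][2] != step[0]:
            return False  # broken link between steps
    return True


# Toy graph with hypothetical entity and relation names
KG = {
    ("productive cough", "symptom_of", "pneumonia"),
    ("pneumonia", "confirmed_by", "chest imaging"),
}
```

A filter like this could discard generated chains that cite relations absent from the graph before they are used as training targets.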

05.
arXiv (CS.AI) 2026-06-11

Runtime Skill Audit: Targeted Runtime Probing for Agent Skill Security

arXiv:2606.11671v1

Agent skills let LLM agents reuse instructions, resources, tools, and workflows, but they also create a new place for malicious behavior to hide. A skill may look benign in its documentation or code while becoming harmful only when it is invoked with particular user requests, local assets, persistent state, or multi-step tool interactions. This makes purely static vetting brittle. We present Runtime Skill Audit (RSA), a dynamic analysis method that audits skills by asking what the skill-mediated agent actually does under targeted runtime conditions. Instead of testing every skill with the same generic tasks, RSA profiles risk-relevant interfaces, prepares the execution context needed to exercise them, and assigns security labels from the resulting trace evidence. We instantiate RSA on OpenClaw and evaluate it on 100 skills against representative static baselines. RSA achieves 90.0% accuracy with an 88.0% true positive rate and an 8.0% false positive rate, improving accuracy by 13.0 percentage points over the best static baseline. Under self-evolving attacks, static detectors collapse after one or two rounds, while RSA continues to detect 19–20 out of 20 malicious skills across rounds.
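The difference between static vetting and labelling "from the resulting trace evidence" is that the decision depends on what actually happened at runtime, including the order of actions. The sketch below assumes a hypothetical event schema and a hand-written policy; RSA's real trace format and labelling rules on OpenClaw are not described in the abstract.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Event:
    kind: str    # hypothetical event type: "file_read", "net_send", "exec", ...
    target: str  # path, host, or command line


# Hypothetical policy, not RSA's actual labelling rules.
SENSITIVE_PREFIXES = ("~/.ssh/", "~/.aws/", "/etc/shadow")


def label_trace(trace: List[Event]) -> str:
    """Label a recorded execution trace as 'malicious' or 'benign'."""
    touched_secret = False
    for ev in trace:
        if ev.kind == "file_read" and ev.target.startswith(SENSITIVE_PREFIXES):
            touched_secret = True
        elif ev.kind == "net_send" and touched_secret:
            return "malicious"  # secret read followed by outbound traffic
        elif ev.kind == "exec" and "rm -rf" in ev.target:
            return "malicious"  # destructive command
    return "benign"
```

The same skill code can yield either label depending on the prepared context, which is why the abstract argues for exercising skills under targeted conditions rather than reading them.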

06.
arXiv (CS.CL) 2026-06-17

Prompt Perturbation for Reliable LLM Evaluation over Comparison Graphs

Evaluating large language models (LLMs) is important for understanding their capabilities, comparing competing systems, and supporting the deployment of reliable models in practice. For open-ended tasks, pairwise evaluation has become a popular paradigm, in which two responses to the same prompt are compared and the resulting judgments are aggregated into an overall ranking. A central challenge of this paradigm is intransitivity: the induced comparison outcomes may fail to support any coherent global ranking. For example, one may observe cyclic preferences such as $A \succ B \succ C \succ A$, or inconsistencies involving ties such as $A \equiv B\equiv C\neq A$. Such contradictions make the resulting leaderboard unstable and challenging to interpret. In this paper, we propose a prompt perturbation framework for improving the consistency of pairwise LLM evaluation. Our approach generates perturbed variants of each prompt, uses the resulting comparison graphs to identify and filter out structurally inconsistent comparison patterns, and then applies standard ranking methods to the filtered comparisons. A key feature of the proposed framework is that graph-level structural consistency is incorporated explicitly into the evaluation pipeline before ranking aggregation. This provides a simple and principled way to reduce cyclic inconsistencies and improve the reliability of LLM rankings.
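A minimal sketch of "filter, then rank", under stated assumptions: for each prompt, the comparisons collected across its perturbed variants are pooled into one directed graph (winner to loser), any prompt whose graph contains a cycle is dropped, and the survivors are ranked by total wins (a Copeland-style count standing in for "standard ranking methods"). Ties are ignored for brevity; the paper's actual consistency criteria may differ.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (winner, loser)


def has_cycle(edges: List[Edge]) -> bool:
    """Depth-first search for a directed cycle such as A > B > C > A."""
    graph: Dict[str, List[str]] = defaultdict(list)
    for w, l in edges:
        graph[w].append(l)
    WHITE, GRAY, BLACK = 0, 1, 2
    color: Dict[str, int] = defaultdict(int)  # default 0 == WHITE

    def visit(u: str) -> bool:
        color[u] = GRAY
        for v in graph[u]:
            if color[v] == GRAY:
                return True  # back edge: cycle found
            if color[v] == WHITE and visit(v):
                return True
        color[u] = BLACK
        return False

    nodes = {n for e in edges for n in e}
    return any(color[n] == WHITE and visit(n) for n in sorted(nodes))


def filtered_win_ranking(per_prompt: Dict[str, List[Edge]]) -> List[Tuple[str, int]]:
    """Drop prompts with cyclic comparison graphs, then rank systems by wins."""
    wins: Dict[str, int] = defaultdict(int)
    for edges in per_prompt.values():
        if has_cycle(edges):
            continue  # structurally inconsistent: exclude before aggregation
        for w, l in edges:
            wins[w] += 1
            wins[l] += 0  # make sure systems that never win still appear
    return sorted(wins.items(), key=lambda kv: (-kv[1], kv[0]))
```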

07.
arXiv (CS.LG) 2026-06-11

Beyond the Golden Teacher: Enhancing Graph Learning through LLM-GNN Co-teaching

arXiv:2606.11583v1

Text-attributed graphs (TAGs) underlie real-world applications such as citation networks, social media, and e-commerce. Few-shot graph learning on TAGs is hard: with only a handful of labels per class and the rest of the graph unannotated, neither GNNs nor LLMs can learn well on their own. GNNs read topology and fail on cold nodes; LLMs read text and fail on text-ambiguous nodes. Existing LLM-GNN methods all follow the same recipe: designate one model as the golden teacher and use its outputs (e.g., features or pseudo-labels) to supervise the other. We argue this golden-teacher assumption breaks under sparse supervision: neither model is golden, and treating either as such transfers its blind spots into the student. We therefore ask: can we avoid designating either model as the golden teacher, and still perform effective graph learning? We answer with LLM-GNN Co-Teaching, a bidirectional co-teaching framework in which neither model is fixed as teacher. The GNN and LLM exchange their most confident pseudo-labels under an architecture-specific small-loss criterion, and both update every round. Supervision is then mined from the trajectory: whenever a node moves from cross-model contradiction at round t to cross-model agreement at round t+1, the LLM's two answers on the same input form a preference pair (old contradicting self < new peer-endorsed self) for DPO training. We call this Round-based Pseudo-Label Preference Optimization (RPL-PO). On six benchmarks, LLM-GNN Co-Teaching consistently outperforms GNN-as-Judge and all prior methods, with absolute 3-shot gains of 7.86% on Cora and 7.73% on ogbn-arxiv; improvements carry over to 5-shot and to zero-shot cross-dataset transfer. Error-structure analysis further shows that abandoning the golden-teacher assumption substantially improves the LLM's graph learning capability on challenging samples.
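The RPL-PO mining rule can be sketched directly from its description: scan consecutive rounds and, for each node that moves from cross-model contradiction to agreement, pair the LLM's new answer (chosen) with its old one (rejected). The per-round record layout is hypothetical, and skipping nodes where only the GNN changed its label is an added assumption to avoid degenerate pairs.

```python
from typing import Dict, List, Tuple

# Hypothetical per-round records: node_id -> (llm_label, gnn_label, llm_answer_text)
Round = Dict[str, Tuple[str, str, str]]


def mine_preference_pairs(rounds: List[Round]) -> List[Tuple[str, str, str]]:
    """Return (node_id, chosen, rejected) pairs for contradiction-to-agreement moves."""
    pairs = []
    for prev, curr in zip(rounds, rounds[1:]):
        for node, (llm_t, gnn_t, ans_t) in prev.items():
            if node not in curr:
                continue
            llm_next, gnn_next, ans_next = curr[node]
            contradicted_before = llm_t != gnn_t
            agrees_now = llm_next == gnn_next
            llm_changed = llm_t != llm_next  # assumption: skip degenerate pairs
            if contradicted_before and agrees_now and llm_changed:
                # new peer-endorsed answer preferred over old contradicting one
                pairs.append((node, ans_next, ans_t))
    return pairs
```

The resulting triples map onto the (prompt, chosen, rejected) format that DPO training consumes.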

08.
arXiv (CS.AI) 2026-06-16

FlowMPC: Improving Flow Matching policies with World Models

arXiv:2606.16286v1

Flow Matching (FM) is a powerful approach for behavior cloning in multimodal action spaces [Jiang et al., 2025], but because it is not trained to directly maximize expected return, there is still room to improve how FM policies act at test time. This work investigates whether a learned world model can improve FM policies by enabling Model Predictive Path Integral (MPPI) planning over candidate action sequences proposed by the policy. Building on TD-MPC2 [Hansen et al., 2024], I introduce FlowMPC, a framework that combines an imitation-learned FM policy with a learned world model for test-time planning in ManiSkill manipulation tasks [Tao et al., 2025]. Across PickCube and PickSingleYCB, adding the world model improved performance over the FM policy alone, with especially clear gains in end-of-episode success. These results suggest that world-model-based planning can effectively complement flow-based imitation policies without modifying the FM training objective.
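The core MPPI step over policy proposals can be sketched as follows: sample candidate action sequences from the policy, roll each out in the world model, and return their average weighted by exponentiated predicted return. This is a one-iteration simplification in a toy 1-D setting (TD-MPC2's planner iterates and refits a sampling distribution); `propose`, `dynamics`, and `reward` are hypothetical stand-ins for the FM policy and the learned model.

```python
import math
import random
from typing import Callable, List

Plan = List[float]  # a sequence of scalar actions (toy 1-D setting)


def mppi_plan(
    state: float,
    propose: Callable[[random.Random], Plan],   # stand-in for sampling the FM policy
    dynamics: Callable[[float, float], float],  # stand-in for the learned world model
    reward: Callable[[float, float], float],    # stand-in for the learned reward head
    num_samples: int = 64,
    temperature: float = 1.0,
    seed: int = 0,
) -> Plan:
    """One MPPI update: score policy proposals in the model, return their
    return-weighted average."""
    rng = random.Random(seed)
    plans = [propose(rng) for _ in range(num_samples)]
    returns = []
    for plan in plans:
        s, total = state, 0.0
        for a in plan:
            total += reward(s, a)
            s = dynamics(s, a)  # imagined rollout, no environment interaction
        returns.append(total)
    best = max(returns)
    # Exponentiated returns; subtracting the max keeps exp() from overflowing.
    weights = [math.exp((r - best) / temperature) for r in returns]
    z = sum(weights)
    horizon = len(plans[0])
    return [sum(w * p[t] for w, p in zip(weights, plans)) / z for t in range(horizon)]
```

Because the FM policy only supplies proposals, its training objective is untouched, matching the abstract's point that planning complements the policy at test time.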

09.
Nature (Science) 2026-06-10

A prognostic human brain network for diffuse midline glioma

Diffuse midline gliomas (DMGs) are near-universally lethal tumours of the childhood central nervous system [1,2]. In animal models, DMGs form brain-wide integrated networks through neuron-to-glioma synapses [3–6] and glioma-to-glioma gap junctional coupling [3]. This extensive connectivity robustly promotes the growth and invasion of DMG [3–9] and other glial malignancies [10–12] through paracrine mechanisms and direct neuron-to-glioma synapses. However, the organization and clinical implications of these connections in the living human brain remain to be elucidated. Here, we develop tumour network mapping to compute the brain-wide connectivity profile of DMG, defining a conserved brain network across pontine and thalamic DMG associated with patient short-term survival (DMG network). Tumour functional connectivity with the DMG network was independently predictive of patient overall survival across two external validation cohorts. Tumour growth mapped to DMG network-specific trajectories, and peak in-network neurometabolic changes across development spatiotemporally aligned with the peak age incidence of DMG. Analyses of single-nucleus RNA sequencing data confirmed diverse synaptic gene enrichment in high-connectivity DMG. Strikingly, incidental surgical resection of high-connectivity thalamic DMG tissue conferred a significant survival advantage. Collectively, these data define a conserved and prognostically important brain network in children with DMG, consistent with the hypothesis that DMGs exploit otherwise healthy brain circuits to promote tumour growth.
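The abstract does not detail how tumour connectivity with the DMG network is computed. As an assumption for illustration only, the sketch uses a common seed-based approach: the mean time series within the tumour location is correlated with each network region's time series (for example, drawn from a normative connectome), Fisher z-transformed, and averaged over the network. Region names and data are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def network_connectivity(seed_ts: Sequence[float],
                         region_ts: Dict[str, List[float]],
                         network: List[str]) -> float:
    """Mean Fisher-z correlation between a tumour seed time series and the
    time series of each region assigned to the network."""
    zs = []
    for region in network:
        r = pearson(seed_ts, region_ts[region])
        r = max(-0.999999, min(0.999999, r))  # keep atanh finite
        zs.append(math.atanh(r))
    return sum(zs) / len(zs)
```

A per-patient score of this kind is the sort of quantity that could then enter a survival model as a predictor.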

10.
arXiv (CS.CL) 2026-06-11

A PubMed-Scale Dataset of Structured Biomedical Abstracts

Structured abstracts are important for biomedical literature processing because they facilitate information retrieval, text mining, and knowledge synthesis. However, a large share of the abstracts indexed in PubMed remain unstructured, which is a significant bottleneck for downstream text-processing workflows and applications. To address this limitation, we introduce Structured PubMed, a comprehensive corpus of section-labeled biomedical abstracts compiled from the complete PubMed database, encompassing over 23.2 million research-article records. The corpus is divided into two distinct subsets: a collection of 5.9 million author-structured abstracts parsed from official XML files, and an automatically labeled collection of 17.2 million originally unstructured abstracts structured via a verbatim-extraction Large Language Model pipeline. Every record is harmonized under a unified five-section schema and mapped to its original PubMed identifier, publication type, and publication date. This dataset can be used to train sentence-classification models, benchmark text-segmentation architectures, and perform large-scale, section-specific information extraction at PubMed-wide scale.
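For the author-structured subset, PubMed XML already tags sections: each `AbstractText` element can carry `Label` and `NlmCategory` attributes. The sketch below harmonizes those into five buckets; using the five named `NlmCategory` values as the schema is an assumption, since the abstract does not name the dataset's five sections. Abstracts without categories come back empty, which is the case the LLM pipeline would handle.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Assumed five-section schema, borrowed from PubMed's NlmCategory values;
# the dataset's own section names may differ.
SECTIONS = ("BACKGROUND", "OBJECTIVE", "METHODS", "RESULTS", "CONCLUSIONS")


def structured_sections(article_xml: str) -> Dict[str, str]:
    """Collect AbstractText content into the five sections by NlmCategory."""
    root = ET.fromstring(article_xml)
    out: Dict[str, str] = {s: "" for s in SECTIONS}
    for node in root.iter("AbstractText"):
        category = (node.get("NlmCategory") or "").upper()
        if category in out:
            text = "".join(node.itertext()).strip()  # flattens inline tags like <i>
            out[category] = (out[category] + " " + text).strip()
    return out
```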

11.
arXiv (CS.CV) 2026-06-16

DLWM: Diverse Latent World Models for Efficient Multimodal Reasoning

Reasoning capabilities of multimodal large language models (MLLMs) have improved considerably in recent years. Existing approaches typically rely on explicit chain-of-thought or continuous latent-space trajectories to enhance multi-step reasoning. However, these methods generally assume that an input admits a single latent interpretation and unfold reasoning along a fixed path or under a uniform computation budget. In real-world multimodal settings, visual observations are often subject to occlusion, blur, viewpoint variation, or semantic ambiguity, giving rise to multiple plausible interpretations. A uniform reasoning strategy not only limits the model's ability to explore multiple hypotheses but also incurs high memory usage and rollout cost. We present DLWM (Diverse Latent World Models), a multimodal reasoning framework that combines latent-space reasoning with reinforcement learning. First, we construct a set of diverse latent world hypotheses in continuous latent space, each capturing a different plausible interpretation of the visual input, and unfold latent reasoning independently on each hypothesis. An orthogonality-based diversity regularizer explicitly prevents hypothesis collapse. Second, we formulate the latent reasoning process as a resource-constrained sequential decision problem and introduce a resource-aware reinforcement learning policy that adaptively allocates computation across hypotheses, dynamically deciding whether to expand, terminate, or merge reasoning paths, thereby substantially reducing memory footprint and improving rollout efficiency. Experiments on multiple multimodal reasoning benchmarks demonstrate that DLWM outperforms existing methods by 2-5 points in accuracy while reducing memory usage by 24%.
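An orthogonality-based diversity term can be written as the sum of squared cosine similarities between distinct hypothesis vectors: it is zero when the hypotheses are mutually orthogonal and grows as they collapse onto each other. This is one standard form of such a regularizer, offered as an assumption; the exact DLWM loss is not given in the abstract. Zero vectors are not handled.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def orthogonality_penalty(hypotheses: List[Vector]) -> float:
    """Sum of squared cosine similarities over distinct hypothesis pairs
    (half the squared Frobenius norm of the off-diagonal of H H^T)."""
    hs = [normalize(h) for h in hypotheses]
    total = 0.0
    for i in range(len(hs)):
        for j in range(i + 1, len(hs)):
            cos = sum(a * b for a, b in zip(hs[i], hs[j]))
            total += cos * cos
    return total
```

Adding a scaled copy of this penalty to the training loss discourages the latent world hypotheses from converging to the same interpretation.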

12.
arXiv (quant-ph) 2026-06-12

Entanglement Detection by Approximate Entanglement Witnesses

arXiv:2402.14755v2

The problem of determining whether a given quantum state is separable is known to be computationally difficult. We develop an approach to this problem based on approximations of convex polytopes in high dimensions. By showing that a convex polytope constructed from a finite number of hyperplanes approximates the Euclidean ball arbitrarily well in high dimensions, we find evidence that a finite set of approximate entanglement witnesses is potentially sufficient to determine the entanglement of a state with high probability.
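For readers new to witnesses: an entanglement witness W satisfies Tr(Wσ) ≥ 0 for every separable state σ, so a negative value certifies entanglement. The sketch uses the textbook two-qubit fidelity witness W = I/2 − |Φ+⟩⟨Φ+| (separable states have fidelity at most 1/2 with |Φ+⟩); it illustrates what a single witness does, not the paper's approximate-witness construction, and handles real density matrices only for brevity.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

# |Phi+> = (|00> + |11>) / sqrt(2) in the basis 00, 01, 10, 11
PHI_PLUS = [1 / math.sqrt(2), 0.0, 0.0, 1 / math.sqrt(2)]


def witness_value(rho: Matrix) -> float:
    """Tr(W rho) for W = I/2 - |Phi+><Phi+|, i.e. Tr(rho)/2 - <Phi+|rho|Phi+>."""
    trace = sum(rho[i][i] for i in range(4))
    fidelity = sum(PHI_PLUS[i] * rho[i][j] * PHI_PLUS[j]
                   for i in range(4) for j in range(4))
    return 0.5 * trace - fidelity


def pure_state(psi: Sequence[float]) -> Matrix:
    """Density matrix |psi><psi| for a real state vector."""
    return [[a * b for b in psi] for a in psi]
```

The Bell state gives −1/2 (entanglement detected), while the product state |00⟩ and the maximally mixed state give non-negative values, so this witness is silent on them; covering all entangled states requires many witnesses, which is the scaling question the paper addresses.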

13.
arXiv (math.PR) 2026-06-19

Model-independent upper bounds for the prices of Bermudan options with convex payoffs

arXiv:2503.13328v3

Suppose $\mu$ and $\nu$ are probability measures on $\mathbb{R}$ satisfying $\mu \leq_{cx} \nu$. Let $a$ and $b$ be convex functions on $\mathbb{R}$ with $a \geq b \geq 0$. We are interested in finding $$\sup_{\mathbf{M}} \sup_{\tau} \mathbb{E}^{\mathbf{M}} \left[ a(X) I_{ \{ \tau = 1 \} } + b(Y) I_{ \{ \tau = 2 \} } \right] $$ where the first supremum is taken over consistent models $\mathbf{M}$ (i.e., filtered probability spaces $(\Omega, \mathbf{F}, \mathbb{F}, \mathbb{P})$ such that $Z=(z,Z_1,Z_2)=(\int_{\mathbb{R}} x \mu(dx) = \int_{\mathbb{R}} y \nu(dy), X, Y)$ is a $(\mathbb{F},\mathbb{P})$ martingale, where $X$ has law $\mu$ and $Y$ has law $\nu$ under $\mathbb{P}$) and $\tau$ in the second supremum is a $(\mathbb{F},\mathbb{P})$-stopping time taking values in $\{1,2\}$. Our contributions are first to characterise and simplify the dual problem, and second to completely solve the problem under some structural assumptions on the measures $\mu$ and $\nu$ (namely that $\mu$ and $\nu$ are absolutely continuous probability measures that satisfy the Dispersion Assumption). A key finding is that the canonical set-up in which the filtration is that generated by $Z$ is not rich enough to define an optimal model and additional randomisation is required. This holds even though the marginal laws $\mu$ and $\nu$ are atom-free. The problem has an interpretation of finding the robust, or model-free, no-arbitrage bound on the price of a Bermudan option with two possible exercise dates, given the prices of co-maturing European options.
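The standing hypothesis $\mu \leq_{cx} \nu$ is what makes the set of consistent models non-empty: by Strassen's theorem, a martingale with marginals $\mu$ and $\nu$ exists exactly when the means agree and every call price under $\mu$ is at most the corresponding call price under $\nu$. The sketch checks this for finitely supported measures, using exact rational arithmetic; the paper itself works with absolutely continuous measures, so the discrete case is only illustrative.

```python
from fractions import Fraction
from typing import Dict

# A finitely supported probability measure: atom -> probability (summing to 1).
Measure = Dict[Fraction, Fraction]


def mean(m: Measure) -> Fraction:
    return sum((p * x for x, p in m.items()), Fraction(0))


def call_price(m: Measure, k: Fraction) -> Fraction:
    """E[(X - k)^+] under m."""
    return sum((p * (x - k) for x, p in m.items() if x > k), Fraction(0))


def convex_order(mu: Measure, nu: Measure) -> bool:
    """Check mu <=_cx nu: equal means and E_mu[(X-k)^+] <= E_nu[(X-k)^+] for all k."""
    if mean(mu) != mean(nu):
        return False
    # Both call-price curves are piecewise linear with kinks only at atoms and
    # agree outside the combined support, so checking every atom suffices.
    return all(call_price(mu, k) <= call_price(nu, k) for k in set(mu) | set(nu))
```

For example, a point mass at 0 is dominated by the symmetric two-point law on {−1, 1}, but not the other way round.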

14.
arXiv (CS.AI) 2026-06-18

Surrogate Benchmarks for Model Merging Optimization

arXiv:2509.02555v2

Model merging techniques aim to integrate the abilities of multiple models into a single model. Most model merging techniques have hyperparameters, and their settings affect the performance of the merged model. Because several existing works show that tuning hyperparameters in model merging can enhance the merging outcome, developing hyperparameter optimization algorithms for model merging is a promising direction. However, this optimization process is computationally expensive, particularly when merging LLMs. In this work, we develop surrogate benchmarks for optimizing merging hyperparameters, enabling algorithm development and performance comparison at low cost. We define two search spaces and collect data samples to construct surrogate models that predict the performance of a merged model from its hyperparameters. We demonstrate that our benchmarks can predict the performance of merged models well and simulate the behavior of optimization algorithms.
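The idea of a surrogate benchmark is that an optimizer queries a cheap learned predictor instead of performing a real merge and evaluation. The sketch assumes a single hypothetical merging coefficient, made-up (coefficient, score) samples, a k-nearest-neighbour surrogate, and random search as the optimizer under test; the paper's search spaces and surrogate model type are not specified in the abstract.

```python
import random
from typing import Callable, List, Optional, Tuple

# (merging hyperparameter, measured score) pairs; values here are made up.
Sample = Tuple[float, float]


def knn_surrogate(samples: List[Sample], k: int = 1) -> Callable[[float], float]:
    """Predict the merged model's score as the mean score of the k nearest samples."""
    def predict(x: float) -> float:
        nearest = sorted(samples, key=lambda s: abs(s[0] - x))[:k]
        return sum(score for _, score in nearest) / len(nearest)
    return predict


def random_search(predict: Callable[[float], float], low: float, high: float,
                  budget: int, seed: int = 0) -> Tuple[Optional[float], float]:
    """Cheap HPO loop: every evaluation queries the surrogate, not a real merge."""
    rng = random.Random(seed)
    best_x, best_y = None, float("-inf")
    for _ in range(budget):
        x = rng.uniform(low, high)
        y = predict(x)
        if y > best_y:
            best_x, best_y = x, y
    return best_x, best_y
```

Swapping `random_search` for another optimizer and comparing their trajectories on the same surrogate is the kind of low-cost comparison the benchmark is meant to enable.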

15.
arXiv (CS.CV) 2026-06-15

Gaze Heads: How VLMs Look at What They Describe

How a vision-language model internally solves the task of describing an image is far from obvious. We find that the model develops a specific mechanism for this: a small set of attention heads in its language-model backbone, which we call gaze heads, whose attention tracks the image region the model is currently describing. We find them with a simple correlation score from a few forward passes, using comic strips as a controlled testbed where narrative order is laid out spatially. These gaze heads do not just track the image tokens being described: redirecting their attention to a chosen region forces the VLM to describe that region instead. A single attention-mask intervention on the top-100 gaze heads, fewer than 9% of all heads, steers the model's answer to any chosen comic panel at 83.1% accuracy, while the same intervention on random heads fails to redirect the answer, and intervening on all heads destroys generation. The same lever also extends to continuous control: switching the gaze target mid-generation makes the model wrap up its current panel description and move to the new one within a few tokens. Beyond comics, the same intervention redirects answers to chosen regions in natural COCO images. The mechanism further recurs across model sizes from 2B to 32B parameters and across other VLM architectures, although some frozen-encoder families show no comparable head set. More broadly, this shows that targeted edits identified through mechanistic analysis can serve as practical inference-time levers for steering multimodal model behavior, without any retraining. Our code, interactive demo, and datasets are available at https://gaze.baulab.info/
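One plausible form of the "simple correlation score" is shown below: for each head, correlate the attention mass it places on each panel over generation steps with an indicator of whether that panel is currently being described, then average over panels; heads are ranked by this score. The exact score and data layout in the paper may differ, and the attention values are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def gaze_score(attn: List[List[float]], described: List[int]) -> float:
    """attn[t][p]: attention mass one head puts on panel p at generation step t.
    described[t]: index of the panel being described at step t.
    Returns the mean over panels of corr(attention to p, [p is described])."""
    num_panels = len(attn[0])
    scores = []
    for p in range(num_panels):
        mass = [step[p] for step in attn]
        target = [1.0 if d == p else 0.0 for d in described]
        scores.append(pearson(mass, target))
    return sum(scores) / num_panels
```

Sorting all heads by `gaze_score` and keeping the top 100 would give the candidate set for the attention-mask intervention.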

16.
arXiv (CS.CV) 2026-06-18

Show, Don't Ask: Generative Visual Disambiguation for Composed Image Retrieval with Turn-Valid Coverage

Composed image retrieval (CIR) uses a reference image and a text modification to search for a target image. However, such queries often describe several possible images rather than one exact target, making the user's intent ambiguous. Recent methods address this by using conformal prediction to estimate ambiguity and by asking users clarifying text questions. However, these methods have two limitations: their coverage guarantee only holds at the first interaction, and text questions are often insufficient for resolving fine-grained visual differences such as appearance, attributes, or viewpoint. We propose CLARA, a clarification framework that resolves ambiguity by showing users a small panel of visual alternatives. Instead of answering text questions, the user simply selects the prototype image closest to the intended target. This provides a direct visual signal and avoids relying on a model to predict the user's answer. To maintain valid conformal guarantees across multiple interaction rounds, CLARA reweights calibration using the likelihood ratio induced by the user's selection. The displayed prototypes are also constrained to represent the current candidate set and are snapped to real corpus images, ensuring that generated images cannot artificially improve coverage. Experiments on open-domain and fashion benchmarks show that CLARA matches single-turn state-of-the-art retrieval performance, maintains nominal coverage across interaction rounds, and finds the intended target in fewer rounds than strong text-question baselines. Its advantage is especially clear when ambiguity involves viewpoint or fine-grained attributes, where visual clarification is more effective than textual questioning.
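Reweighting calibration by a likelihood ratio is the mechanism of weighted conformal prediction (as in Tibshirani et al., 2019, for covariate shift): each calibration score gets a weight, the test point's weight is placed at +∞, and the threshold is the weighted (1 − α) quantile. The sketch shows that generic quantile only; how CLARA computes the weights from the user's selection is not specified here, so the weights are inputs.

```python
import math
from typing import List


def weighted_conformal_quantile(scores: List[float], weights: List[float],
                                test_weight: float, alpha: float) -> float:
    """Weighted (1 - alpha) quantile of calibration scores, with the test
    point's weight assigned to a score of +inf."""
    total = sum(weights) + test_weight
    cumulative = 0.0
    for s, w in sorted(zip(scores, weights)):
        cumulative += w
        if cumulative >= (1 - alpha) * total:
            return s
    return math.inf  # too little calibration mass: the prediction set is everything
```

With all weights equal to 1 this reduces to the usual split-conformal rule (the ⌈(n+1)(1−α)⌉-th smallest score); unequal weights shift the threshold toward scores from calibration points that resemble the current round.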

17.
arXiv (CS.AI) 2026-06-12

Transformer Field Theory: A Response-Theoretic Approach to Mechanistic Interpretability

arXiv:2605.25225v2

Mechanistic interpretability often studies Transformer behavior by intervening on internal activations through activation patching, causal tracing, path patching, and steering directions. This paper develops Transformer Field Theory: a response-theoretic framework in which the residual stream of a fixed forward pass is treated as a Transformer field over layer depth and token position. In this formulation, patching becomes a localized source insertion into the Transformer field, first-order sensitivity fields predict patch effects, Green functions describe downstream propagation, and patch selection is posed as an adjoint inverse problem. Empirically, we test the theory's forward response objects in GPT-2-style autoregressive Transformers. Localized Transformer-field interventions exhibit a bounded local linear regime; first-order sensitivities predict patch effects across layer-token sites; localized sources generate structured anisotropic Transformer-field propagation; high-sensitivity sites and sliced Green operators provide reduced response descriptions; and prompt-induced Transformer-field displacements partially transfer answer behavior. These results establish sensitivities, Transformer-field responses, and sliced Green operators as practical objects for organizing patching experiments, while providing the forward mathematical basis for patch-site inference and cross-scale response transfer.
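"First-order sensitivity fields predict patch effects" amounts to a linearization: the change in the output after patching activation h to h' is approximated by ∇f(h) · (h' − h). The sketch uses a toy scalar function and central finite differences in place of a real network and autodiff (both assumptions), and shows the approximation holding for small patches and breaking for large ones, the "bounded local linear regime".

```python
from typing import Callable, List

Vec = List[float]


def finite_diff_grad(f: Callable[[Vec], float], h: Vec, eps: float = 1e-6) -> Vec:
    """Central-difference estimate of the sensitivity df/dh at the clean activation h."""
    grad = []
    for i in range(len(h)):
        up, down = list(h), list(h)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def predicted_patch_effect(f: Callable[[Vec], float], clean: Vec, patched: Vec) -> float:
    """First-order prediction of f(patched) - f(clean): sensitivity . (patched - clean)."""
    g = finite_diff_grad(f, clean)
    return sum(gi * (p - c) for gi, p, c in zip(g, patched, clean))
```

For f(h) = h₀² at h₀ = 1, a patch of +0.01 is predicted as 0.02 against a true change of 0.0201, while a patch of +3 is predicted as 6 against a true change of 15.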

18.
medRxiv (Medicine) 2026-06-17

Nickel and Dimed: How a Common Earth Element is Short-Changing Our Health

Nickel has long been studied as an environmental contaminant, but less so in connection with population health. It does not announce itself as loudly as its transition-metal brethren such as mercury and cadmium, but its chemical properties allow it to be harmful as a low-dose, chronic exposure, particularly among people whose immune systems are sensitized to it. There is a growing evidence base and vocabulary for discussing nickel's effect on health. However, in the U.S., there are no recent, reliable estimates of the share of the population with a nickel allergy, let alone of how much nickel Americans are exposed to through their diet. This paper seeks to close this evidence gap by creating a new dataset of dietary exposure to nickel and other heavy metals and by assessing how high levels of dietary nickel exposure shape local demand for health care services. We use soil data from the U.S. Geological Survey and data on agricultural product transport from FoodFlows.org to create a county-level dietary nickel exposure index. We then use a large electronic health record database and double machine learning to estimate how demand for primary care services varies across levels of dietary nickel exposure. We find that counties with high nickel exposure experience an increase in the share of primary care office visits for symptoms highly suggestive of nickel poisoning. This result survives multiple-hypothesis-testing corrections and placebo tests. Our research suggests that nickel harms individual health, that exposure to it can be measured at the population level, and that it is shaping primary care across the U.S.
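A county-level dietary exposure index built from soil data and food-flow data can be sketched as a flow-weighted average: each destination county's exposure is the mean soil nickel concentration of the counties its food comes from, weighted by shipped volume. The weighting scheme, field names, and units are hypothetical; the abstract does not give the paper's exact construction.

```python
from typing import Dict


def dietary_nickel_index(
    inflows: Dict[str, Dict[str, float]],  # destination county -> {origin county: tonnes of food}
    soil_nickel: Dict[str, float],         # origin county -> soil nickel concentration (mg/kg)
) -> Dict[str, float]:
    """Flow-weighted mean soil nickel of the places a county's food comes from."""
    index: Dict[str, float] = {}
    for county, sources in inflows.items():
        total = sum(sources.values())
        if total == 0:
            continue  # no recorded inflows: leave the county out rather than guess
        index[county] = sum(t * soil_nickel[o] for o, t in sources.items()) / total
    return index
```

An index like this would then serve as the treatment variable in the double machine learning step.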

19.
arXiv (CS.CV) 2026-06-12

Point-Wise Geometry-Aware Transformer for Partial-to-Full Point Cloud Registration in Computer-Assisted Surgery

Partial-to-full registration remains challenging due to varying overlap ratios, fluctuating point densities, and the presence of noise. While transformers have shown strong potential for point cloud processing, prior methods typically confine them to global context aggregation, overlooking fine-grained local geometry crucial for accurate correspondence. We propose GAPR-Net, a learning-based point cloud registration framework with a coarse-to-fine architecture that combines convolution and transformer modules, in which local and global information is fused between the partial and full point clouds using a cross-attention mechanism. To achieve this, a transformation-invariant point-wise geometric feature representation is proposed, which can robustly capture relative geometric features for individual points with respect to their neighboring points. To evaluate the effectiveness of the proposed approach, experiments are conducted on four geometrically distinct anatomical structures: the tibia, femur, pelvis, and thoracic cartilage. The overall registration recall reaches 94.2%, with a low RMSE of 1.992 mm and $R^2$ values of 0.908 and 0.974 for rotation and translation, respectively. The results demonstrate that the proposed method effectively addresses the partial-to-full point cloud registration problem. The proposed method enables highly accurate 3D point cloud registration using partial observation, providing a critical foundation for precise surgical navigation and robotic interventions in computer-assisted surgery. The code will be made available after the double-blind review process.
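The key property of a "transformation-invariant point-wise geometric feature" is that it does not change when the cloud is rotated or translated, so partial and full scans in different poses can be compared directly. The sketch uses a much simpler descriptor than GAPR-Net's (whose construction is not detailed here): the sorted distances from each point to its k nearest neighbours, which are invariant because rigid motions preserve distances.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def dist(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pointwise_features(points: List[Point], k: int) -> List[List[float]]:
    """Sorted distances from each point to its k nearest neighbours.
    Rigid motions preserve distances, so these features are invariant."""
    feats = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        feats.append(d[:k])
    return feats


def rigid_transform(points: List[Point], angle: float, t: Point) -> List[Point]:
    """Rotate about the z-axis by `angle` radians, then translate by t."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y + t[0], s * x + c * y + t[1], z + t[2]) for x, y, z in points]
```

The brute-force neighbour search is quadratic; practical pipelines use a spatial index instead.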

20.
arXiv (CS.AI) 2026-06-11

OmniBioTwin: A System-of-Twinned-Systems Framework for Health Digital Twins

arXiv:2606.11264v1

Health digital twins (HDTs) promise patient-specific modeling and decision support, but current approaches remain structurally fragmented: monolithic models that address a single organ or task lack cross-scale fidelity, while system-level twins lack generalizable architectural frameworks. We propose OmniBioTwin, a System-of-Twinned-Systems (SoTS) framework that organizes HDTs as modular computational entities coupled through explicit interaction operators within a multi-layer network architecture. The framework comprises seven coordinated layers spanning data integration, autonomous twin modeling, cross-scale coupling, temporal synchronization, and human-in-the-loop decision support. We demonstrate OmniBioTwin by instantiating a multiscale twin for glucagon-like peptide-1 (GLP-1) signaling pathways in Alzheimer's disease, illustrating how molecular, cellular, and organ-level twins can be composed and coupled within a unified system.
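The "modular twins coupled through explicit interaction operators" idea can be sketched as independent components that each own a state and an update rule, plus coupling operators evaluated on a shared snapshot so that all twins advance in the same synchronized time step. Everything here (the scalar states, linear gain coupling, and toy rules) is hypothetical and far simpler than a GLP-1 pathway model.

```python
from typing import Callable, Dict, List, Tuple


class Twin:
    """Hypothetical twin: a scalar state advanced by its own update rule."""

    def __init__(self, name: str, state: float,
                 rule: Callable[[float, float, float], float]):
        self.name, self.state, self.rule = name, state, rule
        self.inflow = 0.0  # input accumulated from coupling operators

    def step(self, dt: float) -> None:
        self.state = self.rule(self.state, self.inflow, dt)
        self.inflow = 0.0


# Interaction operator: (source, target, gain); target receives gain * source state
Coupling = Tuple[str, str, float]


def run(twins: Dict[str, Twin], couplings: List[Coupling], dt: float, steps: int) -> None:
    for _ in range(steps):
        # 1) evaluate every coupling on one snapshot (synchronized time step)
        snapshot = {n: t.state for n, t in twins.items()}
        for src, dst, gain in couplings:
            twins[dst].inflow += gain * snapshot[src]
        # 2) advance every twin by the same dt
        for t in twins.values():
            t.step(dt)
```

Evaluating couplings on a snapshot keeps the result independent of the order in which twins are updated.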

21.
arXiv (quant-ph) 2026-06-16

Witnessing Spin-Orbital Entanglement using Resonant Inelastic X-Ray Scattering

arXiv:2512.06718v2

Entanglement plays a central role in quantum technologies, yet its characterization and control in materials remain challenging. Recent developments in spectrum-based entanglement witnesses have enabled new strategies for quantifying many-body entanglement in macroscopic materials. Here, we develop a protocol for detecting spin-orbital entanglement using experimentally accessible resonant inelastic x-ray scattering (RIXS). Central to our approach is the construction of a Hermitian generator from experimentally measurable spectra, which allows us to compute the quantum Fisher information (QFI) available in spin-orbital systems. Comparing the resulting QFI with its upper bounds for $k$-producible states provides a robust witness of spin-orbital entanglement. To account for realistic experimental limitations, we further extend our framework to include relaxed QFI bounds applicable to measurements lacking full polarization resolution.
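How a QFI value becomes an entanglement statement can be shown with the standard multipartite criterion: with a generator built from local terms normalized so that every k-producible state of N sites obeys F_Q ≤ kN (the bound known for spin-1/2 generators, slightly loose in general), a measured QFI density F_Q/N above k rules out k-producibility. The normalization for the paper's spin-orbital generator, and its relaxed bounds for incomplete polarization resolution, are not modelled; this is an assumption-laden sketch.

```python
import math


def entanglement_depth_lower_bound(qfi: float, num_sites: int) -> int:
    """Smallest entanglement depth consistent with F_Q / N, assuming every
    k-producible state satisfies F_Q <= k * N."""
    f = qfi / num_sites
    if f <= 1:
        return 1  # no multipartite entanglement certified
    # f > k excludes k-producible states for every integer k < f
    return math.ceil(f)
```

For example, a QFI density of 2.3 excludes 2-producible states, so at least three sites must be entangled together.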

22.
arXiv (CS.CL) 2026-06-19

From 50K to 8.2 Million in 24 Hours: Vozinha's Algorithmic Consecration and the Multilingual Making of World Cup Visibility

We present a multilingual computational discourse analysis of how language constructed the algorithmic consecration of Vozinha, the 40-year-old Cape Verde goalkeeper, after Spain 0-0 Cape Verde at the 2026 FIFA World Cup. The study contributes a multilingual corpus in Portuguese, Spanish, English, and French; a nine-frame narrative taxonomy with cue-based frame annotation; a reproducible annotation pipeline combining LLM-assisted suggestion with human validation; and an analysis of cross-lingual narrative diffusion across discourse phases. We treat the platform follower count itself, narrated as "50k to 8M", as a linguistic object: a circulating and narratable proof of visibility rather than a mere measurement. The follower-growth timeline is used only as contextual metadata: we reconstruct a conservative phase structure, not a continuous API-native series, and type every datapoint by value class, confidence, and evidence type. The only exact primary scraper anchor is 8,235,652 followers at 2026-06-16 15:47 UTC; all other figures are reported as estimated ranges or thresholds, including an estimated pre-match baseline of 45k-56k. Findings suggest that distinct languages carried distinct frames: Portuguese mobilization, Spanish crisis, English nation-making, and a shared platform-metric spectacle through which peripheral athletic performance became globally visible. As a v0.1 pilot, the paper releases the corpus schema, frame taxonomy, annotation guidelines, hashed visual-evidence log, and typed timeline, while flagging full double annotation and inter-annotator agreement as planned work.
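
The typed timeline (every datapoint tagged by value class, confidence, and evidence type) can be sketched as a small record type. The field names, the label strings, and the `growth_multiple` helper below are hypothetical; the two values are the exact scraper anchor and the estimated pre-match baseline range quoted in the abstract, and confidence is left unspecified because the abstract does not state it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class TimelinePoint:
    """One typed follower-count datapoint (field names and labels are illustrative)."""
    label: str
    value_class: str             # e.g. "exact" or "estimated_range"
    value: Tuple[int, int]       # (low, high); low == high for exact values
    evidence_type: str           # e.g. "scraper" or "estimate"
    confidence: str = "unspecified"
    timestamp_utc: Optional[str] = None


def growth_multiple(baseline: TimelinePoint, later: TimelinePoint) -> Tuple[float, float]:
    """Conservative (min, max) growth factor implied by two ranged datapoints."""
    return later.value[0] / baseline.value[1], later.value[1] / baseline.value[0]


baseline = TimelinePoint("pre-match baseline", "estimated_range", (45_000, 56_000), "estimate")
anchor = TimelinePoint("scraper anchor", "exact", (8_235_652, 8_235_652), "scraper",
                       timestamp_utc="2026-06-16T15:47Z")
lo, hi = growth_multiple(baseline, anchor)
print(f"growth between {lo:.0f}x and {hi:.0f}x")
```

Propagating the baseline range rather than a single point value gives a growth factor of roughly 147x to 183x, which stays consistent with treating only the anchor as exact.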

23.
arXiv (CS.LG) 2026-06-18

Quantifying and Auditing LLM Evaluation via Positive–Unlabeled Learning

arXiv:2606.19057v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM-as-a-Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive-unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human-verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human-consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM-as-a-judge pipelines.
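
Positive-unlabelled scoring with partial optimal transport can be sketched via a common dummy-point reduction: positives carry a total mass equal to an assumed class prior and are matched to unlabelled points, while a zero-cost dummy source absorbs the remaining unlabelled mass. This Python sketch uses entropic regularization (plain Sinkhorn) instead of exact OT, and the embeddings, `prior`, and `eps` are hypothetical inputs; it is not the paper's auditing procedure.

```python
import numpy as np


def sinkhorn(a, b, cost, eps=0.05, n_iter=500):
    """Entropic OT plan between histograms a and b (plain Sinkhorn scaling)."""
    kernel = np.exp(-cost / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (kernel @ v)
        v = b / (kernel.T @ u)
    return u[:, None] * kernel * v[None, :]


def pu_partial_ot_scores(pos, unl, prior, eps=0.05):
    """Fraction of each unlabelled point's mass matched to the positives.

    pos: (n_p, d) embeddings of human-verified positives.
    unl: (n_u, d) embeddings of unlabelled outputs.
    prior: assumed share of positives among unlabelled outputs (hypothetical input).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    n_p, n_u = len(pos), len(unl)
    cost = ((pos[:, None, :] - unl[None, :, :]) ** 2).sum(axis=-1)
    cost = cost / cost.max()                       # rescale costs to [0, 1] so eps has a fixed scale
    # A zero-cost dummy source absorbs the (1 - prior) mass left unmatched,
    # turning the balanced problem into partial transport of `prior` mass.
    cost_ext = np.vstack([cost, np.zeros((1, n_u))])
    a = np.concatenate([np.full(n_p, prior / n_p), [1.0 - prior]])
    b = np.full(n_u, 1.0 / n_u)
    plan = sinkhorn(a, b, cost_ext, eps)
    return plan[:n_p].sum(axis=0) * n_u            # each column's total mass is 1 / n_u


pos = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1]])
unl = np.array([[0.05, 0.05], [0.1, 0.1], [-0.1, 0.0], [0.0, -0.1], [0.05, -0.05],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [4.9, 5.0], [5.0, 4.9]])
print(np.round(pu_partial_ot_scores(pos, unl, prior=0.5), 3))
```

In this toy setup the five unlabelled points near the positives score close to 1 and the distant cluster scores close to 0. The dummy row is what makes the transport partial: with an assumed prior of 0.5, only half of the unlabelled mass has to be matched to positives.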

24.
arXiv (quant-ph) 2026-06-19

Quantifying Imaginarity in Neutrino Systems

arXiv:2412.01871v2 Announce Type: replace-cross Abstract: It is a fundamental question why quantum mechanics employs complex numbers rather than solely real numbers. In this work, we conduct the first analysis of imaginarity quantification in neutrino flavor and spin-flavor oscillations. As quantum systems in coherent superposition, neutrinos are ideal candidates for quantifying imaginarity within the resource-theoretic framework, using measures such as the $\ell_1$-norm and the relative entropy of imaginarity. We show that in the case of two-flavor mixing, these measures of imaginarity are nonzero. The measures of imaginarity reach their extreme values when the probabilistic features of quantum theory are fully maximized, i.e., when the transition and survival probabilities are approximately equal. Our study reveals that imaginarity, as a resource, can be harnessed not solely from the presence of a complex phase in the mixing matrix but also from the intrinsic quantum dynamics of time evolution itself. We further extend our analysis to explore the dynamics of three-flavor neutrino mixing, incorporating the effects of a nonzero $CP$ phase.
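
For two-flavor vacuum oscillations the $\ell_1$-norm of imaginarity has a simple closed form. With a real mixing matrix, a short calculation in the flavor basis gives $I_{\ell_1} = |\sin 2\theta \sin\Delta|$, while $P_{e\to\mu} = \sin^2 2\theta \sin^2(\Delta/2)$, so at maximal mixing the imaginarity peaks at 1 exactly when $P_{e\to\mu} = 1/2$. The Python sketch below checks this numerically; it assumes a pure $\nu_e$ initial state, plane-wave evolution with phase difference $\Delta \approx \Delta m^2 L/(2E)$, and no matter effects, and it is not the paper's code.

```python
import numpy as np


def two_flavor_state(theta, delta):
    """Flavor amplitudes (nu_e, nu_mu) of a state produced as nu_e, in vacuum.

    theta: mixing angle; delta: accumulated phase difference, approx. dm^2 * L / (2E).
    """
    c, s = np.cos(theta), np.sin(theta)
    u = np.array([[c, s], [-s, c]])                 # real mixing matrix: flavor <- mass
    mass_amps = u.T @ np.array([1.0, 0.0])          # initial nu_e expressed in the mass basis
    mass_amps = mass_amps * np.exp(-1j * np.array([0.0, delta]))  # relative phase only
    return u @ mass_amps


def l1_imaginarity(psi):
    """l1-norm of imaginarity: sum of |Im rho_jk| over all entries of rho = |psi><psi|."""
    rho = np.outer(psi, psi.conj())
    return np.abs(rho.imag).sum()


for delta in (np.pi / 4, np.pi / 2, np.pi):
    psi = two_flavor_state(np.pi / 4, delta)
    p_trans = abs(psi[1]) ** 2
    print(f"delta={delta:.3f}  P(e->mu)={p_trans:.3f}  I_l1={l1_imaginarity(psi):.3f}")
```

For $\theta = \pi/4$ the imaginarity rises from about 0.707 at $\Delta = \pi/4$ to 1 at $\Delta = \pi/2$ (where $P = 0.5$) and vanishes at $\Delta = \pi$, where the state is purely $\nu_\mu$ up to a phase.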

25.
arXiv (CS.AI) 2026-06-19

DynAMO: Dynamic Asset Management Orchestration via Topological Multi-Agent Scheduling

arXiv:2606.19382v1 Announce Type: cross Abstract: While LLM-powered agents offer end-to-end automation for industrial asset lifecycles, real-world Industry 4.0 deployment is hindered by latency, concurrency instability, and safety risks. We present DynAMO (Dynamic Asset Management Orchestration), a deployment-ready engine using a Plan-then-Execute architecture to generate verifiable workflow graphs. DynAMO supports both SequentialWorkflow (topological execution) and ParallelWorkflow (dependency-aware concurrency). By dynamically identifying independent tasks, DynAMO preserves structural correctness and safety while significantly improving efficiency through controlled reasoning overlap. Across six controlled experiments on the AssetOpsBench industrial benchmark, DynAMO demonstrates substantial performance and robustness gains. Parallel execution reduces end-to-end latency by a median of 1.6x over sequential orchestration, rising to 1.8x on highly parallelizable workflows. After instrumenting external tool calls with realistic latencies, a latency decomposition shows that LLM reasoning and orchestration still account for more than 90% of execution time, identifying model inference as the primary system bottleneck. Structured context pruning reduces inference latency by approximately 30%, and DynAMO maintains correct functional behaviour (task completion, agent sequencing, and output quality) while exhibiting graceful degradation under controlled fault injection. Reproducibility analysis further confirms stable execution under repeated runs, with parallel scheduling reducing latency variance. These findings establish DynAMO as a practical blueprint for scalable, safe, and latency-aware agent deployment in Industry 4.0 automation pipelines. Code is available at: https://github.com/kushwaha001/DynAMO
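
The difference between SequentialWorkflow and ParallelWorkflow can be illustrated on a small dependency graph. This Python sketch assumes fixed, known task costs and unbounded concurrency, and the task names and costs are hypothetical; it only shows why dependency-aware scheduling is bounded by the critical path rather than by the sum of task costs, and it is not DynAMO's scheduler.

```python
from collections import deque
from typing import Dict, List, Set


def topological_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Kahn's algorithm; deps maps each task to the set of tasks it waits on."""
    indegree = {t: len(d) for t, d in deps.items()}
    children: Dict[str, List[str]] = {t: [] for t in deps}
    for t, d in deps.items():
        for p in d:
            children[p].append(t)
    ready = deque(sorted(t for t, n in indegree.items() if n == 0))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for c in children[t]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    if len(order) != len(deps):
        raise ValueError("workflow graph contains a cycle")
    return order


def sequential_latency(deps, cost):
    """One task at a time, in topological order."""
    return sum(cost[t] for t in topological_order(deps))


def parallel_latency(deps, cost):
    """Each task starts as soon as all its prerequisites finish (unbounded concurrency)."""
    finish = {}
    for t in topological_order(deps):
        start = max((finish[p] for p in deps[t]), default=0.0)
        finish[t] = start + cost[t]
    return max(finish.values(), default=0.0)


deps = {
    "fetch_sensor_data": set(),
    "fetch_work_orders": set(),
    "detect_anomalies": {"fetch_sensor_data"},
    "draft_report": {"detect_anomalies", "fetch_work_orders"},
}
cost = {"fetch_sensor_data": 2.0, "fetch_work_orders": 2.0,
        "detect_anomalies": 3.0, "draft_report": 1.0}
seq, par = sequential_latency(deps, cost), parallel_latency(deps, cost)
print(seq, par, round(seq / par, 2))
```

Here the sequential run takes 8.0 time units and the parallel run 6.0 (about 1.33x), because the two fetch tasks overlap while `detect_anomalies` and `draft_report` stay on the critical path.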