The halo effect: how academic hierarchy undermines peer review and enables fraud
Letter to the Editor
Academic Intelligence · Curated Daily
AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。
Letter to the Editor
arXiv:2508.18625v2 Announce Type: replace Abstract: Portfolio optimization lies at the core of quantitative finance and aims to determine how assets should be allocated to balance expected returns against risk. It can be formulated as a Quadratic Unconstrained Binary Optimization (QUBO) problem, which is NP-hard. Quantum computing offers the potential to solve such problems more efficiently than classical methods. In this work, we employ the Variational Quantum Eigensolver (VQE) to address the portfolio optimization problem. To increase the likelihood of converging to high-quality solutions, we propose using the Weighted Conditional Value-at-Risk (WCVaR) as the cost function and the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) as the optimizer. Our experiments are conducted using both classical simulations and quantum hardware on the Wuyue QuantumAI platform. Together, these results demonstrate that the combination of WCVaR and CMA-ES improves the performance of VQE for portfolio optimization and provides a practical route for applications on NISQ devices.
arXiv:2606.25929v1 Announce Type: new Abstract: This work develops a phase-based formulation of relativistic wave dynamics, demonstrating that the Klein–Gordon equation emerges naturally from the foundational assumption of intrinsic phase periodicity in material fields. Mapping the phase directly onto the classical action, we postulate that localized excitations possess an invariant rest-frame oscillation governed by a proper frequency $\omega_0$. This physical condition establishes an operational mass-frequency relation, $m = \hbar \omega_0 / c^2$, without requiring rest energy as an independent, axiomatic input. We show that the Klein–Gordon equation arises as the minimal local, linear, Lorentz-invariant field equation compatible with this internal phase structure. Within this framework, mass acts as an intrinsic frequency scale governing wave propagation, and relativistic kinematics is fully recovered as a structural consequence of phase coherence. This approach provides a unified wave-mechanical interpretation where particle dynamics maps onto the group velocity of dispersive wave packets, offering an intuitive account of free propagation, dispersion, and tunneling across potential barriers.
arXiv:2606.25358v1 Announce Type: new Abstract: Assessing financial literacy during gameplay without disrupting the learning experience remains a key challenge in serious games for education. We present the Agentic BKT pipeline, a multi-agent large language model architecture for stealth assessment of financial competencies from open-ended gameplay events. The pipeline processes events from a 2D platformer serious game aligned with the OECD/INFE financial literacy framework through four phases: (1) the game captures every player decision as a structured event log; (2) an LLM event classifier labels each action on a four-point rubric validated against three domain experts (Fleiss kappa = 0.624, substantial agreement); (3) four domain-specific agents specializing in risk mitigation, investing, spending, and credit management perform session-level reasoning over behavioral trajectories, feeding per-competency Bayesian Knowledge Tracing that estimates mastery within each domain; and (4) an expert judge agent synthesizes the domain-level estimates into an overall mastery score. Evaluated with 193 K-12 participants across 264 game sessions, the Agentic BKT pipeline yields mastery estimates significantly correlated with learning gain (r = 0.276, p = 0.0001) and post-test scores (r = 0.333, p < 0.0001) while showing no correlation with pre-test scores, providing both convergent and discriminant validity. The multi-agent approach approximately triples the predictive validity of a single-LLM baseline (r = 0.095, not significant) in this study, demonstrating that domain decomposition and session-level reasoning play a central role in capturing the multidimensional nature of financial literacy from gameplay
arXiv:2602.00510v2 Announce Type: replace Abstract: Most LLM code-synthesis benchmarks rely on unit tests as the reward oracle, but PCB schematic design has none: correctness is defined by structured physical constraints over real IC packages and pin-level assignments, per-task golden references are unavailable, and SPICE simulation does not validate schematic-level correctness. We introduce PCBSchemaGen, a training-free inference-time framework that turns a frozen LLM into a verifiable, repairable PCB schematic generator. The framework induces a domain schema from IC datasheets to ground LLM decoding, pairs it with a deterministic 5-layer continuous-reward verifier with pin-level error localization, and refines candidates through a Thompson Sampling arm-acquiring bandit. We evaluate on 2 PCB benchmarks covering 227 real-IC tasks across 22 unified circuit domains, including a public-schematic-derived suite that serves as a fully held-out generalization test (verifier, KG library, and prompts frozen before any evaluation). Under our framework, an open-weight 31B model (Gemma-4-31B) passes 81.3% of PCBBench tasks on average, and the same framework transfers across both benchmarks with zero verifier code changes; a Circuitron-style inference-time prompting baseline on the same Gemma-4-31B backbone collapses on hard system-level designs. This suggests inference-time refinement under a deterministic structural verifier is a general recipe for reference-free LLM code synthesis in domains without unit-test oracles. Our benchmarks and deterministic verifier are publicly available at https://github.com/HZou9/PCBSchemaGen_v2.
Multi-modal fusion and multi-model ensembling are prevalent in enhancing the performance of 3D semantic segmentation. Despite the impressive performance, these methods either rely on auxiliary input signals or suffer from costly computational expense. To efficaciously enhance the segmentation performance without introducing intolerable costs, we propose to transfer the rich knowledge from the multi-modal model (i.e., point clouds and images) and multiple model experts to the point-cloudbased network through knowledge distillation. Specifically, we present Information-oriented Heterogeneous Distillation (IHD) to help the uni-modal model absorb the complementary knowledge from the multi-modal teacher. We design the Information-Oriented Filtering (IOF) strategy to select informative images from the continuous image sequence for multi-modal fusion. This practice can boost the performance of the multi-modal teacher, thus benefiting the learning of the student. Besides, as opposed to vanilla model ensembling that requires the separate training of each expert, we propose Adept Snapshot Distillation (ASD). ASD treats the freely available model snapshots generated during the training phase as multiple experts, which significantly reduces the training cost for model ensembling. For each expert teacher, it only provides supervision to the student in the class where it is adept. The resulting Heterogeneous and Adept Snapshot Knowledge Distillation, dubbed HAS-KD, attains state-of-the-art results on ScanNetV2 and S3DIS datasets. HAS-KD can be seamlessly integrated into contemporary 3D segmentation algorithms and bring considerable gains without introducing extra inference burdens. The code will be made publicly available upon publication.
Existing vision-language model (VLM)-based AI-generated image quality assessment (AIGIQA) methods suffer from a fundamental semantic-distortion dimensional conflict: monolithic representations optimized for semantic discrimination inherently entangle compositional understanding with low-level perceptual sensitivity, rendering them blind to fine-grained quality degradations. We introduce MST-CLIPIQA, a multi-scale two-stream framework that achieves hierarchical vision-language alignment through explicit representational decoupling. Our architecture leverages dual CLIP encoders with complementary patch granularities: coarse-grained streams capture global semantic coherence while fine-grained streams preserve textural signatures and artifact patterns. An information bottleneck-inspired gated fusion mechanism performs adaptive cross-scale distillation, with optional cross-attention enabling prompt-anchored correspondence evaluation when generation prompts are available. Extensive experiments across five benchmarks establish new state-of-the-art results, achieving average improvements of 1.11 percent SRCC on quality and 2.35 percent SRCC on text-image correspondence prediction, while maintaining efficiency with only 0.8M trainable parameters. Our project is available at https://github.com/YMlinfeng/MST-CLIPIQA.
Cerebrovascular research heavily relies on quantitative analysis of intracranial arteries from time-of-flight magnetic resonance angiography, yet existing processing pipelines remain limited by inconsistent artery labeling and a high manual correction burden. We present ArteryX, a toolbox for extracting features that standardizes artery classification across proximal and distal vascular territories. It integrates segmentation handling, isotropic processing, vessel-fused graph construction, and constrained landmark-based classification within a unified artery-specific feature reporting and reproducible workflow. The toolbox extracts morphological, topological, and complexity features including total length, mean radius, volume, surface area, branch count, tortuosity, and fractal dimensionality for standardized artery-segments. Test-and-validation were performed using three complementary datasets: (1)TopBrain-Challenge benchmarking with annotated arteries, (2)synthetic known-reference validation, and (3)exploratory in-vivo cohort of cerebral small vessel disease. In TopBrain analyses, ArteryX with supervised nnUnet segmentation showed minimal bias, while iCafe showed the highest bias and a large limit-of-agreement. ArteryX consistently demonstrated robust downstream quantification performance across segmentation sources (unsupervised/supervised). Agreement analyses showed minimal bias for radius and good sensitivity of extent-dependent metrics throughout the noisier segmentations compared to the state-of-the-art iCafe-toolbox. Furthermore, a stage-wise human-in-the-loop protocol showed lower intervention time than iCafe. In an in-vivo-cohort (48CSVD+, 20CSVD-), ArteryX-derived distal and territory-level features showed group-level differences, not evident with iCafe. To facilitate adoption-and-reproducibility, ArteryX is designed with versioned builds, tutorials, and documentation.
arXiv:2606.20107v1 Announce Type: new Abstract: Optimal Reinforcement Learning (RL) algorithms typically rely on carefully constructed count-based uncertainty estimates to drive exploration. Although theoretically sound, such estimates are hard to compute in practical settings and therefore offer limited insight for designing exploration heuristics. Meanwhile, ensembling has emerged as a practical approach, but remains without theoretical justification. Building on a recent ensemble-based method for Multi-Armed Bandits, we propose a quantile-based ensemble method for finite-horizon Markov Decision Processes (MDPs). Our simple count-free approach achieves optimal variance-dependent regret bounds, providing theoretical grounding for ensemble-based exploration in RL.
Object pose estimation is crucial to many industrial applications, with one example being automated spray painting using a robot. However, confidentiality concerns often limit access to high-quality 3D models, posing a significant challenge for point-cloud-based pose estimation. In such scenarios, rotational symmetry, a readily accessible characteristic of many industrial objects, can provide valuable prior information to facilitate pose estimation.In this paper, we propose a method that leverages the rotational symmetry commonly found in industrial objects to address the challenge caused by the absence of 3D models. The object pose is jointly estimated with point cloud refinement through an iterative optimization process. This optimization relies on a rotational symmetry constraint loss. To construct this loss, each 3D point is rotated according to the currently estimated pose, and multiple correspondences are identified using nearest-neighbor search by exploiting the rotational symmetry property. These correspondences are then used to compute the rotational symmetry constraint loss, which iteratively refines both the pose and the point cloud.By explicitly incorporating rotational symmetry into the optimization process, the proposed method achieves robust pose estimation and generalizes well across diverse object types. The proposed method is evaluated on a dataset specifically created for point clouds without known 3D models, consisting of four categories of synthetic objects and one real wheel hub collected from a production line. Experimental results demonstrate that the proposed method achieves performance comparable to methods that rely on known 3D models.
arXiv:2509.25148v2 Announce Type: replace Abstract: Post-training alignment of large language models often combines supervised fine-tuning (SFT) on expert demonstrations with reinforcement learning (RL) from preference or verifiable feedback. SFT provides a useful behavioral anchor but can overfit to static demonstrations, whereas RL encourages exploration but may drift from expert behavior or exploit imperfect rewards. We propose AAPA (Adversarially Anchored Preference Alignment), a plug-in framework that augments existing post-training objectives with a sentence-level adversarial anchoring signal. AAPA compares policy rollouts with offline, pre-collected expert responses using a fixed lightweight discriminator, and therefore requires neither online teacher inference nor discriminator co-training during policy optimization. The same anchoring term can be added to SFT, GRPO, and CHORD while preserving their original training pipelines. Experiments on instruction-following benchmarks show that AAPA consistently improves the corresponding base objectives across model scales. In particular, the staged AAPA configuration improves over a strong GRPO baseline by 5.77\% on \texttt{Qwen3-0.6B} and 3.75\% on \texttt{Qwen3-4B}. Further analyses on response length, log-probability distributions, and discriminator variants suggest that adversarial anchoring provides a stable semantic grounding signal for preference optimization. Code is available at \url{https://github.com/IsFaqq/AAPA}.
arXiv:2510.27353v2 Announce Type: replace Abstract: Recent studies have suggested that Large Language Models (LLMs) could provide interesting ideas contributing to mathematical discovery. This claim was motivated by reports that LLM-based genetic algorithms produced heuristics offering new insights into the online bin packing problem under uniform and Weibull distributions. In this work, we reassess this claim through a detailed analysis of the heuristics produced by LLMs, examining both their behavior and interpretability. Despite being human-readable, these heuristics remain largely opaque even to domain experts. Building on this analysis, we propose a new class of algorithms tailored to these specific bin packing instances. The derived algorithms are significantly simpler, more efficient, more interpretable, and more generalizable, suggesting that the considered instances are themselves relatively simple. We then discuss the limitations of the claim regarding LLMs' contribution to this problem, which appears to rest on the mistaken assumption that the instances had previously been studied. Our findings instead emphasize the need for rigorous validation and contextualization when assessing the scientific value of LLM-generated outputs.
arXiv:2605.05983v2 Announce Type: replace Abstract: Recently, steering vectors (SVs) have emerged as an effective and lightweight approach to steer behaviors of large language models (LLMs), among which fine-tuned SVs are more effective than optimization-free ones. However, current approaches to fine-tuned SVs suffer from two limitations. First, they require careful selection of steering factors on a per-SV basis to balance steering effectiveness and generation quality at inference time. Second, they operate as full-sequence SVs (FSSVs), which can sacrifice generation quality regardless of factor selection due to excessive intervention on the model generation process. To address the first limitation, we propose joint training of steering factors and directions, such that post-hoc factor selection is no longer required. Using neural network scaling theory, we find that moderately large initialization sizes and learning rates for steering factors are essential for stability and efficiency of joint training. To tackle the second limitation, we draw inspiration from representation fine-tuning and introduce Prompt-only SV (PrOSV), an SV that intervenes only on a few prompt tokens. Our empirical results show that PrOSV outperforms traditional FSSVs on AxBench when using our joint training scheme. We also find that PrOSV achieves a better tradeoff between general model utility and adversarial robustness than FSSV.
arXiv:2606.13779v1 Announce Type: new Abstract: Efficient simulation of open quantum systems is central to modeling noisy quantum hardware and many-body dynamics. In trajectory-based tensor network methods, cost is often associated with trajectory-level quantities such as entanglement growth or bond dimension. However, the total cost of a fixed-accuracy simulation also depends on statistical sampling, and the interplay between per-trajectory complexity and sampling effort remains poorly understood. Here we introduce a cost-resolved framework for matrix product state (MPS)-based quantum trajectory simulations that decomposes total cost into memory per trajectory, runtime per trajectory, and sampling effort. We show that physically equivalent stochastic unravelings of the same Lindblad dynamics do not necessarily reduce total cost, but instead redistribute cost between trajectory complexity and statistical convergence. This trade-off is quantified by two dimensionless inflation factors: a bond dimension inflation $\alpha$ and a sampling inflation $\kappa$, which together determine the preferred unraveling under hardware-dependent memory and parallelism constraints. We provide a practical protocol for extracting $(\alpha,\kappa)$ from modest pilot simulations and demonstrate it using benchmarks across multiple noise channels. The resulting decision maps show that the computationally favorable unraveling can change with noise strength, time-step resolution, system size, and available parallelism. These results establish unraveling choice as a hardware-aware simulation design problem rather than an intrinsic optimization of trajectory entanglement alone.
Foundation models have revolutionized AI, but adapting them efficiently for multimodal tasks, particularly in dual-stream architectures composed of unimodal encoders, such as DINO and BERT, remains a significant challenge. ParameterEfficient Fine-Tuning (PEFT) methods like LowRank Adaptation (LoRA) enable lightweight adaptation, yet they operate in isolation within each modality, limiting their ability in capturing cross-modal interactions. In this paper, we take a step in bridging this gap with Cross-Modal LowRank Adaptation (CoLA), a novel PEFT framework that extends LoRA by introducing a dedicated inter-modal adaptation pathway alongside the standard intra-modal one. This dual-path design enables CoLA to adapt unimodal foundation models to multimodal tasks effectively, without interference between modality-specific and crossmodal learning. We evaluate CoLA across a range of vision-language (RefCOCO, RefCOCO+, RefCOCOg) and audio-visual (AVE, AVS) benchmarks, where it consistently outperforms LORA, achieving a relative gain of around 3% and 2%, respectively, while maintaining parameter efficiency. Notably, CoLA enables the first multitask PEFT framework for visual grounding, bridging a key gap in efficient multimodal adaptation. Code is available at https://github.com/peterwisu/CoLA
arXiv:2305.04908v3 Announce Type: replace Abstract: Phase estimation, due to Kitaev [arXiv'95], is one of the most fundamental subroutines in quantum computing. In the basic scenario, one is given black-box access to a unitary $U$, and an eigenstate $\lvert \psi \rangle$ of $U$ with unknown eigenvalue $e^{i\theta}$, and the task is to estimate the eigenphase $\theta$ within $\pm\delta$, with high probability. The cost of an algorithm for us is the number of applications of $U$ and $U^{-1}$. We tightly characterize the cost of several variants of phase estimation where we are no longer given an eigenstate, but are required to estimate the maximum eigenphase of $U$, aided by advice in the form of states (or a unitary preparing those states) which are promised to have at least a certain overlap $\gamma$ with the top eigenspace. We give algorithms and nearly matching lower bounds for all ranges of parameters. We show that a small number of copies of the advice state (or of an advice-preparing unitary) are not significantly better than having no advice at all. We also show that having lots of advice (applications of the advice-preparing unitary) does not significantly reduce cost, and neither does knowledge of the eigenbasis of $U$. We immediately obtain a lower bound on the complexity of the Unitary recurrence time problem, resolving an open question of She and Yuen~[ITCS'23]. Lastly, we study how efficiently one can reduce the error probability in the basic phase-estimation scenario. We show that a phase-estimation algorithm with precision $\delta$ and error probability $\epsilon$ has cost $\Omega\left(\frac{1}{\delta}\log\frac{1}{\epsilon}\right)$, matching an easy upper bound. This contrasts with some other scenarios in quantum computing (e.g., search) where error-probability reduction costs only a factor $O(\sqrt{\log(1/\epsilon)})$. Our lower bound uses a variant of the polynomial method with trigonometric polynomials.
arXiv:2606.13568v1 Announce Type: new Abstract: Many important observables in physics and geometry are cup products of cochains. The adjusted cup product neural layer has been introduced in this paper. It is a neural primitive that hard wires the cup product with an adjustment term from higher gauge theory. This creates a readout that is gauge invariant by design. Their main theoretical result shows that on a closed cycle the output relies entirely on the adjustment coefficient. Setting this coefficient to zero removes the output completely regardless of other parameters. Thus the adjustment is the only source of gauge invariant signal. They prove this observable is a nonzero quadratic form and is exactly invariant under one and two gauge transformations.
Speech foundation models struggle with low-resource Pacific Indigenous languages because of severe data scarcity. Furthermore, full fine-tuning risks catastrophic forgetting. To address this gap, we present an empirical study adapting models to real-world Pacific datasets. We investigate the impact of data volume, adaptation strategies, and representational drift on speech foundation models for various Pacific languages. Additionally, we analyze a continual learning framework for sequential language acquisition. Empirical results across three distinct Pacific Indigenous languages demonstrate that adapting to these linguistically distant languages induces severe internal representational drift. Consequently, these models face a strict plasticity and stability dilemma. While LoRA adapts well initially, it suffers from catastrophic forgetting during sequential learning. Ultimately, this study highlights the urgent need for robust adaptation strategies tailored to underrepresented languages.
arXiv:2603.14483v2 Announce Type: replace Abstract: Parametric system identification methods estimate the parameters of explicitly defined physical systems from data. Yet, they remain constrained by the need to provide an explicit function space, typically through a predefined library of candidate functions chosen via available domain knowledge. In contrast, deep learning can demonstrably model systems of broad complexity with high fidelity, but black-box function approximation typically fails to yield explicit descriptive or disentangled representations revealing the structure of a system. We develop a novel identifiability theorem, leveraging causal representation learning, to uncover disentangled representations of system parameters without structural assumptions. We derive a graphical criterion specifying when system parameters can be uniquely disentangled from raw trajectory data, up to permutation and diffeomorphism. Crucially, our analysis demonstrates that global causal structures provide a lower bound on the disentanglement guarantees achievable when considering local state-dependent causal structures. We instantiate system parameter identification as a variational inference problem, leveraging a sparsity-regularised transformer to uncover state-dependent causal structures. We empirically validate our approach across four synthetic domains, demonstrating its ability to recover highly disentangled representations that baselines fail to recover. Corroborating our theoretical analysis, our results confirm that enforcing local causal structure is often necessary for full identifiability.
arXiv:2509.26494v3 Announce Type: replace Abstract: Oracle-based quantum algorithms require coherent evaluation of classical functions on superposed inputs, and in fault-tolerant architectures this cost is dominated by non-Clifford gates: generic lookup constructions incur $T$-counts that grow with the data size. Here we show that affine Boolean functions $f(\mathbf{x})=A\mathbf{x}+\mathbf{b}$ over $\mathbb{F}_2$ – the algebraic core of parity checks, linear feedback shift registers, and cipher linear layers – are exactly the functions admitting computational-basis-preserving Clifford oracles, and we develop this correspondence into Stab-QRAM, a compiler mapping a specification $(A,\mathbf{b})$ to an ancilla-free circuit of CNOT and $X$ gates with zero $T$-count. Via K\"{o}nig's edge-coloring theorem, the compiled schedule provably attains the minimum depth for its gate set. Case studies spanning Simon-type oracles, block-encodings of $X$-type coset operators, and syndrome extraction for CSS codes show one compiler serving the algorithm, primitive, and error-correction layers of the quantum stack.
Online hate speech and misinformation frequently overlap, yet NLP research has mainly treated them in isolation. While LLMs represent a scalable solution for assisting humans in the generation of counterspeech for both threats, zero-shot models frequently generate repetitive and vague responses, underscoring the need for high-quality examples to steer model generation. However, existing counterspeech datasets against the overlap of hate and misinformation are scarce and limited to single-turn English dialogues, while real-life interactions span across multiple turns and languages. To bridge this gap, we introduce the first large-scale, expert-curated, multilingual dataset of dialogues tackling the intersection of hate and misinformation. To ensure factual grounding, the dialogues are also anchored in verified external knowledge (i.e., fact-checking articles and NGO reports) and include document- and chunk-level span annotations, making it directly applicable for RAG systems. Covering five languages and targeting hate directed at seven marginalized groups, this novel resource enables the training and evaluation of more persuasive, factually grounded counterspeech models.
Psychological instruments designed for humans are increasingly used to assign large language models (LLMs) stable psychological profiles that affect their usability, safety assessment, and use as proxies for human participants in research. Using a formal psychometric framework, we show that these profiles are largely a measurement artifact. Administering a battery of personality and risk-preference instruments spanning self-reports and behavioral tasks to 56 instruction-tuned LLMs alongside large human reference samples, we report four findings. First, differences between models are driven not by the traits an instrument targets but by a directional response bias, a tendency to respond toward one end of the scale, or one labeled option, regardless of item content; a variance decomposition attributes 81-90% of between-model variation to this bias, against 9-16% in humans. Second, the bias declines with model capability but is not eliminated by it. Third, because bias rather than trait drives responding, an instrument's apparent reliability is almost entirely predicted by its response orthogonality, a term we coin for the proportion of items for which trait and bias point in opposite directions. Fourth, the profile a model appears to have shifts with the items used and can be manufactured through item selection. These results demonstrate that the apparent psychological profiles of LLMs are artifacts of the instrument used to measure them, not properties of the models themselves. As instruments borrowed from human psychology are rarely fully orthogonal and may inherently lack validity for LLMs, we call for dedicated assessments centered on response orthogonality.
arXiv:2606.25410v1 Announce Type: new Abstract: Machine unlearning is an emerging domain that ensures the safe removal of elements (includes concepts, attributes, entity and class) from the trained model along with least drop in model performance. The domain of machine unlearning brings its own indigenous challenges since the removal of pre-trained elements from model will always degrade the model performance on remaining elements. The existing methods basically rely on retraining for removal of elements from the pre-trained model, which is compute extensive. In this work, we propose a machine unlearning method which helps to reduce the computational requirement for faster retain-dataset accuracy convergence which also does not require extensive retraining of the pre-trained model. The proposed method, Data-Frugal Machine Unlearning (DFMU) requires only a single forward and backward pass for computing the importance score of various computational blocks of a model. The importance score computation is based on knowledge preserving pruning which helps to converge faster and requires far less data as compared to the existing methods. Experimentally, it achieves 40% more retain-accuracy with just 13% of data samples in comparison with SOTA method on various public datasets and also averages 88% faster processing time for forgetting a given class.
Diffuse midline gliomas (DMGs) are near-universally lethal tumours of the childhood central nervous system1,2. In animal models, DMGs form brain-wide integrated networks through neuron-to-glioma synapses3–6 and glioma-to-glioma gap junctional coupling3. This extensive connectivity robustly promotes the growth and invasion of DMG3–9 and other glial malignancies10–12 through paracrine mechanisms and direct neuron-to-glioma synapses. However, the organization and clinical implications of these connections in the living human brain remain to be elucidated. Here, we develop tumour network mapping to compute the brain-wide connectivity profile of DMG, defining a conserved brain network across pontine and thalamic DMG associated with patient short-term survival (DMG network). Tumour functional connectivity with the DMG network was independently predictive of patient overall survival across two external validation cohorts. Tumour growth mapped to DMG network-specific trajectories and peak in-network neurometabolic changes across development spatiotemporally aligned with the peak age incidence of DMG. Analyses of single-nucleus RNA sequencing data confirmed diverse synaptic gene enrichment in high-connectivity DMG. Strikingly, incidental surgical resection of high-connectivity thalamic DMG tissue conferred a significant survival advantage. Collectively, these data define a conserved and prognostically important brain network in children with DMG, consistent with the hypothesis that DMGs exploit otherwise healthy brain circuits to promote tumour growth. Tumour network mapping of diffuse midline glioma (DMG) defines a conserved and prognostically important brain network in children with DMG, consistent with the hypothesis that DMGs exploit otherwise healthy brain circuits to promote tumour growth.
Unified models can handle both multimodal understanding and generation within a single architecture, yet they typically operate in a single pass without iteratively refining their outputs. Many multimodal tasks, especially those involving complex spatial compositions, multiple interacting objects, or evolving instructions, require decomposing instructions, verifying intermediate results, and making iterative corrections. While test-time scaling (TTS) has demonstrated that allocating additional inference compute for iterative reasoning substantially improves language model performance, extending this paradigm to unified multimodal models remains an open challenge. We introduce UniT, a framework for multimodal chain-of-thought test-time scaling that enables a single unified model to reason, verify, and refine across multiple rounds. UniT combines agentic data synthesis, unified model training, and flexible test-time inference to elicit cognitive behaviors including verification, subgoal decomposition, and content memory. Our key findings are: (1) unified models trained on short reasoning trajectories generalize to longer inference chains at test time; (2) sequential chain-of-thought reasoning provides a more scalable and compute-efficient TTS strategy than parallel sampling; (3) training on generation and editing trajectories improves out-of-distribution visual reasoning. These results establish multimodal test-time scaling as an effective paradigm for advancing both generation and understanding in unified models.