Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-18

Depth-Width tradeoffs in Algorithmic Reasoning of Graph Tasks with Transformers

Transformers have revolutionized the field of machine learning. In particular, they can be used to solve complex algorithmic problems, including graph-based tasks. In such algorithmic tasks a key question is what is the minimal size of a transformer that can implement the task. Recent work has begun to explore this problem for graph-based tasks, showing that for sub-linear embedding dimension (i.e., model width) logarithmic depth suffices. However, an open question, which we address here, is what happens if width is allowed to grow linearly, while depth is kept fixed. Here we analyze this setting, and provide the surprising result that with linear width, constant depth suffices for solving a host of graph-based problems. This suggests that a moderate increase in width can allow much shallower models, which are advantageous in terms of inference and train time. For other problems, we show that quadratic width is required. Our results demonstrate the complex and intriguing landscape of transformer implementations of graph-based algorithms. We empirically investigate these trade-offs between the relative powers of depth and width and find tasks where wider models have the same accuracy as deep models, while having much faster train and inference time due to parallelizable hardware.

02.
arXiv (CS.CV) 2026-06-17

CASR: A Robust Cyclic Framework for Arbitrary Large-Scale Super-Resolution with Distribution Alignment and Self-Similarity Awareness

Arbitrary-scale image super-resolution (ASISR) remains fundamentally limited by cross-scale distribution shift: once the inference scale leaves the training range, noise, blur, and artifacts accumulate sharply. We revisit this challenge from a cross-scale distribution transition perspective and propose CASR, a simple yet highly efficient cyclic SR framework that reformulates ultra-magnification as a sequence of in-distribution scale transitions. This design ensures stable inference at arbitrary scales while requiring only a single model. CASR tackles two major bottlenecks: distribution drift across iterations and patch-wise diffusion inconsistencies. The proposed SSAM module aligns structural distributions via superpixel aggregation, preventing error accumulation, while the SARM module restores high-frequency textures by enforcing correlation-guided consistency and preserving self-similarity structure through correlation alignment. Despite using only a single model, our approach significantly reduces distribution drift, preserves long-range texture consistency, and achieves superior generalization even at extreme magnification.

03.
arXiv (CS.AI) 2026-06-16

A Learning Method with Gap-Aware Generation for Heterogeneous DAG Scheduling

arXiv:2603.23249v2. Efficient scheduling of directed acyclic graphs (DAGs) is a core problem in large-scale data-intensive computing systems, where query plans, data-processing workloads, and computation graphs consist of dependent tasks competing for limited heterogeneous resource pools. In practice, achieving high-performance execution requires schedulers to adapt across environments with varying resource pools and task types, while generating schedules under tight runtime budgets. We propose WeCAN, an end-to-end reinforcement learning framework for heterogeneous DAG scheduling that addresses task-pool compatibility coefficients and generation-induced optimality gaps. It adopts a two-stage single-pass design: a single forward pass produces task-pool scores and global parameters, followed by a generation map that constructs schedules without repeated network calls. Its weighted cross-attention encoder models task-pool interactions gated by compatibility coefficients, and is size-agnostic to environment fluctuations. Moreover, widely used list-scheduling maps can incur generation-induced optimality gaps from restricted reachability. We introduce an order-space analysis that characterizes the reachable set of generation maps via feasible schedule orders, explains the mechanism behind generation-induced gaps, and yields sufficient conditions for gap elimination. Guided by these conditions, we design a skip-extended realization with an analytically parameterized decreasing skip rule, which enlarges the reachable order set while preserving single-pass efficiency. Experiments on real-world TPC-H query DAGs, resource-intensive workload datasets, and ML-compiler computation graphs demonstrate improved makespan over strong baselines, with inference time comparable to classical heuristics and faster than multi-round neural schedulers.
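
For readers unfamiliar with the list-scheduling maps mentioned in the abstract, a generic greedy list scheduler can be sketched as follows. This is not WeCAN and not the paper's generation map: the function name `list_schedule`, the input layout, and the rule "run time = duration / compatibility coefficient" are all assumptions made for illustration.

```python
from collections import deque


def list_schedule(durations, deps, speed):
    """Greedy list scheduling of a DAG onto heterogeneous pools (illustrative only).

    durations: {task: base duration}
    deps:      {task: [predecessor tasks]} (tasks without predecessors may be omitted)
    speed:     {pool: {task: compatibility coefficient}}; a coefficient <= 0 or a
               missing entry means the pool cannot run the task (assumed convention).
    Returns ({task: (pool, start, finish)}, makespan).
    """
    indegree = {t: len(deps.get(t, [])) for t in durations}
    children = {t: [] for t in durations}
    for task, preds in deps.items():
        for p in preds:
            children[p].append(task)

    # Tasks become "ready" once all predecessors are scheduled (FIFO order).
    ready = deque(sorted(t for t, d in indegree.items() if d == 0))
    pool_free = {pool: 0.0 for pool in speed}
    schedule = {}

    while ready:
        task = ready.popleft()
        # A task cannot start before all of its predecessors have finished.
        earliest = max((schedule[p][2] for p in deps.get(task, [])), default=0.0)
        best = None
        for pool, coeffs in speed.items():
            c = coeffs.get(task, 0.0)
            if c <= 0:
                continue
            start = max(earliest, pool_free[pool])
            finish = start + durations[task] / c
            # Keep the pool with the earliest finish time; ties go to the first pool.
            if best is None or finish < best[2]:
                best = (pool, start, finish)
        if best is None:
            raise ValueError(f"no compatible pool for task {task!r}")
        schedule[task] = best
        pool_free[best[0]] = best[2]
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    makespan = max((f for _, _, f in schedule.values()), default=0.0)
    return schedule, makespan
```

Because each task is placed once, in a fixed ready order, the set of schedules such a map can reach is restricted, which is the kind of limitation the abstract refers to as a generation-induced optimality gap.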

04.
arXiv (CS.CV) 2026-06-19

Composed Object Retrieval: Object-level Retrieval via Composed Expressions

Retrieving fine-grained visual content based on user intent remains a challenge in multimodal systems. Although current Composed Image Retrieval (CIR) methods combine reference images with retrieval texts, they are constrained to image-level matching and cannot localize specific objects. To this end, we propose Composed Object Retrieval (COR), a new object-level retrieval task that retrieves target object(s) from candidate objects in a target image and grounds the retrieved result with pixel-level masks. Given a reference object, its mask, a target image, and a retrieval text describing the desired modification, COR requires models to perform composed visual-textual reasoning rather than relying on explicit category names. This setting introduces several challenges, including fine-grained compositional matching, negative-object filtering under visually similar distractors, and flexible single- or multi-object retrieval. We construct COR125K, the first large-scale COR benchmark, containing 125,541 retrieval triplets across 408 categories with base/novel splits for evaluating category-level generalization. We also present CORE, a unified end-to-end model that integrates reference region encoding, adaptive vision-text interaction, and region-level contrastive learning to align composed representations with target objects while suppressing background and distractors. Extensive experiments demonstrate that CORE significantly outperforms existing CIR-based pipelines and strong baselines in both base and novel categories, establishing a simple and effective foundation for fine-grained object-level multimodal retrieval. Code will be released publicly at https://github.com/wangtong627/COR.

05.
arXiv (CS.CV) 2026-06-11

MARIC: Multi-Agent Reasoning for Image Classification

Image classification has traditionally relied on parameter-intensive model training, requiring large-scale annotated datasets and extensive fine-tuning to achieve competitive performance. While recent vision-language models (VLMs) alleviate some of these constraints, they remain limited by their reliance on single-pass representations, often failing to capture complementary aspects of visual content. In this paper, we introduce Multi-Agent based Reasoning for Image Classification (MARIC), a multi-agent framework that reformulates image classification as a collaborative reasoning process. MARIC first utilizes an Outliner Agent to analyze the global theme of the image and generate targeted prompts. Based on these prompts, three Aspect Agents extract fine-grained descriptions along distinct visual dimensions. Finally, a Reasoning Agent synthesizes these complementary outputs through an integrated reflection step, producing a unified representation for classification. By explicitly decomposing the task into multiple perspectives and encouraging reflective synthesis, MARIC mitigates the shortcomings of both parameter-heavy training and monolithic VLM reasoning. Experiments on 4 diverse image classification benchmark datasets demonstrate that MARIC significantly outperforms baselines, highlighting the effectiveness of multi-agent visual reasoning for robust and interpretable image classification.

06.
arXiv (CS.LG) 2026-06-15

A Longitudinal Attribute-Conditioned Neural Network for Modeling Health-State Transition Probabilities in Temporally Irregular Data: The LANTERN Framework

arXiv:2606.13880v1. Accurate estimation of long-term care transition probabilities is central to disability insurance pricing, reserving, and solvency assessment. Classical actuarial multi-state models commonly rely on Markov, semi-Markov, or proportional-hazard specifications, which provide a direct connection to cohort projection but may be restrictive for irregular longitudinal health data with nonlinear aging patterns and heterogeneous covariate histories. This paper develops a well-calibrated estimator of multi-state transition probabilities for irregular longitudinal health data. The model learns from individual health history, incorporates the time elapsed between observations, and conditions transition probabilities on demographic and socioeconomic attributes. It produces a valid probability distribution over the next observed health state, with four possible states: healthy, mild disability, severe disability, and death. Individual probabilities are aggregated by age group and origin state to form transition matrices compatible with actuarial cohort projection. Using longitudinal data from the Health and Retirement Study, we compare the proposed estimator with logistic regression, gradient-boosted trees, a recurrent neural network, and a last-state persistence benchmark. The evaluation considers probabilistic accuracy, endpoint discrimination and calibration for severe disability and death, risk concentration, and transition matrix error after aggregation. The proposed estimator improves severe disability discrimination relative to logistic regression and gradient-boosted tree benchmarks, maintains strong calibration, and yields the lowest transition matrix error among the evaluated models in the held-out test analysis. Results show that a structured machine learning estimator can support long-term care transition modeling when judged by calibration and projection fidelity, beyond discrimination.
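
The aggregation step described above (individual probabilities grouped by age group and origin state into transition matrices) might look roughly like the sketch below. The unweighted mean, the state labels, and the function name `aggregate_transition_matrices` are assumptions; the abstract does not say how the paper weights individuals.

```python
from collections import defaultdict

# The four health states named in the abstract (short labels are an assumption).
STATES = ["healthy", "mild", "severe", "death"]


def aggregate_transition_matrices(records):
    """Average per-individual next-state distributions into transition rows.

    records: iterable of (age_group, origin_state, probs), where probs is a
             list of len(STATES) probabilities that sums to 1.
    Returns {age_group: {origin_state: [mean probability per destination state]}}.
    """
    sums = defaultdict(lambda: [0.0] * len(STATES))
    counts = defaultdict(int)
    for age_group, origin, probs in records:
        if len(probs) != len(STATES) or abs(sum(probs) - 1.0) > 1e-6:
            raise ValueError("each record needs a valid distribution over STATES")
        key = (age_group, origin)
        for j, p in enumerate(probs):
            sums[key][j] += p
        counts[key] += 1

    matrices = defaultdict(dict)
    for (age_group, origin), total in sums.items():
        n = counts[(age_group, origin)]
        # A mean of valid distributions is itself a valid distribution.
        matrices[age_group][origin] = [s / n for s in total]
    return dict(matrices)
```

Each row of the result sums to 1, so the rows for one age group can be stacked by origin state into a matrix for cohort projection.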

07.
arXiv (CS.CL) 2026-06-17

Correct When Paired, Wrong When Split: Decoupling and Editing Modality-Specific Neurons in MLLMs

Although Knowledge Editing provides an efficient mechanism for updating the knowledge of Multimodal Large Language Models (MLLMs), we find that current paradigms still suffer from an important yet underexplored issue: editing decoupling failure, where entity-related knowledge can be updated when the model is triggered by multimodal inputs (text–image query pairs) but often reverts to outdated pre-edit facts when the paired inputs are split into unimodal ones. Our in-depth empirical analysis reveals that the entity knowledge in MLLMs is not stored as a unified representation, but is instead distributed across disentangled modality-specific pathways. As a result, updates biased toward multimodal queries fail to propagate effectively to unimodal circuits. To bridge this gap, we propose DECODE, which explicitly disentangles and localizes modality-specific neuron groups for targeted knowledge. Extensive experiments demonstrate that DECODE consistently achieves effective knowledge updates under different modality triggers, thereby mitigating editing decoupling failures.

08.
arXiv (CS.CV) 2026-06-18

Pyramid Self-Contrastive Learning for Single-shot Test-time Ultrasound Image Denoising

The inherent electronic and speckle noise complicates clinical interpretation of ultrasound images. Conventional denoising methods rely on explicit noise assumptions whose validity diminishes under composite noise conditions. Learning-based methods are usually pretrained in a limited image domain using a labeled dataset, which implies inevitable domain shift in complex in vivo environments. This study proposes a Pyramid Self-Contrastive Learning (PSCL) framework for test-time ultrasound image denoising without pretraining. Given multiple noisy samples from only one-shot imaging, PSCL disentangles anatomical similarity and noise randomness into separate pyramid latent spaces. The clean image is then decoded from the anatomy space while discarding the noise space. We first apply PSCL to synthetic aperture ultrasound (SAU), where an Aperture-to-Aperture loop serves as a self-supervised proxy task to ensure denoising fidelity. Simulation experiments, including noise levels from 0 to 30 dB and inclusion geometries from simple to complex, demonstrated improvements of 69.3% in SNR and 34.4% in CNR. The in vivo results showed 84.8% SNR and 25.7% CNR gains using only two aperture data of the heart in six echocardiographic views, liver, and kidney. PSCL delivers clear images across diverse imaging targets and configurations, paving the way for more reliable anatomical visualization without domain shift and pretraining costs.

09.
arXiv (CS.CV) 2026-06-19

Geometry-Preserving in 3D Gaussian Splatting for LiDAR-Camera Extrinsic Calibration

Accurate LiDAR-camera calibration is essential for robust multi-modal perception. Targetless approaches avoid manual setup but remain limited by the scarcity of discriminative cross-modal features. Recent methods address this by reconstructing the scene within a differentiable model, enabling extrinsic optimization through dense photometric supervision. Among these, 3D Gaussian Splatting (3DGS) has been widely adopted as a geometric proxy that bridges LiDAR and camera within a single differentiable framework. However, since 3DGS was originally designed for novel view synthesis, existing methods tend to prioritize rendering quality, causing the proxy geometry to drift from the true LiDAR structure. We propose a framework that preserves the metric geometry of the Gaussian proxy by aggregating multi-view LiDAR observations for dense depth supervision and blocking photometric gradients from updating the Gaussian spatial parameters. We validate our method on public driving datasets, where it consistently outperforms existing targetless methods in calibration accuracy.

10.
arXiv (math.PR) 2026-06-15

Hierarchical symmetry selects log-Poisson cascades: classification, uniqueness, and stability

arXiv:2604.01632v2. Within i.i.d. multiplicative cascades, a single axiom – the hierarchical symmetry, a linear contraction on incremental scaling exponents – is shown to be necessary and sufficient for the cascade multiplier to be log-Poisson. We prove: (1) a characterization theorem determining the log-Poisson law with explicit parameters, within the class of all multipliers with finite lattice moments; (2) a classification theorem locating the log-Poisson class inside the log-infinitely-divisible family and identifying the mechanism by which every rival sub-family fails the symmetry; (3) a stability theorem with sharp constants – $(1+\beta)^{1/2}$ when the limiting increment is known, $\sqrt{2}$ when it is fitted – and (4) an unconditional propagation theorem transferring the bound to the multiplier distribution at the sharp rate $\Theta(\sqrt{\varepsilon})$, with a matching lower bound. Beyond independence, the classification extends exactly at the level of asymptotic statistics (limiting cumulant generating function, large deviations, multifractal spectrum) and provably not at the level of laws: an explicit stationary ergodic Markov multiplier satisfies the symmetry exactly with a non-log-Poisson marginal, while exchangeable multipliers collapse to the i.i.d. log-Poisson cascade and finite-state Markov multipliers cannot satisfy the symmetry at all. In the continuous category of exactly scale-invariant log-infinitely-divisible multifractal random measures, no finite moment window of structure-function exponents identifies the cascade class, whereas at the level of the scale-invariance generator the symmetry selects exactly the Barral-Mandelbrot compound Poisson cascade, with scale-ratio-free stability constants. The proofs reduce to second-moment identities on [0,1] via the change of variables $u = e^{kx}$, boundedness of the multiplier, and multiplicative couplings.

11.
arXiv (quant-ph) 2026-06-24

Altermagnet-Superconductor Heterostructure: a Scalable Platform for Braiding of Majorana Modes

arXiv:2506.08095v2. Topological quantum computation, featuring qubits built out of anyonic excitations known as Majorana zero modes (MZMs), has long presented an exciting pathway towards scalable quantum computation. Recently, the advent of altermagnetic materials has presented a new pathway towards localized MZMs on the boundary of two-dimensional materials, consisting of an altermagnetic film subject to a superconducting proximity effect from a superconducting substrate. In this work, we demonstrate that an altermagnet-superconductor heterostructure can not only harbor MZMs but also allow their position to be freely manipulated along the topological boundary of the material via rotation of the Néel vector. Using this mechanism, on a square platform, we utilize a time-dependent method to simulate the Z-gate via braiding, and then extend this to a larger H-junction, where we implement the $\sqrt{X}$ and $\sqrt{Z}$ gates on a single-qubit system. Further, this structure is eminently scalable to many-qubit systems, thus providing the essential ingredients towards universal quantum computation.

12.
arXiv (quant-ph) 2026-06-24

Monitoring Beam Splitter Entanglement using Quantumness

arXiv:2606.24242v1. We report on an experiment in which two independent squeezed vacuum states get entangled by mixing them with a balanced beam splitter. We follow standard practice and use an inseparability criterion to quantify their entanglement. However, this only allows us to witness the entanglement, but not to determine the deleterious effects of experimental imperfections due to the beam splitter mixing and the associated mode-mismatch and detection imperfections. We therefore introduce an alternative framework suitable for continuous variable systems using the states' quantumness, $\Xi$. We show that, under ideal circumstances, $\Xi$ is a conserved quantity under beam mixing. This allows us to benchmark the experiment's performance by comparing the states' quantumness $\Xi$ after the beam splitter mixing with $\Xi$ before. Such a comparison is not possible with entanglement witnesses, as the input states are unentangled. This highlights the main strength of our approach: its ability to generally quantify the quantumness of multi-mode continuous variable states and use this to probe different stages in an experiment.

13.
arXiv (quant-ph) 2026-06-17

Emergent de Sitter Space and Non-Unitary Tensor Networks from Non-Hermitian Quantum Criticality

arXiv:2606.17983v1. Extending the holographic principle to de Sitter (dS) spacetimes remains one of the most vital open frontiers in quantum gravity, where a microscopic, bottom-up tensor-network framework that relates boundary quantum data to emergent de Sitter spacetime is still lacking. In this work, we first show the emergence of de Sitter spacetime from boundary entanglement by formulating a non-unitary continuous multi-scale entanglement renormalization ansatz (cMERA) for a concrete non-Hermitian critical fermion chain. Within this emergent spacetime, we analyze the associated geodesics and show that they act as extremal Ryu-Takayanagi (RT) surfaces undergoing a smooth timelike-to-null transition. Remarkably, we demonstrate that this continuum trajectory dictates a distinct tensor-network architecture in which the bond-counting contribution naturally truncates at the discrete timelike-to-null transition toward the deep infrared. In the resulting architecture, the null ray along the horizon is represented by zero-cost links, since the associated cut severs no tensor legs. This network structure successfully reproduces the logarithmic scaling of non-unitary critical entanglement entropy, offering a bond-counting picture for the de Sitter RT formula. Our results provide the long-sought dS/(c)MERA correspondence at the level of both emergent spacetime and discrete holographic entanglement.

14.
arXiv (CS.LG) 2026-06-15

Classification of Astronomical Spectra Using PCA-Compressed Flux and Inverse-Variance Features

arXiv:2606.13978v1. This paper evaluates a signal-processing and supervised-learning pipeline for classifying SDSS DR17 astronomical spectra into stars, galaxies, and quasars. Each spectrum is represented by its measured flux and inverse-variance information, combining spectral shape with a wavelength-dependent reliability profile. After resampling onto a common logarithmic wavelength grid, the flux and inverse-variance vectors are standardized and separately compressed using principal component analysis. The resulting components are concatenated and used to train several classifiers. The best performance was obtained with the LightGBM gradient-boosting classifier, reaching $94.6\%$ accuracy and $92.1\%$ balanced accuracy on the test set.
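
The first preprocessing step, resampling each spectrum onto a common logarithmic wavelength grid, can be illustrated with a small standard-library sketch. Linear interpolation, edge clamping, and the helper names `log_wavelength_grid` and `resample_linear` are assumptions; the abstract does not state which interpolation scheme the paper uses.

```python
import bisect
import math


def log_wavelength_grid(lo, hi, n):
    """Return n wavelengths from lo to hi, evenly spaced in log10(wavelength)."""
    if n < 2 or lo <= 0 or hi <= lo:
        raise ValueError("need n >= 2 and 0 < lo < hi")
    log_lo, log_hi = math.log10(lo), math.log10(hi)
    step = (log_hi - log_lo) / (n - 1)
    return [10 ** (log_lo + i * step) for i in range(n)]


def resample_linear(wave, values, grid):
    """Linearly interpolate values(wave) onto grid.

    wave must be strictly increasing. Grid points outside [wave[0], wave[-1]]
    take the nearest edge value (an assumed convention for this sketch).
    The same routine can be applied to flux and to inverse variance.
    """
    out = []
    for w in grid:
        if w <= wave[0]:
            out.append(values[0])
        elif w >= wave[-1]:
            out.append(values[-1])
        else:
            j = bisect.bisect_right(wave, w)  # wave[j - 1] <= w < wave[j]
            w0, w1 = wave[j - 1], wave[j]
            frac = (w - w0) / (w1 - w0)
            out.append(values[j - 1] + frac * (values[j] - values[j - 1]))
    return out
```

Once every spectrum shares one grid, the flux and inverse-variance vectors have equal length across objects, which is what makes the later standardization and PCA compression well defined.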

15.
arXiv (CS.CV) 2026-06-16

VigilFormer: Deformable Attention for Video Anomaly Detection with Causal Risk Inference

Video anomaly detection in surveillance settings must balance detection accuracy against real-time throughput, a tension that existing methods address either through stronger feature extractors or more efficient architectures, but rarely both. We present VigilFormer, a unified framework that combines deformable spatio-temporal attention with causal temporal modeling to detect anomalies in untrimmed surveillance video. The proposed Deformable Spatio-Temporal Encoder (DSTE) attends to a sparse set of informative locations across frames, avoiding the quadratic cost of dense attention while retaining the ability to capture irregular motion patterns. A Causal Anomaly Classifier (CAC) applies dilated causal convolutions over snippet-level features and optimizes a contrastive multiple-instance learning objective that separates anomalous and normal representations without frame-level labels. To meet deployment constraints, an Adaptive Confidence Scheduler (ACS) dynamically skips low-information frames at inference time, reducing redundant computation in static scenes. Evaluated on UCF-Crime, ShanghaiTech, and CUHK Avenue, VigilFormer achieves AUC scores of 87.83%, 97.21%, and 89.74% respectively, at 41.5 FPS on a single GPU, outperforming recent weakly-supervised methods in both accuracy and speed.
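
As background for the dilated causal convolutions mentioned above, here is a minimal single-channel sketch over a sequence of snippet-level scores. It is a textbook formulation, not VigilFormer's classifier; the function name, the scalar features, and the zero padding at the start of the sequence are assumptions.

```python
def dilated_causal_conv1d(x, kernel, dilation=1):
    """Single-channel dilated causal convolution.

    y[t] = sum_k kernel[k] * x[t - k * dilation]
    Positions before the start of the sequence are treated as zeros, so y[t]
    depends only on x[0..t] and never on future snippets.
    """
    if dilation < 1:
        raise ValueError("dilation must be >= 1")
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # skip taps that would fall before the sequence start
                acc += w * x[idx]
        out.append(acc)
    return out
```

Stacking such layers with growing dilation widens the temporal receptive field without looking ahead, which suits online scoring of untrimmed video.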

16.
arXiv (CS.AI) 2026-06-15

Can Editing 1 Neuron Fix Repetition Loops in LLMs?

arXiv:2606.13705v1. Yes. Can it cure doom loops? Probably not. The Gemma 4 instruction-tuned models share a reproducible failure: on long factual enumeration prompts, such as listing every episode of a TV series, the 88 IAU constellations, or the 151 original Pokemon, they collapse into repetition, either a tight verbatim loop or a list whose entries decay onto a single answer. These loops occur at rates as high as 95% and survive prompt rewording, inference-engine changes, and most sampling adjustments. In this paper we explore whether this behavior is localized enough to remove by weight edits. To localize the cause, we use per-layer ablation and per-neuron attribution, then confirm the strongest candidates with full-generation sweeps. The loops trace to a small set of MLP neurons (or, in the 26B-A4B Mixture-of-Experts model, a few routed experts) which we suppress with static weight edits. These "surgeries" can be as small as a single sign-inverted neuron (in the E2B model). The size of the effective edits grows with model scale, but in all cases, the loop patterns can be addressed at normal generation budgets while preserving general-purpose benchmark scores. However, the edits do not solve everything: we also study longer thinking budgets, where the two larger models most visibly enter doom looping, i.e. a non-convergent regime in which the model self-corrects in circles over a fact it cannot recall, exhausting the budget without committing to a final answer. We show this residual failure is reduced but not eliminated by the same edits, and argue it is fundamentally a knowledge-precision problem rather than a removable circuit; weight surgery can delete a loop, but it cannot supply a missing fact. Our results are both a feasibility demonstration, that is, evidence that a concrete generation pathology can be localized to a few parameters and edited out, and a delineation of where that approach stops.

17.
medRxiv (Medicine) 2026-06-24

Cardiologists' perspectives on sociocultural and structural factors shaping cardiovascular genetic testing

Introduction: Genetic testing is increasingly central to the diagnosis and management of cardiovascular genetic conditions. However, use and follow-through vary across patient populations. Examining clinician perspectives on sociocultural and structural factors influencing testing is important for understanding these differences and informing public health genomics research and implementation efforts. Methods: We conducted semi-structured interviews with 15 cardiologists from health systems across the United States who have integrated cardiogenetics in their practice. Interviews explored experiences diagnosing cardiovascular genetic conditions among patients from underrepresented backgrounds, as well as approaches to incorporating social and contextual information into care. Data were coded thematically and analyzed using a framework analysis guided by the Health Equity Implementation Framework and Social Determinants of Health domains. Results: Clinicians described multi-level factors shaping genetic testing practices, including patient-provider interactions, clinical workflows, health system infrastructure, and broader policy contexts. Key themes included challenges communicating complex genetic information across language and literacy differences; patient trust shaped by prior healthcare experiences; fragmented insurance coverage separating genetic testing from genetic counseling; and challenges interpreting variants of uncertain significance, particularly for populations underrepresented in genomic reference databases. Clinicians also described adaptive strategies, such as interdisciplinary collaboration, telehealth, and patient assistance programs, that supported testing in some settings but were often inconsistent or resource-dependent. Conclusion: Among cardiologists using genetic testing, system-level and sociocultural factors shape the feasibility and downstream use of cardiovascular genetic testing. Findings highlight considerations for public health-informed genomic infrastructure that accounts for social context, supports communication, and reduces reliance on individual clinician workarounds, with implications for clinical decision support and related public health genomics initiatives.

18.
arXiv (CS.CV) 2026-06-16

TurboGS: Accelerating 3D Gaussian Splatting via Error-Guided Sparse Pixel Sampling and Optimization

Consumer-level applications require fast optimization of 3D Gaussian Splatting (3DGS) with high-fidelity novel view rendering. However, existing 3DGS acceleration approaches still incur substantial computation on redundant pixels while sacrificing fine details. In this paper, we present TurboGS, an error-guided training framework that accelerates 3DGS by concentrating optimization on perceptually informative pixels. TurboGS is built upon four core components: (1) a tile-wise sparse pixel sampling, which, driven by multi-view reconstruction errors during training, prioritizes challenging regions and skips well-reconstructed ones to avoid redundant gradient computation; (2) a tile-wise structure-aware loss with sparse Normalized Cross-Correlation, which provides sparse yet effective supervision to preserve fine details and stabilize training; (3) an error-driven Gaussian density control strategy, which dynamically allocates model capacity and removes redundant primitives; and (4) a tailored hybrid optimizer that couples Hessian-informed updates with Adam moment damping to stabilize and improve convergence under sparse supervision. Experiments on standard benchmarks demonstrate that TurboGS can deliver on par or superior rendering quality within 100 seconds on a single RTX 5090 GPU card (up to 10x training speedup over vanilla 3DGS).
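
The idea of concentrating optimization on poorly reconstructed tiles can be illustrated with a short sampling sketch. This is not TurboGS's sampler: drawing tiles with probability proportional to their error, skipping tiles with zero error entirely, and the names `sample_tiles` and `tile_errors` are all assumptions made for illustration.

```python
import random


def sample_tiles(tile_errors, num_samples, seed=0):
    """Draw tile ids with probability proportional to their reconstruction error.

    tile_errors: {tile_id: non-negative error}; tiles with zero error are
                 treated as well reconstructed and are never drawn.
    Returns a list of num_samples tile ids, sampled with replacement.
    """
    rng = random.Random(seed)  # local generator keeps the sketch reproducible
    tiles = [t for t, err in tile_errors.items() if err > 0]
    if not tiles:
        return []  # everything is already well reconstructed
    weights = [tile_errors[t] for t in tiles]
    return rng.choices(tiles, weights=weights, k=num_samples)
```

In a training loop the returned tile ids would decide which pixels receive gradient computation in the current step, with the error map refreshed periodically from recent renders.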

19.
arXiv (CS.LG) 2026-06-16

Distilling latent electrostatics from foundation machine learning interatomic potentials

arXiv:2606.15001v1. Foundation machine learning interatomic potentials (MLIPs) have enabled atomistic simulations across broad regions of chemical and materials space, but many remain computationally expensive and lack explicit electrostatics, limiting their use for systems governed by long-range interactions and electrical response. Previously, we introduced Latent Ewald Summation (LES), which learns latent atomic charges and long-range electrostatics from density functional theory (DFT) energy and force labels alone. Here, we use LES to extract electrostatics that are latent in foundation models: energies and forces predicted by a teacher model are used to train a lightweight LES-augmented student MLIP, with optional fine-tuning on additional DFT data. The resulting models reduce computational cost while providing access to Born effective charge tensors and infrared spectra. We benchmark student models distilled from a broad set of foundation MLIPs, including UMA, MACE, Orb, eSEN, GemNet-OC, PET, and EquiformerV2-based models, against experimental infrared spectra for liquid water, concentrated hydrochloric acid, and the anatase TiO2(101)-water interface. Across these systems, electrostatic response can be extracted from most foundation MLIPs. The benchmark further shows that the underlying DFT level and dataset used to train the teacher model play a larger role than architecture in determining electrostatic and spectroscopic accuracy. For the TiO2-water interface, fine-tuning with a modest amount of higher-level DFT data improves structural and infrared predictions. LES-based distillation therefore provides a practical route for converting foundation MLIPs into efficient, electrically responsive models, while also testing the physical fidelity encoded in foundation models.

21.
Nature (Science) 2026-06-08

Fifty years since a simple equation described the chaos of biology

An exploration of chaos theory in population dynamics showed that unpredictable systems can often be modelled using surprisingly simple mathematics.

22.
arXiv (CS.AI) 2026-06-19

Physical Atari: A Robust and Accessible Platform for Real-time Reinforcement Learning on Robots

arXiv:2606.19357v1. We built a robot called the Robotroller that actuates an Atari CX40+ controller and a device called the Atari Devbox that renders the game frame and the reward signal from the Arcade Learning Environment on a screen. The Robotroller and the Atari Devbox, together with an off-the-shelf camera and a desktop computer, constitute a system that can be used to study reinforcement learning algorithms in the physical world. We call the full system Physical Atari. In this paper, we detail the key decisions that make Physical Atari a robust and accessible platform. To make the system robust, we designed the Robotroller so that all movement is done through bearings, which reduces wear. Additionally, we wrote software that monitors the state of the servos at a high frequency and intervenes to limit stress. To make the system accessible, we used affordable off-the-shelf components and parts that can be manufactured using consumer 3D printers. Physical Atari can be built for under $1,000 and has been used for weeks of non-stop reinforcement learning experiments without any mechanical failures. We used it to validate that reinforcement learning algorithms can learn directly on robots and show that even small distribution shifts between learning and deployment can significantly degrade the performance of policies. Our results underscore the importance of on-device adaptation for strong performance on robots.

23.
arXiv (CS.CV) 2026-06-19

Shape of Thought: Progressive Object Assembly via Visual Chain-of-Thought

Multimodal models for text-to-image generation have achieved strong visual fidelity, yet they remain brittle under compositional structural constraints, notably generative numeracy, attribute binding, and part-level relations. To address these challenges, we propose Shape-of-Thought (SoT), a visual CoT framework for process-supervised progressive shape assembly in the rendered 2D domain, without external engines at inference time. SoT trains a unified multimodal autoregressive model to generate interleaved textual plans and rendered intermediate states, helping the model capture shape-assembly logic without producing explicit geometric representations. Unlike text-only CoT, each decision is grounded in a rendered state, making counts, attachments, topology, and intermediate part-addition errors inspectable across the trajectory. To support this paradigm, we introduce SoT-26K, a large-scale dataset of grounded assembly traces derived from part-based CAD hierarchies, and T2S-CompBench, a benchmark for evaluating structural integrity and trace faithfulness. Fine-tuning on SoT-26K achieves 88.4% on component numeracy and 84.8% on structural topology, outperforming direct generation by +24.2 points on component numeracy and +19.3 points on structural topology. SoT establishes a transparent testbed for rendered-domain structure-aware generation. The code is available at https://github.com/yuhuo03/Shape-of-Thought.

24.
arXiv (CS.AI) 2026-06-24

Integrated Sensing and Communications for Real-time Avatar Control in XR over 5G

arXiv:2606.23771v1 Announce Type: cross Abstract: Extended Reality (XR) presents a challenging use case for 5G and 6G networks, requiring high data rates and low-latency communication to deliver a truly immersive experience. Moreover, in order to seamlessly translate physical actions to the virtual world, accurate gesture recognition and pose estimation are required. Current XR interaction solutions based on handheld controllers and cameras cannot easily capture full-body poses, inhibit the free use of hands, and require good visibility and a clear line of sight. In this work, we propose a multimodal sensing architecture for XR that combines 5G millimeter-wave (mmWave) integrated sensing and communication (ISAC) and surface electromyography (sEMG) signals. Not only can 5G mmWave ISAC deliver content wirelessly to the head-mounted display (HMD), but the same communication signals can also be used to derive coarse body-level gestures and poses of the user, to support real-time avatar control. For fine-grained finger-level gestures, our architecture leverages lightweight sEMG sensors that capture forearm muscle activity. To illustrate the need for both modalities, we present evaluations of both sensing technologies. At the body level (5G), our architecture relies on power-per-beam-pair (PPBP), which can be computed from standard beam management or beam sweeping procedures of the 5G NR standard. PPBP-based sensing achieves 82.2$\pm$5.9% average accuracy when evaluated on users not seen during training. For fine-grained finger-level interactions, we show that sEMG carries strong discriminative information, achieving consistently promising performance across different movement settings. Thus, combining the two modalities enables multi-scale gesture recognition, at the body level via existing 5G signals and at the finger level via lightweight sEMG sensors, forming a complete XR framework.

25.
arXiv (CS.AI) 2026-06-18

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

arXiv:2606.18304v1 Announce Type: cross Abstract: Mixture-of-Experts (MoE) models scale compute efficiently, yet remain expensive to deploy due to their substantial memory footprint and inference overhead. Prior compression methods mainly operate at the expert level, either removing entire experts or ranking experts by coarse-grained importance scores. However, such expert-wise decisions are often too coarse to capture fine-grained redundancy, leading to misallocated pruning budgets and limited compression. To address this problem, we observe that information within MoE experts is highly concentrated in a small subset of channels, leaving substantial redundancy even in experts deemed important. Based on this observation, we propose a structural pruning framework tailored for MoE models. Our method reformulates prune-ratio allocation as a channel-score coverage maximization problem and solves it efficiently using an attribution-based approximation. Experiments on DeepSeek and Qwen MoE models show that our method preserves model accuracy under 50% or 25% structured pruning when combined with 4-bit quantization. On Qwen3-30B-A3B, our approach reduces memory footprint by 5.27$\times$ and consistently outperforms state-of-the-art baselines across diverse benchmarks.