Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

CRC-Screen: Certified DNA-Synthesis Hazard Screening Under Taxonomic Shift

作者:

arXiv:2605.00074v2 Announce Type: replace-cross Abstract: DNA-synthesis providers screen incoming orders by searching the requested sequence against curated hazard lists. We show that this baseline collapses to a 100% false-flag rate when the hazardous sequence comes from a taxonomic family absent from the reference set: under Conformal Risk Control's certified miss-rate constraint, a low-discrimination signal forces the threshold below the entire test-benign mass. We compose three signals derived from a synthesis order's public annotation: $k$-mer Jaccard similarity to known toxins, the trimmed-mean score of a five-LLM judge panel, and cosine similarity to clustered embedding centroids. Fused under a monotone logistic aggregator and calibrated by Conformal Risk Control, the resulting screener certifies $\mathbb{E}[\mathrm{FNR}] \le \alpha + \mathrm{TV}$, where the additive term is the calibration-to-test distribution shift under family holdout (a certified ceiling of 24-49% across folds). Across ten leave-one-taxonomic-family-out folds at $\alpha=0.05$ on UniProt KW-0800 reviewed toxins, the calibrated screener achieves 0% empirical test miss rate on every fold and 0% test false-flag rate on nine of ten folds. The bound's finite-sample slack $1/(n_{\mathrm{cal}}+1)$ caps the certifiable miss rate at 1.77% on our 200-hazard subsample; reaching procurement-grade $\alpha=10^{-3}$ requires an $18\times$ larger calibration set, which the full reviewed UniProt KW-0800 corpus is large enough to deliver. The binding constraint on certifiable DNA-synthesis screening is calibration data, not algorithms. Code: https://github.com/najmulhasan-code/crc-screen

02.
medRxiv (Medicine) 2026-06-11

Ferritin across long-term conditions in England: cross-sectional primary care study

Background Iron deficiency (ID) is a readily treatable condition once identified. Ferritin is the primary diagnostic marker, but cut-offs vary and inflammation complicates interpretation in patients with long-term conditions (LTCs). Aim To describe ferritin distribution and the prevalence of threshold-defined low ferritin in adults with and without LTCs in primary care. Design and setting Cross-sectional observational study using routinely collected electronic health records from a national primary care database in England (1st January 2015 to 31st December 2021). Method Adults with >1 ferritin test in Clinical Practice Research Datalink (CPRD) Aurum were included. LTCs were identified using validated primary-care code lists. Outcomes included ferritin distribution and threshold-defined ID prevalence using World Health Organization (WHO) (

03.
arXiv (quant-ph) 2026-06-12

Exploring Exotic Spin-Dependent Interactions Beyond the Standard Model: Theoretical Foundations and Experimental Investigations

arXiv:2606.13318v1 Announce Type: cross Abstract: New interactions mediated by novel particles propose solutions to several important questions in modern physics. Axions serve as examples of such particles; they are lightweight and interact weakly with ordinary matter. This category of particles, including those similar to axions-termed Axion-Like Particles (ALPs)-arises from diverse theoretical frameworks, such as the Peccei-Quinn mechanism addressing the strong CP problem, string theory, and spontaneous supersymmetry breaking. Given their light mass and weak coupling, ALPs are also possible candidates for cold dark matter. Introducing these new interactions mediated by novel particles not only tackles several challenges in modern physics but also raises a crucial question: Are there undiscovered interactions beyond the Standard Model? Many of the interactions predicted by these theories are spin-dependent, which is the primary focus of this review. In this review, we first outline the theoretical foundations for investigating exotic spin-dependent interactions, highlighting their importance in various models beyond the Standard Model. We examine the potential roles of new lightweight particles in mediating these interactions, which may enhance our understanding of dark matter. Relevant formulas derived from theoretical models are included to support experimental investigations. Following this theoretical framework, we conduct a detailed review of recent experimental efforts to detect these exotic interactions. A systematic review of current constraints on these interactions is presented, along with an assessment of various detection approaches.

04.
arXiv (CS.LG) 2026-06-19

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

arXiv:2606.19491v1 Announce Type: new Abstract: Pretrained transformers sit near singular minima of the loss, where the Fisher information metric degenerates along dead directions: directions in parameter space along which the directional Fisher vanishes. Locating such a direction normally needs a forward pass and an eigendecomposition of activations, or a sampling-based complexity estimate; none returns a direction computable from the network's parameters alone. We give one, for LayerNorm transformers. The inverse-scale direction $\gamma^{-1}/\|\gamma^{-1}\|$ of the LayerNorm affine is an exact algebraic kernel of the post-final-norm centred activation covariance, for any input distribution, and induces a corresponding dead direction in parameter space. It is read from the LN scale parameter alone, with no forward or backward pass and no eigensolve: the cheapest dead-direction read, specific to LayerNorm. We test it on $14$ pretrained transformers ($9$ LayerNorm, $5$ RMSNorm; $160$M-$35$B; language and vision objectives). At random initialisation the predicted direction matches the measured bottom singular direction (one forward pass, direct SVD) to four decimal places on $9/9$ LayerNorm models, and is correctly absent on $5/5$ RMSNorm models, which lack the mean-subtraction projector that creates it. On the trained checkpoint the covariance eigenvalue along this direction deepens by ${\sim}10^3\times$ and further dead directions open; the random-init-to-trained gap is a one-forward-pass, per-checkpoint readout of singular structure along the predicted coordinate. Two consequences follow in closed form: the residual stream's smallest singular value is preserved block-to-block on $13/14$ transformers measured on their own input distribution, the one exception (Gemma$4$-$31$B) a genuine dead direction the same read pinpoints; and the kernel direction's presence classifies a transformer's normalisation from the parameters alone.

05.
arXiv (CS.LG) 2026-06-12

Learning with Simulators: No Regret in a Computationally Bounded World

arXiv:2606.13576v1 Announce Type: new Abstract: Understanding the minimal assumptions necessary for generalization is the fundamental question in learning theory. Unfortunately, most results rely heavily on independence (or some proxy thereof) of the data-generating process, while results for strongly dependent data are far more limited. Towards addressing this gap, we introduce the framework of simulatable processes, where the learner has access to a simulator that approximates the distribution generating the data (which may be an arbitrarily complex and dependent process). Surprisingly, given access to such a simulator, we show that we can recover the same learning guarantees as in the classical setting with independent data, namely, error bounds that depend on the VC dimension. Further, we use this framework to study the power of conditional sampling and show strict statistical and computational advantages in this setting. As a highlight of our framework, we exhibit a single algorithm that simultaneously learns any given VC class under all processes samplable in bounded polynomial time, with regret controlled by the time-bounded Kolmogorov complexity of the process. This provides a significant conceptual broadening of the classical PAC model.

06.
arXiv (CS.AI) 2026-06-11

Quantized Stochastic Primal-Dual Methods for Distributed Optimization under Relaxed Global Geometry

arXiv:2606.11339v1 Announce Type: cross Abstract: We study distributed optimization with stochastic gradients and finite-bit communication modeled by random (unbiased) quantization. We propose q-PDGD, a quantized stochastic primal-dual method, and analyze it under relaxed global geometry. Under restricted secant inequality (RSI), a constant step-size yields linear contraction to an explicit neighborhood determined by gradient noise, quantization distortion, and network connectivity, while a diminishing step-size achieves O(1/k) convergence without shared-minimizer assumptions. Under Polyak-Lojasiewicz (PL) inequality, we obtain linear-to-neighborhood convergence in the same stochastic quantized setting. Our results match the best-known centralized stochastic rates in oracle complexity, and are supported by experiments demonstrating the predicted tradeoffs between quantization level, step-size choice, and graph structure.

07.
arXiv (CS.CL) 2026-06-11

Where Do Backdoors Live? A Component-Level Analysis of Backdoor Propagation in Speech Language Models

Speech language models (SLMs) are systems of systems: independent components that unite to achieve a common goal. Despite their heterogeneous nature, SLMs are often studied end-to-end; how information flows through the pipeline remains obscure. We investigate this question through the lens of backdoor attacks. We first establish that backdoors can propagate through the SLM, leaving all tasks highly vulnerable. From this, we design a component analysis to discover the role each component takes in backdoor learning. We find that backdoor persistence or erasure is highly dependent on the targeted component. Beyond propagation, we examine how backdoors are encoded in shared multitask embeddings, showing that poisoned samples are not directly separable from benign ones, challenging a common separability assumption used in filtering defenses. Our findings emphasize the need to treat multimodal pipelines as intricate systems with unique vulnerabilities, not solely extensions of unimodal ones.

08.
arXiv (CS.LG) 2026-06-18

Hierarchical Planning with Latent World Models

arXiv:2604.03208v2 Announce Type: replace Abstract: World models are a promising path to zero-shot embodied control through planning. However, existing world model planners struggle on long-horizon, multi-stage tasks: prediction errors compound and naive search is exponential in the planning horizon. Hierarchy mitigates both by decomposing tasks into shorter, tractable subproblems; yet prior hierarchical approaches either amortize control into task-specific policies (hierarchical RL) or assume low-dimensional states and known dynamics (classical hierarchical MPC). We present Hierarchical Planning with Latent World Models (HWM), an architecture and planning paradigm for hierarchical model predictive control (MPC) directly on visual world models trained solely via next-latent prediction. HWM learns world models at multiple temporal scales within a shared latent space, so predictions from the long-horizon model serve as subgoals for the short-horizon model via latent matching, without task-specific rewards, skill learning, or hierarchical policies. To keep long-horizon search tractable, HWM learns an action encoder that compresses primitive action chunks into latent macro-actions. On real-world Franka manipulation, HWM solves pick-and-place from a single goal image at 70% success vs. 0% for single-level planning. Across simulated push manipulation and maze navigation, HWM consistently improves performance on long-horizon tasks while requiring up to 3x less planning compute.

09.
arXiv (CS.LG) 2026-06-19

Predictability as a Fine-Grained Measure for Privacy

arXiv:2606.20546v1 Announce Type: new Abstract: Differential privacy (DP) ensures rigorous individual-level privacy guarantees against even the most knowledgeable attackers, but its worst-case nature can impose a costly privacy-accuracy tradeoff. We introduce privacy via predictability, a fine-grained framework that explicitly incorporates the attacker's core knowledge, a compromised portion of the dataset generated by a stochastic process, and a specified family of queries. Predictability measures privacy leakage as the incremental gain in an attacker's ability to predict sensitive information about unknown individuals after observing the algorithm's output, beyond what can already be inferred from the compromised data. We show that predictability and DP are generally incomparable: each can be small while the other is large. However, in the worst-case regime where all but one individual is compromised, and all binary queries are considered sensitive, predictability implies mutual-information DP. More generally, predictability provides a finer-grained privacy metric tailored to specific sensitive information and specific attacker models. We introduce a general framework, using the generalized method of moments (GMM), to analyze asymptotic predictability when the compromised data is generated by a stationary, ergodic, mixing process. Using this analysis, we derive a predictability-calibrated output perturbation scheme for ERM. Our approach is complementary to DP and can be used alongside DP to provide fine-grained privacy control.

10.
arXiv (CS.AI) 2026-06-19

Analyzing Defensive Misdirection Against Model-Guided Automated Attacks on Agentic AI Systems

arXiv:2606.20470v1 Announce Type: cross Abstract: Agentic AI systems increasingly rely on language-model components to interpret instructions, process external data, invoke tools, and coordinate with other agents. These capabilities make prompt-injection and jailbreak attacks more consequential, especially as attackers adopt model-guided automation to scale probing, prompt refinement, and response evaluation. This work analyzes the resulting attack-defense setting through a probabilistic model of a target system, its defense mechanism, and the attacker's automated judge. Our analysis shows that conventional detect-and-block defenses can allow attacker success rate (ASR) to approach one as the query budget grows, since predictable refusals provide useful feedback to automated search. We then examine detect-and-misdirect, where detected malicious interactions receive controlled, non-operational responses designed to induce false-positive errors in the attacker's judge. This strategy reduces the positive predictive value of attacker-selected candidates and yields a bounded asymptotic ASR. We evaluate a proof-of-concept realization of this strategy through Contextual Misdirection via Progressive Engagement (CMPE), a lightweight conversational misdirection method designed to replace predictable refusal text with safe but strategically misleading responses in automated jailbreak settings. On jailbreak benchmarks, CMPE reduces estimated ASR upper bounds by up to two orders of magnitude and nearly eliminates verified attack success in end-to-end PAIR and GPTFuzz attack runs.

11.
arXiv (CS.AI) 2026-06-19

A Tool for the Synthesis of Adaptive Probabilistic Processors Based on the Ising Model

arXiv:2606.19533v1 Announce Type: cross Abstract: This work presents a tool for the synthesis and simulation of probabilistic architectures for solving combinatorial optimization problems by mapping them to the Ising model. The proposed approach automatically constructs the Ising Hamiltonian and determines the number of probabilistic elements (p-bits) based on problem characteristics such as size and topology. Furthermore, the tool introduces an adaptive strategy for selecting the most suitable update algorithm among Gibbs Sampling, Simulated Annealing (SA), Simulated Quantum Annealing (SQA), and cluster-based methods. Experimental results using benchmark problems demonstrate improved convergence behavior and flexibility compared to fixed approaches. The proposed framework enables systematic evaluation of probabilistic computing strategies and supports the development of future hardware implementations based on MTJs and p-bits.

12.
arXiv (CS.AI) 2026-06-18

Vibe Coding Ate My Homework: An evaluation of AI approaches to greenfield software engineering and programming

arXiv:2606.18293v1 Announce Type: cross Abstract: Thanks to rapid developments in generative AI, we are in the midst of a paradigm shift that may change how we interact with computers forever. We have observed a growth in the use of natural language prompts to build applications and coding infrastructures without underlying knowledge of the field, and this practice has been dubbed `vibe coding.' It arguably represents what the field of programming has been building towards since the beginning, with every higher level of abstraction that is conceived. Vibe coding promises to be the endpoint for the meta of high-level programming as far as method of input is concerned: eliminating a human's use of code syntax entirely in favour of programming in their mother tongue. This paper aims to evaluate the viability of vibe coding for greenfield software engineering tasks, as well as analyse the benchmarks that have been used to measure its software engineering prowess. To this end, we have developed an evaluation suite for analysing an LLM's proficiency in carrying out simple, isolated greenfield programming tasks in Python to provide scoped insight on the matter.

13.
arXiv (CS.LG) 2026-06-12

Robustness Verification of Recurrent Neural Networks with Abstraction Refinement

arXiv:2606.12490v1 Announce Type: new Abstract: Certified local robustness verification for recurrent neural networks (RNNs) is challenging because approximation errors introduced by nonlinear relaxations can propagate through recurrent connections and accumulate over time. As a result, scalable linear bound propagation methods often become overly conservative and fail to certify inputs that are in fact robust, especially when many pre-activation intervals cross zero. We propose an abstraction-refinement framework for RNN verification that partitions such intervals to remove the dominant relaxation error: on each refined branch, ReLU becomes exact, and smooth activations such as tanh and sigmoid admit substantially tighter linear envelopes. To control the combinatorial cost of splitting in long sequences, we introduce a SHAP-guided timestep selection strategy that ranks hidden states by their contribution to the verification objective and refines only the most critical timesteps in temporal order. Experiments on CIFAR10 and MNIST stroke benchmarks demonstrate consistent improvements in verification success and robustness-margin tightness over abstraction-only baselines, while exposing clear runtime trade-offs between ReLU and tanh models.

14.
arXiv (quant-ph) 2026-06-16

Against probability: A quantum state is more than a list of probability distributions

arXiv:2601.18872v2 Announce Type: replace Abstract: The state of a quantum system can be represented by listing the outcome probabilities for a tomographically complete set of measurements. Such representations appear throughout physics, for example, in quantum field theory via correlation functions and in quantum foundations within generalized probabilistic frameworks. In this paper, we show a no-go result: To enable useful statements, the probability representation must be topologically robust$\unicode{x2014}$preserving the notion of closeness between states. Yet, a topologically robust probability representation cannot simultaneously retain other essential structure, such as the subsystem structure.

16.
arXiv (CS.AI) 2026-06-11

Towards Deep Learning Surrogate for the Forward Problem in Electrocardiology: A Scalable Alternative to Physics-Based Models

arXiv:2512.13765v2 Announce Type: replace-cross Abstract: The forward problem in electrocardiology, computing body surface potentials from cardiac electrical activity, is traditionally solved using physics-based models such as the bidomain or monodomain equations. While accurate, these approaches are computationally expensive, limiting their use in real-time and large-scale clinical applications. We propose a proof-of-concept deep learning (DL) framework as an efficient surrogate for forward solvers. The model adopts a time-dependent, attention-based sequence-to-sequence architecture to predict electrocardiogram (ECG) signals from cardiac voltage propagation maps. A hybrid loss combining Huber loss with a spectral entropy term was introduced to preserve both temporal and frequency-domain fidelity. Using 2D tissue simulations incorporating healthy, fibrotic, and gap junction-remodelled conditions, the model achieved high accuracy (mean $R^2 = 0.99 \pm 0.01$). Ablation studies confirmed the contributions of convolutional encoders, time-aware attention, and spectral entropy loss. These findings highlight DL as a scalable, cost-effective alternative to physics-based solvers, with potential for clinical and digital twin applications.

17.
arXiv (CS.AI) 2026-06-19

Flickering Multi-Armed Bandits

arXiv:2602.17315v3 Announce Type: replace-cross Abstract: We introduce Flickering Multi-Armed Bandits (FMAB) to model sequential decision-making in environments with changing action availability, where accessibility of the next action is restricted to a subset dependent on the agent's current choice. We formalize these constraints through stochastically evolving graphs where actions are limited to local neighborhoods. This mobility-constrained structure imposes a dual challenge: the statistical requirement of information acquisition and the physical overhead of navigation. We analyze FMAB under i.i.d. Erdős–R'enyi and Edge-Markovian process, proposing a two-phase lazy random walk algorithm for robust exploration. We establish high-probability sublinear regret bounds and prove near-optimality via a matching information-theoretic lower bound. Our results characterize the intrinsic cost of learning under local-move constraints, complemented by a robotic disaster-response simulation.

18.
bioRxiv (Bioinfo) 2026-06-18

Identification of environmental factors and growth stages in the prediction of fibre yield and fibre quality traits in rain-grown cotton

Context Understanding how and when environmental conditions influence overall crop performance is crucial for optimising the development of genotypes to a specific breeding target environment. We focused on economically important traits of Australian rain-grown cotton including fibre yield and quality traits, which have not been investigated comprehensively. The aim of the study was to identify relevant environmental factors, and the timing and extent of their impact on rain-grown cotton production. Methods We used a data driven approach to analyse the relationship between ten climate related environmental factors across various plant growth stages and eight fibre yield and quality traits, using a large-scale field dataset of 9,283 records collected over 23 years at 4 locations, with 53 unique year-location combinations. We applied eight complementary statistical models including stepwise, penalised and Bayesian linear regression, regression-tree based ensemble methods and deep learning frameworks to (1) select the most essential environmental covariates affecting rain-grown cotton production, and (2) evaluate the predictive performance of these models. Results The environmental impacts on rain-grown cotton production were trait and growth-stage specific. Number of rainy days and solar radiation were identified as the most influential environmental factors for fibre yield traits, vapour pressure deficit at maximum daily temperature was the most influential factor for majority of fibre quality traits. However, each analysed trait was influenced by multiple environmental factors across multiple growth stages (rather than a single factor or a single growth stage). These influential covariates explained a wide range of variation in the traits, accounting for 5.8% to 68.2%. Using the best-fit random forest model, our findings revealed non-linear relationships between key environmental covariates and the traits. Conclusions Environmental factors at different rain-grown cotton growth stages are key determinants for the performance of end-of-season fibre yield and fibre quality parameters. These findings highlight the need to account for environment conditions when developing cotton varieties optimised for rain-grown production systems. Potential strategies are proposed whereby these key environmental factors can be used to increase the rate of genetic gain in rain-grown cotton production systems. Implications The results of this study will be crucial for future genetic evaluations and analyses of genotype-by-environment interaction effects in rain-grown cotton, which must account for the influence of the environment on plant performance. Furthermore, these methods can be applied to other species to identify critical growth stages and environmental factors which most influence crop performance.

19.
arXiv (CS.CV) 2026-06-16

TIMI: Training-Free Image-to-3D Multi-Instance Generation with Spatial Fidelity

Precise spatial fidelity in Image-to-3D multi-instance generation is critical for downstream real-world applications. Recent work attempts to address this by fine-tuning pre-trained Image-to-3D (I23D) models on multi-instance datasets, which incurs substantial training overhead and struggles to guarantee spatial fidelity. In fact, we observe that pre-trained I23D models already possess meaningful spatial priors, which remain underutilized as evidenced by instance entanglement issues. Motivated by this, we propose TIMI, a novel Training-free framework for Image-to-3D Multi-Instance generation that achieves high spatial fidelity. Specifically, we first introduce an Instance-aware Separation Guidance (ISG) module, which facilitates instance disentanglement during the early denoising stage. Next, to stabilize the guidance introduced by ISG, we devise a Spatial-stabilized Geometry-adaptive Update (SGU) module that promotes the preservation of the geometric characteristics of instances while maintaining their relative relationships. Extensive experiments demonstrate that our method yields better performance in terms of both global layout and distinct local instances compared to existing multi-instance methods, without requiring additional training and with faster inference speed.

20.
arXiv (CS.CL) 2026-06-16

WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation

Large language models show that simple autoregressive training can yield scalable and coherent generation, but extending this paradigm to speech remains challenging due to the entanglement of semantic and acoustic information. Most existing speech language models rely on text supervision, hierarchical token streams, or complex hybrid architectures, departing from the single-stream generative pretraining paradigm that has proven effective in text. In this work, we introduce WavSLM, a speech language model trained by quantizing and distilling self-supervised WavLM representations into a single codebook and optimizing an autoregressive next-chunk prediction objective. WavSLM jointly models semantic and acoustic information within a single token stream without text supervision or text pretraining. Despite its simplicity, it achieves competitive performance on consistency benchmarks and speech generation while using fewer parameters, less training data, and supporting streaming inference.

21.
arXiv (quant-ph) 2026-06-11

Q-DICE: Quantum Distributed Interconnect Compiler and Emulator

arXiv:2606.11340v1 Announce Type: new Abstract: As distributed quantum computing (DQC) offers a leading path towards scalable quantum computation, the ability to benchmark distributed algorithms under realistic conditions becomes critical for system co-design. However, without access to physical systems, researchers lack tools to evaluate distribution protocols. We introduce Q-DICE (Quantum Distributed Interconnect Compiler and Emulator), a hardware-aware emulation environment for benchmarking distributed quantum circuits on classical simulators and on NISQ-era monolithic hardware. This work provides three core contributions: (1) a programmatic scheme to construct distributed QPU backends, utilizing two novel techniques - QPU slicing and stitching - to facilitate distributed circuit mapping, (2) a methodology for modeling nonlocal link noise using physically motivated Kraus operators and stochastic error channels, and (3) a boundary-aware circuit mapping algorithm enforcing distributed QPU topology constraints during transpilation. Together, these components constitute a distribution-aware compiler and noise-modeling engine that faithfully enforces the physical limitations of distributed quantum hardware within existing execution environments. We validate Q-DICE against a multitude of experimentally demonstrated quantum circuits, including a distributed Grover's search on optically linked trapped-ion hardware, achieving a worst-case fidelity deviation of 4% between simulated and experimental results. These findings demonstrate Q-DICE's capacity to accurately reproduce real distributed quantum system behavior across platforms, streamlining experimentation with distributed quantum algorithms and architectures.

22.
arXiv (CS.CV) 2026-06-17

MagicSim: A Unified Infrastructure for Executable Embodied Interaction

Robot learning and embodied agents now require simulation to serve as a shared execution substrate linking control, skills, and planning, not only as a renderer, controller testbed, or fixed task environment. Existing pipelines split these layers with "magic" actions, disconnected training environments, or forward-only renders that cannot reproduce, evaluate, and annotate the same episode. We present MagicSim, an embodied interaction infrastructure built around one deterministic batched runtime and a shared Markov decision process (MDP). From YAML-first specifications that decouple contents, placement, behavior, and agent exposure, MagicSim constructs diverse executable worlds spanning task families, interaction regimes, physics, layouts, sensors, avatars, and robot embodiments in one reset-and-step loop. A common execution interface grounds high-level commands through controllers, atomicskills, planner primitives, and asynchronous planning, realizing them as robot actions rather than simulator-side state edits. One task definition supports three capabilities: benchmark and RL evaluation, an autocollect interface that automatically turns commands into grounded trajectories, and agent/VLM-facing interaction. For automatic execution, commands flow through a Command->Skill->Planner->Robot->Record pipeline, while per-environment command, skill, planning, retry, annotation, and episode states advance independently above the shared physics tick. Successful rollouts are saved as structured multimodal trajectories aligning language supervision, action representations, visual/geometric representations, and task-level status with the executed episode. MagicSim thus unifies diverse world construction, embodied execution, task evaluation, automatic rollout generation, and interactive agent interfaces in one planner-in-the-loop runtime.

23.
arXiv (CS.AI) 2026-06-16

Post-Hoc Merging is Not Enough: Many-Shot Model Merging with Loss-Gap Balancing

arXiv:2606.16501v1 Announce Type: new Abstract: Model merging has become a practical post-training strategy for building a single multi-task large language model (LLM) by combining multiple task-specialized models. However, most existing approaches rely on post-hoc merging, in which task-specific models are merged only once after training. This one-shot aggregation often suffers from task interference, leading to information erasure across individual tasks. In this work, we show that replacing post-hoc merging with an iterative many-shot merging protocol is effective in improving multi-task performance. Building on this insight, we propose METIS, Mitigating Erasure from Task Interference for Stable many-shot merging. METIS is a loss-aware many-shot merging method that addresses information erasure in post-hoc merging through task-wise loss-gap weighting and consensus-based masking. Notably, METIS exhibits significant performance improvement on the worst-performing task, effectively mitigating information erasure. (Project page: https://imkyungjin.github.io/METIS/)

24.
arXiv (CS.CV) 2026-06-11

ReMoT: Reinforcement Learning with Motion Contrast Triplets

We present ReMoT, a unified training paradigm to systematically address the fundamental shortcomings of VLMs in spatio-temporal consistency – a critical failure point in navigation, robotics, and autonomous driving. ReMoT integrates two core components: (1) A rule-based automatic framework that generates ReMoT-16K, a large-scale (16.5K triplets) motion-contrast dataset derived from video meta-annotations, surpassing costly manual or model-based generation. (2) Group Relative Policy Optimization, which we empirically validate yields optimal performance and data efficiency for learning this contrastive reasoning, far exceeding standard Supervised Fine-Tuning. We also construct the first benchmark for fine-grained motion contrast triplets to measure a VLM's discrimination of subtle motion attributes (e.g., opposing directions). The resulting model achieves state-of-the-art performance on our new benchmark and multiple standard VLM benchmarks, culminating in a remarkable 25.1% performance leap on spatio-temporal reasoning tasks.

25.
arXiv (math.PR) 2026-06-16

A Low-Regularity Semigroup Sewing Lemma via Quotient Structures

arXiv:2606.16164v1 Announce Type: new Abstract: We develop a low-regularity Sewing theory for the semigroup coboundary $\hat\delta=\delta-a$ associated with a strongly continuous semigroup $S$. Unlike the ordinary low-regularity Sewing problem, the semigroup setting has an intrinsic algebraic non-uniqueness below the threshold $1$, in the sense that solutions are canonical only modulo semigroup cocycles. Accordingly, the natural target is a quotient space rather than an increment space. We identify this quotient structure and construct the corresponding semigroup Sewing map. The construction uses a frozen terminal-time transform, which rewrites semigroup defects, for each terminal time, as ordinary low-regularity Sewing problems on a frozen simplex. This reduction, however, does not by itself produce a genuine semigroup increment; the main additional step is to prove that the frozen solution classes are compatible as the terminal time varies and hence assemble into a canonical quotient class for $\hat\delta$. This yields canonical classes for $0