Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-11

Experimental straintronics in nanotube quantum dots

arXiv:2606.12180v1 Announce Type: cross Abstract: Single-wall carbon nanotubes (SWCNTs) are narrow ribbons of graphene with atomically precise edges and a single quantum transport channel, at experimentally-relevant dopings. This makes them ideal systems to harness quantum transport straintronics (QTS), i.e. using mechanical strain to control accurately quantum transport. We present QTS data from three single-wall carbon nanotube quantum dot (SWCNT-QD) transistors over a broad range of in-situ tunable and reversible uniaxial strain ($\Delta\varepsilon_mech\approx$ 0 to 3 %). We first present the nanofabrication of the suspended SWCNT transistors whose channel lengths are $\approx$ 30 nm. The channels are strained by moving gold clamps holding firmly the nanotubes. We present detailed charge transport data, $dI/dV_{B} - V_{B} - V_{G}$ and $dI/dV_{B} - V_{B} - \Delta\varepsilon_mech$, showing a large mechanical-gating effect of the SWCNT-QDs. The precise reversibility of the data, and their agreement with QTS theory, confirms that the tubes are strained elastically. We demonstrate that the mechanical control of the QD doping is not due to capacitive-gating effects, but to quantitatively predictable bandstructure changes including a strain-tunable bandgap. This precise mechanical control of the doping and bandgap of SWCNT-QDs could find applications in qubits, condensed matter physics, and homojunction molecular transistors.

02.
arXiv (CS.LG) 2026-06-11

My Chemical Harness: Evolutionary Molecular Design over Synthetic Pathways with Large Language Model Agents

arXiv:2606.11256v1 Announce Type: cross Abstract: Designing molecules with target properties is most useful when candidate structures are accompanied by feasible synthetic routes. We introduce My Chemical Harness, a route-native evolutionary framework for goal-directed molecular design in which the search population consists of executable synthetic pathways rather than isolated molecular graphs. Each route is built from purchasable building blocks and reaction templates, executed by deterministic chemistry tools, and scored through task-specific molecular oracles. Large language models (LLMs) are used only as strategy controllers that select high-level preferences over route length, move type, reaction families, motifs, and exploration pressure, while local code performs route construction, validation, deduplication, scoring, selection, and memory updates. This separation lets the LLM guide exploration without allowing it to introduce hallucinated products or unsupported reaction steps. On a soluble epoxide hydrolase proxy task, our LLM agent improves over single pass LLM and deterministic controllers, reaching state-of-the-art performance across the sEH score, synthetic accessibility score, and AiZynthFinder success rate metrics. These results suggest that constrained LLM agents can play a significant role in molecular discovery without requiring training, fine-tuning, or dedicated generative models.

03.
arXiv (CS.CV) 2026-06-17

AlignDrive: Aligned Lateral-Longitudinal Planning for End-to-End Autonomous Driving

Practical autonomous driving requires models that generalize by reasoning through spatial-temporal possibilities to exclude unsafe outcomes. While state-of-the-art (SOTA) methods use parallel planning architectures, they fail to explicitly couple speed decisions with agent behavior along the driving path, leading to suboptimal coordination. To address this, we propose a cascaded framework that transforms longitudinal planning from an independent prediction task into a path-conditioned reasoning process. On the model side, we introduce an anchor-based regression design that conditions longitudinal prediction on the lateral drive path, and reformulate longitudinal planning as 1D displacement prediction along the path. This reduces geometric uncertainty and sharpens the model's focus on interaction-driven dynamics. On the data side, we introduce a planning-oriented data augmentation strategy that simulates rare safety-critical events by programmatically inserting agents and relabeling longitudinal targets to enforce collision avoidance. Evaluated on the challenging Bench2Drive benchmark, our method achieves SOTA performance with a driving score of 89.07 and a success rate of 73.18%, demonstrating significantly improved coordination and safety. Further evaluation on Fail2Drive confirms strong generalization to rare edge cases where parallel formulations typically fail. Project page:https://yanhaowu.github.io/AlignDrive/.

04.
arXiv (CS.AI) 2026-06-17

Handling Feature Heterogeneity with Learnable Graph Patches

arXiv:2606.17667v1 Announce Type: cross Abstract: In recent years, the rapid development of foundation models and graph pre-training technologies has spurred increasing interest in constructing a universal pre-trained graph model or Graph Foundation Model (GFM). However, a significant challenge is that existing models are unable to address feature heterogeneity in graph data without textual information, which hinders the transferability of graph models across different datasets. To bridge this gap, we propose the concept of learnable graph patches, which we regard as the smallest semantic units of any graph data. We decompose the graph into learnable graph patches by unfolding the node features and constructing corresponding patch structures separately. We then design a framework that mines transferable information from graph data across domains. Specifically, after extracting graph patches, we propose a patch encoder to extract knowledge from each unit and a patch aggregator to learn how the units are combined into a whole. Due to its domain-agnostic nature, the model can be applied to downstream data across different domains. Furthermore, we analyze the connection between our method and existing graph models, as well as the transferability of the node embeddings it generates. Empirically, our method not only achieves the capability to use multi-domain graphs for pre-training, but also shows enhanced performance across various downstream datasets and tasks. Moreover, we observe consistent improvement in downstream performance as the volume of pre-training data increases.

05.
arXiv (CS.AI) 2026-06-16

TrustedARI: Towards Trust-Native Agentic Routing Infrastructure for Agentic AI

arXiv:2606.15822v1 Announce Type: new Abstract: AI agents increasingly access external models, tools, and services through Agentic Routing Infrastructure (ARI) to manage the overhead of heterogeneous interfaces and fragmented subscriptions. Yet, the architecture of ARI introduces fundamental trust risks: it obtains plaintext access to agent queries and service responses, while leaving agents unable to verify that their queries are routed to intended service providers or that requests and responses remain untampered. To address this problem, we present TrustedARI, the first trust-native agentic routing infrastructure for agentic AI. Architecturally, TrustedARI is built upon three core innovations: (i) an ARI-adapted three-party TLS handshake that enables the agent and ARI to jointly authenticate the service provider through role-specific distribution of TLS key materials; (ii) a privacy-preserving query-construction protocol that allows the agent and ARI to collaboratively construct well-formed queries without exposing their respective private inputs; and (iii) a verifiable billing protocol that supports fair usage-based settlement while preserving the integrity and confidentiality of service responses. We implemented and extensively evaluated a prototype of TrustedARI to validate its performance. Experiments confirm that TrustedARI is highly efficient: our ARI-adapted handshake protocol reduces communication overhead by 39.34% compared to the existing three-party TLS handshake. Furthermore, the privacy-preserving query-construction protocol imposes negligible overhead-averaging 0.19 seconds in computation time and 0.58 MB in communication costs-while the verifiable billing protocol speeds up proof generation by 28.20x. Crucially, TrustedARI is readily deployable without any modification to the service providers.

06.
arXiv (CS.AI) 2026-06-16

Protein Design with Agent Rosetta: A Case Study for Specialized Scientific Agents

arXiv:2603.15952v2 Announce Type: replace Abstract: Large language models (LLMs) are capable of emulating reasoning and using tools, creating opportunities for autonomous agents that execute complex scientific tasks. Protein design provides a natural testbed: although machine learning (ML) methods achieve strong results, these are largely restricted to canonical amino acids and narrow objectives, leaving unfilled need for a generalist tool for broad design pipelines. We introduce Agent Rosetta, an LLM agent paired with a structured environment for operating Rosetta, the leading physics-based heteropolymer design software, capable of modeling non-canonical building blocks and geometries. Agent Rosetta iteratively refines designs to achieve user-defined objectives, combining LLM reasoning with Rosetta's generality. We evaluate Agent Rosetta on design with canonical amino acids, matching specialized models and expert baselines, and with non-canonical residues – where ML approaches fail – achieving comparable performance. Critically, prompt engineering alone often fails to generate Rosetta actions, demonstrating that environment design is essential for integrating LLM agents with specialized software. Our results show that properly designed environments enable LLM agents to make scientific software accessible while matching specialized tools and human experts.

07.
arXiv (CS.AI) 2026-06-19

PCBSchemaGen: Reward-Guided LLM Code Synthesis for Printed Circuit Boards (PCB) Schematic Design with Structured Verification

arXiv:2602.00510v2 Announce Type: replace Abstract: Most LLM code-synthesis benchmarks rely on unit tests as the reward oracle, but PCB schematic design has none: correctness is defined by structured physical constraints over real IC packages and pin-level assignments, per-task golden references are unavailable, and SPICE simulation does not validate schematic-level correctness. We introduce PCBSchemaGen, a training-free inference-time framework that turns a frozen LLM into a verifiable, repairable PCB schematic generator. The framework induces a domain schema from IC datasheets to ground LLM decoding, pairs it with a deterministic 5-layer continuous-reward verifier with pin-level error localization, and refines candidates through a Thompson Sampling arm-acquiring bandit. We evaluate on 2 PCB benchmarks covering 227 real-IC tasks across 22 unified circuit domains, including a public-schematic-derived suite that serves as a fully held-out generalization test (verifier, KG library, and prompts frozen before any evaluation). Under our framework, an open-weight 31B model (Gemma-4-31B) passes 81.3% of PCBBench tasks on average, and the same framework transfers across both benchmarks with zero verifier code changes; a Circuitron-style inference-time prompting baseline on the same Gemma-4-31B backbone collapses on hard system-level designs. This suggests inference-time refinement under a deterministic structural verifier is a general recipe for reference-free LLM code synthesis in domains without unit-test oracles. Our benchmarks and deterministic verifier are publicly available at https://github.com/HZou9/PCBSchemaGen_v2.

08.
arXiv (CS.LG) 2026-06-15

MUFFLe: Efficient Model Update Compression via Generalized Deduplication for Federated Learning

arXiv:2606.14354v1 Announce Type: new Abstract: Federated learning is well suited to edge environments but is often limited by the uplink cost of transmitting model updates. This Work-in-Progress paper presents MUFFLe, a communication-efficient update compression scheme that integrates generalized deduplication (GD) into the FedAvg pipeline. MUFFLe deduplicates repeated patterns across the update vector, yielding a fixed-rate, variable-count compression scheme. Preliminary experiments on IID MNIST with 20 clients show that MUFFLe reaches the target accuracy of $92.93\%$ with 38~MB cumulative uplink communication, compared with 75~MB for 8-bit quantization, 86~MB for Top-$k$ sparsification, and 310~MB for uncompressed FedAvg. These results demonstrate the feasibility of applying GD to communication-efficient federated learning.

09.
arXiv (CS.LG) 2026-06-16

Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning

arXiv:2606.14970v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) has become a central application of modern optimization, enabling pretrained models to adapt to diverse downstream tasks and domain-specific data. A major obstacle in large-scale fine-tuning is the memory overhead of backpropagation, which requires storing activations, gradients, and optimizer states. Zeroth-order (ZO) optimization offers a memory-efficient alternative, but its performance is highly sensitive to the stepsize and smoothing parameter, often requiring costly task-specific tuning. Parameter-free (PF) optimization addresses this issue by adapting algorithmic parameters without prior knowledge of problem-dependent constants. Moreover, large-scale fine-tuning can benefit from geometry-aware updates that account for the heterogeneous structure of parameter blocks, which can be modeled through methods that exploit linear minimization oracle (LMO). In this work, we study PF adaptation for LMO-based ZO optimization and introduce $\texttt{AdaNAGED}$, a method that unifies gradient-free training, adaptive tuning, and non-Euclidean update geometry. We establish convergence guarantees and validate the method on large-scale LLM fine-tuning task with $\texttt{OPT}-1.3\mathrm{B}$ model.

10.
arXiv (CS.CL) 2026-06-11

Detecting Sensitive Personal Information in Japanese Pre-Training Corpora for Large Language Models

Sensitive personal information can appear in large-scale pre-training corpora for large language models (LLMs). Detecting and filtering such information is therefore essential to ensure compliance with privacy regulations and prevent unintended information leakage. However, in contrast to English and other languages, research into sensitive personal information has been limited in the Japanese language. In this study, we focus on sensitive personal data defined as special care-required personal information (SCPI) under Japan's Act on the Protection of Personal Information (APPI). We construct an SCPI dataset using LLM-based annotation and train machine learning models to rapidly detect SCPI in text. As a result, our SCPI classifier can effectively identify information related to SCPI. This study is the first to explore SCPI detection in Japanese text corpora, highlighting the challenges of accurate detection.

11.
arXiv (CS.AI) 2026-06-16

Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability

arXiv:2606.01365v2 Announce Type: replace Abstract: Failure-aware observability diagnoses wasted computation in multi-agent LLM systems before final-answer evaluation can explain what went wrong. We propose a trace-based framework for a three-agent architecture – orchestrator, search agent, and execution agent – that converts structured events into online signals for loops, budget pressure, low information gain, and tool instability, then adds offline semantic grounding metrics and selective LLM-as-judge evaluation. On 165 GAIA validation traces under identical caps, 98 runs produce usable final answers and 67 fail or stop without one. Among warned failed runs, 58.1% of tokens are spent after the first warning on average, indicating substantial opportunity for intervention. A 10-task Level-2 pilot uses warnings to diversify search or require evidence, reducing post-warning token fraction from 0.638 in the baseline to 0.304. The results support a layered design: cheap online signals help the orchestrator redirect or halt redundant behavior, while deeper semantic checks identify whether completed answers are grounded enough to trust.

12.
arXiv (CS.LG) 2026-06-18

Learning to Annotate Delayed and False AEB Events: A Practical System for Extreme Class Imbalance and Asymmetric Label Noise

arXiv:2606.19186v1 Announce Type: cross Abstract: Autonomous Emergency Braking (AEB) optimization relies on accurately annotated real-world trigger events, particularly rare but critical delayed and false AEB triggers that expose system deficiencies. However, these minority samples comprise less than 5% of thousands of daily triggers, making manual annotation prohibitively expensive at scale. We present the first automated AEB annotation framework to address this problem. During development, we identified two fundamental challenges that severely impair delayed/false trigger annotation accuracy: (1) Extreme class imbalance where delayed/false triggers are overwhelmed by true triggers; (2) Asymmetric label noise where mislabeled majority samples (true triggers) suppress minority samples (delayed/false triggers) learning. To overcome these challenges, we propose two key innovations: (1) Specific data augmentation that synthesizes realistic samples by manipulating focal target attributes, transplanting ego-vehicle dynamics, and masking non-focal agents; (2) noise suppression using stable hardness estimation and probe-guided adaptive threshold to clean mislabeled true trigger samples. Crucially, we deploy our model as a practical annotation system with full-stack architecture, efficiently identifying critical delayed/false triggers from thousands of daily AEB events. Production results demonstrate 80% improvement in recall of delayed/false triggers and 50% reduction in manual workload. Beyond immediate gains, the system enables continuous self-improvement through accumulated high-quality annotations, establishing a necessary data foundation for on-vehicle AEB system optimization

13.
arXiv (CS.CL) 2026-06-15

Towards Direct Latent-Space Synthesis for Parallel Branches in LLM-Agent Workflows

Large language models increasingly serve as execution engines for agentic systems, yet they still consume context through a sequential text interface. This creates a mismatch with modern structured agent workflows, in which independent branches explore subtasks, retrieve evidence, or generate candidate solutions before a final synthesis step. Existing systems typically merge these branches by concatenating their textual outputs, which discards the parallel structure and incurs redundant prefill computation. In this work, we introduce Parallel-Synthesis, a plug-and-play framework that enables a synthesizer to directly consume the KV caches produced by parallel worker agents. Parallel-Synthesis combines a cache mapper that calibrates independently generated branch caches with a fine-tuned synthesizer adapter that enables generation from this non-sequential cache interface. We train Parallel-Synthesis using data that exposes the synthesizer to parallel cache contexts, teaches aggregation across cached branches, and distills reasoning behavior from standard text-concatenation-based synthesis. Across nine downstream datasets spanning math, science QA, code generation, GAIA, and multi-agent database diagnosis, Parallel-Synthesis matches or outperforms text-based synthesis on seven datasets and remains close on the other two. It also reduces time-to-first-token by 2.5x-11x, suggesting that direct cache-based synthesis is a promising interface for more native and efficient synthesis over parallel agent branches.

14.
arXiv (CS.CV) 2026-06-16

VEPHand: View-Efficient Photometric Hand Performance Capture at Scale

Robust, high-fidelity 3D hand capture, while fundamental to digital human creation, remains challenging with practical multi-view systems that balance rich photometry with the geometric ambiguities of reconstruction arising from limited viewpoint density. This paper presents an end-to-end pipeline for dynamic hand performance capture and registration, specifically designed for view-efficient setups ($\sim$20 views). We address key challenges with two primary innovations. First, to overcome reconstruction difficulties like limited view overlap and background clutter, our mask-free neural method robustly extracts detailed hand geometry and appearance from unmasked images using scene parameterization and scenario-specific density regularization. Second, addressing registration challenges such as accurately capturing non-linear skin deformations and ensuring plausible results during severe self-contact, we propose a physics-inspired framework. It aligns reconstructions to a personalized hand model by optimizing intrinsic volumetric offsets within its canonical tetrahedral mesh, alongside pose parameters. This approach, supported by robust losses and optimization, captures fine surface deformations, ensures plausible results under severe articulation and self-contact, and demonstrates strong tolerance to input noise. We demonstrate the scalability and robustness of our automated pipeline on an extensive dataset of over 12,000 sequences, from which we also derive a large-scale, high-quality synthetic 2D/3D hand dataset for training downstream tasks. This showcases its effectiveness for single hands, intricate two-hand interactions, and natural hand-object manipulations. Our method achieves state-of-the-art reconstruction fidelity in view-efficient, unmasked scenarios and highly accurate registration. Our project page are available at https://zyshen021.github.io/VEPHand/.

15.
arXiv (CS.LG) 2026-06-15

Code Correctness Signals in LLM Hidden States: Pre-Generation Probing and Repair Geometry

arXiv:2606.14530v1 Announce Type: new Abstract: Large language models encode rich information in their hidden states. This work asks whether code correctness is legible in the hidden states of Qwen3-4B-Instruct-2507, before it generates and as it repairs a failed attempt, studied on 444 LiveCodeBench tasks. It reports two findings connected by a single confound-control tool: residualization. First, the correctness of the model's first-attempt code is linearly decodable from the prompt-final hidden state, with a leakage-free held-out AUC of 0.931 +/- 0.008 across 50 outer splits. After the linear effect of prompt length is removed from each hidden state dimension, the probe still reaches 0.911 +/- 0.010, well above a prompt-length baseline of 0.754 +/- 0.014. Second, on 236 cleaned cases where the model attempts to repair a failed first attempt, the hidden state shift from the failing attempt to its repair carries a statistically detectable contrastive direction, significant on both a magnitude and a split-half test against label-shuffled nulls. This direction does not survive a conditional residualization against repair-context covariates that differ between successful and failed repairs, marking it as a correlate of repair success driven by the repair context rather than an isolated repair-comprehension feature. The probe layer is selected by nested cross-validation, and the same residualization approach that upholds the pre-generation correctness result overturns the repair-direction interpretation. The contribution is as much methodological as empirical: a diagnostic honest enough to report a negative result alongside a positive one.

16.
arXiv (math.PR) 2026-06-19

Theory of uncertain probability: can we derive the probability density function of uncertain random experiments with continuously changing conditions?

作者:

arXiv:2606.20169v1 Announce Type: new Abstract: This paper aims to explore the formation mechanism of probability distribution in situations where the differences among random experiments are distinguishable, and these differences continue to evolve along with the dynamic changes in conditions and their mechanisms of action. To this end, we are motivated to devise a new theoretical system – theory of uncertain probability (TUP) with Kolmogorov's system and nonlinear theories as special cases. TUP develops a novel model that integrates probability and uncertainty as well as the known and unknown to more accurately depict numerous typical random phenomena under more realistic assumptions, and thus provides appropriate tools for greater variety of real needs. It also allows for pioneering interpretation of the causal mechanisms underlying many important distributional characteristics and incorporation of pathwise property to distribution model.

17.
arXiv (math.PR) 2026-06-12

Interference Queueing Networks: A Replica Mean-Field Approach in the Symmetric Setting

arXiv:2606.13264v1 Announce Type: new Abstract: We propose a model for evaluating the performance of wireless communication networks beyond the ubiquitous full-buffer assumption, under which every transmitter is always active. The network is represented by N interacting queues arranged on a torus, with homogeneous arrival rate and service rates depending on the activity of neighboring interferers. More precisely, each queue is associated with a transmitter-receiver pair, and its service rate is given by the Shannon capacity, which depends on the corresponding Signal-to-Interference-plus-Noise Ratio (SINR). Since interfering transmitters only emit when their queue is non-empty, the SINR and hence the service rate improves when neighboring queues are empty. We derive the stability region of the system, together with approximations of its stationary distribution and its exponential rate of convergence to stationarity. These approximations are obtained via a replica mean-field limit, for which we establish propagation of chaos and long-time behavior results.

18.
arXiv (CS.CV) 2026-06-17

A geometric and deep learning reproducible pipeline for monitoring floating anthropogenic debris in urban rivers using in situ cameras

The proliferation of floating anthropogenic debris in rivers has emerged as a pressing environmental concern, exerting a detrimental influence on biodiversity, water quality, and human activities such as navigation and recreation. The present study proposes a novel methodological framework for the monitoring the aforementioned waste, utilising fixed, in-situ cameras. This study provides two key contributions: (i) the continuous quantification and monitoring of floating debris using deep learning and (ii) the identification of the most suitable deep learning model in terms of accuracy and inference speed under complex environmental conditions. These models are tested in a range of environmental conditions and learning configurations, including experiments on biases related to data leakage. Furthermore, a geometric model is implemented to estimate the actual size of detected objects from a 2D image. This model takes advantage of both intrinsic and extrinsic characteristics of the camera. The findings of this study underscore the significance of the dataset constitution protocol, particularly with respect to the integration of negative images and the consideration of temporal leakage. In conclusion, the feasibility of metric object estimation using projective geometry coupled with regression corrections is demonstrated. This approach paves the way for the development of robust, low-cost, automated monitoring systems for urban aquatic environments.

19.
arXiv (CS.CV) 2026-06-17

MoonSplat: Monocular Online Gaussian Splatting with Sim(3) Global Optimization

Online 3D reconstruction from monocular image sequences is a challenging and ongoing research topic. 3D Gaussian Splatting (3DGS), leveraging its high-quality real-time rendering capability, empowers online 3D reconstruction to represent dense scenes with enhanced expressiveness, and thus holds great promise for a wide range of applications such as robotics and AR/VR. However, existing online 3DGS methods still suffer from some key challenges: fragile camera pose estimation due to the lack of global optimization, and low optimization efficiency in large-scale or long-sequence scenarios. To address these issues, we propose a robust and efficient online voxelized 3DGS reconstruction framework integrated with global $Sim(3)$ optimization, which enables reliable camera tracking and efficient global loop closure for both camera poses and voxelized 3DGS. To accelerate the convergence of the voxelized 3DGS, we further introduce a color residual learning strategy, which not only boosts optimization speed but also enhances rendering quality. Extensive experiments on diverse indoor and outdoor datasets demonstrate that our method achieves state-of-the-art performance in both camera pose estimation accuracy and rendering quality, while retaining real-time efficiency. Additionally, we develop and deploy a real-world UAV-based active reconstruction system grounded on our proposed method, validating its robustness and generalizability for practical online 3D reconstruction tasks. Our code and data are available at https://github.com/TrickyGo/MoonSplat.

20.
arXiv (quant-ph) 2026-06-16

Adiabatic preparation of a fractional quantum Hall fluid by coherently pumping atoms from a Bose-Einstein condensate

arXiv:2606.15951v1 Announce Type: cross Abstract: We propose a protocol to adiabatically prepare a many-particle fractional quantum Hall fluid of bosonic ultracold atoms exploiting a time-dependent coherent coupling of a strongly interacting atomic state with a large dilute Bose-Einstein condensate. Starting from an empty cloud, atoms with well-defined angular momentum are coherently pumped into the fluid by Raman beams with a Laguerre-Gauss profile. Compared to number-conserving schemes which rely on finite-size-induced topological gaps, we identify an adiabatic path in the Fock space which avoids crossing topological phase transitions and thus maintains a sizable adiabatic gap open at all times. The efficiency of our preparation protocol is numerically assessed for typical experimental parameters up to particle numbers that largely exceed the experimental state-of-the-art. The crucial advantage of including an anharmonic confinement is finally highlighted.

21.
arXiv (CS.CV) 2026-06-16

MoECa: Aligning Feature Reuse with Expert Decomposition in Diffusion Transformers

Diffusion Transformers with Mixture-of-Experts (DiT-MoE) improve model capacity under sparse activation, but diffusion inference is still bottlenecked by redundant computation across timesteps. Existing caching methods mainly operate at the token level, which becomes suboptimal in DiT-MoE because each token update is internally decomposed into multiple routed expert branches. Our analysis shows that cross-timestep redundancy in DiT-MoE is better characterized at the expert-branch level than at the whole-token level. Based on this observation, we propose MoECa, a fine-grained caching framework that performs branch-level feature reuse across timesteps. MoECa further introduces expert-aware adaptive control and synchronized cache updates across MoE and attention paths to maintain stable intermediate states. Experiments on multiple DiT-MoE models show that MoECa consistently achieves a better speed-quality trade-off than prior caching methods, with up to 2.83$\times$ inference speedup and minimal quality degradation.

22.
Nature (Science) 2026-06-15

Daily briefing: Iron-Age human bones were made into tools before interment

作者:

Newly uncovered bones hint at how Iron Age Britons treated their dead. Plus, AI models have failed to beat human mathematicians at research-level problems and the everyday items that make great scientific tools. Newly uncovered bones hint at how Iron Age Britons treated their dead. Plus, AI models have failed to beat human mathematicians at research-level problems and the everyday items that make great scientific tools.

23.
arXiv (CS.LG) 2026-06-16

Learning the Geometry of Data: A Mathematical Review of Shape Space Analysis

arXiv:2606.17022v1 Announce Type: cross Abstract: A central objective of machine learning is to identify structure and patterns in data. Advances in data acquisition have increasingly produced datasets whose observations possess rich geometric form, giving rise to shape spaces that encode variability in object geometry. Such datasets arise across a wide range of disciplines, including biology, medicine, anthropology, and computer vision, where subtle geometric differences often carry important scientific information. Traditional machine learning methods, however, are frequently ill-equipped to account for the nonlinear geometric structure underlying these data. This survey synthesizes a rapidly growing body of work on shape space analysis, which provides a mathematical and computational framework for the study of geometric data. Drawing on ideas from differential geometry, statistics, and machine learning, we organize the literature around a common analytical pipeline: shape representation and parameterization, the rigorous construction of robust geodesic metrics, statistical analysis on shape spaces, and geometry-aware learning methods. We discuss how these tools enable the characterization of shape variability, the comparison of geometric objects, and the analysis of structural trajectories across populations and time. To illustrate the breadth of the field, we highlight applications spanning multiple scales of biological organization, including studies of subcellular morphology and primate tooth evolution. Across these and many other domains, researchers face common challenges arising from complex, nonlinear, and often unaligned geometric variation. The review concludes by identifying key theoretical and computational challenges, as well as emerging opportunities driven by increasingly large and diverse geometric datasets.

24.
arXiv (CS.CV) 2026-06-16

LentiAvatar: Pseudo-Multiview Reconstruction and Subpixel Prism Rendering for Real-Time Stereoscopic Communication

Real-time stereoscopic video communication has long been a goal of immersive telepresence, yet practical systems still require specialized capture rigs or reduce remote users to a single portrait view. We present LentiAvatar, a Gaussian head-avatar system that connects monocular avatar capture with subpixel-encoded glasses-free lenticular display for real-time autostereoscopic communication. From a monocular portrait video, LentiAvatar reconstructs a controllable head avatar and optimizes it for the lateral viewing zones induced by the display. The method uses natural head turns as pseudo-multiview (PMV) supervision to constrain regions that are otherwise weakly observed in monocular training, including hair, ears, jaw contours, and neck boundaries. Reliable side frames are yaw-binned, aligned to virtual cameras, and supervised within a strict head-and-hair domain; contour-aware losses and staged regularization further suppress ghosting, alpha leakage, and depth instability while preserving lateral detail. At runtime, LentiAvatar renders 32 virtual views and encodes them into a 4K lenticular raster with calibrated subpixel-routing masks. The live-tracker prototype sustains 10.65 FPS, and a subject-specific distilled driver raises the same display pipeline to 38.49 FPS.

25.
arXiv (CS.CV) 2026-06-17

GASE: Gaussian Splatting-Based Automated System for Reconstructing Embodied-Simulation Environments

Training embodied agents in the real world requires skilled operators and expensive hardware. Simulation environments offer a compelling alternative by enabling large-scale, cost-effective data augmentation. Consequently, rapidly constructing high-fidelity simulation scenes with a minimal sim-to-real gap has become a critical objective in robot learning. While reconstruction-based methods provide superior visual quality, current workflows are hindered by inefficient data acquisition and subpar foreground object extraction. We thus propose GASE, a highly automated system for simulation scene construction. GASE leverages multi-view video streams from panoramic camera arrays to enable rapid environment scanning. To ensure high-quality asset generation, our pipeline introduces a camera-pose-based strategy that robustly extracts objects across frames in the 2D domain, followed by high-fidelity scene inpainting. Foreground objects and the static background are then reconstructed independently and seamlessly imported into physics simulators for policy training. Extensive experiments demonstrate that GASE outperforms existing 3D Gaussian-based methods in segmentation accuracy by over 10\% while achieving state-of-the-art inpainting quality. Furthermore, real-robot deployments across manipulation and navigation tasks maintains a performance gap of less than 10\% compared to policies trained purely on real-world data. These results confirm that GASE provides an efficient and highly effective solution for bridging the sim-to-real gap. Code will be released.