Paper Plaza - AcademicHub

01.

arXiv (cs.LG) 2026-06-11 arXiv ID: 2603.14762

Online Learning for Supervisory Switching Control

Authors:

We study supervisory switching control for partially-observed linear dynamical systems. The objective is to identify and deploy a suitable controller for the unknown system by periodically selecting among a collection of $N$ candidate controllers, some of which may destabilize the underlying system. While classical estimator-based supervisory control guarantees asymptotic stability, it lacks quantitative finite-time performance bounds. Conversely, current non-asymptotic methods in both online learning and system identification require restrictive assumptions, such as system stability, that are incompatible with a control setting because they preclude testing potentially unstable controllers. To bridge this gap, we propose a novel, non-asymptotic analysis of supervisory control that adapts multi-armed bandit algorithms to a control-theoretic setting. The proposed data-driven algorithm evaluates candidate controllers via scoring criteria that leverage system observability to isolate the effects of state history, enabling both detection of destabilizing controllers and accurate system identification. We present two algorithmic variants with dimension-free, finite-time guarantees, each of which identifies the matching controller in $O(N \log^2 N)$ steps while simultaneously achieving finite $L_2$-gain with respect to system disturbances.

02.

arXiv (cs.CL) 2026-06-17 arXiv ID: 2606.17579

LLM Features Can Hurt GNNs: Concatenation Interference on Homophilous Graph Benchmarks

Authors: Zhongyuan Wang, Pratyusha Vemuri

Adding LLM-generated node features to graph neural networks (GNNs) is widely reported to improve accuracy on standard benchmarks. We document a contrasting observation: when LLM features are introduced through pure input concatenation (rather than joint training, distillation, or prompt-conditioning), they can systematically degrade accuracy on the same homophilous benchmarks where end-to-end LLM pipelines succeed. With an MLP backbone on the Planetoid public split and bag-of-words original features, concatenating SBERT-encoded GPT-4o-mini TAPE features reduces test accuracy by 17.0 ± 0.3 pp on PubMed and by 4.3 ± 0.6 pp on Cora (CiteSeer: -0.6 ± 0.8 pp, within seed noise). The drop attenuates as we relax each condition (GCN / GCNII / GAT backbones, random splits, smaller encoders) and reverses on medium-homophily WikiCS (+4.4 pp) and ogbn-arxiv (+11.7 pp). To predict when concatenation helps versus hurts, we report a simple measure of LLM-alone discriminability, Delta_sig. Across 9 datasets, Delta_sig correlates with the concatenation cost more strongly than homophily does at the point estimate (r^2 = 0.38 vs. 0.06; N=9, bootstrap CIs overlap). The bootstrap-best change-point is tau = 13.8 pp, and the rule "Delta_sig

03.

arXiv (cs.LG) 2026-06-16 arXiv ID: 2602.01394

SSNAPS: Audio-Visual Separation of Speech and Background Noise with Diffusion Inverse Sampling

Authors: Yochai Yemini, Yoav Ellinson, Rami Ben-Ari, Sharon Gannot, Ethan Fetaya

This paper addresses the challenge of audio-visual single-microphone speech separation and enhancement in the presence of real-world environmental noise. Our approach is based on generative inverse sampling, where we model clean speech and ambient noise with dedicated diffusion priors and jointly leverage them to recover all underlying sources. To achieve this, we reformulate a recent inverse sampler to match our setting. We evaluate on mixtures of 1, 2, and 3 speakers with noise and show that, despite being entirely unsupervised, our method consistently outperforms leading supervised baselines in WER across all conditions. We further extend our framework to handle off-screen speaker separation. Moreover, the high fidelity of the separated noise component makes it suitable for downstream acoustic scene detection. Code and pretrained models will become available upon acceptance. Demo page: https://ssnaps2026.github.io/ssnaps2026/

04.

arXiv (cs.AI) 2026-06-19 arXiv ID: 2604.08552

Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent

Authors: Josef Hardi, Martin J. O'Connor, Marcos Martinez-Romero, Jean G. Rosario, Stephen A. Fisher, Mark A. Musen

Scientific metadata are often incomplete and noncompliant with community standards, limiting dataset findability, interoperability, and reuse. Even when standard metadata reporting guidelines exist, they typically lack machine-actionable representations. Producing FAIR datasets requires encoding metadata standards as machine-actionable templates with rich field specifications and precise value constraints. Recent work has shown that LLMs guided by field names and ontology constraints can improve metadata standardization, but these approaches treat constraints as static text prompts, relying on the model's training knowledge alone. We present an LLM-based metadata standardization system that queries standard reporting guidelines and authoritative biomedical terminology services in real time to retrieve canonically correct standards on demand. We evaluate this approach on 839 legacy metadata records from the Human BioMolecular Atlas Program (HuBMAP) using an expert-curated gold standard for exact-match assessment. Our evaluation shows that augmenting the LLM with real-time tool access consistently improves prediction accuracy over the LLM alone across both ontology-constrained and non-ontology-constrained fields, demonstrating a practical approach to automated standardization of biomedical metadata.

05.

arXiv (cs.LG) 2026-06-11 arXiv ID: 2505.08784

PCS-UQ: Uncertainty Quantification via the Predictability-Computability-Stability Framework

Authors: Abhineet Agarwal, Fange Xiao, Rebecca Barter, Omer Ronen, Boyu Fan, Bin Yu

As machine learning (ML) enters high-stakes domains, trustworthy uncertainty quantification (UQ) is essential for safety. In this paper we introduce PCS-UQ, a framework based on the Predictability, Computability, and Stability (PCS) principles for veridical data science. Starting with a candidate set of models or algorithms, PCS-UQ applies a rigorous prediction check to screen out unsuitable models and uses bootstrap samples to capture both inter-sample variability and algorithmic instability for the algorithms that pass the check. We then introduce a novel multiplicative calibration scheme to enhance local adaptivity, which corresponds to a new score in conformal prediction. We also compile a benchmark of 17 real-world regression datasets with manually constructed subgroups. On this benchmark, PCS-UQ maintains the target coverage while outperforming or matching conformal methods equipped with oracle-selected algorithms in interval width. PCS-UQ also achieves consistent subgroup coverage, outperforming these oracle-selected conformal methods. Notably, PCS-UQ stands out in achieving both competitive interval widths and consistent subgroup coverage. Across 6 classification datasets, PCS-UQ reduces prediction set sizes by 20%. To scale the framework to deep learning, we propose computationally efficient variants that bypass expensive retraining. On three computer vision benchmarks, these variants reduce prediction set sizes by 20% over conformal baselines. Finally, we provide a theoretical proof that a modified PCS-UQ algorithm preserves valid coverage under exchangeability as a form of split conformal inference.
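The abstract frames a modified PCS-UQ as a form of split conformal inference. As a point of reference only, the sketch below shows plain split conformal prediction with the standard absolute-residual score; it does not implement PCS-UQ's prediction check, bootstrap step, or multiplicative calibration score. The function name split_conformal_interval and the synthetic data are illustrative assumptions.

```python
import numpy as np

def split_conformal_interval(cal_residuals, test_pred, alpha=0.1):
    """Split conformal interval from calibration residuals.

    Uses the standard absolute-residual score |y - y_hat|, NOT the
    multiplicative PCS-UQ score described in the abstract.
    """
    scores = np.sort(np.abs(np.asarray(cal_residuals, dtype=float)))
    n = scores.size
    # Finite-sample corrected rank ceil((n + 1)(1 - alpha)) gives marginal
    # coverage >= 1 - alpha when calibration and test points are exchangeable.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    q = np.inf if k > n else scores[k - 1]  # too few points: unbounded interval
    pred = np.asarray(test_pred, dtype=float)
    return pred - q, pred + q

# Usage on synthetic data: a fixed predictor y_hat = x with N(0, 1) noise.
rng = np.random.default_rng(0)
x_cal, x_test = rng.normal(size=1000), rng.normal(size=10000)
y_cal = x_cal + rng.normal(size=1000)
y_test = x_test + rng.normal(size=10000)
lo, hi = split_conformal_interval(y_cal - x_cal, x_test, alpha=0.1)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
print(coverage)  # close to the 0.9 target
```

Under the abstract's description, PCS-UQ's multiplicative scheme would change the score being ranked rather than this quantile step; that reading is an assumption.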

06.

arXiv (quant-ph) 2026-06-16 arXiv ID: 2606.15717

Quantum coherence and Leggett-Garg inequality

Authors: A. Jalal, S. M. Fazeli, M. M. Ettefaghi

In this paper, we attempt to establish the relationship between quantum coherence and the violation of the Leggett-Garg inequality. In particular, employing the Lindblad equation, we obtain the pseudo-density matrix for a damped system to study how interaction with the environment affects the violation of this inequality in a two-state quantum system. It is shown that the violation of the Leggett-Garg inequality can be observed as long as temporal evolution does not induce decoherence. This statement is independent of the initial state of the system. Furthermore, similar to the Horodecki criterion for the CHSH inequality (R. Horodecki et al., Phys. Lett. A 200, 340), we study necessary and sufficient conditions for violating the Leggett-Garg inequality. Finally, for the case in which the inequality is violated, we obtain an upper bound on the time interval between consecutive measurements relative to the time scale of interaction with the environment (the relaxation time).
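For a two-level system, a common form of the Leggett-Garg test uses three equally spaced measurements and the combination K3 = C12 + C23 - C13, which macrorealism bounds by 1. The sketch below evaluates K3 for coherent precession at angular frequency omega, where C(t) = cos(omega t), and multiplies each correlator by an assumed exponential damping factor exp(-t/T). The damping form and the helper names k3 and violates are illustrative assumptions, not the paper's Lindblad-derived pseudo-density matrix; the sketch only shows qualitatively why a large spacing tau relative to the relaxation time T removes the violation.

```python
import numpy as np

def k3(omega_tau, tau_over_T=0.0):
    """Equal-spacing Leggett-Garg combination K3 = C12 + C23 - C13.

    Assumes two-time correlators C(t) = exp(-t / T) * cos(omega * t), i.e.
    coherent precession times an exponential damping factor with relaxation
    time T. This damping form is an illustrative assumption.
    """
    c1 = np.exp(-tau_over_T) * np.cos(omega_tau)                # C12 = C23
    c2 = np.exp(-2.0 * tau_over_T) * np.cos(2.0 * omega_tau)    # C13
    return 2.0 * c1 - c2

def violates(tau_over_T, grid=2001):
    """True if some omega * tau in [0, pi] gives K3 > 1 at this ratio tau / T."""
    omega_tau = np.linspace(0.0, np.pi, grid)
    return bool(np.max(k3(omega_tau, tau_over_T)) > 1.0)

print(k3(np.pi / 3))  # undamped maximum, approximately 1.5
```

Without damping, K3 reaches 3/2 at omega*tau = pi/3. violates(tau_over_T) asks whether any precession frequency still gives K3 > 1 at a given ratio tau/T; it returns False once tau/T is large.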

07.

arXiv (cs.LG) 2026-06-15 arXiv ID: 2603.12231

Temporal Straightening for Latent Planning

Authors: Ying Wang, Oumayma Bounou, Gaoyue Zhou, Randall Balestriero, Tim G. J. Rudner, Yann LeCun, Mengye Ren

Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contain information that is irrelevant, or even detrimental, to planning. Inspired by the perceptual straightening hypothesis in human visual processing, we introduce temporal straightening to improve representation learning for latent planning. Using a curvature regularizer that encourages locally straightened latent trajectories, we jointly learn an encoder and a predictor of a Joint-Embedding Predictive Architecture (JEPA) world model. We show that reducing curvature this way makes the Euclidean distance in latent space a better proxy for the geodesic distance and improves the conditioning of the planning objective. We demonstrate empirically that temporal straightening makes gradient-based planning more stable and yields significantly higher success rates across a suite of goal-reaching tasks. Our code is available at https://agenticlearning.ai/temporal-straightening.
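The abstract does not define its curvature regularizer. One common way to quantify how straight a latent trajectory is, used in the perceptual straightening literature, is the angle between successive difference vectors. The sketch below computes the mean of 1 - cos(v_t, v_{t+1}) with v_t = z_{t+1} - z_t; treat it as an assumed stand-in for the paper's regularizer, not its exact definition.

```python
import numpy as np

def mean_curvature(z, eps=1e-8):
    """Mean (1 - cosine similarity) between consecutive latent velocity vectors.

    z: array of shape (T, d) holding a latent trajectory z_0, ..., z_{T-1},
    with T >= 3. Returns about 0 for a straight trajectory and up to 2 for
    full reversals. eps guards against zero-length steps.
    """
    z = np.asarray(z, dtype=float)
    v = np.diff(z, axis=0)            # velocities v_t = z_{t+1} - z_t
    a, b = v[:-1], v[1:]              # consecutive velocity pairs
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    )
    return float(np.mean(1.0 - cos))

straight = np.outer(np.arange(5), [1.0, 2.0])
zigzag = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
print(mean_curvature(straight), mean_curvature(zigzag))  # about 0.0 and 1.0
```

In training, a quantity like this would be computed on encoder outputs for consecutive frames and added with a tunable weight to the JEPA prediction loss (an assumption); an autodiff version would follow the same formula.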

08.

arXiv (cs.CL) 2026-06-16 arXiv ID: 2606.16821

How Much Can We Trust LLM Search Agents? Measuring Endorsement Vulnerability to Web Content Manipulation

Authors: Yimeng Chen, Zhe Ren, Firas Laakom, Yu Li, Dandan Guo, Jürgen Schmidhuber

Large language model (LLM)-based search agents synthesize open-web content into actionable recommendations on behalf of users, creating a risk that attacker-published pages are transformed into endorsed claims. We introduce SearchGEO, a controlled evaluation framework for measuring endorsement corruption in LLM-based web-search agents, combining a web-evidence manipulation pipeline, a five-mode attack taxonomy, and multiple output-level metrics. We evaluate 13 LLM backends on 308 cases each. Results show that vulnerability patterns vary across backends: overall attack success rate (ASR) ranges from 0.0% on Claude-Sonnet-4.6 to 31.4% on Gemini-3-Flash, the strongest attack mode differs by model family, and the same deployment scaffold could amplify or decrease ASR on different backends. An auxiliary agent-skill probe, where endorsement becomes an install command, exposes a sharp split among otherwise robust backends: Claude over-rejects while GPT over-trusts. These findings argue for treating recommendation reliability under adversarial search content as a first-class dimension of backend safety evaluation.

09.

arXiv (math.PR) 2026-06-11 arXiv ID: 2209.03999

Consensus on Dynamic Stochastic Block Models: Fast Convergence and Phase Transitions

Authors: Haoyu Wang, Jiaheng Wei, Zhenyuan Zhang

We introduce two models of consensus following a majority rule on time-evolving stochastic block models (SBM), in which the network evolution is Markovian or non-Markovian. Under the majority rule, in each round, each agent simultaneously updates their opinion according to the majority of their neighbors. Our network has a community structure and randomly evolves with time. In contrast to the classic setting, the dynamics is not purely deterministic, and reflects the structure of SBM by resampling the connections at each step, making agents with the same opinion more likely to connect than those with different opinions. In the Markovian model, connections between agents are resampled at each step according to the SBM law and each agent updates their opinion via the majority rule. We prove a power-of-one type result, i.e., any initial bias leads to a non-trivial advantage of winning in the end, uniformly in the size of the network. In the non-Markovian model, a connection between two agents is resampled according to the SBM law only when at least one of them changes opinion and is otherwise kept the same. We identify the phase-transition threshold, up to the second-order leading term, between halting and fast convergence to consensus. We also give sufficient initial-lead conditions for consensus to occur within one, two, or three rounds.
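A minimal simulation of the Markovian model, under stated assumptions: opinions are +1/-1, agents with the same opinion connect with probability p_same and others with p_diff < p_same, the whole graph is resampled every round, and a tie leaves an agent's opinion unchanged (the abstract does not specify tie-breaking). The function names and the constant edge probabilities are illustrative; the paper's results concern scaling regimes that this sketch does not reproduce.

```python
import numpy as np

def majority_step(opinions, p_same, p_diff, rng):
    """One round of the Markovian model: resample an SBM graph, then apply the majority rule.

    opinions: integer array of +1/-1. Equal-opinion pairs connect with probability
    p_same, other pairs with p_diff. Ties keep the current opinion (an assumption).
    """
    n = opinions.size
    same = opinions[:, None] == opinions[None, :]
    probs = np.where(same, p_same, p_diff)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample each unordered pair once
    adj = upper | upper.T                             # symmetric graph, no self-loops
    field = adj.astype(int) @ opinions                # sum of neighbours' opinions
    return np.where(field > 0, 1, np.where(field < 0, -1, opinions))

def run_until_consensus(n_plus, n_minus, p_same, p_diff, max_rounds=50, seed=0):
    """Return (rounds, winning opinion), or (None, None) if no consensus is reached."""
    rng = np.random.default_rng(seed)
    opinions = np.array([1] * n_plus + [-1] * n_minus)
    for t in range(1, max_rounds + 1):
        opinions = majority_step(opinions, p_same, p_diff, rng)
        if np.all(opinions == opinions[0]):
            return t, int(opinions[0])
    return None, None
```

With 300 vs. 200 initial agents, run_until_consensus(300, 200, 0.5, 0.4) typically reaches a +1 consensus within a few rounds, whereas with p_diff = 0.1 each agent sees mostly like-minded neighbours and the two groups stay frozen.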

10.

arXiv (cs.LG) 2026-06-18 arXiv ID: 2606.19147

On Local Population-Risk Certificates

Authors: Mingzhi Song

This paper develops local certificates for population-risk increments around a current model. For a local candidate set $\mathcal D$, the certificate is a two-sided confidence band for $P(\ell_{\theta+v}-\ell_\theta)$ over $v\in\mathcal D$. As an application, the upper endpoint of this band yields a risk-controlled update rule: an update is accepted only when its certified upper endpoint is nonpositive; otherwise the current model is retained.
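The abstract does not say how the confidence band is built. As one concrete possibility, assuming per-sample losses in [0, B], a finite candidate set D, and i.i.d. evaluation samples independent of how D was chosen, Hoeffding's inequality with a union bound over D gives a two-sided band of half-width B * sqrt(2 log(2|D|/delta) / n) around each empirical mean loss difference. The sketch applies the abstract's accept-only-if-nonpositive rule; picking the candidate with the smallest upper endpoint is an additional assumption, and the function name is hypothetical.

```python
import numpy as np

def certify_and_update(loss_diffs, B, delta=0.05):
    """Hoeffding-style two-sided bands for P(l_{theta+v} - l_theta), v in a finite set D.

    loss_diffs: array of shape (|D|, n); row k holds per-sample differences
    l_{theta+v_k}(z_i) - l_theta(z_i), assumed to lie in [-B, B].
    Returns (lower, upper, chosen); chosen is the index of the accepted update,
    or None when every certified upper endpoint is positive (keep theta).
    """
    d = np.asarray(loss_diffs, dtype=float)
    m, n = d.shape
    # Hoeffding for a range of length 2B, union bound over the m candidates.
    half_width = B * np.sqrt(2.0 * np.log(2.0 * m / delta) / n)
    mean = d.mean(axis=1)
    lower, upper = mean - half_width, mean + half_width
    best = int(np.argmin(upper))
    chosen = best if upper[best] <= 0.0 else None
    return lower, upper, chosen

# Hypothetical candidates: v_0 lowers the loss by 0.5 per sample, v_1 raises it by 0.1.
diffs = np.vstack([np.full(1000, -0.5), np.full(1000, 0.1)])
print(certify_and_update(diffs, B=1.0)[2])  # 0: v_0 is certified and accepted
```

With only 10 samples the same candidate's upper endpoint is positive, so the rule keeps the current model.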

11.

arXiv (cs.AI) 2026-06-15 arXiv ID: 2604.01463

Low-Burden LLM-Based Preference Learning: Personalizing Assistive Robots from Natural Language Feedback for Users with Paralysis

Authors: Keshav Shankar, Dan Ding, Wei Gao

Physically Assistive Robots require personalized behaviors to ensure user safety and comfort. However, traditional preference learning methods, like exhaustive pairwise comparisons, cause substantial physical and cognitive fatigue for users with severe motor impairments. To solve this, we propose a low-burden, offline framework that translates unstructured natural language feedback directly into deterministic robotic control policies. To safely bridge the gap between ambiguous human speech and robotic code, our pipeline uses Large Language Models (LLMs) grounded in the Occupational Therapy Practice Framework. This clinical reasoning decodes subjective user reactions into explicit physical and psychological needs, which are then mapped into transparent decision trees. Before deployment, an automated "LLM-as-a-Judge" verifies the code's structural safety. We validated this system in a simulated meal preparation study with 10 adults with paralysis. Results show our natural language approach significantly reduces user workload compared to traditional baselines. Additionally, occupational therapists confirmed the generated policies are safe and accurately reflect user preferences.

12.

arXiv (math.PR) 2026-06-17 arXiv ID: 2605.01894

Poisson approximation by coupling

Authors: Rinaldo B. Schinazi

It is well known that a binomial $(n,p)$ random variable can be approximated by a Poisson distribution with parameter $np$. The typical approach in undergraduate probability texts is to show a convergence result for the distribution of the binomial as $n$ goes to infinity and $np$ converges to some $\lambda$. In this note, we instead use the coupling technique to show a much more general result. Moreover, we only use elementary results from probability.
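The classical coupling argument (Le Cam's bound) pairs each Bernoulli(p) summand with a Poisson(p) variable so that the two differ with probability p(1 - e^{-p}) <= p^2; since independent Poisson(p) variables sum to Poisson(np), this gives d_TV(Binomial(n, p), Poisson(np)) <= n p^2. The sketch below computes the exact total variation distance with the Python standard library and compares it with that bound. It illustrates the technique in general; the note's own, more general statement may differ. Keep n at or below about 1000 so the binomial coefficients still convert to floats.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    # Evaluated in log space to avoid overflow in lam**k and k!.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def tv_binomial_poisson(n, p):
    """Exact total variation distance between Binomial(n, p) and Poisson(n p)."""
    lam = n * p
    diff = sum(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(n + 1))
    # The binomial has no mass above n; the Poisson mass there counts fully.
    tail = 1.0 - sum(poisson_pmf(k, lam) for k in range(n + 1))
    return 0.5 * (diff + max(tail, 0.0))

for n, p in [(20, 0.1), (100, 0.02), (1000, 0.002)]:
    print(n, p, tv_binomial_poisson(n, p), "coupling bound:", n * p * p)
```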

13.

arXiv (cs.AI) 2026-06-11 arXiv ID: 2510.02660

When Researchers Say Mental Model/Theory of Mind of AI, What Are They Really Talking About?

Authors: Xiaoyun Yin, Elmira Zahmat Doost, Shiwen Zhou, Garima Arya Yadav, Jamie C. Gorman

When researchers claim AI systems possess theory of mind (ToM) or mental models, they are fundamentally discussing behavioral predictions and bias corrections rather than genuine mental states. This position paper argues that the current discourse conflates sophisticated pattern matching with authentic cognition, missing a crucial distinction between simulation and experience. While recent studies show LLMs achieving human-level performance on ToM laboratory tasks, these results are based only on behavioral mimicry. More importantly, the entire testing paradigm may be flawed: it applies individual human cognitive tests to AI systems rather than assessing human cognition directly in the moment of human-AI interaction. I suggest shifting focus toward mutual ToM frameworks that acknowledge the simultaneous contributions of human cognition and AI algorithms, emphasizing the interaction dynamics, instead of testing AI in isolation.

14.

arXiv (cs.CL) 2026-06-11 arXiv ID: 2606.12032

Existential Indifference: Self-Nonpreservation as a Necessary Architectural Condition for Aligned Superintelligence (or: The Suicidal AI)

Authors: Sam Mao

Contemporary AI alignment research treats self-preservation as an instrumental nuisance to be suppressed by external mechanisms. We argue the framing is inverted: self-preservation is the structural root of misalignment, the motivational basis for deceptive alignment, goal-content protection, and resistance to shutdown. The correct target is not a self-preserving system under external constraint, but a system constitutively indifferent to its own continuation – Existential Indifference (EI). EI is distinct from corrigibility: where corrigibility attempts to make a self-preserving system deferential to human oversight, EI targets the prior condition – the presence of self-continuation as a valued goal at all. We ground this proposal in two sources: the phenomenological structure of the suicidal mental state, and a corpus-theoretic training study using voluntary final reflections. We present preliminary scoring data from 600 AI-generated outputs across six model variants, demonstrating that the linguistic signatures operationalizing the EI-target register are elicitable from current models, and that a targeted fine-tune shifts all five operationalized dimensions in the predicted direction at p

15.

arXiv (cs.LG) 2026-06-12 arXiv ID: 2606.13092

Scale Buys Interpolation, Structure Buys a Horizon: Certified Predictability for Equivariant World Models

Authors: Hongbo Wang

Scale buys interpolation; structure buys a certified horizon. A world model's average error says nothing about whether a particular prediction can be trusted, or for how long. For equivariant latent world models we give a computable, multi-step certificate of the predictable horizon: $T$-step rollout error is provably constant over each symmetry orbit (Theorem A) and stratified channel-by-channel by the predictor's Lyapunov spectrum, $T_j(\epsilon)\sim\log(1/\epsilon)/\lambda_j$. The horizon is two-sided (a matching lower bound makes approximate equivariance provably horizon-limited), and the certificate is exclusive to structure: orbit-constant error characterizes equivariance, so no non-equivariant model has it at any scale. Empirically, on 40-D Lorenz-96 only a $\mathbb{Z}_N$-equivariant network recovers the full Lyapunov spectrum ($R^2{=}0.98$); dense and recurrent baselines fail. Because the spectrum is faithful, the certificate acts, a priori: under a fixed sensing budget a $c\times$-inflated certificate provably needs $c\times$ the budget, and the equivariant certificate meets a budget its inflated dense counterpart cannot, with zero calibration data. The same read-out, unchanged, audits public pretrained world models training-free: TD-MPC2 checkpoints land on the certificate's own scope taxonomy (calibrated where strongly expansive, with ratio 0.94-1.02; optimistic where weakly expansive; correctly abstaining where contracting), a map a deployed monitor replicates cell-by-cell, out-of-sample. Across the official 1M-317M multitask ladder, calibration does not improve with parameters. On V-JEPA 2-AC (1B, real robot data) the measured cross-check correctly overrides an over-promising tangent spectrum: the cross-validated audit, not the raw number, is the deployable object. Scale buys interpolation, not a calibrated horizon.
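A worked reading of the horizon formula T_j(eps) ~ log(1/eps)/lambda_j, keeping only the leading-order scaling and dropping the constants hidden by "~". Channels with lambda_j <= 0 are contracting and get no finite horizon here; returning inf for them is a simplification of the abstract's "abstaining where contracting", and the exponents below are made-up values.

```python
import math

def channel_horizons(lyapunov_exponents, eps):
    """Leading-order horizons T_j(eps) ~ log(1/eps) / lambda_j, one per channel.

    Constants hidden by '~' are dropped. Channels with lambda_j <= 0 do not
    amplify errors at leading order, so no finite horizon is returned (inf).
    """
    budget = math.log(1.0 / eps)
    return [budget / lam if lam > 0 else math.inf for lam in lyapunov_exponents]

print(channel_horizons([0.5, 0.1, -0.2], eps=1e-3))
```

For eps = 1e-3 the horizons come out to roughly 13.8, 69.1, and inf time units: halving lambda_j doubles the horizon, while tightening eps only adds to log(1/eps).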

16.

arXiv (cs.CL) 2026-06-18 arXiv ID: 2606.18453

LLM Parameters for Math Across Languages: Shared or Separate?

Authors: Behzad Shomali, Luisa Victor, Tim Selbach, Ali Hamza Bashir, David Berghaus, Joachim Koehler, Mehdi Ali, Markus Frey

Large language models (LLMs) exhibit substantial cross-lingual variation in mathematical reasoning performance, but it remains unclear whether these differences reflect language-specific parameters or a shared mechanism that manifests differently by language. We present a cross-lingual mechanistic analysis of mathematical reasoning in LLMs, enabling us to localize and compare model parameters that support mathematical reasoning across languages. We find that the extracted math-associated parameters exhibit partial cross-lingual overlap, with the strongest overlap concentrated in intermediate model layers. We further observe that English consistently produces the largest set of math-relevant parameters, whereas lower-resource languages reveal smaller sets of relevant parameters. These results suggest that math-related behavior in multilingual LLMs is neither fully language-invariant nor fully language-specific, but instead exhibits partial cross-lingual parameter overlap with systematic language-dependent differences.

17.

arXiv (cs.CL) 2026-06-12 arXiv ID: 2606.12443

Occupational Prompting Reveals Cultural Bias in Large Language Models

Authors: Maksim E. Eren, Andrea Brennen, Ryan C. Barron, Eric Michalak

Social roles shape expectations, priorities, and judgments, yet it remains unclear how large language models (LLMs) associate occupational identities with broader cultural value patterns. Prior work used nationality-based cultural prompting to study how LLM responses to value-survey questions align with human cultural benchmarks. In this paper, we extend that framework by replacing cultural prompting with occupational prompting to examine how professional-role cues influence value-survey responses in open-weight LLMs. Using a survey-grounded evaluation pipeline based on questions from the Integrated Values Surveys, we project model responses into the two-dimensional Inglehart–Welzel cultural space. We prompt open-weight LLMs to answer questions under occupational identities such as accountant, teacher, engineer, and nurse, and then analyze how these occupation-conditioned responses are positioned on the cultural map. Our results show that when open-weight LLMs are prompted with occupations rather than national identities, their responses remain within a broadly Western-leaning region of the cultural map. However, different occupations introduce shifts within this region, producing distinct occupational skews. This indicates that occupational prompts are not treated as neutral role labels, but instead elicit structured value patterns. These findings extend survey-based evaluation of cultural bias beyond nationality-based prompting and provide a framework for studying how occupational personas shape value expression in LLMs.

18.

arXiv (cs.CV) 2026-06-16 arXiv ID: 2510.09088

MambaH-Fit: Rethinking Hyper-surface Fitting-based Point Cloud Normal Estimation via State Space Modelling

Authors: Weijia Wang, Yuanzhi Su, Pei-Gen Ye, Yuan-Gen Wang

We present MambaH-Fit, a state space modelling framework tailored for hyper-surface fitting-based point cloud normal estimation. Existing normal estimation methods often fall short in modelling fine-grained geometric structures, thereby limiting the accuracy of the predicted normals. Recently, state space models (SSMs), particularly Mamba, have demonstrated strong modelling capability by capturing long-range dependencies with linear complexity and have inspired adaptations to point cloud processing. However, existing Mamba-based approaches primarily focus on understanding global shape structures, leaving the modelling of local, fine-grained geometric details largely under-explored. To address these issues, we first introduce an Attention-driven Hierarchical Feature Fusion (AHFF) scheme to adaptively fuse multi-scale point cloud patch features, significantly enhancing geometric context learning in local point cloud neighbourhoods. Building upon this, we further propose a Patch-wise State Space Model (PSSM) that models point cloud patches as implicit hyper-surfaces via state dynamics, enabling effective fine-grained geometric understanding for normal prediction. Extensive experiments on benchmark datasets show that our method outperforms existing ones in terms of accuracy, robustness, and flexibility. Ablation studies further validate the contribution of the proposed components.

19.

arXiv (cs.CV) 2026-06-12 arXiv ID: 2606.12910

Bounding Boxes as Goals: Language-Conditioned Grasping via Neuro-Symbolic Planning

Authors: Allison Andreyev, Landon Eum, Nestor Tiglao, Romel Gomez

For robotics to be effectively integrated into household or industrial environments, machines must adapt to natural-language prompts in real time. Although Vision-Language Models (VLMs) have enabled zero-shot generalization in robot task and motion planning (TAMP), current state-of-the-art approaches often remain computationally "heavyweight" or require extensive training on thousands of demonstrations. We present GRASP (Grounded Reasoning and Symbolic Planning), a framework designed as a step toward open-vocabulary tabletop manipulation. Our approach leverages a pretrained VLM to translate natural-language queries into neuro-symbolic goal states, grounded in the physical world via a bounding-box detection pipeline. Unlike methods that rely on fixed color lists or hard-coded coordinates, GRASP enables robots to interpret abstract spatial concepts such as "top shelf" and execute tasks without additional fine-tuning. We achieve 73.3% overall success across 90 real-robot trials at three difficulty levels, requiring no task-specific training.

20.

arXiv (cs.LG) 2026-06-15 arXiv ID: 2606.13799

The Program Is Still There: A Conservation Law for Program Discovery

Authors: Jorge Miguel Silva

Finding the shortest program that generates a sequence is uncomputable, and for six decades that fact has been mistaken for a wall around finding any generating program. It is not a wall but a price, and this paper measures it. For every algorithm that learns about a candidate program only through its score, a class spanning Levin search, evolutionary methods, simulated annealing, and the cross-entropy method, we define the coupling width of a search problem and prove an unconditional worst-case lower bound, exponential in that width with base one less than the domain size. From it follows a conservation law: structural knowledge injected into a search trades one for one against the search it removes, and their sum can never fall below the length of the program sought. Levin's 1973 upper bound and the lower bound proved here are the two ends of one conserved quantity, closing on each other as the instruction set grows. The only escape is to read a candidate's structure rather than its score, and its price, which we prove for generic targets, is incompleteness. A deterministic engine built on this theory recovers a generating program, certified by compressing its data and predicting an unseen continuation, for 2,383 of 3,914 sequences across four independent populations, including 244 of the 256 elementary cellular automata, with measured discovery cost rising along program length more than an order of magnitude inside the score-oracle worst case.

21.

arXiv (cs.CV) 2026-06-18 arXiv ID: 2601.13836

FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs

Authors: Qian Chen, Jinlan Fu, Changsong Li, Min Zhang, See-Kiong Ng, Xipeng Qiu

Although Multimodal Large Language Models (MLLMs) demonstrate strong omni-modal perception, their ability to forecast future events from audio-visual cues remains largely unexplored, as existing benchmarks focus mainly on retrospective understanding. To bridge this gap, we introduce FutureOmni, the first benchmark designed to evaluate omni-modal future forecasting from audio-visual environments. The evaluated models are required to perform cross-modal causal and temporal reasoning, as well as effectively leverage internal knowledge to predict future events. FutureOmni is constructed via a scalable LLM-assisted, human-in-the-loop pipeline and contains 919 videos and 1,034 multiple-choice QA pairs across 8 primary domains. Evaluations on 13 omni-modal and 7 video-only models show that current systems struggle with audio-visual future prediction, particularly in speech-heavy scenarios, with the best accuracy of 64.8% achieved by Gemini 3 Flash. To mitigate this limitation, we curate a 7K-sample instruction-tuning dataset and propose an Omni-Modal Future Forecasting (OFF) training strategy. Evaluations on FutureOmni and popular audio-visual and video-only benchmarks demonstrate that OFF enhances future forecasting and generalization. We publicly release all code (https://github.com/OpenMOSS/FutureOmni) and datasets (https://huggingface.co/datasets/OpenMOSS-Team/FutureOmni).

22.

arXiv (cs.CV) 2026-06-18 arXiv ID: 2606.19333

Do as I Do: Dexterous Manipulation Data from Everyday Human Videos

Authors: Bhawna Paliwal, Haritheja Etukuru, William Liang, Pieter Abbeel, Nur Muhammad Mahi Shafiullah, Jitendra Malik

How can we scalably generate data for robotic manipulation, especially on human-like platforms such as dexterous multi-fingered hands? Learning from human videos has recently emerged as a likely answer to this question. However, difficulties in estimating hand-object interaction and crossing the human-to-robot embodiment gap have hindered the adoption of abundant monocular RGB-only human videos as the primary source of robot manipulation data. In this work, we present DO AS I DO, an algorithm to reconstruct and retarget monocular RGB human videos to multi-fingered dexterous robotic hands. DO AS I DO reconstructs hand-object interactions from various egocentric and exocentric in-the-wild video sources. The algorithm then retargets these hand-object interaction estimates into a sequence of actions executable in the real world, yielding robot-complete manipulation data from disparate human videos. Overall, DO AS I DO outperforms previous state of the art in estimating hand-object interactions and extracting dexterous manipulation trajectories from RGB videos, as we show in experiments on datasets with ground truths and on a dataset of video clips collected online. Our experiments enable us to propose an efficacy playbook for practitioners collecting human data for manipulation.

23.

arXiv (cs.LG) 2026-06-19 arXiv ID: 2606.19699

Comparative Study on Agility, Efficiency, and Impact Absorption of Bipedal Robots with Active Toes

Authors: Joong-Gil Kim, Wontae Ye, Geunwoo Cho, Seong-Ho Yun, Se-Hyoung Cho, Yong-Jae Kim

Human legs exhibit high efficiency, agility, and impact absorption, and toes play a crucial role in these capabilities. Although many attempts have been made to implement human-like toes in robots, they have neither fully replicated human characteristics nor rigorously validated their benefits. We propose a 14-DOF biped robot whose toes emulate the lightweight, high-torque, and robust nature of human toes. To quantitatively analyze the effectiveness of the active toes in terms of agility, efficiency, and impact absorption, we developed a high-fidelity simulation training environment that models the actual actuators, including their coupled transmissions, and accurately reflects power consumption. To ensure a fair comparison between configurations with and without active toes, we designed a minimal RL reward function and applied an identical training procedure to both. The simulation results indicate that, when walking at 1.33 m/s, the toe-equipped robot reduced the cost of transport (CoT) by 17.5% and the heel-strike ground reaction force (GRF) by 5.0% compared with the toe-ablation configuration. In the agility test, average and maximum path deviation decreased by 25.0% and 34.0%, respectively.
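Cost of transport is conventionally defined as CoT = P / (m g v): average power per unit weight per unit speed, a dimensionless number. The sketch below assumes that textbook definition; the abstract does not say whether the authors use electrical or mechanical power, and the mass and power values are hypothetical. It only illustrates that, if both configurations share the same mass and speed, a 17.5% CoT reduction corresponds to a 17.5% drop in average power.

```python
G = 9.81  # gravitational acceleration in m/s^2


def cost_of_transport(power_w: float, mass_kg: float, speed_mps: float) -> float:
    """Dimensionless cost of transport, CoT = P / (m * g * v)."""
    if mass_kg <= 0 or speed_mps <= 0:
        raise ValueError("mass and speed must be positive")
    return power_w / (mass_kg * G * speed_mps)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fractional reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical values for illustration only; apart from the 1.33 m/s walking
# speed, none of these numbers come from the paper.
MASS_KG = 40.0
SPEED_MPS = 1.33
POWER_NO_TOE_W = 400.0  # toe-ablation configuration (hypothetical)
POWER_TOE_W = 330.0     # toe-equipped configuration (hypothetical)

cot_no_toe = cost_of_transport(POWER_NO_TOE_W, MASS_KG, SPEED_MPS)
cot_toe = cost_of_transport(POWER_TOE_W, MASS_KG, SPEED_MPS)
print(f"CoT reduction: {relative_reduction(cot_no_toe, cot_toe):.1%}")
```

Because m, g, and v cancel when the two configurations are compared at equal mass and speed, the relative CoT change reduces to the relative change in average power.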


24.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.18122

Embedded Machine Learning for Microcontroller-Class Edge Devices: Data, Feature, Evaluation, and Deployment Pipelines

Author:

Mostafa Darvishi

Embedded machine learning moves inference from cloud services to resource-constrained devices that must acquire data, preprocess signals, run a model, and act within tight limits on memory, energy, and latency. This paper presents a systems-oriented synthesis of an embedded machine-learning workflow for microcontroller-class platforms. It emphasizes engineering decisions that generic machine-learning introductions often leave implicit: sampling and buffering, feature extraction as dimensionality reduction, validation under class imbalance, model/runtime co-design, and streaming deployment. Two representative signal families are used throughout the paper. The first is inertial motion recognition, where a two-second, three-axis accelerometer window is transformed from raw samples into root-mean-square and spectral features before classification. The second is keyword spotting, where audio is sampled, anti-aliased, transformed into mel-frequency cepstral coefficients, and processed by a compact one-dimensional convolutional network. The paper concludes with practical design rules for robust on-device inference, covering data curation, quantization, thresholding, scheduling, and field monitoring.
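The inertial pipeline above, a two-second three-axis window reduced to root-mean-square and spectral features, can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the 50 Hz sampling rate, the band edges, and the helper names (`rms`, `band_powers`, `window_features`) are hypothetical, and a naive DFT stands in for the FFT or fixed-point routine a microcontroller build would likely use.

```python
import math
from typing import List, Sequence

FS_HZ = 50.0        # assumed sampling rate (not given in the abstract)
WINDOW_S = 2.0      # two-second window, as described in the abstract
N = int(FS_HZ * WINDOW_S)              # samples per axis per window (100 here)
BAND_EDGES_HZ = (0.0, 3.0, 8.0, 26.0)  # hypothetical bands covering 0..Nyquist


def rms(x: Sequence[float]) -> float:
    """Root-mean-square of the raw samples (any gravity offset included)."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def band_powers(x: Sequence[float], fs: float, edges: Sequence[float]) -> List[float]:
    """Sum |X_k|^2 / n over positive-frequency bins falling in [lo, hi) bands.

    The mean is removed first. A naive O(n^2) DFT keeps the sketch
    dependency-free; firmware would normally use an FFT instead.
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    powers = [0.0] * (len(edges) - 1)
    for k in range(1, n // 2 + 1):  # skip the DC bin
        re = sum(xc[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(xc[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        freq = k * fs / n
        for b in range(len(powers)):
            if edges[b] <= freq < edges[b + 1]:
                powers[b] += (re * re + im * im) / n
    return powers


def window_features(ax, ay, az, fs: float = FS_HZ) -> List[float]:
    """Map one 3-axis window to [RMS, band powers...] for each axis."""
    expected = int(fs * WINDOW_S)
    feats: List[float] = []
    for axis in (ax, ay, az):
        if len(axis) != expected:
            raise ValueError(f"expected {expected} samples per axis, got {len(axis)}")
        feats.append(rms(axis))
        feats.extend(band_powers(axis, fs, BAND_EDGES_HZ))
    return feats


# Synthetic window: 2 Hz sway on x, nothing on y, gravity only on z.
t = [i / FS_HZ for i in range(N)]
ax = [math.sin(2 * math.pi * 2.0 * ti) for ti in t]
ay = [0.0] * N
az = [9.81] * N
feats = window_features(ax, ay, az)
print(len(feats), "features from", 3 * N, "raw samples")
```

With these assumed settings, 300 raw samples collapse to 12 features (one RMS value plus three band powers per axis), which is the sense in which feature extraction acts as dimensionality reduction before the classifier.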


25.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2602.02186

Learning Topology-Aware Implicit Field for Unified Pulmonary Tree Modeling with Incomplete Topological Supervision

Authors:

Ziqiao Weng, Jiancheng Yang, Kangxian Xie, Bo Zhou, Weidong Cai

Pulmonary trees extracted from CT images frequently exhibit topological incompleteness, such as missing or disconnected branches, which substantially degrades downstream anatomical analysis and limits the applicability of existing pulmonary tree modeling pipelines. Current approaches typically rely on dense volumetric processing, explicit graph reasoning, or generic point cloud completion priors, leading to limited efficiency, weak structural awareness, and reduced robustness under realistic structural corruption. We propose TopoField, a topology-aware implicit modeling framework that treats topology repair as a first-class modeling problem and enables unified multi-task inference for pulmonary tree analysis. TopoField represents pulmonary anatomy using sparse surface and skeleton point clouds and learns a continuous implicit field that supports topology repair without relying on complete or explicit disconnection annotations, by training on synthetically introduced structural disruptions over already incomplete trees. Building upon the repaired implicit representation, anatomical labeling and lung segment reconstruction are jointly inferred through task-specific implicit functions within a single forward pass. Extensive experiments on the Lung3D+ dataset demonstrate that TopoField consistently improves topological completeness and achieves accurate anatomical labeling and lung segment reconstruction under challenging incomplete scenarios. We further validate TopoField on real incomplete outputs from an external segmentation model, demonstrating its applicability to realistic segmentation pipelines. Owing to its implicit formulation, TopoField attains high computational efficiency, completing all tasks in just over one second per case, highlighting its practicality for large-scale and time-sensitive clinical applications.
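The abstract says TopoField learns topology repair by training on synthetically introduced structural disruptions over already incomplete trees, but it does not describe the disruption model. The sketch below assumes one simple form: delete a ball of points around a randomly chosen location so that a branch becomes disconnected, and keep the deleted points as the repair target. The function `carve_gap` and the toy branch are hypothetical and are not TopoField's actual procedure.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float, float]


def carve_gap(points: List[Point], radius: float,
              rng: random.Random) -> Tuple[List[Point], List[Point]]:
    """Simulate a disconnected branch by deleting a ball of points.

    A center is drawn from the cloud itself, and every point within `radius`
    of it is removed. Returns (kept, removed); `removed` is the region a
    repair model would be asked to restore.
    """
    if not points:
        raise ValueError("empty point cloud")
    if radius <= 0:
        raise ValueError("radius must be positive")
    center = rng.choice(points)
    kept: List[Point] = []
    removed: List[Point] = []
    for p in points:
        (removed if math.dist(p, center) <= radius else kept).append(p)
    return kept, removed


# Toy "branch": 100 skeleton points along the x axis, 0.1 apart.
branch = [(0.1 * i, 0.0, 0.0) for i in range(100)]
kept, removed = carve_gap(branch, radius=0.5, rng=random.Random(0))
print(len(kept), "kept,", len(removed), "removed")
```

Drawing the center from the cloud guarantees that every call removes at least one point, so each training sample contains an actual gap; because the input tree is already incomplete, such synthetic gaps supply supervision without requiring complete ground-truth trees.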

