Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

02.
arXiv (CS.CL) 2026-06-11

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

Existing LLM moral benchmarks usually ask which isolated moral act, value, or foundation a model prefers. This is useful but incomplete. Realistic judgments often require a model to combine several moral signals within the same option. We introduce **Moral Trolley Arena**, a two-stage blind ELO benchmark for measuring how LLMs compose moral evidence. The single-scene arena first calibrates individual moral acts from a 229-scenario corpus across five Moral Foundations Theory foundations; the composite arena then combines calibrated acts into two-act moral items over a controlled intensity grid and measures the resulting composite preferences. Across ten frontier models, composite judgments are largely predicted by component act strength, but the relation is consistently compressed rather than simply additive. Models also show non-additive intensity anchoring, bounded foundation-specific residuals after component control, and highly convergent composite preference surfaces across providers. These results suggest that moral audits should measure composition rules for moral evidence, not only rankings over isolated acts.

03.
arXiv (quant-ph) 2026-06-12

Toward Entanglement Bootstrap for Conformal Field Theory in Any Dimension

arXiv:2606.12540v1 Announce Type: cross Abstract: Given a quantum critical wavefunction in any dimension, we propose a reconstructed Hamiltonian, analogous to the ones previously found for 1+1d CFT and for 2+1d bosonic liquid topologically-ordered states. We test numerically that, for known regularized approximate CFT groundstates (on the icosahedron and the fuzzy sphere), (1) they are close to the groundstate of their reconstructed Hamiltonian, and (2) the spectrum of their reconstructed Hamiltonian on the unit sphere has CFT properties (integer spacing of descendants) and matches known low-lying energies. We show that this provides an automated method to improve the finite-size effects in a fixed Hilbert space.

04.
arXiv (CS.AI) 2026-06-12

Hellinger Multimodal Variational Autoencoders

arXiv:2601.06572v4 Announce Type: replace-cross Abstract: Multimodal variational autoencoders (VAEs) are widely used for weakly supervised generative learning with multiple modalities. Predominant methods aggregate unimodal inference distributions using either a product of experts (PoE), a mixture of experts (MoE), or their combinations to approximate the joint posterior. In this work, we revisit multimodal inference through the lens of probabilistic opinion pooling, an optimization-based approach. We start from Hölder pooling with $\alpha=0.5$, which corresponds to the unique symmetric member of the $\alpha-divergence$ family, and derive a moment-matching approximation, termed Hellinger. We then leverage such an approximation to propose HELVAE, a multimodal VAE that avoids sub-sampling, yielding an efficient yet effective model that: (i) learns more expressive latent representations as additional modalities are observed; and (ii) empirically achieves better trade-offs between generative coherence and quality, outperforming state-of-the-art multimodal VAE models.

05.
arXiv (CS.AI) 2026-06-16

Parallelizing Tool Execution and LLM Generation for Low-Latency Agent Serving

arXiv:2603.18897v2 Announce Type: replace-cross Abstract: LLM-powered agents execute tasks through a sequential loop of model generation and tool execution. Today's serving systems serialize this loop, leaving tool latency exposed on the task critical path. This paper presents PASTE, a tool-aware agent-serving system that predicts concrete future tool invocations from recurring agent patterns and executes them speculatively while the LLM is still generating. PASTE isolates speculative results until confirmed by the LLM and jointly schedules tool execution and returning LLM sessions to avoid shifting bottlenecks to the GPU. Across deep research, coding, and scientific-agent workloads, PASTE reduces average task completion time by 43.5% and lowers observed tool latency by 1.8x.

06.
arXiv (quant-ph) 2026-06-17

Efficient time-series prediction on NISQ devices via time-delayed quantum extreme learning machine

arXiv:2602.21544v2 Announce Type: replace Abstract: We proposed a time-delayed quantum extreme learning machine (TD-QELM) for efficient time-series prediction on noisy intermediate-scale quantum (NISQ) devices. By encoding multiple past inputs simultaneously, TD-QELM achieves shallow circuit depth independent of sequence length, thereby, mitigating noise accumulation and reducing computational complexity. Experiments using the NARMA benchmark on both noiseless simulations and IBM's 127-qubit processor demonstrate that TD-QELM consistently outperforms conventional quantum reservoir computing in prediction accuracy and noise robustness. These results highlight TD-QELM as a practical and scalable framework for time-series learning on current NISQ hardware.

07.
arXiv (CS.CL) 2026-06-16

Multimodal Evaluator Preference Collapse: Cross-Modal Contagion in Self-Evolving Agents

作者:

When AI agents use language models to evaluate their own outputs in a feedback loop, systematic biases emerge. We show that Evaluator Preference Collapse (EPC) is dramatically amplified in multimodal settings. Using GPT-4o to evaluate DeepSeek-chat across text and visual tasks, we find that a single strategy (step_by_step) absorbs 48.4% of all weight – 3.2x the collapse observed in text-only self-evaluation – while three visual-domain strategies receive only 9.1% combined weight. We then demonstrate a novel phenomenon we term cross-modal contagion: evaluator preferences acquired on one modality transfer to and corrupt strategy selection on another. Through a four-phase isolation training paradigm, we measure contagion coefficients and document strategy inversion – the optimal strategy for a modality reverses after cross-modal exposure. A Phase 3 statistical validation across four evaluator configurations (N=53 total independent repetitions, 15,592 API calls) reveals a clear hierarchy: cross-model evaluation (GPT-4o, N=8) produces strong but symmetric bidirectional contagion (mean gamma_{T->V}=1.176, gamma_{V->T}=1.089, Delta=-0.088, p=0.575, Cohen's d=0.29); high round counts (DashScope, 50 rounds) cause collapse to single-strategy dominance (70% zero contagion); and self-evaluation provides near-complete immunity – 97% of runs (N=30, DeepSeek-chat) yield exactly zero contagion (mean gamma=0.033, 95% CI [-0.031, 0.010], p=0.642, d=0.07). No evaluator condition shows statistically significant directional asymmetry. We introduce the contagion matrix indexed by evaluator identity, release the MM-EPC experimental framework, and identify cross-model evaluator architecture as the primary risk factor for preference contagion.

09.
arXiv (CS.AI) 2026-06-11

Search Discipline for Long-Horizon Research Agents

arXiv:2606.11522v1 Announce Type: new Abstract: Autoresearch agents now propose, evaluate, and select scientific candidates against a metric, and that metric is usually an aggregate reduced over a heterogeneous space of regions, slices, or cohorts. We show that when scientific validity lives in that disaggregated structure, the aggregate can rank the wrong candidate first. The headline number improves while the structure underneath inverts, so a decision made on the number accepts a candidate that quietly breaks the model. The failure is not domain-specific. It appears wherever a candidate's validity is multi-dimensional but its verifier is a single reduction. We demonstrate the inversion on a fire-model task in the Ecosystem Demography model. The highest-scoring candidate and a slightly lower one are within noise of each other on global score, yet the top-scoring one collapses the protected boreal regions while the other preserves them. What separates them is the per-region behavior, not the headline number. This decision should not be left to the agent that produced the candidates. The agent optimizing the score is the last party likely to catch the score being wrong, and a prompt has no remaining turn once the agent has stopped. We move the decision to an external control loop that audits each candidate on its disaggregated behavior and acts after the agent has decided. It can demote a candidate the agent would have accepted, and it can reopen a run the agent had declared finished. Our contribution is the inversion finding itself, and a search-discipline protocol that decides on reviewable candidate-effect evidence instead of the score.

10.
arXiv (CS.CV) 2026-06-15

VideoWeave: Unlocking Geometric Consistency in Video Generation via Joint Geometry-Video Modeling

Large-scale video diffusion models often fail to preserve 3D structure over time, causing geometric drift and implausible motion under viewpoint changes. Existing methods usually enforce geometric consistency by using explicit geometry reconstructions, such as depth maps, point clouds, or reconstructed 3D structures, to define conditions, supervision, or reward signals, making the generator sensitive to errors from upstream geometry pipelines. We propose VideoWeave, a latent-space post-training framework that uses implicit geometry-model features to constrain the generative distribution, providing a more flexible and non-rigid form of guidance that mitigates the impact of reconstruction errors from geometry models. Specifically, VideoWeave adapts these features into geometry latents and jointly models them with video latents in a shared denoising space, allowing geometry to shape the generative distribution during training. To support this process, we build GeoVid-80K, an 80K-video dataset with paired appearance and geometry representations. Experiments on text-to-video and image-to-video generation show that VideoWeave improves geometric coherence while preserving strong visual quality. VideoWeave project page at https://videoweave.github.io/

11.
arXiv (CS.AI) 2026-06-15

An Agentic Retrieval Framework for Autonomous Context-Aware Data Quality Assessment

arXiv:2606.13692v1 Announce Type: cross Abstract: Data quality assessment is a critical prerequisite for effective data analytics and data-driven decision-making, yet it remains a challenging task due to the inherently context-dependent nature of data quality. Existing approaches often rely on static rules or manual assessment strategies, limiting their adaptability to diverse usage scenarios and constraining automation at scale. Recent advances in artificial intelligence, particularly large language models, offer new opportunities for automating data quality assessment, but raise concerns related to reliability, grounding, and execution safety. In this paper, we propose a unified agentic-retrieval framework for autonomous context-aware data quality assessment. The framework interprets natural-language descriptions of intended data usage, derives context-aware assessment strategies, and generates executable validation logic through a multi-agent workflow. To ensure operational reliability, the framework introduces a feasibility validation stage that evaluates the realism and executability of generated assessment specifications before execution, enabling iterative refinement when necessary. Accepted validation logic is executed deterministically to guarantee reproducible and auditable results. We implement the proposed framework as an end-to-end prototype and evaluate it across multiple usage scenarios applied to the same dataset. The results demonstrate that assessment outcomes adapt meaningfully to different intended uses, while feasibility-gated execution reduces unrealistic or non-executable rule generation. The proposed approach provides a practical foundation for deploying autonomous yet controlled data quality assessment in modern data-driven environments.

12.
Nature (Science) 2026-06-23

Europe must seize the moment to lead on free and open science

作者: 未知作者

An under-appreciated research powerhouse, Europe has a responsibility to champion democratic science that is accessible to all the world’s research talent. An under-appreciated research powerhouse, Europe has a responsibility to champion democratic science that is accessible to all the world’s research talent.

13.
arXiv (CS.CL) 2026-06-16

Fast When, Careful Who: Dual-Process Multiparty Turn-Taking with Diffusion Augmentation

Reliable turn-taking is essential for spoken dialogue systems. However, most existing methods are designed for two-speaker interaction and struggle with realistic multiparty audio containing overlap and rapid speaker changes. We study multiparty turn-taking on the VoxConverse dataset and propose an audio-only two-stage pipeline that separates when to trigger a turn boundary from whether the floor is actually transferring. A fast trigger scans the audio and proposes candidate end-of-turn times, while a lightweight verifier runs only at those times to decide \textsc{Hold} or \textsc{Shift} and support next-speaker prediction. We report results in the full multiparty setting and a controlled dyadic top-2 projection for comparability. We also investigate diffusion-based, label-preserving background-audio mixing as a data augmentation strategy. Results show improved shift detection over a baseline, with further improvements from diffusion augmentation.

14.
arXiv (CS.AI) 2026-06-18

A Reproducible Log-Driven AutoML Framework for Interpretable Pipeline Optimization in Healthcare Risk Prediction

arXiv:2605.21528v2 Announce Type: replace-cross Abstract: Accurate disease risk prediction is challenged by heterogeneous features, limited data, and class imbalance. This study presents yvsoucom-iterkit, a deterministic AutoML framework that models pipeline optimization as a configuration-level system with full reproducibility and traceable execution logs, enabling systematic analysis of component attribution, interactions, similarity, and cross-seed robustness. Experiments on the Pima Indians Diabetes and Stroke datasets across more than 18,000 pipeline configurations reveal a structured yet partially redundant search space, where performance is dominated by a small subset of interacting components. Ensemble models achieve stable performance, reaching a Weighted-F1 of 0.89 on Pima and 0.94 on Stroke. Macro-F1 reaches approximately 0.88 on Pima but drops to 0.6560 on Stroke due to severe imbalance. Cross-seed experiments show that ensembles reduce variance compared to single models. Friedman testing ($p < 0.05$) confirms significant ranking differences across configurations. Based on analysis of component attribution, interaction, and similarity, optimal configuration design reveals dataset-dependent behavior. For the Pima dataset, computational efficiency benefits from simplified search spaces where redundant components can be removed, with split ratio playing a key role. In contrast, the Stroke dataset requires enhanced imbalance-aware strategies, where RandomOverSampler improves Macro-F1 from 0.6560 to 0.6766. These findings demonstrate that effective AutoML optimization is achieved through optimal configuration design, where carefully constraining the search space to high-impact components can improve performance, stability, and interpretability while reducing unnecessary search complexity.

15.
arXiv (CS.AI) 2026-06-16

HoloRec: Holistic Encoding and Interleaved Reasoning for Generative Recommendation

arXiv:2606.15331v1 Announce Type: cross Abstract: Generative recommendation models that formulate the task as sequence generation overcome the objective fragmentation problem of traditional cascade architectures, yet existing approaches still suffer from flat semantic representations lacking hierarchical structure for multi-step reasoning and an externally constructed chain-of-thought (CoT) that requires expensive annotations and remains disconnected from the generation objective. We propose HoloRec, an endogenous chain-of-thought recommendation mechanism that unifies representation, reasoning, and generation by constructing a hierarchical semantic encoding matrix via multi-granularity nested residual quantization optimized by a holistic reconstruction loss. HoloRec supports two inference modes: a non-thinking mode that uses lightweight multi-granularity supervised alignment for fast prediction, and a thinking mode that employs an interleaved reasoning scheme to generate CoT steps on the fly, directly embedding reasoning into the generation process without external data. Experiments on multiple public recommendation datasets demonstrate that HoloRec consistently outperforms baselines, with especially significant gains in sparse scenarios, and the thinking mode achieves better accuracy than the non-thinking mode with only modest inference overhead.

16.
medRxiv (Medicine) 2026-06-18

Cost analysis of overseas versus domestic vaccination of US-bound refugees

Context: To ensure healthy resettlement and protect US health security, the Vaccination Program for US-bound Refugees (VPR) offers some recommended vaccines to refugees overseas before resettlement to the United States. The selected vaccines and number of doses vary by country of departure. VPR was found to be cost-saving in 2018 but had since expanded to more sites. Objective: Assess VPR's current costs and impact on post-arrival domestic vaccination needs and costs. Setting and Participants: A model-based analysis of the Federal government costs for VPR and post-arrival (US) vaccination of resettled refugees separated across five regions: Africa, Asia, the Middle East and North Africa/Republic of Turkiye and Middle East, Europe, and the Americas using fiscal year 2024 data. Design: We quantified and compared full vaccination costs for refugees under two scenarios: (1) 'No VPR' and (2) 'VPR'. Refugees would receive no vaccines overseas and be fully vaccinated after US arrival under 'No VPR'. Under 'VPR', refugees receive one or two doses of selected vaccines overseas before completing vaccination schedules after arrival. Main Outcomes: Costs were reported in 2023 US dollars for 'VPR' and 'No VPR' scenarios and further subdivided by grouping countries/sites depending on whether the International Organization for Migration (IOM) provides vaccination services for refugees (IOM sites) versus non-IOM providers (non-IOM sites). Results: 'VPR' resulted in average net cost savings of $147 per person or $14.7 million per 100,000-refugee cohort compared to providing all vaccines after US arrival ('No VPR'). 'VPR' was cost-saving across most regions, except for IOM sites in Europe, where a net cost of $44 per person was observed. Net cost savings per person were highest for IOM sites in Africa ($333). Conclusions: VPR remains a cost-saving strategy, while protecting US-bound refugees' health and US health security by preventing disease outbreaks during resettlement.

17.
arXiv (quant-ph) 2026-06-11

Power-law-graded Ising Interactions Stabilize Time Crystals Realizing Quantum Energy Storage and Sensing

arXiv:2508.14847v3 Announce Type: replace Abstract: We study discrete time-crystalline (DTC) phases in one-dimensional spin-1/2 chains with power-law-graded Ising interactions under periodic Floquet driving. By generalizing Stark localization to power-law-graded Ising interaction profiles, we identify robust period-doubled dynamics across a wide range of interaction exponents, stabilized by the interplay between coherent driving and spatially varying coupling. Within the DTC phase, the energy stored in the system, interpreted as a quantum battery, increases superlinearly with system size, although no scaling advantage persists in normalized power. Beyond energy storage, we demonstrate that the DTC phase supports enhanced quantum sensing. The quantum Fisher information associated with estimating timing deviations in the drive scales superextensively with system size, surpassing the Heisenberg limit. The degree of quantum advantage can be tuned by varying the interaction exponent, though DTC behavior remains robust throughout. Our results position power-law-graded Ising interacting Floquet systems as robust platforms for storing quantum energy and achieving metrological enhancement.

18.
bioRxiv (Bioinfo) 2026-06-12

Computational Design of Optimal Sequences for Targeted Hypermutagenesis Using Recombination-Coupled Diversity-Generating Retroelements

Diversity-generating retroelements (DGRs) are natural systems that accelerate evolution via targeted hypermutation at adenines. We previously developed DGRec, a system combining DGRs and recombineering for programmable mutagenesis in Escherichia coli. We here address two important issues with DGRec: the dependence of mutagenesis efficiency on the dgrRNA secondary structure and the variability of the reverse-transcription biases with sequence context and position. First, we introduce and validate a method to recode non-functional templates, i.e. with low mutagenesis efficiency, into highly functional ones through synonymous mutations. Second, we develop a Long Short-Term Memory (LSTM) model to predict DGRec mutational profiles for any given template sequence. By integrating this LSTM model with our recoding method, we establish a comprehensive workflow for customized directed evolution, enabling researchers to precisely fine-tune DGRec in vivo mutagenesis to their engineering needs.

19.
arXiv (CS.CV) 2026-06-16

Uncertainty Quality of VGGT: An Analysis on the DTU Benchmark Dataset

Visual Geometry Grounded Transformer (VGGT) has already attracted a great deal of attention in a short period of time, not least due to the Best Paper Award at CVPR-2025. Similar to DUSt3R and MASt3R, VGGT aims to bring about a paradigm shift by replacing established methods like bundle adjustment and feature matching with a simple, unified, feed-forward neural network that predicts camera poses, depth maps, and dense 3D structure directly from multiple images of a scene in a few seconds. A key aspect is its ability to process an arbitrary number of views consistently in a single forward pass without any post-processing or iterative optimization. For photogrammetry, this opens new possibilities for real-time, scalable, and accessible 3D reconstruction. In this context, not only high reconstruction accuracy but also high-quality uncertainty estimates are crucial, as they foster trust and enable robust quality assurance. This paper therefore investigates the quality of VGGT's uncertainty predictions. The analysis identifies an effective confidence threshold for filtering VGGT's raw output and demonstrates that enhancing uncertainty quality holds strong potential for improving the accuracy of its 3D reconstructions.

20.
arXiv (CS.CL) 2026-06-17

Learning task-specific subspaces via interventional post-training of speech foundation models

Speech foundation models, pre-trained on large corpora of unlabelled speech data, produce general-purpose representations which are useful across tasks. However, these representations encode information about salient speech variables in a distributed manner, while downstream speech tasks rely on only some of this variability. In this work, we propose a post-training refinement approach using interventional contrastive learning. By leveraging an interventional dataset and multi-part contrastive loss, we learn a transformation from the entangled representation space of speech foundation models into separate content and speaker subspaces. We evaluate the learnt representations on speaker verification and keyword spotting tasks, showing improved out-of-domain speaker verification performance and evidence that speaker and content information are separated across the learned subspaces.

21.
arXiv (CS.CL) 2026-06-12

Does AI Reviewer See the Full Picture? Attacking and Defending Multimodal Peer Review

The integration of Large Language Models (LLMs) and Multimodal LLMs (MLLMs) into scientific peer-review workflows introduces novel and significant risks for adversarial manipulation, especially given the multimodal nature of scientific papers where figures, not just text, convey core evidence. This creates a significant gap: current robustness studies on AI peer-review are overwhelmingly text-only. Moreover, the problem is distinct from standard jailbreaking, as a peer-review attack seeks to induce a domain-specific, targeted failure (e.g., "inflate this score") rather than a general safety policy violation, for which no practical defenses exist. To address this, we introduce PaperGuard, the first comprehensive benchmark designed to systematically evaluate and defend AI-generated peer-review against these domain-specific, cross-modal attacks. Our framework is built on three pillars: (1) a new multimodal peer-review dataset spanning multiple scientific domains; (2) a unified suite of attacks, including black-box prompt injections and white-box perturbations, specifically designed to target both text (GCG) and figures (PGD); and (3) a practical defense, motivated by the long-context challenge of academic papers, that uses chunk-based embedding search to efficiently localize and mitigate harmful instructions. Our extensive experiments, conducted across state-of-the-art models, confirm that AI reviewers are pervasively vulnerable. PaperGuard establishes the foundational benchmark, protocols, and actionable defense necessary to pioneer trustworthy, attack-resilient AI-assisted scholarly reviewing.

22.
arXiv (quant-ph) 2026-06-16

Scheme for Transport-based Global Entanglement Distribution using Quantum Processors

arXiv:2606.15421v1 Announce Type: new Abstract: We propose a scheme for distributing entanglement over global distances in a heralded manner by using satellites to physically transport entangled processor nodes with rare-earth-ion qubits. A full analysis of channel losses, errors and background light is performed to determine the fidelity and number of entangled pairs that can be distributed between two ground stations. We show that the scheme works already with a single satellite and can distribute close to the theoretical maximum number of entangled pairs that can be generated in a satellite overpass. In addition, we argue that in theory transportation-based schemes outperform other satellite-based schemes and can be scaled up to a constellation without additional channel losses. Daytime operation seems feasible as long as the sky is clear, with an EPR pair fidelity ranging from 99.3% at shorter network lengths to 93.9% with global coverage and can be further improved by active error correction or entanglement purification.

23.
arXiv (CS.AI) 2026-06-24

FP8 is All You Need (Part 2): Efficient Ozaki-Bailey Style FFT Through Tensor-core Garner Reformulation and Kulisch Escape Route

arXiv:2606.23698v1 Announce Type: cross Abstract: NVIDIA's Blackwell Ultra (B300) cuts FP64 vector throughput to ~1.3 TFLOPS per GPU, roughly 30x below B200 and well below the level at which bandwidth-limited FP64 workloads stay memory-bound. The Ozaki Scheme II framework recovers FP64-equivalent throughput by routing dense matrix multiply through FP8 tensor cores with a mantissa-sliced Chinese-remainder reconstruction. A companion Part (1) paper covers dense GEMM, batched GEMV, stencils, and SpMV; this paper adds the fifth canonical primitive, the 3-D FFT. We present Ozaki-Bailey FFT, an emulated 3-D FFT via the Bailey six-step decomposition with both 1-D FFT GEMMs on FP8 tensor cores. Bailey's small inner factor k ~ sqrt(N) (k=32 for N=1024) puts the kernel in the regime k

24.
arXiv (CS.CV) 2026-06-15

UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities

Retrieval-Augmented Generation (RAG) has shown substantial promise in improving factual accuracy by grounding model responses with external knowledge relevant to queries. However, most existing approaches are limited to a text-only corpus, and while recent efforts have extended RAG to other modalities such as images and videos, they typically operate over a single modality-specific corpus. In contrast, real-world queries vary widely in the type of knowledge they require, which a single type of knowledge source cannot address. To address this, we introduce UniversalRAG, an any-to-any RAG framework designed to retrieve and integrate knowledge from heterogeneous sources with diverse modalities and granularities. Specifically, motivated by the observation that forcing all modalities into a unified representation space derived from a single aggregated corpus causes a modality gap, where the retrieval tends to favor items from the same modality as the query, we propose modality-aware routing, which dynamically identifies the most appropriate modality-specific corpus and performs targeted retrieval within it, and further justify its effectiveness with a theoretical analysis. Moreover, beyond modality, we organize each modality into multiple granularity levels, enabling fine-tuned retrieval tailored to the complexity and scope of the query. We validate UniversalRAG on 10 benchmarks of multiple modalities, showing its superiority over various modality-specific and unified baselines.

25.
arXiv (CS.CV) 2026-06-12

Budget-Constrained Step-Level Diffusion Caching

Step-level caching accelerates diffusion models by exploiting temporal redundancy across denoising steps. Existing methods make per-step cache decisions using threshold-based heuristics, without directly optimizing for final output quality. As a result, their inference latency varies across inputs and is difficult to control at deployment. In this work, we propose BudCache, which inverts this formulation: rather than letting per-step error thresholds dictate the runtime cost, we fix the compute budget in advance and search for the cache policy that best preserves the final output. To tackle the combinatorial complexity of step selection, we combine Simulated Annealing with deterministic Hill Climbing. This offline search identifies high-quality cache policies within minutes and introduces no online search or thresholding overhead during inference. When the compute budget is very tight, we further introduce cache-aware schedule alignment, which adapts the time discretization to the selected cache policy to reduce cache-induced trajectory mismatch. Experiments on FLUX.1-dev and Wan2.1 show that BudCache achieves better generation quality than heuristic caching baselines under the same inference budgets. Code is available at https://github.com/Westlake-AGI-Lab/BudCache