Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-12

Thermodynamic assessment of machine learning models for solid-state synthesis prediction

arXiv:2602.04075v2 Announce Type: replace-cross Abstract: Machine learning models have recently emerged to predict whether hypothetical solid-state materials can be synthesized. These models aim to circumvent direct first-principles modeling of solid-state phase transformations, instead learning from large databases of successfully synthesized materials. Here, we assess the alignment of several recently introduced synthesis prediction models with material and reaction thermodynamics, quantified by the energy with respect to the convex hull and a metric accounting for thermodynamic selectivity of enumerated synthesis reactions. A dataset of successful synthesis recipes was used to determine the likely bounds on both quantities beyond which materials can be deemed unlikely to be synthesized. With these bounds as context, thermodynamic quantities were computed using the CHGNet foundation potential for thousands of new hypothetical materials generated using the Chemeleon generative model. Four recently published machine learning models for synthesizability prediction were applied to this same dataset, and the resultant predictions were considered against computed thermodynamics. We find these models generally overpredict the likelihood of synthesis, but some model scores do trend with thermodynamic heuristics, assigning lower scores to materials that are less stable or do not have an available synthesis recipe that is calculated to be thermodynamically selective. In total, this work identifies existing gaps in machine learning models for materials synthesis and introduces a new approach to assess their quality in the absence of extensive negative examples (failed syntheses).

02.
arXiv (CS.AI) 2026-06-15

Scalable Production Scheduling: Linear Complexity via Unified Homogeneous Graphs

arXiv:2604.23841v2 Announce Type: replace-cross Abstract: Efficiently solving the Job Shop Scheduling Problem in real-world industrial applications requires policies that are both computationally lean and topologically robust. While Reinforcement Learning has shown potential in automating dispatching rules, existing models often struggle with a scalability bottleneck caused by quadratic graph complexity or the architectural overhead of heterogeneous layers. We introduce a unified graph framework that employs feature-based homogenization to project distinct node roles into a shared latent space. This allows a standard homogeneous Graph Isomorphism Network to capture complex resource contention with linear complexity, ensuring low-latency inference for large-scale industrial applications. Our empirical results demonstrate that our framework achieves state-of-the-art performance while exhibiting consistent zero-shot generalization. We identify the job-to-machine ratio as the primary driver of policy effectiveness, rather than absolute problem size. Based on this, we propose a hypothesis of structural saturation, demonstrating that policies trained on critically congested instances ($\mathcal{J} \approx \mathcal{M}$) learn scale-invariant resolution strategies. Agents trained at this saturation point internalize invariant conflict-resolution logic, allowing them to treat massive rectangular instances as a sequential concatenation of saturated sub-problems. This approach eliminates the need for expensive scale-specific retraining and prevents overfitting to statistical shortcuts, providing a robust and efficient pathway for deploying RL solutions in dynamic production environments.

03.
arXiv (CS.LG) 2026-06-11

Mitigating Disparate Impact of Differentially Private Learning through Bounded Adaptive Clipping

arXiv:2506.01396v2 Announce Type: replace Abstract: Differential privacy (DP) has become an essential framework for privacy-preserving machine learning. Existing DP learning methods, however, often have disparate impacts on model predictions, e.g., for minority groups. Gradient clipping, which is often used in DP learning, can suppress larger gradients from challenging samples. We show that this problem is amplified by adaptive clipping, which will often shrink the clipping bound to tiny values to match a well-fitting majority, while significantly reducing the accuracy for others. We propose bounded adaptive clipping, which introduces a tunable lower bound to prevent excessive gradient suppression. Our method improves worst-class accuracy by over 10 percentage points on Skewed and Fashion MNIST compared to unbounded adaptive clipping, 7 points compared to Automatic clipping, and 5 points compared to constant clipping. The code is available at https://github.com/TrustworthyMLHelsinki/adaptive-clipping-fairness.

04.
bioRxiv (Bioinfo) 2026-06-16

Super Learner Ensemble Modeling of CPTAC Proteomic Data for Survival Prediction in Head and Neck Squamous Cell Carcinoma

Survival analysis in head and neck squamous cell carcinoma (HNSCC) is traditionally performed using Cox proportional hazards models, alongside some exploration into black-box machine learning methods. The Super Learner (SL) algorithm addresses this model selection dilemma by combining diverse candidate algorithms into a weighted ensemble to perform comparably to the best candidate method. This study evaluates the performance of SL in HNSCC. Proteomic features as well as clinical covariates from 96 CPTAC HNSCC samples were modeled with three candidate algorithms (Cox LASSO, Cox Ridge, and Random Survival Forest) as well as the ensemble SL method. Models were optimized via Uno's time-dependent Concordance Index (C-index) and tested at 1- and 3-year time horizons using 2000 bootstrap resamples. The Cox Ridge regression model achieved the highest predictive accuracy among the four total methods. However, the SL demonstrated stable performance over both time horizons (1-year C-index: 0.985; 3-year C-index: 0.960). Variable importance analysis of the Cox Ridge model successfully identified malignant proteins (ATR, MAML1, MIEN1) alongside novel potential prognostic indicators (ZNF800, KERA). This analysis emphasizes the statistical necessity for larger cohorts for ensemble learning, while providing a benchmark of proteomic indicators in HNSCC.

05.
arXiv (CS.CL) 2026-06-18

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Memory management is essential for LLM agents in long-term interactions. Current memory frameworks typically treat agents as passive ``recorders'' and retrieve information without understanding its deeper implications. They may fail in scenarios requiring reasoning and complex decision-making. To bridge this critical gap, we propose a novel actionable memory framework called ActMem that integrates memory retrieval with active causal reasoning. ActMem transforms unstructured dialogue history into a structured causal and semantic graph. By leveraging counterfactual reasoning and commonsense completion, it enables agents to deduce implicit constraints and resolve potential conflicts between past states and current intentions. Furthermore, we introduce a comprehensive dataset ActMemEval to evaluate agent reasoning capabilities in logic-driven scenarios, moving beyond the fact-retrieval focus of existing memory benchmarks. Experiments demonstrate that ActMem significantly outperforms baselines in handling complex, memory-dependent tasks, paving the way for more consistent and reliable intelligent assistants.

06.
arXiv (CS.AI) 2026-06-11

A Survey on Evaluating Quality and Trustworthiness in LLM-Generated Data

arXiv:2601.17717v3 Announce Type: replace Abstract: Large Language Models (LLMs) have emerged as powerful tools for generating data across various modalities. By transforming data from a scarce resource into a controllable asset, LLMs mitigate the bottlenecks imposed by the acquisition costs of real-world data for model training, evaluation, and system iteration. However, ensuring the high quality of LLM-generated synthetic data remains a critical challenge. Existing research primarily focuses on generation methodologies, with limited direct attention to the quality of the resulting data. Furthermore, most studies are restricted to single modalities, lacking a unified perspective across different data types. To bridge this gap, we propose the LLM Data Auditor framework. In this framework, we first describe how LLMs are utilized to generate data across six distinct modalities. More importantly, we systematically categorize intrinsic metrics for evaluating synthetic data from two dimensions: quality and trustworthiness. This approach shifts the focus from extrinsic evaluation, which relies on downstream task performance, to the inherent properties of the data itself. Using this evaluation system, we analyze the experimental evaluations of representative generation methods for each modality and identify substantial deficiencies in current evaluation practices. Based on these findings, we offer concrete recommendations for the community to improve the evaluation of data generation. Finally, the framework outlines methodologies for the practical application of synthetic data across different modalities.

07.
medRxiv (Medicine) 2026-06-17

County Year Informatics Model for Annual and Cumulative Unique Lung Cancer Screening Eligibility in Maryland, 2026 to 2045

Purpose: Population-level lung cancer screening programs require denominators that reflect age, smoking history, geography, and changing eligibility over time. We estimated annual prevalent and 20-year cumulative unique low-dose computed tomography screening eligibility for Maryland residents under alternative screening criteria. Methods: We built a deterministic cohort-cell stock-flow simulation using Maryland county-equivalent jurisdiction projections by age, sex, and race/ethnicity, with ACS socioeconomic/nativity covariates and smoking-history priors for ever-smoked status, pack-years, and quit-years. Scenarios included USPSTF 2013 legacy, USPSTF 2021, ACS 2023/2024, a risk-model-expanded sensitivity, and ever-smoked-only capacity stress tests. Cumulative unique eligibility counted people once at first eligibility rather than summing annual prevalent person-years. Results: Under USPSTF 2021, an estimated 238,346 Maryland residents were eligible in 2026 and 245,326 in 2045. The 20-year cumulative unique denominator was 768,668, whereas naively summing annual prevalent counts produced 4,850,735 person-years, a 6.31-fold overcount. ACS 2023/2024 expanded annual eligibility to 314,616 in 2026 and cumulative unique eligibility to 902,796 by adding remote former smokers. Ever-smoked-only adult eligibility was 1,957,699 in 2026 and 3,383,683 cumulative unique over 20 years. Conclusion: A Maryland statewide screening initiative should plan from cumulative unique eligibility and county-equivalent jurisdiction-specific burden rather than annual prevalence alone. Explicit pack-year and quit-year modeling materially changes statewide and county allocation compared with current-smoking proxy models.

08.
arXiv (CS.LG) 2026-06-16

Biarchetype analysis for univariate functional data. An application to macroeconomic financial time series

arXiv:2606.15881v1 Announce Type: cross Abstract: We introduce biarchetype analysis for the first time in the context of univariate functional data. This unsupervised methodology extends archetype analysis by simultaneously identifying archetypal structures across both the cases (countries, in our application) and the temporal argument. Both cases and time points are expressed as mixtures of biarchetypes, yielding a concise and highly interpretable representation of complex functional observations. Although biarchetype analysis is not intended as a clustering technique, it offers superior interpretability compared with biclustering approaches, as it is based on extreme, representative patterns rather than average centroids, thereby enhancing human comprehension. We apply the proposed method to 10-year government bond yields of European countries over the period 2001-2025. The results identify three distinct time regimes (the pre-crisis period, the euro-area sovereign debt crisis, and the post-crisis period), and reveal Germany, Greece, and Hungary as country archetypes.

09.
arXiv (quant-ph) 2026-06-19

Spatial Localization of Relativistic Quantum Systems: The Commutativity Requirement and the Locality Principle. Part II: A Model from Local QFT

arXiv:2604.04173v3 Announce Type: replace-cross Abstract: This paper is the second and final part of a two-part study. We construct positive-energy relativistic spatial localization observables in Minkowski spacetime within standard quantum field theory, using the stress–energy–momentum tensor smeared with suitable test functions. For each fixed timelike direction, the construction gives positive operator-valued measures (POVMs) on spacelike hypersurfaces, well defined on every $n$-particle sector and satisfying a relativistic causality condition excluding superluminal propagation of detection probabilities. The observables are built from local or quasi-local field-theoretic quantities, thus providing a rigorous version of earlier heuristic proposals. In the one-particle sector, the construction reduces to the observable previously introduced by the author, and its first moment gives the Newton–Wigner position operator under appropriate normalization and centering assumptions. Because the Reeh–Schlieder theorem prevents the normally ordered stress–energy–momentum tensor from being positive on the full Fock space, we use quantum energy inequalities to obtain lower bounds controlling deviations from positivity. This leads to regularized operator families, bounded from below, which approximate the localization effects. Finally, we define conditional localization observables for finite laboratories through modified local energy operators. By Haag duality, the corresponding conditional POVMs belong to local von Neumann algebras and commute for causally separated regions, in accordance with the Araki–Haag–Kastler framework. The results show how commutativity of localization observables is recovered for conditional measurements in finite spacetime regions.

10.
arXiv (CS.CL) 2026-06-16

Evaluative Judgement in Teaching AI-based Translation: A Class-room Case Study of AI-Mediated Translation and Post-Editing

作者:

Drawing on 23 anonymized student pro-jects from a fourth-year Machine Transla-tion and Post-editing course in a BA-level translation programme, this paper exam-ines how structured comparison of gen-eral-purpose LLMs and online MT sys-tems can elicit evaluative judgement in AI-mediated translation. Students translat-ed short specialised English Wikipedia texts into Catalan or Spanish, generated four system outputs, evaluated them using automatic metrics and human adequa-cy/fluency assessment, selected one output for post-editing, and justified their deci-sion in written reports. Descriptive counts are reported for all 23 projects, while qualitative interpretation is based on the 22 cases accompanied by written reports. Results show that students did not treat automatic metrics as final authority: final post-editing selections often diverged from metric rankings and were justified through adequacy, fluency, terminology, naturalness, and expected post-editing ef-fort. The study therefore does not bench-mark systems under controlled conditions; it analyses how students justified system choice within an authentic classroom as-signment.

11.
arXiv (quant-ph) 2026-06-17

Hybrid Ferromagnet-SNSPDs: Single photon induced order-to-disorder transition in ferromagnets coupled to thin film superconductors

arXiv:2606.17177v1 Announce Type: cross Abstract: The development of midwave and longwave infrared single photon detectors is crucial for their emerging applications in spectroscopy, remote sensing, exoplanet detection, and free space quantum communications. However, existing sensors need to be operated at extremely low temperatures (0.08-0.9K) to reduce dark noise and hence require the use of advanced cryogenics such as dilution refrigerators or $^3$He cryogens, significantly limiting applications. Here we propose a vortex-engineering approach based on a hybrid phase transition in a ferromagnet/superconductor bilayer to increase the operating temperature of infrared single photon detectors up to 3.75K. We show that the introduction of a ferromagnetic layer produces a local magnetic field which impedes vortex crossing in the superconductor, reducing dark noise. When a single photon is incident, the photon-induced hotspot causes an order-to-disorder transition in the ferromagnet, leading to a vortex-induced phase transition in the superconducting layer. By engineering the ferromagnet's Curie temperature to be close to the device's operating temperature, single photon sensitivity can be achieved at increased operating temperatures. We predict at midwave/longwave infrared wavelengths (3-14$\mu$m) the operating temperature can be raised to 3.25-3.75K, enabling significantly simpler cooling systems.

12.
medRxiv (Medicine) 2026-06-15

Fanconi Anemia as a Window into Premalignant Field Cancerization of the Oral Mucosa

Head and neck squamous cell carcinoma (HNSCC) evolves through stepwise clonal expansion within genetically altered mucosa fields, yet actionable biomarkers remain undefined. Leveraging Fanconi anemia (FA), a cancer predisposition syndrome with extreme HNSCC risk due to defective DNA interstrand crosslink repair, we profiled premalignant changes in the oral cavity using noninvasive brush biopsies. Consistent with our prior demonstration of genomic instability in FA-associated SCCs, we detected pathogenic TP53 variants in 26% and copy number alterations in 60.5% in clinically normal-appearing oral mucosa of individuals with FA. These subclinical clonal expansions define candidate biomarkers of early clonal evolution amenable to serial sampling for risk stratification and prevention studies. Since FA-associated SCCs share genomic features with sporadic HNSCC, these findings may extend to the broader population. We also identify somatic reversion of a pathogenic FANCB variant, providing evidence of genomic self-correction and suggesting a potential avenue for gene-based cancer prevention in FA.

13.
arXiv (CS.CV) 2026-06-17

EmbodiTTA: Resource-Efficient Test-Time Adaptation for Embodied Visual Systems

Continual Test-time adaptation (CTTA) continuously adapts the deployed model on every incoming batch of data. While achieving optimal accuracy, existing CTTA approaches present poor real-world applicability on resource-constrained edge devices, due to the substantial memory overhead and energy consumption. In this work, we first introduce a novel paradigm – on-demand TTA – which triggers adaptation only when a significant domain shift is detected. Then, we present OD-TTA, an on-demand TTA framework for accurate and efficient adaptation on edge devices. OD-TTA comprises three innovative techniques: 1) a lightweight domain shift detection mechanism to activate TTA only when it is needed, drastically reducing the overall computation overhead, 2) a source domain selection module that chooses an appropriate source model for adaptation, ensuring high and robust accuracy, 3) a decoupled Batch Normalization (BN) update scheme to enable memory-efficient adaptation with small batch sizes. Extensive experiments show that OD-TTA achieves comparable and even better performance while reducing the energy and computation overhead remarkably, making TTA a practical reality.

15.
arXiv (CS.AI) 2026-06-19

One Probe Won't Catch Them All: Towards Targeted Deception Detection

arXiv:2602.01425v2 Announce Type: replace Abstract: Linear probes are a promising approach for monitoring AI systems for deceptive behaviour. Previous work has shown that a linear classifier trained on a contrastive instruction pair and a simple dataset can achieve good performance. However, these probes exhibit notable failures even in straightforward scenarios, including spurious correlations and false positives on non-deceptive responses. In this paper, we demonstrate that deception detection is inherently heterogeneous: while a single universal probe achieves modest improvements (+0.032 AUC), post-hoc oracle analysis reveals substantially higher potential (+0.108 AUC) when probes are matched to specific deception types, and synthetic validation experiments suggest this ceiling is achievable a priori when the deception type is known in advance. Our findings reveal that instruction pairs capture deceptive intent rather than content-specific patterns, explaining why prompt choice dominates probe performance (70.6% of variance). Given this heterogeneity, we conclude that organizations should define their specific threat models and deploy appropriately matched probes rather than seeking a universal deception detector.

16.
arXiv (quant-ph) 2026-06-19

Operator Learning for efficient Quantum Computation

arXiv:2606.20184v1 Announce Type: new Abstract: An efficient implementation of quantum algorithms is often hindered by the lack of efficient primitives for operators and state preparation. This limits both the ability of near-term quantum hardware to simulate complex problems and the potential of fault-tolerant algorithms to achieve practical quantum advantage. To address this, we propose a full-stack variational framework that transforms arbitrary operators to compact quantum circuits. The resulting variational circuits can be tailored to the connectivity and long-range interaction of the target hardware. The learning process employs backpropagation together with a cost function that efficiently optimizes unitary operators and non-unitary – dense or sparse – operators using only a single ancilla qubit for block encoding. Additionally, we introduce a regularization term that reduces the approximation error. The approach is validated for both quantum mechanical and engineering applications. In the former case, we learn propagators that arise in native quantum problems – such as quantum simulation and quantum chemistry – and achieve improved resource scaling in comparison to standard Suzuki-Trotter expansions. In the latter case, we demonstrate the approach's ability to implement the second-order central finite difference approximation of the Laplace operator – relevant for solving partial differential equations – while improving upon current error metrics. The final example deals with learning a dense, non-unitary operator that arises in the analysis of inviscid potential flow around an airfoil. This universality of the framework opens the door for solving general problems beyond prototypical engineering and quantum applications.

17.
arXiv (CS.LG) 2026-06-15

Time Series Causal Discovery via Context-Conditioned and Causality-Augmented Pretraining

arXiv:2605.26759v2 Announce Type: replace Abstract: Causal discovery from time series is critical for many real-world applications, such as tracing the root causes of anomalies. Existing approaches typically rely on dataset-specific optimization, making it difficult to transfer their causal discovery capabilities to new time series governed by diverse causal mechanisms. In this paper, we propose PTCD, a novel Pretraining framework for Time-series Causal Discovery, which improves cross-task generalization through context-conditioned modeling and transferable causal augmentation. To model complex temporal causal dependencies, PTCD employs a dual-scale iterative attention mechanism to capture window-level causal relationships, and a Gaussian mixture with a context-level routing mechanism to handle heterogeneous exogenous distributions. To further address distribution shifts across causal graphs, PTCD adopts a pretraining paradigm on synthetic datasets that integrates intervention-based learning and a causal mixup strategy, promoting stable causal discovery and stronger generalization. Extensive experiments on multiple real-world out-of-distribution (OOD) datasets demonstrate that PTCD excels in both causal discovery and root cause identification.

18.
arXiv (CS.CL) 2026-06-15

Token-Level LLM Collaboration via FusionRoute

Large language models (LLMs) exhibit strengths across diverse domains. However, achieving strong performance across these domains with a single general-purpose model typically requires scaling to sizes that are prohibitively expensive to train and deploy. On the other hand, while smaller domain-specialized models are much more efficient, they struggle to generalize beyond their training distributions. To address this dilemma, we propose FusionRoute, a robust and effective token-level multi-LLM collaboration framework in which a lightweight router simultaneously (i) selects the most suitable expert at each decoding step and (ii) contributes a complementary logit that refines or corrects the selected expert's next-token distribution via logit addition. Unlike existing token-level collaboration methods that rely solely on fixed expert outputs, we provide a theoretical analysis showing that pure expert-only routing is fundamentally limited: unless strong global coverage assumptions hold, it cannot in general realize the optimal decoding policy. By augmenting expert selection with a trainable complementary generator, FusionRoute expands the effective policy class and enables recovery of optimal value functions under mild conditions. Empirically, across both Llama-3 and Gemma-2 families and diverse benchmarks spanning mathematical reasoning, code generation, and instruction following, FusionRoute outperforms both sequence- and token-level collaboration, model merging, and direct fine-tuning, while remaining competitive with domain experts on their respective tasks.

19.
arXiv (CS.CV) 2026-06-19

U$^2$Mamba: A Two-level Nested U-structure Mamba for Salient Object Detection

Mamba-based models have emerged as a promising alternative for salient object detection (SOD), offering significant advantages in modeling long sequences. However, existing models often fail to explore contextual information and the depth of the entire architecture. This paper introduces U$^2$Mamba, a powerful and innovative U-structured network for salient object detection. We propose multiscale Mamba U-blocks (MMUBs) that enhance the model depth to improve local feature extraction capabilities. Our newly developed nested U-structure, incorporating MMUBs, enables the network to integrate various receptive fields from shallow and deep layers, thereby collecting richer contextual information and longer-range data without being constrained by resolution. Instead of using the traditional deep supervision scheme and top-level supervised training, we propose a hierarchical training supervision method where the loss is computed at each level during the training process. Extensive experiments demonstrate that U$^2$Mamba achieves highly competitive performance against state-of-the-art methods. The source code is available at \url{https://github.com/JL021/U2Mamba}.

20.
arXiv (CS.AI) 2026-06-11

MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios

arXiv:2602.22638v2 Announce Type: replace Abstract: Route-planning agents powered by large language models (LLMs) have emerged as a promising paradigm for supporting everyday human mobility through natural language interaction and tool-mediated decision making. However, systematic evaluation in real-world mobility settings is hindered by diverse routing demands, non-deterministic mapping services, and limited reproducibility. In this study, we introduce MobilityBench, a scalable benchmark for evaluating LLM-based route-planning agents in real-world mobility scenarios. MobilityBench is constructed from large-scale, anonymized real user queries collected from Amap and covers a broad spectrum of route-planning intents across multiple cities worldwide. To enable reproducible, end-to-end evaluation, we design a deterministic API-replay sandbox that eliminates environmental variance from live services. We further propose a multi-dimensional evaluation protocol centered on outcome validity, complemented by assessments of instruction understanding, planning, tool use, and efficiency. Using MobilityBench, we evaluate multiple LLM-based route-planning agents across diverse real-world mobility scenarios and provide an in-depth analysis of their behaviors and performance. Our findings reveal that current models perform competently on Basic information retrieval and Route Planning tasks, yet struggle considerably with Preference-Constrained Route Planning, underscoring significant room for improvement in personalized mobility applications. We publicly release the benchmark data, evaluation toolkit, and documentation at https://github.com/AMAP-ML/MobilityBench.

21.
arXiv (CS.CV) 2026-06-16

Text-Vision Co-Instructed Image Editing

Existing image editing methods can be generally categorized into textual instruction-based and visual prompt-based ones. Textual instructions are semantically expressive, but are limited by the coarse granularity of spatial control of the editing results. In contrast, visual prompts such as drag and point can provide precise spatial guidance, but are limited by the inherent ambiguity in semantic intent. To unify the strength of textual and visual prompts, we present Text-Vision Co-Instructed Image Editing, which jointly models textual instructions as semantic intent and sparse visual instructions as spatial guidance, aiming to achieve precise and intent-faithful image manipulation. To this end, we first construct a textual-visual instruction paired dataset with more than 23K samples derived from dynamic videos, enabling aligned supervision for cross-modal instruction. We then propose TV-Edit, a Textual-Visual instruction unified Editing framework to contextualize drag or point-based visual instructions with image-text semantics and lift them into semantic-aware control representations for pretrained editing backbones. By integrating semantic intent and spatial constraints, TV-Edit leads to more precise spatial control, less instruction ambiguity, and stronger structural consistency than text-only or drag-based alternatives. Finally, we establish TV-Edit-Bench, a deliberately designed benchmark to evaluate semantic faithfulness, spatial alignment, and visual consistency with ground-truth references and controlled textual-visual variations for reliable assessment. Our experiments across multiple editing backbones demonstrate that TV-Edit consistently yields more precise and intent-faithful edits, significantly outperforming state-of-the-art instruction-based and drag-based baselines.

22.
arXiv (CS.LG) 2026-06-11

From inverse problems to neural operators: prediction, mechanism, and generalization of data-driven models

作者:

arXiv:2606.08956v2 Announce Type: replace Abstract: Scientists have historically relied on mathematical models based on differential equations to relate system inputs – forces, fluxes, or heat sources – to outputs, such as displacement, velocity, concentration, and temperature. These models rely on deep domain knowledge to determine the form of the governing differential equation, which is then calibrated with data by solving an inverse problem. In recent years, the field of Scientific Machine Learning has introduced a variety of alternative modeling strategies for physical systems. A method called Sparse Identification of Nonlinear Dynamics learns the governing equation as a sparse linear combination of terms in a user-defined library. Neural Ordinary Differential Equations construct the governing equation by taking in the state and its derivatives at the input layer of a neural network. Entirely foregoing the modeling framework of differential equations, neural operators directly learn a non-linear mapping between the system inputs and outputs. From inverse problems to neural operators, all of these modeling strategies can be conceptualized as data-driven machinery to predict a system's response over a range of inputs. It is then natural to wonder how exactly these various strategies relate to each other, and whether they can be neatly taxonomized. Drawing from the philosophical literature on scientific models, we argue that many model types have a common structure, differing only in the assumed model class of the input-output relation they define. Connecting to philosophical ideas on mechanism, and arguing that data from physical systems arises from solutions to parsimonious differential equations, we propose that only certain models are capable of mechanism discovery, and thus generalization. Our analysis is intended to unite apparently disparate modeling strategies and provide insight into their appropriate use cases.

23.
arXiv (CS.AI) 2026-06-16

On-Policy Distillation with Curriculum Turn-level Guidance for Multi-turn Agents

arXiv:2606.15912v1 Announce Type: cross Abstract: Multi-turn agents that plan, invoke tools, and interact with environments offer a promising paradigm for solving complex tasks, yet their capabilities typically rely on very large models whose inference cost is prohibitive in practice.On-Policy Distillation (OPD) is a natural recipe for transferring such capabilities to smaller students, but we find that it suffers a characteristic failure mode in this setting: small student errors compound across turns and push the trajectory out of the teacher's familiar state distribution, so the teacher's supervision becomes least reliable precisely where the student needs it most.We propose Guided On-Policy Distillation (Guided-OPD), a simple yet effective algorithm that mixes teacher- and student-generated turns within each rollout and schedules the teacher's intervention probability along a curriculum that decays to zero.Strong guidance keeps early trajectories close to the teacher distribution and is then gradually withdrawn to recover the purely on-policy regime used at inference.On ALFWorld, ScienceWorld, and WebShop, distilling Qwen3 students from a Qwen3-30B-A3B teacher, Guided-OPD improves Score by 21.1\% and Success Rate by 25.5\% over vanilla OPD on average, with larger gains on smaller students.

24.
arXiv (CS.AI) 2026-06-18

PSyGenTAB: A Privacy-Preserving Framework for Synthetic Clinical Tabular Data Generation via Constrained Optimization

arXiv:2606.18518v1 Announce Type: cross Abstract: The development of medical AI is constrained by limited access to high-quality clinical data due to institutional silos and strict privacy regulations such as HIPAA and GDPR. Synthetic data generation offers a potential solution, but existing methods lack principled mechanisms to explicitly manage the privacy-utility trade-off, often degrading clinically meaningful patterns or risking patient re-identification. We present PSyGenTAB, a privacy-preserving generative framework that formulates synthetic healthcare data generation as a constrained optimization problem solved using the Augmented Lagrangian Method. By embedding configurable privacy constraints directly into model training, PSyGenTAB enforces minimum privacy thresholds while maximizing clinical data utility. Across multiple clinically motivated benchmarks, PSyGenTAB preserves inter-feature clinical relationships and minority-class diagnostic patterns essential for reliable health AI. Downstream evaluation using Train-on-Synthetic, Test-on-Real and Train-on-Real, Test-on-Synthetic protocols shows that models trained on synthetic data achieve performance comparable to those trained on real patient records. Privacy auditing further demonstrates reduced exact record reproduction and strong resilience to membership inference attacks. These results establish PSyGenTAB as a principled framework for balancing privacy protection and clinical utility in synthetic healthcare data, supporting secure cross-institutional AI development.

25.
arXiv (CS.CV) 2026-06-17

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization

Group Relative Policy Optimization has emerged as essential for aligning video diffusion models with human preferences, but faces a critical computational bottleneck: training a 14B parametered model typically demands hundreds of GPU days per experiment. Existing efficiency methods reduce costs through sliding window subsampling training timesteps, but fundamentally compromise optimization, exhibiting severe instability and failing to reach full trajectory performance. We present Flash-GRPO, a single-step training framework that outperforms full trajectory training in alignment quality under low computational budgets while substantially improving training efficiency. Flash-GRPO addresses two critical challenges: iso-temporal grouping eliminates timestep-confounded variance by enforcing prompt-wise temporal consistency, decoupling policy performance from timestep difficulty; temporal gradient rectification neutralizes the time-dependent scaling factor that causes vastly inconsistent gradient magnitudes across timesteps. Experiments on 1.3B to 14B parameter models validate Flash-GRPO's effectiveness, demonstrating substantial training acceleration with consistent stability and state-of-the-art alignment quality.