Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-16

Intrinsic Computational Functionalism and Simulated Consciousness

arXiv:2606.15348v1 Announce Type: cross Abstract: A common objection to artificial or simulated consciousness is that a simulated brain is no more conscious than simulated water is wet. We address this from the perspective of Intrinsic Computational Functionalism (ICF): if consciousness is computationally constituted, it depends not on externally imposed descriptions but on the computational structures a system physically realizes in virtue of its own causal-dynamical organization. In previous work we developed Canonical Functionalism as a mathematically precise special case of this anti-interpretivist program, identifying functional states by their complete future input-output roles under a fixed interface. Here we argue that this input-output construction, though important, is incomplete: as a behavioral boundary case of ICF, it makes lookup tables and unfolded systems that preserve the same boundary behavior canonically equivalent. A consciousness-relevant canonical representation must instead include internal mechanisms, interventions, and joint readouts belonging to the relevant intrinsic organization. We therefore define a mechanism-enriched canonical structure and use it to formulate Intrinsic Causal-Computational Realization (ICCR), a realization relation preserving physical implementation, intrinsic state individuation, transition structure, intervention profiles, and the relevant agent-body-world boundary. The central result is conditional: if conscious properties are invariants of intrinsic causal-computational organization, then any system satisfying ICCR realizes the same consciousness-relevant properties, whether biological, artificial, or simulated. We discuss objections including biological naturalism and integrated information theory. We conclude that to deny consciousness to a simulation, one must identify a consciousness-relevant intrinsic causal-computational structure that the simulation fails to realize.

02.
arXiv (CS.CL) 2026-06-17

Rethinking Groups in Critic-Free RLVR

Reinforcement learning (RL) has become a central paradigm for post-training large language models. Existing critic-free RL methods typically generate a group of rollouts for the same question to estimate value baselines for advantage computation. However, this design suffers from data inefficiency, group synchronization barriers, and inflexibility with structured rollouts. In this work, we revisit the role of the ``group'' and show that its underlying function is not merely to estimate baselines but to prevent false penalties on negative samples. Building on this insight, we propose negative token filtering, a simple and effective strategy that enables stable single-rollout training. We apply it to two batch-level advantage methods, achieving comparable performance on reasoning tasks and stronger performance on agentic tasks relative to group-based RL techniques.

03.
arXiv (CS.CV) 2026-06-11

Spatially Selective Self-Training for Unsupervised Building Change Detection

Unsupervised building change detection aims to learn building-change masks from unlabeled bi-temporal remote sensing images. Existing label-free methods often follow a discrepancy-to-mask paradigm, directly using temporal differences, frozen foundation-model responses, prompt-based outputs, or post-processing results as final change maps. Although these strategies provide annotation-free cues, they do not learn a task-specific building-change detector and remain vulnerable to the gap between generic temporal discrepancies and building-defined structural changes. In practice, such discrepancies are often noisy and task-irrelevant, as appearance shifts, registration errors, and non-building modifications can produce strong but misleading responses. To address this problem, we propose SST-CD, a spatially selective self-training framework that reformulates fully label-free building change detection as end-to-end detector learning under noisy pseudo supervision. SST-CD uses temporal discrepancies as candidate pseudo labels and trains the detector only on spatially reliable pixels, whose reliability is estimated by a local consistency criterion that filters inconsistent regions from supervision. To further stabilize noisy self-training, a lightweight feature adapter recalibrates bi-temporal features, while a prototype-based decoder produces compact change and no-change representations. Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show that SST-CD achieves F1 scores of 83.08%, 91.69%, and 86.60%, respectively, outperforming existing unsupervised and label-free baselines.

05.
arXiv (CS.AI) 2026-06-17

Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting

arXiv:2606.17996v1 Announce Type: cross Abstract: Cyclicity and trend are important components of time series data and many studies based on cyclicity and trend have achieved good results in long-term time series forecasting. However, we believe that current work neglects the influence of real-world inter-channel correlations in time series data which leads to suboptimal predictions. Furthermore, these models rely on complex designs to capture diverse information so that resulting in low computational efficiency. To address this challenge, we propose McWC, a long-term time series forecasting model that separately models the cyclicity, trend, and inter-channel correlations. Specifically, McWC first decouples cyclical information from data using a multi-layer cyclicity construction module. Then, it extracts inter-channel correlations using multi-layer perceptron. Next, it models and fuses the multi-layer high-frequency and low-frequency information from data using a multi-level wavelet decomposition module. Finally, it aggregates the results of different components to obtain the output. Simultaneously, we decouple intra-channel autocorrelations by calculating a loss function in the frequency domain. Experiments on six real-world datasets demonstrate that McWC achieves state-of-the-art performance, exhibiting excellent computational efficiency and historical information extraction capabilities.

06.
arXiv (CS.LG) 2026-06-12

Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

arXiv:2606.13605v1 Announce Type: cross Abstract: This paper presents a distribution-agnostic robust trajectory-optimization framework based on chance-constrained reinforcement learning. The uncertainty is represented here through initial conditions and process noise, with the only requirement being that it can be sampled. A deterministic nominal trajectory is first computed offline, and reinforcement learning is then used only to robustify that baseline through a structured affine closed-loop correction law comprising a feedforward control adjustment and time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, while terminal dispersion is regulated through covariance-feasibility penalties. The framework is assessed on two materially different trajectory design problems. The flagship case study is a three-dimensional multi-impulse Earth-Mars transfer, where the learned policy is benchmarked against a recent robust trajectory-optimization reference under Gaussian uncertainty and then evaluated under bounded uniform uncertainty and under process disturbances not seen during training. The second case study is a stochastic atmospheric pinpoint rocket landing problem, used to assess portability to a short-horizon continuous-thrust setting with drag, mass depletion, and glide-slope constraints. The results show that the proposed framework can remain competitive in upper-tail fuel cost while preserving probabilistic feasibility, and that the same robustification scaffold can be carried across heterogeneous spacecraft trajectory planning problems without redesign of its core stochastic-control structure.

07.
arXiv (CS.CV) 2026-06-15

UniversalRAG: Retrieval-Augmented Generation over Corpora of Diverse Modalities and Granularities

Retrieval-Augmented Generation (RAG) has shown substantial promise in improving factual accuracy by grounding model responses with external knowledge relevant to queries. However, most existing approaches are limited to a text-only corpus, and while recent efforts have extended RAG to other modalities such as images and videos, they typically operate over a single modality-specific corpus. In contrast, real-world queries vary widely in the type of knowledge they require, which a single type of knowledge source cannot address. To address this, we introduce UniversalRAG, an any-to-any RAG framework designed to retrieve and integrate knowledge from heterogeneous sources with diverse modalities and granularities. Specifically, motivated by the observation that forcing all modalities into a unified representation space derived from a single aggregated corpus causes a modality gap, where the retrieval tends to favor items from the same modality as the query, we propose modality-aware routing, which dynamically identifies the most appropriate modality-specific corpus and performs targeted retrieval within it, and further justify its effectiveness with a theoretical analysis. Moreover, beyond modality, we organize each modality into multiple granularity levels, enabling fine-tuned retrieval tailored to the complexity and scope of the query. We validate UniversalRAG on 10 benchmarks of multiple modalities, showing its superiority over various modality-specific and unified baselines.

08.
arXiv (quant-ph) 2026-06-16

Information Is Not Physical: Possibility Spaces, Erasure, and the Structure of Unrealized Alternatives

arXiv:2606.15120v1 Announce Type: cross Abstract: The slogan ``information is physical,'' introduced by Rolf Landauer and developed through quantum information theory and black-hole thermodynamics, has achieved near-axiomatic status in modern physics. Yet the ontological status of information remains surprisingly underexamined: most discussions either reduce information to a form of energy or treat it as a purely mathematical object. This paper proposes a third position. I argue that information is neither a physical substance nor a free-floating abstraction, but rather the structure of physically realizable alternatives – a counterfactual structure that a physical system instantiates in virtue of the possibility space available to it. Building on Shannon's combinatorial definition, the Landauer principle, the no-cloning theorem, and the black-hole information paradox, I show that the informational content of any physical event is constituted by the set of outcomes that could have occurred but did not. This counterfactual reading dissolves several persistent confusions: it explains why erasing information dissipates heat without making information ``material,'' why quantum superposition is informationally richer than any classical mixture, and why information loss in black holes is physically significant beyond mere bookkeeping. The proposal sits within a structural-realist framework but departs from standard structural realism by locating the relevant structure in modal, not merely actual, relations. I conclude by sketching implications for the foundations of quantum mechanics, quantum gravity, and scientific ontology more broadly.

09.
arXiv (CS.LG) 2026-06-15

Side-Channel Attacks Bypass Protection in 3D Printers

arXiv:2606.13952v1 Announce Type: cross Abstract: Active Motor Noise Cancellation (AMNC) ships in commercial fused deposition modeling (FDM) 3D printers as a hardware countermeasure against acoustic side-channel attacks that target intellectual property (IP). We present the first empirical evaluation of a deployed AMNC countermeasure, using a public dataset of synchronized acoustic and vibration recordings from two AMNC-equipped Bambu Lab printers across 12 object classes. AMNC fully neutralizes the acoustic channel: classification accuracy is indistinguishable from the 8.33% random baseline. The vibration channel, which AMNC does not target, still leaks. With summary statistics the leak is coarse and amplitude-driven (vibration accuracy approximately 31% pooled, 36-47% within-printer), while the waveform shape carries essentially nothing (frequency-only features at chance). A full-sequence temporal model that ingests the ordered evolution of the print raises accuracy to approximately 61%, and an order-shuffling control (approximately 33%) shows that a substantial component is genuinely sequential and tied to print progression. The leak is device-specific: a classifier trained on one printer transfers near chance to the other. We conclude that AMNC is an acoustic-only defense: vibration remains a partial, geometry-correlated side channel it does not address, but one that does not, on this dataset, support full geometric reconstruction; reconstruction-grade attacks would require the magnetic or power channels AMNC also leaves untouched. We release all code.

10.
arXiv (CS.AI) 2026-06-16

Faster Completion, Less Learning: Generative AI Reduced Study Time on Math Problems and the Knowledge They Build

arXiv:2605.21629v2 Announce Type: replace-cross Abstract: How much have students' ordinary learning processes shifted in response to generative AI, and how does that affect their durable learning outcomes? Self-report surveys show little change, while small-scale behavioral studies report widespread AI use without the scale or duration to measure learning consequences. We address both questions using a ten-year panel of $3.2$ million ALEKS learning interactions for investigating time-on-task, complemented by ALEKS PPL placement-assessment data for examining proctoring and learning outcomes, with a quasi-experimental design exploiting variation in tasks that are more susceptible to AI (text-based word problems) and less susceptible to AI (interactive graph-based problems). Learning time on AI-susceptible problems declines $2.8\%$ per quarter among college students after ChatGPT's release, cumulating to $26.9\%$ over eleven quarters; high-schoolers show $31.3\%$, middle-schoolers $9.0\%$, and Grade 5 students no detectable change. Among college students, the post-ChatGPT divergence vanishes entirely under proctoring, ruling out broad efficiency gains as the likely explanation. Logistic fixed-effects models on randomly assigned proctored retention items yield a $25\%$ cumulative decline in odds of correct response; the same estimator on non-proctored assessment produces a large opposite-signed increase – inconsistent with any platform, cohort, or curriculum explanation. These results are among the first large-scale behavioral and outcome evidence that generative AI has altered how students study and the knowledge they build – the population-level indicator of cognitive surrender, with direct implications for educational research, assessment governance, and AI policy.

11.
arXiv (CS.AI) 2026-06-19

Co-policy: Responsive Human-Robot Co-Creation for Musical Performances

arXiv:2606.19914v1 Announce Type: cross Abstract: Art has long stood as a pivotal expression of human creativity. Embodied artificial intelligence offers a route for generative models to participate in that creativity through physical action rather than disembodied digital content. In robotic music co-creation, it is challenging to connect semantic musical understanding with real-time and physically executable performance. We present Co-policy, a framework for human-robot musical co-creation that separates semantic intent grounding, constrained musical variation, and visuomotor execution. To ground musical semantics, Co-policy uses pre-inference semantic anchors and a fine-tuned Qwen-vl planner (F-Qwen) to transform speech, live musical seeds, and visual observations into structured co-creation plans. To support low-latency execution, Co-policy introduces a Gaussian-Mixture Visuomotor Policy (GMP), implemented as a conditional mixture-density policy that maps target notes and visual context to multimodal robot actions in a single forward pass. Unlike robotic playback systems that merely reproduce user-specified notes, Co-policy generates complementary musical responses under both musical and physical constraints. Real-robot chime experiments, ablations, and expert evaluation show improved intent alignment, execution accuracy, and response frequency over diffusion-policy and ablated baselines, supporting physically grounded action generation as a key requirement for embodied human-AI co-creation.

12.
arXiv (CS.AI) 2026-06-11

GEAR-VLA: Learning Geometry-Aware Action Representations for Generalizable Robotic Manipulation

arXiv:2606.08530v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models achieve strong benchmark performance but still struggle in real-world deployment with unseen objects, background shifts, and different robot embodiments. We argue that this stems from the lack of a unified geometry-aware manipulation representation, leaving existing VLAs vulnerable to low-level trajectory supervision, misaligned 3D features, and embodiment differences. To address this, we propose GEAR-VLA, a VLA framework for learning unified geometry-aware action representations for generalizable robotic manipulation. GEAR-VLA adopts coarse-to-fine action learning, where multi-source embodied pretraining equips the VLM with embodied reasoning and discrete action understanding before latent action tokens connect action semantics to a gradient-decoupled DiT continuous action expert. It further performs semantic-aligned 3D integration by aligning a trainable 3D spatial backbone with the VLA representation while freezing the original VLM-aligned visual pathway. To share this representation across robots, GEAR-VLA uses embodiment canonicalization, where embodiment-aware states and embodiment-invariant actions confine robot differences to the low-level interface. Extensive simulation and real-world experiments demonstrate strong generalization: GEAR-VLA achieves state-of-the-art performance on LIBERO, zero-shot LIBERO-Plus, and RoboTwin 2.0, reaches 85.9% success on AgileX and 81.0% on the pretraining-unseen LDT-01 embodiment, and obtains 90.1% success on a 6,360-trial universal grasping benchmark with 212 unseen objects. Code and models will be released at https://github.com/babynabeauty/GEAR-VLA.

13.
arXiv (CS.CL) 2026-06-11

Language Shapes Mental Health Evaluations in Large Language Models

Multilingual large language models (LLMs) are increasingly used in socially sensitive mental health contexts, including support chatbots, screening, and content moderation. This raises a reliability question: do semantically equivalent mental health inputs elicit comparable evaluations across languages, or systematic shifts consistent with language-associated social and cultural contexts? We examine this question in an English-Chinese setting with GPT-4o and Qwen3-32B using a two-level framework: construct-level evaluative orientation, measured by psychometric stigma instruments, and decision-level behavior, measured by binary stigma detection and four-class depression severity classification. Across instruments and models, Chinese prompts elicit higher stigma-related scores than English prompts. At the decision level, Chinese prompts reduce sensitivity to stigmatizing content and produce more conservative depression severity judgments, leading to more under-estimation errors. These findings show that prompt language can shift both evaluative orientation and downstream behavior in LLM-based mental health evaluation. They highlight the need to evaluate multilingual LLMs not only for aggregate performance, but also for whether they apply comparable evaluative standards across languages in socially sensitive domains.

14.
arXiv (math.PR) 2026-06-11

Integrated expectile-based measures of inequality

arXiv:2606.12333v1 Announce Type: cross Abstract: Expectiles provide a class of asymmetric location functionals that incorporate the magnitude of deviations and admit a natural geometric interpretation. Building on their structural consistency with the convex stochastic order, this paper introduces a family of integrated expectile functionals for measuring risk, dispersion, and inequality. The proposed functionals admit analytical representations as integrals of expectiles across asymmetry levels. For a distinguished subclass of these constructions, a geometric representation is available: the resulting quantities can be expressed as weighted areas of star-shaped sets encoding the distributional asymmetry of a random variable. This approach yields a new class of expectile-based inequality indices, constituting a natural counterpart to classical Gini-type measures while preserving desirable monotonicity and consistency properties. Empirical counterparts are derived in closed form and admit explicit decompositions over finite samples. The framework extends naturally to multivariate settings through directional expectile constructions, leading to measures capable of capturing genuinely joint forms of multivariate dispersion and inequality.

15.
arXiv (CS.CL) 2026-06-16

Risk-Aware LLM Agents for Geospatial Data Retrieval: Design and Preliminary Adversarial Evaluation

We present an LLM-driven framework for retrieving remote sensing data from cloud-based geospatial catalogues using natural language queries. The system converts user intent into structured API calls, enabling efficient access to satellite imagery and environmental datasets. The architecture integrates three agents: Guardrail for safety and policy enforcement, General-QA for intent interpretation, and Recommender-Analyst for schema-aware API call generation. This coordinated design ensures reliable, semantically aligned interaction with external data services. The modular framework is portable across platforms through API schema substitution and supports applications in environmental monitoring, disaster response, and climate analysis. It establishes a scalable interface between user intent and geospatial infrastructure, enabling streamlined and automated Earth observation workflows. Preliminary experiments under adversarial multi-turn settings show that prompt-level safety instructions improve robustness, although rare high-impact failures persist in API manipulation scenarios and highlight the need for adaptive, system-level defenses that balance safety, usability, and cost efficiency, which motivates the use of our intercept-level Guardrail agent.

16.
arXiv (CS.LG) 2026-06-19

A Solver-Free Training Method for Predict-then-Optimize

arXiv:2606.19587v1 Announce Type: cross Abstract: We propose a scalable method for training prediction (machine learning) models in the predict-then-optimize paradigm, where model outputs serve as coefficients for a subsequent linear optimization task. Directly minimizing the empirical decision regret is intractable for linear programming and combinatorial optimization since the decision mapping is piecewise constant, and the gradients are zero almost everywhere. While existing methods address this by smoothing the differentiation process, they suffer from scalability issues, since a computationally expensive solver call is required for every gradient evaluation. To address this, we propose a decision-focused learning pipeline based on a measure transformation principle, which yields a new surrogate loss that is completely optimization-solver-free during training. We establish theoretical guarantees, including Fisher consistency and excess risk bounds. Empirically, our method achieves decision quality competitive with state-of-the-art methods while reducing training time by orders of magnitude.

17.
arXiv (math.PR) 2026-06-16

A Concavity Theorem for the Parisi PDE

作者:

arXiv:2606.15432v1 Announce Type: new Abstract: We prove that the map sending the diffusion profile to the solution of a time-changed Parisi PDE evaluated at time-space $(0,0)$ is concave. This result strengthens the raywise concavity result proven by Auffinger and Chen (2016). As an application, for the balanced multispecies Ising spin glasses, the lower bound of Bates and Sohn (2025) matches the Hopf-type upper bound given by the Hamilton–Jacobi framework developed by Mourrat, Chen and Xia.

18.
arXiv (CS.LG) 2026-06-16

A Bifurcation Theory Framework for Gradient Descent on the Edge of Stability

作者:

arXiv:2606.15551v1 Announce Type: new Abstract: The Edge of Stability (EoS) phenomenon, where gradient descent operates with sharpness exceeding the classical convergence threshold yet the loss decreases over long timescales, is ubiquitous in modern deep learning but remains poorly understood in realistic settings. Prior rigorous analyses have been largely confined to scalar or low-dimensional losses with specific structural forms. In this work, we develop a bifurcation theory framework for gradient descent on the edge of stability that applies directly to overparameterized neural networks. By decomposing the training dynamics into components normal and tangent to the manifold of minimizers, we show that stable EoS training arises from a flip bifurcation in the normal direction, governed by the sign of the first Lyapunov coefficient, while the tangent dynamics drift toward regions of decreasing sharpness. Under mild spectral and geometric assumptions on the loss landscape, we prove convergence to the minimizing manifold when training at the EoS threshold. As a corollary, we recover and unify prior results: we show that the product-stability condition of Gan (2026) is an instance of our framework.

19.
arXiv (CS.LG) 2026-06-19

An adaptive framework for the axisymmetric pulsar magnetosphere using physics-informed Kolmogorov-Arnold networks

arXiv:2606.10686v2 Announce Type: replace-cross Abstract: The pulsar magnetosphere has only recently been addressed using Physics-Informed Neural Networks (PINNs), by deploying a domain-decomposition approach and treating the separatrix and equatorial current sheet as infinitesimally thin discontinuities. However, this baseline requires extensive manual hyperparameter tuning, achieves limited final accuracy and demands several hours of training. We refine this framework by introducing domain-specific neural architectures based on Kolmogorov-Arnold networks, an automated adaptive training pipeline and a physics-based convergence criterion that eliminate the need for manual calibration. The proposed methodology delivers self-consistent axisymmetric magnetosphere solutions with mean squared errors of the PDE residuals at O(1e-6) in double precision - an improvement of two orders of magnitude over the baseline - while achieving convergence in under 20 minutes in single precision. Importantly, the method reliably resolves stellar radii reduced by up to 80% compared to the baseline, overcoming the severe spatial scale disparities that also challenge traditional solvers. Furthermore, by varying the flux that opens to infinity, we provide a correction to the equation that connects it to the equatorial T-point's position. The complete framework is released as the open-source library PulsarX.

20.
arXiv (CS.CV) 2026-06-18

Domain Generalizable Adaptation of 3D Vision-Language Models via Regularized Fine-Tuning

Domain adaptation remains a central challenge in 3D vision, especially for multimodal foundation models that align 3D point clouds with visual and textual data. While these models demonstrate strong general capabilities, adapting them to downstream domains with limited data often leads to overfitting and catastrophic forgetting. To address this, we introduce ReFine3D, a regularized fine-tuning framework designed for domain-generalizable tuning of 3D large multimodal models (LMMs). ReFine3D combines selective layer tuning with two targeted regularization strategies: multi-view consistency across augmented point clouds and text diversity through synonym-based prompts generated by large language models. Additionally, we incorporate point-rendered vision supervision and a test-time augmentation mechanism with confidence-based aggregation to further enhance robustness. Extensive experiments across different 3D domain generalization benchmarks show that ReFine3D improves base-to-novel class generalization by 1.36%, cross-dataset transfer by 2.43%, robustness to corruption by 1.80%, and few-shot accuracy by up to 3.11%, outperforming prior state-of-the-art methods with minimal added computational overhead.

21.
arXiv (CS.AI) 2026-06-12

Toward Instructions-as-Code: Understanding the Impact of Instruction Files on Agentic Pull Requests

arXiv:2606.13449v1 Announce Type: cross Abstract: AI-agents (e.g., GitHub Copilot) collaborate as teammates in different software engineering tasks, including code generation proposed through pull requests (Agentic-PRs). For better agent efficiency, developers create instruction files that guide the AI-agents, including how to navigate the project, locate the right components, run tests, respect best practices, and more. In this paper, we investigate the relationship between the creation of these instructions and the performance of AI-agents in creating better pull requests, which have a higher chance of success (i.e., the merge rate), address more complex tasks (e.g., code churn), and require less effort to be merged (e.g., time to merge). To this end, we analyze 15,549 agentic PRs from 148 projects in the AIDev dataset. Using the three dimensions, we compare each project before and after the creation of the instruction files. We find that specifying instructions for AI-agents does not necessarily lead to better results. With the instruction files, 27.7\% of the projects increased their merge rate by at least 20\%, while 26.35\% decreased it. The same observation is seen with the amount of changes (e.g., code churn, number of modified files) and with the efforts to merge an agentic PR (e.g., merge time and number of comments). From a first exploration, we find that projects that managed to increase their merge rate have substantially longer instruction files, which are also well structured into a higher number of sections and sub-sections. Our results motivate the need for research to assist practitioners in framing the development of instruction files as a software engineering activity (aka, Instructions-as-Code).

22.
arXiv (CS.CV) 2026-06-18

ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL

Recent advances in instruction-based image editing have enabled models to perform complex visual edits from natural language instructions. However, in product-centric scenarios where preserving product features, branding, and textual elements are critical, current open and closed source models often struggle to maintain this fine-grained object identity. This issue is further compounded by the lack of datasets for instruction-based product image editing with text fidelity constraints, leaving it largely treated as an implicit capability of instruction-based image editing models. In this work, we introduce the ProductConsistency dataset which is designed to improve product-centric image editing. Our approach includes a supervised fine-tuning (SFT) dataset of 87k samples for product editing, a reinforcement learning (RL) dataset with 869 unique product images, and a new benchmark dataset, the ProductConsistency Benchmark, to allow rigorous and standardized evaluation of editing models. To guide RL training, we propose a Cyclic Consistency reward that enforces semantic preservation of product identity by using caption similarity between the original product description and captions generated from the edited image. We fine-tune both Qwen-Image-Edit-2511 and Flux.1-Kontext-dev using our dataset and demonstrate consistent improvements over baseline models in OCR and Perceptual metrics, and MLLM-based evaluations as well, indicating stronger product consistency, text rendering, and overall visual quality; with the Qwen-Image-Edit-2511 model achieving a 5x reduction in the character error rate. The code and pipeline is available at https://anonymous.4open.science/r/ProductConsistency-6FCC/README.md

23.
arXiv (CS.AI) 2026-06-18

SHIFT: Semantic Harmonization via Index-side Feature Transformation for Multilingual Information Retrieval

arXiv:2606.18801v1 Announce Type: cross Abstract: With the rapid expansion of massive multilingual corpora, Multilingual Information Retrieval (MLIR) has emerged as a critical technology for global information access. MLIR enables users to retrieve semantically relevant documents from multilingual text collections using a single-language query. However, recent multilingual dense retrieval models often exhibit a strong preference for documents in the same language as the query. This leads to severe language bias, where top-ranked results are dominated by documents of specific languages, even when documents in other languages contain more semantically relevant information. To address this issue, we propose SHIFT, a training-free method applicable in the indexing stage. Specifically, SHIFT utilizes parallel translation pairs to estimate a relative language vector for each target language with respect to a source language. Subsequently, SHIFT corrects the language-specific offset by subtracting this relative language vector from document embeddings during indexing. Our comprehensive evaluation across four MLIR benchmarks and diverse dense retrieval models confirms that SHIFT can effectively mitigate language bias and enhance MLIR performance.

24.
arXiv (CS.LG) 2026-06-12

Reliability of Probabilistic Emulation of Physical Systems

arXiv:2606.12997v1 Announce Type: new Abstract: Two dominant approaches have emerged for generating probabilistic forecasts of physical systems: generative models, such as diffusion or flow matching; and ensembles of deterministic models with stochasticity injected, trained using the continuous ranked probability score (CRPS) loss. While both approaches have demonstrated strong predictive accuracy, the reliability of their uncertainties has not been systematically assessed. We address this gap by developing a framework to evaluate both approaches across diverse 2D spatiotemporal physical systems, under matched model size and computational budget. We assess the reliability of probabilistic emulation by inspecting the empirical coverage of predictive intervals, while also considering accuracy and computational efficiency metrics. CRPS-trained ensembles typically achieve more reliable uncertainties on both single-step prediction and autoregressive rollouts, demonstrating better coverage than the standard alternative of training generative models in a latent space. Moreover, the CRPS approach offers significantly faster inference. When generative models are trained in ambient rather than a compressed latent space, which is often infeasible for high-dimensional problems, they exhibit comparable coverage to CRPS-trained ensembles, though with substantially larger inference latency. In contrast, when CRPS-trained ensembles are trained in latent space they do not show a marked degradation in coverage with respect to ambient space. Both generative models and CRPS-trained ensembles demonstrate good predictive accuracy. To facilitate future research and application, we release AutoCast, a modular framework implementing both generative models and CRPS-trained ensembles, alongside AutoSim, a flexible dataset generation package for rapid prototyping.

25.
arXiv (CS.LG) 2026-06-15

Efficient On-Device Diffusion LLM Inference with Mobile NPU

arXiv:2606.13740v1 Announce Type: new Abstract: Diffusion large language models (dLLMs) accelerate generation by denoising multiple tokens in parallel, making them attractive for latency-sensitive mobile inference. However, repeated denoising introduces substantial computation on smartphones. Mobile neural processing units (NPUs) offer high-throughput dense matrix computation, but efficiently exploiting them remains challenging: token commitment shrinks per-block effective workloads, token revision complicates KV cache reuse, and limited NPU-visible address space incurs costly remapping and data transfer overheads. In this paper, we propose llada.cpp, the first NPU-aware inference framework for accelerating dLLMs on smartphones. llada.cpp aligns block-wise dLLM inference with the execution characteristics of mobile NPUs through three techniques. (1) Multi-Block Speculative Decoding fills the shrinking workload in late-stage current-block decoding with speculative future-block tokens. (2) Dual-Path Progressive Revision keeps committed tokens revisable until stable and refreshes unstable tokens through a CPU-side path without stalling dense NPU execution. (3) Swap-Optimized Memory Runtime compacts NPU-visible address layouts and overlaps data staging with NPU computation to reduce remapping and transfer overheads. We implement llada.cpp as an end-to-end framework and evaluate it across diverse hardware platforms and dLLM workloads. llada.cpp reduces LLaDA-8B generation latency by 17x-42x over the CPU baseline with prefix KV cache reuse, while preserving generation quality.