Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-12

Toward Entanglement Bootstrap for Conformal Field Theory in Any Dimension

arXiv:2606.12540v1 Announce Type: cross Abstract: Given a quantum critical wavefunction in any dimension, we propose a reconstructed Hamiltonian, analogous to the ones previously found for 1+1d CFT and for 2+1d bosonic liquid topologically-ordered states. We test numerically that, for known regularized approximate CFT groundstates (on the icosahedron and the fuzzy sphere), (1) they are close to the groundstate of their reconstructed Hamiltonian, and (2) the spectrum of their reconstructed Hamiltonian on the unit sphere has CFT properties (integer spacing of descendants) and matches known low-lying energies. We show that this provides an automated method to improve the finite-size effects in a fixed Hilbert space.

03.
medRxiv (Medicine) 2026-06-18

Hospital staff views on the visibility, role and impact of Acute Learning Disability Liaison Services in Wales: a service evaluation

People with a learning disability experience marked health inequalities. In Wales, Acute Learning Disability Liaison Services (ALDLS) are delivered by specialised learning disability services, and all roles within them are undertaken by Learning Disability Liaison Nurses (LDLN). These services aim to enable access to, and delivery of, secondary care by supporting reasonable adjustments, facilitating communication, and coordinating care for people with learning disability during hospital encounters. However, independent evidence of the impact of ALDLS on patient care remains limited. This evaluation tries to address this evidence gap by examining hospital staff perceptions of the visibility, role, and impact of ALDLS across Welsh Health Boards, with the aim of informing service design and development and improving secondary care access and care for people with learning disability. The service evaluation used a qualitative approach involving interviews and a focus group with hospital staff across the seven Welsh Health Boards who had experience working with or interacting with ALDLS staff to care for patients with learning disability. Findings cover six key areas including i) visibility and delivery of ALDLS, ii) Barriers and challenges to effective ALDLS delivery, iii) Enablers of effective ALDLS delivery, iv) Positive impacts for patients with learning disability, v) Negative impacts and unintended consequences when the service is absent or limited, and vi) Participants recommendations for future improvements of ALDLS. To synthesise the findings, we developed an overview diagram, which illustrates how ALDLS may influence care quality in acute hospitals. The overview places the liaison service at the centre, showing how organisational enablers and barriers shape its delivery, and how its core functions support improvements in safety, timeliness, effectiveness, efficiency, equity, and patient-centred care. From the findings we have identified recommendations for practice and policy. These include that ALDLS should be recognised as a core, safety-critical component of acute hospital care for people with a learning disability, rather than an optional add-on. In practice, services should be more visibly embedded within routine pathways, with consistent site-based presence, clear referral criteria, early identification through electronic flagging and notification systems, and routine involvement in multidisciplinary planning for complex admissions and procedures. At policy level, ALDLS provision should be recognised within equality and patient safety frameworks as an essential service requiring sustained investment, national minimum configuration standards, adequate staffing, and better-integrated digital systems to support continuity, equitable access, and person-centred care.

04.
arXiv (CS.CV) 2026-06-15

IndustryBench-MIPU: Benchmarking Multi-Image Attribute Value Extraction for Industrial Products

Industrial products such as valves and circuit breakers are defined by dense technical specifications that govern procurement, compatibility, and safety across supply chains. These specifications are scattered across multiple heterogeneous product images, including specification tables, nameplates, and technical drawings, yet whether Multimodal Large Language Models (MLLMs) can reliably recover them remains underexplored. To fill this gap, we introduce IndustryBench-MIPU, the first large-scale benchmark for multi-image industrial product understanding, built around structured attribute extraction – recovering property-value pairs from product images. This task jointly probes text recognition on specification tables and nameplates, visual reasoning over technical drawings, domain knowledge to decode industrial terminology, and cross-image evidence integration to assemble scattered specifications. Concretely, the benchmark comprises 4,559 products across 27,652 images with 103,703 annotations spanning 18 industrial categories, constructed through multi-model consensus and three-tier quality assurance. Evaluating nine MLLMs under both single-image and product-level multi-image settings reveals a stark completeness gap: models achieve high precision (86–94%) but the best recovers only 49.9% of product-level attributes; moving from single-image to multi-image extraction costs 15–34 percentage points of recall. Multi-image completeness, not single-image accuracy, is the core bottleneck. Dataset and code are publicly available.

05.
arXiv (CS.CL) 2026-06-19

Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning

This work presents a general framework for training large language models (LLMs) to "Connect the Dots" (CoD), a meta-capability required by long-lifecycle agents: as an LLM-based AI agent gets deployed in an environment, it solves a long sequence of tasks while continuously exploring the environment, learning from its own experiences, and iteratively self-updating its context about the environment, thereby achieving progressively better performance on future tasks conditioned on the updated context. Major components of the CoD framework include: (1) algorithm design and infrastructure for end-to-end reinforcement learning (RL) with long rollout sequences interleaving solve-task and update-context episodes; (2) tasks and environments for incentivizing and eliciting the targeted meta-capability in LLMs during training, as well as for faithfully measuring progress during evaluation. We present proof-of-concept implementations of the CoD framework, including a GRPO-style RL algorithm with fine-grained credit assignment, as well as tasks and environments tailored to the targeted meta-capability (rather than domain-specific LLM capabilities or standard task-by-task RL). Empirical results validate the efficacy of end-to-end RL training in the CoD setting, and demonstrate the potential for out-of-distribution generalization – within the training domains, across different domains, and from CoD to Ralph-loop settings – of the elicited meta-capability. Our investigation of CoD connects several lines of prior works, and opens up new opportunities for advancing LLMs and AI agents. To facilitate further research and applications, we release our implementations at \url{https://github.com/agentscope-ai/Trinity-RFT/tree/research/cod/examples/research_cod}.

06.
arXiv (CS.LG) 2026-06-16

Factorized Neural Operators Decompose Dynamic and Persistent Responses

arXiv:2606.16900v1 Announce Type: new Abstract: Physical systems often exhibit heterogeneous mechanisms, where rapidly evolving dynamics coexist with persistent structures. Capturing such multiscale physical behavior remains challenging for existing neural operators, which typically rely on single dominant inductive bias and therefore couple distinct physical responses into a shared representation. We introduce the Unified Green's Function Framework across domains and propose the Factorized Neural Operators (FaNO), which decompose spectral representations into equivariant dynamic responses and invariant persistent responses, leading to better interpretability and generalization. Mechanistically, we show that the two operator branches spontaneously specialize into distinct physical roles that remain consistent across scales and domains: the equivariant branch captures rapidly varying transient dynamics, whereas the invariant branch extracts coherent persistent structures. This factorized mechanism of FaNO improves prediction accuracy, parameter efficiency and cross-scale generalization across physical systems and domains. In particular, it maintains consistent predictions under long-horizon autoregressive rollout, cross-resolution extrapolation and physical-regime shifts. These findings suggest that scalable physical modeling may benefit from moving beyond single-inductive-bias formulations toward factorized operator representations that better reflect the heterogeneous organization of physical systems, accelerating the reliable deployment of machine learning for scientific computing and discovery.

07.
arXiv (CS.AI) 2026-06-18

Private Learning with Public Feature Conditioning

arXiv:2606.18773v1 Announce Type: cross Abstract: We study differentially private (DP) regression in settings where each data sample includes public, non-sensitive features – common in applications such as recommendation and advertising systems. While such label-DP or semi-sensitive-feature settings have been primarily explored in the context of classification, effective approaches for regression remain underexplored. We introduce Cond-DP, a conditioned variant of DPSGD that leverages the structure of public feature matrices to improve optimization under privacy constraints. Motivated by the observation that these public features often exhibit rapidly decaying spectra, Cond-DP incorporates a data-driven conditioning matrix to reshape the optimization landscape and accelerate convergence. We provide convergence guarantees for convex, strongly convex, and non-convex settings, and recover standard DPSGD as a special case when the conditioning matrix is the identity. We show how to construct an effective conditioning matrix for Cond-DP directly from public features, enabling provably faster convergence than DPSGD in private linear regression without incurring additional privacy cost. Empirically, Cond-DP with this conditioning matrix consistently outperforms state-of-the-art baselines across a wide range of datasets and model architectures under label DP, demonstrating strong and robust performance in practice.

08.
arXiv (CS.CL) 2026-06-17

Position: Coding Benchmarks Are Misaligned with Agentic Software Engineering

Coding agents have become a major mode of software engineering, but the benchmarks we use to compare them were designed in a pre-agent era: they collapse model, harness, and environment into a single end-to-end score, typically computed against one reference solution, with no component-level signal for iteration. We argue that current coding benchmarks are misaligned with agentic software engineering. A coding agent in practice is not a model: it is a system harness – a composite of models, harnesses, contexts, environments, and feedback signals, any one of which can move the benchmark score by margins comparable to those between adjacent model generations. We discuss three symptoms: (i) benchmark scores conflate the model with the rest of the harness; (ii) grading against a single reference solution penalises equally valid alternatives; and (iii) the absence of signal at the level of individual harness components makes the end-to-end system score difficult to iterate on.

09.
arXiv (math.PR) 2026-06-15

Uniform-in-time error estimates for McKean-Vlasov SDEs with common noise and stochastic algorithms

arXiv:2606.14170v1 Announce Type: new Abstract: In this work, by construct an asymptotic coupling by reflection, we first explore the uniform-in-time estimate on probability distance for two measure-valued processes induced by a McKean-Vlasov SDE with common noise and an interacting particle system, where the drift terms are dissipative merely in the long distance. As direct applications of this estimate, we establish the uniform-in-time error estimates for the numerical solutions derived via backward/tamed/adaptive Euler-Maruyama methods. Moreover, as another direct application, the uniform-in-time conditional propagation of chaos is quantified.

10.
arXiv (CS.LG) 2026-06-24

XConv: Low-memory stochastic backpropagation for convolutional layers

arXiv:2106.06998v5 Announce Type: replace Abstract: Training convolutional neural networks at scale demands substantial memory, largely because intermediate activations must be stored for backpropagation. Existing remedies (checkpointing, invertible architectures, or gradient-approximation methods such as randomized automatic differentiation) either add significant computation, impose architectural constraints, or require non-trivial code changes. We propose XConv, a near-drop-in replacement for standard 2D and 3D convolutional layers that addresses all three: it preserves standard backpropagation, imposes no architectural constraints, and integrates into existing codebases with minimal changes. XConv exploits the algebraic structure of convolutional weight gradients, storing highly compressed projections of the activations rather than the full tensors and approximating the gradients via multi-channel randomized trace estimation. The number of probing vectors sets a memory-accuracy tradeoff and recovers the exact gradient in the limit. We establish convergence guarantees and error bounds for the estimator, showing that its gradient-error variance is comparable to that of stochastic gradient descent. Empirically, XConv matches exact-gradient methods across classification, generative modeling, super-resolution, inpainting, and segmentation, with gaps that narrow as the number of probing vectors grows, while reducing activation memory by a factor of two or more when convolutional activations dominate, and remaining computationally competitive with optimized convolution kernels at larger batch sizes. At half precision the gradient-approximation error falls to the rounding floor, so XConv adds essentially no error beyond that of low-precision arithmetic. The savings matter most where activation memory rather than compute is the binding constraint, such as high-resolution and volumetric training and on-device finetuning.

11.
arXiv (CS.CV) 2026-06-25

Privacy-Aware Visual Language Models

As Visual Language Models (VLMs) become increasingly embedded in everyday applications, ensuring they can recognise and appropriately handle privacy-sensitive content is thus essential to protect users. To this end, we conduct a comprehensive evaluation of twelve state-of-the-art VLMs and identify limitations in their understanding of visual privacy. However, existing privacy-related datasets often suffer from label inconsistencies, limiting their reliability. To address this, we introduce two compact, high-quality benchmarks, PrivBench and PrivBench-H, that focus on commonly recognised visual privacy categories aligned with the General Data Protection Regulation (GDPR). Additionally, we present PrivTune, an instruction-tuning dataset specifically curated to improve privacy sensitivity. We obtain multiple Privacy VLMs by fine-tuning off-the-shelf VLMs on only a few hundred samples from PrivTune, which leads to substantial gains on all benchmarks, surpassing even GPT-4, while maintaining strong performance on other tasks. Our findings show that privacy-awareness in VLMs can be substantially improved with minimal data and careful dataset design, setting the stage for safer, more privacy-aligned AI systems.

12.
arXiv (CS.CL) 2026-06-24

Paying to Know: Micro-Transaction Markets for Verified Product Information in Agentic E-Commerce

Commercial NLP treats the shopping chatbot as a recommender or a conversion tool: its job is to match a user to a catalogue entry and close a sale. We argue that the arrival of agent-native micro-payment rails (e.g., x402, AP2) changes what is scarce. When the buyer is an autonomous agent that can investigate exhaustively, the bottleneck is no longer matching products but acquiring trustworthy, decision-relevant information about them. We envision agentic e-commerce as a micro-transaction market for verified information: buyer agents spend fractions of a cent to progressively unlock seller- and reviewer-supplied data – service histories, third-party test reports, bills of materials, audited sales and support metrics – paid for a la carte under a freemium model, with reviewer trust scored reputationally. We sketch the architecture of such a market and argue that it rewards genuine product quality and yields truer competition than ranking-based storefronts. We then translate the vision into concrete NLP problems – cost-optimal information acquisition, data pricing and negotiation, real-time entity resolution, grounded value exchange, and privacy-preserving persona modelling – and argue that these, not chat fluency, deserve the field's attention.

13.
arXiv (CS.LG) 2026-06-11

Discovery and inference beyond linearity for epidemiological data by integrating Bayesian regression, tree ensembles and Shapley values

arXiv:2505.00571v3 Announce Type: replace-cross Abstract: Machine Learning (ML) is gaining popularity in epidemiology and healthcare studies for hypothesis-free discovery of risk and protective factors. ML is strong at discovering nonlinearities and interactions, but this power is compromised by a lack of reliable inference. Although Shapley values provide local measures of features' effects, valid uncertainty quantification for these effects is typically lacking, thus precluding statistical inference. We propose RuleSHAP, a framework that addresses this limitation by combining a dedicated Bayesian sparse regression model with an improved tree-based rule generator and Shapley value attribution. RuleSHAP provides detection of nonlinear and interaction effects, with uncertainty quantification at the individual level as a key contribution. We derive an efficient formula for computing marginal Shapley values within this framework. We apply RuleSHAP to data from an epidemiological cohort to detect and infer several effects for high cholesterol and blood pressure, such as nonlinear interaction effects between features like age, sex, ethnicity, BMI and glucose level. To conclude, we demonstrate the validity of our framework on simulated data.

14.
arXiv (CS.CV) 2026-06-18

Spiking Pyramid Wavelet Transformation for High-efficient and Low-energy Image Restoration

Spiking neural networks (SNNs) have garnered significant interest in computer vision due to their potential for efficiency and biological inspiration. While spiking CNN-based methods have shown promise for image restoration (IR) tasks, their performance is constrained by the inherent receptive field limitations of CNN operations. In the paper, we explore the benefits of discrete wavelet transformation and propose a spiking pyramid wavelet-based model (SPWM) for high-efficient and low-energy target. Specifically, we develop a spiking dual pyramid wavelet (SDPW) block to model long-range dependency and exploit the properties of the degradation in the wavelet domain. Experimental results on several benchmarks demonstrate that SPWM significantly lowers computational costs and energy consumption while maintaining image quality. Our method showcases the potential of SNNs in the field of IR, offering new insights for future applications of resource-limited devices.

15.
arXiv (CS.AI) 2026-06-24

Themis: An explainable AI-enabled framework for Reinforcement Learning with Human Feedback

arXiv:2606.24622v1 Announce Type: new Abstract: Training safe Reinforcement Learning (RL) systems is inherently challenging, with no guarantee of avoiding unwanted behaviors. The most effective defenses against this are (i) transparency through explainability and (ii) alignment via human feedback. While both show promising results, no publicly available framework currently combines them. To address this, we introduce Themis, an XAI-enabled testing and evaluation framework for Reinforcement Learning from Human Feedback. Themis supports over 200 widely used environments and is easily configurable for experiments in RL, transparency, and alignment. Our results show that Themis can train reward models that match or outperform the environment's true reward signal using human preferences. We also provide a cloud-based platform for collecting human feedback and managing experiments. It is user-friendly, auto-scalable, and supports large participant groups across multiple experiments without extra development overhead. Tests show Themis can support one thousand users in back-to-back experiments on a modest commercial machine.

16.
arXiv (CS.CL) 2026-06-18

Probing Semantic Alignment, Lexical Invariance, and Syntactic Influence in LLM Metaphor Processing

Large language models (LLMs) achieve strong performance on metaphor detection and interpretation tasks, yet it remains unclear what such behavioral success reveals about metaphor processing. We present a diagnostic analysis that examines the limits of behavioral evidence by probing three complementary dimensions: semantic attribute alignment, lexical invariance, and syntactic sensitivity. Using geometric probing, we assess whether model-generated interpretations align with reference semantic attributes; through context-varying substitution, we analyze the stability of lexical associations between metaphorical and literal expressions; and via controlled syntactic perturbations, we examine sensitivity in metaphor detection. Our analysis reveals that LLM-generated interpretations can exhibit semantic drift relative to reference attributes; stable lexical anchors persist across contextual conditions, potentially supporting conventional metaphors while biasing novel metaphors requiring contextual integration; and detection performance is sensitive to syntactic irregularities. These findings suggest that strong behavioral performance may reflect heterogeneous underlying signals, highlighting the need for caution when interpreting metaphor benchmarks as evidence of robust, integrated semantic understanding.

17.
arXiv (quant-ph) 2026-06-17

Quantum mechanics in configuration space in context

arXiv:2606.17622v1 Announce Type: new Abstract: To enhance the way in which wave-particle duality is implemented in the modelling of quantum mechanical systems, Bukhari et al. [New J. Phys. 27, 084501 (2025)] recently introduced an alternative approach to quantum mechanics, namely quantum mechanics in configuration space. This formalism is based on a physically motivated quantisation of Newtonian mechanics and promotes the classical position-velocity states (x,v) to pairwise distinguishable quantum states. The resulting |x,v> states form the basis of the Hilbert space of individual quantum mechanical particles and evolve along classical trajectories. In this paper, we consider the modelling of a mechanical particle in free space and put quantum mechanics in configuration space into context. It is shown that this formalism increases the continuity between quantum and classical mechanics by avoiding a conceptual inconsistency associated with the definition of momentum in canonical quantisation. In addition, we emphasise that standard quantum mechanics and quantum mechanics in configuration space are based on two distinct formulations of classical mechanics.

18.
arXiv (CS.AI) 2026-06-12

Variational Learning for Insertion-based Generation

arXiv:2606.02133v3 Announce Type: replace-cross Abstract: Non-monotonic sequence generation methods, such as masked diffusion models, provide a flexible alternative to left-to-right autoregressive modeling by allowing tokens to be generated in non-fixed and prescribed orders. Despite their practical advantages, most existing non-monotonic models are order-agnostic and rely on a fixed-length grid, limiting their ability to support variable-length generation and adaptive insertion order. In this work, we introduce a probabilistic framework for learning insertion order in variable-length insertion models. We formalize a bijective correspondence between insertion trajectories and permutations, which enables an exact reparameterization of the data likelihood as a sum over permutations. Building on this result, we propose the Insertion Process (IP), a stochastic generative model that jointly learns where to insert, what to insert, and when to terminate, trained via permutation-based variational inference. Unlike prior fixed-canvas approaches, IP natively supports variable-length generation and learns data-driven preferences over insertion orders. Experiments on goal-conditioned planning and molecular string generation demonstrate that learning insertion order improves both modeling quality and generalization in domains without a canonical left-to-right structure.

19.
arXiv (CS.CV) 2026-06-18

Learning to Distort: Weakly-Supervised Image Quality Transfer for Prostate DWI Correction

Single-shot echo-planar prostate diffusion-weighted imaging (DWI) is frequently complicated by geometric distortions, which impact the ability to derive reliable diagnoses from such images. Developing automated correction methods is challenged by the absence of paired distorted and undistorted clinical scans. In this paper, we first propose a novel weakly-supervised image quality transfer (IQT) framework from undistorted to distorted images that utilizes image quality assessment (IQA) signals to supervise the transfer process. Unlike traditional methods that require expensive, voxel-wise paired data or resort to developing unpaired algorithms, our approach utilizes image-level quality labels (here, distorted vs. undistorted) to establish latent quality prototypes within a pre-trained feature space. Recognizing that simulating realistic distortions is more reliable than direct unpaired correction, we describe a weakly-supervised prototype flow matching algorithm to explicitly regularize generative trajectories towards distorted prototypes, producing realistic susceptibility artifacts that mimic clinical degradations. By synthesizing these realistic pairs, we enable a second IQT model to be trained in the forward direction for distortion correction. Experimental results demonstrate that our generated images successfully mimic the diagnostic interference of real-world artifacts, which leads to more capable distortion correction IQT models. In addition to qualitative comparisons, we also conduct exhaustive quantitative evaluations that compare our approach with existing unpaired approaches (e.g., CycleGAN, UNIT-DDPM, and OT-FM) - as either forward or reverse alternatives - by assessing clinical downstream task performance in PI-RADS and Gleason score classification, using both in-distribution and external data sets.

20.
arXiv (CS.CV) 2026-06-16

GRACE: Boosting Video MLLMs with Grounded Action-Centric Evidence for Viewer Sentiment Prediction

Viewer sentiment prediction in video advertisements aims to infer the latent affective response evoked in the audience. To bridge the gap between what is shown and what is felt, models must deduce hidden viewer emotions from explicit visual narratives, concrete character-object interactions, and visible textual cues. However, standard Multimodal Large Language Models (MLLMs) typically rely on holistic frame representations, which leave these fine-grained, affect-relevant events implicit and complicate precise emotional reasoning. To address this, we propose a grounded action-centric evidence augmentation framework that enhances video MLLMs' clue extraction and comprehension by introducing explicit event structure and localized visual evidence. Our method extracts temporally ordered subject-verb-object (SVO) triplets and auxiliary visible textual cues from action-centric video descriptions, grounds subject and object entities as visual entity crops, and then enables the MLLM to perform clue-enhanced emotional reasoning based on these extracted structured clues. In this way, action triplets specify "what happens", while grounded visual entity crops anchor "who or what participates in each event" to concrete visual evidence. Experiments on the Pitts dataset show consistent improvements over Qwen2.5-VL and Qwen3-VL baselines. Ablation studies, cross-dataset evaluation on AdsQA, and transfer experiments on an emotion-focused TVQA subset further support the effectiveness and generalization of our approach.

21.
Nature Medicine 2026-06-15

Activity-dependent adaptive deep brain stimulation improves gait in Parkinson’s disease

Parkinson’s disease leads to a spectrum of locomotor deficits that vary in severity with the nature of daily activities and the fluctuating physiology of patients. Many of these deficits remain inadequately addressed by existing deep brain stimulation therapies that rely on activity-agnostic parameters optimized for cardinal motor symptoms. By contrast, therapies embedding activity-specific parameters have the potential to better address the entire range of symptoms. Here we expose physiological principles that enable real-time decoding of ongoing locomotor activities across motor fluctuations from the neural dynamics of the subthalamic nucleus. This decoding steered activity-dependent adaptations of deep brain stimulation therapies that improved locomotor deficits while preserving efficacy for cardinal motor symptoms across activities of daily living. Our activity-dependent framework provides a blueprint for next-generation neuromodulation therapies that continuously select parameters optimized to the behavioral context and fluctuating physiology of each patient. ClinicalTrials.gov registration NCT06791902 . Neural decoding algorithms that leverage physiological principles of locomotor encoding support activity-dependent deep brain stimulation therapies that improve locomotor deficits in people with Parkinson’s disease.

23.
arXiv (quant-ph) 2026-06-24

Preparing multi-qudit states in a definite-weight subspace

arXiv:2606.24659v1 Announce Type: new Abstract: We formulate a deterministic algorithm for preparing arbitrary multi-qudit states in a definite-weight subspace. By ordering the corresponding computational basis states according to a Gray code for multiset permutations, the state-preparation task is reduced to performing a sequence of controlled 2-qudit Gray rotations. We use this algorithm to prepare exact eigenstates of the SU(3)-invariant Heisenberg Hamiltonian, whose Bethe ansatz is nested. In particular, we describe the preparation of the Bethe states, which are SU(3) highest-weight states, as well as their lower-weight descendants. We also consider the preparation of $SU(d)$ Dicke states and their q-deformations.

24.
arXiv (CS.AI) 2026-06-24

Prob-BBDM: a Probabilistic Brownian Bridge Diffusion Model for MRI sequence image-to-image translation

arXiv:2606.24313v1 Announce Type: new Abstract: AI-driven image-to-image synthesis is rapidly advancing, with growing applications in medical imaging. Multi-modal image analysis plays a crucial role in optimizing examination quality, yet acquiring multiple imaging modalities in clinical settings remains resource-intensive and time-consuming, especially for 3D imaging. To address this challenge, we propose a novel image-to-image translation model based on Brownian Bridge Diffusion Models (BBDM), which synthesizes magnetic resonance imaging (MRI) sequences from 2D axial slices. Our approach integrates a variational encoder-guided diffusion mechanism, leveraging probabilistic image distributions to enhance synthesis quality. Evaluated on the BraTS 2021 dataset, our Probabilistic-BBDM (Prob-BBDM) achieves superior performance across multiple translation tasks, reaching up to 88.46% SSIM and 26.09 dB PSNR, with consistent improvements over baselines. Notably, our diffusion process requires only 4 steps, making it computationally efficient while maintaining high-quality synthesis. To further validate generalizability, we test Prob-BBDM on an external third-party dataset, demonstrating consistent performance across domains. Additionally, we assess the clinical utility of the synthesized slices by using them as input to a pre-trained segmentation model. Tumor segmentation yields a Dice score of 88.71% and an HD95 of 3.49 mm, confirming that the synthesized slices preserve critical diagnostic information. These results highlight the potential of Prob-BBDM for high-quality, efficient, and generalizable MRI synthesis, offering a promising step toward improved medical image translation.

25.
arXiv (CS.CV) 2026-06-24

Sat2City v2: Native 3D City Asset Generation from a Single Satellite Image

Generating explicit 3D city assets from a single satellite image is important for digital twins, urban simulation, and geospatial intelligence. Unlike satellite-to-street-view synthesis, the task requires a reusable textured mesh with plausible geometry and controllable appearance rather than a 3D proxy optimized only for rendering a small set of images or videos. The ICCV Sat2City framework made a first step by conditioning cascaded sparse-voxel latent diffusion on satellite-derived height maps, but its appearance was random, its training data were synthetic, and its task-specific VAE did not scale well to noisy real-world reconstructions. We present Sat2City v2, a journal extension that adapts a pretrained native structured-latent 3D foundation model to weakly aligned satellite images and textured meshes. We build a real-world dataset with 16,241 satellite-mesh pairs across 24 regions in 9 cities. Instead of learning a 3D representation from noisy city meshes, Sat2City v2 encodes each mesh into a pretrained native 3D latent space, fine-tunes a satellite-conditioned geometry flow, and uses the decoded shape to anchor satellite-conditioned texturing. This retains Sat2City's geometry-to-appearance cascade while enabling appearance-controllable generation from the satellite input. Experiments on metric-scale DSM reconstruction and generative city-asset benchmarks for geometry and appearance show that Sat2City v2 achieves the best overall performance among evaluated baselines. Overall, Sat2City v2 advances satellite-to-city generation from rendering-oriented 3D proxies to explicit textured mesh assets, supported by, to the best of our knowledge, the first documented satellite-mesh paired dataset collected from matched geographic crops for this asset-level task. Project page: https://ai4city-hkust.github.io/Sat2City-v2/