Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-18

E-VAds: An E-commerce Short Videos Understanding Benchmark for MLLMs

E-commerce short videos represent a high-revenue segment of the online video industry characterized by a goal-driven format and dense multi-modal signals. Current models often struggle with these videos because existing benchmarks focus primarily on general-purpose tasks and neglect the reasoning of commercial intent. In this work, we first propose a multi-modal information density assessment framework to quantify the complexity of this domain. Our evaluation reveals that e-commerce content exhibits substantially higher density across visual, audio, and textual modalities compared to mainstream datasets, establishing a more challenging frontier for video understanding. To address this gap, we introduce E-commerce Video Ads Benchmark, which is the first benchmark specifically designed for e-commerce short video understanding. We curated 3,961 high-quality videos from Taobao covering a wide range of product categories and used a multi-agent system to generate 19,785 open-ended Q&A pairs, which consist of five distinct tasks. Finally, we develop E-VAds-R1, an RL-based reasoning model featuring a multi-grained reward design called MG-GRPO. This strategy provides smooth guidance for early exploration while creating a non-linear incentive for expert-level precision. Experimental results demonstrate that E-VAds-R1 achieves a 109.2% performance gain in commercial intent reasoning with only a few hundred training samples. Data is available at https://github.com/TaobaoTmall-AlgorithmProducts/E-VAds_Benchmark.

02.
arXiv (CS.CV) 2026-06-15

Pix2Fact: When Vision Is Not Enough – Benchmarking Fine-Grained VQA with Web Verification on High-Resolution Real-World Scenes

Despite progress on general tasks, vision-language models (VLMs) still struggle with challenges that demand both fine-grained visual grounding and external knowledge, a synergy overlooked by existing benchmarks that evaluate these abilities in isolation. To fill this void, we introduce Pix2Fact, a visual question-answering benchmark designed to assess expert-level visual perception and knowledge search. Pix2Fact comprises 1,000 high-resolution (4K+) images spanning eight scenarios. Its questions and answers are meticulously crafted by PhD-holding annotators from top global universities across diverse disciplines. Each question requires detailed visual grounding and the integration of external knowledge. Evaluating ten state-of-the-art VLMs, including proprietary models such as Gemini-3.1-Pro and GPT-5.4, we find that Pix2Fact poses a formidable challenge: the most advanced model (Gemini-3.1-Pro) achieves only 51.7% average accuracy, even with access to visual ground truth and search tools. Our analysis attributes this low accuracy to three factors, frequent visual grounding errors even with visual ground truth, shallow search harnessing, and VLM's inability to retrieve long-tail, unstructured local information. This striking gap exposes the limitations of current models in assisting humans with real-world scenarios that demand overwhelming visual comprehension. We believe Pix2Fact will serve as a critical benchmark to drive the next generation of language-vision agents that seamlessly integrate fine-grained perception with robust knowledge search.

03.
arXiv (CS.CL) 2026-06-18

As Easy as Rocket Science: Assessing the Ability of Large Language Models to Interpret Negation in Figurative Language

Figurative language and negation are two areas that challenge current language models, however, both are widely used throughout written and spoken language. Large language models (LLMs) are also widely used in everyday contexts where they cannot necessarily be tuned for a specific dataset. It is therefore essential to understand the ability of LLMs to correctly interpret text that includes both negation and figurative language. To investigate this, we develop a set of new annotations to an existing dataset of figurative language, and test a range of language models on the dataset. We find that the combination of negation and figurativeness can present a particular challenge, and that performance overall and across different negation types is particularly dependent on the prompt style used.

04.
arXiv (CS.AI) 2026-06-11

Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation

arXiv:2606.11990v1 Announce Type: cross Abstract: Remaining Useful Life (RUL) prediction is essential for industrial predictive maintenance, yet many learning-based approaches rely on extensive feature engineering or large labeled datasets to train task-specific sequence models. In this work, we introduce a lightweight learning approach, in which we leverage a frozen pretrained time-series foundation model (TSFM) and combine it with a small regression head for RUL estimation from multivariate sensor streams. More specifically, we use Chronos-2 as a frozen backbone to extract context window features and train a lightweight regression neural network for RUL prediction. Experiments on real-world industrial sensor data from two device types show that Chronos-2 features consistently improve over recurrent, convolutional, Transformer-based, and gradient-boosting baselines under the same preprocessing and evaluation protocol. We further analyze the impact of context length and find that performance improves significantly with longer histories, indicating that TSFM representation offer a practical and data-efficient alternative for RUL estimation in industrial settings.

05.
arXiv (quant-ph) 2026-06-11

Non-Hermitian Delocalization Realizes Random Dirac Criticality in One Dimension

arXiv:2606.12089v1 Announce Type: cross Abstract: Non-Hermitian systems can evade Anderson localization and exhibit delocalized states even in one dimension. Here, we show that such non-Hermitian delocalized states under periodic boundary conditions (PBC) are intrinsically critical, realizing the universality class of one-dimensional random Dirac fermions. By linking spectral winding to topological Anderson transitions via Hermitization, we demonstrate that the delocalized PBC states exhibit a Dirac-type criticality with universal algebraic correlations. In contrast to Hermitian systems, where this criticality occurs only at fine-tuned transition points, it emerges generically in non-Hermitian systems as a consequence of spectral topology. These results identify a universal mechanism by which non-Hermiticity promotes criticality, providing a unified description of non-Hermitian delocalization in one dimension.

06.
arXiv (math.PR) 2026-06-16

Purely unrectifiable sets, fractal percolation and graphs of functions

arXiv:2606.15745v1 Announce Type: cross Abstract: This paper contains a survey of some of the results of the author related to unrectifiablity and is an extended version of the author's talk given at the Second Winter School Geometric Measure Theory Rectifiability vs. Pure Unrectifiability in Hanghzou, China. These results include irregular/purely unrectifiable $1$-sets on the graphs of continuous functions like the Takagi, the Weierstrass-Cellerier and the typical (in the sense of Baire) continuous function. It is also discussed that there exists $ {\alpha}_{0}\alpha_0$. The background of the $1$-unrectifiability is discussed in more detail.

07.
medRxiv (Medicine) 2026-06-17

Brain age gap correlates with DTI-derived microstructural abnormalities in multiple sclerosis.

Background: Brain age gap (BAG) is increased in multiple sclerosis (MS), but whether it reflects microstructural pathology beyond conventional atrophy remains unclear. Objective: To test whether BAG is elevated in MS and correlates with conventional and diffusion tensor imaging (DTI) abnormalities relative to healthy controls. Methods: A case-control study of 43 people with MS and 18 healthy controls was performed. BAG was estimated from T1-weighted MRI using brainageR. Controls were used as MRI reference distributions. MRI values were expressed as deviation z-scores and correlated with BAG within MS. Conventional MRI and DTI domains were analysed using age/sex-adjusted partial correlations with domain-wise Benjamini-Hochberg FDR correction, where appropriate. Results: BAG was higher in MS than controls (4.79 vs -2.58 years; p

08.
medRxiv (Medicine) 2026-06-11

Electrical signatures of divergent connectivity in the human subgenual cingulate cortex

Background: Major depressive disorder remains a leading cause of disability. While subgenual cingulate cortex (sgCC) deep brain stimulation (DBS) shows promise for medically refractory depression, clinical outcomes have been heterogeneous, suggesting that individual differences in neural circuitry engagement may critically influence therapeutic efficacy. We aimed to define the electrophysiological signatures of sgCC efferent connectivity using single-pulse electrical stimulation (SPES) with intracranial stereo-EEG (sEEG) to inform rational targeting and physiological biomarkers for sgCC-DBS. Methods: In four patients undergoing clinically indicated sEEG for seizure mapping, SPES was delivered through sgCC pairs, while distributed brain stimulation-evoked potentials (BSEPs) were recorded across cortical and subcortical sites. Responses were characterized using Canonical Response Parameterization to extract reproducible waveforms and per-trial reliability. Results: sgCC stimulation elicited reproducible, spatially organized BSEPs across frontal, limbic, and paralimbic networks, aligning with known anatomical pathways. Frontal recruitment featured robust, lateralized orbitofrontal activation favoring the ipsilateral central, medial OFC and bilateral ventromedial prefrontal responses. Limbic effects demonstrated bilateral cingulate activation with stronger ipsilateral recruitment and lateralized amygdala and hippocampal responses. Paralimbic engagement included insular responses with subject-specific anterior predominance and bi-hemispheric temporal-polar slow-wave deflections. Conclusion: These findings provide direct electrophysiological evidence of distributed, lateralized sgCC divergent network connectivity in the human brain, offering physiologic confirmation of its role in affective circuitry. The observed topography and laterality have direct applications for sgCC-DBS targeting and implicate BSEP signatures as candidate biomarkers to guide patient-specific therapy.

09.
arXiv (quant-ph) 2026-06-12

Explicit Quantum Circuit Simulation of Nonlinear 1-Dimensional Fluid with Carleman-linearized Boltzmann Method

arXiv:2606.12770v1 Announce Type: new Abstract: Quantum computation of fluid dynamics has attracted growing attention as a key application of fault-tolerant quantum computers anticipated in the coming decade, with lattice Boltzmann methods emerging as a particularly promising approach. Explicit and efficient elementary-gate-level circuit simulations, however, have so far been demonstrated only in the linear case. Here we include the leading nonlinearity through second-order Carleman linearization of the one-dimensional Boltzmann equation, and demonstrate, via explicit quantum-circuit simulation, the preparation of the final-time state using a Taylor-expansion-based ODE solver based on the quantum singular value transformation. With this construction, we analyze the gate and qubit complexities, which scale logarithmically with the grid size, the nonlinearity captured by the higher-order Carleman linearization, and the practical utility of higher-order expansions in the Taylor ODE solver. The construction provides a concrete baseline for computational cost reduction and further developments such as extensions to higher dimensions, complex geometries, and the extraction of physical quantities, towards industrially useful quantum CFD.

10.
arXiv (CS.CV) 2026-06-18

Conditional Latent Diffusion Model with Fourier-based Motion Modelling for Virtual Population Synthesis

In-silico trials of medical devices require the generation of virtual populations of anatomies. In cardiovascular applications, virtual anatomy is typically represented as a 3D+t mesh sampled from a generative model. However, most existing mesh generators focus on static anatomy, while sequence models often lack explicit periodicity. To this end, we propose 4D F-MeshLDM, a conditional generative framework comprising a convolutional mesh VAE to encode meshes, a structural latent space that parameterises motion using a truncated Fourier series, and a diffusion prior that learns the latent distribution over Fourier coefficient tokens. By conditioning the diffusion process on clinical covariates via affine modulation, we enable controllable synthesis. Sampling tokens and performing inverse Fourier synthesis yield cycle-consistent latent trajectories, which can be decoded into 3D+t cardiac mesh sequences. Experiments on 5,000 UK Biobank subjects demonstrate that 4D F-MeshLDM outperforms state-of-the-art baselines in anatomical fidelity and achieves near-zero cycle closure error. Furthermore, the generated cohorts accurately preserve clinical functional indices, highlighting the potential of our framework for reliable in-silico cardiac trials.

11.
arXiv (CS.CV) 2026-06-19

Shape of Thought: Progressive Object Assembly via Visual Chain-of-Thought

Multimodal models for text-to-image generation have achieved strong visual fidelity, yet they remain brittle under compositional structural constraints, notably generative numeracy, attribute binding, and part-level relations. To address these challenges, we propose Shape-of-Thought (SoT), a visual CoT framework for process-supervised progressive shape assembly in the rendered 2D domain, without external engines at inference time. SoT trains a unified multimodal autoregressive model to generate interleaved textual plans and rendered intermediate states, helping the model capture shape-assembly logic without producing explicit geometric representations. Unlike text-only CoT, each decision is grounded in a rendered state, making counts, attachments, topology, and intermediate part-addition errors inspectable across the trajectory. To support this paradigm, we introduce SoT-26K, a large-scale dataset of grounded assembly traces derived from part-based CAD hierarchies, and T2S-CompBench, a benchmark for evaluating structural integrity and trace faithfulness. Fine-tuning on SoT-26K achieves 88.4% on component numeracy and 84.8% on structural topology, outperforming direct generation by +24.2 points on component numeracy and +19.3 points on structural topology. SoT establishes a transparent testbed for rendered-domain structure-aware generation. The code is available at https://github.com/yuhuo03/Shape-of-Thought.

12.
arXiv (quant-ph) 2026-06-11

Additivity and chain rules for quantum entropies via multi-index Schatten norms

arXiv:2502.01611v3 Announce Type: replace Abstract: The primary entropic measures for quantum states are additive under the tensor product. In the analysis of quantum information processing tasks, the minimum entropy of a set of states, e.g., the minimum output entropy of a channel, often plays a crucial role. A fundamental question in quantum information and cryptography is whether the minimum output entropy remains additive under the tensor product of channels. Here, we establish a general additivity statement for the optimized sandwiched Rényi entropy of quantum channels. For that, we generalize the results of [Devetak, Junge, King, Ruskai, CMP 2006] to multi-index Schatten norms. As an application, we strengthen the additivity statement of [Van Himbeeck and Brown, 2025] thus allowing the analysis of time-adaptive quantum cryptographic protocols. In addition, we establish chain rules for Rényi conditional entropies that are similar to the ones used for the generalized entropy accumulation theorem of [Metger, Fawzi, Sutter, Renner, CMP 2024].

13.
arXiv (CS.LG) 2026-06-19

Direct Advantage Estimation for Scalable and Sample-efficient Deep Reinforcement Learning

arXiv:2606.20411v1 Announce Type: new Abstract: Direct Advantage Estimation (DAE) has been shown to improve the sample efficiency of deep reinforcement learning algorithms. However, its reliance on full environment observability limits its applicability in realistic settings, and its requirement to model transition probabilities incurs substantial computational overhead for high-dimensional observations. In the present work, we address both limitations. First, we extend the theoretical framework of DAE to partially observable domains with minimal modifications. Second, we reduce its computational complexity by introducing discrete latent dynamics models that efficiently approximate transition probabilities. We evaluate our approach on the Arcade Learning Environment and find that DAE scales effectively with function approximator capacity while retaining high sample efficiency.

14.
arXiv (CS.AI) 2026-06-15

The Accountability Paradox: How Platform API Restrictions Undermine AI Transparency Mandates

arXiv:2505.11577v5 Announce Type: replace-cross Abstract: Recent application programming interface (API) restrictions on major social media platforms challenge compliance with the EU Digital Services Act [20], which mandates data access for algorithmic transparency. We develop a structured audit framework to assess the growing misalignment between regulatory requirements and platform implementations. Our comparative analysis of X/Twitter, Reddit, TikTok, and Meta identifies critical ``audit blind-spots'' where platform content moderation and algorithmic amplification remain inaccessible to independent verification. Our findings reveal an ``accountability paradox'': as platforms increasingly rely on AI systems, they simultaneously restrict the capacity for independent oversight. We propose targeted policy interventions aligned with the AI Risk Management Framework of the National Institute of Standards and Technology [80], emphasizing federated access models and enhanced regulatory enforcement.

15.
arXiv (CS.CL) 2026-06-18

Dango: A Strictly L1-Only Large Language Model for Studying Second Language Acquisition

We introduce Dango, a 1.8B-parameter large language model designed for controlled studies of L1-to-L2 (Japanese-to-English) transfer in second language acquisition (SLA). While previous studies have explored SLA in language models, they have predominantly relied on smaller or non-decoder models, limiting their ability to generate open-ended text and reducing their suitability as practical L2 simulators. We identify a key challenge when scaling models to this size: L2 contamination within the "monolingual" pretraining corpus used for L1 acquisition. To address this, we propose a filtering method to reduce premature exposure to English while preserving realistic, minimal exposure. We then fine-tune the model on LLM-generated L2-learning lessons to simulate the L2 acquisition process. Our evaluations confirm that Dango develops human-like L2 production patterns, outperforming both unfiltered and standard multilingual baselines. We release the model, data, and code to facilitate reproducible computational SLA research and learner-facing applications.

16.
arXiv (CS.LG) 2026-06-11

Structure-Preserving Neural Surrogates with Tractable Uncertainty Quantification

arXiv:2606.11650v1 Announce Type: new Abstract: Recent advances in scientific machine learning provide a means of near-real-time solution to partial differential equations (PDEs), but lack the theoretical underpinnings of conventional simulators that support contemporary verification and validation. In this work, we construct data-driven reduced-order models that serve as structure-preserving, real-time surrogates. Remarkably, the exterior calculus that imposes physical conservation structure also exposes topological structure that we use to build a Gaussian process (GP) representation of uncertainty in state-flux relationships, ultimately yielding a Dirichlet-to-Neumann map for quantities of interest with closed-form expressions for posterior uncertainty. We specifically propose structure-preserving $H(\mathrm{div})$–$L^2$ subspaces of conventional Raviart–Thomas and $dgP_0$ elements prescribed by a lightweight transformer. Reduced-order dynamics consistent with this subspace are learned by posing a conservation law in which a GP describes the fluxes between volumes. This work hinges on a novel interface between mixed FEM spaces and GP regression; when training is posed as the optimal recovery problem (ORP), the resulting GP regression can be written as an optimization problem with equality constraints that impose a conservation structure, amenable to a fast Schur-complement training strategy. The trained model can then be solved in real time with closed-form estimators for boundary fluxes driven by prescribed Dirichlet data. The paper includes RKHS posterior error bounds for linear functionals to support uncertainty quantification, as well as numerical experiments demonstrating the accuracy of the posterior distribution as a surrogate for error estimation.

17.
arXiv (CS.LG) 2026-06-16

Surrogate-Assisted Framework for SI-Compliant Interconnect Design Optimization Using the Earth Mover's Distance

arXiv:2606.15234v1 Announce Type: cross Abstract: This work presents a deterministic, machine-assisted framework for SI-compliant PCB design based on the Earth Mover's Distance (EMD). In contrast to conventional surrogate-based optimization methods that rely on iterative black-box search procedures, the proposed approach follows an interpretable, sequential evaluation strategy. Neural surrogate models are first used to efficiently predict waveform describing features from topology-dependent design parameters. A decision tree then acts as a physically motivated quality gate that identifies SI-compliant waveforms according to predefined SI criteria. Within the resulting valid solution space, the Earth Mover's Distance is employed as a similarity metric to rank candidate designs according to their proximity to an ideal reference signal. This enables not only the deterministic identification of admissible parameter regions but also a transparent prioritization of physically superior solutions without inverse modeling or stochastic search procedures. The methodology is demonstrated using a large-scale set of simulated DDR3 fly-by waveforms. By combining surrogate prediction, interpretable classification, and EMD-based waveform evaluation, the framework provides an explainable and computationally efficient alternative to conventional optimization strategies for supporting PCB development with AI-based methods.

18.
arXiv (CS.CV) 2026-06-11

DynaTok: Token-Based 4D Reconstruction from Partial Point Clouds

We address 4D reconstruction from partial point cloud sequences, where depth-sensor observations are incomplete, unordered, and lack explicit temporal correspondences. This geometry-only setting is challenging due to missing observations and ambiguous dynamics. While recent progress has largely relied on image-based methods, existing point-based approaches typically focus on single objects, assume relatively complete inputs, or require explicit correspondences. To address these limitations, we propose DynaTok, a point-based framework for correspondence-free 4D reconstruction from partial point cloud sequences without images. DynaTok encodes frames into compact latent tokens, aggregates incomplete observations over time with a Transformer-based spatiotemporal encoder, and decouples geometry and motion through residual tokens in a unified model. A flow-matching decoder then reconstructs complete, temporally consistent 4D point-cloud sequences conditioned on the latent tokens. Experiments on object- and scene-level benchmarks demonstrate improved reconstruction quality and temporal coherence from partial point cloud observations. Project page: https://wrchen530.github.io/dynatok/.

19.
bioRxiv (Bioinfo) 2026-06-21

Antibody-Antigen Affinity Prediction with Chain-Aware Protein Language Modeling

Motivation: Antibody-antigen affinity determines which antibodies advance in therapeutic discovery, repertoire analysis and affinity maturation, but experimental measurements are sparse relative to the scale of sequence libraries. Structure-based predictors can exploit interface geometry when reliable complexes are available, yet early discovery often requires ranking many heavy-light chain pairs against antigens for which no complex structure exists. Existing sequence-based models are scalable, but frequently compress heavy and light chains into a single antibody representation or concatenate antibody and antigen features obscuring the chain-specific and epitope-specific signals that drive binding. Results: We present AbAffinity, a sequence-only chain-aware three-stream architecture that maintains heavy chain, light chain and antigen as distinct streams. It integrates frozen ESM-2 embeddings with heavy-chain CDR-focused pooling, heavy-light self-attention, adaptive fusion gating and gated cross-attention, training only a compact interaction module. On the SAAINT-DB benchmark, AbAffinity achieves strong predictive performance under ten-fold cross-validation and maintains robust accuracy on novel antigens. It consistently outperforms recent sequence-based models across external benchmarks including SAbDab, AB-Bind and SKEMPI 2.0. Ablation studies highlight the contributions of chain-specific representations, CDR-focused pooling and the gated interaction pathway. Integrated Gradients attributions recover known paratope and epitope residues at structurally validated interfaces. AbAffinity provides a lightweight, explainable sequence-first framework for antibody triage and prioritisation when structural information is limited or unavailable.

20.
arXiv (quant-ph) 2026-06-11

Optimizing Encoder Circuits of Entanglement-Assisted Quantum LDPC Codes via Beam Search

arXiv:2606.11468v1 Announce Type: new Abstract: Entanglement-assisted (EA) quantum QC-LDPC codes offer strong error-correction capabilities with structured parity-check matrices, but their practical use depends on efficient encoder circuits and the availability of pre-shared Bell pairs (ebits). In all encoder implementations based on the stabilizer formalism, the dominant contribution to this complexity comes from the use of controlled gates. In this paper, we adopt the Sharma-Kumar-Garani (SKG) encoder construction. We formulate the encoder optimization as a search over GF(2) row operations that decompose the binary matrix derived from its CNOT sub-sequence. We solve this problem using a beam search algorithm guided by a Hamming-distance heuristic. For the tested EA quantum QC-LDPC code families, the proposed method achieves CNOT-count reductions of 7.3-34.0% relative to the SKG baseline encoder. The optimized circuits also yield lower CNOT counts than Patel-Markov-Hayes synthesis on all tested instances and are verified by stabilizer-tableau simulation. These results show that substantial encoder simplification is possible for structured EA QC-LDPC codes.

21.
arXiv (CS.AI) 2026-06-11

READER: Robust Evidence-based Authorship Decoding via Extracted Representations

arXiv:2606.10794v2 Announce Type: replace Abstract: As agentic applications increasingly route user tasks through official and third-party LLM APIs, provenance becomes an operational question: which model generated a given black-box response? We study Dynamic Black-Box LLM Provenance: identifying the source LLM from generations elicited by query-varying, non-predefined prompts rather than a fixed input set or benchmark suite. This setting is difficult because prompt semantics dominate the text, while model-specific authorship traces are weak and inconsistent at the surface level. We introduce READER (Robust Evidence-based Authorship Decoding via Extracted Representations), a lightweight provenance framework that treats a frozen proxy LLM as a reader of hidden authorship evidence. READER maps black-box outputs into proxy activation space, temporally filters token states within each response, and performs Bayesian Evidence Accumulation by summing single-response log-posterior evidence across independently sampled prompts. This avoids fragile mean-pooling of prompt-specific representations while preserving the query-wise evidence needed for calibrated confidence. On Agent500, a 50-target dataset built from agent-style prompts, READER reaches $31.0$-$42.4\%$ top-1 accuracy from a single response and $70.0$-$84.0\%$ from 50 responses, substantially outperforming sentence-encoder fingerprints. Scaling across nine proxy readers further shows that stronger LLMs expose more linearly decodable authorship structure, suggesting that authorship perception is already present in frozen LLM representations and can be converted into reliable multi-query attribution.

22.
arXiv (math.PR) 2026-06-18

On two overlooked stick-breaking constructions of the normalized inverse Gaussian process

arXiv:2606.19306v1 Announce Type: new Abstract: We shed light on two alternative stick-breaking constructions of the normalized inverse Gaussian (NIG) random discrete distribution which appear to have been overlooked so far in the Bayesian nonparametric setting. The first is derived from a result in Aldous and Pitman (1998) for the conditional Brownian excursion partition, mixing over the local time at zero up to time one. The second arises as a particular case of a result in James (2013) for priors obtained by a random spatial and temporal change of the normalized generalized Gamma subordinator. Both constructions are in terms of straightforward transformations of standard random variables and can be easily generalized to provide the stick-breaking construction of any element, respectively, in a) the family of mixed Poisson-Kingman models driven by the $1/2$ stable Lévy measure and b) the family of Poisson-Gamma processes driven by the Inverse Gaussian subordinator.

23.
arXiv (CS.LG) 2026-06-16

Imbalanced Semi-Supervised Learning via Label Refinement and Threshold Adjustment

arXiv:2407.05370v3 Announce Type: replace Abstract: Semi-supervised learning (SSL) algorithms often struggle to perform well when trained on imbalanced data. In such scenarios, the generated pseudo-labels tend to exhibit a bias toward the majority class, and models relying on these pseudo-labels can further amplify this bias. Existing imbalanced SSL algorithms explore pseudo-labeling strategies based on either pseudo-label refinement (PLR) or threshold adjustment (THA), aiming to mitigate the bias through heuristic-driven designs. However, through a careful statistical analysis, we find that existing strategies are suboptimal: most PLR algorithms are either overly empirical or rely on the unrealistic assumption that models remain well-calibrated throughout training, while most THA algorithms depend on flawed metrics for pseudo-label selection. To address these shortcomings, we first derive the theoretically optimal form of pseudo-labels under class imbalance. This foundation leads to our key contribution: SEmi-supervised learning with pseudo-label optimization based on VALidation data (SEVAL), a unified framework that learns both PLR and THA parameters from a class-balanced subset of training data. By jointly optimizing these components, SEVAL adapts to specific task requirements while ensuring per-class pseudo-label reliability. Our experiments demonstrate that SEVAL outperforms state-of-the-art SSL methods, producing more accurate and effective pseudo-labels across various imbalanced SSL scenarios while remaining compatible with diverse SSL algorithms. The code is publicly available (https://github.com/ZerojumpLine/SEVAL).

24.
arXiv (CS.CV) 2026-06-16

Avoiding Exponential Blow-Up in Distributive Lattice Submodular Minimization

作者:

Submodular function minimization has gained a lot of interest in recent years. They are highly applicable in the area of Computer Vision and Machine Learning. Often such applications require to work with submodular functions defined on distributive lattice. Current best way of dealing with it is using a transformation which extrapolates the submodular function for the respective boolean lattice. It makes optimization system too inefficient due to enlargement of the working space. Quantitatively, the expanded space has additional exponential (in set size) number of elements. We propose a generic framework for dealing with distributive lattice which only works within distributive lattice. Our framework allows one to use already established submodular function minimization algorithms for boolean lattice. In our experiment, we show the huge improvement in terms of running time over tranditional methods for handling distributive lattice.

25.
arXiv (CS.CV) 2026-06-16

BRDFusion: Physics Meets Generation for Urban Scene Inverse Rendering

Inverse rendering of urban scenes from captured videos enables numerous applications, including content creation and autonomous driving simulation. Physically-based rendering methods follow and control lighting physics, but suffer from reconstruction and rendering artifacts. While generative models produce realistic videos, they offer limited consistency and controllability. We present BRDFusion, a unified framework that combines two complementary models for inverse and forward rendering. Specifically, BRDFusion recovers explicit, consistent scene properties with physical modeling and alleviates optimization ambiguity with generative priors. During forward rendering, the physical model provides controllable rendering from the scene configuration, and the generative model denoises and fixes artifacts. Therefore, our method produces high-quality videos while allowing precise control, outperforming baselines in real and synthetic scenes. Moreover, BRDFusion supports novel-view relighting, night simulation, and dynamic object insertion/editing. Project page: https://shigon255.github.io/brdfusion-page/