Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
PLOS Computational Biology 2026-06-18

A comparison of contact patterns derived from the population structure in agent-based models and empirical contact survey data

Authors:

by Janik Suer, Johannes Ponge, Michael Brüggemann, Jan Pablo Burgard, Vitaly Belik, Bernd Hellingrath, Alejandra Rincón Hidalgo, Andrzej K. Jarynowski, Richard Pastor, Huynh Thi Phuong, Steven Schulz, Ashish Thampi, Chao Xu, Marlli Zambrano, Rafael Mikolajczyk, André Karch, Veronika K. Jaeger, on behalf of the OptimAgent Consortium Agent-based models (ABMs) are powerful tools for simulating disease spread, relying on individual-level interaction rules from which emergent dynamics arise. An important component in ABMs is contact behaviour. To reduce computational complexity, contact behaviour in ABMs is often assumed as random mixing within structurally defined settings (as, e.g., workplaces). with setting composition typically based on empirical data such as census information. However, the validity of this approach to represent contacts remains unclear. To address this gap, we compare the contact structure derived through this approach in a large-scale ABM with empirical contact survey data with respect to age contact matrices for households, schools, workplaces, all remaining contact settings, and all contacts combined (based on difference matrices and sum of squared errors (SSE)). Our results demonstrate that random mixing in settings with known age compositions like households (SSE:0.7(95%CI0.4–0.9)), schools (SSE:0.7(95%CI:0.3–1.1)) and workplaces (SSE:0.5(95%CI:0.2-0.7)), captures basic interaction patterns but fails to account for age-related variation in contact numbers. The largest differences arise for contacts outside these settings (SSE:3.8(95%CI:1.2–6.5)), as ABMs typically use random regional contacts that do not capture age-structured behaviour observed in contact surveys. Applying contact matrices from both approaches to an age-structured compartmental model, leads to noticeable differences in simulated epidemic outcomes regarding reproduction numbers and spreading dynamics between age groups. Our results suggest that naïve approaches to represent contact behaviour in ABMs based on population structure can be valid in settings with defined age-structures while settings with low a priori structure require more advanced methods to represent contact behaviour observed in contact surveys.

02.
arXiv (quant-ph) 2026-06-19

Recursive perturbation approach to time-convolutionless master equations: Explicit construction of generalized Lindblad generators for arbitrary open systems

arXiv:2506.04095v2 Announce Type: replace Abstract: We develop a recursive perturbative expansion for the time-convolutionless (TCL) generator of an open quantum system in a generalized Lindblad form. This formulation provides a systematic approach to derive the generator at arbitrary order while preserving a Lindblad-like structure, without imposing assumptions on the system or environment beyond an initially uncorrelated state. The generator is written, at all orders, in a canonical form, which also corresponds to the minimal dissipation condition, which uniquely specifies the decomposition of the generator into Hamiltonian and dissipative contributions. To validate the method and show its effectiveness in addressing non-Markovian dynamics and strong-coupling effects, we compute the generator explicitly up to fourth order.

03.
arXiv (quant-ph) 2026-06-15

Multiple-time Quantum Imaginary Time Evolution

arXiv:2512.10875v2 Announce Type: replace Abstract: Quantum Imaginary-Time Evolution (QITE) is a powerful method for preparing ground states on quantum hardware. However, executing QITE has costly measurement budgets for general Hamiltonians. Both fidelity and computational cost are strongly dependent on the definition of suitable local domains and Hamiltonian partitions. In this work, we introduce the Multiple-Time QITE algorithm (MT-QITE). We show how using more than one imaginary time substantially improves the fidelity of the resulting ground state as well as the measurement overhead with respect to the previously published QITE algorithm, while preserving its deterministic character and its independence from ad hoc ansatze. Moreover, unlike QITE and other QITE-based algorithms, MT-QITE is parallelizable, and we show that even in Hamiltonians with non-local interactions, partitioning may entail a computational advantage.

04.
arXiv (math.PR) 2026-06-11

Stochastic Reaction Networks Within Interacting Compartments with Content-Dependent Fragmentation

arXiv:2511.10223v4 Announce Type: replace Abstract: Stochastic reaction networks with mass-action kinetics provide a useful framework for understanding processes – biochemical and otherwise – in homogeneous environments. However, cellular reactions are often compartmentalized, either at the cell level or within cells, and hence non-homogeneous. We investigate a model of compartmentalization in which the rate of fragmentation of a compartment depends on the abundance of some designated species inside that compartment. The particular model of study is part of a general framework for compartmentalized chemistry with dynamic compartments that was proposed in (Duso and Zechner, PNAS, 2020). This paper builds on (Anderson and Howells, Bull. Math. Biol., 2023) where the special case where the compartment dynamics do not depend on their contents was studied mathematically. In particular, we demonstrate that the explosivity characterization from (Anderson and Howells, Bull. Math. Biol., 2023) fails in this setting and provide new sufficient conditions for non-explosivity and positive recurrence, under the assumption that the underlying CRN admits a linear Lyapunov function. These results extend the theoretical foundation for modeling content-mediated compartment dynamics, with implications for systems such as cell division and intracellular transport.

05.
arXiv (CS.LG) 2026-06-17

Price of metric universality in vector quantization is at most 0.11 bit

arXiv:2602.05790v2 Announce Type: replace-cross Abstract: Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is that of using a low-precision approximation $\widehat W$ in place of true $W$ (``weight-only quantization''). Information theory demonstrates that an optimal algorithm for reducing precision of $W$ depends on the (second order) statistics of $X$ and requires a careful alignment of vector quantization codebook with PCA directions of $X$ (a process known as ``waterfilling allocation''). Dependence of the codebook on statistics of $X$, however, is highly impractical. This paper proves that there exist a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension in the case when $W$ is Gaussian. Such universal codebook would be an ideal candidate for the low-precision storage format, a topic of active modern research, but alas the existence proof is non-constructive. Equivalently, our result shows existence of a net in $\mathbb{R}^n$ that is a nearly-optimal covering of a sphere simultaneously with respect to all Hilbert norms.

06.
arXiv (quant-ph) 2026-06-16

Information Is Not Physical: Possibility Spaces, Erasure, and the Structure of Unrealized Alternatives

arXiv:2606.15120v1 Announce Type: cross Abstract: The slogan ``information is physical,'' introduced by Rolf Landauer and developed through quantum information theory and black-hole thermodynamics, has achieved near-axiomatic status in modern physics. Yet the ontological status of information remains surprisingly underexamined: most discussions either reduce information to a form of energy or treat it as a purely mathematical object. This paper proposes a third position. I argue that information is neither a physical substance nor a free-floating abstraction, but rather the structure of physically realizable alternatives – a counterfactual structure that a physical system instantiates in virtue of the possibility space available to it. Building on Shannon's combinatorial definition, the Landauer principle, the no-cloning theorem, and the black-hole information paradox, I show that the informational content of any physical event is constituted by the set of outcomes that could have occurred but did not. This counterfactual reading dissolves several persistent confusions: it explains why erasing information dissipates heat without making information ``material,'' why quantum superposition is informationally richer than any classical mixture, and why information loss in black holes is physically significant beyond mere bookkeeping. The proposal sits within a structural-realist framework but departs from standard structural realism by locating the relevant structure in modal, not merely actual, relations. I conclude by sketching implications for the foundations of quantum mechanics, quantum gravity, and scientific ontology more broadly.

07.
arXiv (CS.LG) 2026-06-17

Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective

arXiv:2512.11784v2 Announce Type: replace Abstract: Softmax attention is a central component of transformer architectures, yet its nonlinear structure poses significant challenges for theoretical analysis. We develop a unified, measure-based framework for studying single-layer softmax attention under both finite and infinite prompts. For i.i.d. Gaussian inputs, we lean on the fact that the softmax operator converges in the infinite-prompt limit to a linear operator acting on the underlying input-token measure. Building on this insight, we establish non-asymptotic concentration bounds for the output and gradient of softmax attention, quantifying how rapidly the finite-prompt model approaches its infinite-prompt counterpart, and prove that this concentration remains stable along the entire training trajectory in general in-context learning settings with sub-Gaussian tokens. In the case of in-context linear regression, we use the tractable infinite-prompt dynamics to analyze training at finite prompt length. Our results allow optimization analyses developed for linear attention to transfer directly to softmax attention when prompts are sufficiently long, showing that large-prompt softmax attention inherits the analytical structure of its linear counterpart. This, in turn, provides a principled and broadly applicable toolkit for studying the training dynamics and statistical behavior of softmax attention layers in large prompt regimes.

08.
arXiv (quant-ph) 2026-06-24

Quantum Entanglement Halves the Oblivious Update Bandwidth

Authors:

arXiv:2605.19248v2 Announce Type: replace Abstract: We consider $(n,k)$ MDS-coded distributed storage over $\mathbb{F}_q$ with per-node storage $\alpha$ symbols. For the oblivious update problem, where a single message symbol changes and neither helpers nor the stale node know which, the classical lower bound is $\alpha k \log_2 q$ bits. We prove that when the $k$ contacted helpers share prior quantum entanglement, the update bandwidth is $\lceil \alpha/2 \rceil \cdot k \log_2 q$ bits-equivalent, a factor approaching 2 reduction. For $\alpha = 2$, a $[[k, k-2]]_q$ CSS code achieves bandwidth $k \log_2 q$ with one qudit per helper. For general $\alpha$, a $[[\lceil \alpha/2 \rceil k, \lceil \alpha/2 \rceil k - \alpha]]_q$ CSS code achieves the bound with $\lceil \alpha/2 \rceil$ qudits per helper. The matching converse uses the superdense coding bound: the stale node holds all transmitted qudits and hence the entangled partners, so each helper's channel supports at most $D^2$ distinguishable signals for dimension $D$. The result holds for all $(n,k)$ pairs with sufficiently large prime $q$.

10.
arXiv (CS.AI) 2026-06-16

Orchestrated Reality: From Role-Play to Living, Playable Game Worlds – LLM-Driven World Simulation as a Parameterized-Action POMDP

arXiv:2606.16014v1 Announce Type: cross Abstract: Many games rely on storytelling combined with systems that track levelling, NPC behaviour, and consequence simulation; bridging tightly-authored narrative with deeply-simulated worlds – most acute in sandbox and open-world settings – has been prohibitively expensive. LLM-driven worlds open a new path: a single harness can coordinate numerical state, narrative voice, storytelling pacing, and rule logic together. Realising this requires the LLM system to sustain a persistent world (who is where, what has just happened, what is currently true), which today's deployed systems do not: the narrative voice asserts state in free prose without any validated representation, so a fully autonomous game engine remains infeasible. We treat this as an architectural choice, not a limitation of language models, and report work in progress on a framework – orchestrated reality – that makes the world a canonical object owned by a singleton orchestration agent analogous to the tabletop-RPG Game Master (GM). We formalise an LLM-driven game world for a human player as a Parameterized-Action POMDP: state is a tree of canonical JSON entities, actions decompose as $a=(k, x_k)$ (a discrete intent kind plus structured JSON parameters), the agent observes only a narrative projection $o=O(s)$ of state, and the transition kernel $F$ is an LLM-driven Plan-Diff-Validate-Apply (PDVA) pipeline that commits schema-validated, content-hashed JSON deltas. We give the formal model, a JSON-state example, a worked single-turn example, and a catalogue of 15 illustrative incidents drawn from a real deployment showing the framework in action. Empirical validation through a planned human player study – together with multi-NPC concurrent agency and deployment as an RL environment – is situated as future work.

11.
arXiv (CS.AI) 2026-06-12

Brick: Spatial Capability Routing for the Mixture-of-Models (MoM) Paradigm

arXiv:2606.13241v1 Announce Type: new Abstract: Defining query difficulty is one of the hardest problems in deployment engineering. Existing LLM routers rely on surface features such as domain labels, keywords, and token count, ignoring the within-domain variance that actually determines model success. Frontier models cost ten to one hundred times more than local open-weight models, so at production scale even small per-request savings become a direct cloud-bill lever. We present Brick, a multimodal router that scores each model on six capability dimensions, combines this with a per-query difficulty estimate, and dispatches via a cost-penalized geometric rule. A continuous preference knob lets operators slide between max-quality and max-saving profiles at deploy time. On a benchmark of 5,504 queries, Brick at max-quality reaches 76.98% accuracy, beating the best single model (75.02%) and all tested routers. At a neutral cost-quality profile, Brick achieves 74.11% accuracy at 4.71x lower cost than always using the strongest model. At min-cost, it cuts cost 22.15x with 11.85 points accuracy loss. Median latency drops from 51.2s to 22.8s.

12.
arXiv (CS.LG) 2026-06-16

Simulation-Augmented Multi-Step Split Conformal Prediction for Aggregated Forecasts

arXiv:2606.16356v1 Announce Type: new Abstract: We study uncertainty quantification for aggregated forecasting tasks such as annual totals and year-over-year growth rates. We propose SA-MSCP, a simulation-augmented multi-step split conformal method that generates future paths from cross-validated residuals using a block bootstrap and constructs prediction intervals from empirical quantiles. Experiments show that SA-MSCP improves empirical coverage over a simulated-path baseline for aggregated and growth-rate targets. Our results demonstrate that simulation-enhanced conformal calibration is an effective and general framework for uncertainty quantification in aggregated time-series forecasting.

14.
arXiv (CS.CL) 2026-06-17

Environment-Grounded Automated Prompt Optimization for LLM Game Agents

LLM agents in interactive environments are highly sensitive to their prompts, yet prompt engineering remains a manual, task-specific process. We introduce an automated prompt optimization framework for LLM agents that decomposes the observation-to-action pipeline into a goal-conditioned descriptor agent and an action selection agent, and iteratively refines each module's prompt through an LLM-driven evolutionary loop guided by environment returns. We propose a behavior analyzer to attribute episode outcomes to specific prompt components, and a mutator to propose targeted revisions to the prompt, before validating them through environment rollouts. We evaluate on all five BabyAI tasks in the BALROG benchmark, comparing our pipeline against BALROG's RobustCoTAgent under both plain and guided prompt initializations. Optimization improves performance consistently across tasks and conditions, without requiring updates to the model weights. On PutNext, a multi-step coordination task where the RobustCoTAgent achieves 0% success, our framework reaches up to 72.5% success rate using the same underlying LLM with optimized prompts. These results suggest that a multi-agent framework, combined with automatic prompt optimization, enhances LLMs without the need for fine-tuning or extensive human supervision.

15.
arXiv (CS.CV) 2026-06-11

Natural-Language Temporal Grounding in Hour-Long Videos is a Search Problem: A Benchmark and Empirical Decomposition

Temporal grounding–returning the interval $[t_s, t_e]$ for a natural-language query over a video–is the language interface to long-form video, yet has been studied on short videos; the dynamics of hour-scale natural-language grounding remain underexplored. We take the position that at hour-scale, the binding constraint is search, not recognition: Video-LLMs are bottlenecked not by localizing a nearby event, but–given a natural-language query–by searching for the relevant region of a long video. To test this, we release ExtremeWhenBench, the first open hour-scale grounding benchmark (2,273 queries over 194 videos, mean 75.7 min, max 9 hr) with an open-form query distribution. Every open Video-LLM collapses while a frame-level retrieval baseline outperforms them; a failure taxonomy attributes 85% of failures to search; and a retrieve-then-ground hybrid recovers 6.7x over the monolithic Video-LLM–mirroring retrieve-then-read in open-domain QA.

17.
arXiv (quant-ph) 2026-06-24

Universal Dynamical Response to Slow Driving in Chaotic Systems

arXiv:2606.23810v1 Announce Type: cross Abstract: We propose a unified perspective on classical and quantum chaos based on the stability of a system's stationary states under slow driving. We probe this sensitivity via the system's susceptibility to the average protocol speed, which we call the ``speed-Fisher information," and relate it to irreversible entropy production in the system. We show that chaotic dynamics manifests as a divergence of the speed-Fisher information with the protocol time, and that this response is controlled by the perturbation's low-frequency spectral weight. This approach to chaos applies to both classical and quantum Hamiltonian systems, and naturally extends to non-Hamiltonian classical flows. We illustrate this framework with simple classical and quantum examples, along with a non-Hamiltonian flow that qualitatively exhibits analogous low-frequency spectral behavior.

18.
arXiv (quant-ph) 2026-06-12

Generalized Exact Fractional Quantum Information Model with Memory Effects

arXiv:2606.13525v1 Announce Type: new Abstract: In this paper, we analyze quantum information measures in fractional quantum mechanics using the Riemann-Liouville derivative formalism adopted here. In this case, we initially reconsider the conventional definitions of Shannon entropy and Fisher information, subsequently extending them to fractional quantum systems described by nonlocal differential operator frameworks adopted. Within this generalized formulation, fractional expressions of Shannon entropy and Fisher information are constructed and their mathematical structures examined thoroughly. Also, the formalism is then applied to the quantum harmonic oscillator, yielding explicit analytical expressions derived as functions of the fractional parameter therein. The obtained results demonstrate that fractional derivatives alter the localization properties of probability densities and generate nontrivial variations in information content and sensitivity across system behavior. In this context, the fractional parameter plays a central role in controlling deviations from the standard quantum information measures framework. Also, the study establishes a consistent framework for describing information-theoretic properties of quantum systems governed by nonlocal dynamics.

19.
arXiv (CS.AI) 2026-06-24

Offline Reinforcement Learning for Warehouse SLAM Throughput Control

arXiv:2606.23978v1 Announce Type: cross Abstract: We present an offline reinforcement learning (RL) framework for optimizing SLAM throughput control in a warehouse fulfillment environment. SLAM (Scan/Label/Apply/Manifest) throughput directly influences system congestion and operational efficiency. Our RL-based control approach dynamically recommends SLAM throughput settings that adaptively balance throughput maximization with downstream stability through intelligent adjustment of throttling behavior. We include a history-informed state representation, action space abstraction for delayed-impact control, and a reward function that captures both upstream and downstream operational metrics. Our approach is algorithm-agnostic, enabling integration of multiple offline RL methods under a unified architecture. We instantiate our framework with three state-of-the-art offline RL algorithms, and trained the models offline using de-identified historical operational logs from a large-scale warehouse. Policy performance is evaluated using a comprehensive multi-method strategy. These include model-free approaches including immediate reward estimation via regression models and long-horizon Fitted Q Evaluation (FQE), as well as model-based Deep Koopman dynamics evaluation. Empirical results reveal that the CQL policy consistently outperforms alternatives, improving system health by 22.97% and reducing average throttling duration by 3.18%. These findings demonstrate the potential of offline RL for safe and scalable warehouse throughput control optimization.

20.
arXiv (CS.LG) 2026-06-19

Optimal Deterministic Multicalibration and Omniprediction

arXiv:2606.20557v1 Announce Type: new Abstract: A model is multicalibrated on a collection of group weights $G$ if it is calibrated – i.e. unbiased even conditional on its prediction – not just overall, but also after reweighting contexts by each $g \in G$. It is a useful property for many downstream applications and is a basic desideratum of trustworthy machine learning. Before this work, all predictors known to attain the minimax-optimal $\widetilde O(\varepsilon^{-3})$ sample complexity rate for $\varepsilon$-multicalibration were randomized, while deterministic predictors were known only with substantially worse sample complexity. Whether randomization is necessary for optimal sample complexity in multicalibration was explicitly asked by [CLNR26] and implicitly in several prior works. We resolve this open problem by giving a minimax-optimal multicalibration algorithm that outputs a deterministic predictor. We then generalize the algorithm to produce optimal deterministic predictors that satisfy outcome indistinguishability (OI) with respect to finite or finitely covered collections of tests. As an application, this also gives deterministic omnipredictors and panpredictors with optimal sample complexity, resolving open problems posed by [OKK25] and [BHHLZ25].

21.
arXiv (CS.CV) 2026-06-19

DiT-JSCC: Rethinking Deep JSCC with Diffusion Transformers and Semantic Representations

Generative joint source-channel coding (GJSCC) has emerged as a new Deep JSCC paradigm for achieving high-fidelity and robust image transmission under extreme wireless channel conditions, such as ultra-low bandwidth and low signal-to-noise ratio. Recent studies commonly adopt diffusion models as generative decoders, but they frequently produce visually realistic results with limited semantic consistency. This limitation stems from a fundamental mismatch between reconstruction-oriented JSCC encoders and generative decoders, as the former lack explicit semantic discriminability and fail to provide reliable conditional cues. In this paper, we propose DiT-JSCC, a novel GJSCC backbone that can jointly learn a semantics-prioritized representation encoder and a diffusion transformer (DiT) based generative decoder, our open-source project aims to promote the future research in GJSCC. Specifically, we design a semantics-detail dual-branch encoder that aligns naturally with a coarse-to-fine conditional DiT decoder, prioritizing semantic consistency under extreme channel conditions. Moreover, a training-free adaptive bandwidth allocation strategy inspired by Kolmogorov complexity is introduced to further improve the transmission efficiency, thereby indeed redefining the notion of information value in the era of generative decoding. Extensive experiments demonstrate that DiT-JSCC consistently outperforms existing JSCC methods in both semantic consistency and visual quality, particularly in extreme regimes.

22.
arXiv (CS.CV) 2026-06-12

Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models

Text-guided image editing with visual autoregressive (VAR) generators requires controlling both what the model samples and where the sampled change is written back into the image code. Existing VAR editors mainly operate on token streams, features, or flat next-token logits, leaving two native structures of bitwise-residual VAR models underused: the per-bit Bernoulli prediction head and the additive multi-scale residual code field from which the image is assembled. We propose BitResEdit, a training-free editor for bitwise-residual VAR generators such as Infinity. BitEdit performs source-negative guidance by tilting the post-CFG per-bit log-odds along a source–target contrast computed on a shared edited prefix, then projects each update into a closed-form Bernoulli-KL trust region around the clean CFG sampler. ResEdit converts the sampled bits into per-scale continuous-code residuals, gates them with a localization mask, and re-injects them through the generator's native sum-of-scales. Together they couple decision-time bit guidance with combination-time code composition, so masked-out latent features are preserved exactly by code arithmetic while localized, scale-aware edits are applied inside the target region. On PIE-Bench with Infinity-2B, BitResEdit attains the strongest text alignment among same-backbone VAR editors, improving CLIP on the edited region by +1.07 over the strongest prior editor while keeping background preservation competitive with it. Ablations show BitEdit and ResEdit play complementary roles in target alignment and background preservation.

23.
arXiv (CS.LG) 2026-06-15

Geometric Domain Adaptation via Optimal Transport for Linear Regression in R^2

arXiv:2606.14023v1 Announce Type: cross Abstract: Optimal Transport has become recently a powerful method for domain adaptation by aligning source and target distributions. We study a supervised domain adaptation problem where source and target domains are related by a rotation or a translation or a homothety in $\mathbb{R}^2$. We prove that the optimal transport map recovers the underlying map when using a $p-$norm cost with $p \geq 2$. Based on this insight, we develop a method combining $K-$means and optimal transport to estimate the underlying map, enabling adaptation of linear regression models when target data is scarce. Simulations demonstrate improved performance over baseline methods. Rather than relying on highly expressive deep learning architectures, we focus on classical machine learning models to emphasize interpretability and theoretical insight. This perspective allows us to explicitly characterize the role of optimal transport in recovering geometric transformations such as rotations, translations, and homotheties. Our contributions include a theoretical result linking optimal transport and rotations, translations and homothecies in $\mathbb{R}^2$, and a practical method for adaptation in linear regression offering both conceptual clarity and applied value in domain adaptation tasks in this space.

24.
arXiv (CS.CV) 2026-06-15

Instruct-Particulate: Scaling Feed-Forward 3D Object Articulation with Kinematic Control

Reconstructing articulated 3D objects is important for animation, gaming, and robotic simulations. Recent neural networks can estimate the articulated structure of 3D objects, but their generalization remains limited by the scarcity of annotated data for this task. To address this gap, we introduce Instruct-Particulate, a model that takes a 3D mesh together with a target kinematic specification, including part descriptions, connectivity, joint types, and optional point prompts, and predicts the corresponding kinematic part segmentation and joint motion parameters. The kinematic specification disambiguates the task and allows the model to target annotations of different granularity, thereby making it possible to use more abundant heterogeneous training data. At test time, the kinematic specification can be obtained automatically from large-scale vision-language models, so the model can be applied to any input mesh. To train our model at scale, we construct a heterogeneous dataset of more than 150,000 articulated 3D objects, extending existing publicly available collections with data obtained by partially labelling other 3D models (monolithic or already decomposed into parts) with kinematic labels by means of vision-language models. Experiments show that our model generalizes better across categories and to AI-generated meshes, enabling articulated asset reconstruction from real-world images via image-to-3D models.

25.
arXiv (CS.AI) 2026-06-19

Confidence Calibration for Multimodal LLMs: An Empirical Study through Medical VQA

arXiv:2606.19950v1 Announce Type: cross Abstract: Multimodal Large Language Models (MLLMs) show great potential in medical tasks, but their elicited confidence often misaligns with actual accuracy, potentially leading to misdiagnosis or overlooking correct advice. This study presents the first comprehensive analysis of the relationship between accuracy and confidence in medical MLLMs. It proposes a novel method that combines Multi-Strategy Fusion-Based Interrogation (MS-FBI) with auxiliary expert LLM assessment, aiming to improve confidence calibration in Medical Visual Question Answering (VQA). Experiments demonstrate that our method reduces the Expected Calibration Error (ECE) by an average of 40\% across three Medical VQA datasets, significantly enhancing MLLMs' reliability. The findings highlight the importance of domain-specific calibration for MLLMs in healthcare, offering a more trustworthy solution for AI-assisted diagnosis.