Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-11

From Content to Knowledge: Lightning Fast Long-Video Understanding with Neural Knowledge Representations

We propose a new paradigm for long video understanding by treating a long video as a Neural Knowledge Representation (NKR). NKR represents video contents neither as a stream of tokens nor pre-organized databases, but as an individual small portion of network weights attached to the VLM backbone. The NKR weights are optimized to encapsulate the video's semantic content via a novel Agentic Knowledge Distillation (AKD) process, where an agent automatically synthesizes dense descriptions and question-answer pairs to distill the video's knowledge into the NKR. While AKD serves as a comprehensive, one-time encoding phase, the resulting NKR transforms the video into a portable, reusable asset. At inference, the lightweight NKR is mounted onto a frozen Vision-Language Model (VLM), enabling direct, query-based understanding without reloading or re-encoding the original video. This approach decouples video length from inference cost, offering high amortized efficiency for multi-turn video understanding. Experiments on the LVBench benchmark show our method achieves performance comparable to state-of-the-art approaches while reducing end-to-end latency by over two orders of magnitude, opening new possibilities for interactive long-video understanding.

02.
arXiv (math.PR) 2026-06-11

Sample Path Properties of the Fractional Wiener–Weierstrass Bridge II

arXiv:2606.11994v1 Announce Type: new Abstract: Fractional Wiener–Weierstrass bridges are a class of Gaussian processes obtained by replacing trigonometric functions in the construction of classical Weierstrass functions by fractional Brownian bridges. A number of their sample path properties were derived in Schied–Zhang (2024,2026). The analysis in these papers left several open questions, most of which are addressed here. Specifically, we prove that, in the regime in which the Weierstrass mechanism dominates the underlying fractional Brownian bridge, the limiting $b$-adic variation coefficient has an absolutely continuous distribution and is therefore genuinely random. At the critical point between the two roughness regimes, we establish the power-variation formula and the critical $\Phi$-variation limit conjectured in Schied–Zhang (2024). Finally, we derive the Hausdorff dimension for the graphs of the sample paths by proving a conjecture from Schied–Zhang (2026) for the missing high-Hurst case.

03.
arXiv (CS.AI) 2026-06-12

CausalMoE: A Billion-Scale Multimodal Foundation Model for Granger Causal Discovery with Pattern-Routed Heterogeneous Experts

arXiv:2606.13024v1 Announce Type: cross Abstract: Granger Causal Discovery (GCD) is fundamental for analyzing temporal dependencies in complex systems. However, existing neural GCD methods predominantly rely on a "one-size-fits-all" paradigm, struggling to capture distribution shifts and dynamic regime changes inherent in real-world time series. This often leads to entangled representations and spurious causal graphs. In this paper, we propose CausalMoE, a billion-scale multimodal Granger causal foundation model that explicitly models patch-level heterogeneity. CausalMoE introduces a Pattern-Routed Mixture of Heterogeneous Experts, which dynamically identifies latent temporal patterns and routes patches to specialized domain experts, effectively decoupling regime-specific mechanisms from shared dynamics. To ensure interpretable graph recovery, we design a Causality-Aware Self-Attention mechanism operating across variables, yielding sparse Granger causal graphs via proximal optimization. Furthermore, CausalMoE is the first to integrate LLMs and VLMs to align numerical signals with textual and visual priors, regularizing causal estimation in complex scenarios. Extensive experiments demonstrate that CausalMoE establishes a new state-of-the-art on fully supervised benchmarks, while effectively generalizing to few-shot settings where traditional methods fail.
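
As a rough illustration of how proximal optimization can yield a sparse graph, the sketch below applies one proximal-gradient step with an L1 penalty to a matrix of candidate edge weights. The abstract does not say which penalty or solver CausalMoE uses, so the L1 choice and the names `soft_threshold`, `proximal_step`, `step_size`, and `l1_penalty` are assumptions made only for this example.

```python
def soft_threshold(value, threshold):
    """Proximal operator of threshold * |x|: shrink toward zero, clip small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def proximal_step(weights, grads, step_size, l1_penalty):
    """One proximal-gradient update on a matrix of candidate causal edge weights.

    weights[i][j] is a hypothetical strength of the edge j -> i and
    grads[i][j] is the gradient of the smooth part of the loss.
    """
    return [
        [
            soft_threshold(w - step_size * g, step_size * l1_penalty)
            for w, g in zip(w_row, g_row)
        ]
        for w_row, g_row in zip(weights, grads)
    ]


if __name__ == "__main__":
    weights = [[1.0, 0.05], [-0.5, 0.0]]
    grads = [[0.0, 0.0], [0.0, 0.0]]
    # Entries smaller in magnitude than step_size * l1_penalty are set exactly to zero.
    print(proximal_step(weights, grads, step_size=0.1, l1_penalty=1.0))
```

Edges whose weight falls below the threshold become exactly zero, which is what makes the recovered graph sparse rather than merely small-valued.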

04.
medRxiv (Medicine) 2026-06-18

Hospital staff views on the visibility, role and impact of Acute Learning Disability Liaison Services in Wales: a service evaluation

People with a learning disability experience marked health inequalities. In Wales, Acute Learning Disability Liaison Services (ALDLS) are delivered by specialised learning disability services, and all roles within them are undertaken by Learning Disability Liaison Nurses (LDLN). These services aim to enable access to, and delivery of, secondary care by supporting reasonable adjustments, facilitating communication, and coordinating care for people with learning disability during hospital encounters. However, independent evidence of the impact of ALDLS on patient care remains limited. This evaluation aims to address this evidence gap by examining hospital staff perceptions of the visibility, role, and impact of ALDLS across Welsh Health Boards, with the aim of informing service design and development and improving secondary care access and care for people with learning disability. The service evaluation used a qualitative approach involving interviews and a focus group with hospital staff across the seven Welsh Health Boards who had experience working with or interacting with ALDLS staff to care for patients with learning disability. Findings cover six key areas: i) visibility and delivery of ALDLS, ii) barriers and challenges to effective ALDLS delivery, iii) enablers of effective ALDLS delivery, iv) positive impacts for patients with learning disability, v) negative impacts and unintended consequences when the service is absent or limited, and vi) participants' recommendations for future improvements of ALDLS. To synthesise the findings, we developed an overview diagram, which illustrates how ALDLS may influence care quality in acute hospitals. The overview places the liaison service at the centre, showing how organisational enablers and barriers shape its delivery, and how its core functions support improvements in safety, timeliness, effectiveness, efficiency, equity, and patient-centred care.
From the findings we have identified recommendations for practice and policy. These include that ALDLS should be recognised as a core, safety-critical component of acute hospital care for people with a learning disability, rather than an optional add-on. In practice, services should be more visibly embedded within routine pathways, with consistent site-based presence, clear referral criteria, early identification through electronic flagging and notification systems, and routine involvement in multidisciplinary planning for complex admissions and procedures. At policy level, ALDLS provision should be recognised within equality and patient safety frameworks as an essential service requiring sustained investment, national minimum configuration standards, adequate staffing, and better-integrated digital systems to support continuity, equitable access, and person-centred care.

05.
arXiv (CS.AI) 2026-06-19

LoRDO: Distributed Low-Rank Optimization with Infrequent Communication

arXiv:2602.04396v2 Announce Type: replace-cross Abstract: Distributed training of foundation models via $\texttt{DDP}$ is limited by interconnect bandwidth. While infrequent communication strategies reduce synchronization frequency, they remain bottlenecked by the memory and communication requirements of optimizer states. Low-rank optimizers can alleviate these constraints; however, in the local-update regime, workers lack access to the full-batch gradients required to compute low-rank projections, which degrades performance. We propose $\texttt{LoRDO}$, a principled framework unifying low-rank optimization with infrequent synchronization. We first demonstrate that, while global projections based on pseudo-gradients are theoretically superior, they permanently restrict the optimization trajectory to a low-rank subspace. To restore subspace exploration, we introduce a full-rank quasi-hyperbolic update. $\texttt{LoRDO}$ achieves near-parity with low-rank $\texttt{DDP}$ in language modeling and downstream tasks at model scales of $125$M–$720$M, while reducing communication by $\approx 10 \times$. Finally, we show that $\texttt{LoRDO}$ improves performance even more in very low-memory settings with small rank/batch size.

06.
arXiv (CS.CL) 2026-06-16

REFLEX: Reflective Evolution from LLM Experience

Large multimodal language models (LLMs) have emerged as powerful tools for guiding evolutionary search toward interpretable programmatic policies. However, existing frameworks rely on a monolithic model call to simultaneously interpret visual behavioral evidence and synthesize corrective code. This diagnosis-repair entanglement creates an opaque feedback loop, obscuring the rationale behind mutations and preventing the retention of algorithmic insights across independent runs. To achieve auditable and efficient policy search, we argue that visual diagnosis must be structurally decoupled from code generation. We present REFLEX, a train-free evolutionary framework that operationalizes this decoupling. In REFLEX, a vision-enabled Critic first distills task-specific behavioral evidence into structured, auditable diagnoses. Subsequently, a text-optimized Actor synthesizes child policies using these diagnoses alongside a persistent, self-evolving Skill Memory of reusable code snippets. This architecture not only provides transparent mutation traces but also enables cross-run programmatic knowledge transfer. Extensive evaluations across control benchmarks (Lunar Lander, Acrobot, Pendulum) and a 36-dimensional antenna array synthesis task demonstrate exceptional sample efficiency. Notably, REFLEX solves Acrobot and Pendulum in under 10 LLM calls and reaches a best Normalized Weighted Score of 1.092 on Lunar Lander, achieving highly competitive final performance while significantly accelerating the early-stage discovery of transparent policies.

07.
arXiv (CS.CV) 2026-06-12

Skill-3D: Evolving Scene-Aware Skills for Agentic 3D Spatial Reasoning

This paper explores agentic 3D spatial understanding, i.e., MLLM agents performing 3D reasoning through tool use. Existing methods often misuse tools and exhibit biased tool preferences under 3D scenarios, leaving the agentic paradigm with only marginal gains over non-agentic strategies. We reveal that 3D spatial reasoning tasks are heterogeneous across scenes, while these agents apply a uniform tool-use strategy to all scenes rather than selecting tools according to the specific scene and task. To address this, we propose Skill-3D, a framework that learns self-evolving scene-aware skills. Specifically, Skill-3D identifies the task scene and records the agent's tool-use trajectory into a Scene Memory, where successful trajectories from similar scenes are aggregated and distilled into a reusable scene-aware skill, with failed ones attached to the skill as lessons. During training, once a similar scene recurs, the corresponding skill is injected to guide the agent, producing new trajectories whose successes and failures further refine the skill, forming a loop in which the memory and the skill library co-evolve. Experiments show that Skill-3D substantially improves tool utilization in 3D spatial reasoning (from 39% to 78% on VSI-Bench), driving the agent toward correct and sufficient tool use. For instance, it improves Gemini-3-Flash by 67% on MMSI-Bench. Furthermore, we conduct agentic post-training over skill-guided trajectories, which boosts Qwen3-VL-8B by 60% on VSI-Bench.

08.
arXiv (CS.CV) 2026-06-11

XPR: An Extensible Cross-Platform Point-Based Differentiable Renderer

Point-based differentiable rendering underpins modern 3D reconstruction, novel-view synthesis, and learning-based graphics pipelines, but developing new rendering methods often requires extensive low-level implementation, hardware-specific kernels, and manually written backward passes. This limits rapid prototyping, reproducibility, exploration, and deployment, especially across diverse hardware platforms. This paper presents XPR, an extensible cross-platform framework for point-based differentiable rendering. XPR introduces a high-level programming interface that separates method-specific logic from the shared rendering pipeline, allowing users to implement new methods in a few lines of code. Its pipeline decomposes rendering into modular, statically shaped parallel operations that can be lowered by a cross-platform compiler to GPUs, TPUs, CPUs, and other ML accelerators. We demonstrate implementations of 3DGS, 3DGUT, and LinPrim in only a few hundred lines of Python code, each of which can be compiled to a range of hardware platforms with the XLA compiler. These results show that XPR enables fast experimentation and portable execution for emerging point-based differentiable rendering systems.

09.
arXiv (CS.CL) 2026-06-12

SkMTEB: Slovak Massive Text Embedding Benchmark and Model Adaptation

We introduce SkMTEB, the first comprehensive MTEB-style text embedding benchmark for Slovak, a low-resource West Slavic language, comprising 31 datasets across 7 task types – nearly 4$\times$ the depth of existing multilingual benchmark coverage for Slovak. Our evaluation of 31 embedding models reveals that large instruction-tuned multilingual models achieve the strongest performance, while existing Slovak-specific models trained for NLU tasks transfer poorly to embedding tasks. To address the need for efficient, locally-deployable Slovak embeddings, we develop \texttt{e5-sk-small} (45M parameters) and \texttt{e5-sk-large} (365M) by applying vocabulary trimming and fine-tuning to Multilingual E5 models. Despite size reductions of up to 62%, our open-source models achieve competitive performance with proprietary APIs while remaining locally deployable for semantic search and retrieval-augmented generation (RAG). We release the benchmark, models, datasets, and code openly, hoping our approach offers a replicable path for other under-resourced languages.

10.
arXiv (math.PR) 2026-06-16

The existence of invariant sublinear expectations for $G$-SDEs

arXiv:2606.15203v1 Announce Type: new Abstract: In this paper, we study the existence of invariant sublinear expectations of Markovian semigroups on sublinear expectation spaces. To achieve this, we establish a complete metric space of sublinear expectations, on which we extend Harris' method to the nonlinear setting on the convergence of sublinear semigroups. We then explore two cases of $G$-diffusions by studying the Lyapunov function and the local Doeblin condition. One is the $G$-Brownian motion on the unit circle, which is the case studied in Feng and Zhao [Zhaonon], but with the new method. Another is the multidimensional $G$-SDEs on the whole space $\mathbb{R}^d$. We establish, for the first time in the literature, the existence of the invariant sublinear expectation for $G$-SDEs under the non-degenerate and weakly dissipative assumption. For this, we prove that for a class of $G$-SDEs, the $G$-expectation can be represented as the supremum of the semigroup of a family of SDEs, of which the regularity is obtained by considering the Bismut-Elworthy-Li formula and the Denis-Hu-Peng representation for the distribution of $G$-Brownian motions.

11.
arXiv (CS.AI) 2026-06-17

Talking to Your Data: Exploring Embodied Conversation as an Interface for Personal Health Reflection

arXiv:2606.17767v1 Announce Type: cross Abstract: Personal health data from wearables are typically presented through dashboards of charts and summary statistics, requiring users to actively interpret patterns and implications. We explore an alternative interaction paradigm: engaging with personal health data through an embodied conversational agent that facilitates objective data reflection in dialogue with the user. We present a system that combines lightweight preprocessing of wearable data with a Unity-based embodied character. Internally, the system follows a dual-agent design in which an Observer agent extracts descriptive statistics and temporal trends, and a Presenter agent communicates these findings through "spoken statistics," intentionally refraining from clinical advice to isolate the impact of the interaction modality. We evaluate this approach through a simulated-self user study (N=5) using a within-subject design. Participants adopted health personas and goals derived from the LifeSnaps dataset to compare traditional dashboard exploration with embodied conversational reflection. Our evaluation focuses on perceived understanding, the specificity of generated actions, and the cognitive shift from passive viewing to active sensemaking. The paper contributes a functional prototype, a design pattern for objective health data narrative generation, and early empirical insights into how embodiment affects the interpretation of personal health metrics.

12.
arXiv (CS.CL) 2026-06-11

Pretrained self-supervised speech models can recognize unseen consonants

Modern pretrained self-supervised automatic speech recognition models are trained on large-scale audio data to encode speech into contextualized representations. However, their training data are heavily skewed toward high-resource languages with little data from low-resource languages, raising concerns about the potential underrepresentation of typologically uncommon speech sounds such as click consonants primarily found in Khoisan languages. This leads to our central research question: Can these models recognize click consonants as accurately as other speech sounds? To address this question, we fine-tune and compare pretrained self-supervised speech models (Wav2Vec2 and HuBERT) on data from two click-rich Khoisan languages (G|ui and West !Xoon). Our results reveal that the fine-tuned models consistently recognize clicks more accurately than non-clicks, suggesting that self-supervision enables generalization across human speech sounds including rare phonemes.

13.
arXiv (quant-ph) 2026-06-16

Non-Gaussian Phase Transition and Cascade of Instabilities in the Dissipative Quantum Rabi Model

arXiv:2507.07092v3 Announce Type: replace Abstract: The open quantum Rabi model describes a two-level system coupled to a harmonic oscillator. A Gaussian phase transition for the nonequilibrium steady states has been predicted when the bosonic mode is soft and subject to damping. We show that oscillator dephasing is a relevant perturbation, which leads to a non-Gaussian phase transition and an intriguing cascade of instabilities for $k$-th order bosonic operators, as well as a jump in the steady-state qubit polarization. For the soft-mode limit, the equations of motion form a closed hierarchy and spectral properties can be efficiently studied. To this purpose, we establish a fruitful connection to non-Hermitian Hamiltonians. The results for the phase diagram, stability boundaries, and relevant observables are based on mean-field analysis, exact diagonalization, perturbation theory, and Keldysh field theory.

14.
arXiv (CS.CV) 2026-06-11

Periodic-MAE: Periodic Video Masked Autoencoder for rPPG Estimation

In this paper, we propose Periodic-MAE, a self-supervised framework for learning generalizable spatio-temporal representations of periodic physiological signals from unlabeled facial videos. The proposed method leverages a masked autoencoder (MAE), which learns high-dimensional facial representations by reconstructing masked video tokens without relying on remote photoplethysmography (rPPG) specific supervision. To explicitly align representation learning with the characteristics of rPPG, we introduce a periodicity-aware frame masking strategy based on video resampling, enabling the encoder to learn representations that capture quasi-periodic temporal patterns relevant to pulse signal estimation. In addition, physiological bandlimit constraints are integrated into the MAE pre-training framework, exploiting the sparsity of pulse signals in the frequency domain to guide the learned representations toward physiologically meaningful patterns. After pre-training, the learned representations are transferred to downstream rPPG estimation, where the encoder serves as a generic feature extractor for recovering pulse-related signals from facial videos. We conduct extensive experiments on four benchmark datasets, including PURE, UBFC-rPPG, MMPD, and V4V. Moreover, we evaluate the proposed approach on a real-world rPPG dataset collected under unconstrained lighting conditions and subject motion. Experimental results demonstrate that Periodic-MAE consistently improves rPPG estimation performance, particularly in challenging cross-dataset and real-world evaluation settings. Our code is available at https://github.com/ziiho08/Periodic-MAE.

15.
arXiv (CS.CL) 2026-06-15

TVIR: Building Deep Research Agents Towards Text-Visual Interleaved Report Generation

Deep Research Agents have shown strong capability in multi-step information retrieval, reasoning, and long-form report generation, but existing benchmarks and systems remain predominantly text-centric, with limited evaluation of whether visual elements are factually reliable and well aligned with the surrounding analysis. To address this gap, we introduce TVIR (Text-Visual Interleaved Report Generation), which includes TVIR-Bench, a benchmark of 100 expert-curated multimodal deep research tasks that require visual elements to serve specific analytical sub-goals, and TVIR-Agent, a hierarchical multi-agent framework that serves as a strong baseline for constructing outlines, retrieving images, generating charts with traceable sources, and composing reports through context-aware sequential writing. We further develop a dual-path evaluation framework that combines Textual Assessment and Visual Assessment. Experiments across nine deep research systems show that TVIR-Agent achieves strong overall performance, underscoring the importance of explicit multimodal design and evaluation for evidence-driven report generation.

16.
arXiv (quant-ph) 2026-06-11

Diffusive Relaxation of Participation Entropy in U(1)-symmetric Dynamics

arXiv:2606.11561v1 Announce Type: new Abstract: Participation entropy (PE) quantifies the spread of a many-body wavefunction across configuration space. While PE relaxes rapidly in generic chaotic systems, we show that $\mathrm{U}(1)$ conservation laws slow it down by imprinting with the slow hydrodynamic modes. Using a cluster expansion around equilibrium, we show that, after local density inhomogeneities decay, the leading PE deficit is dominated by squared connected density correlations. The long time relaxation is therefore controlled by diffusive correlation spreading, giving $\Delta S(t)\sim t^{-1/2}$ in the hydrodynamic regime and crossing over to $\sim \exp[-O(t/L^2)]$ when $t\geq L^2$. We confirm this entropy correlation relation using exact computation and infinite system tensor network simulations in various quantum $\mathrm{U}(1)$ conserving circuits. Our results establish PE as a sensitive probe of hydrodynamic memory and suggest that slow relaxation is a generic consequence of conservation laws.
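
For readers unfamiliar with the quantity, the sketch below computes a participation entropy from a list of wavefunction amplitudes in a fixed configuration basis. It assumes the common Shannon form, S = -sum_i p_i ln p_i with p_i = |a_i|^2; the abstract does not state which variant (Shannon or Rényi) the paper uses, and the function name `participation_entropy` is only illustrative.

```python
import math


def participation_entropy(amplitudes):
    """Shannon participation entropy of a state given its basis amplitudes.

    The amplitudes may be real or complex and need not be normalised;
    probabilities are p_i = |a_i|^2 / sum_j |a_j|^2.
    """
    weights = [abs(a) ** 2 for a in amplitudes]
    norm = sum(weights)
    if norm == 0:
        raise ValueError("state vector has zero norm")
    entropy = 0.0
    for w in weights:
        p = w / norm
        if p > 0:  # 0 * ln(0) is taken as 0
            entropy -= p * math.log(p)
    return entropy


if __name__ == "__main__":
    # A state spread evenly over four configurations versus one localised on a single configuration.
    print(participation_entropy([0.5, 0.5, 0.5, 0.5]))
    print(participation_entropy([1.0, 0.0, 0.0, 0.0]))
```

A state spread uniformly over D configurations gives the maximum value ln D, while a state concentrated on a single configuration gives zero, so the entropy tracks how far the wavefunction has spread.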

17.
arXiv (quant-ph) 2026-06-16

Quantum speedup from nonclassical polarization

arXiv:2603.23124v2 Announce Type: replace Abstract: We develop a framework for identifying nonclassical speedups in systems with polarization or, likewise, spin degrees of freedom. By confining the dynamics to the manifold of angular momentum coherent states, which act as the classical reference in this case, we compute the speed limit that bounds the rate of change of the state achievable without generating quantum coherence. A comparison with the unrestricted quantum speed limit enables the quantitative identification of speedups arising from polarization nonclassicality. We apply this framework to the cross-Kerr interaction, demonstrating a persistent speedup scaling as $\mathcal{O}(\sqrt{N})$ with the photon number $N$, together with a parity effect in favour of even photon numbers. The results establish polarization nonclassicality as a genuine dynamical resource, linking quantum coherence to quantum-enhanced evolution speeds in nonlinear photonic systems.

18.
PLOS Computational Biology 2026-06-22

TCRBinder: Unified pre-trained language model with paired-chain synergy for predicting T-cell receptor binding specificity

Authors:

Weihe Dong, Qiang Yang, Long Xu, Xiaokun Li, Kuanquan Wang, Suyu Dong, Gongning Luo, Xianyu Zhang, Tiansong Yang, Xin Gao, Guohua Wang

Deciphering how human T cells recognise peptide-HLA (pHLA) complexes underpins next-generation vaccines and personalised immunotherapies, yet extreme sequence diversity and paired-chain interdependence still hamper reliable in silico prediction of T-cell receptor (TCR) specificity. To overcome these hurdles, we built TCRBinder, a paired-chain-aware deep model with a multi-branch encoder that routes each molecular component through dedicated transformer-based modules to capture contextual signals in both HLA pseudo-sequences and antigenic peptides while simultaneously processing the TCR α and β chains. This design captures the synergistic interaction between paired chains to emulate peptide-HLA-TCR (PHT) interactions and expose residue-level contact motifs. Across PHT and peptide-TCR (pTCR) benchmarks, the model delivered state-of-the-art performance (AUC-ROC = 0.911, AUPR = 0.791 for the PHT task) and remained superior on multiple independent datasets. We tracked the dynamics of clonal expansion and, in a large SARS-CoV-2 repertoire containing completely unseen peptides, improved the AUC-ROC by up to 16.3% over the leading alternatives. Moreover, TCRBinder provided mechanistic insights by pinpointing contact hotspots and quantifying residue contributions to binding probability. These capabilities position TCRBinder as a versatile tool for rational antigen discovery, immunotherapy stratification, and neoantigen vaccine design.

19.
arXiv (CS.LG) 2026-06-16

Benchmarking Instance-Dependent Label Noise with Controlled Corruptions

arXiv:2606.14965v1 Announce Type: new Abstract: Synthetic instance-dependent label noise (IDN) benchmarks are widely used to evaluate noisy-label learning methods, yet existing approaches typically generate noise through imperfect annotators or classifier raters, leaving the source of ambiguity implicit. We introduce CILN, a benchmark generation framework that creates IDN through controlled input corruptions. A diverse voter pool labels corrupted instances, producing benchmark datasets in which both the source and severity of ambiguity are explicit and controllable. Using CIFAR10, MNIST, and Adult, we construct 90 benchmark settings spanning multiple corruption families and severity levels. Our experiments show that the resulting benchmarks exhibit genuine instance-dependent noise, provide diverse confusion structures, and, on CIFAR-10, can produce label distributions that are closer to human uncertainty than an existing synthetic IDN benchmark. We further demonstrate that corruption-mediated IDN can expose failure modes of popular noisy-label learning methods, including Co-Teaching and DivideMix, that are not observed under comparable levels of rater-fallibility noise. These findings suggest that noise structure, not only noise rate, plays an important role in benchmark difficulty and algorithm behavior. By making ambiguity generation explicit and controllable, CILN provides a complementary benchmarking framework for studying noisy-label learning under diverse sources of instance difficulty.

21.
arXiv (quant-ph) 2026-06-12

Unifying spacetime approaches to quantum mechanics

arXiv:2606.12539v1 Announce Type: new Abstract: Recent efforts to formulate quantum mechanics in a way that treats space and time on a more equal footing have led to a large variety of spacetime-oriented approaches. In this work we present a detailed study of spacetime states, the objects that play the role of quantum states in the recently introduced framework of spacetime quantum mechanics, and show that the main proposals in the literature are different manifestations of the same underlying object. Path integrals, quantum states over time, pseudo-density matrices, the Page and Wootters mechanism, superdensity operators, and timelike-entanglement proposals all arise from spacetime states through particular evaluations, reduced information, linear maps, or quantum channels. This unification provides explicit mathematical representations of these formalisms, reveals relations among them, and clarifies the spacetime information each one captures. We also study the broader relevance of the spacetime-state point of view for Leggett-Garg inequalities, OTOCs, temporal tensor networks, fermionic systems, relativistic QFTs, quantum reference frames, and classical physics, together with additional insights and perspectives revealed by the common unifying framework.

22.
arXiv (CS.LG) 2026-06-16

Manifold-Orthogonal Dual-spectrum Extrapolation for Parameterized Physics-Informed Neural Networks

arXiv:2603.13751v2 Announce Type: replace Abstract: Physics-informed neural networks (PINNs) have achieved notable success in modeling dynamical systems governed by partial differential equations (PDEs). To avoid computationally expensive retraining under new physical conditions, parameterized PINNs (P$^2$INNs) commonly adapt pre-trained operators using singular value decomposition (SVD) for out-of-distribution (OOD) regimes. However, SVD-based fine-tuning often suffers from rigid subspace locking and truncation of important high-frequency spectral modes, limiting its ability to capture complex physical transitions. While parameter-efficient fine-tuning (PEFT) methods appear to be promising alternatives, applying conventional adapters such as LoRA to P$^2$INNs introduces a severe Pareto trade-off, as additive updates increase parameter overhead and disrupt the structured physical manifolds inherent in operator representations. To address these limitations, we propose Manifold-Orthogonal Dual-spectrum Extrapolation (MODE), a lightweight micro-architecture designed for physics operator adaptation. MODE decomposes physical evolution into complementary mechanisms including principal-spectrum dense mixing that enables cross-modal energy transfer within frozen orthogonal bases, residual-spectrum awakening that activates high-frequency spectral components through a single trainable scalar, and affine Galilean unlocking that explicitly isolates spatial translation dynamics. Experiments on challenging PDE benchmarks including the 1D Convection–Diffusion–Reaction equation and the 2D Helmholtz equation demonstrate that MODE achieves strong out-of-distribution generalization while preserving the minimal parameter complexity of native SVD and outperforming existing PEFT-based baselines.

23.
arXiv (CS.CV) 2026-06-16

Rotational Symmetry based Object Pose Estimation from Point Clouds in the Absence of Known 3D Models

Object pose estimation is crucial to many industrial applications, with one example being automated spray painting using a robot. However, confidentiality concerns often limit access to high-quality 3D models, posing a significant challenge for point-cloud-based pose estimation. In such scenarios, rotational symmetry, a readily accessible characteristic of many industrial objects, can provide valuable prior information to facilitate pose estimation. In this paper, we propose a method that leverages the rotational symmetry commonly found in industrial objects to address the challenge caused by the absence of 3D models. The object pose is jointly estimated with point cloud refinement through an iterative optimization process. This optimization relies on a rotational symmetry constraint loss. To construct this loss, each 3D point is rotated according to the currently estimated pose, and multiple correspondences are identified using nearest-neighbor search by exploiting the rotational symmetry property. These correspondences are then used to compute the rotational symmetry constraint loss, which iteratively refines both the pose and the point cloud. By explicitly incorporating rotational symmetry into the optimization process, the proposed method achieves robust pose estimation and generalizes well across diverse object types. The proposed method is evaluated on a dataset specifically created for point clouds without known 3D models, consisting of four categories of synthetic objects and one real wheel hub collected from a production line. Experimental results demonstrate that the proposed method achieves performance comparable to methods that rely on known 3D models.
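
The loss construction described above might look roughly like the following sketch, which is not the authors' implementation. It assumes the point cloud has already been transformed by the current pose estimate so that the candidate symmetry axis is the z axis through the origin, that the symmetry order (`order`) is known, and that a brute-force nearest-neighbor search is acceptable; the names `rotate_about_z` and `symmetry_loss` are hypothetical.

```python
import math


def rotate_about_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def symmetry_loss(points, order):
    """Mean squared nearest-neighbor distance between a cloud and its rotated copies.

    For an object with `order`-fold rotational symmetry about the z axis,
    rotating by 2*pi*k/order should map the cloud onto itself, so the loss
    is small when the assumed axis matches the true symmetry axis.
    """
    if order < 2:
        raise ValueError("order must be at least 2")
    total = 0.0
    count = 0
    for k in range(1, order):
        angle = 2.0 * math.pi * k / order
        for p in points:
            q = rotate_about_z(p, angle)
            # Brute-force nearest neighbor; a k-d tree would be used for large clouds.
            total += min(math.dist(q, r) ** 2 for r in points)
            count += 1
    return total / count


if __name__ == "__main__":
    square = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, -1.0, 0.0)]
    shifted = [(x + 0.3, y, z) for x, y, z in square]
    print(symmetry_loss(square, order=4))   # axis aligned with the symmetry axis
    print(symmetry_loss(shifted, order=4))  # axis offset from the symmetry axis
```

In an iterative scheme of the kind the abstract describes, a quantity like this would be minimized over the pose parameters, with the correspondences recomputed after each update.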

24.
arXiv (CS.AI) 2026-06-16

Sensor-Conditioned Representation Learning via Scene-Relevant Observation Quotients

arXiv:2606.16210v1 Announce Type: new Abstract: Learned representations in intelligent sensing systems are often evaluated by reconstruction fidelity or downstream prediction accuracy, but these criteria do not specify which latent distinctions are justified by the sensing process. In sensor-conditioned environments, nuisance factors can change measurements without changing the scene, while distinct scenes may be indistinguishable under limited sensing capability. This paper formulates sensor-conditioned representation correctness as preserving sensing-supported scene distinctions while suppressing nuisance-induced and sensor-unsupported variation. We introduce the scene-relevant observation quotient, a representation target induced by sensing-supported distinguishability after nuisance canonicalization, and develop Observation-Quotient Tucker-Structured Autoencoding (OQ-TSAE), a scene-nuisance factorized framework with diagnostics for false distinction, false merge, nuisance sensitivity, and latent ordering consistency. Experiments on a controlled benchmark show that quotient-consistent supervision improves representation-correctness diagnostics over reconstruction-oriented, metric-learning, and contrastive-learning baselines. Sensitivity, perturbation, and ablation studies show the importance of quotient-aligned supervision, reliable quotient relations, and quotient geometry. Complementary real-radar experiments show that a reconstruction-only OQ-TSAE variant retains competitive downstream utility, robustness under observation degradation, and low seed-to-seed variability. These results suggest that sensor-conditioned representations should be evaluated not only by predictive utility, but also by whether their latent geometry preserves sensing-justified scene distinctions.

25.
arXiv (CS.AI) 2026-06-16

MiroBench: Benchmarking Realism in Agentic Simulation of Real-world Discussions

arXiv:2606.14715v1 Announce Type: cross Abstract: LLM agents are increasingly used to simulate real-world interactions, but it remains unclear whether simulated behaviors preserve the content patterns and interaction dynamics of real human behaviors. Existing evaluations remain fragmented, which makes it difficult to compare systems or measure progress. In this paper, we focus on Reddit discussions as a concrete first step toward evaluating real-world social simulation. Reddit threads provide public, topic-grounded, multi-party interactions where people share experiences, debate, seek advice, express emotion, and collectively respond to products, events, and social issues. These discussions offer an observable window into broader social behavior, making them a useful setting for testing whether LLM agents can reproduce not only fluent text, but also the distributional patterns and interaction dynamics of real online communities. We introduce MiroBench, a benchmark for Reddit discussion simulation built from 4,292 real Reddit threads. MiroBench uses statistical tests to compare generated and real discussions across four major aspects: repetition and semantic uniformity, narrative content, toxicity and aggression, and structural complexity. Experiments across five domains and five models show that current simulators remain distributionally mismatched with real Reddit threads, while a lightweight prompt-based improvement procedure provides only limited gains. MiroBench offers a concrete benchmark for measuring, diagnosing, and improving realism in LLM-based social simulation.
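The abstract does not say which statistical tests MiroBench uses. As an illustration of the general idea of comparing a per-thread metric between real and generated discussions, here is a minimal sketch of a two-sided permutation test on the difference of means. The choice of test, the function name `permutation_test`, and the toy numbers are all assumptions made for this example and are not taken from the benchmark.

```python
import random

def permutation_test(real, simulated, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of sample means.

    Returns (observed_difference, p_value). The p-value uses the common
    +1 correction so it is never exactly zero.
    """
    rng = random.Random(seed)
    n_real = len(real)
    observed = sum(real) / n_real - sum(simulated) / len(simulated)
    pooled = list(real) + list(simulated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of threads as real/simulated
        a, b = pooled[:n_real], pooled[n_real:]
        diff = sum(a) / len(a) - sum(b) / len(b)
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Made-up per-thread values of some metric (for example, maximum reply depth).
real_depth = [3, 4, 5, 4, 6, 5, 4, 5]
sim_depth = [1, 2, 1, 2, 1, 2, 1, 2]
diff, p_value = permutation_test(real_depth, sim_depth)

# Identical samples give an observed difference of zero and a p-value of 1.
same_diff, same_p = permutation_test(real_depth, real_depth)
```

A small p-value here would indicate that the simulated threads differ from the real ones on the chosen metric. A full benchmark would repeat such comparisons across many metrics and would need to account for multiple testing.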