Academic Intelligence · Curated Daily

Explore the cutting edge of global scholarship

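AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefings.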

01.
arXiv (CS.AI) 2026-06-19

On the Limitations of Ray-Tracing for Learning-Based RF Tasks in Urban Environments

arXiv:2507.19653v2 Abstract: We study the realism of Sionna v1.0.2 ray-tracing for outdoor cellular links in central Rome. We use a real measurement set of 1,664 user equipments (UEs) and six nominal base-station (BS) sites. Keeping these positions fixed, we systematically vary the main simulation parameters, including path depth, diffuse/specular/refraction flags, and carrier frequency, as well as antenna properties such as altitude, radiation pattern, and orientation. Simulator fidelity is scored for each base station via the Spearman correlation between measured and simulated powers, and via a k-nearest-neighbor (kNN) localization algorithm using RSSI-based fingerprints. Across all experiments, solver hyper-parameters have an immaterial effect on the chosen metrics; by contrast, antenna locations and orientations prove decisive. A simple greedy optimization improves the Spearman correlation by 5% to 130% across base stations and reduces the kNN localization error on real-world samples by one-third when only simulated data serve as reference points, although that error remains twice as high as the error obtained with purely real reference data. Precise geometry and credible antenna models are therefore necessary but not sufficient; faithfully capturing the residual urban noise remains an open challenge for transferable, high-fidelity outdoor RF simulation.
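The two fidelity scores named above can be sketched in NumPy. The sketch assumes RSSI fingerprints are stored as one row per location and one column per base station. The function names, the Euclidean distance in RSSI space, and `k=3` are illustrative assumptions, not the paper's settings. Spearman correlation is computed as the Pearson correlation of average ranks. kNN localization averages the coordinates of the `k` closest reference fingerprints, which in the paper's setting may be simulated rather than measured.

```python
import numpy as np

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def spearman(measured, simulated):
    """Spearman correlation = Pearson correlation of the two rank vectors."""
    return float(np.corrcoef(average_ranks(measured), average_ranks(simulated))[0, 1])

def knn_localize(query_rssi, ref_rssi, ref_xy, k=3):
    """Estimate query positions from the k nearest reference fingerprints.

    query_rssi: (Q, B) RSSI per base station; ref_rssi: (R, B); ref_xy: (R, 2).
    """
    dists = np.linalg.norm(query_rssi[:, None, :] - ref_rssi[None, :, :], axis=-1)
    nearest = np.argsort(dists, axis=1)[:, :k]   # (Q, k) indices into the references
    return ref_xy[nearest].mean(axis=1)          # (Q, 2) averaged coordinates
```

Passing simulated fingerprints as `ref_rssi` and measured ones as `query_rssi` mirrors the "simulated data as reference points" setting described in the abstract.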

02.
arXiv (math.PR) 2026-06-17

Non-asymptotic Tail Bounds for the Kostlan–Shub–Smale Field: Tensor PCA and Spherical $k$-Spin Complexity

arXiv:2606.17665v1 Abstract: This paper builds a hierarchy of explicit, non-asymptotic tail bounds for the supremum of the Kostlan–Shub–Smale (KSS) random field on the sphere, and applies it to two problems: Spiked Tensor PCA and the landscape of the spherical $k$-spin model. For Tensor PCA, we study the non-asymptotic statistical limits of estimating a rank-$R$ symmetric signal tensor of order $k\ge 3$ and dimension $d\ge 3$ from a single Gaussian observation at signal-to-noise ratio $\lambda$, through the profile maximum likelihood estimator, i.e., the MLE restricted to normalized rank-$R$ tensors of coherence at least $\kappa$. Our analysis uses a single reduction: a deterministic geometric inequality (the Tube Method) and a rank-reduction step bound the estimation error by the supremum of the canonical KSS field, which the Kac–Rice formula turns into a Gaussian integral against the expected absolute characteristic polynomial of a shifted Gaussian Orthogonal Ensemble, controlled in turn by the four explicit tail bounds of our hierarchy (three from a Mehta–Fyodorov representation, one from a Ben Arous–Dembo–Guionnet large deviation estimate). The same reduction yields two results, each with explicit constants. For estimation, a finite-$(k,d)$ error bound recovers the asymptotically optimal rate $\sqrt{d\log k}$ of Perry, Wein and Bandeira, with explicit dependence on the rank $R$ and the coherence $\kappa$. For the landscape, a two-sided non-asymptotic bracketing of the annealed complexity of the spherical $k$-spin Hamiltonian recovers the Auffinger–Ben Arous–Černý complexity function in the high-dimensional limit.
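We assume the standard characterization of the canonical KSS field of degree $k$ on the unit sphere. It can be realized as $f(x) = \langle W, x^{\otimes k}\rangle$, where $W$ is a tensor with i.i.d. standard Gaussian entries, which gives covariance $\mathbb{E}[f(x)f(y)] = \langle x, y\rangle^{k}$. The NumPy Monte Carlo sketch below checks that covariance for $k = 3$ and $d = 4$ (sizes chosen arbitrarily). It illustrates the field whose supremum the paper bounds, not the paper's bounds or estimator.

```python
import numpy as np

def kss_field_order3(W, x):
    """Evaluate f(x) = <W, x (x) x (x) x> for a batch of order-3 tensors W of shape (n, d, d, d)."""
    return np.einsum("nijk,i,j,k->n", W, x, x, x)

def empirical_kss_covariance(x, y, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[f(x) f(y)] with i.i.d. N(0, 1) tensor entries."""
    d = len(x)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_samples, d, d, d))
    return float(np.mean(kss_field_order3(W, x) * kss_field_order3(W, y)))

d, k = 4, 3
rng = np.random.default_rng(1)
x = rng.standard_normal(d)
x /= np.linalg.norm(x)                    # points on the unit sphere
y = rng.standard_normal(d)
y /= np.linalg.norm(y)
empirical = empirical_kss_covariance(x, y)
theoretical = float(x @ y) ** k           # <x, y>^k
```

With 20,000 samples the Monte Carlo standard error is at most about 0.01, so `empirical` should sit close to `theoretical`.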

03.
medRxiv (Medicine) 2026-06-17

Low-Density Lipoprotein Cholesterol and Dementia Risk: Integrating Mendelian Randomization and Target Trial Emulation Within the Heart-Brain Axis

Background: The heart-brain axis links cardiovascular and neurodegenerative disease through shared vascular and inflammatory mechanisms. Although low-density lipoprotein cholesterol (LDL-C) is an established causal factor in atherosclerotic cardiovascular disease (ASCVD), its relationship with dementia remains uncertain, with midlife elevations associated with increased risk but late-life associations often appearing null or inverse. To address this cholesterol paradox, we integrated mendelian randomization (MR) with an active-comparator new-user target trial emulation. Methods: We applied a triangulated causal inference framework integrating two-sample MR with observational target trial emulation. Genetic variants associated with LDL-C were used as instrumental variables to evaluate Alzheimer disease (AD), dementia with Lewy bodies (DLB), frontotemporal dementia (FTD), and any dementia (AnyDem), with causal estimates derived using inverse-variance weighted models and sensitivity analyses for heterogeneity and pleiotropy. In parallel, an active-comparator new-user design compared statin versus ezetimibe initiation among adults aged 60 years or older using propensity score (PS) overlap weighting and Cox proportional hazards models to evaluate cardiovascular and dementia outcomes. Results: Genetically predicted LDL-C was associated with increased risk of DLB (OR 1.65, 95% CI 1.30-2.10; p
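Two of the estimators named above can be sketched in NumPy under stated assumptions. The first is the fixed-effect inverse-variance weighted (IVW) estimate with first-order weights. It combines per-variant Wald ratios $\hat\beta_{Yj}/\hat\beta_{Xj}$ with weights $\hat\beta_{Xj}^2/\sigma_{Yj}^2$. The second is propensity-score overlap weighting, which assigns $1-e(x)$ to treated units (here, statin initiators) and $e(x)$ to comparators (ezetimibe initiators). Variable names and the toy inputs are hypothetical. The study's own code, its sensitivity analyses for heterogeneity and pleiotropy, and any random-effects adjustment are not reproduced.

```python
import numpy as np

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW Mendelian randomization estimate with first-order weights.

    beta_exposure: per-variant effects on LDL-C; beta_outcome: per-variant effects
    on the outcome (e.g. log odds of dementia); se_outcome: standard errors of beta_outcome.
    """
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    se = np.asarray(se_outcome, dtype=float)
    weights = bx ** 2 / se ** 2          # inverse variance of each Wald ratio
    wald_ratios = by / bx                # per-variant causal estimates
    estimate = np.sum(weights * wald_ratios) / np.sum(weights)
    std_error = np.sqrt(1.0 / np.sum(weights))
    return float(estimate), float(std_error)

def overlap_weights(is_treated, propensity):
    """Overlap weights: treated units get 1 - e(x), comparators get e(x)."""
    is_treated = np.asarray(is_treated, dtype=bool)
    e = np.asarray(propensity, dtype=float)
    return np.where(is_treated, 1.0 - e, e)
```

If the outcome effects are on the log-odds scale, `np.exp(estimate)` is the odds ratio per unit of the exposure's scale (for example, per SD of LDL-C, depending on how the instruments were scaled).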

04.
arXiv (quant-ph) 2026-06-12

Improving Variational Counterdiabatic Driving with Weighted Actions and Computer Algebra

arXiv:2505.18367v4 Abstract: Variational counterdiabatic (CD) driving is a disciplined and widely used method to robustly control quantum many-body systems by mimicking adiabatic processes with high fidelity and reduced duration. Central to this technique is a universal structure of the adiabatic gauge potential (AGP) over a parameterized Hamiltonian. Here, we reveal that introducing a new degree of freedom into the theory of the AGP can significantly improve variational CD driving. Specifically, we find that the algebraic characterization of the AGP is not unique, and we exploit this nonuniqueness to develop the weighted variational method for deriving a refined driving protocol. This approach extends the conventional method in two aspects: it assigns customized weights to matrix elements relevant to specific problems, and it effectively incorporates nonlocal information into local driving coefficients. We also develop an efficient numerical algorithm to compute the refined driving protocol using computer algebra. Our framework is broadly applicable and, in principle, it can replace any previous use of variational CD driving. We demonstrate its practicality by applying it to adiabatic evolution along the ground state of a parameterized Hamiltonian. This proposal outperforms the conventional method in terms of fidelity, as confirmed by extensive numerical simulations on quantum Ising models.
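The conventional variational step that the weighted method extends reduces to a linear least-squares problem. This assumes the common formulation in which an ansatz $A_\lambda = \sum_j \alpha_j O_j$ minimizes the Frobenius (Hilbert–Schmidt) norm of $G_\lambda = \partial_\lambda H + i[A_\lambda, H]$. The NumPy sketch below implements only that unweighted baseline, with $\hbar = 1$. Its example is a single qubit, $H(\lambda) = Z(\lambda)\sigma^z + X(\lambda)\sigma^x$, with ansatz $\alpha\sigma^y$. It does not reproduce the paper's weighted action or its computer-algebra algorithm.

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def variational_agp(H, dH, ops):
    """Real coefficients alpha minimizing ||dH + i[A, H]||_F^2 for A = sum_j alpha_j ops[j]."""
    cols = []
    for O in ops:
        C = 1j * (O @ H - H @ O)          # i[O, H], Hermitian for Hermitian O and H
        cols.append(np.concatenate([C.real.ravel(), C.imag.ravel()]))
    M = np.column_stack(cols)             # real least-squares design matrix
    b = np.concatenate([dH.real.ravel(), dH.imag.ravel()])
    alpha, *_ = np.linalg.lstsq(M, -b, rcond=None)
    return alpha

# Single-qubit example at one value of lambda: Z, X and their lambda-derivatives.
Z, X, dZ, dX = 1.0, 0.5, 0.0, 1.0
H = Z * SZ + X * SX
dH = dZ * SZ + dX * SX
alpha = variational_agp(H, dH, [SY])
closed_form = (Z * dX - X * dZ) / (2 * (X ** 2 + Z ** 2))
```

For this two-level case, the least-squares coefficient matches the closed-form minimizer $\alpha = (Z X' - X Z')/(2(X^2+Z^2))$, here 0.4. For many-body Hamiltonians the same routine applies with larger matrices and a list of local operators. The cost grows with the Hilbert-space dimension.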

05.
arXiv (CS.CL) 2026-06-19

NRITYAM: Language Models Meet Art and Heritage of Dance

Language models have become essential tools in shaping modern workflows. However, their global effectiveness hinges on a nuanced understanding of local socio-cultural contexts. To address this gap, we present NRITYAM, a comprehensive benchmark for evaluating the cultural comprehension capabilities of language models in the context of global dance traditions. NRITYAM comprises 9,260 carefully curated question-answer pairs spanning 12 languages, making it the largest dataset dedicated to evaluating cultural knowledge in dance. The dataset has been developed from the ground up through close collaboration with native dance artists and native speakers of the languages, who authored and validated culturally relevant questions specific to their regions. We evaluate a broad set of models, including large language models, small language models, multimodal large language models, and small multimodal language models. As a multilingual and multicultural benchmark, NRITYAM sets a new standard for evaluating the ability of AI systems to understand and reason about traditional performing arts. Detailed dataset samples are available at https://github.com/niladrighosh03/NRITYAM.

06.
arXiv (CS.CV) 2026-06-16

From Static Inference to Dynamic Interaction: A Survey of Streaming Large Language Models

Standard Large Language Models (LLMs) are predominantly designed for static inference with pre-defined inputs, which limits their applicability in dynamic, real-time scenarios. To address this gap, the streaming LLM paradigm has emerged. However, existing definitions of streaming LLMs remain fragmented, conflating streaming generation, streaming inputs, and interactive streaming architectures, while a systematic taxonomy is still lacking. This paper provides a comprehensive overview and analysis of streaming LLMs. First, we establish a unified definition of streaming LLMs based on data flow and dynamic interaction to clarify existing ambiguities. Building on this definition, we propose a systematic taxonomy of current streaming LLMs and conduct an in-depth discussion on their underlying methodologies. Furthermore, we explore the applications of streaming LLMs in real-world scenarios and outline promising research directions to support ongoing advances in streaming intelligence. We maintain a continuously updated repository of relevant papers at https://github.com/EIT-NLP/Awesome-Streaming-LLMs.

07.
arXiv (CS.CV) 2026-06-15

S$^2$COPE: Self-Supervised Concept Discovery via Preference Learning

Current representation learning paradigms force a fundamental compromise: self-supervised methods scale to massive datasets but yield opaque features, whereas interpretable models remain bottlenecked by the need for dense human annotation. We introduce Self-Supervised Concept discOvery via Preference lEarning (S$^2$COPE), a label-free framework that resolves this dilemma. Instead of treating Vision-Large-Language Models (VLLMs) as static feature extractors, S$^2$COPE leverages them as active participants in a self-supervised preference optimization loop. By autonomously hypothesizing, validating, and reinforcing candidate visual attributes directly from raw imagery, our framework discovers novel, structured concepts without a single label. Extensive experiments across natural, medical, and physics domains demonstrate that S$^2$COPE successfully extracts domain-specific concepts that standard VLLMs often fail to generate. By amortizing concept discovery directly into the VLLM backbone through our self-supervised preference objective, rather than relying on static generation and disjoint filtering, we achieve up to a 24-point absolute improvement in downstream top-1 classification accuracy on unseen data. Our work suggests that interpretability can emerge through a model's autonomous interaction with incidental visual structures, without any human supervision.
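The abstract does not specify the self-supervised preference objective. As an assumed stand-in, a standard Direct Preference Optimization (DPO) loss can score a preferred ("validated") attribute description against a rejected one. It uses log-probabilities under the VLLM being trained and under a frozen reference copy. The names and `beta=0.1` are illustrative, and this is not S$^2$COPE's published objective.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Mean DPO loss over a batch of (chosen, rejected) description pairs.

    logp_*: sequence log-probabilities under the model being trained;
    ref_logp_*: the same quantities under a frozen reference model.
    """
    chosen_gain = np.asarray(logp_chosen) - np.asarray(ref_logp_chosen)
    rejected_gain = np.asarray(logp_rejected) - np.asarray(ref_logp_rejected)
    margin = beta * (chosen_gain - rejected_gain)
    # -log(sigmoid(margin)) computed as log(1 + exp(-margin)) for numerical stability
    return float(np.mean(np.logaddexp(0.0, -margin)))
```

When the trained model and the reference agree, the loss equals $\log 2$. It falls below that as the model raises the relative likelihood of validated attributes.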

08.
arXiv (CS.CV) 2026-06-11

SG2Loc: Sequential Visual Localization on 3D Scene Graphs

Visual localization in complex indoor environments remains a critical challenge for robotics and AR applications. Sequential localization, where pose estimates are refined over time, is important for autonomous agents. However, traditional methods often require storing extensive image databases or point clouds, leading to significant overhead. This paper introduces a novel, lightweight approach to sequential visual localization using 3D scene graphs. Our method represents the environment with a compact scene graph, where nodes represent objects (with coarse meshes) and edges encode spatial relationships. For each image in the localization phase, we extract per-patch semantic features that predict object identities. Localization is performed within a particle filter framework. Each particle, representing a camera pose, projects the coarse object meshes from the scene graph into the image and assigns object identities to patches based on visibility. The similarity between the per-patch features of the input image and the assigned object features from the scene graph determines the particle's weight. Subsequent images are incorporated sequentially, refining the pose estimate. By leveraging a compact scene graph and efficient semantic matching, our method significantly reduces storage while maintaining performance on real-world datasets. The code will be available at https://github.com/DmblnNicole/sg2loc.
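The particle-weighting step described above can be sketched in NumPy under simplifying assumptions. Each particle's mesh projection is given as a hypothetical precomputed array `assignments` of per-patch object indices, with -1 for patches that no object covers. Similarity is cosine similarity. Weights are multiplied by `exp(score / temperature)` and renormalized, followed by systematic resampling. Mesh rendering and the motion model are omitted, and the temperature is an assumed parameter.

```python
import numpy as np

def cosine_similarity(a, b):
    """Row-wise cosine similarity between two (M, D) arrays."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)

def particle_scores(patch_feats, object_feats, assignments):
    """Mean feature similarity per particle.

    patch_feats: (P, D) features of the P image patches.
    object_feats: (O, D) one feature per scene-graph object.
    assignments: (N, P) object index seen at each patch under each of N particle
        poses (hypothetical precomputed projection), -1 where no object is visible.
    """
    scores = np.zeros(len(assignments))
    for n, assign in enumerate(assignments):
        visible = assign >= 0
        if visible.any():
            scores[n] = cosine_similarity(patch_feats[visible], object_feats[assign[visible]]).mean()
    return scores

def reweight(weights, scores, temperature=0.1):
    """Multiply prior weights by exp(score / temperature) and renormalize."""
    w = weights * np.exp((scores - scores.max()) / temperature)  # shift for numerical stability
    return w / w.sum()

def systematic_resample(weights, rng):
    """Indices of particles to keep, drawn with a single random offset."""
    n = len(weights)
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return np.minimum(idx, n - 1)  # guard against round-off in the final cumulative sum
```

Processing each new frame repeats the scoring and reweighting steps on the propagated particles, which is how later images refine the pose estimate.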

09.
arXiv (CS.LG) 2026-06-11

Learning Object Manipulation from Scratch via Contrastive Interaction

arXiv:2606.11525v1 Abstract: Contrastive Reinforcement Learning (CRL) has seen recent success in a wide variety of goal-conditioned robotics tasks by learning structured representations of the dynamics. However, despite its success in locomotion and simpler control domains, CRL often struggles in interaction-rich manipulation. We argue that a key source of this difficulty is object-centric interaction, such as contact or grasping, which induces distinct changes in the underlying dynamic modes. In this work, we formulate manipulation dynamics as a piecewise-smooth Markov process and show that interaction-induced mode changes create piecewise nonlinear reachability structures that are difficult for standard CRL energy functions to represent and plan over. Based on this analysis, we introduce Interaction-weighted Resampling (IWR). IWR performs interaction-aware resampling around the phases before, during, and after interactions. This encourages the learned representation to preserve the mode boundaries that determine future reachability, and thereby to capture multi-modal, piecewise nonlinear reachability. Across interaction-centric environments, including 2D dynamic control, robotic manipulation, and robot air hockey, IWR improves both sample efficiency and overall performance over prior CRL methods, with an average improvement of 19.8% in simulation. Finally, using a sim-to-real pipeline with policies trained by IWR, we demonstrate the first real-world goal-conditioned robot air hockey agent capable of hitting goals, improving success from 25% to 60%. Project Page: IWR-arxiv.github.io.
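One possible reading of the interaction-aware resampling step is sketched in NumPy below. When training transitions are drawn, timesteps inside a window before, during, and after a detected interaction are sampled more often than free-motion timesteps. The interaction signal here is, for example, a contact flag from the simulator. The window size, the boost factor, and the binary contact signal are assumptions for illustration, and the paper's actual weighting scheme may differ.

```python
import numpy as np

def interaction_weights(contact, window=3, boost=4.0):
    """Sampling probabilities over the T timesteps of one trajectory.

    contact: (T,) booleans, True while an interaction (contact, grasp) is active.
    Timesteps during an interaction or within `window` steps before/after it get
    weight `boost`; all other timesteps get weight 1.
    """
    contact = np.asarray(contact, dtype=bool)
    T = len(contact)
    near = np.zeros(T, dtype=bool)
    for t in np.flatnonzero(contact):
        near[max(0, t - window):min(T, t + window + 1)] = True
    w = np.where(near, boost, 1.0)
    return w / w.sum()

def resample_timesteps(contact, n_samples, rng, **kwargs):
    """Draw timestep indices (with replacement) according to interaction_weights."""
    p = interaction_weights(contact, **kwargs)
    return rng.choice(len(p), size=n_samples, p=p)
```

The sampled indices would then select anchor states for the contrastive objective, so mode boundaries around contacts appear more often in training batches.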

10.
arXiv (CS.CL) 2026-06-16

An Empirical Study on Learning Latent Representations for Emotional Speech Synthesis

Over the last few years, speech synthesis has improved dramatically thanks to deep learning, and a growing number of deep learning-based TTS systems can produce voices with high intelligibility and naturalness. Controlling expressiveness, however, remains a major challenge, and generating speech in different styles or manners has recently received considerable attention from the community. This paper presents our solutions to the emotional speech synthesis (ESS) task at VLSP 2022, which requires generating human-like, natural-sounding speech from a given input text with a desired emotional expression. By integrating a speaker embedding and a prosody bottleneck into FastSpeech 2, our systems can generate emotional speech for a single speaker (Sub-task 1). They can also transfer speaking styles from another speaker to a target speaker who has only neutral, non-expressive data, while retaining the target speaker's identity (Sub-task 2).

11.
PLOS Medicine 2026-06-09

Prediction of hospitalisation in young children with pneumonia in Malawi: A machine learning-based approach

by Patrick Staunton, Mohammad Adib Makrooni, Master Chisale, Billy Nyambolo, Joseph Wu, Damien McCarthy, Mark Ledwidge, Yasir Bin Nisar, Chris Watson, Balwani Mbakaya, Cathal Seoighe, Joe Gallagher. Background: Globally, pneumonia remains the single biggest cause of mortality in children under 5 years of age. This study sought to train and test a prediction model for hospitalisation within 7 days of initial presentation in 2- to 59-month-old Malawian children with WHO-defined pneumonia in primary care, and to compare its performance with existing risk prediction models. Methods and findings: BIOTOPE is a cohort study of children with pneumonia in a primary healthcare setting in Malawi. The training cohort involved nine primary care centres and the testing cohort involved two primary care centres in Northern Malawi. The training cohort was recruited between December 2022 and April 2023, while the testing cohort was recruited in 2016. Participants were consecutive children aged 2–59 months presenting with cough and/or difficulty breathing who were diagnosed in primary care with WHO-defined pneumonia of any severity. The training cohort was used to train and validate a machine learning model with a prespecified primary outcome of hospitalisation and/or death within 7 days. This model was then further evaluated in the testing cohort. Median age was 15 months (interquartile range 8–27) in the training cohort and 17 months (interquartile range 9–29) in the external testing cohort (52.1% and 54.4% male, respectively). Hospitalisation occurred in 14.3% (294) of the training cohort and 12.1% (55) of the testing cohort. There was one death, in the training cohort only. WHO danger signs were present in 17.6% (360) and 15.9% (70) of children in the training and testing cohorts, respectively.
The optimal machine learning model achieved areas under the receiver operating characteristic and precision-recall curves of 0.87 and 0.57, respectively, in the testing cohort, outperforming existing risk prediction models. This model also produced an expected calibration error of 0.16. A logistic regression model with severity status as the response variable and the log odds of the machine learning model's calibrated probabilities as the predictor produced an intercept estimate of −0.32 and a slope estimate of 1.13. Key limitations are threefold. First, hospitalisation and/or death was used as a severity outcome, which may reflect health system factors rather than true disease severity. Second, mortality-based comparisons were not possible because of low mortality in these primary care cohorts. Third, the comparator tools were developed for hospital populations rather than primary care populations. Conclusion: This machine learning score outperformed traditional pneumonia risk scores in predicting hospitalisation within 7 days in Malawian children presenting to primary care. Traditional pneumonia risk scores diminish in performance when externally applied to new datasets, suggesting that they may not generalise well beyond their original derivation settings. Mortality-related findings are not applicable, as there was only one death in this cohort. Overall, these findings support the potential of machine learning to meaningfully improve early identification of children at risk of severe pneumonia in low-resource primary care settings. Further external validation and clinical impact studies are needed to confirm these results.
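The two calibration summaries reported above can be sketched in NumPy. The expected calibration error uses equal-width probability bins (10 here; the paper's binning is not stated). The logistic recalibration model $\operatorname{logit} P(y=1) = a + b\,\operatorname{logit}(\hat p)$ is fitted by Newton–Raphson, where an intercept near 0 and a slope near 1 indicate good calibration. Function names are hypothetical, and this is not the study's analysis code.

```python
import numpy as np

def expected_calibration_error(y, p, n_bins=10):
    """Bin-size-weighted mean |observed rate - mean predicted probability| over equal-width bins."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)  # p = 1.0 goes in the last bin
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            ece += in_bin.mean() * abs(y[in_bin].mean() - p[in_bin].mean())
    return float(ece)

def calibration_intercept_slope(y, p, iters=25):
    """Fit logit P(y = 1) = a + b * logit(p) by Newton-Raphson; returns (a, b)."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1 - 1e-6)
    X = np.column_stack([np.ones_like(p), np.log(p / (1 - p))])
    beta = np.zeros(2)
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))       # current fitted probabilities
        grad = X.T @ (y - mu)
        hess = X.T @ (X * (mu * (1 - mu))[:, None])
        beta += np.linalg.solve(hess, grad)
    return float(beta[0]), float(beta[1])
```

On simulated, perfectly calibrated predictions the fitted intercept and slope come out close to 0 and 1. A slope above 1, as reported above, suggests predictions that are somewhat too conservative (not extreme enough).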

12.
arXiv (CS.AI) 2026-06-17

Embedded Machine Learning for Microcontroller-Class Edge Devices: Data, Feature, Evaluation, and Deployment Pipelines

arXiv:2606.18122v1 Abstract: Embedded machine learning moves inference from cloud services to resource-constrained devices that must acquire data, preprocess signals, run a model, and act within tight limits on memory, energy, and latency. This paper presents a systems-oriented synthesis of an embedded machine-learning workflow for microcontroller-class platforms. The emphasis is placed on engineering decisions that are often hidden in generic machine-learning introductions: sampling and buffering, feature extraction as dimensionality reduction, validation under class imbalance, model/runtime co-design, and streaming deployment. Two representative signal families are used throughout the paper. The first is inertial motion recognition, where a two-second, three-axis accelerometer window is transformed from raw samples into root-mean-square and spectral features before classification. The second is keyword spotting, where audio is sampled, anti-aliased, transformed into mel-frequency cepstral coefficients, and processed by a compact one-dimensional convolutional network. The paper concludes with practical design rules for robust on-device inference, including data curation, quantization, thresholding, scheduling, and field monitoring.
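The inertial-motion front end described above can be sketched in NumPy for one window. The sketch assumes a 50 Hz sampling rate, so a two-second window holds 100 samples per axis, and three illustrative frequency bands. The rate, band edges, and feature layout are assumptions, not the paper's configuration. On a microcontroller the same arithmetic would typically be ported to C, but the feature definitions carry over unchanged.

```python
import numpy as np

FS_HZ = 50        # assumed accelerometer sampling rate
WINDOW_S = 2.0    # window length from the text: two seconds
BANDS_HZ = ((0.5, 3.0), (3.0, 8.0), (8.0, 20.0))  # illustrative band edges

def window_features(window, fs=FS_HZ, bands=BANDS_HZ):
    """Map one (N, 3) accelerometer window to a fixed-length feature vector.

    Layout: 3 per-axis RMS values, then per-band energies (3 axes per band),
    then the per-axis peak frequency in Hz.
    """
    window = np.asarray(window, dtype=float)
    rms = np.sqrt(np.mean(window ** 2, axis=0))
    centered = window - window.mean(axis=0)              # remove gravity / DC offset
    power = np.abs(np.fft.rfft(centered, axis=0)) ** 2   # (N // 2 + 1, 3) power spectrum
    freqs = np.fft.rfftfreq(window.shape[0], d=1.0 / fs)
    band_energy = [power[(freqs >= lo) & (freqs < hi)].sum(axis=0) for lo, hi in bands]
    peak_freq = freqs[np.argmax(power[1:], axis=0) + 1]  # skip the DC bin
    return np.concatenate([rms, *band_energy, peak_freq])

# Example: a 2 Hz oscillation on the x axis only.
n = int(FS_HZ * WINDOW_S)                                # 100 samples per axis
t = np.arange(n) / FS_HZ
demo = np.column_stack([np.sin(2 * np.pi * 2.0 * t), np.zeros(n), np.zeros(n)])
feats = window_features(demo)                            # 15 values: 3 + 3 * 3 + 3
```

This turns 300 raw samples into 15 numbers, which is the dimensionality-reduction role the paper assigns to feature extraction.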

13.
bioRxiv (Bioinfo) 2026-06-14

TopoMIL: Topology Improves Multiple Instance Learning in Diagnostic Microscopic Images

Microscopic images of cells and tissues are central to disease diagnosis. In computational pathology, multiple instance learning (MIL) has emerged as a key paradigm for analyzing numerous images within a single patient sample. While the representative distribution of cells in a sample is important for diagnosis, existing MIL frameworks largely overlook it. We introduce TopoMIL, a framework that extracts the representative topological structure of the sample and integrates it into the MIL classifier. Three topological representations are assessed, each with distinct advantages and computational costs. We evaluate TopoMIL on four histopathology and cytomorphology datasets, each presenting unique challenges. Integrating the sample's topological information into MIL enhances classification across average, max, attention-based, and transformer pooling, yielding AUCROC gains of 3.3%, 4.2%, 5.9%, and 0.5%, respectively, with moderate computational cost. Our work underscores the potential of TopoMIL as a scalable extension to existing morphology-based models in computational pathology.
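The pooling operators compared above can be sketched in NumPy. Topological information enters as a hypothetical fixed-length descriptor `topo`, concatenated to the pooled bag embedding before classification. The attention pooling follows the ungated attention-MIL form $a_i \propto \exp(w^\top \tanh(V h_i))$ of Ilse et al. The abstract does not say how TopoMIL computes or fuses its three topological representations, so the concatenation is an assumption. Transformer pooling is omitted.

```python
import numpy as np

def attention_pool(H, V, w):
    """H: (K, D) instance embeddings; V: (L, D) and w: (L,) attention parameters."""
    scores = np.tanh(H @ V.T) @ w      # (K,) one score per instance
    a = np.exp(scores - scores.max())
    a /= a.sum()                       # softmax over the instances in the bag
    return a @ H, a                    # (D,) bag embedding and attention weights

def bag_embedding(H, topo, mode="attention", V=None, w=None):
    """Pool instance embeddings, then append the (hypothetical) topological descriptor."""
    if mode == "mean":
        z = H.mean(axis=0)
    elif mode == "max":
        z = H.max(axis=0)
    elif mode == "attention":
        z, _ = attention_pool(H, V, w)
    else:
        raise ValueError(f"unknown pooling mode: {mode}")
    return np.concatenate([z, np.asarray(topo, dtype=float)])
```

Because the descriptor is appended after pooling, the same augmentation applies unchanged to each pooling variant. This matches the abstract's framing of TopoMIL as an extension to existing MIL classifiers.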

14.
medRxiv (Medicine) 2026-06-18

Maternal and fetal HLA heterozygosity in preeclampsia: Insights from a large multi-ancestry pregnancy cohort

Preeclampsia (PE) is a leading cause of maternal and neonatal morbidity, with immune dysregulation at the maternal-fetal interface central to its pathogenesis. The highly polymorphic human leukocyte antigen (HLA) region mediates maternal immune tolerance of the semi-allogeneic fetus, yet the contribution of HLA diversity to PE risk remains poorly defined. Whether the HLA heterozygote advantage observed in other immune disorders is relevant to PE has not been systematically evaluated. Using data from the multi-ancestry TOPMed Boston-Colombia Collaborative for Adverse Pregnancy Outcomes (n = 12,790; 4,770 PE, 8,020 controls; 10,808 maternal, 1,982 fetal, including 1,848 pairs), we evaluated associations between heterozygosity across eight classical HLA loci and PE and four sub-phenotypes, adjusting for genetic ancestry. HLA heterozygosity was common across most loci (>80%). No individual maternal HLA locus was associated with overall PE; however, heterozygosity across class I loci showed a protective effect in preterm PE (OR=0.82, 95%CI:0.69-0.97), with a similar pattern for HLA-A heterozygosity (OR=0.78, 95%CI:0.64-0.96). In contrast, fetal heterozygosity at HLA-DQB1 was nominally associated with increased risk of PE (OR=1.36, 95%CI:1.03-1.79) and preterm PE (OR=1.73, 95%CI:1.13-2.73). No individual maternal or fetal HLA alleles were associated with PE. Maternal-fetal mismatch analysis demonstrated locus-specific associations with preterm PE, including increased risk with HLA-DQA1 mismatch and reduced risk with HLA-C mismatch. These findings highlight distinct maternal and fetal immunogenetic contributions to PE risk and underscore the importance of considering HLA diversity, rather than individual alleles alone, in studies of PE etiology.
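The genotype-level quantities above can be sketched in plain Python, assuming two-field allele names and one allele pair per locus. The abstract does not state how the study defines heterozygosity "across class I loci" or maternal-fetal mismatch. The count-based summary and the rule "the fetus carries an allele absent from the mother" below are therefore illustrative assumptions. In an analysis, such indicators would then enter an ancestry-adjusted regression.

```python
CLASS_I_LOCI = ("A", "B", "C")

def is_heterozygous(genotype):
    """genotype: the two allele names at one locus, e.g. ("A*02:01", "A*24:02")."""
    allele_1, allele_2 = genotype
    return allele_1 != allele_2

def class_i_heterozygous_count(genotypes):
    """Number of heterozygous class I loci; genotypes maps locus name -> allele pair."""
    return sum(is_heterozygous(genotypes[locus]) for locus in CLASS_I_LOCI)

def fetal_mismatch(maternal, fetal):
    """One possible mismatch rule: the fetus carries an allele absent from the mother."""
    return any(allele not in maternal for allele in fetal)

# Hypothetical maternal genotype: heterozygous at A and C, homozygous at B.
mother = {"A": ("A*02:01", "A*24:02"), "B": ("B*07:02", "B*07:02"), "C": ("C*07:01", "C*04:01")}
```

Under this mismatch rule, a fetus carrying a paternally inherited allele the mother lacks counts as mismatched at that locus.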

15.
arXiv (CS.LG) 2026-06-19

Mask-Morph Graph U-Net: A Generalisable Mesh-Based Surrogate for Crashworthiness Field Prediction under Large Geometric Variation

arXiv:2605.15231v2 Abstract: Nonlinear finite element crash simulations are accurate but computationally expensive, limiting their use in iterative design optimisation. Machine-learning surrogate models based on graph neural networks (GNNs) offer a faster alternative. Message-passing GNNs are widely used for mesh simulation, and their shared node and edge update functions are relatively generalisable across varying graph structures. By contrast, non-shareable edge-specific aggregation layers can capture nonlinear relationships more accurately but usually require fixed graph connectivity, which limits generalisability. This paper presents Mask-Morph Graph U-Net (MMGUNet). It is a practical approach to addressing the limitation of hierarchical Graph U-Net architectures that use edge-specific downsampling and upsampling layers, which require fixed coarse-graph connectivity. To retain this connectivity while improving spatial correspondence, the proposed method morphs the coarsened graph hierarchy to each input mesh using feature-aligned barycentric parameterisation before constructing cross-graph edges. It further applies node masking during supervised pretraining, followed by parameter-efficient fine-tuning in which high-parameter edge-specific layers are frozen. The proposed approach is evaluated in in-distribution, out-of-distribution, and cross-component transfer settings using mean Euclidean distance and maximum intrusion percentage error. Results show that coarse-graph morphing improves test accuracy relative to a fixed-coarse-graph baseline, while masked supervised pretraining reduces the train-test discrepancy and improves data efficiency during transfer. The proposed model also achieves lower prediction error than external baselines. These results demonstrate a practical route toward reusable, data-efficient mesh-based surrogate modelling for crashworthiness design exploration.
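The morphing step rests on barycentric coordinates. A coarse-graph node inside a triangle of a reference mesh keeps its barycentric weights and is re-placed on the corresponding triangle of a new mesh. The NumPy sketch below assumes the triangle correspondence between the two meshes is already known. It omits the feature alignment the paper uses to establish that correspondence, and its function names are hypothetical.

```python
import numpy as np

def barycentric_coords(p, a, b, c):
    """Weights (u, v, w) with p = u*a + v*b + w*c for triangle (a, b, c).

    Works in 2-D or 3-D; p is assumed to lie in the triangle's plane.
    """
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01       # zero only for a degenerate triangle
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - v - w, v, w])

def morph_node(weights, new_triangle):
    """Place a node on another mesh's triangle, given as rows (a', b', c')."""
    return weights @ new_triangle

ref_tri = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
node = np.array([0.2, 0.3, 0.0])
weights = barycentric_coords(node, *ref_tri)            # -> approximately [0.5, 0.2, 0.3]
new_tri = 2.0 * ref_tri + 1.0                           # same triangle, scaled and shifted
moved = morph_node(weights, new_tri)
```

Because barycentric weights are invariant under affine maps of the triangle, the morphed node lands at the same relative position on the new geometry. The coarse connectivity itself is left unchanged, which is what the edge-specific layers require.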

16.
arXiv (CS.LG) 2026-06-15

Shuttling Compiler for Trapped-Ion Quantum Computers Based on Large Language Models

arXiv:2512.18021v3 Abstract: We present the first shuttling compiler based on large language models (LLMs) for trapped-ion quantum computers, in which qubits are shuttled between segments for gate execution and qubit storage. We fine-tune pre-trained LLMs on examples from linear and branched one-dimensional shuttling architectures, obtaining a layout-independent compilation strategy that learns the required shuttling operations directly from data. On benchmark circuits with up to 16 qubits, the fine-tuned LLMs generate valid schedules for these shuttling architectures. Notably, we also obtain a valid schedule for a previously unseen four-way junction layout, which demonstrates that trained LLMs can generalize to layouts not encountered during training. For various architectures, LLM-based schedules improve upon state-of-the-art baseline compiler results, reducing the shuttling effort by up to 15%.

17.
arXiv (CS.CV) 2026-06-11

ISAP-3D: Identity-Slot Aligned Part-Aware 3D Generation

Part-aware 3D generation aims to synthesize structured objects with semantically meaningful components, yet often suffers from structural ambiguity due to identity-layout entanglement. Existing methods either infer part identity and spatial layout implicitly, which can lead to unstable part allocation (e.g., slot swapping or part merging), or rely on strong layout conditions that are difficult to obtain in practice. We attribute this ambiguity to identity-slot permutation freedom: without explicit identity-slot alignment, the correspondence between semantic parts and generation slots is not identifiable during training, allowing multiple slot assignments to fit the same supervision and leading to inconsistent decomposition. Based on this insight, we argue that stable part-aware generation requires identity-aligned one-to-one slot modelling. We therefore propose an identity-slot aligned framework, ISAP-3D, which anchors each part with semantic identity tokens and performs identity-conditioned one-to-one layout prediction, followed by layout-conditioned geometry synthesis. Structured local-global conditioning maintains identity alignment across semantic, spatial, and geometric stages. We also construct a part-level dataset with a unified semantic protocol to enable learnable and consistent identity-slot alignment. Extensive experiments demonstrate improved structural stability, controllability, and robustness over state-of-the-art part-aware generation baselines.
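The identifiability argument can be made concrete with a toy NumPy example, assuming each slot predicts a small parameter vector such as a part's 3-D center. A permutation-invariant loss that searches over one-to-one slot assignments gives the same value for every slot ordering, so supervision alone cannot say which slot should hold which part. Tying slot $i$ to semantic identity $i$ instead penalizes a swap. The brute-force matching only suits small $K$, and neither function is the ISAP-3D training objective.

```python
from itertools import permutations

import numpy as np

def matched_loss(pred, target):
    """Permutation-invariant squared error: best one-to-one slot assignment (brute force)."""
    k = len(target)
    return min(float(np.sum((pred[list(perm)] - target) ** 2)) for perm in permutations(range(k)))

def identity_aligned_loss(pred, target):
    """Slot i is anchored to semantic part i, so no assignment is searched."""
    return float(np.sum((pred - target) ** 2))

# Two parts (say, a chair's seat and backrest), each described by a 3-D center.
target = np.array([[0.0, 0.0, 0.4], [0.0, -0.3, 0.8]])
swapped = target[::-1].copy()   # the right parts, held in the wrong slots
```

Here `matched_loss(swapped, target)` is zero, so a swapped decomposition is indistinguishable from the correct one. `identity_aligned_loss` is positive, which is the signal an identity-aligned formulation provides.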

18.
arXiv (CS.AI) 2026-06-12

Muse Spark Safety & Preparedness Report

arXiv:2606.12429v1 Abstract: Muse Spark is the latest large language model developed by Meta. In this report, we first present evaluations for catastrophic risk domains under Meta's Advanced AI Scaling Framework, along with the evidence that informed our launch decision. We then discuss additional considerations, such as Muse Spark's broader content safety and behavioral profile, that are relevant to overall safety but fall outside the catastrophic risk domains governed by the Framework. Our preparedness results covering Chemical and Biological, Cybersecurity, and Loss of Control risks assess Muse Spark's deployment within Meta AI as presenting acceptable levels of residual risk under our Advanced AI Scaling Framework. We conducted a broad set of evaluations targeting dual-use and high-risk capabilities across these catastrophic risk domains. Those evaluations identified elevated risks prior to mitigations, with Chemical and Biological capabilities assessed as likely reaching the "high risk" category under the Advanced AI Scaling Framework before safeguards were applied. We have implemented a multi-layered set of mitigations that address the identified risks, and Muse Spark demonstrates state-of-the-art refusal across a range of benchmarks related to hazardous workflows in chemistry and biology. We therefore release Muse Spark as the underlying model of Meta AI.

19.
arXiv (CS.CL) 2026-06-16

Evaluating the Robustness of Proof Autoformalization in Lean 4

Proof autoformalization aims to translate an informal mathematical proof written in natural language into a formal proof in a formal language such as Lean 4. Several works have developed LLM-based models for proof autoformalization. However, existing evaluations have typically focused on translating well-formed informal proofs from curated datasets. We argue that a robust proof autoformalizer must remain faithful even for informal proofs that diverge from these idealized ones, and we present the first study on the robustness of proof autoformalization models. We formulate two categories of perturbations and evaluate robustness under each. A global perturbation paraphrases the informal proof in a different style, and the formalization should remain consistent under it. A local perturbation alters a value, symbol, or proof step, possibly in a counterfactual way. A robust formalization should faithfully reflect such a perturbation rather than revert to the original proof or infer a different one on its own. We build a benchmark with both perturbations on miniF2F and MATH-500. We automatically measure how stable the correctness of the autoformalized proofs is under global perturbations and how faithfully the output reflects local perturbations. We evaluate seven recent models, all of which are sensitive to global perturbations and mostly fail to remain faithful under local perturbations. Code and data are available via https://github.com/ucr-rai/robust-proof-autoformalization.
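A minimal Lean 4 illustration shows why faithfulness under local perturbations must be checked separately from correctness. The toy statements are invented for this example and are not taken from the benchmark. Suppose an informal step is perturbed from "2 + 3 = 5" to "2 + 4 = 6". Both the faithful formalization and one that silently reverts to the original statement type-check, so only a comparison against the perturbed informal proof reveals the reversion.

```lean
-- Original informal step: "since 2 + 3 = 5, ..."
theorem step_original : (2 : Nat) + 3 = 5 := rfl

-- Perturbed informal step: "since 2 + 4 = 6, ..."
-- A faithful formalization reflects the perturbed values:
theorem step_perturbed_faithful : (2 : Nat) + 4 = 6 := rfl

-- An unfaithful formalization reverts to the original statement;
-- it still compiles, which is why correctness alone cannot detect it.
theorem step_perturbed_reverted : (2 : Nat) + 3 = 5 := rfl
```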

20.
arXiv (CS.CV) 2026-06-17

Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks

The data processing inequality is an information-theoretic principle stating that the information content of a signal cannot be increased by processing the observations. In particular, it suggests that there is no benefit in enhancing the signal or encoding it before addressing a classification problem. This assertion can be proven to be true for the case of the optimal Bayes classifier. However, in practice, it is common to perform "low-level" tasks before "high-level" downstream tasks despite the overwhelming capabilities of modern deep neural networks. In this paper, we aim to understand when and why low-level processing can be beneficial for classification. We present a comprehensive theoretical study of a binary classification setup, where we consider a classifier that is tightly connected to the optimal Bayes classifier and converges to it as the number of training samples increases. We prove that for any finite number of training samples, there exists a pre-classification processing that improves the classification accuracy. We also explore the effect of class separation, training set size, and class balance on the relative gain from this procedure. We support our theory with an empirical investigation of the theoretical setup. Finally, we conduct an empirical study where we investigate the effect of denoising and encoding on the performance of practical deep classifiers on benchmark datasets. Specifically, we vary the size and class distribution of the training set, and the noise level, and demonstrate trends that are consistent with our theoretical results.
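The principle and its Bayes-classifier consequence can be stated as follows, in our own notation rather than the paper's. Let $Y$ be the label and $X$ the observation, and let $Z$ be any processed observation obtained from $X$ without further access to $Y$, so that $Y \to X \to Z$ is a Markov chain:

$$
Y \to X \to Z \quad\Longrightarrow\quad I(Y;Z) \le I(Y;X), \qquad P_e^{*}(Z) \ge P_e^{*}(X),
$$

where $P_e^{*}(X) = 1 - \mathbb{E}_X\!\left[\max_{y} P(Y = y \mid X)\right]$ is the Bayes error. The second inequality holds because any classifier acting on $Z$ is also a (possibly randomized) classifier acting on $X$. Both statements concern the optimal classifier with the true posterior. The paper's finite-sample result instead concerns a learned classifier that only converges to it, and that gap is where pre-classification processing can help.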

21.
arXiv (CS.AI) 2026-06-19

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

arXiv:2606.05833v2 Announce Type: replace-cross Abstract: Multimodal Large Language Models (MLLMs) excel at 2D semantic understanding but lack intrinsic 3D awareness, resulting in representations that fail to maintain geometric and spatial consistency across video frames. Given the scarcity of large-scale 3D data, we present GeoVR, a novel framework that learns geometric representations using purely 2D video sequences. This approach effectively restructures the semantic latent space within MLLMs to unlock spatial intelligence. Rather than employing superficial feature mixing, GeoVR reshapes the internal representations of the MLLM by distilling geometry knowledge from pre-trained 3D foundation models. This is accomplished through a multi-objective learning strategy driven by four complementary geometric targets: (1) estimating inter-frame camera poses to embed varying viewpoint dynamics, (2) regressing dense depth maps to anchor physical distances, (3) predicting a metric scale factor for real-world calibration, and (4) distilling multi-scale 3D features to align the intermediate feature space. Guided by these explicit physical and geometric constraints, the model's internal representations naturally develop strong 3D awareness. Extensive experiments on spatial reasoning benchmarks demonstrate that GeoVR achieves state-of-the-art performance, establishing a new paradigm for endowing foundation models with spatial intelligence.
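The abstract does not say how the four geometric targets are combined. One common choice is a weighted sum added to the model's usual training objective, and it is assumed here purely for illustration. The task term $\mathcal{L}_{\text{task}}$ and the weights $\lambda$ are hypothetical and are not taken from the paper.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{task}}
  + \lambda_{\text{pose}}\,\mathcal{L}_{\text{pose}}
  + \lambda_{\text{depth}}\,\mathcal{L}_{\text{depth}}
  + \lambda_{\text{scale}}\,\mathcal{L}_{\text{scale}}
  + \lambda_{\text{feat}}\,\mathcal{L}_{\text{feat}}
```

Each auxiliary term corresponds to one listed target, in order:

1. Inter-frame camera poses.
2. Dense depth maps.
3. Metric scale factor.
4. Multi-scale 3D feature distillation.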

22.
medRxiv (Medicine) 2026-06-11

Parent and physiotherapist perceptions about movement skills of young children with juvenile idiopathic arthritis

Objective: The onset of juvenile idiopathic arthritis (JIA) in the early years (≤5 years) may negatively impact the development of movement skills (encompassing the related concepts of gross motor skills, fundamental movement skills, and functional ability). Few studies have explored the perceptions and needs of parents and physiotherapists regarding children's difficulties with these movement skills, which are essential for identifying potential areas for added support. The objective of this study is to understand the perceptions of physiotherapists and parents towards the movement skills of children with JIA. Methods: Seventeen parents and 24 physiotherapists completed an online questionnaire consisting of multiple-choice and open-ended questions about the movement skills of young children with JIA. Demographic and multiple-choice questions were quantitatively analysed using descriptive statistics. Open-ended responses were analysed using qualitative conventional content analysis. Results: About half (47%) of parents perceived their children to have movement difficulties, and 75% of physiotherapists described the movement skills of children with JIA as worse than those of other children of the same age. Our qualitative analysis revealed three general themes: functional task difficulties; clinical variability in movement skills; and psychosocial components of movement skill difficulties. Conclusion: This study provides an analysis of the perceptions of physiotherapists and parents towards the movement skills of young children with JIA. A significant proportion of parents and physiotherapists identify movement difficulties among children with JIA that impact daily life. Future interventions targeting movement skills, co-designed with both parents and care providers, are needed.

23.
arXiv (quant-ph) 2026-06-19

On the significance of Wigner's Friend in contexts beyond quantum foundations

arXiv:2402.08727v3 Announce Type: replace Abstract: There has been a surge of recent interest in the Wigner's Friend paradox, sparking several novel thought experiments and no-go theorems. The main narrative has been that Wigner's Friend highlights a counterintuitive feature that is unique to quantum theory, and which is closely related to the quantum measurement problem. Here, we challenge this view. We argue that the gist of the Wigner's Friend paradox can be reproduced without assuming quantum physics, and that it underlies a much broader class of enigmas in the foundations of physics and philosophy. To show this, we first consider several recently proposed Extended Wigner's Friend scenarios, and demonstrate that some of their implications for the absoluteness of observations can be reproduced by classical thought experiments that involve the duplication of agents. Crucially, some of these classical scenarios are technologically much easier to implement than their quantum counterparts. Then, we argue that the essential structural ingredient of all these scenarios is a feature that we call "Restriction A": that a physical theory cannot give us a probabilistic description of the observations of all agents. Finally, we argue that this difficulty is at the core of other puzzles in the foundations of physics and philosophy, and demonstrate this explicitly for cosmology's Boltzmann brain problem. Our analysis suggests that Wigner's Friend should be studied in a larger context, addressing a frontier of human knowledge beyond quantum foundations: to obtain reliable predictions for experiments in which these predictions can be privately but not intersubjectively verified.

24.
medRxiv (Medicine) 2026-06-12

Reduced nighttime smartphone use among cohabiting partners: a longitudinal study under the lens of social control of health behaviors theory

Objective: We examined the link between cohabitation with a partner and nighttime smartphone use through the social control of health behavior theory. Background: Nighttime smartphone use is a behavioral risk factor for sleep problems. While previous research has predominantly focused on individual-level risks of sleep disturbances, the role of social context remains underexplored. Theoretical frameworks, specifically the Social Control of Health Behavior, suggest that social relationships regulate health-related behaviors; however, it is unclear how far this regulation extends to modern digital behaviors among couples. Method: We analyzed survey data from three waves of the SmartSleep Study (2018, 2020, and 2023; total N = 25,028), including a longitudinal follow-up subset (N = 1,003). We tested multivariate associations between living with a partner, changes in cohabitation status and frequent nighttime smartphone use by fitting generalized linear mixed-effects models. Additionally, we mapped the complex interplay between indicators of social integration, social support, smartphone use, and sleep quality using hierarchical clustering of non-linear correlations. Results: Cohabiting participants had lower odds of frequent nighttime smartphone use compared to those living alone (OR = 0.66; 95% CI: 0.61, 0.72). This lower risk was driven primarily by cohabitation with a partner (OR = 0.49; 95% CI: 0.36, 0.66). Longitudinal analysis supported these findings, showing that sustained cohabitation was associated with less frequent nighttime use (OR = 0.56; 95% CI: 0.38, 0.82). Clustering analysis revealed that indicators of social integration and support clustered with favorable sleep quality. Conclusion: Our findings suggest that the health-protective effects of cohabitation with a partner extend to digital behaviors. Consistent with social control of health behavior theory, the presence of a partner appears to reduce frequent nighttime smartphone use, highlighting the critical importance of considering social context when addressing digital health hygiene and promoting sleep.

25.
bioRxiv (Bioinfo) 2026-06-11

Hyper3D-lite: count-preserving representation auditing for long-read multi-contact genome data

Long-read and single-molecule sequencing technologies are rapidly increasing the volume of molecule-level data, with platforms such as Oxford Nanopore, PacBio HiFi, and Roche sequencing-by-expansion advancing at different technology readiness levels. Long-read multi-contact 3D genome assays, such as the Pore-C and HiPore-C chromatin-conformation assays, preserve molecule-level contact context, but common downstream pairwise projections can expand one multi-contact molecule into many pair records. This creates a representation problem: apparent contact evidence can be inflated by the counting frame alone, before any biological interpretation begins. Hyper3D-lite addresses this problem as a representation-first audit tool for read-to-fragment-style long-read multi-contact inputs. It compares all-pair projection with CPB, a count-preserving statistical accounting reference point, and separates broad software outputs from conservative higher-order candidate calls.
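The sketch below shows the counting inflation described above. All-pair projection turns a molecule that touches $k$ distinct fragments into $k(k-1)/2$ pair records. For contrast, it also includes a simple count-preserving alternative, in which each molecule contributes a total weight of 1 split evenly over its pairs. That alternative is a hypothetical illustration and is not the authors' CPB reference point. The function names and the list-of-fragment-IDs input format are likewise assumptions.

```python
from collections import Counter
from itertools import combinations


def all_pair_projection(molecules):
    """Expand every multi-contact molecule into all unordered fragment pairs.

    A molecule touching k distinct fragments yields k*(k-1)/2 pair records,
    each counted once, so one molecule can contribute many records.
    """
    counts = Counter()
    for fragments in molecules:
        for pair in combinations(sorted(set(fragments)), 2):
            counts[pair] += 1
    return counts


def unit_weight_projection(molecules):
    """Hypothetical count-preserving accounting (NOT the authors' CPB).

    Each molecule with at least two distinct fragments contributes a total
    weight of exactly 1, split evenly across its fragment pairs.
    """
    weights = Counter()
    for fragments in molecules:
        pairs = list(combinations(sorted(set(fragments)), 2))
        if not pairs:
            continue  # a single-fragment molecule carries no pairwise contact
        share = 1.0 / len(pairs)
        for pair in pairs:
            weights[pair] += share
    return weights


# Toy input: one molecule touching four fragments, one touching two.
molecules = [["f1", "f2", "f3", "f4"], ["f2", "f5"]]
pairs = all_pair_projection(molecules)
weighted = unit_weight_projection(molecules)
print(sum(pairs.values()))  # 7 pair records from only 2 molecules
print(round(sum(weighted.values()), 6))  # 2.0, one unit per molecule
```

In this example, two molecules become seven pair records under all-pair projection. Under the unit-weight accounting, the total evidence stays at two.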