Academic Intelligence · Curated Daily

Explore the Frontiers of Global Academic Research

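AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefings across interdisciplinary fields.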

01.
PLOS Medicine 2026-05-20

Associations between hematologic dynamics during pregnancy and obstetric complications: A retrospective observational study

by Veronica Tozzo, Rachel Petherbridge, Kaitlyn James, Sarah Hsu, Deepti Pant, Chloe Michalopoulos, Brody H. Foy, Tanayott Thaweethai, Christopher Mow, Jacqueline Maya, Carolina Batlle Camero, Lydia Shook, Kathryn J. Gray, Logan Mauney, John M. Higgins, Camille E. Powe

Background

Pregnancy alters hematologic state as measured by complete blood count (CBC), but the longitudinal changes in CBC indices that define healthy pregnancies are not well established. In a large cohort based at an academic health system in the United States, we aimed to define reference intervals and typical longitudinal changes in CBC indices during pregnancy. We then tested for associations between extreme CBC values for gestational age or extreme longitudinal changes in CBC indices and obstetric complications.

Methods and findings

We studied nine CBC indices in individuals with singleton pregnancies who delivered after 30 weeks’ gestation and presented for prenatal care prior to 20 weeks. The electronic health record (EHR)-based Maternal Health Cohort (Massachusetts General Hospital; 1998–2016) formed our discovery cohort of 45,992 pregnancies, 18% of which had relevant complications. We developed a validation cohort of 48,868, 27% with complications from EHR data in the Mass General Brigham healthcare system from 2016 to 2024. In pregnancies without complications in the discovery cohort, we derived gestational-age-specific reference intervals (2.5th–97.5th percentile) and established typical intra-pregnancy longitudinal changes. In the validation cohort, we then tested CBC values outside of the 26–29 weeks’ gestation reference interval and CBC rare changes (uncommon changes in magnitude and direction) between 7–14 and 26–29 weeks’ gestation for association with a composite outcome (hypertensive disorders of pregnancy, small for gestational age birthweight, preterm birth) and its individual components using generalized estimating equations.
Derived reference intervals differed from those in the literature for mean red cell volume, mean red cell hemoglobin, red cell count, and mean red cell hemoglobin concentration; reference intervals for other indices were similar to those previously published. In validation, hematocrit, hemoglobin, and red cell count values above their gestational-age specific reference intervals were associated with increased risk of the composite obstetric outcome: odds ratios (ORs) of 1.4 (95% CI [1.2, 1.5] p 

02.
arXiv (math.PR) 2026-06-16

A Concavity Theorem for the Parisi PDE

arXiv:2606.15432v1 Announce Type: new Abstract: We prove that the map sending the diffusion profile to the solution of a time-changed Parisi PDE evaluated at time-space $(0,0)$ is concave. This result strengthens the raywise concavity result proven by Auffinger and Chen (2016). As an application, for the balanced multispecies Ising spin glasses, the lower bound of Bates and Sohn (2025) matches the Hopf-type upper bound given by the Hamilton–Jacobi framework developed by Mourrat, Chen and Xia.

03.
arXiv (CS.CV) 2026-06-11

Adapting Vision-Language Models from Iconic to Inclusive for Multi-Label Recognition Without Labels

Understanding multi-label images remains a challenging task in computer vision. With the rapid progress of vision-language multimodal learning, vision-language models (VLMs) enable zero-shot recognition without labeled data. However, due to their intrinsic design, these models often prioritize the most iconic object and omit other contextual positives. This intrinsic bias conflicts with the nature of multi-label learning, thereby limiting their applicability. In this work, we propose an unsupervised framework that adapts VLMs from iconic recognition toward inclusive understanding, enabling label-free multi-label image recognition. Our approach consists of two key stages, "cutting" and "sewing". In the cutting stage, we present the multi-sampling response estimator to prevent the model from concentrating only on one single object. In the second sewing stage, the multi-object blend adaptation is introduced to adjust the labels to better conform to the multi-label distribution while preserving the intrinsic characteristics of the original model within only one epoch. Extensive experiments show that our framework significantly outperforms existing unsupervised approaches on four public datasets, even surpassing several representative weakly supervised baselines. These results demonstrate the potential of adapting pre-trained VLMs for more comprehensive visual understanding without manual annotations. Our code is publicly available at https://github.com/iCVTEAM/TailorCLIP.

04.
arXiv (CS.CV) 2026-06-16

Fusion-E2Pulse: A Multimodal Event-RGB Fusion Network for Non-contact Pulse Wave Reconstruction

Non-contact pulse wave reconstruction hinges on the precise recovery of waveform morphology, including the dicrotic notch. Conventional Red-Green-Blue (RGB)-based methods, which extract physiological signals from recorded facial videos, are constrained by the integral imaging mechanism of standard cameras, where the exposure process induces a smoothing effect that attenuates subtle vascular pulsation details. Conversely, neuromorphic event cameras, while offering exceptional sensitivity to intensity fluctuations, are inherently susceptible to noise and artifacts induced by minor motion. To exploit the synergy between frame-based integration and event-based differential sensing, we propose a novel multimodal network named Fusion-E2Pulse. This framework utilizes filtered RGB signals as structural priors to suppress motion artifacts, while leveraging the high sensitivity of event streams to recover fine-grained morphological details. Experimental results demonstrate that Fusion-E2Pulse achieves state-of-the-art performance, effectively balancing noise suppression and morphological fidelity, with a mean absolute error of 0.78 bpm for heart rate estimation, a waveform correlation of 0.89, and a systolic phase duration error of 16.74 ms, validating its efficacy in reconstructing fine-grained pathological features.

05.
arXiv (CS.LG) 2026-06-12

Single vs. Multiple Branches in DeepONet and S-DeepONet: Network Architecture Follows Coupling in Multiphysics Systems

arXiv:2507.03660v2 Announce Type: replace Abstract: Real-time prediction of complex physical systems requires surrogate models that learn from data while representing strong multiphysics coupling. Deep Operator Networks have shown success in single-physics problems, yet their effectiveness in capturing nonlinear interactions in coupled systems (such as thermo-mechanical or electro-thermal coupling) remains underexplored. Here we pose a practical question: should the architecture of a neural operator reflect the strength of physical coupling it aims to model? We compare single-branch and multi-branch designs, in both feedforward and sequential recurrent forms, across three representative systems: a reaction–diffusion problem with heterogeneous sources, a nonlinear thermo-electrical problem with temperature-dependent conductivity and Joule heating, and a viscoplastic thermo-mechanical model of steel solidification. Single-branch networks consistently outperform multi-branch variants in tightly coupled regimes by encouraging shared latent representations, whereas multi-branch designs remain favorable for decoupled or single-physics tasks. Once trained, these surrogates deliver full-field predictions up to $1.8 \times 10^4$ times faster than physics-based solvers.

06.
arXiv (CS.LG) 2026-06-12

Learning with Simulators: No Regret in a Computationally Bounded World

arXiv:2606.13576v1 Announce Type: new Abstract: Understanding the minimal assumptions necessary for generalization is the fundamental question in learning theory. Unfortunately, most results rely heavily on independence (or some proxy thereof) of the data-generating process, while results for strongly dependent data are far more limited. Towards addressing this gap, we introduce the framework of simulatable processes, where the learner has access to a simulator that approximates the distribution generating the data (which may be an arbitrarily complex and dependent process). Surprisingly, given access to such a simulator, we show that we can recover the same learning guarantees as in the classical setting with independent data, namely, error bounds that depend on the VC dimension. Further, we use this framework to study the power of conditional sampling and show strict statistical and computational advantages in this setting. As a highlight of our framework, we exhibit a single algorithm that simultaneously learns any given VC class under all processes samplable in bounded polynomial time, with regret controlled by the time-bounded Kolmogorov complexity of the process. This provides a significant conceptual broadening of the classical PAC model.

07.
arXiv (CS.LG) 2026-06-16

Enhancing Physics-Informed Neural Networks Through Feature Engineering

arXiv:2502.07209v4 Announce Type: replace Abstract: Physics-Informed Neural Networks (PINNs) seek to solve partial differential equations (PDEs) with deep learning. Mainstream approaches that deploy fully-connected multi-layer deep learning architectures require prolonged training to achieve even moderate accuracy, while recent work on feature engineering allows higher accuracy and faster convergence. This paper introduces SAFE-NET, a Single-layered Adaptive Feature Engineering NETwork that achieves orders-of-magnitude lower errors with far fewer parameters than baseline feature engineering methods. SAFE-NET returns to basic ideas in machine learning, using Fourier features, a simplified single hidden layer network architecture, and an effective optimizer that improves the conditioning of the PINN optimization problem. Numerical results show that SAFE-NET converges faster and typically outperforms deeper networks and more complex architectures. It consistently uses fewer parameters – on average, 65% fewer than the competing feature engineering methods – while achieving comparable accuracy in less than 30% of the training epochs. Moreover, each SAFE-NET epoch is 95% faster than those of competing feature engineering approaches. These findings challenge the prevailing belief that modern PINNs effectively learn features in these scientific applications and highlight the efficiency gains possible through feature engineering.

08.
arXiv (CS.LG) 2026-06-15

Learning the Context of Errors: Black-Box Online Adaptation of Time Series Foundation Models

arXiv:2606.14222v1 Announce Type: new Abstract: The rapid evolution of Time Series Foundation Models (TSFMs) has advanced zero-shot forecasting across diverse domains. Inspired by the current form of Large Language Models, future TSFMs may be offered as commercialized, closed-source API services. However, many existing online adaptation methods still rely on white-box access for parameter fine-tuning or gradient backpropagation. This paradigm mismatch raises a question: In black-box online adaptation for TSFMs, what should we learn? We answer this with an insight: the predictive errors of the base model are conditioned on both the input and output of the base model (i.e., the context of errors). To validate this insight, we propose ORCA (Online Residual Contextual Adaptation). We conduct extensive experiments across 5 state-of-the-art TSFMs and 8 datasets to demonstrate the effectiveness of our approach. Furthermore, through ablation studies, we quantitatively analyze the impact of different adapter learning hypotheses on the final adaptation performance in black-box online adaptation. Code available at https://github.com/Fifthky/ORCA.

09.
arXiv (CS.AI) 2026-06-16

SpecAlign: Efficient Specification-Grounded Alignment of Large Language Models via Synthetic Data

arXiv:2606.16276v1 Announce Type: new Abstract: As large language models (LLMs) are increasingly deployed in real-world applications, alignment is no longer governed by a single universal notion of safety or helpfulness, but instead by provider- or application-specific model specifications. These specifications are typically long, structured, and frequently updated, yet existing alignment pipelines lack a systematic mechanism to operationalize them as training signals. In this paper, we propose specification-grounded alignment, a new alignment paradigm that treats provider-authored model specifications as the primary alignment target rather than abstract principles or static benchmarks. To instantiate this paradigm, we introduce SpecAlign, a framework that synthesizes alignment data directly from specification documents. SpecAlign combines structured rule annotation, controllable specification instantiation, and multi-agent adversarial data synthesis to generate fine-grained, boundary-aware preference pairs that capture both compliant behaviors and meaningful specification violations. Experiments across multiple model specifications and backbone models demonstrate that training with SpecAlign consistently improves rule compliance while preserving general capabilities and avoiding over-conservative behavior. These results suggest that grounding alignment in explicit model specifications enables rapid, precise, and scalable adaptation of LLM behavior to evolving policy requirements.

10.
arXiv (CS.AI) 2026-06-19

Assessment of Personality Dimensions Across Situations in Dyadic Role-Play Scenarios

arXiv:2507.19137v3 Announce Type: replace-cross Abstract: Prior research indicates that users prefer assistive technologies whose personalities align with their own. This has sparked interest in automatic personality perception (APP), which aims to predict an individual's perceived personality traits. Previous studies in APP have treated personalities as static traits, independent of context. However, perceived personalities can vary by context and situation as shown in psychological research. In this study, we investigate the relationship between conversational speech and perceived personality for participants engaged in two work situations (a neutral interview and a stressful client interaction). Our key findings are: 1) perceived personalities differ significantly across interactions, 2) loudness, sound level, and spectral flux features are indicative of perceived extraversion, agreeableness, conscientiousness, and openness in neutral interactions, while neuroticism correlates with these features in stressful contexts, 3) handcrafted acoustic features and non-verbal features outperform speaker embeddings in inference of perceived personality, and 4) stressful interactions are more predictive of neuroticism, aligning with existing psychological research.

11.
arXiv (CS.CV) 2026-06-16

High-Fidelity 4D Hand-Object Capture via Multi-View Spatiotemporal Tracking and Physics-Aware Gaussians

The growing demand for high-fidelity 4D hand-object interaction (HOI) data in embodied AI and spatial computing is currently bottlenecked by the reliance on pre-scanned object templates and physical markers. While recent methods have demonstrated promising results in reconstructing 4D hand-object interaction from videos, they are highly sensitive to initial estimates of hand and object poses. Yet, estimating these poses from images is challenging, in particular under severe occlusion, which is inherent in hand-object interaction scenarios. We propose a novel system for the robust and accurate reconstruction of hands and objects from synchronized and calibrated multi-view videos without requiring any templates or markers. Our system consists of two main components with key innovations: (1) a multi-view feed-forward transformer model that aggregates cross-view geometry and temporal cues to provide a reliable, metric-consistent initialization for both poses and dense object geometry, and (2) a hand-object physics-aware Gaussian-based optimization framework to refine the initial estimates, integrating tetrahedral constraints, collision refinement, and appearance decomposition to produce physically plausible and visually accurate reconstruction. Validated on public benchmarks and an extensive internal dataset, our pipeline achieves highly robust, artifact-free reconstruction, providing an efficient foundation for automated 4D asset generation. Our project page is available at https://zyshen021.github.io/HOSTPG/.

12.
arXiv (CS.LG) 2026-06-16

A Penalty Approach for Differentiation Through Black-Box Quadratic Programming Solvers

arXiv:2602.14154v3 Announce Type: replace Abstract: Differentiating through the solution of a quadratic program (QP) is a central problem in differentiable optimization. Most existing approaches differentiate through the Karush–Kuhn–Tucker (KKT) system, but their computational cost and numerical robustness can degrade at scale. To address these limitations, we propose dXPP, a penalty-based differentiation framework that decouples QP solving from differentiation. In the solving step (forward pass), dXPP is solver-agnostic and can leverage any black-box QP solver. In the differentiation step (backward pass), we map the solution to a smooth approximate penalty problem and implicitly differentiate through it, requiring only the solution of a much smaller linear system in the primal variables. This approach bypasses the difficulties inherent in explicit KKT differentiation and significantly improves computational efficiency and robustness. We evaluate dXPP on various tasks, including randomly generated QPs, large-scale sparse projection problems, and a real-world multi-period portfolio optimization task. Empirical results demonstrate that dXPP is competitive with KKT-based differentiation methods and achieves substantial speedups on large-scale problems. Our implementation is open source and available at https://github.com/mmmmmmlinghu/dXPP.

13.
arXiv (CS.CL) 2026-06-16

Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier

For the development of large language models (LLMs), recent approaches to generating pseudo intermediate reasoning have shown remarkable progress. However, they typically rely on large numbers of correctly annotated answers to assess reasoning quality. This paper presents a semi-supervised framework that scales reasoning learning from minimal supervision, turning reasoning verification itself into a data creation mechanism. We train a lightweight reasoning-correctness classifier on only a few labeled samples, which judges whether intermediate reasoning traces generated by an LLM are valid. Furthermore, an entropy-based confidence threshold filters out unreliable samples, and the remaining high-confidence reasoning traces are used to fine-tune the model. Experiments on Verifiable Math Problems (Orca-Math subset) and Question Answering on Image Scene Graphs (GQA) with Visual Programming show that our method achieves accuracy comparable to using 10-15x more labeled data. Ablation analyses confirm that both the classifier and entropy filtering are essential for scalable and noise-resistant pseudo-labeling. By replacing expensive answer-level supervision with lightweight reasoning verification, our method provides a practical path toward constructing large-scale reasoning resources and paves the way for future autonomous reasoning systems that learn from minimal human input.
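The entropy-based filtering step described in this abstract can be illustrated with a minimal sketch. It assumes a binary validity classifier that outputs a probability per trace; the function names, the `max_entropy` threshold of 0.3, and the rule of keeping only traces judged valid are hypothetical choices for illustration, not details taken from the paper.

```python
import math

def binary_entropy(p):
    """Shannon entropy (in nats) of a Bernoulli(p) prediction."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def filter_confident(scored_traces, max_entropy=0.3):
    """Keep traces judged valid (p > 0.5) whose prediction entropy is low.

    scored_traces: list of (trace, p_valid) pairs, where p_valid is the
    classifier's probability that the trace is valid (assumed interface).
    """
    return [
        trace
        for trace, p_valid in scored_traces
        if p_valid > 0.5 and binary_entropy(p_valid) <= max_entropy
    ]

# Hypothetical classifier scores for three reasoning traces.
scored = [("trace A", 0.97), ("trace B", 0.55), ("trace C", 0.02)]
kept = filter_confident(scored)
print(kept)
```

In this toy run, trace B is dropped because its score near 0.5 has high entropy, and trace C is dropped because it is confidently judged invalid.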

14.
arXiv (CS.CL) 2026-06-11

Automated Scoring of Arabic Text Using Large Language Models: A Literature Review

In modern educational systems, Automatic Text Scoring (ATS) plays a central role by enabling scalable and consistent evaluation of learner responses without human intervention. Recently, the increased accessibility of LLMs and Arabic-specific datasets has sparked renewed interest in this area. In this work, we investigate LLM-based approaches for the automated evaluation of Arabic texts, focusing on both short answer grading (ASAG) and essay scoring (AES). We further introduce a structured taxonomy comprising five dimensions: application domain, feedback generation capability, LLM architecture deployed, alignment with competency referential frameworks, and prompt engineering strategy. By applying this taxonomy, we conduct a comparative analysis of existing studies, examining their methodological approaches, datasets, evaluation metrics, and reported performance. The findings highlight the need for sustained and pedagogically grounded research efforts in Arabic ATS, given its significance for improving educational quality across Arabic-speaking communities.

15.
arXiv (quant-ph) 2026-06-11

Compressed minimum-purity time evolution for late-time quantum dynamics

arXiv:2606.11392v1 Announce Type: cross Abstract: Unitary time evolution of initially simple quantum many-body states rapidly generates entanglement and complex correlations, which limits direct numerical simulations. The late-time dynamics of physical observables, however, typically exhibits an effective simplicity in the form of hydrodynamics or kinetic theory. This leads to the question whether microscopic equations of motion can remain accurate and tractable up to long time scales by discarding irrelevant information in a controlled manner. Here, we introduce compressed minimum-purity time evolution (CoMPuTE) as an approach to keep track of a consistent set of reduced local density matrices, closing the hierarchical equations of motion using a minimum-purity principle. In benchmark applications we demonstrate (i) accurate description of energy diffusion in the one-dimensional mixed-field Ising model, (ii) the applicability to genuinely out-of-equilibrium Floquet dynamics starting from a pure state, and (iii) the limitations of the local reduced density matrix approximation when describing transport in the XXZ chain at $\Delta=1$ that is governed by increasingly non-local integrals of motion. The CoMPuTE method enhances computational efficiency in comparison to the closely related local-information time evolution algorithm, opening a possible route towards an extension to systems in higher spatial dimensions.

16.
arXiv (CS.CV) 2026-06-19

OTCHA: Optimal Transport-driven Confidence-aware Latent Hub Alignment for Multi-View Medical Image Classification

Multi-view imaging, such as mammography and chest radiography, is a standard component of clinical practice. However, medical images are often unregistered and contain view-specific artifacts or irrelevant background cues that can obscure diagnostically relevant findings. Many existing methods directly fuse per-view representations, allowing such irrelevant content to contaminate the fused embedding and reducing robustness under varying view configurations. We propose OTCHA, a confidence-aware latent hub token alignment module based on optimal transport (OT) that refines patch tokens before fusion for multi-view classification. OTCHA introduces a set of learnable latent hub tokens shared across views. For each view, we compute an OT plan between patch tokens and hub tokens that jointly considers feature similarity and geometry, and augment the OT formulation with token-conditional dustbins to enable partial matching and discard irrelevant tokens. The resulting transport plan provides token-wise matching confidence, which gates hub-mediated message passing and weights a novel optimal-transport-based representation alignment loss to stabilize refinement. Experiments on three multi-view medical image datasets demonstrate consistent improvements over competing baselines across diverse anatomies and view configurations. Our code is available at https://github.com/labhai/OTCHA.

17.
arXiv (CS.CV) 2026-06-12

Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning

Multimodal agents, which integrate a controller (e.g., a vision language model) with external tools, have demonstrated remarkable capabilities in tackling complex multimodal tasks. Existing approaches for training these agents, both supervised fine-tuning and reinforcement learning, depend on extensive human-annotated task-answer pairs and tool trajectories. However, for complex multimodal tasks, such annotations are prohibitively expensive or impractical to obtain. In this paper, we propose an iterative tool usage exploration method for multimodal agents without any pre-collected data, namely SPORT, via step-wise preference optimization to refine the trajectories of tool usage. Our method enables multimodal agents to autonomously discover effective tool usage strategies through self-exploration and optimization, eliminating the bottleneck of human annotation. SPORT has four iterative components: task synthesis, step sampling, step verification, and preference tuning. We first synthesize multimodal tasks using language models. Then, we introduce a novel trajectory exploration scheme, where step sampling and step verification are executed alternately to solve synthesized tasks. In step sampling, the agent tries different tools and obtains corresponding results. In step verification, we employ a verifier to provide AI feedback to construct step-wise preference data. The data is subsequently used to update the controller for tool usage through preference tuning, producing a SPORT agent. By interacting with real environments, the SPORT agent gradually evolves into a more refined and capable system. Evaluation on the GTA and GAIA benchmarks shows that the SPORT agent achieves 6.41% and 3.64% improvements, underscoring the generalization and effectiveness introduced by our method. The project page is https://SPORT-Agents.github.io.

18.
arXiv (math.PR) 2026-06-17

Large deviation principle for friendship-biases in Galton–Watson trees

arXiv:2606.17381v1 Announce Type: new Abstract: In this paper we consider the friendship-bias of the vertices in an infinite rooted Galton–Watson tree. The friendship-bias of a vertex is the difference between the average degree of the neighbours of the vertex and the degree of the vertex itself. A vertex is said to be of type $\chi \in S$, with $S = \{-,0,+\}$, when its friendship-bias is, respectively, strictly negative, zero or strictly positive. We consider the fractions $f_l^\chi$ of vertices of type $\chi \in S$ along a random downward path up to branching depth $l \in \mathbb{N}$ and derive a large deviation principle (LDP) for the triple $(f_l^\chi)_{\chi \in S}$ as $l\to\infty$. The branching depth of a vertex counts the number of branchings that occur along the path that connects the vertex to the root of the tree. The rate in the LDP is $l$, while the rate function in the LDP is identified in terms of a variational formula minimising a relative entropy under a linear constraint. We focus on the case of binary branching, for which the rate function is already quite involved. We identify the qualitative properties of the rate function and show how it can be computed numerically. We briefly indicate how to proceed for more general branching and for vertex types along a tree consisting of a finite number of random downward paths. Our paper is the first to consider large deviations of vertex types.
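The definition of friendship-bias and vertex type given in this abstract can be made concrete with a small sketch. The finite toy tree below is a hypothetical example (the paper works with infinite Galton–Watson trees), and the function names are illustrative only.

```python
from statistics import mean

def friendship_bias(adj, v):
    """Average degree of v's neighbours minus the degree of v itself."""
    neighbours = adj[v]
    return mean(len(adj[u]) for u in neighbours) - len(neighbours)

def vertex_type(adj, v):
    """Return '-', '0' or '+' according to the sign of the friendship-bias."""
    bias = friendship_bias(adj, v)
    if bias > 0:
        return "+"
    if bias < 0:
        return "-"
    return "0"

# Hypothetical toy tree: root 0 has children 1 and 2; vertex 1 has children 3 and 4.
edges = [(0, 1), (0, 2), (1, 3), (1, 4)]
adj = {}
for a, b in edges:
    adj.setdefault(a, []).append(b)
    adj.setdefault(b, []).append(a)

types = {v: vertex_type(adj, v) for v in sorted(adj)}
print(types)
```

Here the root has degree 2 and its neighbours have degrees 3 and 1, so its bias is zero; vertex 1 has higher degree than its neighbours on average and is of type "-", while the leaves are of type "+".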

19.
arXiv (CS.CV) 2026-06-19

Evaluation of Image Matching for Art Skills Assessment

While some individuals possess a natural talent for drawing, mastering this skill requires dedicated training and practice. Determining one's skill in the art of drawing requires a proper, comprehensive assessment. In this paper, we propose a method to measure drawing skill by matching the hand-drawn image with the original template. Existing techniques often involve complex processes. However, advancements in computer vision allow us to train computers to perform these comparisons at a human-like level, thereby relieving the tedious and overwhelming traditional process. In computer vision, determining image similarity involves measuring how closely an image matches a reference image. We have implemented and analyzed the SIFT feature and Siamese network to measure image similarity. Our results indicate that it is feasible to assess art skill levels. Through feature analysis, we found that SIFT-based key point matching provides a more effective means of detecting drawing skills.

20.
arXiv (quant-ph) 2026-06-16

Long-range nonstabilizerness of topologically encoded states from mutual information

arXiv:2605.22424v2 Announce Type: replace Abstract: We study long-range nonstabilizerness (LRN), namely the obstruction to removing nonstabilizerness with shallow-depth local quantum circuits. In one-dimensional settings, the mutual information between disconnected spatial regions has proven to be a powerful tool to diagnose LRN. In this work, we focus on encoded states of two-dimensional topologically-ordered systems, and explore the ability of the mutual information to serve as a diagnostic of LRN. Focusing on the concrete setting of lattice models defined on a torus, we show that information about LRN can be gained from the analysis of the mutual information between non-overlapping regions containing non-contractible loops, and of the change of such mutual information under modular real-space transformations. We exemplify this idea in the toric code and the non-abelian string-net model with doubled Fibonacci topological order. In the former case, we show that the mutual information provides a full classification, certifying LRN for all encoded non-stabilizer states. In the latter case, instead, our approach does not lead to a full classification, as it detects LRN for all states except for a finite subset with special transformation properties under the modular group. Finally, we discuss how our results on LRN constrain the logical gates that can be implemented fault-tolerantly on the torus.

21.
arXiv (quant-ph) 2026-06-11

Robust Mixed-State Cluster States and Spurious Topological Entanglement Negativity

arXiv:2504.16165v2 Announce Type: replace Abstract: We investigate 1D and 2D cluster states under local decoherence to assess the robustness of their mixed-state subsystem symmetry-protected topological (SSPT) order. By exactly computing fidelity correlators via dimensional reduction of effective statistical mechanics models, we pinpoint the critical error rate for strong-to-weak spontaneous breaking of strong subsystem symmetry. Without resorting to the replica trick, we demonstrate that mixed-state SSPT order remains remarkably robust up to the maximal decoherence rate when noise respects strong subsystem symmetry. Furthermore, we propose that the mixed-state SSPT order can be detected by a constant correction to the area-law scaling of entanglement negativity, termed spurious topological entanglement negativity. This also highlights that topological entanglement negativity, a widely used diagnostic for mixed-state topological order, is generally not invariant under finite-depth quantum channels.

22.
arXiv (CS.AI) 2026-06-16

Mask-Proof: An LLM-based Automated Data Curation Pipeline on Mathematical Proofs

arXiv:2606.15258v1 Announce Type: new Abstract: Large language models (LLMs) are increasingly capable of mathematical problem solving and can even assist with research-level proofs, yet we still lack a scalable and reproducible way to measure step-level reasoning in long proofs across diverse sources. This evaluation gap limits trustworthy AI assistance in proof-certified scientific progress. Existing evaluations often emphasize final answers or rely on costly expert grading, while end-to-end proof generation remains open-ended and hard to verify automatically. We introduce Mask-Proof, a pipeline that turns real proofs into automatically checkable masked-step tasks. It masks key formula steps, provides the necessary surrounding context, and evaluates model reconstructions with an LLM-based equivalence judge using repeated votes for stability. The resulting Mask-ProofBench contains 292 curated problems across diverse research areas. Experiments with 17 models show that reasoning-enhanced models outperform standard models by 12% to 27%. Our evaluator achieves 96.8% agreement with expert annotators, enabling faithful, reproducible, and comparable measurement of step-level mathematical reasoning. Benchmark, annotations, and code are available at https://github.com/weating/Mask-Proof.

23.
arXiv (CS.CV) 2026-06-16

Enhancing Precision Agriculture with a Hybrid Deep Learning Framework for Multi-Class Plant Disease Classification and Interpretability

This study proposes a deep learning framework for multi-class classification of plant diseases from high-resolution leaf imagery, with a particular focus on the behavior of ResNet-50 and a hybrid ResNet + Vision Transformer (ViT) design. A purpose-built image dataset of 15,200 training images and 3,800 validation images, spanning 38 classes across multiple crops including tomato, apple, and grape, was preprocessed with resizing, normalization, and data augmentation to enhance model robustness. Multiple architectures, including ResNet-50, MobileNetV2, and EfficientNet-B0, were trained and compared with the hybrid ResNet + ViT model. All models were fine-tuned using the AdamW optimizer and cross-entropy loss, with early stopping applied to limit overfitting. Furthermore, interpretability techniques such as Grad-CAM and saliency maps were used to highlight disease-relevant regions, while segmentation-based analysis was performed to identify the affected parts of a leaf. Among the architectures considered, ResNet-50 achieved the highest accuracy of 98.74%, whereas the hybrid ResNet + ViT model achieved a competitive 98.58%, suggesting that the hybrid architecture is effective at capturing both local and global information. The experimental results showcase the promise of transformer-based models for highly accurate, interpretable, and computationally efficient multi-class plant disease classification, supporting crop management and precision farming.
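
The early-stopping rule mentioned in the abstract can be sketched as follows. The abstract does not give the patience, the monitored metric, or any threshold, so the `EarlyStopping` class, its `patience` and `min_delta` parameters, and the use of validation loss are all assumptions made for illustration.

```python
from typing import Optional, Sequence


class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs.

    Assumption: lower validation loss is better, and an improvement must
    exceed `min_delta` to reset the counter.
    """

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def stopping_epoch(val_losses: Sequence[float], patience: int) -> Optional[int]:
    """Return the 0-based epoch at which training would stop, or None if it never does."""
    stopper = EarlyStopping(patience=patience)
    for epoch, loss in enumerate(val_losses):
        if stopper.step(loss):
            return epoch
    return None


if __name__ == "__main__":
    # Hypothetical loss curve, not data from the paper.
    print(stopping_epoch([1.0, 0.9, 0.95, 0.96, 0.97], patience=2))
```

In a real fine-tuning loop, `step` would be called once per epoch after validation, typically alongside saving the best checkpoint.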

24.
arXiv (CS.LG) 2026-06-19

Does Text Actually Help? Uncovering and Resolving Text Collapse in Multimodal Time Series Forecasting

arXiv:2606.19413v1 Announce Type: new Abstract: Multimodal time series forecasting, which pairs numerical sequences with domain-relevant textual reports, promises to inject world knowledge into forecasting pipelines. However, we uncover a critical failure mode in existing frameworks that we term text collapse: the text branch converges to a content-independent transformation, contributing negligible discriminative signal regardless of the input description. We argue that text collapse is a consequence of a fundamental asymmetry in time series forecasting: the numerical input is strongly autocorrelated with the output, making the numerical backbone inherently dominant, while the text branch, despite carrying complementary and often critical information, is systematically underexploited. To address this, we propose REST-TS (Residual-Exclusive Supervision for Text in Time Series), which turns the asymmetry into a design principle: the numerical backbone produces its own independent numerical forecast, and the text branch is exclusively supervised to predict the structured components of the residual, the prediction gap that numbers cannot explain. Because no numerical pathway can reduce these losses, the text branch must extract genuine content from the input description. Evaluated across diverse real-world domains and backbone architectures, REST-TS achieves state-of-the-art performance and consistently demonstrates greater text-branch utilization than existing frameworks, providing strong empirical evidence that supervising the text branch on the residual compels it to extract genuine content from the input.
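
The residual-supervision idea described in the abstract can be sketched in a few lines. This is not the REST-TS implementation: the abstract does not define the "structured components" of the residual, so the sketch assumes the raw residual as the text-branch target and mean squared error as the loss, and the function names (`residual_targets`, `text_branch_loss`, `combine`) are hypothetical.

```python
from typing import List, Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def residual_targets(
    numeric_forecast: Sequence[float], target: Sequence[float]
) -> List[float]:
    """The gap the numerical backbone leaves unexplained: target minus forecast."""
    return [t - n for n, t in zip(numeric_forecast, target)]


def text_branch_loss(
    text_pred: Sequence[float],
    numeric_forecast: Sequence[float],
    target: Sequence[float],
) -> float:
    """Loss for the text branch only.

    The numeric forecast is treated as a fixed constant here (assumption),
    so only a change in the text prediction can lower this loss.
    """
    return mse(text_pred, residual_targets(numeric_forecast, target))


def combine(numeric_forecast: Sequence[float], text_pred: Sequence[float]) -> List[float]:
    """Final forecast: numerical forecast plus the text-predicted residual."""
    return [n + r for n, r in zip(numeric_forecast, text_pred)]


if __name__ == "__main__":
    # Hypothetical numbers, not results from the paper.
    numeric = [10.0, 10.0]
    truth = [12.0, 9.0]
    print(residual_targets(numeric, truth))
    print(text_branch_loss([0.0, 0.0], numeric, truth))
    print(combine(numeric, [2.0, -1.0]))
```

In an autograd framework, the same effect would usually be obtained by detaching the numerical forecast before computing the residual, so that gradients from this loss reach only the text branch.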