Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-19

HypOProto: Hyperbolic Ordinal Prototypes for Left Ventricular Filling Pressure Classification

Echocardiography (echo) is a widely used imaging modality for assessing cardiac function, with Left Ventricular Filling Pressure (LVFP) serving as a critical physiological marker for conditions such as heart failure. Standard LVFP classification into normal vs elevated categories relies on the Doppler-derived $E/e'$ ratio, which is operator-dependent and often unavailable in resource-limited settings, motivating methods that infer LVFP directly from B-mode echo. Existing deep learning approaches achieve high performance but remain largely black-box, limiting clinical interpretability. We propose HypOProto, a hyperbolic, ordinal prototype-based framework for interpretable LVFP classification using a frozen, explainable foundation model backbone. HypOProto arranges prototypes along the physiological $E/e'$ scale, placing borderline cases near the hyperboloid root where small angular differences separate similar cases, while normal and elevated cases occupy outward positions reflecting increasing diagnostic certainty. This hyperbolic geometry encodes clinically meaningful ordinal relationships and improves interpretability. We also introduce a novel Hyperbolic Prototype Angular Separation (HyperPAS) loss, enforcing inter-class prototype separation in hyperbolic space. HypOProto achieves SOTA performance while maintaining transparency, and highlights clinically relevant regions in visualizations. This work represents the first prototype-based framework for LVFP classification in echo. Our code can be found at https://github.com/DeepRCL/HypOProto.

02.
arXiv (CS.CV) 2026-06-18

PEFT-MedSAM: Efficient Fine-Tuning of Medical Foundation Models for Explainable Skin Lesion Segmentation

Automated segmentation of skin lesions using deep learning models for dermoscopic images can be very helpful in finding melanomas earlier than they would normally be detected. However, most deep learning methods available do not perform well. The aim of this paper is to present a parameter-efficient fine-tuning method called PEFT-MedSAM for adapting the Medical Segment Anything Model (MedSAM) to automatically segment dermoscopic skin lesions. The PEFT-MedSAM method uses only the lightweight mask decoder for training the model while keeping the pre-trained image encoder and prompt encoder frozen. The experiments performed on the ISIC 2018 benchmark dataset shows that PEFT-MedSAM obtains a dice coefficient of .9411 and an intersection over union value of .8918 when compared to both a fully trained U-Net baseline (.8715 dice coefficient) and zero-shot MedSAM inference (.8997 dice coefficient). The external validation of the model using PH2 dataset shows .9467 dice coefficient with +/- .0310 standard deviation. Supportive evidence for these claims include a p-value less than .0001 for Wilcoxon signed rank tests comparing the two datasets and bootstrap-estimated 95% confidence intervals of [.9364,.9447] that represent the estimated range of possible values for the average dice coefficient obtained by repeating the test. To increase clinical trustworthiness, we used Grad-CAM explainability along with a pointing game based evaluation methodology to evaluate the CNN baseline model on the validation set. The results showed that we had an accuracy rate of 98.27% on the validation set of 519 images and confirmed that the model classified regions containing skin lesions.

03.
arXiv (CS.CL) 2026-06-17

Evaluating Large Language Models Abilities for Addressee, Turn-change, and Next Speaker Prediction in Meetings

We investigate turn-taking in multimodal multi-party conversations using large language models (LLMs). We construct an evaluation framework for three tasks: addressee detection, turn-change prediction, and next speaker prediction. We compare supervised models trained for these tasks, text-based LLMs, multimodal LLMs (MM-LLMs), and human subjects. Experiments on the AMI corpus showed that LLMs outperformed supervised models and humans in next speaker prediction, despite not being trained on the target domain and without access to audio or visual information. An MM-LLM performed better than text-based LLMs on addressee detection and turn-change prediction but remained below human performance, indicating difficulty leveraging raw audio-visual signals. Ablation analyses revealed that conversational context was critical, particularly for next speaker prediction. We observed that human and LLM prediction patterns were similar, and intervals with frequent turn changes were difficult for both.

04.
medRxiv (Medicine) 2026-06-12

Reduced nighttime smartphone use among cohabiting partners: a longitudinal study under the lens of social control of health behaviors theory

Objective: We examined the link between cohabitation with a partner and nighttime smartphone use through the social control of health behavior theory. Background: Nighttime smartphone use is a behavioral risk factor for sleep problems. While previous research has predominantly focused on individual-level risks of sleep disturbances, the role of social context remains underexplored. Theoretical frameworks, specifically the Social Control of Health Behavior, suggest that social relationships regulate health-related behaviors; however, it is unclear how far this regulation extends to modern digital behaviors among couples. Method: We analyzed survey data from three waves of the SmartSleep Study (2018, 2020, and 2023; total N = 25,028), including a longitudinal follow-up subset (N = 1,003). We tested multivariate associations between living with a partner, changes in cohabitation status and frequent nighttime smartphone use by fitting generalized linear mixed-effects models. Additionally, we mapped the complex interplay between indicators of social integration, social support, smartphone use, and sleep quality using hierarchical clustering of non-linear correlations. Results: Cohabiting participants had lower odds of frequent nighttime smartphone use compared to those living alone (OR = 0.66; 95% CI: 0.61, 0.72). This lower risk was driven primarily by cohabitation with a partner (OR = 0.49; 95% CI: 0.36, 0.66). Longitudinal analysis supported these findings, showing that sustained cohabitation was associated with less frequent nighttime use (OR = 0.56; 95% CI: 0.38, 0.82). Clustering analysis revealed that indicators of social integration and support clustered with favorable sleep quality. Conclusion: Our findings suggest that the health-protective effects of cohabitation with a partner extend to digital behaviors. Consistent with social control of health behavior theory, the presence of a partner appears to reduce frequent nighttime smartphone use, highlighting the critical importance of considering social context when addressing digital health hygiene and promoting sleep.

05.
arXiv (CS.AI) 2026-06-16

FOUNDv2: Learning Unified User Quantized Tokenizers for User Representation

arXiv:2508.00956v3 Announce Type: replace-cross Abstract: User representation learning serves as a fundamental pillar for personalized services on large-scale web platforms. Despite its importance, conventional continuous embedding methods face significant challenges, including the lack of a unified paradigm for multi-source data integration, prohibitive storage overhead due to low information density, and the lack of multi-scale modeling granularity. To overcome these limitations, we introduce FOUNDv2, a comprehensive user representation scheme centered on the Unified User Quantized Tokenizer U2QT) framework. FOUNDv2 transforms heterogeneous user data into a standardized discrete token space through a robust two-stage architecture. Specifically, the framework first extracts compact feature representations and subsequently employs a multi-view RQ-VAE to discretize them into storage-efficient tokens using shared and source-specific codebooks. To empower these representations with predictive intelligence, we further design multi-scale alignment objectives to capture both fine-grained behavioral dependencies and macro-temporal periodicity. Extensive experiments on various benchmarks demonstrate that FOUNDv2 consistently outperforms task-specific baselines while achieving substantial reductions in storage and computational costs. Finally, the large-scale deployment of FOUNDv2 on Alipay validates its practical scalability and efficiency across diverse industrial scenarios. The main code is available at: https://github.com/chuanhe1999/FOUNDv2.

06.
arXiv (CS.CL) 2026-06-19

MiqraBERT: Regression-Based Sentence-BERT Finetuning for Biblical Hebrew Parallel Detection

Textual reuse pervades the Hebrew Bible, yet the computational methods used to detect it still rest largely on lexical overlap, and they falter once a parallel involves paraphrase, lexical substitution, or syntactic reworking. This paper introduces MiqraBERT, a Sentence-BERT model finetuned from AlephBERT (a Modern Hebrew encoder) for verse-level semantic similarity in Biblical Hebrew. The training set comprises 1,650 labeled verse and half-verse pairs: 825 true parallels drawn from the Chronicles synoptic material and from foundational studies of poetic parallelism, balanced against 825 randomly sampled negatives. Through cosine-similarity regression, the model learns an embedding space in which parallel verses cluster together and unrelated verses move apart. We evaluate separation with distribution-based metrics, Wasserstein distance and the overlap coefficient, across ten random seeds. MiqraBERT improves distributional separation 2.7-fold over the pre-trained baseline and reduces the ambiguous overlap region from roughly 24% to about 6%. Narrative synoptic parallels reach a recall@10 of 87.1%; poetic parallels remain difficult, below 9%. This genre-dependent asymmetry confines the model's reliable scope to narrative textual reuse. MiqraBERT is publicly available at https://huggingface.co/davidmsmiley/MiqraBERT

07.
arXiv (CS.CV) 2026-06-16

Keep It in Mind: User Centric Continual Spatial Intelligence Reasoning in Egocentric Video Streams

We introduce UCS-Bench, a dataset spanning 170+ hours of egocentric visual observations with 8.1K+ timestamped questions for diagnosing User-Centric Continual Spatial intelligence in egocentric video streams. UCS-Bench targets a new problem that emphasizes dynamic spatial reasoning, long-term memory, and their alignment with users' real-time locations. We propose DirectMe, a framework that incrementally constructs and maintains a structured spatial memory from streaming egocentric observations. DirectMe enables robust tracking and recall of object locations, all relative to the user's movement over time. By tightly coupling visual perception with memory updates and spatial reasoning, our approach supports long-horizon queries that require recalling interactions, resolving viewpoint-induced ambiguities, and adapting to dynamic scenes. Our experiments show that DirectMe significantly improves the spatial reasoning of leading multimodal LLMs; it also surpasses many spatially aware and long-form streaming video models. We hope our benchmark and solution will advance spatial intelligence research for egocentric AI assistants. Data and code are available at https://github.com/cocowy1/UCS-Bench.

08.
medRxiv (Medicine) 2026-06-16

Daily Healthy Eating Index (HEI-2020) scoring reveals diet quality patterns masked by aggregation

The Healthy Eating Index (HEI-2020) is conventionally computed by aggregating intake across days before scoring. Digital food logging enables an alternative: scoring each day and averaging daily scores. These methods are not equivalent. The HEI's density-based structure and component caps cause aggregation to inflate adequacy scores when intake is irregular. Using Food & You data, we show daily HEI correlates more strongly with microbiome diversity, and recommend co-reporting both metrics.

09.
arXiv (CS.CV) 2026-06-16

Automated 3D Kinematic Monitoring for Circadian Activity and Anomaly Detection in Juvenile Fish

Precision aquaculture faces a "phenotyping bottleneck" in tracking high-resolution behavioral traits, as conventional methods cannot quantify instantaneous three-dimensional (3D) physical exertion. To address this, we present a high-throughput 3D behavioral phenotyping framework integrating deep learning object detection with binocular stereo vision for real-time monitoring of juvenile tilapia in high-density environments. The system automates non-contact body length estimation and reconstructs 3D swimming trajectories from absolute spatial coordinates. By eliminating 2D perspective distortions, this approach precisely quantifies 3D velocity and acceleration, marking the first estimation of true physical swimming speeds in free-roaming juveniles. Results show the framework successfully establishes circadian locomotor baselines, serving as an early warning system for physiological stress and providing an objective metric for fish vitality.

10.
arXiv (CS.LG) 2026-06-11

REACH: Interpretability-Driven Feature Identification and Architecture Compression for Multi-Channel Vehicular Channel Estimation

arXiv:2606.11857v1 Announce Type: cross Abstract: Multi-channel mixed-SNR training improves out-of-distribution (OOD) generalisation of deep learning channel estimators for IEEE 802.11p vehicular communications, yet the internal mechanism responsible for this remains unexplained. This work presents REACH (Relevance-based Explanation and Architectural Compression for cHannel estimators), a gradient-based interpretability framework that operates at two levels. Input-level attribution identifies a subset of time-frequency features consistently relevant across all evaluated channel conditions, enabling input dimensionality reduction with minimal performance loss. Filter-level attribution reveals a near-universal internal representation, providing a representational account of the observed OOD generalisation. Guided by the resulting filter taxonomy, relevance-guided architecture compression substantially reduces both the number of parameters and the number of floating-point operations (FLOPs) with sub-1 dB normalised mean square error (NMSE) degradation, and OOD generalisation degrades more slowly than within-distribution accuracy under increasing compression.

11.
arXiv (CS.LG) 2026-06-11

Geometric bias in eigenspace perturbation under random heterogeneous noise

arXiv:2606.11263v1 Announce Type: cross Abstract: Spectral methods rely fundamentally on the stability of principal eigenspaces under random perturbations. Classically, this stability is quantified by the Davis-Kahan and Wedin theorems, which bound the eigenspace error using the operator norm of the noise and the relevant spectral gaps. While these worst-case bounds are sharp for arbitrary deterministic perturbations, they can be wasteful in the low-rank signal-plus-random-noise setting, as they fail to capture the fine-grained interaction between the signal geometry and the noise distribution. In this paper, we study the spectral perturbation of signal-plus-noise matrices corrupted by sparse, random noise with an arbitrary, inhomogeneous variance profile. We demonstrate that under heterogeneous noise variances, the empirical eigenvectors suffer a systematic, deterministic geometric bias that is entirely invisible to classical perturbation bounds. By leveraging the Quadratic Vector Equation (QVE) and establishing fine-grained isotropic local laws, we derive near-optimal, non-asymptotic perturbation bounds for the leading eigenspaces in the operator and $2\to\infty$ norms. The bounds separate the usual signal-to-noise contribution, stochastic fluctuations, and structured geometric bias terms determined by the alignment between the signal eigenspaces and the row-wise variance profile.

12.
arXiv (CS.CV) 2026-06-16

Projection and Quantisation: A Unifying View of Learning to Hash, from Random Projections to the RAG Era

作者:

Approximate nearest-neighbour search underpins large-scale retrieval and retrieval-augmented generation, yet its methods are studied in communities that seldom read one another. We argue that they form one field with three design choices. We develop the projection-quantisation-organisation lens: every method places its projections, places its quantisation thresholds, and organises the resulting codes for search. We test the lens with a reproducible measurement, released as the open BitBudget benchmark, and report three findings. First, the quantisation axis delivers the largest memory savings: a one-bit code with full-precision re-ranking matches uncompressed quality for six of seven embedders, the scanned code one thirty-second of the float's size. Second, the orderings the lens anticipates, including a learned-embedding regime where binary codes overtake an inverted-file product quantiser at a matched byte budget, recur as the embedding is enlarged. Third, given class labels, an eight-byte supervised code more than doubles the retrieval quality of the two-kilobyte task-agnostic float it replaces. We also recast the semantic identifiers of generative retrieval as quantisation codes. The main contribution is a single, tested account of compact-code search, from random projections to the retrieval-augmented era.

13.
arXiv (CS.AI) 2026-06-12

Muse Spark Safety & Preparedness Report

arXiv:2606.12429v1 Announce Type: cross Abstract: Muse Spark is the latest large language model developed by Meta. In this report, we first present evaluations for catastrophic risk domains under Meta's Advanced AI Scaling Framework, along with the evidence that informed our launch decision. We then discuss additional considerations, such as Muse Spark's broader content safety and behavioral profile, that are relevant to overall safety but fall outside the catastrophic risk domains governed by the Framework. Our preparedness results covering Chemical and Biological, Cybersecurity, and Loss of Control risks assess Muse Spark's deployment within Meta AI as presenting acceptable levels of residual risks under our Advanced AI Scaling Framework. We conducted a broad set of evaluations targeting dual-use and high-risk capabilities across these catastrophic risk domains. Those evaluations identified elevated risks prior to mitigations, with Chemical and Biological capabilities assessed as likely reaching the "high risk" category under the Advanced AI Scaling Framework before safeguards were applied. We have implemented a multi-layered set of mitigations that address the identified risks, and Muse Spark demonstrates state-of-the-art refusal across a range of benchmarks related to hazardous workflows in chemistry and biology. We therefore release Muse Spark as the underlying model of Meta AI.

14.
arXiv (CS.LG) 2026-06-16

From Physics to Representation: Audio Learning with Synthetic Pre-training via Procedural Generation

arXiv:2606.14791v1 Announce Type: cross Abstract: Self-supervised learning advances audio representation for multimedia analysis. However, prevailing data-centric approaches rely on massive real-world corpora, increasing training costs, curation burdens, and privacy barriers. To address this, we present AudioPG, a procedural synthesis framework eliminating real audio recordings during pre-training. AudioPG trains a Transformer-based masked autoencoder on waveforms generated on-the-fly from basic acoustic primitives and composition rules. The encoder transfers effectively to real audio benchmarks, achieving 90.60% accuracy on ESC-50, 0.546 mAP on FSD50K, 88.17% on UrbanSound8K, and 97.03% on Speech Commands V2. Notably, pre-training completes in under 20 minutes on a single GPU. Latent space analysis reveals physical factors, including fundamental frequency and relative intensity, emerge in orthogonal subspaces, making representations linearly decodable. These results establish procedural synthesis as an efficient, interpretable pre-training signal when large-scale corpora are unavailable. Our code is available at: https://github.com/Freyliu0516/audioPG.

16.
arXiv (CS.CV) 2026-06-16

CropTrack: A Tracking with Re-Identification Framework for Precision Agriculture

Multiple-object tracking (MOT) in agricultural environments presents major challenges due to repetitive patterns, similar object appearances, sudden illumination changes, and frequent occlusions. Contemporary trackers in this domain rely on the motion of objects rather than appearance for association. Nevertheless, they struggle to maintain object identities when targets undergo frequent and strong occlusions. The high similarity of object appearances makes integrating appearance-based association nontrivial for agricultural scenarios. To solve this problem we propose CropTrack, a novel MOT framework based on the combination of appearance and motion information. CropTrack integrates a reranking-enhanced appearance association, a one-to-many association with appearance-based conflict resolution strategy, and an exponential moving average prototype feature bank to improve appearance-based association. Evaluated on publicly available agricultural MOT datasets, CropTrack demonstrates consistent identity preservation, outperforming traditional motion-based tracking methods. Compared to the state of the art, CropTrack achieves significant gains in association accuracy and identification precision scores with a lower number of identity switches.

17.
arXiv (CS.LG) 2026-06-17

On Randomized Algorithms in Online Strategic Classification

arXiv:2602.06257v2 Announce Type: replace Abstract: Online strategic classification studies settings in which agents strategically modify their features to obtain favorable predictions. For example, given a classifier that determines loan approval based on credit scores, applicants may open or close credit cards and bank accounts to obtain a positive prediction. The learning goal is to achieve low mistake or regret bounds despite such behavior. While randomized algorithms have the potential to offer advantages to the learner in strategic settings, they have been largely underexplored. In the realizable setting, no lower bound is known for randomized algorithms, and existing lower bound constructions for deterministic learners can be circumvented by randomization. In the agnostic setting, the best known regret upper bound is $O(T^{3/4}\log^{1/4}T|\mathcal H|)$, which is far from the standard online learning rate of $O(\sqrt{T\log|\mathcal H|})$. In this work, we provide refined bounds for online strategic classification in both settings; our bounds depend on the Littlestone dimension $\mathrm{Ldim}(\mathcal H)$ of the hypothesis class $\mathcal H$ and the maximum degree $\Delta$ of the manipulation graph. In the realizable setting, we extend, for $T > \mathrm{Ldim}(\mathcal H) \Delta^2$, the existing lower bound $\Omega(\mathrm{Ldim}(\mathcal H) \Delta)$ for deterministic learners to all learners. This yields the first lower bound that applies to randomized learners. We then provide the first randomized learner that improves the known (deterministic) upper bound of $O(\mathrm{Ldim}(\mathcal H) \cdot \Delta \log \Delta)$. In the agnostic setting, we give an improper randomized learner that improves the regret upper bound to $O(\sqrt{T\log|\mathcal H|})$, matching the standard online learning rate. We also show a larger lower bound for all proper learning rules, demonstrating that improperness is necessary to achieve the optimal rate.

18.
arXiv (CS.CV) 2026-06-16

WaveDINO: Learning-Based Atmospheric Correction of Unwrapped InSAR Interferograms Validated by GNSS: Results at Laguna del Maule and Campi Flegrei Volcanoes

Interferometric Synthetic Aperture Radar (InSAR) enables effective monitoring of volcanic deformation; however, the observed signals are often corrupted by atmospheric phase delays, seasonal surface changes, and decorrelation effects. Existing atmospheric correction methods, such as numerical weather model-based methods, can reduce these effects but do not consistently remove atmospheric artefacts and may introduce residual biases. To address these limitations, we propose a novel learning-based method for denoising unwrapped InSAR interferograms, using a hybrid training strategy that combines physically motivated synthetic deformation with real atmospheric noise. Specifically, we introduce WaveDINO, a wavelet-based multi-scale denoising framework conditioned on frozen DINOv3 foundation-model features and terrain information. Training uses synthetic magma-source deformation superimposed on short-term interferograms to expose the network to realistic atmospheric statistics while retaining known ground truth. Performance is evaluated on both controlled synthetic data and long-term real interferograms from Laguna del Maule (Chile) and Campi Flegrei (Italy), with independent GNSS measurements used for validation. WaveDINO consistently outperforms competing models, improving agreement with GNSS measurements, and reducing mean GNSS misfit by approximately 3% and 19% at two sites, respectively, while surpassing weather-model-based corrections.

19.
arXiv (CS.LG) 2026-06-16

Forecasting Bacterial Antimicrobial Resistance Trends Using Machine Learning on WHO GLASS Surveillance Data: A Retrieval-Augmented Generation Approach for Policy Decision Support

arXiv:2602.22673v2 Announce Type: replace Abstract: Background: Antimicrobial resistance (AMR) is a global health threat. While the WHO Global Antimicrobial Resistance and Use Surveillance System (GLASS) provides standardized data, population-level machine learning forecasting of resistance trends remains limited. Translating computational forecasts into policy requires transparent interpretation mechanisms. Methods: Surveillance data (2021-2023) comprising 5,909 observations across 44 countries and five WHO regions were processed. A rigorous temporal split prevented data leakage. Six models (Naive, Linear, Ridge, XGBoost, LightGBM, LSTM) were benchmarked to forecast one-year-ahead resistance rates using features including prior-year resistance and antibiotic consumption. Evaluation metrics (MAE, RMSE, sMAPE) were computed, with 95% bootstrap confidence intervals for MAE. A local Retrieval-Augmented Generation (RAG) system utilizing Gemma 4 was implemented to translate forecast findings into policy guidance grounded in retrieved WHO documents. Results: XGBoost achieved the best performance (test MAE = 6.13% [95% CI: 5.83-6.44]), an 85.3% error reduction versus the naive baseline (MAE = 41.79%). SHAP analysis identified prior-year resistance as the dominant predictor (50.5% gain), confirming strong autoregressive behavior. Regional forecast error tracked closely with surveillance coverage, ranging from 3.65% in the European Region to 8.61% in South-East Asia. The RAG pipeline generated accurate, source-attributed policy responses without fabricated citations. Conclusion: Short-term AMR resistance rates exhibit strong temporal autocorrelation that can be accurately forecasted using gradient boosting. Coupling these forecasts with a hallucination-resistant RAG system provides a scalable, evidence-based decision-support framework for AMR governance.

20.
arXiv (quant-ph) 2026-06-11

Entanglement preservation and Clauser-Horne nonlocality in electromagnetically induced transparency quantum memories

arXiv:2507.15453v4 Announce Type: replace Abstract: Entanglement preservation in noisy quantum memories represents a central challenge in quantum information science. While experiments have shown that electromagnetically induced transparency (EIT) memories can store entangled photons, a quantitative theoretical analysis of whether nonlocal quantum correlations can survive storage loss induced by ground-state decoherence remains limited. Here we combine the dark-state polariton formalism with a reduced density-operator treatment to derive an EIT-specific effective pure-loss description for the retrieved photonic state in the ground-state-decoherence-limited regime. The analysis reveals that decoherence transforms an initially pure Bell state into a mixed state with a vacuum component and predicts a protocol-dependent storage-efficiency benchmark of 89.7% for violating the chosen unconditional Clauser-Horne (CH) inequality. Above this benchmark, the retrieved photonic state violates the CH inequality without post-selection, whereas below it, this unconditional CH violation is no longer obtained. This framework provides a quantitative theoretical description of entanglement retention, retrieved photonic density operators, and protocol-dependent Bell-test benchmarks in EIT quantum memories.

21.
arXiv (CS.AI) 2026-06-17

Rethinking Multimodal Fusion for Time Series: Text Modalities Need Constrained Fusion

arXiv:2603.22372v2 Announce Type: replace-cross Abstract: Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit generalization. In this paper, we show that multimodal models with naive fusion strategies (e.g., simple addition or concatenation) often underperform unimodal TS models, which we attribute to the uncontrolled integration of auxiliary modalities which may introduce irrelevant information. Motivated by this observation, we explore various constrained fusion methods designed to control such integration and find that they consistently outperform naive fusion methods. Furthermore, we propose Controlled Fusion Adapter (CFA), a simple plug-in method that enables controlled cross-modal interactions without modifying the TS backbone, integrating only relevant textual information aligned with TS dynamics. CFA employs low rank adapters to filter irrelevant textual information before fusing it into temporal representations. We conduct over 20K experiments across various datasets and TS/text models, demonstrating the effectiveness of the constrained fusion methods. Code is available at: https://github.com/seunghan96/cfa.

22.
arXiv (quant-ph) 2026-06-15

On-site interactions in quantum thermal machines: efficiency, rectification and entanglement beyond local and global master equations

arXiv:2606.14593v1 Announce Type: new Abstract: Advances in experimental techniques have opened new routes for harnessing non-equilibrium dynamics in mesoscopic quantum systems. In this context, we study the impact of on-site interactions on the transport properties of a continuous quantum thermal machine composed of two coupled oscillators connected to two thermal reservoirs. In the weak system-reservoir coupling regime, where a long-standing debate concerns which reduced description should be preferred, we first show that the Redfield master equation (RME) provides an accurate and unifying framework that interpolates between two well-known limits: the local and global master equations. By relying on the Hierarchy of Pure States (HOPS), a numerically exact stochastic method, we then explore the full parameter space and show that interactions can be leveraged to tune the efficiency of the thermal machine at high temperatures (while leaving it essentially unchanged at low temperatures), induce non-reciprocal transport under asymmetric reservoir couplings, and generate steady-state entanglement within the junction. We derive expressions for system-bath correlators, such as heat and particle currents, consistently across different frameworks. Our work features on-site interactions to enhance the versatility of quantum thermodynamic junctions and clarifies the role of non-Markovianity and non-linearities in quantum transport.

23.
medRxiv (Medicine) 2026-06-11

Foundation model-based tool for automated ulcerative colitis histology scoring demonstrates non-inferiority to pathologists across multiple scoring indices

In clinical trials for ulcerative colitis (UC), pathologists assess disease severity through standardized histological indices, including the Geboes Score, Robarts Histopathology Index (RHI), and Nancy Histologic Index (NHI). Despite strong associations with clinical outcomes, histologic scoring suffers from inter- and intra-reader variability, and consensus criteria for histologic remission remain uncertain. Through a consortium approach, we developed an artificial intelligence-based measurement (AIM) tool for scoring histology in UC mucosal biopsies (AIM-HI UC). This model, trained on a large dataset of UC biopsies (N=10,230), utilizes additive multiple instance learning models leveraging PLUTO, a pathology foundation model, that predict each of the Geboes subgrades, from which the Geboes grade-level score, RHI, and NHI can be calculated. Evaluation of this model on a standalone verification set including clinical trial specimens established algorithm non-inferiority and/or superiority relative to standard qualified pathologists through comparison of algorithm-consensus and pathologist-consensus agreement metrics (non-inferior if difference >-0.1, superior if difference >0, inclusive of confidence intervals). AIM-HI UC was determined to be non-inferior to pathologists (N=3) for the prediction of all seven Geboes subgrades, grade-level Geboes, RHI, NHI, histologic improvement (GS

24.
arXiv (CS.LG) 2026-06-16

CREST: Deployment-Realistic Hardware-in-the-Loop NAS for Embedded Sensing Systems

arXiv:2606.15004v1 Announce Type: cross Abstract: Deploying neural networks on low-power microcontrollers (MCUs) requires selecting model architectures under tight memory, latency, and energy constraints. Existing workflows often simplify this process along one or more axes: static proxy costs such as FLOPs or parameters, treating one MCU as representative, and continuous-inference tests instead of deployed sensing schedules. These assumptions can mis-rank Pareto-front candidates, miss infeasible deployments, and obscure schedule-dependent energy. We present CREST (Cross-platform Runtime Evaluation and Search Tool), a deployment-realistic hardware-in-the-loop (HIL) neural architecture search (NAS) framework for MCU sensing systems. CREST keeps the optimizer, HIL measurement boundary, logging, and replay workflow fixed while exposing workload, model family, target backend, schedule, quantization, and scoring policy as configurable axes. This makes deployment effects experimentally separable within one reusable workflow. We evaluate CREST on inertial odometry and audio classification across three Arm Cortex-M targets. For inertial odometry, measured-energy HIL search reduces median per-inference energy by 41.7% versus FLOPs-based selection and 40.8% versus memory-traffic-based selection at similar error. FLOPs-based selection also chooses infeasible deployments on memory-constrained targets. On the STM32 N657 target, continuous-inference and duty-cycled searches produce different Pareto frontiers. For audio classification, the same application-level policy selects different DS-CNN architectures on different boards, and cross-board replay changes deployment cost substantially. Overall, CREST shows that deployment-realistic MCU NAS must jointly optimize model architecture, target platform, runtime schedule, and deployment policy rather than relying only on static proxy costs or continuous-inference measurements.

25.
arXiv (CS.CL) 2026-06-16

CHILLGuard: Towards Fine-Grained Chinese LLM Safety Guardrail with Scalable Data Construction and Model-aware Preference Alignment

Malicious content generated from large language models (LLMs) could pose severe safety risks and ethical concerns. While existing LLM safety guardrails excel in English or multilingual settings, they lack adaptation to Chinese-specific regulatory policies, cultural context and linguistic nuances, failing to support fine-grained risk classification for diverse deployment needs. In this paper, we introduce a 5-macro, 31-micro category fine-grained risk taxonomy for Chinese scenarios, and build CHILLGuard: a dedicated Chinese LLM content safety guardrail. To address the critical scarcity of high-quality annotated Chinese safety data, we propose a scalable multi-stage data construction pipeline: we expand multi-source corpus via retrieval-augmented generation, generate implicit harmful samples through prompt engineering rewriting, and refine high-quality data via multi-model voting-based label calibration. Based on this, we build CHILLGuardTrain, a large-scale training set with 405,007 samples, and CHILLGuardTest, a rigorously curated annotated test set with 51,745 samples. We then train CHILLGuard on CHILLGuardTrain under a generator-classifier collaborative framework via Model-aware Direct Preference Optimization. Extensive experiments under multiple settings demonstrate the state-of-the-art performance of CHILLGuard, e.g., a 15.92% improvement of F1 score over Qwen3Guard-8B-Strict on our benchmark. We will release our resources at https://github.com/cswbyu/CHILLGuard.