Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.AI) 2026-06-12

M*: A Modular, Extensible, Serving System for Multimodal Models

arXiv:2606.12688v1 Announce Type: cross Abstract: We are entering a new era of composite model architectures that integrate diverse components such as vision encoders, language backbones, diffusion and flow heads, audio codecs, action generators, and world-model predictors. Such architectures underpin a broad class of multimodal models, including unified multimodal models, omni models, speech-language models, vision-language-action policies, and world models. However, existing model serving frameworks were built on narrow assumptions about model structure, making them ill-suited to accommodate this new architectural diversity. Here we present M*, a universal serving system for efficient serving of composite AI models. M* represents models as dataflow graphs, processing requests spanning diverse modalities and tasks as traversals over these graphs. The core insight is a modular abstraction that supports arbitrary composition of model components, flexible placement onto a physical cluster, and model-agnostic optimizations within a distributed runtime. We call this abstraction the Walk Graph and show how it can concisely capture composite models from a broad range of families. We instantiate M* on representative models and find that it achieves, on average, 20% lower end-to-end latency than vLLM-Omni for text-to-image workloads on BAGEL, while delivering up to 2.9x lower real-time factor and 2.7x higher throughput for text-to-speech workloads on Qwen3-Omni. M* also outperforms the V-JEPA 2-AC rollout baseline for robotic planning by up to 12.5x. Thus, our work paves the road towards more efficient serving of complex models with minimal developer effort.

02.
arXiv (CS.CL) 2026-06-16

Equity with Efficiency: An Empirical Study of Tokenizers for Multilingual Large Language Models

Multilingual large language models (LLMs) depend on subword tokenization to bridge discrete text and continuous neural representation. State-of-the-art multilingual LLMs often use Byte-level Byte-Pair Encoding (BPE) tokenizers that structurally favor high-resource languages and Latin scripts. For speakers of underrepresented languages, particularly those across Southeast Asia, this bias inflates inference costs and widens cross-lingual capability gaps. We present the first systematic comparison of equitable tokenizers on a unified benchmark spanning 11 Southeast Asian languages. Beyond tokenizer-level analysis of compression efficiency and cross-lingual equity, we assess downstream task performance through controlled 1.5B-parameter language model training using the same training data. Our results show that Parity-aware BPE lies on the Pareto frontier of the efficiency-equity trade-off, achieving strong compression parity at competitive cost. Morphology-Driven Byte Encoding delivers the best semantic reasoning performance through morphologically richer representations, albeit at a higher computational expense. Byte Latent Transformer underperforms on downstream tasks, possibly because its architectural assumptions misalign with the constraints of limited low-resource training data. Together, our findings demonstrate that cross-lingual fairness and tokenization efficiency are not fundamentally at odds, and offer practical guidance for designing equitable multilingual models.

03.
arXiv (CS.LG) 2026-06-24

Predictive variational inference: Learn the predictively optimal posterior distribution

arXiv:2410.14843v4 Announce Type: replace-cross Abstract: Vanilla variational inference finds an optimal approximation to the Bayesian posterior distribution, but even the exact Bayesian posterior is often not meaningful under model misspecification. We propose predictive variational inference (PVI): a general inference framework that seeks and samples from an optimal posterior density such that the resulting posterior predictive distribution is as close to the true data generating process as possible, while this closeness is measured by multiple scoring rules. By optimizing the objective, the predictive variational inference is generally not the same as, or even attempting to approximate, the Bayesian posterior, even asymptotically. Rather, we interpret it as implicit hierarchical expansion. Further, the learned posterior uncertainty detects heterogeneity of parameters among the population, enabling automatic model diagnosis. This framework applies to both likelihood-exact and likelihood-free models. We demonstrate its application in real data examples.

04.
arXiv (quant-ph) 2026-06-12

Spectral analysis of equilibration: information leakage in isolated quantum systems

arXiv:2606.12545v1 Announce Type: new Abstract: We develop a unified dynamical-spectral framework for equilibration in isolated quantum systems based on a subspace coarse-graining approach. Central to our formulation is the Leakage Fidelity Function (LFF), defined as the probability that a unitarily evolving state escapes the support of its initial subspace. This quantity provides a direct, operational measure of information flow and memory loss without invoking ensemble assumptions or perturbative arguments. We derive universal bounds on temporal fluctuations of the LFF, in terms of the spectral gap structure and the square of the effective dimension, evincing that large spectral delocalization suppresses fluctuations and guarantees equilibration on average. By introducing spectral power distributions and associated entropic measures, we establish a quantitative link between phase mixing, gap participation, and dynamical stability. We further investigate the equilibration timescale by connecting the LFF to quantum speed limits, thereby revealing the average time required for equilibration. Our results provide a state-dependent, geometrically transparent perspective on how spectral complexity and subspace information leakage jointly govern irreversibility in closed quantum many-body systems.

05.
arXiv (quant-ph) 2026-06-15

Geometric mechanisms enabling spin- and enantio-sensitive observables in one photon ionization of chiral molecules

arXiv:2603.02735v3 Announce Type: replace-cross Abstract: We examine spin-resolved photoionization of randomly oriented chiral molecules via circularly polarized light, and revisit earlier predictions of Cherepkov (J. Phys. B: Atom. Mol. Phys. 16, 1543, 1983). We will show that the dynamical origin of spin- and enantio-sensitive observables arise from two intrinsic mechanisms that are quantified by two pseudovectors stemming from the geometric properties of the photoionization dipoles in spin space and in real space, and an extrinsic mechanism which is a directional bias introduced by the well-defined direction of light polarization. These mechanisms arise solely from electric dipole interactions. Consequently, this means that the ten independent parameters that was earlier predicted by Cherepkov to fully describe spin-resolved photoionization of chiral molecules can be reduced as moments of these three pseudovectors. We also find that the molecular pseudoscalars describing the spin- and enantio-sensitive components of the yield can be described by the flux of these pseudovectors through the energy shell, which changes sign upon switching enantiomers. Our results provide compact expressions for these observables which provide an intuitive picture on what determines the strength of these spin- and enantio-sensitive observables. The approach can be readily generalized to photoexcitation, multiphoton processes, and arbitrary field polarizations. Regardless of the specific driving conditions, the resulting spin- and enantio-sensitive observables are still controlled by the same three pseudovectors, underscoring their universal role as the primary generators of chirality-induced spin asymmetries, emphasizing their fundamental geometric origin and the universality of the mechanism identified here.

06.
arXiv (math.PR) 2026-06-16

Scaling Limits of Bivariate Nearly-Unstable Hawkes Processes and Applications to Rough Volatility

arXiv:2605.03703v3 Announce Type: replace Abstract: We study a pair of nearly-unstable Hawkes processes coupled through a one-directional, or triangular, cross-excitation: the first component evolves autonomously and excites the second, but not conversely. Each component is self-exciting through a heavy-tailed memory kernel, and the two kernels are allowed to have different tail indices, so that the limiting components exhibit genuinely different degrees of roughness. As the system approaches criticality, we prove that the suitably rescaled intensity vector converges weakly to the unique solution of a coupled system of stochastic Volterra equations of rough-volatility type. The first limiting component is autonomous, while the second is driven both by its own noise and by an inherited noise transmitted from the first component through an effective cross-kernel. This cross-kernel is the convolution of the two limiting Mittag-Leffler kernels and therefore combines the two memory structures. As a consequence, we obtain a short-time cross-decorrelation law: although the two components are coupled, their functional correlation vanishes at small time scales at an explicit polynomial rate. This time-dependent correlation distinguishes the limit from independent rough processes and from classical bivariate rough models with constant Brownian correlation.

07.
arXiv (CS.AI) 2026-06-17

Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models

arXiv:2606.18114v1 Announce Type: cross Abstract: State Space Models (SSMs) such as Mamba-2 offer linear-time inference but their memory footprint limits edge deployment. Prior ternary SSM work (Slender-Mamba) trains from scratch on 150B tokens; we show a pretrained checkpoint suffices, reducing the marginal token budget by 1,000x. Using grouped quantization-aware training (QAT) with knowledge distillation from a frozen FP16 teacher, we compress Mamba-2 1.3B to 3.61x (2,687 to 744 MB) and achieve 48.1% zero-shot accuracy (7-task average) in just 102M tokens (4 GPU-hours, single H100) – approaching Bi-Mamba's 48.4% (within +/-0.9pp CI). This QAT-from-pretrained setting reveals zero-ratio collapse, a novel instability caused by learnable quantization scales that does not arise in from-scratch training. We further show that post-hoc correction strategies effective for Transformers fail for SSMs due to error accumulation through the recurrence. These results demonstrate that ternary SSMs do not require expensive from-scratch training: QAT from pretrained checkpoints with KD is a data-efficient alternative.

08.
arXiv (CS.AI) 2026-06-16

Quantifying the Impact of Lossy Compression on Neural Generative Surrogate Modeling

arXiv:2606.15959v1 Announce Type: cross Abstract: Neural networks are used as generative surrogate models for scientific discovery, which are trainable approximations of scientific simulations. These models enable users to replace time-consuming numerical simulations with learned alternatives, providing quick solutions. However, high-fidelity generative surrogate models require massive training datasets, which can create storage and I/O challenges. Lossy compression is a promising way to reduce this burden, but compression errors may affect the model quality in subtle ways, making it challenging to quantify their impact. In this work, we examine how lossy compression of training data impacts the quality of generative surrogate models. We begin by characterizing the uncertainty inherent in training neural networks, showing that identical training configurations can produce different models. By exploiting this variability, we propose a method to estimate how much compression-induced error a surrogate model can tolerate without affecting its accuracy. Evaluation of two application simulations demonstrates that our approach significantly reduces memory/storage requirements and speeds up training while producing high-quality surrogate models. These results show that lossy compression saves data storage up to 23.7x and 39x with negligible impact on the quality of the surrogate model. Meanwhile, reducing the size of the training data set also enhances the data loading speed and reduces the training time by up to 3x.

09.
arXiv (CS.CV) 2026-06-16

Landmark-free Assessment of Lower-limb Alignment with Implicit Neural Shape Functions from Knee Radiographs

Radiographic assessment of lower-limb alignment (LLA) is important for predicting joint health and surgical outcomes in total knee arthroplasty. Traditional measurement methods are manual and time-consuming, while recent machine learning approaches typically rely on locating a fixed set of anatomical landmarks. This dependence limits flexibility and may require re-annotation when clinical definitions change. To address this, we propose an automated workflow using Implicit Neural Shape Functions (INSF). Rather than relying on explicit landmark coordinates, we encode the anatomy into a compact latent space and regress clinical alignment measurements directly from these latent codes. This architecture allows for rapid extendability to new tasks without altering the backbone representation. We trained our method on an internal dataset of 566 knee radiographs, each annotated with the outline of the femur and tibia. We evaluated it on both an internal test dataset of 50 patients and a separate external set of 402 preoperative cases from the MRKR dataset. Manual clinical measurements are available for these data, and the MRKR measurements will be made publicly accessible. Performance was comparable to state-of-the-art landmark-based methods and manual agreement, while offering a flexible shape representation that can be extended to additional measurement tasks.

10.
arXiv (quant-ph) 2026-06-16

VQE as Initial State Preparation for QPE on Heisenberg Spin-Glass Hamiltonians

arXiv:2606.15061v1 Announce Type: new Abstract: Quantum Phase Estimation (QPE) is the quantum algorithmic workhorse for computing ground state energies of quantum Hamiltonians with quantum computers. Ground state energy calculation of physical systems is perhaps the most promising use case for quantum computing in terms of scientific and commercial value with a plausible path to outperformance of classical alternatives. This path, however, hinges on the availability of initial states for QPE with significant overlap with the true ground state. Using extensive (classical) numerical computations, we study whether the NISQ-era algorithm VQE (Variational Quantum Eigensolver) could be used to efficiently prepare high-overlap states of disordered fully-connected anisotropic Heisenberg spin glass quantum Hamiltonians with up to $15$ qubits. We find that (i) – consistent with widely held, but rarely numerically illustrated beliefs – VQE is generally unable to efficiently converge to the ground state for our Hamiltonians, which is a well-known issue with VQE due to a variety of factors including vanishing gradients and local minima; (ii) low energy states do not necessarily have large ground-state overlap, but there is typically a correlation between the two measures; (iii) adding more than three layers to the VQE ansatz neither improves overlap nor the energies found; and (iv) the best-found overlap scaling as a function of the Hamiltonian system size is not strongly exponentially decreasing, suggesting potential for VQE to be a heuristic state preparation algorithm for QPE.

11.
arXiv (CS.LG) 2026-06-24

Exact Schur-Sylvester Dimensionality Reductions for Non-Smooth Stochastic Complexity and Manifold Sampling

arXiv:2606.23867v1 Announce Type: new Abstract: The exact computation of the Normalized Maximum Likelihood (NML) codelength for regular non-smooth estimators (e.g., Lasso) has been historically limited by the cubic scaling walls of manifold-constrained projection and volume integration. At each step of the geometric Propose-and-Project Metropolis–Hastings (PPMH) sampler, evaluating the projection operator requires inverting an $(N+k) \times (N+k)$ generalized KKT matrix, while calculating the volume factor requires the determinant of an $(N-k) \times (N-k)$ Gram matrix. This paper presents an exact, mathematically equivalent formulation that bypasses both bottlenecks by utilizing the block Schur complement and Sylvester's determinant identity. We prove that the computational complexity of both operations collapses from $\mathcal{O}(N^3)$ to $\mathcal{O}(k^3 + N^2 k)$ per step. We generalize this reduction to Sparse Support Vector Machines (SVMs), Elastic Net, and Group Lasso. Finally, we provide a rigorous numerical stability analysis and evaluate the sampler's efficiency using the Effective Sample Size (ESS) per second. Our empirical benchmarks on high-dimensional datasets confirm a constant speedup exceeding $14{,}100\times$ while maintaining double-precision numerical equivalence, rendering exact non-smooth NML estimation highly tractable for large-scale statistical inference.

12.
medRxiv (Medicine) 2026-06-24

Repetitive Transcranial Magnetic Stimulation over Primary Somatosensory Cortex for Upper Limb Function in Stroke: An Exploratory Randomized Controlled Trial

Background: Stroke often causes Upper Limb (UL) functional impairments. The Primary Somatosensory Cortex (S1) plays an important role in motor learning. Repetitive Transcranial Magnetic Stimulation (rTMS) over S1 could enhance UL recovery. We aimed to explore its preliminary effects on UL motor activity and function post-stroke. Methods: An exploratory parallel-group randomized controlled trial in people with chronic stroke (>3 months) and moderate hemiparesis was conducted. Participants received 20 sessions of active or sham 5Hz rTMS over affected S1, with Robot-Assisted Therapy and Task-Oriented Training, 5 days/week for 4 weeks. The primary endpoint was UL motor activity (Action Research Arm Test, ARAT). Secondary measures were the UL Fugl-Meyer Assessment (UL-FMA) and sensory outcomes. Results: The baseline-adjusted mean difference (MD) in ARAT was 4.05 points [0.78, 7.33], favoring active stimulation. Secondary measures did not favor active stimulation (UL-FMA: MD = 2.62 [-1.51, 6.76]; sensory outcomes showed no between-group differences). Conclusion: High-frequency rTMS over S1 may enhance UL motor activity (ARAT), but no evidence for motor impairment (UL-FMA) or sensory domains was found. Compensation rather than restoration may underlie this improvement. Stimulation targets should match the intended recovery domain, although larger trials are needed to confirm these preliminary findings.

13.
arXiv (CS.AI) 2026-06-11

CCKS: Consensus-based Communication and Knowledge Sharing

arXiv:2606.12281v1 Announce Type: cross Abstract: In Decentralized Training and Decentralized Execution (DTDE) for cooperative Multi-Agent Reinforcement Learning (MARL), action-advising-based knowledge sharing promotes interpretable and scalable cooperation among agents. However, current action advising approaches often adhere too much to the teacher's guidance without evaluating teacher-student compatibility, which causes excessive advising, suboptimal stability, and degraded performance. To overcome these challenges, this paper presents a Consensus-based Communication and Knowledge Sharing (CCKS) framework, which allows agents to adopt recommendations based on consensus-derived constraints and to follow the teacher's instructions more smartly. This mechanism enables agents to balance exploration and learning from experienced teachers, improving overall performance. The key is the consensus model construction, for which we propose to employ contrastive learning to construct consensus models based on local observations in the agents' training phase. In action selection, agents score and choose actions based on consensus and shared knowledge. Designed as a plug-and-play solution, CCKS integrates seamlessly with existing DTDE algorithms. Experiments conducted in the Google Research Football environment and the complex StarCraft II Multi-Agent Challenge demonstrate that the integration with CCKS significantly improves cooperation efficiency, learning speed, and overall performance compared with current DTDE baselines. The code is available at https://github.com/yuanxpy/CCKS.

14.
arXiv (CS.CL) 2026-06-16

Speaking the Language of Science: Toward a General-Purpose Generative Foundation Model for the Natural Sciences

In this report, we present LOGOS (Language Of Generative Objects in Science), a scientific generative language model that unifies heterogeneous tasks across the natural sciences within a single autoregressive framework based on a shared scientific grammar. It encodes diverse scientific objects and their spatial interactions as token sequences over a common vocabulary. By representing spatial contact and constraint patterns as discrete tokens, the model captures complex structural interactions in a purely sequential manner, without relying on explicit coordinates or geometric neural networks. This unified representation enables a wide range of downstream tasks to be formulated consistently as next-token prediction in the same grammar space, creating strong alignment between continued multi-domain pre-training and downstream objectives. Across diverse tasks, LOGOS consistently matches or outperforms domain-specific baselines, providing preliminary evidence for the feasibility of "one model fits all" in the natural sciences. We train LOGOS models at different scales (1B, 3B, and 8B parameters) and find a consistent positive correlation between model size and performance. This suggests that the future of AI for Science (AI4S) may not lie in building an independent technical stack that is separated from large language models (LLMs). Instead, it may depend on deeply aligning scientific foundation models with LLMs through shared architectures, shared training paradigms, and shared inference infrastructure, so that LLMs can truly become a new entry point for AI4S. We release the model weights and associated resources to facilitate further research.

15.
arXiv (CS.LG) 2026-06-16

SDVDiag: Multimodal Causal Discovery for Online Diagnosis in Software-defined Vehicles

arXiv:2606.15559v1 Announce Type: cross Abstract: The transition toward software-defined vehicles concentrates an increasing share of vehicle functionality into distributed software services, where failures propagate through service dependencies and the surface symptom is often several causal hops away from the underlying defect. Existing approaches to causal root-cause analysis in such systems address this only partially: they typically reason over a single observability modality and operate in an offline, operator-driven mode that does not match the demands of continuous vehicle operation. This paper presents SDVDiag, a multimodal causal-discovery pipeline that fuses log-based and metric-based service representations into a shared embedding space before graph construction, coupled with an anomaly-driven trigger that converts the diagnostic platform from a manually operated batch tool into a continuously running online system. Evaluation on an Autonomous Valet Parking testbed shows that the multimodal pipeline produces sparser causal graphs than a metrics-only baseline (134 vs. 182 edges on average) and consistently outperforms it in edge-weighted reward against an expert knowledge graph at every stage of human-feedback refinement, showing a 2.4-fold improvement over the baseline after 60 feedback queries. An end-to-end fault-injection scenario further demonstrates that the integrated trigger correctly recovers a true root cause located two causal hops upstream of the observable symptom.

16.
arXiv (quant-ph) 2026-06-17

Entanglement transition in unitary system-bath dynamics

arXiv:2512.06081v3 Announce Type: replace Abstract: The evolution of a system coupled to baths is commonly described by a master equation that, in the long-time limit, yields a steady-state density matrix. However, when the same evolution is unraveled into quantum trajectories, it is possible to observe a transition in the scaling of entanglement within the system as the system-bath coupling increases - a phenomenon that is invisible in the trajectory-averaged reduced density matrix of the system. Here, we go beyond the paradigm of trajectories from master equations and explore whether a qualitatively analogous entanglement-scaling transition emerges in a single unitary evolution of the combined system-bath setup, without monitoring the dynamics of the system. We investigate the scaling of entanglement in a unitary quantum setup composed of a two-dimensional lattice of free fermions, where each site is coupled to a fermionic bath. As the system-bath coupling increases, the logarithmic fermionic negativity reveals an entanglement transition from logarithmic-law to area-law scaling. This occurs while the system's steady-state properties are trivial, highlighting that the signatures of these different scalings are within the bath-bath correlations. Evidence of the transition is also found in the mutual information and the correlations of the full system-bath setup, suggesting that the entanglement transition is underpinned by a change in the spatial structure of quantum information.

17.
arXiv (CS.CL) 2026-06-16

Evaluating and Preserving Lexical Stress in English-to-Chinese Speech-to-Speech Translation

Speech-to-speech translation (S2ST) systems have achieved impressive progress in semantic accuracy and speech naturalness. However, the cross-lingual transfer of lexical stress, a vital cue for emphasis and speaker intent, remains heavily underexplored, compounded by a lack of reliable automatic evaluation metrics for tonal languages like Chinese. We investigate English-to-Chinese S2ST stress transfer by constructing a stress-annotated Chinese dataset and an XLS-R-based Mandarin stress detector. Integrating this with the English EmphAssess system, we propose a novel objective metric for cross-lingual stress evaluation. Furthermore, we fine-tune CosyVoice3 to build a stress-aware S2ST system. Experiments demonstrate that our proposed S2ST architecture significantly outperforms existing systems in stress translation capability while maintaining competitive translation quality. Furthermore, our evaluation metric exhibits a strong correlation with human subjective judgments.

18.
medRxiv (Medicine) 2026-06-16

Validating an Early Pregnancy HbA1c as the Screening Test for Gestational Diabetes Mellitus: Findings from PRISMA Pakistan Cohort

Background: Early identification of gestational diabetes mellitus (GDM) is critical to improving maternal and neonatal outcomes, particularly in resource-constrained settings where universal oral glucose tolerance testing (OGTT) is burdensome. We assessed whether early-pregnancy HbA1c alone or combined with common risk factors can predict GDM and reduce the burden of OGTT requirements in a peri-urban cohort in Karachi, Pakistan. Methods: We conducted a secondary analysis of the Pregnancy Risk Infant Surveillance and Measurement Alliance (PRISMA) Pakistan cohort. Women enrolled before 20 weeks' gestation with available early-pregnancy HbA1c and a 2-hour 75g OGTT at 24 to 28 weeks were included. We externally validated GDM prediction models originally developed in the STRiDE-India cohort. Model performance was evaluated using receiver operating characteristic (ROC) curves and area under the curve (AUC). We assessed four models: HbA1c alone (Model 1a); age, BMI, and family history of diabetes mellitus (FH DM) (Model 1b); HbA1c combined with age, BMI, and FH DM (Model 2); and an extended model, i.e., Model 2 combined with socioeconomic status, gestational age, parity, systolic and diastolic blood pressure (Model 3). A dual-threshold approach was applied to assess rule-in and rule-out performance. Results: Among 2,489 women, GDM incidence was 7.5% (n=186). Models with a broader set of predictors demonstrated higher AUC values, with Model 2 achieving an AUC of 0.61 (95% CI: 0.57, 0.66). Including additional factors (Model 3) did not further improve predictive ability (AUC: 0.62; 95% CI: 0.58, 0.66). In addition, at predefined thresholds, Model 2 achieved sensitivity of 73.7% (rule-out) and specificity of 83.5% (rule-in), with the potential to reduce OGTT requirements (58.5%). Conclusions: Early-pregnancy risk stratification using HbA1c combined with simple clinical predictors offers a pragmatic approach to streamline GDM screening among high-risk pregnant women. A dual-threshold strategy using Model 2 could reduce reliance on universal OGTT while prioritizing high-risk women for confirmatory testing.

19.
arXiv (CS.AI) 2026-06-17

Towards Leveraging AutoML for Sustainable Deep Learning: A Multi-Objective HPO Approach on Deep Shift Neural Networks

arXiv:2404.01965v3 Announce Type: replace-cross Abstract: Deep Learning (DL) has advanced various fields by extracting complex patterns from large datasets. However, the computational demands of DL models pose environmental and resource challenges. Deep shift neural networks (DSNNs) offer a solution by leveraging shift operations to reduce computational complexity at inference. Following the insights from standard DNNs, we are interested in leveraging the full potential of DSNNs by means of AutoML techniques. We study the impact of hyperparameter optimization (HPO) to maximize DSNN performance while minimizing resource consumption. Since this combines multi-objective (MO) optimization with accuracy and energy consumption as potentially complementary objectives, we propose to combine state-of-the-art multi-fidelity (MF) HPO with multi-objective optimization. Experimental results demonstrate the effectiveness of our approach, resulting in models with over 80\% in accuracy and low computational cost. Overall, our method accelerates efficient model development while enabling sustainable AI applications.

20.
arXiv (CS.LG) 2026-06-18

Everywhere Valid Bounds on False Discovery Proportions in Conformal Inference

arXiv:2605.20726v2 Announce Type: replace-cross Abstract: Modern applications of conformal inference to multiple testing problems, such as outlier detection and candidate selection, often involve selecting test samples whose conformal p-values fall below a threshold. The quality of such methods is often measured by the false discovery proportion (FDP), defined as the fraction of incorrect selections. Existing approaches typically control the expected value of the FDP, using methods such as the Benjamini-Hochberg procedure. This approach fails to provide high-probability bounds on the realized false discovery proportion and invalidates statistical guarantees if the rejection threshold is selected after inspecting the data. This paper establishes finite-sample, distribution-free upper bounds on the FDP that hold simultaneously over all possible rejection thresholds, enabling arbitrary post hoc selection of the threshold. Simultaneous validity is achieved by constructing a high-probability envelope for the empirical distribution function of null conformal p-values by sampling from their joint distribution. Furthermore, our framework allows practitioners to modulate the envelope's shape, thereby producing tight bounds in rejection regions of primary interest. We use this flexible approach to derive simultaneous FDP upper bounds for both outlier detection and conformal selection. We demonstrate through synthetic and real-data experiments that the resulting bounds are both valid and substantially less conservative than those derived from existing approaches.

21.
arXiv (CS.AI) 2026-06-19

Can In-Context Learning Support Intrinsic Curiosity?

arXiv:2606.19476v1 Announce Type: cross Abstract: Effective machine learning depends not only on how we model data, but also on what data we choose to collect. While large sequence models have revolutionized data modeling, the problem of automated data selection, or "intrinsic curiosity", remains a significant challenge. Classic approaches incentivize exploration by rewarding an agent based on its "learning progress", which measures how much a newly acquired observation improves a world model's predictive ability. However, evaluating these rewards traditionally requires expensive inner loops of gradient descent updates within each trajectory, rendering them computationally impractical at scale. In this work, we investigate whether the emergent in-context learning (ICL) capabilities of sequence models can eliminate this bottleneck by serving as immediate, update-free world models. Specifically, we evaluate whether an exploration policy can be trained to maximize learning progress, using solely the prediction errors and counterfactual context manipulations of an in-context learner. We first prove that in general Markov decision processes, this is in fact impossible in an unbiased way: the resulting intrinsic rewards either suffer from nuisance terms that bias their estimation of true learning progress, or they cannot be implemented using an in-context learner's prediction errors. Conversely, we prove a positive result for a broad subclass of non-temporal settings, encompassing active learning and Bayesian Experimental Design: here, ICL-derived rewards successfully bound and asymptotically converge to the true learning progress. We corroborate our theory with controlled experiments across continuous and symbolic environments, demonstrating that our ICL-driven framework successfully trains curious data-collection policies that explore optimally.

22.
arXiv (CS.AI) 2026-06-19

ELVA: Exploring Ranking-Driven Universal Multimodal Retrieval

arXiv:2606.20280v1 Announce Type: cross Abstract: Leveraging Multimodal Large Language Models (MLLMs) via contrastive learning has become a mainstream paradigm for improving the performance of Universal Multimodal Retrieval (UMR). However, previous works have ignored the grain blindness when adapting the contrastive paradigm into retrieval tasks. Grain blindness refers to the tendency of the model to overlook grain-level information contained in the query, which is crucial for effectively handling complex queries. This stems from contrastive learning treating samples as a binary classification (positive/negative), while ignoring the different information carried by each negative sample. To address this, we argue that negatives should be treated differently according to their similarity to the positive sample, enabling the model to learn distinct grain information from each negative. In this paper, we introduce a simple but effective framework, called ELVA, a novel rule-based RL framework that mitigates grain blindness through ranking-driven MLLMs. 1) Instead of relying on reward models, we extend Reinforcement Learning with Verifiable Rewards (RLVR) to retrieval tasks, allowing the model to explore new ranking behaviors without explicit ranking labels. 2) By utilizing rule-based rewards, our approach jointly optimizes the ranking of negative samples while enlarging the similarity gap between positive and negative. To more precisely measure grain blindness, we further introduce MRBench, a new benchmark specifically designed for multi-grain query scenarios. ELVA achieves state-of-the-art results across standard retrieval benchmarks, and its notable 13.1% improvement on MRBench further demonstrates its effectiveness in alleviating grain blindness.

24.
arXiv (CS.AI) 2026-06-11

MODF-SIR: A Multi-agent Omni-modal Distilled Framework for Social Intelligence Reasoning

arXiv:2606.12018v1 Announce Type: new Abstract: We propose a multi-agent collaborative framework built upon a lightweight Multimodal Large Language Model (MLLM), specifically designed for social intelligence reasoning. A key feature of our approach is that both the training and inference phases are augmented via knowledge distillation. Within this architecture, multi-modal data pertinent to social intelligence is precisely localized. Furthermore, relevant long-tail events are identified, extracted, and rendered as formatted, explicit text. This formatting strategy prevents critical long-tail information from being overshadowed by head events and environmental noise during the tokenization process. Specifically, we integrate Test-Time Adaptation (TTA) across the entire reasoning pipeline, encompassing the extraction and representation of long-tail events, Chain-of-Thought (CoT) prompting, and self-reflection. This TTA mechanism is also distillation-enhanced, utilizing Low-Rank Adaptation (LoRA) to fine-tune the foundation model exclusively for instance-level reasoning. Extensive evaluations against various open-source and proprietary AI models across multiple benchmarks demonstrate the effectiveness of the proposed framework. With around 30% of training data from IntentTrain, we achieve state-of-the-art results. Codes are available at https://github.com/eeee-sys/MODF-SIR, demo is available at https://huggingface.co/spaces/Harry-1234/MODF-SIR, LoRA is available at https://huggingface.co/Harry-1234/MODF-SIR and the dataset for training router is available at https://huggingface.co/datasets/Harry-1234/IntentRouterTrain.

25.
arXiv (CS.LG) 2026-06-16

Causal-Privacy Audit Workflow for Synthetic and Distilled Data in Dropout Support

arXiv:2606.15940v1 Announce Type: new Abstract: Synthetic and distilled student data are increasingly used to enable privacy-conscious learning analytics, yet their suitability for decision-facing institutional support remains uncertain. In dropout support, generated data must preserve not only predictive utility or distributional resemblance, but also the financial-status evidence used to guide advising, payment-plan assistance, and scholarship-related decisions. Method: This study introduces CaP-Eval, a decision-facing causal-privacy audit workflow for evaluating generated student data under a fixed estimand, timing-aware adjustment design, estimator set, and empirical privacy-governance screen. The workflow compares original, distilled, adversarial synthetic, statistical synthetic, and DPGNet privacy-oriented generated data on predictive utility, treatment-effect fidelity, robustness to alternative estimators, and local training-record proximity. Results: DPGNet and distilled data preserved the original financial-status treatment-effect structure more reliably than the adversarial and Gaussian Copula baselines. DPGNet preserved full direction and rank agreement across epsilon levels; epsilon = 10 produced the smallest non-original IPW and DML deviations, while epsilon = 1 and epsilon = 5 amplified several financial-status contrasts. Distilled data remained highly faithful but retained the strongest local training-record proximity signal. TabularGNet preserved qualitative directions with moderate attenuation, and Gaussian Copula compressed effect magnitudes. Conclusions: Predictive utility, privacy orientation, empirical disclosure signals, and causal fidelity diverged; generated student data require joint audits of direction, magnitude, overlap, and release-governance risk before decision use.