Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-12

Limits of spectral learning under noise

arXiv:2606.13067v1 Announce Type: new Abstract: Learning functional relationships from noisy data is a central problem in scientific inference. Spectral methods approximate unknown functions by expanding them in a basis and estimating the corresponding coefficients from data, but the stability of these coefficients under noise remains poorly understood. Here we study supervised regression with additive label noise using sparse spectral representations across multiple bases and dimensions. We show that noise induces a predictable drift in the learned coefficient vector whose magnitude depends on the effective number of active spectral modes. After whitening the empirical feature geometry, we derive a closed-form expression for the overlap between noisy and noiseless coefficient vectors, revealing a universal degradation curve governed by a single intrinsic noise scale. Numerical experiments across Fourier, Legendre, Bessel, and Haar bases confirm the theoretical prediction. The results demonstrate that spectral learning exhibits a fundamental noise threshold beyond which coefficient estimates become unstable, placing intrinsic limits on recovering functional structure from noisy data.

02.
arXiv (CS.CV) 2026-06-16

BRDFusion: Physics Meets Generation for Urban Scene Inverse Rendering

Inverse rendering of urban scenes from captured videos enables numerous applications, including content creation and autonomous driving simulation. Physically-based rendering methods follow and control lighting physics, but suffer from reconstruction and rendering artifacts. While generative models produce realistic videos, they offer limited consistency and controllability. We present BRDFusion, a unified framework that combines two complementary models for inverse and forward rendering. Specifically, BRDFusion recovers explicit, consistent scene properties with physical modeling and alleviates optimization ambiguity with generative priors. During forward rendering, the physical model provides controllable rendering from the scene configuration, and the generative model denoises and fixes artifacts. Therefore, our method produces high-quality videos while allowing precise control, outperforming baselines in real and synthetic scenes. Moreover, BRDFusion supports novel-view relighting, night simulation, and dynamic object insertion/editing. Project page: https://shigon255.github.io/brdfusion-page/

03.
arXiv (CS.CL) 2026-06-11

Soft-Prompt Tuning for Fair and Efficient LLM Benchmark Evaluation

Benchmark scores often misrepresent a large language model's (LLM's) knowledge, because they rely, e.g., on the model's ability to follow specific formatting requirements. This especially penalizes base models that may know the correct answers but lack the ability – typically introduced in post-training – to structure them as instructed. To overcome this, we propose soft-prompt tuning, an efficient, fair, and architecture-agnostic model evaluation. By optimizing only 10 soft-prompt vectors (roughly 0.0006% parameters for a 7B model) over a short tuning period, we adapt models to specific benchmark formats, closing gaps in format-following and ensuring that underlying knowledge is accurately reflected in benchmark scores. This allows one to fairly compare different base models – trained with various pre-training recipes – on benchmarks without the need for full post-training. We evaluated soft-prompt tuning across 7 models and 7 datasets. The results show that (a) soft-prompt tuning saturates format-following within 80 steps (~640 samples) making it highly efficient, (b) soft-prompt tuning significantly outperforms zero- and few-shot prompting, surfacing base model knowledge that standard prompting misses, that (c) even post-trained models can benefit from soft-prompts to maximize format compliance, and that (d) soft-prompted base model performance predicts post-trained model rankings more reliably than zero- and few-shot baselines, offering a low-cost proxy for downstream model quality. Our contributions include (1) metrics which disentangle format-following and knowledge accuracy, (2) a fairer benchmarking protocol of LLM knowledge, and (3) a cost- and memory-effective recipe to identify optimal pre-training strategies early in LLM development.

04.
medRxiv (Medicine) 2026-06-13

Projected population level impact and cost-effectiveness of clinic and community-based tuberculosis screening approaches

The South Africa National Department of Health have set ambitious targets to scale up TB testing, focusing primarily on clinic attendees. In the context of declining funding for TB care and prevention, the most cost-effective approaches for targeting testing should be identified. We developed a mathematical model of TB in South Africa, explicitly incorporating clinic attendance by sex and HIV/ART status. We simulated six screening approaches over 2026-2035 (individually and in combination): three clinic-based (symptom screening, intensified targeted universal TB testing [TUTT, symptom-agnostic sputum testing of clinic attendees in key risk groups], and intensified TUTT allowing saliva samples) and three targeted community-based (community radiographic screening, symptom screening, and universal Xpert Ultra testing), each implemented at a range of coverage levels. Model outputs were combined with a mechanistic cost function to estimate potential impact and cost-effectiveness from a societal perspective. The most cost-effective standalone approach was community radiographic screening at 10% annual population coverage, with an incremental cost-effectiveness ratio (ICER) of $421 per disability-adjusted life year (DALY) averted. 10/11 scenarios along the expansion path included community radiographic screening at progressively higher coverage, combined with a clinic-based approach. Combining complementary approaches to reach both groups at increased risk of TB (e.g. clinic-based screening) and groups with lower screening coverage (e.g. community-based screening) may increase cost-effectiveness of TB screening, compared to standalone approaches. When designing TB screening strategies, both population risk and existing screening coverage should be considered.

05.
arXiv (quant-ph) 2026-06-12

Quantum-Driven Neuromorphic Computing for Million-Qubit-Scale Workloads

arXiv:2606.12968v1 Announce Type: new Abstract: We introduce Apollo, a 10000 node p-qubit neuromorphic processor fabricated in 16 nm mixed signal CMOS and operating fully at room temperature with a typical analog core power envelope of about 0.5 W. Its fundamental element, the p-qubit, is a bistable stochastic unit whose continuous time state fluctuations are driven by integrated quantum entropy units that inject true quantum derived randomness. This enables ultrafast stochastic transitions at low energy while preserving a classical state representation. Apollo combines these p-qubits with a high degree Hyperion 256 interconnect topology, allowing efficient embedding of dense Ising and QUBO problems with substantially reduced minor embedding overhead compared with sparse annealing platforms. We show that, through the Suzuki Trotter correspondence, the equilibrium statistics and annealing dynamics of the p-qubit network reproduce key properties of transverse field quantum annealing without cryogenic cooling, long lived coherence, or microwave control. Beyond device level validation, Apollo is evaluated on a three dimensional spin glass benchmark previously used to study quantum advantage in superconducting annealers. Across 300 disorder realizations, Apollo reaches substantially lower ground state energies than reported cryogenic quantum annealing hardware, while remaining distinct from classical simulated annealing and simulated quantum annealing. A 350 nm release candidate device experimentally validates the core p-qubit dynamics, thermodynamic sampling correctness, and continuous time annealing behavior. These results establish Apollo as a room temperature, industrially scalable platform for quantum driven energy based optimization, probabilistic inference, generative modeling, and hybrid classical quantum workflows.

06.
arXiv (quant-ph) 2026-06-11

Generating function and Bloch representation for quantum Fisher tensor

arXiv:2603.04615v2 Announce Type: replace Abstract: The Uhlmann relative amplitude between two density matrices is shown to be a generating function, through which the quantum Fisher tensor that contains both the quantum Fisher information matrix and the mean Uhlmann curvature can be obtained via differentiation over system parameters. In the pure state limit, our generating function recovers that of the quantum geometric tensor proposed by Het\'{e}nyi and L\'{e}vay, and also clarifies the fidelity and phase between two quantum states as the generating functions of the quantum metric and Berry curvature, respectively. A generic expression for the quantum Fisher tensor in terms of the Bloch representation of density matrices is derived, which facilitates the calculation of the tensor, mean Uhlmann curvature, and geometric properties derived from the quantum Fisher information matrix. Canonical ensembles of spins are adopted to demonstrate our formalism, which reveals a constant Ricci scalar, a vacuum Einstein equation, and a cosmological constant on the 3D Euclidean manifold of the magnetic field.

07.
Science (Express) 2026-04-23

Structural N- and O-glycans revealed by high-resolution cryo-EM analysis of tubular mastigonemes | Science

作者: 未知作者

The chemical complexity and non-templated biosynthesis of glycans have posed significant challenges for establishing sequence-structure relationships. Here we report cryo-EM structures of tubular mastigonemes from a golden alga species, Ochromonas danica , in which a large number of N- and O-glycans are resolved at 1.8-2.2 Å resolution. Beyond high-mannose and complex N-glycans, we identify a non-canonical N-glycan on the Ala- Asn -Asp (A N D) motif. The surface spikes comprise dense O-glycans coating PSXX tetrapeptide repeats, with two glycans linked on trihydroxylated proline and one on serine per repeat. In addition to various types of sugars and their covalent modifiers, water molecules (>10% of resolved volume) and cations are clearly resolved and mediate the structural assembly. Our study establishes a framework for investigating glycan folding in high-order biological assemblies.

08.
arXiv (CS.AI) 2026-06-11

CoVar: Confidence-Variance-Guided Pseudo-Label Selection for Semi-Supervised Learning

arXiv:2601.11670v3 Announce Type: replace-cross Abstract: Pseudo-label selection in semi-supervised learning is commonly driven by maximum-confidence thresholds, yet confidence alone can be unreliable under model overconfidence and class imbalance. We propose CoVar, a confidence–variance framework that assesses pseudo-label reliability by jointly modeling Maximum Confidence (MC) and Residual-Class Variance (RCV). Starting from entropy minimization, we derive a second-order cross-entropy approximation showing that low-loss pseudo-labels are favored when MC is high and RCV is low, with a confidence-dependent penalty that becomes stronger for near-certain predictions. Based on this criterion, CoVar embeds predictions into a two-dimensional confidence–variance space and uses SVD-based spectral relaxation to separate reliable and unreliable predictions without hand-tuned confidence thresholds. Cluster-wise Gaussian weighting then converts this separation into per-sample training weights. The resulting weights can be integrated into existing semi-supervised segmentation and classification pipelines during training and introduce no inference-time overhead. Experiments on PASCAL VOC 2012, Cityscapes, CIFAR-10, CIFAR-100, SVHN, and STL-10 show clear gains on VOC and Cityscapes under matched backbones, as well as competitive or improved error rates on standard classification benchmarks. These results indicate that residual-class dispersion provides a useful signal complementary to confidence for robust pseudo-label selection.

09.
arXiv (CS.CV) 2026-06-16

Human Cognition in Machines: A Unified Perspective of World Models

This report of world models distinguishes prior works by the cognitive functions they innovate. Many works claim an almost human-like cognitive capability in their world models. To evaluate these claims requires a proper grounding in first principles from human and machine cognition theory. In moving towards human-like world models we present a conceptual unified framework for world models that fully incorporates all the cognitive functions (i.e., memory, perception, language, reasoning, imagining, motivation, and metacognition) and identify gaps in existing research as a guide for future states of the art. In particular, we find that motivation (especially intrinsic motivation) and metacognition remain drastically under-researched, and we propose concrete directions to address these gaps informed by active inference and global workspace theory. We also introduce epistemic world models, a new category encompassing agent frameworks for scientific discovery that operate over structured knowledge. Our taxonomy, applied to video, embodied, and epistemic world models, suggests research directions where prior taxonomies have not.

11.
arXiv (quant-ph) 2026-06-11

Mixed-State Topological Order under Coherent Noise

arXiv:2411.03441v2 Announce Type: replace Abstract: Mixed-state phases of matter under local decoherence have recently garnered significant attention due to the ubiquitous presence of noise in current quantum processors. One of the key issues is understanding how topological quantum memory is affected by realistic coherent noise, such as random rotation noise and amplitude-damping noise. In this work, we investigate the intrinsic error threshold of the two-dimensional toric code (TC), a paradigmatic topological quantum memory, under these types of coherent noise by employing both analytical and numerical methods based on the doubled-Hilbert-space formalism. A connection between the mixed-state phase of the decohered TC and a non-Hermitian Ashkin-Teller-type statistical-mechanics model is established, and the mixed-state phase diagrams under the coherent noise are obtained. We find remarkable stability of mixed-state topological order under random rotation noise with axes near the $Y$-axis of qubits. We also identify intriguing extended critical regions at the phase boundaries, highlighting a connection with non-Hermitian physics. We argue that these phase boundaries provide upper bounds for the intrinsic error threshold, beyond which quantum error correction becomes impossible. We complement these findings by estimating the error thresholds for random rotation noise under standard quantum error correction, thereby providing lower bounds on the intrinsic error threshold.

12.
medRxiv (Medicine) 2026-06-15

Artificial Intelligence-Based Detection of Airway Mucus Plugs on CT and Associations With Clinical Outcomes in COPDGene

RATIONALE: Airway mucus plugging is a clinically relevant manifestation of airway pathology in chronic obstructive pulmonary disease (COPD) and is associated with increased mortality even in early disease; however, visual computed tomography (CT) assessment is subjective and labor intensive. OBJECTIVES: To develop an AI-based quantitative CT method for automated detection of airway mucus plugging and evaluate associations with physiologic impairment and clinical outcomes. METHODS: Inspiratory CT scans from 8,971 COPDGene Phase 1 (GOLD 0-4 and PRISm) participants were analyzed. An AI-based framework combining 3D airway segmentation discontinuities and convolutional neural network classification identified mucus plug obstructions, yielding mucus plug burden (total plug count). Associations with outcomes were evaluated using covariate-adjusted models. MEASUREMENTS AND MAIN RESULTS : Higher mucus plug burden was associated with lower post-bronchodilator FEV % predicted ({rho} = -0.41; P < 0.001), greater air trapping (LAA < -856 HU; {rho} = 0.33; P < 0.001), worse health status (SGRQ; {rho} = 0.31; P < 0.001), and shorter 6-minute walk distance ({rho} = -0.26; P < 0.001). Among GOLD 1-4 participants, mucus plug presence was independently associated with increased all-cause mortality (adjusted hazard ratio, 1.28; P < 0.005) and exacerbation frequency (adjusted incidence rate ratio, 1.32; P < 0.005). Plug presence was also associated with increased respiratory mortality across GOLD categories and cardiovascular mortality in GOLD 1-2. CONCLUSIONS: AI-based quantitative CT assessment of airway mucus plugging provides a scalable, reproducible measure associated with physiologic impairment and adverse outcomes in COPD, supporting its role in risk stratification and future therapeutic studies.

13.
arXiv (CS.AI) 2026-06-11

The Standard Interpretable Model: A general theory of interpretable machine learning to deductively design interpretable methods using Lagrangian mechanics

arXiv:2606.12289v1 Announce Type: cross Abstract: As Artificial Intelligence models grow in complexity, interpretability has become an indispensable tool for understanding, debugging, and controlling their computations. However, interpretability lacks general theories to deductively design interpretable methods. This gap between theories and methods results in a fragmented literature and inconsistent evaluation protocols. To fill this gap, we introduce the Standard Interpretable Model (SIM), a general theory grounded in Lagrangian mechanics that enables the deductive design of interpretable methods. Specifically, the SIM summarises, in a set of premises, what interpretability is for a target user. From these premises, the SIM systematically derives interpretability symmetries and corresponding constraints, which shape the landscape of a Lagrangian whose minima correspond to optimal interpretable models. To reach the minima, one can either update the parameter values of an opaque model to make it more interpretable or compile constraints into an interpretable architecture. We empirically show that the SIM identifies and solves limitations of existing methods (including traditional, concept-based, and mechanistic interpretability), highlights underexplored research directions, and informs the design of core programming interfaces. Beyond being a research method, the deductive nature of the SIM offers pedagogical grounding for interpretability curricula and may shift the scientific community's perspective of a discipline that has long been fragmented.

14.
medRxiv (Medicine) 2026-06-16

A MULTICENTER SWEDISH HISTOPATHOLOGY IMAGE DATASET OF PEDIATRIC CENTRAL NERVOUS SYSTEM TUMORS

Refined detection methods, more detailed tumor characterization, and adequate distinction between different pediatric tumor subtypes are necessary to improve diagnosis and treatment, enable precision medicine, and advance patient prognosis. However, the application of computational approaches to pediatric brain tumors remains limited, largely due to the lack of accessible datasets. To address part of this gap, we provide whole slide images (WSIs) of hematoxylin and eosin (H&E)-stained tissue sections from all pediatric central nervous system (CNS) samples collected in Sweden between 2013 and 2023. These data represent a population-based national cohort encompassing all six pediatric oncology centers in Sweden and are available through the Swedish Childhood Tumor Biobank (BTB). The dataset includes 1,446 WSIs of sufficient image quality with confirmed CNS tumor diagnoses, derived from 537 unique subjects (562 cases). In addition, diagnosticrelevant clinical information is included. Corresponding whole-genome sequencing (WGS), wholetranscriptome sequencing (WTS), and methylation array data are available for most tumor samples through separate resources. This H&E dataset has been specifically curated to support artificial intelligence-based analyses, while also serving broader applications in medical research and education. When combined with matched molecular data, it provides a valuable resource for advancing multimodal and precision diagnostic approaches in the pediatric population. Refined detection methods, more detailed tumor mapping and adequate distinction between different subtypes of pediatric tumors are necessary to improve treatment, enable precision medicine and improve patient prognosis. Application of computational algorithms for pediatric brain tumors is very limited mainly due to the unavailability of pediatric histology brain tumor data sets. To enable the development of AI models comprehensive datasets covering a wide range of pediatric brain tumors are needed.

15.
arXiv (quant-ph) 2026-06-12

Reduced basis algorithm for solving nonlinear differential equations on quantum computers

arXiv:2606.13457v1 Announce Type: cross Abstract: As quantum computing moves toward scientific computing applications, nonlinear differential equations remain a central challenge since quantum evolution is intrinsically linear. In this work, we introduce a reduced basis algorithm (RBA) for polynomial nonlinear ordinary differential equations (ODEs) and spatially discretized partial differential equations (PDEs). After time discretization, the method composes the resulting polynomial update map over $m$ timesteps, identifies the reduced monomial basis appearing in this composed map, and constructs a linear RBA operator whose action recovers the exact $m$-timestep nonlinear dynamics. Thus, at the level of the chosen discrete update rule, the method introduces no additional approximation error beyond the time discretization error. The qubit number requirement is governed by the size of the reduced monomial basis. For an $n$-dimensional polynomial ODE system of degree $p>1$, the lifted register requires at most $q_m^{\mathrm{ODE}} = O(nm\log p)$ qubits in the full basis scenario. For PDEs discretized on $N^D$ grid points, a locality-based construction requires at most $q_m^{\mathrm{PDE}} = O(D\log N + n m^{D+1}\log p)$ qubits. Hence, the dependence on the grid size remains logarithmic, while the nonlinear overhead is controlled by local reduced basis size. The main computational burden is moved from the quantum computer to a classical preprocessing step, where the reduced monomial basis and RBA operator are constructed for the chosen timestep window. Through numerical tests on the Lorenz system and the one-dimensional Burgers equation, we verify that the RBA reproduces the corresponding discrete time nonlinear dynamics exactly, while exposing the trade-off between timestep composition, reduced basis growth, and locality.

16.
arXiv (CS.AI) 2026-06-15

Moonlight in Latent Space: Chirality and Structural Correspondence Between Beethoven's Op. 27 No. 2 and Machine Learning Mechanisms

arXiv:2606.14612v1 Announce Type: cross Abstract: We show that the three movements of Beethoven's "Moonlight Sonata" (Op. 27 No. 2) instantiate three distinct machine learning architectures – not by analogy, but by structural correspondence. Through computational analysis of the score (entropy, Jensen-Shannon divergence, dissonance, hand distributional overlap, self-similarity matrices, temporal memory decay, and contextual pitch embeddings), we establish four counterintuitive findings: (1) perceived musical "temperature" is governed by throughput, not distributional width; (2) the lightest movement carries the highest dissonance; (3) the movements implement streaming, recurrent, and periodic positional encoding memory architectures; and (4) the same pitch class acquires different contextual identities across movements, analogous to contextual vs.static embeddings in NLP – and unsupervised clustering recovers the tonal structure without music-theoretic input. We construct a reverse sonification (decoding analytical features back into MIDI) and quantify the chirality of the encode-decode cycle: what distributions preserve and sequential ordering destroys. Prompted by a listener's observation that the decoded piece sounds like "mirror isomers that can't be superimposed," the chirality measurement reveals reconstruction loss increasing monotonically with n-gram order. Bootstrap baselines and subsample checks confirm all movements carry sequential information above noise, though raw values are confounded by sample size. Cross-domain comparison shows natural language has higher chirality than music, reflecting stronger sequential constraints.

17.
arXiv (CS.CV) 2026-06-16

CPS4: Class Prompt driven Semi-Supervised Spine Segmentation with Class-specific Consistency Constraint

Vision Language Model (VLM) has great potential to enhance the quality of pseudo labels in semi-supervised spine segmentation by leveraging textual class prompts to generate segmentation map, but no one has studied it yet. Although promising, it lacks explicit constraints to ensure consistency between spine class prompts and spine unit region, resulting in unsatisfactory performance in multi-class segmentation map generation. In this paper, we propose CPS4, the first text-guided semi-supervised spine segmentation network using class prompts to enhance the quality of spine pseudo labels. Specifically, CPS4 is implemented through two training stages. (i) Class-specific consistency constrained VLM pretraining stage: we propose token- and pixel-level attention loss to optimize the consistency between class prompts and spine units, forcing the textual class prompt to be closely coupled with the target spine unit in the semantic space. (ii) Class Prompt driven semi-supervised spine segmentation stage: using the pretrained vision-text encoder, we derive each class-specific binary segmentation map for the unlabeled spine image and integrate them into an unified multi-class segmentation map, improving the quality of the spine pseudo label generated by the semi-supervised spine segmentation network. Experimental results show that our CPS4 achieves superior spine segmentation performance with Dice of 80.44%, only using 5% labeled data on the public spine segmentation dataset, surpassing popular semi-supervised learning and VLM methods. Our code will be available.

18.
arXiv (CS.CL) 2026-06-17

HistoRAG: Embedding Historical Methodology in Retrieval-Augmented Generation Through Critical Technical Practice

Retrieval-Augmented Generation (RAG) is the prevailing architecture for grounding language model outputs in external evidence, yet its dominant evaluation paradigms and default configurations remain oriented toward factual question-answering. For interpretive disciplines such as historical studies, RAG embeds assumptions that conflict with scholarly practice. We introduce HistoRAG, a framework that translates historiographical principles into concrete architectural interventions. Separated retrieval and generation decouples source discovery from interpretation, temporal windowing enforces balanced source representation across the research period as a methodological requirement of historical inquiry, and LLM-as-judge evaluation makes relevance judgments transparent and contestable. We evaluate these interventions using SPIEGELragged, applied to 102,189 articles from Der Spiegel (1950-1979). Each intervention addresses a measurable deficiency in standard RAG: era-specific vocabulary retrieves zero chunks from the 1950s when using 1970s terminology, evidence of the temporal skew that motivates windowing; vector similarity and LLM-assessed relevance correlate only weakly (Spearman rho = 0.275), motivating post-retrieval evaluation; and keyword-based and semantic retrieval surface largely disjoint source pools, motivating an architecture in which both operate as complementary retrieval layers under a shared LLM evaluation filter. We also introduce the concept of Zwischentexte (intermediate texts that function as interpretive proposals rather than findings) as a framework for responsible integration of LLM-generated text into scholarly practice. The architecture offers a model for how domain-specific epistemological commitments can be translated into RAG design decisions, and may transfer to other interpretive disciplines working with large corpora.

19.
arXiv (CS.LG) 2026-06-18

Unreduced Persistence Diagrams for Topological Machine Learning

arXiv:2507.07156v2 Announce Type: replace-cross Abstract: Supervised machine learning pipelines trained on features derived from persistent homology have been experimentally observed to ignore much of the information contained in a persistence diagram. Computing persistence diagrams is often the most computationally demanding step in such a pipeline, however. To explore this dynamic, we introduce several methods to generate topological feature vectors from unreduced boundary matrices and investigate their theoretical and computational properties. We compared the performance of pipelines trained on vectorizations of unreduced PDs to vectorizations of fully-reduced PDs across several data and task types. Our results indicate that models trained on PDs built from unreduced diagrams can perform on par and even outperform those trained on fully-reduced diagrams on some tasks. We also benchmarked the computational performance of an algorithm for computing unreduced diagrams, which was implemented as a heavily modified version of Ripser. These computations are parallelizable and required an order of magnitude less memory on average compared to computing full persistence diagrams. Our results suggest that machine learning pipelines which incorporate topology-based features may benefit in terms of computational cost and performance by utilizing information contained in unreduced boundary matrices.

20.
arXiv (CS.AI) 2026-06-19

ORAgentBench: Can LLM Agents Solve Challenging Operations Research Tasks End to End?

arXiv:2606.19787v1 Announce Type: new Abstract: Large language models are increasingly deployed as autonomous agents for multi-step tasks in executable environments, yet their ability to perform realistic operations research (OR) work remains unclear. Existing OR evaluations often decouple modeling from solving, rely on pre-formalized or text-only instances, and rarely test the full workflow from operational artifacts to validated decisions. In this work, we introduce ORAgentBench, an execution-grounded benchmark for evaluating autonomous agents on challenging end-to-end operations research tasks. It contains 107 human-reviewed tasks across diverse operational scenarios, each packaged in an isolated environment with a natural-language brief, multi-file data, configuration artifacts, and a required submission schema. Agents must write and run solution code, and their submissions are evaluated by hidden validators for schema validity, hard-constraint feasibility, and normalized objective quality. Experiments with fourteen frontier agent-model configurations show that current agents remain far from reliable OR practice. The best agent passes only 35.51% of all tasks and 20.59% of hard tasks, and many feasible submissions still fall below the required quality threshold. Failure analysis further shows that errors are dominated by strategic weaknesses, including missed operational rules, brittle formulations, weak feasible-solution construction, and insufficient solution improvement. OR-specific procedural skills increase hard-task feasibility, but do not reliably improve solution quality or pass rate. These results suggest that progress in OR agents requires moving beyond plausible optimization code toward dependable, high-quality operational decision-making.

21.
arXiv (CS.CV) 2026-06-16

Tool-IQA: Augmenting Image Quality Assessment with Simple Tools

Vision-Language Models (VLMs) have been increasingly adopted for Image Quality Assessment (IQA). However, current methods typically employ a static one-shot scoring paradigm, despite the fact that humans assess image quality through dynamic visual inspection, e.g., selectively adjusting views to verify details and subtle artifacts. Specifically, relying solely on a single-pass observation introduces two primary limitations: first, perceiving the image only at a global scale restricts the assessment of finer local details; second, the original intensity distribution of the image may overwhelm the visibility, leading to insufficient inspection of image quality. To address these issues, we propose Tool-IQA, shifting the assessment mechanism from passive scoring to a tool-augmented workflow. In particular, we equip VLMs with simple yet effective view tools: a Magnifier to inspect local details, and a Gamma Corrector to uncover visibility and hidden artifacts. The assessment follows a structured pipeline that consists of an initial observation with rubric notes, a tool-augmented in-depth inspection, and a final quantification for calibrated quality score. Furthermore, to ensure efficient and purposeful tool callings, we introduce a batch-aware training strategy to reward tool interactions that can yield positive contributions rather than simply encouraging usage. Experiments on a variety of IQA benchmarks demonstrate that, with effective tool calling and calibrated assessment, our proposed Tool-IQA significantly outperforms existing state-of-the-art models, e.g., it achieves a PLCC of 0.854 on the challenging CLIVE dataset.

22.
arXiv (CS.LG) 2026-06-17

Predictive Analytics in E-Commerce for CustomerBehavior Forecasting using hybrid Ret-DNN withXGBoost Model

arXiv:2606.17931v1 Announce Type: new Abstract: In recent years, electronic (E) commerce services have rapidly increased in the daily lives of people, which helpsthem to purchase products online. However, retail platforms have struggled to understand customer behavior and make it difficult to predict their future purchases. To overcome these challenges, this study proposes a hybrid Retail Deep NeuralNetwork (Ret-DNN) with an Extreme Gradient Boosting(XGBoost) model for capturing temporal features and tabular dynamics of retail data. First, data were sourced from a UnitedKingdom (UK)-based online retailer that contains transactions with almost 500,000 records. Then, the collected data were pre-processed using a series of techniques, such as data cleaning, outlier handling, temporal feature extraction, feature encoding, and z-score normalization, to ensure that the data were ready for model training and testing. Subsequently, the preprocessed data were fed into the Ret-DNN model, which acts as a feature extractor to understand the complete context of customer transactions. Further, the extracted data were fed as input into the XGBoost model, which predicted the final output as the purchase probability of customers. Finally, the proposed Ret-DNN XGBoost model achieved better results by attaining aMean Absolute Error (MAE) 0.2193 when compared to the existing Ret-DNN model. Keywords: Customer behavior forecasting, extreme gradientboosting, electronic commerce, predictive analytic, retail deepneural networks.

23.
medRxiv (Medicine) 2026-06-11

Neighborhood socioeconomic status associated with post-stroke cognitive impairment: a retrospective cohort study

Background: Late complications after stroke (LCAS), including cognitive symptoms, impact quality of life and recovery. It is not known if neighborhood-level measures of socioeconomic status (SES) influence LCAS. This study assessed associations between SES measures, including neighborhood income inequality (Gini) and area deprivation index (ADI), and cognitive symptoms after acute ischemic stroke (AIS) in a hospital leveraging active surveillance of LCAS. Methods: This retrospective cohort study included 512 patients hospitalized with AIS at Tufts Medical Center with subsequent follow-up (between zero and three months or between three and twelve months) in the Stroke Clinic from 1/1/2018 - 12/31/2022. Using ZIP code data, patients were characterized as low Gini (low inequality) and high ADI (high deprivation) (Gini = 5) by state medians. These variables were combined, indicating patients who were living in both a low Gini and high ADI neighborhood to evaluate the effects of living in a homogeneously deprived area. There were 206 and 281 patients in the low Gini and high ADI groups respectively. 140 patients lived in a low Gini and high ADI neighborhood. The multivariable logistic analysis assessed the likelihood of cognitive symptoms, adjusting for age, race, ethnicity, sex, NIH Stroke Scale (NIHSS), thrombolysis, active LCAS surveillance, poverty, and ADI-Gini combination. Results: There were no associations between high ADI (OR: 1.03, 95% CI: 0.67 ? 1.57) or low Gini (OR: 1.74, 95% CI: 0.98 ? 3.07) alone and cognitive symptoms after AIS. However, the combined variable demonstrated increased likelihood of cognitive symptoms in the high ADI-low Gini group (OR: 1.82, 95% CI: 1.08 ? 3.06). Conclusions: This study suggests that individuals living in homogeneously deprived neighborhoods report higher likelihood of cognitive symptoms after AIS. Further studies with increased power are needed to investigate the underlying causes of these disparities and to develop interventions to reduce these complications.

24.
bioRxiv (Bioinfo) 2026-06-12

PeptiDIA: A Machine Learning Framework for Enhanced Peptide Identification in Fast-Gradient Data-Independent Acquisition Proteomics

Data-independent acquisition (DIA) mass spectrometry has become increasingly prevalent in proteomics as advances in instrumentation, chromatography, and computational analysis have enabled robust proteome identification across complex biological samples. However, analytical depth achieved with fast chromatographic gradients remains lower than that obtained using long-gradients, reflecting a throughput-depth trade-off. Here, we present PeptiDIA, a machine learning framework that enhances peptide identification in fast-gradient DIA data by leveraging paired fast and long-gradient acquisitions from identical samples. PeptiDIA processes DIA-NN outputs generated at relaxed false discovery rate thresholds to obtain expanded candidate peptide pools and trains gradient-boosted decision tree models using long-gradient identifications as reference labels. The model integrates DIA-NN features with engineered peptide descriptors and applies isotonic regression to calibrate probabilities, enabling controlled peptide recovery relative to the long-gradient reference. Applied to human and murine datasets spanning six tissues acquired on an Orbitrap Exploris 480, PeptiDIA increased peptide identifications by 25-34% at 1% target reference-discordance rate (RDR) and increased the number of protein groups containing at least one rescued peptide by 15-17%. Overall, PeptiDIA improves the identification depth of fast-gradient DIA-NN workflows without altering acquisition strategies. The framework is available as a web application and command-line tool at https://github.com/Jordano700/PeptiDIA.

25.
arXiv (CS.LG) 2026-06-18

ActiTect: A Generalizable Machine Learning Pipeline for REM Sleep Behavior Disorder Screening through Standardized Actigraphy

arXiv:2511.05221v3 Announce Type: replace Abstract: Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of $\alpha$-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they become inoperable without a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features characterizing activity patterns. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and on two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess real-world robustness, leave-one-dataset-out cross-validation across the internal and external cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). A complementary stability analysis showed that key predictive features remained reproducible across datasets, supporting the final pooled multi-center model as a robust pre-trained resource for broader deployment. By being open-source and easy to use, our tool promotes widespread adoption and facilitates independent validation and collaborative improvements, thereby advancing the field toward a unified and generalizable RBD detection model using wearable devices.