Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (quant-ph) 2026-06-19

Computing noise-canceling observables via Pauli propagation

arXiv:2606.20441v1 Announce Type: new Abstract: The pursuit of quantum advantage is driving the co-evolution of quantum processors and classical simulation methods. Despite advances in scale and quality, the accuracy of quantum simulation is ultimately limited by error rates and sampling overheads. Similarly, while classical simulation methods such as Pauli propagation have made remarkable progress, their accuracy is ultimately limited by the exponential growth of operator paths and the truncations needed to control memory and runtime. Here we show that these complementary limitations can be mitigated by embedding Pauli propagation within a hybrid error-mitigation framework that reduces quantum sampling overhead while achieving lower truncation errors with fewer classical resources than traditional Pauli propagation alone. In this framework, a target observable is classically propagated through noise-canceling inverse channels, producing a modified observable that is measured directly on a quantum processor. We prototype two implementations and benchmark their performance numerically on canonical models that challenge traditional Pauli propagation. We also perform experiments on a quantum processor using 56 superconducting qubits, revealing the tradeoffs of their respective truncation strategies. These results illustrate how classical and quantum resources can be orchestrated to extend observable estimation beyond the limits of either approach alone, providing a foundation for quantum-centric supercomputing and future demonstrations of quantum advantage.

02.
arXiv (CS.AI) 2026-06-16

Who Drifted: the System or the Judge? Anytime-Valid Attribution in LLM Evaluation Pipelines

作者:

arXiv:2606.15474v1 Announce Type: new Abstract: Continuous evaluation of LLM products relies on a strong LLM judge treated as ground truth: a cheap monitor scores every interaction and a team is paged when the score drifts down. But the judge is itself a model behind an API, and a silent version bump or scoring-prompt update changes how it scores – so every drift alarm is ambiguous between a worse product and a changed judge. We resolve the ambiguity with a fixed, human-labeled anchor set that the current judge re-scores at a steady interleave, a second betting e-process on the judge-versus-human gap, and a guard-window rule returning a verdict in {none, system, judge}. We prove anytime-validity, one-way identification (only the judge can move the anchors), an attribution race whose design law is that the anchors must out-run the main process they guard, and process orthogonality. On two real judge changes, a silent version bump is detected as judge drift in 60/60 runs with zero judge-to-system misattribution, and a contaminating strict-prompt change is correctly attributed on 110 of 120 runs at guard width 300 – while the industry-default rolling z-test false-alarms on 75% of drift-free streams. Every experiment replicates on a second domain (TL;DR summarization) with nothing re-tuned, and where the domains differ the differences are the ones the race predicts: the strict-prompt change shifts scores harder there, so the anchors fire faster and attribution becomes perfect (240/240). The monitor runs at approximately 0.64 of the cost of strong-judging every item, or 0.21 in a cheaper-but-deafer regime.

03.
arXiv (CS.AI) 2026-06-15

Transforming Shape Schemas with Composable Property-Graph Queries (Extended Version)

arXiv:2606.14309v1 Announce Type: cross Abstract: Property graphs may be constrained by schemas that inform both query engines and human users about the shape of valid data, enforcing a contract between data provider and consumer. Composable property-graph queries transform input graphs into output graphs. Then, the question arises of which schema can be expected after one (or several) transformation steps. We investigate how schema constraints can be inferred given an input schema and a transforming query. Specifically, we propose a reasoning procedure that, given an input schema in ProGS and a query in G-CORE infers an output schema. Since graph updates will happen frequently, our inference procedure does not rely on graph instances, such that the computed output schema applies to all graphs originating from any input graph complying with the input schema. Related work has addressed this problem for SPARQL CONSTRUCT queries, encoding it in Description Logics (DLs) so that the output schema is entailed by axioms inferred from input schema and queries. Property graphs and their queries, however, complicate the matter, as property graphs feature label and property annotations as well as first-class edges. Thus, reification has to be used in one way or another, though available DLs lack the means to encode such features directly. We approach this novel challenge via a family of mappings for i) property graphs reified in RDF, aligned with ii) a mapping from ProGS to SHACL and iii) a mapping from G-CORE to SPARQL CONSTRUCT queries. In this manner, schema inference for property graphs becomes manageable, as we break apart the problem through the extra mapping layer and utilize efficient DL reasoners. We develop the metatheory regarding the soundness of inferred schema constraints and the semantic equivalence of mapped schemas and queries.

04.
arXiv (CS.AI) 2026-06-19

cAPM: Continual AI-Assisted Pace-Mapping with Active Learning

arXiv:2606.19373v1 Announce Type: cross Abstract: Ventricular tachycardia is a life-threatening rhythm disorder and a major cause of sudden cardiac death. Pace-mapping is a clinical procedure for identifying the intervention target during catheter ablation of VT. It requires clinicians to pace different sites in the ventricles and rapidly interpret the resulting electrocardiograms to determine where to pace next or whether a target site has been identified. Active learning AI models have been proposed to guide clinicians to the next pacing site, showing promise in reducing the number of pacing sites and improving the efficiency of pace-mapping. Existing methods require retraining each target without the ability to transfer knowledge across multiple VTs within the same patient or across patients. We introduce cAPM for continuous AI-assisted pace-mapping to capture and transfer knowledge accumulated from past pace-mapping data to reduce the number of pace-mapping data needed for future target VTs. This is made possible by a task-agnostic surrogate neural network that learns the mapping from pacing sites to 12-lead ECG morphology, an active-learning strategy that refines this surrogate model by selecting the most informative pacing site for each target, and a continual learning strategy to do so sequentially while retaining knowledge from prior targets. Evaluated on an in-silico testbed consisting of sequentially-presented localization tasks across different physiological conditions and ventricular geometries, cAPM with and without replay of past data samples achieved an 81% probability of localizing within clinical tolerance (5 mm accuracy) using 4.5 pace-mapping sites, compared to the state-of-the-art active-learning method achieving 38% probability using 13.7 pacing sites. These results provide a strong basis for preparing cAPM towards in-vivo preclinical and clinical studies where it can be used to guide pace-mapping.

05.
arXiv (CS.AI) 2026-06-16

Driving, Fast or Slow? Neuro-Symbolic Guidance for Motion Prediction in Multi-Modal Ground Mobility

arXiv:2606.15251v1 Announce Type: cross Abstract: Accurate and interpretable motion prediction for heterogeneous traffic spaces, including pedestrians, bicycles, cars, and trucks, is essential for safe autonomous navigation. Nevertheless, state-of-the-art approaches remain predominantly black-box, lacking explicit encoding of the regulatory and behavioral constraints of real-world mobility. We propose Trajectory Compliance-Shaping (TraCS), a neuro-symbolic framework that augments existing black-box motion prediction backbones with interpretable and probabilistic first-order logic. To do so, TraCS employs an agentic code-generation pipeline to bridge the gap between natural-language descriptions of traffic regulations and probabilistic motion prediction. Furthermore, TraCS employs a reactive data-streaming inference engine that maintains and efficiently updates compliance landscapes as scenes evolve. To prevent TraCS from overconfidently steering the backbone's predictions in the wrong direction, we propose a neural confidence rating learned as a context-aware attenuation of the compliance signal. We demonstrate on the Argoverse 2 benchmark how TraCS consistently improves state-of-the-art prediction backbones, showing that probabilistic and symbolic compliance reasoning is a broadly applicable and computationally efficient complement to purely neural motion predictors.

07.
arXiv (CS.CV) 2026-06-12

Bounding Boxes as Goals: Language-Conditioned Grasping via Neuro-Symbolic Planning

For robotics to be effectively integrated into household or industrial environments, machines must adapt to natural-language prompts in real time. Although Vision-Language Models (VLMs) have enabled zero-shot generalization in robot task and motion planning (TAMP), current state-of-the-art approaches often remain computationally "heavyweight" or require extensive training on thousands of demonstrations. We present GRASP (Grounded Reasoning and Symbolic Planning), a framework designed as a step toward open-vocabulary tabletop manipulation. Our approach leverages a pretrained VLM to translate natural-language queries into neuro-symbolic goal states, grounded in the physical world via a bounding-box detection pipeline. Unlike methods that rely on fixed color lists or hard-coded coordinates, GRASP enables robots to interpret abstract spatial concepts such as "top shelf" and execute tasks without additional fine-tuning. We achieve 73.3% overall success across 90 real-robot trials at three difficulty levels, requiring no task-specific training.

08.
arXiv (CS.LG) 2026-06-12

Out-of-Distribution (OOD) Detectors for Open-Set RF Fingerprinting

arXiv:2606.12718v1 Announce Type: new Abstract: Radio-frequency (RF) fingerprinting systems must operate in open-world environments where signals from unknown transmitters and temporal drift introduce distribution shift at test time. Out-of-distribution (OOD) detection provides a natural framework for this problem, yet its application to RF fingerprinting (RFF) remains limited. A key barrier to their adoption is that most OOD detectors require auxiliary OOD data for parameter tuning, an assumption that is difficult to satisfy in RF environments where representative OOD data is impractical to collect. In this work, we introduce a promising set of OOD detection methods from the machine learning literature to open-set RFF domain. We present these methods within a unified mathematical framework based on information theory, which is a natural framework for communication systems. Our framework allows for the systematic analysis of methods and development of new methods. We further demonstrate the applicability of recent work on tuning OOD detectors without given OOD tuning data for open-set RFF. We evaluate on the POWDER RF fingerprinting dataset, showing that detectors tuned without any given OOD data achieve performance comparable to baselines with access to true OOD tuning data and greatly out-perform baseline approaches without access to true OOD tuning data, showcasing the practical viability for the RFF problem.

09.
arXiv (CS.LG) 2026-06-11

TaskFusion: Continual Anomaly Detection for Heterogeneous Tabular Data

arXiv:2606.11844v1 Announce Type: new Abstract: Continual anomaly detection in tabular data is challenging and remains largely underexplored, particularly in settings with heterogeneous feature schemas, distribution shifts, and severe class imbalance. In many real-world applications, data arrive sequentially from diverse domains, rendering conventional continual learning methods ineffective due to their reliance on a fixed input space. We propose a continual learning (CL) method, which can overcome these challenges and continually learn from different tasks. Our method consists of three main parts: our AGF model, Taskfusion augmentation, and outlier exposure. The AGF-model maps task-specific features into a shared space, then aligns distributions to reduce representation drift, and learns anomaly decision boundaries in the aligned space. To improve stability, we introduce Taskfusion augmentation, combining boundary-aware interpolation within tasks to refine the model anomaly boundaries and cross-task mixing to transfer anomaly structure across datasets. To handle class imbalance and memory constraints, we employ tabular dataset distillation to store compact synthetic replay samples, which are jointly used with augmented data in an outlier exposure objective for robust anomaly detection. We evaluate the approach on 21 heterogeneous datasets across multiple domains. Results show that our approach substantially improves continual anomaly detection performance over sequential fine-tuning and other CL baselines while reducing catastrophic forgetting and maintaining stable detection across heterogeneous datasets.

10.
arXiv (CS.CL) 2026-06-11

ChartFI: Benchmarking Faithfulness and Insightfulness of Chart Descriptions from Multimodal Large Language Models

Chart descriptions are essential for accessibility, cross-modal retrieval, and assisting readers in extracting insights from complex visualizations. As multimodal large language models (MLLMs) are increasingly adopted for automated chart description generation, a critical question arises: how faithfully and insightfully do these models actually describe charts? Current benchmarks fall short on two fronts: existing datasets consist of simple, homogeneous charts paired with shallow, fact-enumerating descriptions; and prevailing metrics fail to capture the multi-faceted nature of description quality. To address these gaps, we present the Chart Faithfulness and Insightfulness Benchmark (ChartFI-Bench). We first summarize four dimensions that characterize high-quality chart descriptions: factual accuracy, salient feature emphasis, domain-informed guidance, and chart-text complementarity. Guided by these dimensions, we construct a high-quality benchmark comprising 896 chart-description pairs, which feature visually complex charts and semantically rich descriptions. Furthermore, we design four aligned evaluation metrics – Faithfulness, Coverage, Informativeness, and Acuity – to systematically assess the quality of descriptions across these dimensions. Experiments conducted on mainstream MLLMs demonstrate the effectiveness of the proposed framework and reveal common weaknesses among existing models.

11.
medRxiv (Medicine) 2026-06-17

A multistate model of frailty progression after severe infections in adults >=65 years in England: a matched-cohort study

Background Evidence on frailty progression following severe infections is limited. We compared rates of transition to greater frailty or death between adults with and without severe infection in England. Methods We conducted a matched-cohort study among adults aged [≥]65 years (1,452,117: median age 76 years, 45% male) in Clinical Practice Research Datalink Aurum (2006-2019). Adults with severe infection (hospitalised primarily due to infection) were matched on calendar time to individuals without severe infection on age, sex, and primary care practice. The admission date was used as index date and same was assigned to matched unexposed adults. We measured frailty using Electronic Frailty Index, a proportion of 36 health deficits in validated categories (Fit 0-0.12, Mild >0.12-0.24, Moderate >0.24-0.36, Severe >0.36). In a time-varying Markov multistate model, we focused on forward transitions from baseline or intermediate frailty states to higher states or death. For each transition, we used Cox regression to estimate cause-specific transition hazard ratios (HR) with 95% confidence intervals (CIs), comparing adults with and without severe infection. We adjusted for baseline frailty score, age, sex, deprivation, harmful alcohol use, smoking, and primary care infection history 5 years before index date. We estimated state occupancy probabilities, and expected length of stay (ELOS) in each state at year five among adults with and without severe infection. We explored effect modification by infection type. Results Across all transitions, severe infection was associated with higher adjusted hazards of transitioning to worsening frailty or death, HR, 95% CI: (fit to: mild[1.56, 1.54-1.58], moderate[2.51, 1.79-3.51], death[4.57, 4.50-4.65]; mild to: moderate[1.52, 1.50-1.53], severe[1.90, 1.43-2.52], death[2.67, 2.64-2.70]; moderate to: severe[1.40, 1.38-1.42], death[1.87, 1.85-1.90]; severe to death[1.48, 1.46-1.50]). Transition hazard ratios were strongest for lower respiratory tract infections, followed by sepsis, urinary tract infections, meningitis/encephalitis, gastroenteritis, and skin and soft tissue infections. At five years, adults with severe infection had higher probabilities of transitioning to greater frailty or death across all transitions and lower ELOS in each frailty state than those without severe infection. Interpretation Severe infections may accelerate frailty deterioration in older age. Prevention through vaccination, early detection, and prompt management may help mitigate this decline.

12.
arXiv (CS.CV) 2026-06-18

Confidence is Not Reliability: Rethinking MC Dropout in Brain Tumour Segmentation

Glioma segmentation in multiparametric MRI is a critical component of treatment planning. A segmentation model that fails silently on treatment-critical sub-regions represents a patient safety risk that overlap-based metrics such as Dice scores cannot expose. We ask whether voxel-level uncertainty estimation via Monte Carlo (MC) Dropout can reliably identify segmentation errors in clinically critical sub-regions, and whether calibration failure modes are detectable from standard reporting metrics alone. In an empirical two-model case study on 126 BraTS21 patients, we evaluate a high-performance pretrained SegResNet and a locally trained UNet with residual units (UNet-Res). MC dropout preserved segmentation accuracy ($|\Delta Dice|$ $

13.
arXiv (CS.CL) 2026-06-11

Afrispeech Semantics: Evaluating Audio Semantic Reasoning in Spoken Language Models Across Domains and Accents

Audio language models (ALMs) are increasingly used for speech-based understanding, yet their ability to perform semantic reasoning beyond transcription, Text-to-Audio Retrieval, Captioning, and Question-Answering accuracy remains insufficiently benchmarked. In particular, the effects of accent variation, domain shift, and semantic over-inference on audio reasoning are poorly understood. We evaluate audio language models across five semantic and paralinguistic reasoning tasks: entailment, consistency, plausibility, accent drift, and accent restraint. Collectively, these tasks assess a model's ability to reason over spoken audio as the primary evidence source, including whether a textual hypothesis can be inferred, contradicted, or left undetermined by the audio, whether statements align or conflict with spoken content, whether claims are plausible given the discourse, and whether model predictions remain stable or appropriately constrained across accent variation. These findings highlight critical limitations in current audio reasoning evaluations and hope to provide guidance for more robust and equitable ALM design and assessment

14.
medRxiv (Medicine) 2026-06-17

Multi-strain Probiotics Alter Gut Microbiota and Estrobolome Pathways in Primary Dysmenorrhea

Background: Exact cause of primary dysmenorrhoea is unknown but recent evidence uncovers a potential link between gut dysbiosis and benign gynaecological disorder via disruption of estrobolome. Methods: A randomized controlled trial to investigate the effects of multi-strain oral probiotics on primary dysmenorrhoea has been conducted. This is a secondary analysis comparing the stool microbiome in women with primary dysmenorrhoea and those without (control), and the effects of treatment with probiotics versus placebo. Results: Although microbial richness and evenness were comparable between groups (alpha diversity, p > 0.05), gut microbial community composition differed significantly (Bray Curtis PERMANOVA, p = 0.015), characterised by reduced Bifidobacterium adolescentis and Blautia and enrichment of Faecalibacterium in dysmenorrhoea, alongside condition-specific core taxa. Post-intervention analysis revealed significant shifts in microbial community structure between pre- and post-treatment groups (PERMANOVA, F = 2.11, p = 0.005), with probiotic supplementation inducing more consistent and directed microbiome changes than placebo, without altering alpha diversity (p > 0.05). Functional prediction showed no significant difference in overall beta glucuronidase pathway abundance (p > 0.05); however, dysmenorrhoea was associated with higher abundance of beta glucuronidase producing taxa (MaAsLin2, q < 0.05) that were differentially modulated by probiotic treatment. Conclusion: This discovery provides evidence on the microbial disruption in primary dysmenorrhoea as well as the benefit of probiotics to modulate the intestinal microbiota to improve the condition.

15.
arXiv (CS.LG) 2026-06-18

Investigating Inductive Biases for Machine Learning Emulation of Sudden Stratospheric Warmings in Idealised Isca Simulations

arXiv:2606.18857v1 Announce Type: new Abstract: Machine-learning emulators are increasingly used for weather prediction and have the potential to extend skill on subseasonal-to-seasonal timescales by learning dynamically important sources of predictability. A key challenge is whether the models can exploit predictability anchors, such as stratospheric variability, that influence tropospheric circulation beyond short lead times. We test how architectural inductive bias affects emulation of sudden stratospheric warming (SSW) dynamics using paired idealised Isca simulations that differ only in an imposed wave-2 heating perturbation. Across convolutional, transformer, and graph-based architectures trained for one-step prediction, model differences are modest when the stratosphere is dynamically quiet but widen substantially when SSW-like variability is active. Our results identify explicit three-dimensional vertical coupling as a key inductive bias for machine-learning emulation of stratospheric dynamics. However, Eliassen-Palm flux diagnostics show that low forecast error does not guarantee physically faithful wave-mean-flow interaction, with coherent errors remaining in stratospheric wave-driving structure.

16.
medRxiv (Medicine) 2026-06-17

Accounting for Human Movement to Improve Exposure-Health Models

Background. Current exposure-health models rely on averaged, residential-based environmental exposures, failing to account for human movement. This aggregation can lead to exposure misclassification and biased exposure-response estimates, potentially distorting our understanding of the true health effects of environmental conditions. We developed exposure disaggregation regression models that explicitly account for human movement when linking environmental exposures to health outcomes. Methods. By weighting pixel-level exposures according to distance from home as a simple proxy for human movement, our model linked disaggregated environmental exposures to individual-level health outcomes. Weights were either fixed a priori or derived from a latent distance-decay power parameter learned from the data. We additionally evaluated model performance under a nonlinear exposure-response relationship. Model performance was assessed across multiple sample sizes (N = 1,114; 50,000; and 100,000). A simulation study examined parameter recovery using bias, empirical standard error (EmpSE), and credible interval coverage. As a case study, Demographic and Health Surveys (DHS) data from Albania were used to link acute respiratory infection (ARI) outcomes among children under five to pixel-level NDVI within a 3 km buffer around DHS cluster centroids, and the proposed models were applied to these data. Results. Across all models (fixed-weight, learned-weight, and restricted cubic spline models), parameter recovery improved with increasing sample size. At N = 1,114, estimates were biased and imprecise, with incorrect effect direction for exposure-response parameters (e.g., learned-weight {beta}1 bias = - 0.79; EmpSE = 2.61; coverage = 0.88). In contrast, the models accurately recovered parameters at larger sample sizes, including the latent distance-decay parameter (bias = - 0.02; EmpSE = 0.15; coverage = 0.95 at N = 100,000), demonstrating their ability to reliably learn movement-based exposure weights when sufficient data were available. Conclusion. Instead of relying on arbitrarily-sized buffers, this statistical framework provides a novel method for studying environmental exposure-health relationships whilst accounting for human movement. With sufficiently large sample sizes, it can accurately estimate the influence of disaggregated environmental exposures on individual-level health and help address exposure misclassification arising from residential-only metrics. This methodological framework remains scalable, interpretable, and adaptable to other exposures and outcomes, offering a foundation for future work that integrates richer mobility-informed exposure-health research.

17.
arXiv (CS.AI) 2026-06-16

Knowledge-Based Zero-Replay Debugging of Multi-Agent LLM Traces

arXiv:2606.14805v1 Announce Type: cross Abstract: Reliable operation of multi-agent large language model (LLM) systems depends on debugging long execution traces, where the few causally decisive events are buried in unstructured logs of messages, routes, memory writes, and tool calls. The standard tool is counterfactual replay (rewind, edit, and re-run the trajectory to measure each event's effect), but its cost grows linearly with the number of candidate events, making exhaustive replay infeasible at scale. We frame trace debugging as a knowledge-based decision-support problem. Each trace is compiled into a structured event knowledge graph over routing, memory, tool-use, uncertainty, and latent evidence, and a calibrated predictor decides where a scarce replay budget should be spent. We do not propose a new replay oracle; we propose a method to predict its results without paying the replay cost. We formulate zero-replay counterfactual-effect prediction: given a trace under a fixed budget, predict which events the oracle would mark high-effect before any replay is performed. BranchPoint-Latent is a lightweight predictor over observable, structural, uncertainty, and latent features of the knowledge graph. Calibrated against a deterministic replay oracle across 37 trace families, a single learning-to-rank gradient-boosted predictor raises per-trace localization (Branch Recall@5) from 0.73 to 0.93 on held-out families at zero oracle-replay cost. Rather than claiming universal dominance, we characterize when cheap graph centrality suffices and when learned evidence is necessary. The result is an auditable, cost-efficient decision-support system for AI-reliability debugging, positioned explicitly on the cost-accuracy frontier with reproducible artifacts.

18.
arXiv (CS.CV) 2026-06-16

Bridging Information Asymmetry: A Hierarchical Framework for Blind Face Restoration with Reduced Uncertainty

Blind face restoration remains a persistent challenge due to the inherent ill-posedness of reconstructing holistic structures from severely constrained observations. Current generative paradigms, while capable of synthesizing realistic facial details, remain limited by the under-constrained nature of blind restoration, where severely degraded inputs can be mapped to plausible yet identity-inconsistent outputs. To address this issue, we present Pref-Restore, a hierarchical framework for BFR with reduced restoration uncertainty. Our design is organized around three complementary principles: (1) Semantic Information Augmentation, where an auto-regressive semantic branch converts image and text cues into structured tokens that provide a stable high-level anchor; (2) Texture-level Fidelity Alignment, where the diffusion generator is trained under this anchor to recover identity-relevant details; and (3) Fidelity-constrained Preference Optimization, where a face-aware reward refines the diffusion trajectory while controlling the quality–fidelity trade-off. Extensive experiments on synthetic and real-world benchmarks show that Pref-Restore achieves state-of-the-art performance, with stronger identity-sensitive fidelity and lower restoration uncertainty across repeated sampling. Systematic ablations further attribute these gains to the proposed hierarchical design, showing the necessity of staged training, the robustness and quality dependence of the text pathway, and the benefit of fidelity-constrained preference optimization.

19.
arXiv (CS.LG) 2026-06-16

Active Learning with Low-Rank Structure for Data Selection

arXiv:2606.16045v1 Announce Type: new Abstract: In the data selection problem, the objective is to choose a small, representative subset of data that can be used to efficiently train a machine learning model. Sener and Savarese [ICLR 2018] showed that, given an embedding representation of the data and suitable geometric assumptions, heuristics based on $k$-center clustering can be used to perform data selection. This perspective was further explored by Axiotis et. al. [ICML 2024], who proposed a data selection approach based on $k$-means clustering and sensitivity sampling. However, these methods rely on the assumption that the dataset exhibits intrinsic geometric structure that can be effectively captured by clustering, whereas many modern datasets instead possess global algebraic structure that is better exploited by low-rank approximation or principal component analysis. In this paper, we introduce a new data selection framework based on low-rank approximation and residual-based sampling, formulated through the lens of row subset selection and loss-preserving coreset construction. Given an embedding representation of the data satisfying mild regularity conditions, which can be interpreted as algebraic or angular notions of Lipschitz continuity, we show that it is possible to select a weighted subset of $\tilde{O}\left(k + \frac{1}{\varepsilon^2}\right)$ data points whose average loss approximates the average loss over the full dataset within a $(1+\varepsilon)$ relative error, up to an additive $\varepsilon \Phi_k$ term, where $\Phi_k$ denotes the optimal rank-$k$ approximation cost of the embedding matrix. We complement these theoretical guarantees with empirical evaluations, demonstrating that on a range of real-world datasets, our data selection approach achieves improved performance over prior strategies based on uniform sampling or clustering-based sensitivity sampling.

20.
arXiv (CS.AI) 2026-06-19

Measuring Curriculum Alignment across Topical Coverage, Competency, and Cognitive Depth: A Longitudinal Framework Applied to CS2013 and CS2023

arXiv:2606.19469v1 Announce Type: new Abstract: Undergraduate computer science is governed by international curricular guidelines revised about once a decade, yet programs lack a reliable, reproducible way to measure how completely they cover the current guidelines and how that coverage shifts when the guidelines are restructured. We address this with a human-in-the-loop pipeline that measures a program's coverage of an external body of knowledge, applied longitudinally to one accredited BSc in Computer Science against Computer Science Curricula 2013 (CS2013) and 2023 (CS2023). The pipeline represents the program and each guideline as structured corpora, generates candidate course-to-knowledge-unit matches by semantic retrieval, and confirms them through human judgment under an explicit coverage definition. Of seven benchmarked retrievers, a reciprocal-rank-fusion ensemble was strongest, and a reputed long-context model underperformed a small sentence model, so retriever choice must be measured. Both maps were validated by an independent second rater (Cohen's kappa 0.64 for CS2023, 0.69 for CS2013). The program covers 49.7% of CS2023 and 50.9% of CS2013 knowledge units, near-constant across a decade. Extending the same retrieve-then-confirm design to competency articulation and cognitive depth shows that the program articulates the competency for ~88% of covered units under each guideline, yet delivers it at the recommended depth for 76% of present units under CS2023 against 95% under CS2013, a gap reflecting the newer guideline's raised expectations, not the program. The longitudinal comparison separates persistent structural gaps (parallel and distributed computing, foundations of programming languages, systems fundamentals), uncovered against both guidelines and ABET, from differences that reflect the standard's evolution. The instrument is reusable and available from the authors on request.

21.
arXiv (CS.LG) 2026-06-16

A Compositional Framework for Open-ended Intelligence

arXiv:2606.15386v1 Announce Type: new Abstract: Open-ended intelligence is the capacity to adapt to novel problems and environments that are substantially different from those in training. We formalize open-ended intelligence as the closure induced by a finite primitive set \(P\) and a set of composition operators \(C\). We characterize properties of the induced closure \(\mathcal{L}(P,C)\) that support unbounded compositional generation across families of tasks and worlds. A mathematics of open-ended intelligence requires two pillars: a minimal set of representational primitives (e.g., states, actions) and algorithmic primitives (e.g., nearest neighbor), together with composition motifs (e.g., recursion, sequencing) that reflect an acquired compositional grammar. The closure of these two pillars enables the generation of infinite adaptive responses across a wide range of settings. The mathematics supports complementary research agendas, including evaluation metrics for explanation and interpretability, as well as building architectures where compositional generalization is native. We propose next primitive prediction as a novel architectural objective, where the training objective encourages the acquisition of reusable algorithmic primitives and their compositional grammar, such that new solutions are generated through recombination. Curriculum learning and self-play enable lifelong learning and expansion of the closure by discovering reusable primitives and transition motifs across families of tasks and worlds. We ground the framework through case studies in physics, evolution, and neuroscience.

22.
arXiv (CS.LG) 2026-06-16

Online Realizable Regression and Applications for ReLU Networks

arXiv:2602.19172v2 Announce Type: replace Abstract: Realizable online regression can behave very differently from online classification. Even without any margin or stochastic assumptions, realizability may enforce horizon-free (finite) cumulative loss under metric-like losses, even when the analogous classification problem has an infinite mistake bound. We study realizable online regression in the adversarial model under losses that satisfy an approximate triangle inequality (approximate pseudo-metrics). Recent work of Attias et al. shows that the minimax realizable cumulative loss is characterized by the scaled Littlestone/online dimension $\mathbb{D}_{\mathrm{onl}}$, but this quantity can be difficult to analyze. Our main technical contribution is a generic potential method that upper bounds $\mathbb{D}_{\mathrm{onl}}$ by a concrete Dudley-type entropy integral that depends only on covering numbers of the hypothesis class under the induced sup pseudo-metric. We define an entropy potential $\Phi(\mathcal{H})=\int_{0}^{diam(\mathcal{H})} \log N(\mathcal{H},\varepsilon)\,d\varepsilon$, where $N(\mathcal{H},\varepsilon)$ is the $\varepsilon$-covering number of $\mathcal{H}$, and show that for every $c$-approximate pseudo-metric loss, $\mathbb{D}_{\mathrm{onl}}(\mathcal{H})\le O(c)\,\Phi(\mathcal{H})$. In particular, polynomial metric entropy implies $\Phi(\mathcal{H})d$, otherwise infinite), and for bounded-norm $k$-ReLU networks separate regression (finite loss, even $\widetilde O(k^2)$, and $O(1)$ for one ReLU) from classification (impossible already for $k=2,d=1$).

23.
arXiv (math.PR) 2026-06-16

A small noise approximation for Muller's Ratchet

arXiv:2606.15842v1 Announce Type: new Abstract: We consider an infinite system of SDEs with Fleming-Viot noise indexed by $k=0,1,2,\dots$, whose parameters $\alpha,\lambda$, and $\nu$ are the (deleterious) selection coefficient, the (uni-directional) mutation rate, and a quantity which determines the size of the system's fluctuations. The SDE's unique weak solution $X(t) = (X_k(t))_{k=0,1,2,...}$ models what is known in population genetics as Muller's ratchet. Here, $X_k(t)$ stands for the frequency of individuals carrying $k$ deleterious mutations. Since the mutation process is uni-directional, $t\mapsto \inf\{k: X_k(t)> 0\}$ is non-decreasing for almost every path of $X$, and we refer to an increase as a click of Muller's ratchet. A long standing question concerns the clicking rate of Muller's ratchet. Using Duhamel's principle for semigroups, we give a partial answer by approximating $E(\sum_{k=1}^\infty kX_k(t) )$ and $E\big(X_0(t)\big)$ up to $O(1/\nu^2)$ for fixed $\alpha$, $\lambda$ and $t>0$. Our results suggest that $\psi:=\nu \alpha e^{-\lambda/\alpha}$ is a crucial quantity also when the mutation/selection ratio $\theta = \lambda/\alpha$ is moderately large: for large $\nu \alpha$, clicking of the ratchet on the time scale $\frac 1\alpha \log \theta$ becomes rare as soon as $\psi$ becomes large.

24.
PLOS Medicine 2026-05-11

Connected or chained by social media? Child and adolescent mental health in a digital era

作者:

by Silja Kosola Social media has evolved from connection to compulsion, disproportionately harming children and adolescents. Addictive designs together with developmental vulnerability fuel mental health risks and highlight the urgent need for stricter age limits and stronger protections. In this Perspective, Silja Kosola outlines how social media disproportionately harms child and adolescent mental health, and argues that while recent policy changes aimed at protecting youth from social media are welcome, stricter age limits and greater accountability of social media companies are needed.

25.
arXiv (CS.CL) 2026-06-19

TSAssistant: A Human-in-the-Loop Agentic Framework for Automated Target Safety Assessment

Target Safety Assessment (TSA) requires systematic integration of genetic, transcriptomic, target homology, pharmacological, and clinical data to evaluate potential safety liabilities of therapeutic targets. This process is labor-intensive and expert-dependent, posing challenges in scalability and reproducibility. We present TSAssistant, a human-in-the-loop multi-agent framework that decomposes TSA report generation into a workflow of specialized subagents: Research Subagents that each ground and cite a single TSA domain, and Synthesis Subagents that integrate findings across domains. Subagents retrieve and synthesize evidence from curated biomedical sources through standardized tool interfaces and produce individually citable, evidence-grounded sections, with behavior shaped by a hierarchical instruction architecture that separates coordination logic from domain expertise and user intent. To complement these soft constraints, programmatic execution hooks and persistent memory stores enforce hard constraints across the workflow, while an interactive refinement loop allows experts to review and revise individual sections with full conversational context preserved across iterations. Rather than a single holistic comparison, we decompose report quality into reproducibility, evidential grounding, task-level accuracy, and controllability under expert oversight, finding high reproducibility and grounding, substantial agreement with the human reference, and net-positive expert-driven refinement.