Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.LG) 2026-06-12

Auditing Discriminatory Patterns in Mortgage Lending Through Association Rules and Fair Binning

arXiv:2606.12435v1 Announce Type: cross Abstract: Mortgage lending in the United States exhibits persistent racial and gender disparities. We investigate whether standard data preprocessing steps, specifically attribute binning, amplify these disparities in downstream pattern mining. Using 103,481 cleaned mortgage applications from the HMDA 2023 dataset (Chicago metropolitan area), we build a three-stage pipeline: (1) a PySpark data cleaning and binning pipeline that implements both standard equal-frequency binning and the epsilon-biased fair binning algorithm from Asudeh et al. [1], (2) FP-Growth association rule mining that compares denial patterns under both binning regimes, and (3) K-Means clustering with a per-cluster disparate impact audit. Our standard binning shows 9.63% racial bias in income discretization, consistent with the 8-10% reported in prior work. Fair binning with seven race groups is infeasible at epsilon=0.03 and only succeeds at epsilon=0.08 with a Price of Fairness of 29.4%. FP-Growth reveals that high debt-to-income ratio is the dominant denial predictor (67.2% confidence, 2.81 lift), while racial bias does not appear as explicit high-support rules. However, K-Means clustering followed by a disparate impact audit flags 10 out of 45 cluster-group pairs, showing that Black applicants face significantly higher denial rates than White applicants even among financially similar groups.

02.
arXiv (CS.CL) 2026-06-19

Benchmarking Local LLMs for Natural-Language-to-SQL Querying in Biopharmaceutical Manufacturing: An Empirical Benchmark on Consumer-Grade Hardware

Biopharmaceutical manufacturing organizations operate under regulatory frameworks such as FDA guidance, EU Good Manufacturing Practice (GMP), and the EU AI Act, which can restrict the use of cloud-based artificial intelligence systems. Locally deployed large language models (LLMs) offer a privacy-preserving alternative, but their suitability for pharmaceutical manufacturing tasks remains underexplored. This study evaluates four open-source LLMs (Qwen 2.5 Coder 7B, Llama 3.1 8B, Mistral 7B, and Meditron 7B) deployed locally via Ollama for natural-language-to-SQL generation over a pharmaceutical manufacturing database. A FastAPI-based evaluation platform, PharmaBatchDB AI, was developed using a synthetic Microsoft SQL Server database containing approximately 63,000 records across Batch, Manufacturing Execution System (MES), and Clean-In-Place (CIP) modules. Models were benchmarked on 60 domain-specific natural-language questions using metrics including SQL extraction rate, SQL compliance, factual consistency, ROUGE-L, hallucination rate, throughput, and latency. Qwen 2.5 Coder 7B, Llama 3.1 8B, and Mistral 7B generated SQL for all evaluation tasks, while Meditron 7B failed on nearly all tasks due to context-window limitations and poor SQL generation capability. Llama 3.1 8B achieved the highest SQL compliance, whereas Qwen 2.5 Coder 7B achieved the strongest overall text similarity and factual consistency. Performance differences between the two leading models were not statistically significant. The results show that code-tuned general-purpose LLMs outperform a domain-specific biomedical model on structured query generation for pharmaceutical manufacturing data. Although fully local, GxP-aligned NLQ systems are feasible on consumer hardware, current performance levels still require human oversight and downstream validation for regulated use.

03.
arXiv (CS.AI) 2026-06-15

Elastic Queries Reinforcement Learning: Self-Aware Policy Execution for VLA Models

arXiv:2606.14375v1 Announce Type: cross Abstract: Vision-language-action (VLA) models are powerful action generators for robot manipulation, but they are typically executed with fixed inference and replanning schedules. This rigidity ignores the uneven difficulty of robot control: contact-rich or uncertain states may need more computation and fresher feedback, while easier states can often be handled with fewer inference steps and longer open-loop execution. We propose Elastic Queries Reinforcement Learning (EQRL), a framework that makes each VLA policy query elastic. A lightweight latent-schedule adaptor jointly selects the latent input, denoising budget, and action chunk length, without fine-tuning the underlying VLA model. To make scheduling difficulty-aware, EQRL trains a critic over the joint latent-schedule action and derives a state difficulty signal from critic ensemble disagreement. This signal guides compute toward difficult states, while a learned residual allows task-driven correction. We formulate variable chunk execution as query-level macro-action RL with chunk-dependent discounting and an amortized number-of-function-evaluations (NFE) budget. Across simulation and real-robot manipulation, EQRL reduces amortized inference cost while preserving or improving task success.

04.
arXiv (CS.AI) 2026-06-24

Multimedia and Visual Analytics in the Agentic Era

arXiv:2504.06138v3 Announce Type: replace-cross Abstract: Professional users need tools to help them gain actionable insights from large multimedia collections. Foundation models and AI agents have rapidly changed the playing field, and improving their accuracy, trustworthiness, and reasoning capabilities are active topics in the computer vision, machine learning, and multimedia communities. Most current research focuses on benchmark driven algorithmic improvements. The multimedia community is the place to go beyond algorithms and consider complete multimedia analytics systems that support professional users in their complex tasks and achieve a true teaming of humans and AI. Supporting users with machine learning and visualizations has been studied for decades in the visual analytics field. In this paper, we propose a framework to bring multimedia and visual analytics together and indicate how it could impact current and new multimedia analytics solutions. Additional information can be found at https://staff.fnwi.uva.nl/m.worring/analytics-model.html

05.
medRxiv (Medicine) 2026-06-11

Long-term Penetrance of Disease Variants in Genes Prioritized for Genomic Newborn Screening: Evidence from Adult Biobanks

Importance: Genomic newborn screening (gNBS) is a potential public health intervention, but its positive predictive value (PPV) remains uncertain. Estimating the prevalence and penetrance of pathogenic and likely pathogenic (P/LP) variants in genes prioritized for screening may clarify the long-term PPV and clinical utility of gNBS. Objective: To compare ICD-based ascertainment, electronic medical record (EMR) review, and clinical assessment of genetic disorders in adults with P/LP variants in 54 genes prioritized for gNBS. Design: Two-cohort observational study with EMR review and clinical assessment in the hospital-based cohort. Setting: The U.K. Biobank (UKB) and Mass General Brigham Biobank (MGBB). Participants: 451,877 adults from the UKB and 53,371 from the MGBB, all with exome sequencing data. Exposures: P/LP variants in 54 genes prioritized through expert consensus for gNBS, in genotypes consistent with each gene's inheritance pattern. Main outcomes and measures: The primary outcome was the absolute difference in the proportion of MGBB participants identified as affected by ICD versus EMR ascertainment. Secondary outcomes included findings from clinical assessments of undiagnosed MGBB participants, corrected UKB penetrance estimates, and extrapolation to U.S.. annual birth cohorts and living adults. Results: P/LP variants were identified in 665 UKB participants (0.15%) and 82 MGBB participants (0.15%), approximately 1 in 650. In MGBB, EMR review revealed that 58/82 individuals (70.7%) were undiagnosed, although 25 of 58 (43.1%) had documented symptoms. Disease-associated ICD codes were found in 39.0% (32/82) of participants, whereas EMR review identified symptoms in 59.8% (49/82, McNemar P

06.
arXiv (CS.AI) 2026-06-24

Breaking the Filter Bubble: A Semantic Pareto-DQN Framework for Multi-Objective Recommendation

arXiv:2606.24042v1 Announce Type: new Abstract: Recommender systems often induce filter bubbles and semantic homogenization by monolithically optimizing for immediate user engagement. Standard single-objective models, including traditional Deep Q-Networks, are ill-equipped to navigate the trade-offs between platform retention and critical societal values like information diversity and provider fairness. To address these limitations, we introduce a multi-objective reinforcement learning framework that formalizes recommendation as a semantic multi-objective Markov decision process. By integrating high-fidelity semantic embeddings with a Pareto-DQN agent, our architecture treats engagement, diversity, and fairness as distinct, non-aggregable reward signals, avoiding the pitfalls of static reward scalarization. Empirical evaluations on the MovieLens small dataset shows that our hypervolume based action selection disrupts the feedback loops responsible for semantic collapse. By sustaining high state-trajectory variance, the Pareto-DQN effectively maps the Pareto frontier, achieving gains in auxiliary societal objectives with only marginal impacts on engagement. This work provides a path toward intrinsically aligned, responsible recommender systems.

07.
arXiv (CS.AI) 2026-06-12

Algorithmic Constitutionalism

arXiv:2606.12437v1 Announce Type: cross Abstract: The increasing encroachment of artificial intelligence (AI) on social life raises significant risks for society, particularly within the infospheres created and controlled by companies such as Google, Facebook, Apple, and Amazon. This article examines these risks through an in-depth analysis of Facebook's content moderation regime, which is already partially governed by algorithms. We argue that the idea of ethical engineering, often proposed in the literature as a solution to the governance challenges posed by AI, is inadequate for several reasons. In response, we develop an alternative framework, which we term "algorithmic constitutionalism." Our approach rests on three pillars: (a) a layered architecture consisting of two levels of code: (i) an operative or object level and (ii) a meta level designed to protect the system's core principles from algorithmically initiated change; (b) algorithmic meta-reasoning, which enables the system to operate simultaneously at both levels so that it can monitor, verify, and potentially correct in real time operations at the object level that depart from principles protected at the meta-code level; and (c) correction through deliberation. The article elaborates the concept of algorithmic constitutionalism and demonstrates how it may be applied to Facebook's content moderation regime. As part of this analysis, we examine the tension between societal constitutionalism and algorithmic constitutionalism. Paradoxically, attempts to subject AI systems to external deliberative control may also enable AI agents to intervene in that process, potentially undermining its purpose. The article concludes by considering the implications of this argument for the European Digital Services Act, which entered into force in October 2022.

08.
arXiv (CS.LG) 2026-06-18

Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks

arXiv:2602.02056v3 Announce Type: replace-cross Abstract: Ultrafast online learning is essential for high-frequency systems, such as controls for quantum computing and nuclear fusion, where adaptation must occur on sub-microsecond timescales. Meeting these requirements demands low-latency, fixed-precision computation under strict memory constraints, a regime in which conventional Multi-Layer Perceptrons (MLPs) are both inefficient and numerically unstable. We identify key properties of Kolmogorov-Arnold Networks (KANs) that align with these constraints. Specifically, we show that: (i) KAN updates exploiting B-spline locality are sparse, enabling superior on-chip resource scaling, and (ii) KANs are inherently robust to fixed-point quantization. By implementing fixed-point online training on Field-Programmable Gate Arrays (FPGAs), a representative platform for on-chip computation, we demonstrate that KAN-based online learners are significantly more efficient and expressive than MLPs across a range of low-latency and resource-constrained tasks. To our knowledge, this work is the first to demonstrate model-free online learning at sub-microsecond latencies.

09.
arXiv (CS.CV) 2026-06-12

Surflo: Consistent 3D Surface Flow Model with Global State

Geometry is invariant to viewpoint, which makes any collection of images a redundant encoding of a single 3D state. Existing feed-forward reconstruction models fail to exploit this: per-view methods emit overlapping, unaligned pointmaps that grow linearly with input count, while global-latent methods commit to a fixed, low-resolution output. We introduce Surflo, which compresses a variable number of unposed RGB views into K latent tokens-one global state-and decodes oriented 3D surface points by independently transporting them from noise onto the surface via flow matching. This frees the output from any fixed grid or token budget: the same latent yields from a few thousand to a million points in a single forward pass. To suppress the local inconsistencies inherent to independent per-point decoding, an inference-time guidance term correlates nearby points by injecting a photometric gradient during ODE integration. Surflo matches or surpasses feed-forward baselines on surface metrics, runs an order of magnitude faster than optimization-based methods that require hundreds of views, and is the only feed-forward approach to combine a global latent with arbitrary-resolution decoding.

10.
arXiv (CS.AI) 2026-06-17

SketchXplain: Intuitive Visual Explanations of Image Classifiers with Sketches

arXiv:2606.17646v1 Announce Type: cross Abstract: Saliency map visualizations explain image-based AI predictions by pointing to regions, but these are often unintuitive and semantically unclear, leaving an interpretability gap. We argue that AI explanations should be intuitive – coherent to user knowledge, yet simple and selective to accelerate interpretation. Inspired by artistic drawings, we propose SketchXplain to generate sketch-based visual explanations for intuitive image-based explainable AI (XAI). Combining techniques in saliency maps, concept-bottleneck models, and sketch optimization, SketchXplain integrates saliency to select coherent observation artifacts, concepts for knowledge coherence, cues to represent them, and abstraction for simplicity. Evaluating on face expression recognition, modeling and user studies showed that SketchXplain supported quicker interpretation with more aligned visualizations than saliency maps or simple drawings. Further evaluation on skin lesion diagnosis found that SketchXplain more coherently visualized disease symptoms, better supporting lay diagnosis. Thus, this work illustrates the value of sketches for intuitive, simple, coherent, and quick image-based XAI visualizations.

11.
medRxiv (Medicine) 2026-06-23

Multivariate Echocardiographic Phenotyping of Hypertensive Heart Failure Using Unsupervised Machine Learning: A Pilot Study

Background Heart failure in hypertensive patients is heterogeneous and poorly captured by traditional left ventricular ejection fraction (LVEF) based classification. Multivariate echocardiographic data combined with unsupervised machine learning may provide a more precise phenotypic characterization. This pilot study evaluated the feasibility of unsupervised clustering of routine transthoracic echocardiographic data to identify phenotypic subgroups of hypertensive heart failure. Methods This retrospective pilot study analyzed transthoracic echocardiography reports from hypertensive patients with clinical heart failure. After data cleaning and exclusion of incomplete records, 102 patients with 11 echocardiographic variables were included. Variables describing left ventricular geometry, systolic function, and diastolic performance were standardized and subjected to K-means clustering. Optimal cluster number was determined using the elbow method and silhouette analysis. Cluster characteristics were assessed using descriptive statistics and Kruskal Wallis testing. Concordance with LVEF based heart failure categories was evaluated. Results Three distinct echocardiographic phenotypes were identified. Cluster 0 (n = 50) demonstrated preserved LVEF with concentric remodeling, consistent with heart failure with preserved ejection fraction (HFpEF) phenotype. Cluster 1 (n = 37) showed marked ventricular dilation and reduced systolic function, consistent with heart failure with reduced ejection fraction (HFrEF). Cluster 2 (n = 15) exhibited concentric hypertrophy with intermediate LVEF, consistent with heart failure with mildly reduced ejection fraction (HFmrEF) like phenotype. All echocardiographic variables differed significantly across clusters (p < 0.001). While Cluster 0 showed strong concordance with HFpEF (96%), Clusters 1 and 2 demonstrated substantial overlap across LVEF categories, indicating partial discordance between structural phenotypes and LVEF based classification. Conclusion Application of unsupervised machine learning to routine echocardiographic data identifies distinct heart failure phenotypes in hypertensive patients. These phenotypes demonstrate significant structural heterogeneity beyond LVEF based classification, supporting the utility of data-driven approaches for refined cardiac phenotyping. This pilot study provides a foundation for larger prospective studies.

12.
arXiv (CS.AI) 2026-06-15

A fully GPU-based workflow for building physics emulators of hypersonic flows

arXiv:2606.13742v1 Announce Type: cross Abstract: The ability to resolve complex physical phenomena with high fidelity and at low computational cost is central to addressing key challenges in modern engineering. A prime example lies in hypersonic flows, where the precise prediction of the full flowfield topology, in particular with respect to shock wave location and intensity, is critical. Yet supersonic and hypersonic flows continue to be a stumbling block for traditional reduced-order models and neural emulators that struggle to capture steep gradients in flow states with physical consistency in applications of industrial relevance. To that end, we introduce a fully GPU based workflow that integrates accelerated data generation with the training of neural emulators augmented by uncertainty quantification and physics-aware refinement. Our workflow is enabled by a differentiable high-fidelity solver (JAX-Fluids) which we employ for rapid dataset creation and residual-based improvement of the neural emulator to enhance physical consistency. Building on this framework, we first present a suite of model architectures and analyze their scaling behavior to expose their strengths and shortcomings. We then show that residual-based refinement enables training on cases where only mesh and input parameters are available, substantially reducing residuals and improving physical consistency. Together, differentiable simulation and residual-based refinement yield physics emulators that remain reliable beyond their training distribution, a key requirement for deploying surrogates in real-world engineering design loops.

13.
arXiv (CS.AI) 2026-06-24

Ten Digits on a Train: AI-Assisted Verification of Two Eigenvalue Problems

arXiv:2606.23821v1 Announce Type: cross Abstract: Accurate numerical eigenvalues are often difficult to certify, especially in singular or non-normal settings. This article reports a human–AI collaboration on two such computations. For a singular self-adjoint Schrödinger operator, a verified zero count and Dirichlet–Neumann bracketing certify the complete negative spectrum to ten decimal places. For a delicate non-normal atom–molecule benchmark, a previously unresolved resonance pair is separated, with each member enclosed to ten digits. The second result is achieved not by increasing the precision of one-way shooting, but by reformulating the problem as a global matching system for projective solution lines. The infinite tail is encoded as uncertainty in the terminal projective data, and a componentwise, tail-robust Krawczyk–Brouwer inclusion supplies the certificate. This gives a reusable architecture for analytic boundary-value systems with ill-conditioned propagation and uncertain asymptotic data. The collaboration also exposes the strengths and limits of AI assistance. AI rapidly produced accurate candidates and plausible proof strategies, but several failed, including one apparently complete tail argument that omitted the componentwise check required by a nonuniform polydisc. Validated computation is a stringent test of AI-assisted mathematics: the output is not merely a number, but a number with a proof. These examples show why the proof object matters, and why human mathematical judgment remained decisive. More broadly, as AI makes code, exposition, and plausible numerical claims inexpensive, standards for verification, attribution, peer review, and training must adapt. The implications are unsettling; the opportunity is extraordinary.

14.
arXiv (CS.AI) 2026-06-16

DeepRoot: A KG-Coordinated Multi-Agent System for Therapeutic Reasoning over Historical Medical Texts

arXiv:2606.15931v1 Announce Type: cross Abstract: Historical medical archives and traditional medicines hold immense potential for drug discovery and remain a primary source for current drug development. However, pre-ontological prose and idiosyncratic taxonomies prevent the standardization and medical modernization of the data for use in current biomedical pipelines. Furthermore, no existing LLM agent system, whether tool-calling, retrieval-augmented, or agentic deep-research, can convert such text into verifiable drug-discovery leads at scale. We close this gap with DeepRoot, a multi-agent LLM system that jointly builds and utilizes a verified knowledge graph, showing that grounding and reasoning – often conflated – are separable axes the system can compose for therapeutic reasoning. Applied to the Shen Nong Ben Cao Jing, DeepRoot recovers $10$ of $21$ held-out compound-disease treatment pairs at R@$20$ ($47.6\%$ vs $4.8\%$ for a raw corpus LLM and $\sim\!2.4\%$ random) and dominates an LLM-as-judge audit for reasoning quality over baseline LLMs and LLMs with direct tool-call access to the same APIs DeepRoot itself queries. Tool-using LLMs hallucinate evidence on $87\%$ of claims, versus 7-10% for DeepRoot. Graph-only inference hallucinates $0\%$ but ranks lowest on reasoning coherence; DeepRoot KG+LLM is the only condition to win on both axes, pointing toward a route for systematic mining and repurposing of historical medical knowledge.

15.
arXiv (CS.CL) 2026-06-15

Personal Care Utility: Health as Everyday Infrastructure

Healthcare is essential, expert, and episodic by design - built around the roughly one hour per year a person spends with a clinician. The 8,759 hours outside clinical settings, where eating, sleeping, movement, medication, and stress actually shape long-term health, have no comparable infrastructure. The bottleneck for personalized health is not raw data or reasoning capability; it is the absence of that infrastructure layer. This paper introduces the Personal Care Utility (PCU): a layered, event-driven architecture proposed as the missing utility for everyday health, in the way that payments, networks, and power are utilities for their domains. PCU organizes continuous personal signals into semantically meaningful life events through a Personicle, estimates dynamic health state against personal baselines, reasons about cause and context, and routes guidance through an orchestrator that separates clinical decision logic, behavioral strategy selection, and natural-language expression. This separation lets large language models support reasoning and communication while keeping safety-critical clinical decisions grounded in validated evidence. We instantiate PCU for Type 2 Diabetes - turning CGM, meal, activity, medication, sleep, stress, and clinical data into glycemic events, individualized state estimates, causal explanations, and knowledge-grounded interventions. A day-in-the-life scenario shows the same infrastructure producing real-time nudges, weekly summaries, medication check-ins, silence, or deterministic safety alerts depending on context and risk. We close with how PCU generalizes to other chronic conditions and the governance questions any always-on personal health utility must address. The result is a blueprint that treats personalization not as a final messaging layer, but as an architectural property of everyday health guidance.

16.
medRxiv (Medicine) 2026-06-17

Postoperative Cognitive Decline in Older Patients with Cardiovascular Disease and Preoperative Mild Cognitive Impairment

Objective. Older adults undergoing cardiac surgery may be vulnerable to postoperative cognitive decline. However, no studies have examined postoperative cognitive outcomes in older patients with cardiovascular disease (CVD) according to preoperative mild cognitive impairment (MCI). This study examined 12-month postoperative cognitive outcomes in older CVD patients according to preoperative MCI diagnosis and explored predictors of postoperative cognitive decline. Method. Twenty-two older CVD patients ([&ge;]65 years) and twenty-five controls were included. Neuropsychological assessment was conducted at baseline in both groups and repeated 12 months after surgery in the CVD group. MCI was diagnosed using current clinical criteria. Postoperative cognitive change was examined across preoperative MCI groups. Results. Fifty percent of patients met criteria for postoperative MCI, showing high diagnostic stability relative to preoperative frequency (45.5%). The preoperative CVD-MCI group showed a decline in working memory, executive functions, visual memory, and naming, whereas CVD-nMCI group declined only in verbal memory. Furthermore, CVD-MCI showed more heterogeneous postoperative cognitive trajectories of change than CVD-nMCI, who showed stability. Estimated IQ, APACHE-II score, and postoperative frailty were important variables in predicting the postoperative pattern. Conclusions. MCI frequency remained high and stable in older CVD patients across the preoperative and one-year postoperative period. However, this apparent diagnostic stability masks subclinical cognitive decline, particularly among patients with preoperative MCI, who showed greater susceptibility to further impairment. Estimated IQ, APACHE-II score, and postoperative frailty may be considered relevant predictors of outcome. These results highlight the value of preoperative neuropsychological assessment for characterizing postoperative cognitive risk in older CVD patients.

17.
arXiv (CS.AI) 2026-06-15

COGNITION: From Evaluation to Defense against Multimodal LLM CAPTCHA Solvers

arXiv:2512.02318v4 Announce Type: replace-cross Abstract: This paper studies how multimodal large language models (MLLMs) undermine the security guarantees of visual CAPTCHA. We identify the attack surface where an adversary can cheaply automate CAPTCHA solving using off-the-shelf models. We evaluate 7 representative MLLMs on 18 real-world CAPTCHA task types, measuring single-shot accuracy, success under limited retries, end-to-end latency, and per-solve cost. We further validate our findings through a supplemental external dataset and an adaptive-attacker setting with session memory, while also analyzing the impact of task-specific prompt engineering and few-shot demonstrations on solver effectiveness. We reveal that MLLMs can reliably solve recognition-oriented and low-interaction CAPTCHA tasks at human-like cost and latency, whereas tasks requiring fine-grained localization, multi-step spatial reasoning, or cross-frame consistency remain significantly harder for current models. By examining the reasoning traces of such MLLMs, we investigate the underlying mechanisms of why models succeed/fail on specific CAPTCHA puzzles and use these insights to derive defense-oriented guidelines for selecting and strengthening CAPTCHA tasks. To validate these principles, we present a proof-of-concept by hardening a vulnerable CAPTCHA type using our guidelines. We demonstrate that incorporating fine-grained localization and implicit counting reduces the success rate of state-of-the-art MLLMs from over 95\% to 0\%, confirming that structural changes can effectively mitigate the threat. We conclude by emphasizing the urgent need for CAPTCHA redesign as MLLM capabilities increasingly threaten existing defenses. Code Availability (https://doi.org/10.5281/zenodo.20406852).

18.
medRxiv (Medicine) 2026-06-10

Resolving Diagnostic Discordance in Group 2 Pulmonary Hypertension Through Staged Physiologic Testing: Insights From PVDOMICS

Background World Symposium on Pulmonary Hypertension (WSPH) Group 2 pulmonary hypertension (PH) is a clinically integrated phenotype attributed to left heart disease, whereas pre- versus post-capillary classification is operationalized primarily by pulmonary capillary wedge pressure (PCWP). Although current recommendations emphasize contextual interpretation and provocative testing for intermediate PCWP values, the relationship between PCWP-based classification and underlying phenotype has not been systematically evaluated. We aim to quantify phenotype-hemodynamic discordance across the PCWP spectrum and evaluate a staged physiology-guided framework incorporating inhaled nitric oxide (iNO), ventricular geometry, and provocative testing. Methods We studied 1,032 participants from the NHLBI-sponsored PVDOMICS cohort with multidisciplinary adjudicated phenotypes integrating clinical, imaging, physiologic, and hemodynamic data. Stage-specific PCWP thresholds classified pre- versus post-capillary physiology at rest, during iNO, and during provocation (fluid challenge or invasive cardiopulmonary exercise testing [iCPET]). Echocardiographic right ventricular-to-left ventricular (RV/LV) ratio was evaluated as a marker of ventricular interdependence. Restricted cubic spline and staged concordance analyses defined certainty-based PCWP ranges and incremental diagnostic yield. Results Adjudicated Group 2 phenotype was present in 37.0% of participants. Resting PCWP demonstrated good discrimination (AUC 0.86), but substantial bidirectional phenotype-hemodynamic discordance persisted across intermediate PCWP ranges. At a resting PCWP of 12 mmHg, 25% of participants classified as pre-capillary had adjudicated Group 2 PH, whereas at 18 mmHg, 35% classified as post-capillary remained discordant non-Group 2. Concordance did not approach 90% until PCWP values were 24 mmHg. Dynamic testing incrementally improved concordance within these overlap zones. Nearly half of adjudicated Group 2 PH participants (46.5%) were not identified by resting PCWP alone; incorporation of iNO and provocative testing increased cumulative Group 2 identification by 63.4% and improved sensitivity from 79.9% to 83.7%. Model discrimination improved from an AUC of 0.863 to 0.908 (likelihood-ratio P

19.
arXiv (CS.LG) 2026-06-16

Size Doesn't Matter: Cosine-Scored Sparse Autoencoders

arXiv:2606.15054v1 Announce Type: new Abstract: Sparse autoencoders (SAEs) detect features via inner product, so a feature's activation scales with both its directional alignment and the input's norm. Under BatchTopK, high-norm tokens inflate all pre-activations simultaneously, claiming dictionary slots regardless of content alignment. This matters because sublayer normalization has already discarded the magnitude the score measures, so the encoder detects a quantity the model does not read. We replace the score with a learned blend of cosine similarity and input magnitude, letting the optimizer choose how much norm to use; a per-feature extension lets each feature decide independently. In both regimes, training is free to recover inner product but never does, with no feature ever choosing more than half-magnitude dependence. At matched reconstruction, the cosine encoder learns features that align with human-recognizable concepts far more often than standard, filling dictionary slots that inner product wastes on norm detectors. Loss reweighting that equalizes gradients barely closes the gap, confirming forward-pass score geometry as the lever. The advantage is not universal across tasks or depths, but we believe cosine scoring should be the default for dictionary learning on normalized representations.

20.
arXiv (quant-ph) 2026-06-17

Split-Head Quantum Generative Adversarial Network for Crystalline Material Discovery

arXiv:2606.17852v1 Announce Type: new Abstract: The discovery of novel crystalline materials is a critical challenge in computational materials science, often limited by the spatial representation limitations and mode collapse typical of classical generative models. Traditionally, developing Quantum GANs for continuous 3D space is hindered by the limited capacity of near-term hardware. To overcome this, we adapt a physics-informed "split-head" architecture right from the quantum trunk to explicitly decouple macroscopic lattice bounds from microscopic atomic coordinates, significantly maximizing resource efficiency. This study disentangles the contributions of quantum circuits from these architectural priors by evaluating a Split-Head Quantum Generative Adversarial Network against an architecture-matched classical ablation model. Evaluated on the highly constrained Mg-Mn-O system, the results reveal a highly nuanced performance dichotomy between the advanced models. The architecture-matched classical ablation model demonstrated superior thermodynamic precision. Conversely, the integration of quantum circuits in the SH-QGAN drove unparalleled structural breadth and latent space exploration, more than doubling the ablation's geometric validity and successfully generating novel, metastable candidates converging on the Mg2MnO4 stoichiometry. These findings clarify that while architectural separation of cell and atom generation drives strict thermodynamic precision, quantum feature mapping independently provides the spatial diversity necessary to overcome mode collapse. Both mechanisms offer distinct, complementary enhancements for the generative discovery of advanced materials.

21.
arXiv (CS.AI) 2026-06-24

Navigating User Behavior toward Personalized Multimodal Generation

arXiv:2606.24196v1 Announce Type: new Abstract: Modern AIGC pipelines deliver high-fidelity images and videos but presuppose a well-formed creation instruction, while end users rarely articulate visual details, leaving generators misaligned with user demand. We study personalized content generation, which turns a user's interaction history into an executable instruction for downstream synthesis, and identify two obstacles: behavior must be encoded in a form legible to language reasoning, and the model must acquire instruction-writing skill absent from both pretraining and behavior data. We propose NaviGen, which represents each item with a dual identifier coupling a collaborative code and a textual code as a behavioral substrate and a semantic bridge in one token stream. On this representation, a two-stage SFT+RL pipeline first distills preference reasoning and instruction writing from evolutionarily searched supervision, then aligns generation with user intent through hierarchical and self-consistent rewards. Experiments across product, game, and short-video domains show that NaviGen improves personalized image and video generation, strengthens next-item prediction, and yields more specific, relevant, and visually generatable instructions. Our code is anonymously released at: https://github.com/iLearn-Lab/NaviGen.

22.
medRxiv (Medicine) 2026-06-12

Genomic wastewater surveillance of seasonal and zoonotic influenza A viruses in California during the 2024-2025 flu season

Wastewater genomic surveillance provides an opportunity to detect human and animal influenza A virus (IAV). We aimed to implement an IAV genomic surveillance framework agnostic to subtype, which enables recovery of IAV from multiple hosts and estimation of proportions across subtypes. We conducted IAV genomic surveillance in wastewater during the 2024-2025 flu season at multiple sites in California and compared these data with available human clinical IAV sequences and test positivity. We applied a custom whole-genome, multi-host IAV probe enrichment panel and adapted our custom expectation-maximization (EM) algorithm to deconvolute IAV mixtures in wastewater and infer subtype relative abundances. Absolute IAV concentrations were quantified using RT-PCR-based assays. H5N1 wastewater and clinical sequences were further characterized by constructing a whole-genome maximum-likelihood phylogenetic tree. Finally, we performed variant analysis to examine amino acid substitutions detected in wastewater. Our IAV probe enrichment method and EM algorithm successfully enriched all eight segments of three circulating IAV subtypes and accurately estimated subclade relative abundances for mixed IAV samples. Seasonal human H1N1pdm09 and H3N2 were detected throughout the study period from both wastewater and clinical sequencing data, with H1N1 subclades 6B.1A.5a.2a.1 and 6B.1A.5a.2a co-circulating, and H3N2 dominated by subclade 3C.2a1b.2a.2a.3a.1. Wastewater surveillance consistently detected H5N1 clade 2.3.4.4b across three monitored wastewater sites, while clinical H5N1 detections, from anywhere in CA, were sporadic and rare. Whole-genome phylogenetic analysis revealed that wastewater H5N1 sequences clustered with reference sequences associated with dairy cow and avian infections, while all human clinical H5N1 sequences clustered exclusively with reference sequences associated with dairy cow infections. Amino acid substitutions were identified across viral segments, and no mutations associated with mammalian adaptation were observed from wastewater samples.

23.
arXiv (CS.CV) 2026-06-17

Attention Sinks in Diffusion Transformers: A Causal Analysis

Attention sinks – tokens that receive disproportionate attention mass – are assumed to be functionally important in autoregressive language models, but their role in diffusion transformers remains unclear. We present a causal analysis in text-to-image diffusion, dynamically identifying dominant attention recipients per timestep and suppressing them via paired, training-free interventions on the score and value paths. Across 553 GenEval prompts on Stable Diffusion~3 (with SDXL corroboration), removing these sinks does not degrade text-image alignment (CLIP-T) or preference proxies (ImageReward, HPS-v2) at $k{=}1$; only under stronger interventions ($k\!\geq\!10$) does HPS-v2 exhibit a metric-dependent boundary, while CLIP-T remains robust throughout. The perceptual shifts induced by suppression are nonetheless sink-specific – $\sim\!6\times$ larger than equal-budget random masking – revealing an empirical dissociation between trajectory-level perturbation and semantic alignment in diffusion transformers. \footnote{Code available at https://github.com/wfz666/ICML26-attention-sink.}

24.
medRxiv (Medicine) 2026-06-22

Effectiveness of Stress Management to Reduce Stress Eating for Women: A Systematic Review and Meta-analysis of Intervention Studies

Objective: This systematic review and meta-analysis examined 1) the effects of stress management interventions on changes in stress eating for women, and 2) the longevity of these effects, by summarizing and assessing evidence from controlled and non-equivalent pretest-posttest intervention studies. Method: Five databases (PsycINFO, PubMed, Medline, Web of Science, CINAHL), existing sources, and grey literature were searched (February - June 2025). Studies that assessed stress eating or emotional eating, included a stress management intervention, and comprised at least 70% women were included. The primary outcome was reduction in stress eating. Data were pooled in meta-analyses using multi-level random-effects models and subset by follow-up period. Risk of bias was assessed via funnel plots and sensitivity analyses. Results: Sixty studies with 119 effect size estimates were included in the primary analysis. Pooled estimates indicated that stress management interventions significantly reduced stress eating (Hedges g = -0.4174, p < 0.001), with pre-post designs having larger effects than controlled trials. Subgroup analyses of follow-up periods found small effects in the short-term (before 3 months; Hedges g = -0.4202, p < 0.0001) and moderate effects for mid-term (3-6 months; Hedges g = -0.5886, p < 0.0001). Effects beyond 6 months were small and nonsignificant (Hedges g = -0.4370, p = 0.0660). Conclusion and Relevance: Stress management interventions appear to be effective for reducing stress eating for women, suggesting the potential to incorporate stress management in interventions targeting obesity. Effects may be only sustained 6 months post-intervention, suggesting the need for strategies to bolster long-term effectiveness.

25.
arXiv (CS.AI) 2026-06-18

AI-Driven Assessment of Human Tutors: Linking Training Performance to Real-Life Practice

arXiv:2606.18617v1 Announce Type: cross Abstract: There exist numerous tutor training platforms. However, few provide AI-driven training and evaluation for human tutors based on real-life performance. We present an AI-driven system that assesses both open responses during training and authentic real-life tutoring. Unlike platforms that only assess learning through online training or simulations, our system utilizes Generative AI (Gemini-2.5-pro) to analyze transcriptions of authentic tutoring, measuring the transfer of tutor skills to real-life application. Human tutors instructing students remotely in math (N=86) completed six scenario-based lessons, averaging a significant 7.4% learning gain. Using mixed-effects models across 405 session-to-lesson pairs, we found that training performance significantly predicted real-life transcript scores with an effect size of 0.25 SD. Model comparison (AIC/BIC) indicated averaging open response and multiple choice performance during training predicted real-life tutor performance best, although open responses were comparatively more predictive. Exploratory analysis showed that after training, tutors were significantly more likely to encounter pedagogical opportunities to apply their skills (61.1% to 68.9%) and demonstrated higher execution quality within those opportunities (65.5% to 68.1%). Interrupted time series analysis suggested that these tutor improvements were part of a gradual trend over time rather than an immediate intervention effect of training. We illustrate an AI-driven method to link tutor training with real-life assessment. In doing so, we contribute open datasets, AI prompts, and scoring rubrics to support transparency and reproducibility.