Academic Intelligence · Curated Daily

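Explore the Frontiers of Global Academic Research

AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate literature analysis briefs across interdisciplinary fields.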

01.
arXiv (CS.AI) 2026-06-24

Deciphering Fingerprints of 3D Molecular Surfaces for Accurate Epitope Prediction

arXiv:2606.23830v1 Announce Type: cross Abstract: Molecular surfaces encode the geometric and physicochemical patterns that determine antibody-antigen recognition, central to epitope prediction. However, existing methods rely on sequences or backbone structures and struggle to capture discontinuous, surface-driven epitopes. This study presents SurfBind, a surface-centric learning framework for epitope prediction that operates directly on molecular surface representations. SurfBind integrates geometric and physicochemical cues through a Transformer-based architecture with patch-level surface modeling, binder-aware cross-attention, and a hierarchical coarse-to-fine prediction paradigm. Experiments on challenging epitope identification benchmarks, including SAbDab and DB5.5, demonstrate that SurfBind achieves state-of-the-art performance and strong generalization across unseen antibodies and conformational states, highlighting the value of interaction-aware surface modeling for understanding the crucial mechanisms of protein-protein interactions.

02.
arXiv (CS.CV) 2026-06-12

Diffusion Transformer World-Action Model for AV Scene Prediction

Action-conditioned world models let an autonomous vehicle predict future camera scenes from its own planned controls, enabling planning and simulation without real-world rollouts, but at compact, trainable scale the futures are ambiguous and the field's standard distortion metrics actively mislead: they reward a blurry regression mean over a realistic prediction. We confront this with a compact latent world model that, given the present front-camera latent and a sequence of ego-actions, predicts future scene latents a frozen decoder renders to $256 \times 256$ frames up to 8 seconds ahead, evaluated on 150 held-out nuScenes scenes. We first benchmark where to predict: across six frozen encoders spanning four representation families, V-JEPA2 with temporal context reduces steering RMSE by 40% over the best single-frame encoder. We then train a latent Diffusion Transformer (DiT) and, through a controlled diagnosis, identify the four ingredients it needs: spatial tokens, the $x_0$ objective, residual anchoring, and sampling matched to target uncertainty. In a Stable-Diffusion-VAE encode-predict-decode pipeline we expose the central tension: distortion metrics (cosine similarity, SSIM) favor the blurry mean, masking that the diffusion model is far closer to the real frame distribution. Inception-based FID and KID reveal a clean perception-distortion frontier: diffusion attains KID 0.078 versus 0.375 for regression ($4.8\times$ better), and a deployable train-derived calibration makes this practical without test-time ground truth. The model is genuinely action-controllable (steering drives scene displacement, Spearman $\rho = 0.81$, vs $-0.18$ for regression). We trace limited single-pass motion to a shared-present anchor and engineer a compact 1.7M-parameter "jump" model that recovers full ground-truth motion magnitude ($1.02\times$ GT), where single-pass models capture less than half.

03.
medRxiv (Medicine) 2026-06-15

Fanconi Anemia as a Window into Premalignant Field Cancerization of the Oral Mucosa

Head and neck squamous cell carcinoma (HNSCC) evolves through stepwise clonal expansion within genetically altered mucosa fields, yet actionable biomarkers remain undefined. Leveraging Fanconi anemia (FA), a cancer predisposition syndrome with extreme HNSCC risk due to defective DNA interstrand crosslink repair, we profiled premalignant changes in the oral cavity using noninvasive brush biopsies. Consistent with our prior demonstration of genomic instability in FA-associated SCCs, we detected pathogenic TP53 variants in 26% and copy number alterations in 60.5% in clinically normal-appearing oral mucosa of individuals with FA. These subclinical clonal expansions define candidate biomarkers of early clonal evolution amenable to serial sampling for risk stratification and prevention studies. Since FA-associated SCCs share genomic features with sporadic HNSCC, these findings may extend to the broader population. We also identify somatic reversion of a pathogenic FANCB variant, providing evidence of genomic self-correction and suggesting a potential avenue for gene-based cancer prevention in FA.

04.
arXiv (CS.LG) 2026-06-11

Fourier Features Let Agents Learn High Precision Policies with Imitation Learning

arXiv:2606.12334v1 Announce Type: new Abstract: High-precision robotic manipulation requires fine-grained spatial reasoning that is often difficult to achieve with RGB-only policies due to depth ambiguity and perspective scale issues. Policies that leverage 3D information directly, such as those based on point clouds, offer a stronger geometric prior over purely image-based ones, yet their performance remains highly task-dependent. We hypothesize that this discrepancy may be due to the spectral bias of neural networks towards learning low frequency functions, which especially affects architectures conditioned on slow-moving Cartesian features. We thus propose to map point clouds from Cartesian space into high-dimensional Fourier space, effectively equipping the point cloud encoder with direct access to high-frequency features. We experimentally validate the use of Fourier features on challenging manipulation tasks from the RoboCasa and ManiSkill3 benchmarks and on a real robot setup. Despite their simplicity, we find that Fourier features provide significant benefits across diverse encoder architectures and benchmarks and are robust across hyperparameters. Our results indicate that Fourier features let policies leverage geometric details more effectively than Cartesian features, showing their potential as a general-purpose tool for point cloud-based imitation learning. We provide source code and videos on our project page: https://fourier-il.github.io/fourier-il
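To make the Cartesian-to-Fourier mapping concrete, here is a minimal sketch of a sinusoidal encoding applied to each point coordinate. The octave-spaced frequency schedule, the `num_bands` parameter, and the function names are assumptions for illustration; the abstract does not specify the exact encoding the authors use.

```python
import math


def fourier_features(point, num_bands=4):
    """Map one Cartesian point (x, y, z) to sin/cos features.

    Assumption: frequencies are octave-spaced, pi * 2**k for k in [0, num_bands).
    """
    feats = []
    for coord in point:
        for k in range(num_bands):
            freq = (2.0 ** k) * math.pi
            feats.append(math.sin(freq * coord))
            feats.append(math.cos(freq * coord))
    return feats


def encode_cloud(points, num_bands=4):
    """Encode every point of a point cloud independently."""
    return [fourier_features(p, num_bands) for p in points]


# Two points 1 mm apart are nearly identical in Cartesian space,
# but differ clearly in the highest-frequency bands.
a = fourier_features((0.100, 0.0, 0.0), num_bands=8)
b = fourier_features((0.101, 0.0, 0.0), num_bands=8)
max_gap = max(abs(u - v) for u, v in zip(a, b))
```

Each point expands from 3 values to `3 * 2 * num_bands` values, so a downstream encoder sees small spatial offsets as large feature changes without having to learn high-frequency functions itself.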

05.
arXiv (CS.AI) 2026-06-18

Practical Anonymous Two-Party Gradient Boosting Decision Tree

arXiv:2605.26903v2 Announce Type: replace-cross Abstract: Structured data is well handled by gradient-boosted decision trees (GBDT), which are usually trained on vertically partitioned features across mutually distrustful parties. High speed and interpretability make GBDTs popular in finance and healthcare, where neural networks may fall short. Enabling secure computation for GBDTs poses unique challenges, requiring secure record alignment for comparison. Relying on private set intersection (PSI) is a de facto approach. Mistaking PSI for a safety measure actually exposes which record identifiers (IDs) are shared between the datasets. Although circuit-PSI could help, it is costly for generic uses. New ideas are needed to efficiently train in a "dark forest". Aiming to hide the IDs, we initiate the study of anonymous GBDT training on split data held by two parties. Dual circuit-PSI in our design lets the parties alternate as receiver to run pick-then-sum over local features. Via oblivious programmable pseudorandom functions, we propagate circuit-PSI outputs as shared state across runs. Avoiding universal alignment, we resolve the neglected dilemma that ID hiding incurs a cost that scales with domain size. Next, we halve the cost of ciphertext packing used to convert single-instruction multiple-data homomorphic encryption from (ring) learning with errors in prior secure GBDT (Usenix Security' 23) and related secure machine-learning computations. Comparative experiments show our protocol remains competitive with leaky approaches in efficiency. Enabling ID-hiding aggregation, our techniques can extend to other vertically partitioned analytics.

06.
arXiv (math.PR) 2026-06-11

Integrated expectile-based measures of inequality

arXiv:2606.12333v1 Announce Type: cross Abstract: Expectiles provide a class of asymmetric location functionals that incorporate the magnitude of deviations and admit a natural geometric interpretation. Building on their structural consistency with the convex stochastic order, this paper introduces a family of integrated expectile functionals for measuring risk, dispersion, and inequality. The proposed functionals admit analytical representations as integrals of expectiles across asymmetry levels. For a distinguished subclass of these constructions, a geometric representation is available: the resulting quantities can be expressed as weighted areas of star-shaped sets encoding the distributional asymmetry of a random variable. This approach yields a new class of expectile-based inequality indices, constituting a natural counterpart to classical Gini-type measures while preserving desirable monotonicity and consistency properties. Empirical counterparts are derived in closed form and admit explicit decompositions over finite samples. The framework extends naturally to multivariate settings through directional expectile constructions, leading to measures capable of capturing genuinely joint forms of multivariate dispersion and inequality.
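As background for the building block named above, the following sketch computes a sample expectile: the value that minimizes an asymmetrically weighted squared loss, which reduces to the mean at asymmetry level 0.5. The fixed-point iteration and the function name are illustrative choices; the integrated functionals and inequality indices proposed in the paper are not reproduced here.

```python
def expectile(xs, tau, iters=200, tol=1e-12):
    """Sample expectile at asymmetry level tau in (0, 1).

    Iterates the weighted-mean fixed point: observations above the current
    estimate get weight tau, the others get weight 1 - tau.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    e = sum(xs) / len(xs)  # start from the mean (the tau = 0.5 expectile)
    for _ in range(iters):
        num = 0.0
        den = 0.0
        for x in xs:
            w = tau if x > e else 1.0 - tau
            num += w * x
            den += w
        new_e = num / den
        if abs(new_e - e) < tol:
            return new_e
        e = new_e
    return e


sample = [1.0, 2.0, 3.0, 10.0]
low, mid, high = (expectile(sample, t) for t in (0.1, 0.5, 0.9))
```

Unlike a quantile, the expectile moves when an extreme observation becomes more extreme, which is the "magnitude of deviations" property the abstract refers to.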

07.
arXiv (CS.CV) 2026-06-16

From Static Inference to Dynamic Interaction: A Survey of Streaming Large Language Models

Standard Large Language Models (LLMs) are predominantly designed for static inference with pre-defined inputs, which limits their applicability in dynamic, real-time scenarios. To address this gap, the streaming LLM paradigm has emerged. However, existing definitions of streaming LLMs remain fragmented, conflating streaming generation, streaming inputs, and interactive streaming architectures, while a systematic taxonomy is still lacking. This paper provides a comprehensive overview and analysis of streaming LLMs. First, we establish a unified definition of streaming LLMs based on data flow and dynamic interaction to clarify existing ambiguities. Building on this definition, we propose a systematic taxonomy of current streaming LLMs and conduct an in-depth discussion on their underlying methodologies. Furthermore, we explore the applications of streaming LLMs in real-world scenarios and outline promising research directions to support ongoing advances in streaming intelligence. We maintain a continuously updated repository of relevant papers at https://github.com/EIT-NLP/Awesome-Streaming-LLMs.

08.
arXiv (CS.CV) 2026-06-12

Amnesia: A Stealthy Replay Attack on Continual Learning Dreams

Continual learning (CL) models often use experience replay to reduce catastrophic forgetting, but their robustness to replay sampling interference remains underexplored. Existing CL attacks alter inputs or training pipelines (poisoning/backdoors) and rarely include explicit auditable constraints, limiting realism. Here, auditability means a monitor can verify compliance from sampler-visible telemetry - e.g., logged replay index/label statistics - by checking that the realized replay class histogram stays close to a nominal baseline and that replay rate is unchanged per batch and/or over a rolling window. We study a limited-privilege insider who controls only replay index selection, not pixels, labels, or model parameters, while staying within auditable limits such as queue priorities. We introduce Amnesia, a replay composition attack that maximizes degradation under two budgets: a visibility budget delta bounding the TV/KL divergence from a nominal class histogram p0, and a mass budget f fixing the replay rate. Amnesia has two steps: (i) compute lightweight class utilities, such as EMA loss or confidence, to tilt p0 toward harmful classes; and (ii) project the tilt back into the delta-ball using efficient KL (exponential tilt) or TV (balanced mass redistribution) optimizers. A windowed scheduler enforces rolling audits. Across challenging CL benchmarks and strong replay baselines, Amnesia consistently lowers final accuracy (ACC) and worsens backward transfer (-BWT). The KL variant delivers high impact while remaining largely undetected under multiple audit schemes, including per-batch and rolling-window checks. The TV variant is more damaging but easier to detect, especially under tight per-class constraints. These results expose index-only replay control as a practical, auditable threat surface in CL systems and establish a principled impact-visibility trade-off.
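The two-step KL variant can be sketched roughly as follows: tilt the nominal histogram p0 exponentially by per-class utilities, then choose the largest tilt strength whose KL divergence from p0 stays within the visibility budget delta. The bisection search, the `lam_max` cap, and all function names are assumptions for illustration, not the authors' optimizer.

```python
import math


def kl(q, p):
    """KL divergence KL(q || p) for discrete distributions with p > 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def tilt(p0, utils, lam):
    """Exponentially tilt p0 toward high-utility classes with strength lam."""
    m = max(utils)  # subtract the max utility for numerical stability
    w = [p * math.exp(lam * (u - m)) for p, u in zip(p0, utils)]
    z = sum(w)
    return [wi / z for wi in w]


def project_kl(p0, utils, delta, lam_max=50.0, steps=60):
    """Largest tilt (up to lam_max) whose KL from p0 does not exceed delta."""
    if kl(tilt(p0, utils, lam_max), p0) <= delta:
        return tilt(p0, utils, lam_max)
    lo, hi = 0.0, lam_max
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if kl(tilt(p0, utils, mid), p0) <= delta:
            lo = mid
        else:
            hi = mid
    return tilt(p0, utils, lo)


p0 = [0.25, 0.25, 0.25, 0.25]        # nominal replay class histogram
utils = [0.1, 0.2, 0.9, 0.3]          # hypothetical per-class utilities (e.g. EMA loss)
q = project_kl(p0, utils, delta=0.05)
```

Bisection works here because the KL divergence of an exponential tilt from its base distribution is non-decreasing in the tilt strength, so the feasible strengths form an interval starting at zero.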

09.
bioRxiv (Bioinfo) 2026-06-16

Evidence for recombination in dengue virus genomes

Recombination is a key driver of RNA virus evolution, yet its extent and evolutionary implications in dengue virus (DENV) remain incompletely understood. We conducted a comprehensive, genome-wide recombination screen across 6,905 complete DENV genomes representing all four serotypes, 82 countries, and eight decades of sampling (1944-2023) retrieved from the Bacterial and Viral Bioinformatics Resource Center. Using seven complementary recombination detection methods implemented in RDP5, we identified 66 recombination events across 53 unique recombinant sequences, of which 29 are newly described. Events included intra-genotypic (n = 18), inter-genotypic (n = 32), and inter-serotypic (n = 16) exchanges spanning 14 genotypes and four continents, with no meaningful serotype-level enrichment (Cramer's V = 0.054). Recombination was concentrated in non-structural genes, most frequently NS3 (19 events), NS5 (17), and NS2 (12), while the capsid gene contained no recombination events, consistent with strong functional constraint. Single-nucleotide polymorphism analyses confirmed low divergence between recombinants and their inferred parents in both recombinant and non-recombinant regions. Phylogenomic analysis of 6,642 sequences revealed that recombinants cluster significantly closer to their major parents ($p = 8.9 \times 10^{-6}$) and that their removal does not significantly alter tree topology (p = 0.898), suggesting that the short length of recombinant regions limits phylogenetic conflict. We also introduce RECOSIM, an unsupervised machine-learning tool for recombination detection that achieved higher precision than RDP5 on both simulated (93.4% vs. 80.0%) and empirical (98.1% vs. 39.3%) datasets. Collectively, these results establish recombination as a widespread, pan-serotypic phenomenon in DENV with implications for genomic surveillance, vaccine evaluation, and evolutionary inference.

10.
medRxiv (Medicine) 2026-06-19

Hyperleukocytosis and outcomes in pediatric B-cell acute lymphoblastic leukemia: A report from the REDIAL Consortium

Hyperleukocytosis (white blood cell [WBC] count >100 000/uL) at diagnosis is an important prognostic risk factor in pediatric acute lymphoblastic leukemia (ALL), though its significance with contemporary therapy is unclear. We analyzed 1 826 pediatric ALL patients from a multi-institution cohort to determine whether hyperleukocytosis independently predicts outcomes using multivariable Cox proportional hazard modeling. Hyperleukocytosis occurred in 211 patients (12%), with 121 having B-ALL, and showed no prognostic significance in T-ALL patients. In B-ALL, 5-year event-free survival (EFS) was 65% versus 89% for non-hyperleukocytosis patients, and overall survival (OS) was 78% versus 93%. After adjustment for age, cytogenetic risk, central nervous system disease status, and treatment site, hyperleukocytosis remained an independent predictor of end-of-induction minimal residual disease (MRD) positivity (odds ratio 2.53 [95% confidence interval [CI]: 1.71-3.94; p

11.
arXiv (CS.CL) 2026-06-18

ScholarSum: Student-Teacher Abstractive Summarization via Knowledge Graph Reasoning and Reflective Refinement

Abstractive summarization plays a crucial role in enabling efficient understanding of scientific literature, yet it inherently demands both linguistic fluency and factual faithfulness. Existing approaches often fail to reconcile these two requirements. Extractive methods rely on rigid sentence splicing that disrupts macro-level logical coherence, while large language model (LLM)-based generative approaches, despite mastering linguistic fluency, exhibit limited factual consistency. In this work, we propose ScholarSum, a hierarchical reflective graph-based framework that emulates a student-teacher writing process for fluent and faithful scientific summarization. ScholarSum first organizes the document into a hierarchical knowledge graph by segmenting it into semantically coherent units, whose multi-layered community structure captures global logic and macro-level themes. Guided by this global structure, the student generates an initial draft, which is subsequently refined through fine-grained evidence retrieval. To ensure factual consistency, a teacher-like reviewer then iteratively examines the draft, identifies unsupported content, and prompts targeted re-retrieval and rewriting until the summary meets rigorous quality standards. Extensive experiments demonstrate that ScholarSum significantly outperforms previous baselines in terms of both completeness and faithfulness. Our code is available at https://github.com/Xiaoyu-Tao/ScholarSum.

12.
arXiv (CS.CL) 2026-06-15

Succeeding at Scale: Enterprise Retrieval Benchmark Construction and Index-Preserving Query Adaptation for Multi-Tenant Search

Large-scale multi-tenant retrieval systems generate extensive query logs but lack curated relevance labels for effective domain adaptation, resulting in substantial underutilized "dark data." This challenge is compounded by the high cost of model updates, as jointly fine-tuning query and document encoders requires full corpus re-indexing, which is impractical in multi-tenant settings with thousands of isolated indices. We introduce DevRev-Search, a passage retrieval benchmark for technical customer support built via a fully automated pipeline. Candidate generation uses fusion across diverse sparse and dense retrievers, followed by an LLM-as-a-Judge for consistency filtering and relevance labeling. We further study and systematically evaluate index-preserving query-only adaptation strategies that fine-tune only the query-encoder while keeping the document indices fixed. Experiments on DevRev-Search, SciFact, and FiQA-2018 show that parameter-efficient fine-tuning of the query encoder delivers a remarkable quality-efficiency trade-off, enabling scalable and practical enterprise multi-tenant retrieval.
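The abstract does not say which fusion rule combines the sparse and dense retrievers. As one common possibility, the sketch below uses reciprocal rank fusion (RRF), a swapped-in technique chosen here only for illustration; the constant `k=60`, the function name, and the document IDs are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document ID.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


sparse_hits = ["a", "b", "c"]   # hypothetical sparse-retriever ranking
dense_hits = ["b", "d", "a"]    # hypothetical dense-retriever ranking
candidates = reciprocal_rank_fusion([sparse_hits, dense_hits])
```

Documents returned by both retrievers rise to the top of the candidate pool, which is then what a judge model would filter and label.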

13.
arXiv (CS.CL) 2026-06-18

Language Models as Interfaces, Not Oracles: A Hybrid LLM-ML System for Pediatric Appendicitis

Large language models (LLMs) can make clinical decision support more accessible by interpreting free-text documentation, but their direct use as diagnostic engines is limited by sensitivity to prompts, information order, and plausible but incorrect outputs. Structured machine-learning models offer more stable risk prediction, yet they require tabular inputs that are difficult to integrate with narrative clinical workflows. We present ClaMPAPP (Clinical Language-assisted Machine-learning Pipeline for Appendicitis), a hybrid system that uses an LLM as an interface rather than as the final decision-maker. ClaMPAPP extracts schema-constrained clinical features from note-like narratives, applies deterministic plausibility checks, and passes validated features to an XGBoost classifier trained on clinical, laboratory, and ultrasound variables. We evaluated ClaMPAPP on two independent pediatric appendicitis cohorts from German hospitals and compared it with end-to-end LLM baselines, including open-source and proprietary models. To preserve ground truth while testing free-text input, narratives were generated from structured electronic health records through template rendering and constrained LLM rewriting, with additional sentence-order permutation to assess positional robustness. ClaMPAPP achieved the strongest overall diagnostic performance in both internal and external validation while minimizing missed appendicitis cases, the key safety concern in acute triage. End-to-end LLMs showed unstable sensitivity-specificity trade-offs and greater degradation under narrative reordering. These results support an LLM-as-interface, ML-as-predictor design that separates natural-language usability from predictive inference and provides a more auditable pathway for clinical decision support.

14.
arXiv (CS.CL) 2026-06-12

DSAEval: Evaluating Data Science Agents on a Wide Range of Real-World Data Science Problems

Recent LLM-based data agents aim to automate data science tasks ranging from data analysis to deep learning. However, the open-ended nature of real-world data science problems, which often span multiple taxonomies and lack standard answers, poses a significant challenge for evaluation. To address this, we introduce DSAEval, a benchmark comprising 641 real-world data science problems grounded in 285 diverse datasets, covering both structured and unstructured data (e.g., image and text). DSAEval incorporates three distinctive features: (1) Multimodal Environment Perception, which enables agents to interpret observations from multiple modalities, including text and vision; (2) Multi-Query Interactions, which mirror the iterative and cumulative nature of real-world data science projects; and (3) Multi-Dimensional Evaluation, which provides a holistic assessment across reasoning, code, and results. We systematically evaluate 13 recent advanced agentic LLMs using DSAEval. Our results show that Claude-Sonnet-4.5 achieves the strongest overall performance, MiMo-V2-Pro and GPT-5.2 lead in duration and step efficiency, respectively, and MiMo-V2-Flash is the most cost-effective. We further demonstrate that multimodal perception consistently improves performance on vision-related tasks, with gains ranging from 2.04% to 11.30%. Overall, while current data science agents perform well on structured data and routine data analysis workflows, substantial challenges remain in unstructured domains. Finally, we offer critical insights and outline future research directions.

15.
arXiv (quant-ph) 2026-06-24

Revealing high-dimensional entanglement through symmetry

arXiv:2606.23817v1 Announce Type: new Abstract: Photons encoded in discrete time bins can be routinely prepared in temporal superposition states, enabling high-dimensional entanglement and enhanced quantum communication rates. However, characterizing this high-dimensional entanglement presents significant challenges, namely due to the involved measurement complexity or reliance on restrictive assumptions that compromise the generality of traditional approaches. Here, we develop and experimentally demonstrate a simple linear-optical scheme based on particle-exchange symmetry that allows us to probe high-dimensional entanglement in time-bin-encoded states. Combining Hong-Ou-Mandel interference with suitable transformations, our method not only certifies entanglement but also lower-bounds its dimensionality using only two dichotomic symmetry-based measurements. This bound is obtained through a new rigorous theoretical analysis and can be further improved by weak, physically motivated assumptions. The scheme remains effective at any timescale, even far below the temporal detector resolution used. Our work provides a powerful state-characterization tool and demonstrates that we can prove high-dimensional temporal entanglement on timescales inaccessible to the setup.

16.
arXiv (CS.AI) 2026-06-24

Alternate loss functions and regression models that achieve robustness to outliers by modulating the learning rate

arXiv:2606.22068v2 Announce Type: replace-cross Abstract: Most real-world datasets used for training supervised learning models are contaminated with noisy data and outliers leading to large prediction errors. This paper proposes a new approach for achieving robustness where the learning rate is modulated by a factor that is sensitive to outliers. In this approach a reduction of the learning rate is shown to be achieved by using alternate loss functions that are infinitely differentiable, strictly convex or quasiconvex and more closely approximate the absolute error than Huber and log-cosh losses. A comparison of the performance of regression models trained with different loss functions on a wide variety of benchmarks and datasets is presented to demonstrate the superior performance of the Square Root Loss (SRL) and Smooth Mean Absolute Error (SMAE) losses proposed in this paper. Two new robust linear regression models are presented. Highly vectorized robust parameter update formulae that take advantage of modern GPUs for both stochastic and batch gradient descent are presented.
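One way to read the learning-rate modulation idea is to compare each loss's gradient with the squared-error gradient at the same residual: the ratio acts as a per-sample step-size factor that shrinks for outliers. The sketch below does this for the Huber and log-cosh baselines named in the abstract. The SRL and SMAE formulas are not given there and are not reproduced; `effective_lr_factor` is a hypothetical helper name.

```python
import math


def huber_grad(r, delta=1.0):
    """Gradient of the Huber loss with respect to the residual r."""
    return r if abs(r) <= delta else delta * math.copysign(1.0, r)


def logcosh_grad(r):
    """Gradient of log(cosh(r)) with respect to the residual r."""
    return math.tanh(r)


def effective_lr_factor(grad_fn, r):
    """Loss gradient divided by the squared-error gradient (which equals r).

    A value of 1 means the sample is treated like squared error; values near
    0 mean its update is strongly damped.
    """
    return grad_fn(r) / r


inlier_factor = effective_lr_factor(huber_grad, 0.5)     # small residual
outlier_factor = effective_lr_factor(huber_grad, 10.0)   # large residual
```

With a residual of 10, both baselines scale the update to roughly one tenth of what squared error would apply, while small residuals are left almost untouched.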

17.
arXiv (CS.AI) 2026-06-15

From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing

arXiv:2606.14639v1 Announce Type: cross Abstract: Recent advances in speech generation have significantly improved the naturalness of synthetic speech, making spoofing detection increasingly challenging. A key limitation of current anti-spoofing systems is their limited robustness to unseen synthesis methods. In this work, we transform a self-supervised speech representation model into a Mixture-of-Experts (MoE) architecture to improve generalization. Feed-forward blocks in selected encoder layers are replaced by multiple expert networks controlled by a layer-wise gating mechanism, allowing experts to capture complementary acoustic patterns while preserving the representations learned during self-supervised pretraining. We further analyze the architectural choices affecting the performance of this MoE conversion and investigate the activation behavior of the experts. The proposed approach is evaluated on 14 spoofing datasets and reduces the macro EER from 5.46% to 4.81%, corresponding to 11.9% relative improvement over the baseline.

18.
arXiv (CS.CV) 2026-06-17

Future Dynamic 3D Reconstruction: A 3D World Model with Disentangled Ego-Motion

Forecasting the evolution of dynamic environments is crucial for autonomous agents. While generative world models have recently achieved high photorealism in 2D video synthesis by mixing ego-motion and environmental dynamics within the image plane, they exhibit physical inconsistencies, such as morphing or vanishing objects, especially over long time horizons. In this paper, we propose FR3D, a world model that predicts a persistent 3D latent representation for future dynamic 3D reconstruction. Unlike prior works that treat the world as a sequence of image-based features, FR3D explicitly decouples the 3D evolution of the scene from the agent's trajectory, treating the inferred ego-motion as a latent proxy for action. This disentanglement resolves the ambiguities between self-motion and world-motion, ensuring geometric consistency into the future. Furthermore, we introduce a teacher-student distillation strategy that leverages the spatial "common sense" of off-the-shelf foundation models, leading to robust zero-shot generalization. Extensive experiments demonstrate FR3D's strong performance for future dynamic 3D reconstruction from monocular observations across multiple datasets, even 2 seconds into the future. Project page: https://fr3d-wm.github.io.

19.
arXiv (math.PR) 2026-06-15

Asymptotic analysis of the normal inverse Gaussian cumulative distribution

arXiv:2509.05664v2 Announce Type: replace-cross Abstract: Using a recently derived integral in terms of elementary functions, we derive new asymptotic expansions of the normal inverse Gaussian cumulative distribution function. One of the asymptotic representations is in terms of the normal Gaussian distribution or complementary error function.

20.
Nature (Science) 2026-06-10

Gen Z scepticism towards AI is a wake-up call — universities must take it seriously

The challenge for universities is not adopting artificial intelligence, but doing so in ways that the current generation of students can trust.

21.
arXiv (quant-ph) 2026-06-19

Optimizing resource allocation for accuracy in noisy variational quantum algorithms

arXiv:2606.20153v1 Announce Type: new Abstract: For quantum algorithms to achieve their full potential, we need methodologies to optimize them, such as reaching a given output accuracy with minimal resource costs. Here, we develop such a methodology for a class of Noisy Intermediate-Scale Quantum (NISQ) algorithms. We leverage simulations of a Variational Quantum Eigensolver (VQE) to propose a phenomenological model of such algorithms that captures the complex relationship between algorithmic accuracy, algorithmic resource costs, and the noise that exists in realistic quantum hardware. For this, we take the algorithmic resource cost to be the total number of quantum gate-operations in the algorithm; minimizing this cost typically makes the algorithm faster and more energy-efficient. We consider the subtle trade-off between quantum circuit size (small circuits are too imprecise, but large ones are too noisy), and the number of iterations of that quantum circuit for the full algorithm to sufficiently converge. Using a noise-metric-resource methodology, we identify the sweet spot (of circuit size versus iterations) that minimizes the algorithmic resource costs for a desired algorithm accuracy. It also gives the circuit size that maximizes algorithm accuracy for a fixed resource cost. Our methodology provides a practical guideline for near-term deployment of variational algorithms on realistic noisy hardware, including hardware that uses error mitigation.

22.
arXiv (CS.CV) 2026-06-12

ShowFlow: From Robust Single Concept to Condition-Free Multi-Concept Generation

Customizing image generation remains a core challenge in controllable image synthesis. For single-concept generation, maintaining both identity preservation and prompt alignment is challenging. In multi-concept scenarios, relying solely on a prompt without additional conditions, such as layout boxes or semantic masks, often leads to identity loss and concept omission. In this paper, we introduce ShowFlow, a comprehensive framework designed to tackle these challenges. We propose ShowFlow-S for single-concept image generation, and ShowFlow-M for handling multiple concepts. ShowFlow-S introduces a KronA-WED adapter, which integrates a Kronecker adapter with weight and embedding decomposition, together with a novel Semantic-Aware Attention Regularization (SAR) training objective, to enhance single-concept generation. Building on this foundation, ShowFlow-M directly reuses robust models learned by ShowFlow-S to support multi-concept generation without extra conditions, incorporating Subject-Adaptive Matching Attention (SAMA) and Layout Consistency guidance as plug-and-play modules. Extensive experiments and user studies validate ShowFlow's effectiveness, highlighting its potential in real-world applications like advertising and virtual dressing. Our source code will be publicly available at: https://htrvu.github.io/showflow.

23.
medRxiv (Medicine) 2026-06-19

Reassessing Instrument Strength in Two-Sample Mendelian Randomization Analysis

Mendelian randomization (MR) analysis is widely used to estimate causal relationships between risk factors and outcomes of interest. Two-sample MR approaches have gained increasing attention in genetic epidemiology due to the growing availability of Genome-Wide Association Study (GWAS) summary statistics from public databases. A critical step in two-sample MR is the selection of genetic variants as instrumental variables (IVs). Although genome-wide significant variants are typically preferred, variants with weaker association p-values are sometimes included: they may improve power by increasing the number of instruments, but they may also introduce weak instrument bias and attenuate effect estimates towards the null. Our simulation results show that even modest levels of pleiotropy substantially increase the variability of causal effect estimates, while the inclusion of weak IVs does not substantially affect the direction and variability of causal effect estimates in most cases. In real data analyses, we used two released versions of FinnGen GWAS summary statistics with different sample sizes as exposure GWASs to assess the influence of weak IVs. Here, the inclusion of IVs with higher exposure-association p-values resulted in weakened estimated effect sizes, particularly when the exposure GWAS sample size was small. These findings suggest that incorporating weak IVs is reasonable when the exposure GWAS sample size is large, but it poses a risk of falsely concluding null associations when the exposure GWAS sample size is small.
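For readers unfamiliar with the quantities involved, the sketch below shows two standard ingredients of a two-sample MR analysis from summary statistics: the approximate per-variant F-statistic used to gauge instrument strength, and the inverse-variance weighted (IVW) causal estimate. The variant values are made up for illustration, the `f_min=10` cutoff is the conventional rule of thumb rather than a threshold taken from this study, and the study's own simulation design is not reproduced.

```python
def f_statistic(beta_x, se_x):
    """Approximate instrument strength: squared z-score of the exposure effect."""
    return (beta_x / se_x) ** 2


def ivw_estimate(variants):
    """Inverse-variance weighted causal estimate from per-variant summary stats."""
    num = sum(v["beta_x"] * v["beta_y"] / v["se_y"] ** 2 for v in variants)
    den = sum(v["beta_x"] ** 2 / v["se_y"] ** 2 for v in variants)
    return num / den


def keep_strong(variants, f_min=10.0):
    """Drop instruments whose F-statistic falls below f_min."""
    return [v for v in variants if f_statistic(v["beta_x"], v["se_x"]) >= f_min]


# Hypothetical summary statistics: exposure effect, its SE, outcome effect, its SE.
variants = [
    {"beta_x": 0.10, "se_x": 0.01, "beta_y": 0.050, "se_y": 0.02},
    {"beta_x": 0.08, "se_x": 0.01, "beta_y": 0.040, "se_y": 0.02},
    {"beta_x": 0.02, "se_x": 0.01, "beta_y": 0.010, "se_y": 0.02},  # weak instrument
]
all_iv = ivw_estimate(variants)
strong_iv = ivw_estimate(keep_strong(variants))
```

In this noise-free toy data both estimates agree. With real data, the exposure effects of weak instruments are noisy relative to their size, which is the source of the attenuation towards the null discussed in the abstract.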

24.
arXiv (CS.CL) 2026-06-16

A large-scale pipeline for LLM-assisted corpus annotation: variation and change in the English consider construction

As natural language corpora expand at an unprecedented rate, manual annotation remains a significant methodological bottleneck in corpus linguistic work. We address this challenge by presenting a scalable pipeline for automating grammatical annotation in voluminous corpora using large language models (LLMs). Unlike previous supervised and iterative approaches, our method employs a four-phase workflow: prompt engineering, pre-hoc evaluation, automated batch processing, and post-hoc validation. We demonstrate the pipeline's accessibility and effectiveness through a diachronic case study of variation in the English evaluative consider construction (consider X as/to be/Ø Y). We annotate 143,933 'consider' concordance lines from the Corpus of Historical American English (COHA) via the OpenAI API in under 60 hours, achieving 98%+ accuracy on two sophisticated annotation procedures. A Bayesian multinomial GAM fitted to 44,527 true positives of the evaluative construction reveals previously undocumented genre-specific trajectories of change, enabling us to advance new hypotheses about the relationship between register formality and competing pressures of morphosyntactic reduction and enhancement. Our results suggest that LLMs can perform a range of data preparation tasks at scale with minimal human intervention, unlocking substantive research questions previously beyond practical reach, though implementation requires attention to costs, licensing, and other ethical considerations.

25.
arXiv (CS.CV) 2026-06-17

MM++: Unsupervised Scale-Invariant Multilayer OOD Detection via Top-K Gated Feature Fusion

We introduce MM++ (Multilayer Mahalanobis++), a fully unsupervised, strictly post-hoc, and scale-invariant framework for Out-of-Distribution (OOD) detection. To address the trade-off between scale invariance and hierarchical expressivity, MM++ constructs a principled joint feature space. It first identifies discriminative intermediate layers by measuring entropy density drops, which mark the boundaries of sharp semantic compression. By fusing these selected layers with the terminal representation, the framework captures latent cross-layer correlations while mitigating early-layer noise. Crucially, a Ledoit-Wolf regularized tied covariance matrix stabilizes this unified space, enabling reliable distance estimation. Requiring no auxiliary OOD data, classifier fine-tuning, or architectural modifications, MM++ delivers robust performance across distinct architectures for both near- and far-OOD detection.