Academic Intelligence · Curated Daily

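Explore the Frontiers of Global Research

AcademicHub brings together real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefs.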

01.
arXiv (CS.LG) 2026-06-16

David vs. Goliath in Next Activity Prediction: Argmax vs. LSTM, Transformer, and LLM

arXiv:2606.15868v1 Announce Type: new Abstract: Next activity prediction (NAP) is a cornerstone of predictive process monitoring (PPM), enabling organizations to move from retrospective analysis to proactive process steering. The PPM field has progressed from classical machine learning through deep learning architectures such as LSTMs and Transformers to large language models (LLMs). Despite growing model complexity, no benchmark jointly compares LLMs, Transformers, LSTMs, and simple baselines in a direct sequence modeling setting for NAP. In this paper, we fill this gap with a systematic benchmark. We compare vocabulary-adapted LLMs, Transformers trained from scratch, LLM-distilled Transformers, and LSTMs against a simple counting-based argmax baseline across seven real-life event logs. Our results tell a David vs. Goliath story: pretraining confers no consistent improvement over training from scratch, model size shows little effect on performance, and on most datasets the argmax baseline matches or approaches the performance of billion-parameter LLMs.
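A minimal sketch of what a counting-based argmax baseline could look like, assuming it predicts the most frequent successor of the last observed activity. The abstract does not give the exact formulation, so the conditioning on a single preceding activity, the function names, and the toy event log are all illustrative assumptions.

```python
from collections import Counter, defaultdict


def fit_argmax_baseline(traces):
    """Count how often each activity directly follows each other activity."""
    successors = defaultdict(Counter)
    for trace in traces:
        for current, nxt in zip(trace, trace[1:]):
            successors[current][nxt] += 1
    return successors


def predict_next(successors, prefix, fallback=None):
    """Return the most frequent successor of the last activity in the prefix."""
    if not prefix or prefix[-1] not in successors:
        return fallback
    return successors[prefix[-1]].most_common(1)[0][0]


# Toy event log with hypothetical activities (not one of the paper's datasets).
log = [
    ["register", "check", "approve", "pay"],
    ["register", "check", "reject"],
    ["register", "check", "approve", "pay"],
]
model = fit_argmax_baseline(log)
prediction = predict_next(model, ["register", "check"])
```

Such a baseline has no learned parameters beyond the counts, which is what makes it a useful reference point against much larger models.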

02.
arXiv (CS.LG) 2026-06-19

Adversarial Dependence Minimization

arXiv:2502.03227v2 Announce Type: replace Abstract: Minimally redundant representations are typically learned by minimizing feature covariance. However, covariance-based methods fail to eliminate all dependencies/redundancies, as linearly uncorrelated variables can still exhibit nonlinear relationships. To address this, we introduce ADM, a differentiable algorithm that minimizes statistical dependence between feature dimensions through an adversarial game: auxiliary networks identify dependencies, while the encoder removes them. We prove that mutual independence is achieved at the global optimum, empirically verify convergence, and study three potential applications: extending PCA to nonlinear decorrelation, improving generalization in image classification, and preventing dimensional collapse in self-supervised learning. By promoting statistically independent representations, ADM paves the way for learning more robust, compressed, and generalizable representations across diverse applications.
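The motivating claim, that linearly uncorrelated variables can still be dependent, can be illustrated with a small example. This is not the ADM algorithm itself, only a sketch of the failure mode it targets; the data points and the helper function are chosen here for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# y is a deterministic function of x, so the two are fully dependent.
xs = [-2, -1, 0, 1, 2]
ys = [x * x for x in xs]

linear_corr = pearson(xs, ys)                       # linear correlation vanishes
nonlinear_corr = pearson([x * x for x in xs], ys)   # a nonlinear feature exposes the link
```

Here a covariance-based penalty would see nothing to remove, even though one variable fully determines the other.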

04.
arXiv (CS.AI) 2026-06-18

RankGraph-2: Lifecycle Co-Design for Billion-Node Graph Learning in Recommendation

arXiv:2606.18379v1 Announce Type: cross Abstract: Graph-based retrieval at billion-node scale requires jointly solving three tightly coupled problems – graph construction, representation learning, and real-time serving – yet existing work addresses each in isolation. We present RankGraph-2, a framework deployed at Meta that co-designs all three lifecycle stages for similarity-based retrieval (U2U2I and U2I2I), where each stage's requirements shape the others. Serving requires a co-learned cluster index to avoid expensive online KNN – this pushes index co-training into the training objective. Training benefits from the observation that similarity-based retrieval tolerates pre-computed neighborhoods, eliminating online graph infrastructure – this requires construction to produce self-contained data. Construction must also support hour-level refresh for item coverage. Acting on these cascading requirements, RankGraph-2 reduces hundreds of trillions of edges to hundreds of billions via subsampling with popularity bias correction, pre-computes multi-hop neighborhoods via personalized PageRank, and co-learns a residual-quantization cluster index that reduces serving computational cost by 83%. This lifecycle co-design enables a simple architecture to achieve 3.8x higher recall than a GAT + Deep Graph Infomax model on a bipartite graph and 2.1x higher than PyTorch-BigGraph on item retrieval. RankGraph-2 delivers up to +0.96% CTR and +2.75% CVR, and has powered 20+ retrieval launches across major surfaces.

05.
arXiv (CS.AI) 2026-06-19

AAPA: Adversarially Anchored Preference Alignment for Post-Training of Large Language Models

arXiv:2509.25148v2 Announce Type: replace Abstract: Post-training alignment of large language models often combines supervised fine-tuning (SFT) on expert demonstrations with reinforcement learning (RL) from preference or verifiable feedback. SFT provides a useful behavioral anchor but can overfit to static demonstrations, whereas RL encourages exploration but may drift from expert behavior or exploit imperfect rewards. We propose AAPA (Adversarially Anchored Preference Alignment), a plug-in framework that augments existing post-training objectives with a sentence-level adversarial anchoring signal. AAPA compares policy rollouts with offline, pre-collected expert responses using a fixed lightweight discriminator, and therefore requires neither online teacher inference nor discriminator co-training during policy optimization. The same anchoring term can be added to SFT, GRPO, and CHORD while preserving their original training pipelines. Experiments on instruction-following benchmarks show that AAPA consistently improves the corresponding base objectives across model scales. In particular, the staged AAPA configuration improves over a strong GRPO baseline by 5.77% on Qwen3-0.6B and 3.75% on Qwen3-4B. Further analyses on response length, log-probability distributions, and discriminator variants suggest that adversarial anchoring provides a stable semantic grounding signal for preference optimization. Code is available at https://github.com/IsFaqq/AAPA.

06.
Nature Medicine 2026-06-15

Long-term independent use of an intracortical brain–computer interface for speech and cursor control

Brain–computer interfaces (BCIs) can provide naturalistic communication and digital access to people with severe paralysis by decoding neural activity associated with attempted speech and movement. Recent work has demonstrated highly accurate intracortical BCIs for speech and cursor control, but two critical capabilities needed for practical viability were unmet: independent at-home operation without researcher assistance and reliable long-term performance supporting accurate speech and cursor decoding. Here we demonstrate the independent and near-daily use of a multimodal BCI with novel brain-to-text speech and computer cursor decoders by a man with paralysis and severe dysarthria due to amyotrophic lateral sclerosis. Over nearly 2 years, the participant used the BCI for more than 3,800 h at home with no researchers present to maintain rich interpersonal communication with his family and friends, independently control his personal computer and sustain full-time employment—despite being paralyzed. He communicated 183,060 sentences—totaling 1,960,163 words—at an average rate of 56 words per minute. He labeled 92% of sentences as being decoded at least mostly correctly. In formal quantifications of performance where he was asked to say words presented on a screen, attempted speech was consistently decoded with more than 99% word accuracy (125,000 word vocabulary). The participant also used the speech BCI as keyboard input and the cursor BCI as mouse input to control his personal computer, enabling him to send text messages and emails and to browse the internet. These results demonstrate that intracortical BCIs have the potential to support independent use in the home, marking a critical step toward practical assistive technology for people with severe motor impairment. 
An automated intracortical brain–computer interface, used at home with no researcher intervention, provides long-term and accurate restoration of speech-based communication and cursor-based computer usage in a person with severe dysarthria due to amyotrophic lateral sclerosis.

07.
arXiv (CS.CL) 2026-06-12

Recursive Agent Harnesses

Recursive language models (RLMs) showed that recursion over model calls is an effective strategy for long-context reasoning, and production coding agents have begun to write code that spawns subagents at scale, most recently in Anthropic's dynamic workflows. We name and study the pattern between these two lines of work, where the recursive unit is a full agent harness with filesystem tools, code execution, and planning rather than a model call with no tools. We call this the Recursive Agent Harness (RAH) and frame it as harness recursion, the code-first extension to the model recursion of RLMs. A parent agent generates and runs an executable script that spawns subagent harnesses in parallel for fine-grained workloads and uses structured function calls for small subtasks. We provide a controlled evaluation on long-context reasoning. With the backbone held fixed at GPT-5 to match the published Codex and RLM baselines, RAH improves the Codex coding-agent baseline from 71.75% to 81.36% on Oolong-Synthetic (199 samples, 13 context-length buckets up to 4M tokens), a gain attributable to the harness rather than the model. With a stronger backbone, Claude Sonnet 4.5, the same design reaches 89.77%.

08.
arXiv (CS.AI) 2026-06-16

Automating Low-Risk Code Review at Meta: RADAR, Risk Calibration, and Review Efficiency

arXiv:2605.30208v2 Announce Type: replace-cross Abstract: AI-assisted coding tools have altered software production. At Meta, significant lines of code per human-landed diff grew by 105.9% year over year and per-developer diff volume rose 51%, with agentic AI responsible for over 80% of that growth. Meanwhile, the share of diffs receiving timely review has declined, exposing a widening gap between code supply and reviewer bandwidth. We ask three questions that progress from feasibility through calibration to impact: (1) can risk-stratified automation operate at scale across diverse organizations, (2) how does tuning the risk threshold affect the trade-off between automation yield and safety, and (3) to what extent does automated review reduce end-to-end latency for AI-generated changes? We deployed RADAR (Risk Aware Diff Auto Review), a multi-stage funnel that classifies each diff by authorship and source type, applies eligibility gates, static heuristics, a machine-learned Diff Risk Score, LLM-based Automated Code Review, and deterministic validation before landing qualifying changes. We evaluate RADAR through telemetry covering 535K+ RADAR-reviewed diffs, observational before-after comparisons for policy changes, and difference-in-differences analysis of efficiency outcomes. RADAR has reviewed 535K+ diffs and landed 331K+. Relaxing the Diff Risk Score threshold from the 25th to the 50th percentile increased the approve rate to 60.31%. The revert rate for RADAR-reviewed diffs is 1/3 that of non-RADAR diffs, and the Production Incident rate is 1/50 that of non-RADAR diffs. RADAR reduces median time to close by over 330% and median diff review wall time by 35%. Risk-aware layered automation can materially reduce review bottlenecks created by AI-driven code growth without compromising production safety.

09.
arXiv (CS.LG) 2026-06-16

MultiMolecule: a modular ecosystem for biomolecular sequence-model workflows

arXiv:2606.16540v1 Announce Type: cross Abstract: Biomolecular sequence models are increasingly reused outside the studies in which they were introduced, but public checkpoints rarely preserve the execution context needed to inspect source-defined behavior, adapt models to new assays, compare models under shared task definitions or deploy biological predictions. MultiMolecule is an open-source Python ecosystem that turns heterogeneous RNA, DNA and protein sequence-model releases into complete, source-checked model-family implementations with shared loading, workflow and prediction interfaces. The Resource state reported here includes 53 complete model-family implementations with 112 standardized model checkpoints, together with 16 curated dataset resources released through 39 public dataset repositories and 10 user-facing prediction pipelines. Standardized components are linked to source provenance, conversion or preparation code, source-reference checks, Extended Data summaries and public documentation, allowing users to inspect what was standardized, what behavior was checked and how each component enters training, evaluation, inference or deployment. By shifting reuse from repository-specific checkpoints to executable implementations connected to standardized checkpoints, curated datasets, Runner workflows and biological prediction pipelines, MultiMolecule provides common infrastructure for preserving source-defined model behavior, adapting models to new assays, enabling controlled evaluation and deploying biomolecular predictions.

10.
bioRxiv (Bioinfo) 2026-06-13

MoE-Bind: Guiding De Novo Protein Binder Generation with Sparse Experts

De novo protein binder design has been dominated by structure-based pipelines that require known three-dimensional target conformations and consume substantial compute and generation time per design, limiting their throughput and accessibility for routine large-scale binder exploration. Sequence-only generative models promise a faster and lighter alternative, yet existing systems remain uniformly dense and frequently reintroduce structural computation at inference, undermining the core advantages they were intended to deliver. Across the broader language modelling community, transformers have meanwhile transitioned from fully dense designs to sparse Mixture-of-Experts architectures that decouple capacity from per-token compute, a shift that has yet to reach sequence-only protein binder generation. We present MoE-Bind, an autoregressive protein binder generator that, for the first time in this domain, combines Multi-head Latent Attention with a sparse Mixture-of-Experts feed-forward network and is evaluated under two independent structure predictors, Boltz-2 and AlphaFold2-Multimer. Despite activating less than half the per-token parameters of compute-matched dense baselines, MoE-Bind matches or exceeds them on full-length receptor-conditioned binder generation on a leakage-free Docking Benchmark 5.0 evaluation, transfers without peptide-specific training to short-peptide design, and reduces training and inference compute by a large margin. Routing analysis on generated binders reveals interpretable expert specialization at both the individual amino acid and biochemical group level, a structured expert-token alignment not previously reported for natural-language MoE models. These results show that sparse architectural design, rather than scale, can deliver fast, structure-free, and interpretable protein binder generation.

11.
arXiv (CS.CL) 2026-06-18

VISUALSKILL: Multimodal Skills for Computer-Use Agents

Computer-use agents (CUAs) approach human-level performance on standardised benchmarks but still struggle on long-horizon tasks and unseen software. Existing skill libraries address this with reusable skills, but represent the skill artifact as text only, despite the visual nature of GUI interaction. We propose VISUALSKILL: a hierarchical multimodal skill, tailored to each target application and organised as a central index over per-topic files, which the agent consumes through a load_topic MCP tool that fetches the relevant topic's text and figures on demand. We construct each skill with a two-stage pipeline that combines authored documentation with live-application UI exploration. On two CUA benchmarks, CUA-World and OSExpert-Eval, a Claude Code CLI agent backed by Claude Opus 4.6 reaches an average score of 0.456 with VISUALSKILL, a +15.3 point absolute lift over the no-skill baseline (0.303). Against a matched text-only skill that is generated from the same source content and differs from VISUALSKILL only in modality, VISUALSKILL yields a further +8.3 point absolute gain over the matched text-only skill (0.373 vs. 0.456), providing direct evidence that retaining visual figures in the skill artifact, rather than verbalizing them away, helps the agent both identify UI elements and verify workflow state after each action. Our code is available at https://github.com/XMHZZ2018/VisualSkills.

12.
arXiv (CS.CV) 2026-06-11

ARGUS: Stacked Multi-View Identity Mosaic Injection for Subject-Preserving Video Generation

Subject-preserving video generation is not solved by frontal-face similarity alone: a generated person must remain recognizable across motion, large viewpoint changes, expression shifts, occlusion, scale variation, and conflicts among text, first-frame, and identity references. We argue that the central bottleneck is the point-reference paradigm, which collapses identity into a single static observation entangled with pose, accessories, lighting, background, and camera statistics. We introduce Argus, a Wan-based framework centered on Stacked Multi-View Identity Mosaic Injection (SMII). SMII converts MLLM-selected image/video identity evidence into a 3×3 stacked mosaic, synchronizes the mosaic with the current diffusion time, and injects it as negative-time read-only memory in Wan's native token space. This turns identity from an external clean adapter or a single reference image into a compact dynamic distribution. Around SMII, an MLLM Identity Director selects informative identity moments and resolves condition conflicts, while no-cross-pair counterfactual training, Temporal Identity Annealing, and Adaptive Self-Likeness Guidance improve robustness without paired subject-video supervision. We further release HardID-Celeb, a public-figure identity-stress benchmark, and introduce YawScore and OccScore to probe large-yaw and first-frame-occlusion robustness. Argus achieves state-of-the-art results on OpenS2V-Eval Human-Domain, reaching 64.38 Total Score, 71.86 FaceSim, 51.62 NexusScore, and 79.14 NaturalScore. On HardID-Celeb, Argus obtains 76.80 FaceSim and improves YawScore and OccScore by 12.60 and 15.10 points over the strongest baselines, demonstrating that dynamic identity memory and large-scale counterfactual self-supervision are highly effective for subject-preserving video generation.

13.
arXiv (CS.LG) 2026-06-12

Towards One-for-All Anomaly Detection for Tabular Data

arXiv:2603.14407v2 Announce Type: replace Abstract: Tabular anomaly detection (TAD) aims to identify samples that deviate from the majority in tabular data and is critical in many real-world applications. However, existing methods follow a "one model for one dataset (OFO)" paradigm, which relies on dataset-specific training and thus incurs high computational cost and yields limited generalization to unseen domains. To address these limitations, we propose OFA-TAD, a generalist one-for-all (OFA) TAD framework that only requires one-time training on multiple source datasets and can generalize to unseen datasets from diverse domains on-the-fly. To realize one-for-all tabular anomaly detection, OFA-TAD extracts neighbor-distance patterns as transferable cues, and introduces multi-view neighbor-distance representations from multiple transformation-induced metric spaces to mitigate the transformation sensitivity of distance profiles. To adaptively combine multi-view distance evidence, a Mixture-of-Experts (MoE) scoring network is employed for view-specific anomaly scoring and entropy-regularized gated fusion, with a multi-strategy anomaly synthesis mechanism to support training under the one-class constraint. Extensive experiments on 34 datasets from 14 domains demonstrate that OFA-TAD achieves superior anomaly detection performance and strong cross-domain generalizability under the strict OFA setting. The source code is available at https://github.com/Shiy-Li/OFA-TAD.

14.
PLOS Medicine 2026-05-21

Semaglutide-associated risk of nonarteritic anterior ischemic optic neuropathy in patients with type 2 diabetes: A systematic review and meta-analysis of observational studies

by Jędrzej Chrzanowski, Magdalena Walicka, Jacek Burzyński, Małgorzata Zaraś, Arkadiusz Michalak, Wojciech Fendler. Background: Semaglutide, a glucagon-like peptide-1 receptor agonist, is widely used for the management of type 2 diabetes (T2DM). Recent case reports have raised concerns about a potential association between semaglutide use and the development of nonarteritic anterior ischemic optic neuropathy (NAION), a rare but vision-threatening condition. We aimed to evaluate whether semaglutide use is associated with an increased risk of NAION in patients with T2DM. Methods and findings: We conducted a systematic review and meta-analysis of observational studies comparing patients with T2DM aged ≥12 years treated with semaglutide to those receiving other glucose-lowering therapies. We searched PubMed, Scopus, and Web of Science databases from January 2023 to November 2025. Two reviewers independently extracted data on study design, population characteristics, and outcomes. Risk of bias was assessed using the Newcastle–Ottawa Scale and ROBINS-I v.2. Certainty of the evidence was graded according to the GRADE framework. Pooled hazard ratios (HRs) and 95% confidence intervals (CIs) were calculated using fixed-effects models; sensitivity analyses included crude and subgroup HRs, and overlapping study replacement. Leave-one-out analysis was conducted to assess small-study effects and publication bias. Results were contextualized within other meta-analyses, systematic reviews, consensus statements, and regulatory communications on the topic. Five eligible observational studies met the inclusion criteria, and 7 additional studies were included in the sensitivity analysis. Semaglutide use was associated with a significantly increased hazard of NAION compared with nonsemaglutide glucose-lowering regimens (HR 2.17, 95% CI [1.73, 2.74]; p 

15.
arXiv (quant-ph) 2026-06-17

Entanglement transition in unitary system-bath dynamics

arXiv:2512.06081v3 Announce Type: replace Abstract: The evolution of a system coupled to baths is commonly described by a master equation that, in the long-time limit, yields a steady-state density matrix. However, when the same evolution is unraveled into quantum trajectories, it is possible to observe a transition in the scaling of entanglement within the system as the system-bath coupling increases - a phenomenon that is invisible in the trajectory-averaged reduced density matrix of the system. Here, we go beyond the paradigm of trajectories from master equations and explore whether a qualitatively analogous entanglement-scaling transition emerges in a single unitary evolution of the combined system-bath setup, without monitoring the dynamics of the system. We investigate the scaling of entanglement in a unitary quantum setup composed of a two-dimensional lattice of free fermions, where each site is coupled to a fermionic bath. As the system-bath coupling increases, the logarithmic fermionic negativity reveals an entanglement transition from logarithmic-law to area-law scaling. This occurs while the system's steady-state properties are trivial, highlighting that the signatures of these different scalings are within the bath-bath correlations. Evidence of the transition is also found in the mutual information and the correlations of the full system-bath setup, suggesting that the entanglement transition is underpinned by a change in the spatial structure of quantum information.

16.
arXiv (CS.AI) 2026-06-17

ANEForge: Python for direct computation on the Apple Neural Engine

arXiv:2606.17090v1 Announce Type: cross Abstract: ANEForge is a Python package that programs the Apple Neural Engine (ANE), the fixed-function neural accelerator on every recent Apple device, directly and without CoreML. In production the engine is reachable only through CoreML, which treats it as a scheduling option: no configuration requires the ANE, and a model can silently run on the CPU or GPU instead. ANEForge compiles a lazy tensor graph, built from 58 fused operators and 19 native bridge operators, into a single ANE program. The program is dispatched through the same ANE daemon and kernel-driver stack as Apple's internal framework. Beyond inference, the package reaches the engine's native fused attention, streams int8, int4, and sparse weights, keeps decoder and optimizer state resident across steps, and runs the forward pass, backward pass, and optimizer update of training on the engine. A small fused program completes a call in about 90us, near the engine's 70us per-program dispatch floor, and a pretrained ResNet-18 forward runs end-to-end in 0.33ms. ResNet-18, a sentence encoder, and a Vision Transformer run end-to-end against framework references, and a Stable Diffusion U-Net validates its forward pass. ANEForge targets Apple Silicon under macOS 14 and later. Each release is verified against a recorded macOS and ANE-compiler version.

17.
arXiv (math.PR) 2026-06-15

Trivariate Hypergeometric Series Formulas for Pure Partition Functions of Multiple $3$-SLE$_\kappa$

arXiv:2606.14038v1 Announce Type: new Abstract: Pure partition functions of multiple SLE are characterized by null-state partial differential equations, Möbius covariance, and boundary asymptotics. After quotienting by Möbius covariance, the case of three curves is the first genuinely multivariable one: the moduli space has three independent variables, naturally represented by the three unoriented cross-ratios of the three pairs of links. We solve this Möbius-normalized three-variable problem for the two basic link-pattern types of multiple \(3\)-SLE\(_\kappa\), namely the rainbow and neighbor patterns. Writing \(\beta=4/\kappa\), we construct explicit trivariate hypergeometric-series normal forms and identify them with the corresponding pure partition functions for all \(\beta>1/2\) in the rainbow case and all \(\beta\ge2/3\) in the neighbor case. Equivalently, these ranges are \(\kappa\in(0,8)\) and \(\kappa\in(0,6]\), respectively. The proof is analytic. The null-state PDEs and Möbius covariance yield recursion relations for the trivariate coefficient arrays. In the rainbow case, coefficient estimates give convergence and boundary regularity on the closed cube. In the neighbor case, Pfaff systems continue the local power series to a neighborhood of \([0,1)^3\), while side-face equations, regular normal estimates, and corner propagation give continuity on \([0,1]^3\) for \(\beta\ge2/3\). The endpoint \(\beta=2/3\), corresponding to \(\kappa=6\), requires a logarithmic normal term. The two-dimensional boundary degenerations are classical Appell \(F_1\) and Horn \(G_2\) functions. The probabilistic identification uses SLE martingale arguments and Itô calculus, together with positivity and boundary regularity. We also discuss boundary degenerations, including heuristic connections with boundary Green's functions.

18.
arXiv (CS.AI) 2026-06-16

When Agent Automation Becomes Profitable: Quantifying and Insuring Autonomous AI Risk through Trace-Economic Underwriting

arXiv:2606.16465v1 Announce Type: new Abstract: AI agents can now take irreversible actions in operational systems, but agent-caused losses are still not clearly assigned, priced, or transferred. Providers often disclaim consequential damages, users are left with uncompensated losses, and default human review limits the efficiency gains of automation. We ask when autonomous AI deployment can become economically acceptable despite failure risk. Our answer is to quantify risk at the customer-task-trace episode level and transfer it through insurance. Automation is acceptable when its expected benefit exceeds the premium, control cost, and remaining risk. This requires a defined role with bounded permissions and comparable traces. We introduce trace-economic underwriting, which maps tool-use traces to customer exposure and claimable loss, then uses this representation for pricing, control, and risk transfer. It uses deterministic economic labels rather than an LLM judge. In our trace-to-loss testbed, trace-economic pricing reduces pricing MAE from $17.7K to $569 and removes regressive cross-subsidy. A 300-trace expert audit accepts 295 labels unchanged. On 1,000 real SWE-smith traces, trace-conditioned controls reduce CVaR95 by 72%. Theorem 1 gives a finite-sample scope condition. We release code, labels, and audit sheets.
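The acceptance rule stated in the abstract, and the CVaR95 tail metric it reports, can be sketched roughly as follows. The function names, the simple empirical CVaR estimator, and all numbers are assumptions made here for illustration; the paper's own definitions may differ.

```python
def automation_acceptable(expected_benefit, premium, control_cost, remaining_risk):
    """Rule from the abstract: benefit must exceed premium + control cost + remaining risk."""
    return expected_benefit > premium + control_cost + remaining_risk


def cvar(losses, alpha=0.95):
    """Simple empirical CVaR: mean of the worst (1 - alpha) share of losses."""
    ordered = sorted(losses)
    tail_size = max(1, round((1 - alpha) * len(ordered)))
    tail = ordered[-tail_size:]
    return sum(tail) / len(tail)


# Hypothetical per-episode figures in dollars (not taken from the paper).
accept = automation_acceptable(
    expected_benefit=1000, premium=200, control_cost=100, remaining_risk=300
)

# Hypothetical loss sample: 100 episodes with losses 0, 1, ..., 99.
tail_loss = cvar(list(range(100)), alpha=0.95)
```

CVaR95 averages only the worst 5% of outcomes, which is why a control that trims rare large losses can reduce it sharply even when the mean loss barely moves.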

19.
medRxiv (Medicine) 2026-06-11

Global population frequencies of NAT2 star alleles observed in three large biobanks

NAT2 is an important pharmacogene which encodes the N-acetyltransferase 2 enzyme that is involved in the metabolism of multiple medications, and variants in this gene can affect patient response to these medications. CPIC has published a clinical guideline for prescribing hydralazine using NAT2 genotypes. Just prior to the guideline, updated NAT2 star allele numbering and definitions were released, differing somewhat from the historical nomenclature. Clinical pharmacogenomic testing panels often test for the most common star alleles, so knowledge of the most common updated NAT2 star alleles is critical for the implementation of the CPIC NAT2/hydralazine guideline. We first determined NAT2 diplotype frequencies from UK Biobank (UKBB) 200k phased genomes, then analyzed allele, diplotype, and phenotype population frequencies from the All of Us Research Program, PennMedicine BioBank (PMBB), and UKBB 500k datasets. We found that analyzing NAT2 diplotypes from phased data provides critical information for algorithms designed to predict diplotypes from unphased data. We observed that NAT2*5, *6, and *4 were the most common star alleles, in that order, and the top 11 most frequent NAT2 star alleles were the same across all biobanks. However, differences in star allele frequencies across biogeographical populations were observed. The largest difference led to a higher frequency of NAT2 poor metabolizer phenotypes as compared to rapid and intermediate metabolizer phenotypes in all global populations except in the EAS population, where NAT2 poor metabolizers were in the minority.

20.
arXiv (CS.AI) 2026-06-16

LLM4RTL: Tool-Assisted LLM for RTL Generation

arXiv:2606.15500v1 Announce Type: cross Abstract: Large language models (LLMs) have facilitated impressive progress in software engineering, code generation, tooling, and systems. Concurrently, a significant body of research has developed that explores a growing variety of methods and systems for applying LLMs to hardware and chip design (e.g., systems for RTL code generation based on functional description). However, when it comes to open Verilog/RTL code generation, we need high-quality training samples to build specialized and more effective LLM systems through fine-tuning or low-rank adaptation. Here, we propose a "judge-renew-check-renew-check" (JRCRC) pipeline which updates a current public dataset using a hierarchy of state-of-the-art commercial LLMs differing in their costs and capabilities in RTL code generation. This approach provides a cost-effective mechanism for filtering and refining code-generation samples into a higher-quality training dataset. Our experiments also identify some common weaknesses of LLMs in rule-based reasoning and logic, and consequently, in RTL code generation. Having identified these weaknesses, we develop an architecture for incorporating pre-processing tools to dynamically assist the LLMs in inferring logical relationships from tabular data formats. With our tool-assisted architecture for RTL code generation, we achieve significant overall performance gains in the VerilogEval benchmark and outperform many state-of-the-art methods. Our LLM4RTL system achieves performance comparable to that of GPT-4O using a significantly smaller LLM.

21.
medRxiv (Medicine) 2026-06-11

Computer Vision Scoring of Figure Copy and Recall

Objective. Figure copy and recall tests are sensitive measures of visuoconstruction and visual episodic memory, but their clinical use is constrained by labor-intensive manual scoring. We developed and validated an automated, element-level scoring pipeline using Vertex AI object detection for the tablet-based figure copy and recall tasks in the California Cognitive Assessment Battery (CCAB). The automated scoring pipeline duplicated the scoring procedures used by expert manual raters. Methods. A normative sample of 2,011 community-dwelling adults aged 18-90 completed figure copy and delayed recall trials at baseline, with subsamples retested at 1 day and at 6, 18, and 30 months. Participants completed the drawings with their index finger on a tablet computer, with finger position digitized to analyze the speed and timing of individual drawing strokes. A convolutional object-detection model trained on the Vertex AI AutoML Vision platform identified each of twelve canonical figure elements in rendered drawings. Separate element presence and location scores were computed after homographically warping drawings onto a canonical template to produce trial-level Element, Location, and Total scores. To evaluate inter-rater agreement, Vertex AI and expert human raters independently scored 1,500 randomly selected drawings, including a common subset of 100 drawings scored by Vertex AI and all raters. Results. Vertex-human agreement on Total scores (r = 0.966) was virtually indistinguishable from human-human agreement (mean r = 0.971), as was agreement on Element presence scores (mean r = 0.959 vs. r = 0.963). Location-score agreement (r = 0.951) was slightly below the human-human mean (r = 0.972) due to pixel-level analysis by Vertex AI that was impossible for human raters. The Vertex pipeline showed no preferential advantage for the single expert rater who categorized Elements during training.
Automated scores showed strong demographic gradients: age effects on Recall (r = -0.32) were approximately twice those in Copy conditions (r = -0.16). A Memory Cost score (Recall - Copy) showed a monotonic age-related decline from +0.40 z in the youngest subjects to -0.54 z in the oldest. Kinetic analysis revealed that drawing speed and efficiency showed significant age-related changes. Overnight test-retest reliability was high (Recall r = 0.72) and the Recall trial showed a large overnight learning effect (Δ = +1.18) that continued with repeated tests up to 30 months (Δ = +0.75).
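
The Memory Cost score (Recall - Copy) reported in z units can be illustrated with a minimal sketch. The abstract does not say how scores were standardized, so the plain sample-wide z-scoring below, along with the function names and the toy data, is an assumption made for illustration only and not the study's normative procedure.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize raw scores against the sample mean and population SD."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def memory_cost(recall_scores, copy_scores):
    """Per-participant Recall z-score minus Copy z-score (assumed definition)."""
    z_recall = z_scores(recall_scores)
    z_copy = z_scores(copy_scores)
    return [r - c for r, c in zip(z_recall, z_copy)]


# Hypothetical Total scores for three participants.
recall = [10.0, 20.0, 30.0]
copy = [30.0, 20.0, 10.0]
cost = memory_cost(recall, copy)
```

Under this definition a negative value means a participant recalled the figure worse than their copy performance would suggest, relative to the rest of the sample.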

22.
arXiv (CS.CV) 2026-06-19

FrequencyFormer: A Co-Designed Sensor-to-Processor Pipeline for Frequency-Domain Vision Transformer Inference

Deploying vision transformers (ViTs) on sensor-edge systems is limited not only by on-device compute, but also by the energy and bandwidth required to transmit high-dimensional image data from the sensor to the processor. While in-sensor and near-sensor computing reduce this cost through early feature extraction, existing methods often provide only modest compression. We observe that the frequency domain provides a naturally compact representation of visual information and can be exploited at the sensor level to reduce sensor-to-processor data movement. Building on this insight, we present FrequencyFormer, a co-designed sensor-to-processor pipeline for efficient ViT inference. FrequencyFormer includes: (1) a multi-scale DCT tokenizer that compresses a 224x224 image into compact frequency-domain tokens, achieving up to 128x reduction in off-chip data volume with modest accuracy loss; (2) a LUT-based near-sensor hardware implementation that leverages fixed DCT coefficients for multiplier-free, energy- and area-efficient tokenization; and (3) a modified MIPI-based low-power communication architecture that further reduces transfer energy. FrequencyFormer serves as a drop-in replacement for standard ViT patch embedding and remains compatible with pretrained backbones across classification, detection, and segmentation tasks. The pipeline achieves 28.8 TOPS/W, reduces communication energy by 230x, and lowers total sensor-side energy by 2.22x, demonstrating frequency-domain tokenization as a scalable foundation for in-sensor ViT deployment.
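
The idea of frequency-domain tokenization can be sketched in a few lines: apply a 2D DCT to an image patch and keep only the lowest-frequency coefficients as the token. This is a simplified single-scale illustration, not the multi-scale tokenizer or the LUT-based hardware described in the abstract; the 8x8 patch size, the `keep` parameter, and all function names are assumptions.

```python
import math


def dct_matrix(n):
    """Return the orthonormal DCT-II basis as an n x n list of rows."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def dct2(patch):
    """2D DCT of a square patch, computed as C @ patch @ C^T."""
    n = len(patch)
    c = dct_matrix(n)
    # Transform along the vertical axis, then along the horizontal axis.
    tmp = [[sum(c[k][i] * patch[i][j] for i in range(n)) for j in range(n)]
           for k in range(n)]
    return [[sum(tmp[k][j] * c[l][j] for j in range(n)) for l in range(n)]
            for k in range(n)]


def tokenize_patch(patch, keep):
    """Keep only the keep x keep lowest-frequency coefficients as a token."""
    coeffs = dct2(patch)
    return [coeffs[u][v] for u in range(keep) for v in range(keep)]


patch = [[1.0] * 8 for _ in range(8)]   # flat 8x8 patch (hypothetical input)
token = tokenize_patch(patch, keep=2)   # 64 pixel values -> 4 coefficients
```

Because the DCT basis is fixed, the coefficients in `dct_matrix` never change at run time, which is the property the abstract says a lookup-table implementation exploits.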

23.
arXiv (CS.AI) 2026-06-19

Frequency-Aware Flow Matching for Continuous and Consistent Robotic Action Generation

arXiv:2606.20135v1 Announce Type: cross Abstract: Flow matching has emerged as a standard paradigm for robotic manipulation owing to its strong expressive power for modelling complex, multimodal action distributions, alongside similar approaches like diffusion policy. However, existing methods rely on discretized action chunks, making them brittle to demonstrations collected at heterogeneous control frequencies and prone to temporally inconsistent actions that degrade control stability. In this paper, we propose Frequency-Aware Flow Matching (FAFM), which outputs continuous, temporally consistent actions. To handle heterogeneous frequency input, we transform discrete action sequences into the frequency domain with the discrete cosine transform (DCT), perform flow matching over the resulting coefficients, and reconstruct continuous actions via cosine basis expansion. To generate temporally consistent actions, we regularize the first-order temporal derivative to promote smooth actions. This corresponds to a Sobolev-type constraint that suppresses high-frequency errors and discourages abrupt action changes. Our FAFM is simple, introduces no additional network parameters, and applies to both standalone flow-matching policies and vision-language-action models. Across a synthetic toy benchmark, obstacle avoidance, LapGym, and LIBERO, FAFM improves success rates, multimodal expressivity, motion smoothness, convergence speed, and robustness to mechanical bias and mixed-frequency input. These gains are consistent when deployed on a real-world Franka robot. Code available at https://anonymous.4open.science/r/FAFM.
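
The DCT round trip described above can be sketched for a single action dimension: transform a discrete sequence to cosine coefficients, evaluate the cosine expansion at any continuous time, and penalize the squared first derivative. This is only an illustration of the general technique; the flow-matching model itself is omitted, and the normalization, the exact form of the penalty, and all function names are assumptions rather than the authors' implementation.

```python
import math


def dct_coefficients(actions):
    """Unnormalized DCT-II of a 1D action sequence."""
    n = len(actions)
    return [sum(a * math.cos(math.pi * (i + 0.5) * k / n)
                for i, a in enumerate(actions))
            for k in range(n)]


def action_at(coeffs, t):
    """Evaluate the cosine basis expansion at normalized time t in [0, 1]."""
    n = len(coeffs)
    return coeffs[0] / n + (2.0 / n) * sum(
        coeffs[k] * math.cos(math.pi * k * t) for k in range(1, n))


def smoothness_penalty(coeffs):
    """Integral over [0, 1] of the squared time derivative of action_at.

    Each coefficient is weighted by k**2, so high frequencies cost more.
    """
    n = len(coeffs)
    return (2.0 * math.pi ** 2 / n ** 2) * sum(
        (k * coeffs[k]) ** 2 for k in range(1, n))


actions = [0.0, 0.5, 1.0, 0.5]           # hypothetical 4-step action chunk
coeffs = dct_coefficients(actions)
# Resample the same trajectory at twice the original control frequency.
resampled = [action_at(coeffs, (j + 0.5) / 8) for j in range(8)]
```

Evaluating `action_at` at the original sample times `(i + 0.5) / n` recovers the input sequence, while any other `t` gives an interpolated action, which is what makes the representation independent of the control frequency of the demonstrations.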

24.
arXiv (quant-ph) 2026-06-16

Exactly Solvable Quantum Model with Spin-Dependent Coulomb Interaction

arXiv:2501.05103v5 Announce Type: replace Abstract: In this work, we report an exactly solvable quantum model featuring a spin-dependent Coulomb interaction, described by the spin vector potential \(\vec{\mathcal{A}} = k (\vec{r} \times \vec{S}) / r^2\) together with a Coulomb-type scalar potential \(\varphi = \kappa / r\). The model is governed by the Schrödinger-type Hamiltonian \(\mathcal{H}_S = \vec{\Pi}^2 / (2M) + q \varphi\) in nonrelativistic quantum mechanics and by the Dirac-type Hamiltonian \(\mathcal{H}_D = c \vec{\alpha} \cdot \vec{\Pi} + \beta M c^2 + q \varphi\) in relativistic quantum mechanics, where \(\vec{\Pi} = \vec{p} - (q/c)\vec{\mathcal{A}}\) is the canonical momentum. We demonstrate two main results: (i) Just as the Coulomb-type scalar potential \(\mathcal{S}_{\rm Maxwell} = \{\vec{\mathcal{A}} = 0,\ \varphi = \kappa / r\}\) is a local exact solution of Maxwell's equations on \(r \neq 0\), the gauge potential \(\mathcal{S}_{\rm YM} = \{\vec{\mathcal{A}} = k (\vec{r} \times \vec{S}) / r^2,\ \varphi = \kappa / r\}\) constitutes a local exact solution of the Yang–Mills equations on the punctured region \(r \neq 0\). (ii) Both Hamiltonians \(\mathcal{H}_S\) and \(\mathcal{H}_D\) can be solved exactly in the presence of this spin-dependent Coulomb interaction. The resulting energy spectra are derived, and they naturally reduce to those of the ordinary hydrogen atom when the spin-dependent terms are neglected. Finally, we clarify the quantization conditions and the fixed-background interpretation of the model.

25.
arXiv (CS.LG) 2026-06-16

Zero-order Parameter-free Optimization for LMO-based Methods: Novel Approach for Efficient Fine-tuning

arXiv:2606.14970v1 Announce Type: new Abstract: Fine-tuning large language models (LLMs) has become a central application of modern optimization, enabling pretrained models to adapt to diverse downstream tasks and domain-specific data. A major obstacle in large-scale fine-tuning is the memory overhead of backpropagation, which requires storing activations, gradients, and optimizer states. Zeroth-order (ZO) optimization offers a memory-efficient alternative, but its performance is highly sensitive to the stepsize and smoothing parameter, often requiring costly task-specific tuning. Parameter-free (PF) optimization addresses this issue by adapting algorithmic parameters without prior knowledge of problem-dependent constants. Moreover, large-scale fine-tuning can benefit from geometry-aware updates that account for the heterogeneous structure of parameter blocks, which can be modeled through methods that exploit a linear minimization oracle (LMO). In this work, we study PF adaptation for LMO-based ZO optimization and introduce $\texttt{AdaNAGED}$, a method that unifies gradient-free training, adaptive tuning, and non-Euclidean update geometry. We establish convergence guarantees and validate the method on a large-scale LLM fine-tuning task with the $\texttt{OPT-1.3B}$ model.
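
Two of the ingredients named above, a zeroth-order gradient estimate and a linear minimization oracle, can be sketched as follows. This is not $\texttt{AdaNAGED}$: the stepsize and smoothing parameter are fixed by hand here (exactly the tuning burden the abstract says the method removes), and the choice of an ℓ∞ ball for the LMO, like every function name, is an assumption for illustration.

```python
import random


def zo_gradient(f, x, smoothing, rng):
    """Two-point zeroth-order estimate of the gradient of f at x.

    Uses only two function evaluations along a random Gaussian direction,
    so no backpropagation state has to be stored.
    """
    u = [rng.gauss(0.0, 1.0) for _ in x]
    x_plus = [xi + smoothing * ui for xi, ui in zip(x, u)]
    x_minus = [xi - smoothing * ui for xi, ui in zip(x, u)]
    scale = (f(x_plus) - f(x_minus)) / (2.0 * smoothing)
    return [scale * ui for ui in u]


def lmo_linf(g, radius):
    """Minimize <g, s> over the ball ||s||_inf <= radius: s = -radius * sign(g)."""
    return [-radius * ((gi > 0) - (gi < 0)) for gi in g]


def zo_lmo_step(f, x, stepsize, smoothing, rng):
    """One update: estimate the gradient, then move along the LMO direction."""
    g = zo_gradient(f, x, smoothing, rng)
    s = lmo_linf(g, 1.0)
    return [xi + stepsize * si for xi, si in zip(x, s)]


def quadratic(x):
    """Toy objective standing in for a fine-tuning loss."""
    return sum(xi * xi for xi in x)


rng = random.Random(0)
x = [1.0, -2.0, 0.5]
x_next = zo_lmo_step(quadratic, x, stepsize=0.01, smoothing=1e-3, rng=rng)
```

With the ℓ∞ ball the update reduces to a sign step on the estimated gradient; other norm balls would give different update geometries for different parameter blocks.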