Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
Nature (Science) 2026-06-22

Cancer cells adopt unprecedented strategies to produce a molecule that protects them from iron-dependent death

The finding that spermine molecules in cells bind to iron to prevent it unleashing ferroptosis, a type of cell death, opens up strategies for treating tissue damage and cancer. The finding that spermine molecules in cells bind to iron to prevent it unleashing ferroptosis, a type of cell death, opens up strategies for treating tissue damage and cancer.

02.
arXiv (CS.LG) 2026-06-11

Seeing Before Colliding: Anticipatory Safe RL with Frozen Vision-Language Models

arXiv:2606.11266v1 Announce Type: new Abstract: The cost signal that constrained-RL algorithms optimize against is almost always reactive: the simulator emits a non-zero cost only after a collision has begun, and the Lagrange multiplier of PPO-Lagrangian grows only after the episode budget has been exceeded. At race speeds, where collisions are instantaneous and irreversible, any safety mechanism that waits for cost to accumulate is structurally too late. We present VLM-Safe-RL, a framework that integrates a frozen vision-language model into the CMDP Lagrangian update as an anticipatory cost term. The framework comprises four contributions: (i) Decoupled Dual-Path CLIP, independent reward/cost paths that respect the CMDP's factorization; (ii) VLM-Lagrange, an augmented multiplier update that incorporates a per-step VLM cost as an anticipatory term; (iii) Confidence Gating, a Bayes-optimal weight derived from a logistic noise model on the CLIP margin; and (iv) VLMPPOLag, the composed algorithm. On Safety-Gymnasium FormulaOne L2, our principal evaluation ($n{=}5$ seeds, $10^{6}$ steps, budget $d_{lim}{=}25$) VLMPPOLag$+$Conf is the only configuration in our default budget comparison that simultaneously retains substantive return ($J_r{\approx}40$) and holds cost within budget on a majority of seeds; the five constraint-aware baselines (PPOLag, CPO, CPPOPID, CPO-CLG, PPOLag-RND) each fail at least one requirement. The mechanism generalizes to held-out MetaDrive Medium (catastrophe rate $41\%{\to}26\%$, 95\% bootstrap CI $[-26,-5]$\,pp) and shows directionally consistent transfer to Bullet Safety-Gym; we report honestly where it does not (MetaDrive Easy/Hard, Qwen2-VL backbone) and trace the Hard failure to a Lagrangian-regulation pathology rather than the VLM signal itself. To our knowledge, this is the first work to use frozen VLM signals as an anticipatory cost term inside the CMDP Lagrangian update.

03.
arXiv (CS.CL) 2026-06-11

Scenario-based Probing and Steering Cultural Values in Large Language Models–Extended Version

Large Language Models (LLMs) are deployed across cultural contexts but often reflect homogenized values inherited from training data. Evaluations of cultural alignment typically rely on direct prompting with survey-style questions, which frequently elicit neutral or safety-aligned responses and fail to capture underlying model preferences. We propose a framework for probing and steering latent cultural representations in LLMs along the two Inglehart–Welzel axes of the World Values Survey (WVS). By translating social value questions into scenario-based behavioral dilemmas, we extract token-level probabilities to measure implicit values and apply activation steering, optionally combined with country-conditioned prompting, to shift model behavior without retraining. Across three open-source LLMs and four target cultures, we find substantial variation in steerability and identify latent entanglement, where interventions along one cultural dimension induce shifts along another. This coupling mirrors correlations in human WVS data and persists across activation, prompt, and hybrid steering. It constrains axis-independent alignment, though general task performance is largely preserved.

04.
arXiv (CS.AI) 2026-06-17

Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents

arXiv:2606.18223v1 Announce Type: cross Abstract: With sophisticated cyber-attacks becoming increasingly prevalent, modern networks require intelligent autonomous cyber-defense agents trained via Reinforcement Learning (RL). These agents employ neurosymbolic approaches such as behavior trees with learning-enabled components (LECs) to learn, reason, adapt, and implement security rules while maintaining critical operations. However, these autonomous networks are partially observable systems, i.e., the cyber-attacker's (red agent's) actions are not observable, making it difficult for the defender to predict red actions, learn red policies, or assess the attacker's intrusion levels. To address this, we propose a Policy Learning Technique using imitation learning to learn policies for partially observable RL agents with discrete states and discrete actions. We apply this technique in an autonomous cyber environment to predict red agent's actions from network observations and defender actions. Integrated with a neurosymbolic cyber-defense agent, our method effectively handles different red policies and achieves high prediction accuracy across diverse simulated scenarios.

05.
arXiv (CS.CV) 2026-06-12

MaskWAM: Unifying Mask Prompting and Prediction for World-Action Models

World Action Models (WAMs) present a promising paradigm for robotic control via video prediction. However, current WAMs suffer from fundamental spatial bottlenecks: standard text inputs introduce referential ambiguity in cluttered scenes, while unstructured RGB predictions lack semantic grounding and remain biased by task-irrelevant backgrounds. To overcome these limitations, we introduce MaskWAM, an object-centric world-action model. By jointly integrating masks as both explicit inputs and predictions via a unified Mixture of Transformers (MoT), MaskWAM unlocks robust policy generalization. This design provides two key benefits: (1) predicting future masks yields object-centric semantic supervision that suppresses visual noise, significantly enhancing even standard text-conditioned WAMs; and (2) coupling this predictive supervision with first-frame visual prompts, such as target object masks, establishes a precise spatial anchor that substantially reduces language ambiguity. Crucially, as WAMs are inherently vision-driven architectures, direct mask conditioning yields substantially stronger guidance than text alone, establishing a precise and robust paradigm for manipulating unseen objects. Evaluations on LIBERO, RoboTwin, and real-world tasks demonstrate that MaskWAM significantly outperforms baselines in both language-clear and language-ambiguous tasks.

06.
arXiv (CS.CL) 2026-06-15

Token-Level LLM Collaboration via FusionRoute

Large language models (LLMs) exhibit strengths across diverse domains. However, achieving strong performance across these domains with a single general-purpose model typically requires scaling to sizes that are prohibitively expensive to train and deploy. On the other hand, while smaller domain-specialized models are much more efficient, they struggle to generalize beyond their training distributions. To address this dilemma, we propose FusionRoute, a robust and effective token-level multi-LLM collaboration framework in which a lightweight router simultaneously (i) selects the most suitable expert at each decoding step and (ii) contributes a complementary logit that refines or corrects the selected expert's next-token distribution via logit addition. Unlike existing token-level collaboration methods that rely solely on fixed expert outputs, we provide a theoretical analysis showing that pure expert-only routing is fundamentally limited: unless strong global coverage assumptions hold, it cannot in general realize the optimal decoding policy. By augmenting expert selection with a trainable complementary generator, FusionRoute expands the effective policy class and enables recovery of optimal value functions under mild conditions. Empirically, across both Llama-3 and Gemma-2 families and diverse benchmarks spanning mathematical reasoning, code generation, and instruction following, FusionRoute outperforms both sequence- and token-level collaboration, model merging, and direct fine-tuning, while remaining competitive with domain experts on their respective tasks.

07.
arXiv (CS.CL) 2026-06-24

Exploring Language-Agnosticity in Function Vectors: A Case Study in Machine Translation

Function vectors (FVs) are vector representations of tasks extracted from model activations during in-context learning. While prior work has shown that multilingual model representations can be language-agnostic, it remains unclear whether the same holds for function vectors. We study whether FVs exhibit language-agnosticity, using machine translation as a case study. Across three decoder-only multilingual LLMs, we find that translation FVs extracted from a single English$\to$X direction transfer to other target languages, consistently improving the rank of correct translation tokens across multiple unseen languages. We further find that the highest-gain tokens span multiple languages and that translation FVs across directions share most of their top-ranked heads, indicating that the FV encodes a largely language-agnostic translation signal rather than a language-pair-specific mapping.

08.
arXiv (CS.LG) 2026-06-17

On the Memorization Behavior of LLMs in Generative Recommendation: Observations, Implications, and Training Strategies

arXiv:2606.17276v1 Announce Type: cross Abstract: Generative recommendation (GR) has emerged as a promising direction for recommender systems. Recently, large language models (LLMs) have been increasingly adopted for GR, as their rich pretrained knowledge is expected to help them generalize beyond common user behavior patterns that traditional memorization-oriented baselines can capture. However, existing LLM-based GR works largely ignore LLMs' well-known tendency to memorize, which, if present in LLMs fine-tuned for GR, would restrict their utilization of pretrained knowledge. In this work, we investigate this concern by examining one-hop memorization, where a model recommends items that are direct successors of items in the training data. We show that LLMs do this more than non-LLM-based GR models-in fact, the vast majority of their gains over GR baselines are actually on users whose target items can be predicted through one-hop memorization. We intuit that improving performance on the remaining users requires LLMs to learn richer item-item relations beyond one-hop transitions. To achieve this, we propose IIRG, a novel training strategy that teaches LLMs to capture: (1) collaborative relations derived from item co-occurrences across multiple hops in user sequences, and (2) semantic relations among items with similar themes, both of which can serve as useful recommendation signals. We show that IIRG significantly improves over LLMs trained solely with standard next-item prediction, with especially large gains for users whose test items are not covered by train-time one-hop transitions.

09.
arXiv (CS.CL) 2026-06-18

PreUnlearn: Auditing Collateral Knowledge Damage Before Large Language Model Unlearning

Machine unlearning for large language models (LLMs) aims to remove specified knowledge while preserving the rest of the model's capabilities. However, the boundary between knowledge to forget and knowledge to retain is often unclear, since related and even distant information may be entangled in the model. In this paper, we study LLM unlearning from a data-centric perspective and measure how unlearning effects propagate from the forget set to same-domain and distant-domain knowledge. We find a consistent decay pattern: collateral damage is strongest near the forget set, weakens with semantic distance, but does not disappear at domain boundaries. We further ask whether such damage can be audited before unlearning is executed. We formulate forget-set auditing as a pre-unlearning prediction task and analyze which data features are most predictive of downstream damage. Our results show that interaction features between the forget set and evaluation set provide the strongest signals, suggesting that collateral damage is partly reflected in data geometry before model updates occur. These findings position forget-set auditing as an early warning tool for identifying risky unlearning runs and designing more reliable unlearning procedures.

10.
medRxiv (Medicine) 2026-06-18

Biomedical Capacity, Governance, and Health Security: A Dominican Republic Research Analysis of Stakeholder Perspectives

The COVID-19 pandemic exposed critical vulnerabilities in globally concentrated biomedical supply chains and accelerated interest in nearshoring and hemispheric health-security strategies. The Dominican Republic, already the third-largest medical device exporter in Latin America, occupies a strategically significant but institutionally constrained position within this realignment. This study evaluates stakeholder perceptions of the principal opportunities and barriers affecting biomedical ecosystem development in the Dominican Republic, with particular attention to governance, workforce capacity, and value-chain upgrading pathways. Methods. A concurrent mixed-methods design was employed, integrating a cross-sectional electronic survey of 142 purposively sampled domain experts (administered September-December 2025) with a qualitative executive consultation with senior government and industry leaders. Survey analyses combined descriptive statistics, one-sample t-tests against the scale neutral midpoint, chi-square goodness-of-fit tests, Friedman non-parametric ranking, Spearman rank correlations, and exploratory linear and logistic multivariable regression. Qualitative responses were analyzed using a framework approach grounded in the Triple Helix model of innovation systems. Results. Perceived government support was significantly below neutral (mean = 2.67, SD = 1.12; p = 0.034). Workforce shortages (83.3%) and weak academia-industry collaboration (71.4%) were the most frequently endorsed barriers ({chi}2(5) = 18.7, p = 0.002). Regulatory modernization (88.1%) and workforce development (85.7%) ranked as the highest-priority policy levers (Friedman p = 0.005). Clinical trials and contract research organization services were the dominant sub-sector priority (76.2%, binomial p < 0.001). In multivariable analysis, perceived government support, talent availability, and confidence in IP protection jointly explained 46% of the variance in sector competitiveness (R2 = 0.46, p < 0.001). Strong majority support existed for a formal public-private biomedical coordination authority (73.8%, p < 0.001).Conclusion. Institutional credibility and advanced human capital–rather than geography or market access–are the perceived binding constraints on the Dominican Republics biomedical trajectory. Regulatory modernization, targeted workforce investment, and the establishment of a national biomedical coordination authority represent the highest-leverage interventions for positioning the country as a hemispheric hub for biomedical manufacturing, clinical research, and health security.

11.
arXiv (CS.CV) 2026-06-16

A New k-Space Model for Non-Cartesian Fourier Imaging

For the past several decades, it has been popular to reconstruct Fourier imaging data using model-based approaches that can easily incorporate physical constraints and advanced regularization/machine learning priors. The most common modeling approach is to represent the continuous image as a linear combination of shifted "voxel" basis functions. Although well-studied and widely-deployed, this voxel-based model is associated with longstanding limitations, including high computational costs, slow convergence, and a propensity for artifacts. In this work, we reexamine this model from a fresh perspective, identifying new issues that may have been previously overlooked (including undesirable approximation, wrap-around, and nullspace characteristics). Our insights motivate us to propose a new model that is more resilient to the limitations (old and new) of the previous approach. Specifically, the new model is based on a Fourier-domain basis expansion rather than the standard image-domain voxel-based approach. Illustrative results, which are presented in the context of non-Cartesian MRI reconstruction, demonstrate that the new model enables improved image quality (reduced artifacts) and/or reduced computational complexity (faster computations and improved convergence).

12.
bioRxiv (Bioinfo) 2026-06-11

TifBERT: a self-supervised foundation model for normalization-robust bulk RNA-seq representation learning

Bulk RNA sequencing remains central to translational genomics, yet foundation-model development has largely focused on single-cell data. Existing transformer approaches for bulk RNA-seq often rely on expression discretization, numerical reconstruction, external gene embeddings, or restricted gene sets, limiting robustness across normalization schemes and cohorts. Here, we introduce TifBERT, a self-supervised framework for full-transcriptome bulk RNA-seq representation learning. TifBERT converts each unordered expression profile into a sample-specific gene sequence using term frequency-inverse document frequency (TF-IDF) ordering, prioritizing genes that are both highly expressed within a sample and selectively expressed across the cohort. It is then pretrained using masked gene modeling, predicting gene identities from transcriptomic context rather than reconstructing expression values. Pretrained on harmonized TCGA Pan-Cancer data spanning five RNA-seq normalization schemes, TifBERT learns contextual representations across approximately 10,000 genes without expression binning, landmark-gene restriction, or external biological embeddings. Across 33 TCGA cancer types, TifBERT achieved 90.83% accuracy, 0.996 macro AUC-ROC, and 0.903 MCC. It also captured pathway-level biology, achieving mean sample-wise and pathway-wise Pearson correlations of 0.754 and 0.762 across 1,387 PARADIGM pathway activities. Independent evaluation on GTEx healthy tissues showed preservation of tissue-level transcriptomic structure without retraining. In comparison with existing models, TifBERT achieves competitive subtype discrimination with substantially greater stability and produces markedly richer embedding geometry (effective rank 95.6 versus 6.3), without requiring expression discretization or in-distribution pretraining exposure. Together, TifBERT provides a scalable, normalization-independent foundation model for reusable bulk transcriptomic representation learning

13.
arXiv (CS.AI) 2026-06-16

Co-Scraper: query-aware DOM Pruning and Reusable Scraper Synthesis for Lightweight Web Data Extraction

arXiv:2606.14821v1 Announce Type: cross Abstract: The abundant and heterogeneous nature of web content necessitates automated information extraction, and generating scrapers that can be reused across similar web pages offers an effective solution for scalable data extraction. In this work, we propose Co-Scraper, a two-stage framework capable of handling the hierarchical complexity of long HTML documents. By integrating a query-aware DOM pruning mechanism with stable extraction strategy induction, Co-Scraper can effectively transforms web content into executable programmatic wrappers using a fine-tuned Qwen3-8B model. On the test set of SWDE, Co-Scraper achieves state-of-the-art performance with an F1 score of 94.78% and a reuse success rate of 90.39%. This framework significantly enhances the accuracy and resilience of data extraction, providing a highly efficient approach for web data acquisition tasks.

14.
arXiv (quant-ph) 2026-06-11

Clifford disentanglers for entanglement reduction in molecular electronic structure simulations

arXiv:2606.12056v1 Announce Type: new Abstract: Entanglement is a key bottleneck limiting the efficiency of tensor-network and quantum simulations of molecular electronic structures. Here, we systematically assess and extend Clifford disentanglers as a structure-preserving approach to entanglement reduction: they can modify the entanglement structure of qubit wavefunctions while retaining the Pauli-string form of qubit Hamiltonians. To enable a practical search over Clifford transformations, we classify Clifford operators by their action on the Schmidt spectrum across a bipartition, reducing the two- and four-qubit search spaces to 20 and 91392 representatives, respectively. Embedded in an iterative Clifford-augmented matrix product state framework, these transformations reduce the energy errors at fixed bond dimension for the molecular test cases studied and mitigate the dependence on orbital orderings and fermion-to-qubit mappings. We further show that Clifford disentanglers can also benefit quantum simulations such as the shallow-circuit variational quantum eigensolver calculations. Together, these results establish Clifford disentanglers as a useful structure-preserving entanglement-engineering tool for tensor-network and quantum simulations of molecular electronic structure, while also clarifying their correlation dependence and motivating future developments.

15.
arXiv (CS.AI) 2026-06-19

MetaResearcher: Scaling Deep Research via Self-Reflective Reinforcement Learning in Adversarial Virtual Environments

arXiv:2606.19893v1 Announce Type: new Abstract: Deep research agents have demonstrated remarkable capabilities in autonomous information gathering and synthesis, yet their training remains constrained by the static nature of simulated environments, the limits of fact-retrieval-only task designs, and the inefficiency of outcome-based reinforcement learning. In this work, we propose MetaResearcher, a novel framework that scales deep research agent training across four synergistic dimensions. First, we introduce an Evolving Virtual World that injects temporal dynamics and adversarial misinformation into the training environment, forcing agents to develop source credibility assessment and temporal conflict resolution skills. Second, we design Discovery-Oriented Tasks – including hypothesis generation and contradiction resolution – that transcend simple fact retrieval and push agents toward genuine research behaviors. Third, we propose a Self-Reflective Meta-Reward mechanism within the GRPO framework that jointly optimizes for answer correctness, search path efficiency, reflection depth, and tool call diversity, directly addressing the repetitive action loop problem observed in prior work. Fourth, we introduce a Heterogeneous Multi-Agent Swarm architecture comprising specialized Scout, Filter, and Synthesizer models that learn collaborative research strategies through coordinated reinforcement learning. Built upon the LiteResearcher infrastructure, MetaResearcher requires zero marginal API cost for training while targeting substantial improvements in both benchmark performance (GAIA, Xbench-DS) and epistemic robustness under adversarial conditions. We present the complete framework design, training methodology, and planned experimental validation.

16.
arXiv (CS.CL) 2026-06-15

Non-Parametric Machine Text Detection via Multi-View Gaussian Processes

Adversarial conditions such as paraphrasing and targeted style transfer sharply degrade the accuracy of machine text detectors. A document, however, carries multiple complementary signals (e.g., stylistic features, likelihood and rank-order features, and structural features), and an attack that suppresses one may leave others intact. While a parametric classifier can learn to combine these features given sufficient supervision, classifiers are prone to making confidently incorrect predictions when the distribution shifts (e.g., novel attacks or unseen language models). To address this, we propose a multi-view, non-parametric detection framework that extracts complementary feature views from the same document and aggregates per-view evidence through a Gaussian process ensemble. By aggregating evidence across views, an adversary must simultaneously defeat multiple independent axes of detection, substantially raising the cost of evasion. The Gaussian process formulation additionally provides calibrated probabilities and principled abstention on out-of-distribution inputs, supporting reliable deployment in high-stakes settings. We evaluate on three benchmarks spanning diverse generators and attacks: the DetectRL and RAID benchmarks, and the PAN2025 shared task and demonstrate that our multi-view detector maintains strong performance under the considered attacks, outperforming existing approaches against held out attacks.

17.
medRxiv (Medicine) 2026-06-22

ECG-Guided Pre-Screening of Family Members for Hypertrophic Cardiomyopathy

Background: Current clinical guidelines recommend serial ECG and echocardiographic surveillance for first-degree relatives of probands with Hypertrophic Cardiomyopathy (HCM). Objectives: To evaluate the accuracy and validity of ECG alone as a pre-screening tool for the diagnosis of HCM and to develop a random forest (RF) model for HCM phenotype prediction. Method: Pediatric relatives of primary HCM probands attending the cardiomyopathy screening program at The Hospital for Sick Children were included from 1993 to 2025. Subjects were followed until the last follow-up, censored at phenotype conversion. ECGs were classified as normal or abnormal based on predefined parameters. Associations between binary ECG variables and HCM phenotype were assessed using Phi ({varphi}) coefficient. A Random Forest classifier was developed using significant ECG variables (70:30 training: test split) and evaluated using precision, recall, specificity, negative predictive value, F1 score and AUROC. Feature importance was assessed using SHAP analysis. Variables with an impact of >5% were included in a simplified model, which was evaluated by repeating performance metrics and externally validated in a healthy cohort. Results: 350 screened relatives (44% female, mean follow-up 6.8 +- 4.8 years) were included. At baseline, 13% (46350) were phenotype-positive for HCM. 9 subjects converted during the surveillance. Thirteen ECG variables were significantly associated with phenotype-positive HCM and were included in the full random forest model. Four variables had >5% impact (Left ventricular hypertrophy, right ventricular hypertrophy, T-wave inversion and ST-segment depression) and were included in a simplified model, which maintained high specificity (93% vs 97%), negative predictive value (97% vs 93%) and AUROC (90% vs 96%). The simplified model classified 83% subjects as phenotype-negative, with eight being false-negative, all of whom developed an abnormal ECG in a mean of 1 year, and none had an interim adverse cardiac event. The simplified model was evaluated in an independent healthy cohort of 153 school-age subjects and correctly identified 98% as phenotype-negative with 100% NPV. Conclusion: ECG abnormalities were strongly associated with phenotype-positive status. A simplified ECG-based random forest model using four ECG variables demonstrated high specificity and negative predictive value for identifying phenotype-negative subjects. If prospectively validated, this could reduce the need for concurrent echocardiographic screening by up to 83% per encounter, lowering screening burden and cost.

18.
bioRxiv (Bioinfo) 2026-06-11

GeroEngine: Generative single-cell aging trajectories reveal a bidirectionally traversable identity core and direction-specific inflammatory remodeling

Authors:

Single-cell RNA sequencing (scRNA-seq) maps aging tissues at high resolution but is destructive, preventing longitudinal tracking; dropout and zero-inflation artifacts, amplified by shift-invariant linear simulations, confound age-associated variability. We developed GeroEngine, a technical-artifact-aware framework combining VAE-based trajectory simulation, LOPO cross-validation, linear baselines, reverse traversal, and reverse-directed network inference. In microglia and HSCs, the VAE reduced technical-artifact carryover while preserving trajectory heterogeneity and improving alignment to artifact-reduced reference manifolds. Consensus GeroTargets and GeroRegulators defined tissue-specific GeroNetworks organized into three pillars: lineage/replication identity collapse, a sex-dimorphic endocrine/stress core, and inflammatory remodeling. Forward and reverse simulations aligned to the common young[-&gt;]old aging axis revealed a sign-coherent, direction-specific program: identity/replication targets were bidirectionally recovered, whereas MHC/NF-{kappa}B inflammatory programs were preferentially forward-recovered. These results support identity collapse as a deep traversable core of aging and nominate upstream homeostatic restoration over downstream inflammatory suppression.

19.
arXiv (quant-ph) 2026-06-19

Arrival times of an atomic Bose-Einstein condensate

arXiv:2606.20281v1 Announce Type: cross Abstract: The times of flight of an atomic Bose-Einstein condensate are theoretically investigated in the experimentally unexplored regime corresponding to detection close to the trap of the condensate. In this regime, there is no consensus on how to calculate the distribution of times of arrival onto the detector. For non-interacting particles, distinct theoretical predictions have been made in the past. This work analyses how these predictions are modified for an interacting Bose-Einstein condensate. For this purpose, a time-dependent Gross-Pitaevskii equation is solved analytically and numerically.

20.
arXiv (CS.CL) 2026-06-17

Prompt Perturbation for Reliable LLM Evaluation over Comparison Graphs

Evaluating large language models (LLMs) is important for understanding their capabilities, comparing competing systems, and supporting the deployment of reliable models in practice. For open-ended tasks, pairwise evaluation has become a popular paradigm, in which two responses to the same prompt are compared and the resulting judgments are aggregated into an overall ranking. A central challenge of this paradigm is intransitivity: the induced comparison outcomes may fail to support any coherent global ranking. For example, one may observe cyclic preferences such as $A \succ B \succ C \succ A$, or inconsistencies involving ties such as $A \equiv B\equiv C\neq A$. Such contradictions make the resulting leaderboard unstable and challenging to interpret. In this paper, we propose a prompt perturbation framework for improving the consistency of pairwise LLM evaluation. Our approach generates perturbed variants of each prompt, uses the resulting comparison graphs to identify and filter out structurally inconsistent comparison patterns, and then applies standard ranking methods to the filtered comparisons. A key feature of the proposed framework is that graph-level structural consistency is incorporated explicitly into the evaluation pipeline before ranking aggregation. This provides a simple and principled way to reduce cyclic inconsistencies and improve the reliability of LLM rankings.

21.
bioRxiv (Bioinfo) 2026-06-17

Beyond phylogeny: Genome-wide DNA sequence patterns suggest DNA physical properties associated with thermal adaptation in extremophile microbes

Authors:

Temperature is a fundamental constraint on biological systems, yet how it is reflected in genome sequence organization remains unclear. Here, we show that genome-wide distributions of short DNA sequences contain a robust signal of thermal adaptation that is largely independent of phylogeny. Using Structural Topic Modelling (STM), a machine-learning approach for identifying groups of co-occurring sequence motifs, we analyze canonical 6-mer and 9-mer frequency profiles of bacterial and archaeal genome proxies (randomly sampled genomic regions) and identify motif families systematically associated with thermophiles and psychrophiles. In bacterial thermophiles, the identified motif families are dominated by highly specific, overrepresented and co-occurring C- and G-stacked hexamers, and a distinct family of CG-periodic hexamers recurring across multiple temperature comparisons. In contrast, bacterial psychrophile-associated motifs are dominated by low-complexity A-, T-, and AT-run hexamers. Thermophilic archaea generally exhibit a distinct CTAG-centred hexamer family, suggesting that different domains may adapt to similar environmental constraints through different sequence-level solutions. However, this domain-level contrast is not absolute: in a targeted analysis of two thermophilic bacterium–archaeon pairs, we find unusually similar frequencies of all the STM-identified thermophile-associated hexamer families, suggesting that shared high-temperature environments can, in specific cases, partially override phylogenetic divergence. Notably, the identified motif families constitute only a small and highly selective subset of the vast space of possible G+C-rich or A+T-rich sequences. This indicates that thermal adaptation is associated with specific sequence architectures rather than broad shifts in nucleotide composition. Accordingly, the observed signal cannot be explained by overall base composition alone, but instead arises from structured combinations and positional arrangements of nucleotides within short sequence contexts. Related motif families are recovered at both k=6 and k=9, indicating that the signal reflects systematic shifts in genome-wide sequence organization rather than isolated sequence motifs. These patterns are consistent with known sequence-dependent DNA physical properties documented in biochemical and biophysical studies, including differences in base-stacking interactions and conformational flexibility. Together, our results suggest that genome-wide sequence organization reflects sequence-dependent DNA physical properties associated with thermal adaptation, revealing a previously underappreciated physical layer of genomic information beyond phylogenetic history.

22.
arXiv (CS.CV) 2026-06-18

PEFT-MedSAM: Efficient Fine-Tuning of Medical Foundation Models for Explainable Skin Lesion Segmentation

Automated segmentation of skin lesions using deep learning models for dermoscopic images can be very helpful in finding melanomas earlier than they would normally be detected. However, most deep learning methods available do not perform well. The aim of this paper is to present a parameter-efficient fine-tuning method called PEFT-MedSAM for adapting the Medical Segment Anything Model (MedSAM) to automatically segment dermoscopic skin lesions. The PEFT-MedSAM method uses only the lightweight mask decoder for training the model while keeping the pre-trained image encoder and prompt encoder frozen. The experiments performed on the ISIC 2018 benchmark dataset shows that PEFT-MedSAM obtains a dice coefficient of .9411 and an intersection over union value of .8918 when compared to both a fully trained U-Net baseline (.8715 dice coefficient) and zero-shot MedSAM inference (.8997 dice coefficient). The external validation of the model using PH2 dataset shows .9467 dice coefficient with +/- .0310 standard deviation. Supportive evidence for these claims include a p-value less than .0001 for Wilcoxon signed rank tests comparing the two datasets and bootstrap-estimated 95% confidence intervals of [.9364,.9447] that represent the estimated range of possible values for the average dice coefficient obtained by repeating the test. To increase clinical trustworthiness, we used Grad-CAM explainability along with a pointing game based evaluation methodology to evaluate the CNN baseline model on the validation set. The results showed that we had an accuracy rate of 98.27% on the validation set of 519 images and confirmed that the model classified regions containing skin lesions.

23.
arXiv (CS.LG) 2026-06-19

Learning to Emulate Chaos: Adversarial Optimal Transport Regularization

arXiv:2604.21097v2 Announce Type: replace-cross Abstract: Chaos arises in many complex dynamical systems, from weather to power grids, but is difficult to accurately model with data-driven methods such as machine learning emulators. While emulators are promising tools for accelerating simulations and solving inverse problems, they still struggle to learn chaotic dynamics, where sensitivity to initial conditions renders exact long-term forecasts infeasible, especially given noisy data. Recent work instead trains emulators to match the statistical properties of chaotic attractors, but these approaches often rely on handcrafted summary statistics or large, diverse multi-environment datasets. In this work, we propose a family of adversarial optimal transport objectives that can jointly learn high-quality summary statistics and a physically consistent emulator from a single noisy trajectory. We theoretically analyze and experimentally validate a Sinkhorn divergence formulation (2-Wasserstein) and a WGAN-style dual formulation (1-Wasserstein) of our approach. Numerical experiments across a variety of chaotic systems, including ones with high-dimensional spatiotemporal chaos, show that emulators trained using our proposed objectives have significantly improved long-term statistical fidelity.

24.
arXiv (CS.CL) 2026-06-16

CoCoGEC: Counterfactual Generation for Robust Grammatical Error Correction

Grammatical error correction (GEC) systems are usually trained and evaluated on GEC benchmarks, but their performance often drops sharply once the surrounding context is slightly perturbed or extended. This indicates that the existing GEC models usually fail to understand the error patterns in the varying contexts. In this paper, we thoroughly investigate the counterfactuals for GEC tasks, where the subtle changes to the contexts could lead to the label flipping issue. We propose CoCoGEC, a counterfactual generation framework that creates copies of training instances with error-irrelevant contexts altered. Our framework systematically generates counterfactuals by (1) generating intra- and inter-sentence counterfactuals that maintain the error patterns as well as syntax of the original instances by altering the word-level and sentence-level contexts; (2) revising the generated counterfactuals by selecting the instances with flipped labels and high GEC Mutual Information (MI) coefficient. Extensive experiments show that our method substantially improves the stability of GEC models, outperforming a set of data augmentation baselines. Particularly, it could achieve absolute F0.5 gains of +9.9, +11.3, and +20.8 points on the perturbed BEA-19*,CoNLL-14*, and TEM-8* data set.Our code is released at https://github.com/Quinnok/CoCoGEC

25.
bioRxiv (Bioinfo) 2026-06-11

A high-quality chromosome-scale reference genome assembly for Asparagus racemosus var. CIM-Shakti (Shatavari), a medicinal plant of Ayurvedic importance

Asparagus racemosus Wild., commonly known as Shatavari, is an important medicinal plant in Ayurveda and is valued for its steroidal saponins, particularly shatavarin compounds, which contribute to its adaptogenic, galactagogue, immunomodulatory, and therapeutic properties. Despite its medicinal and economic importance, genomic resources for this species have remained limited, restricting molecular breeding, pathway discovery, and comparative evolutionary studies within Asparagaceae. Here, we report a high quality chromosome scale reference genome assembly of A. racemosus var. CIM Shakti generated using PacBio HiFi long read sequencing and Omni C chromatin conformation scaffolding. The pseudo haploid assembly spans 817 Mb across 53 scaffolds, with a scaffold N50 of 98.50 Mb, L50 of 5, and a largest scaffold of 113.80 Mb. Ten major chromosome scale pseudomolecules were resolved, corresponding to the haploid chromosome complement of A. racemosus. The assembly showed high gene space completeness, with BUSCO completeness of 99.8% against the Eukaryota dataset and 98.0% against the Embryophyta dataset. BlobToolKit profiling further supported assembly quality, with GC content of approximately 39 to 40% and no major evidence of contamination. EDTA based repeat annotation identified 580.93 Mb of interspersed repetitive elements, accounting for 71.06% of the 817.57 Mb genome assembly. The repeat landscape was dominated by LTR retrotransposons, particularly Gypsy elements, which accounted for 25.01% of the assembly, followed by unclassified LTR elements at 26.58% and Copia elements at 4.84%. Structural and functional annotation identified 29,199 protein coding genes represented by 29,199 transcript models, 138,433 exons, and 125,201 CDS features. The annotation was structurally robust, with an average gene length of 4,605.1 bp, 4.74 exons per transcript, and 97.80% of transcripts containing multiple exons. The CIM Shakti reference genome provides a foundational genomic resource for investigating steroidal saponin biosynthesis, sex chromosome evolution, repeat driven genome expansion, and comparative genomics in Asparagaceae. This assembly will support future studies on medicinal trait improvement, conservation genomics, and genomics assisted breeding of climate resilient Shatavari cultivars.