Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CV) 2026-06-18

Beyond Nearest Neighbor Interpolation in Data Augmentation

Using nearest-neighbor interpolation to avoid undefined categorical labels overlooks the risk of exacerbating pixel-level annotation errors in augmented training data. In addition, the inherent low-pass filtering effect of interpolation algorithms risks degrading high-frequency structural details within annotated regions of interest. To avoid these risks, the author modified the data transformation functions of convolutional neural networks by incorporating a modified geometric transformation function that removes the reliance on nearest-neighbor interpolation, and by integrating a mean-based class filtering mechanism that handles undefined categorical labels when alternative interpolation algorithms are used. The author also implemented an offline data augmentation pipeline to generate interpolation-specific augmented training data, enabling quantitative assessment of interpolation-specific low-pass filtering effects on augmented training data. Experimental evaluation on three medical image segmentation datasets and the XBAT+ datasets demonstrated performance gains across multiple quantitative metrics.
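The undefined-label problem is easy to see in one dimension. The sketch below is not the author's mean-based class filter; it swaps in a common alternative, per-class (one-hot) linear interpolation followed by an argmax, to show why interpolating raw class IDs is unsafe. The function names and the 1-D setting are illustrative assumptions.

```python
def linear_resample(values, positions):
    """Linearly interpolate a 1-D signal at fractional sample positions."""
    out = []
    for p in positions:
        i = int(p)
        j = min(i + 1, len(values) - 1)
        t = p - i
        out.append((1 - t) * values[i] + t * values[j])
    return out


def naive_label_interp(labels, positions):
    # Interpolating integer class IDs directly can invent a class that is not
    # present locally, e.g. halfway between class 0 and class 2 gives class 1.
    return [round(v) for v in linear_resample(labels, positions)]


def per_class_label_interp(labels, positions, num_classes):
    # Hypothetical stand-in: interpolate one indicator channel per class, then
    # keep the class with the largest weight (ties go to the lower class ID).
    # Only classes that occur among neighbouring pixels can win.
    scores = [
        linear_resample([1.0 if label == c else 0.0 for label in labels], positions)
        for c in range(num_classes)
    ]
    return [
        max(range(num_classes), key=lambda c: scores[c][k])
        for k in range(len(positions))
    ]


labels = [0, 0, 2, 2]
positions = [0.0, 1.5, 3.0]
print(naive_label_interp(labels, positions))         # [0, 1, 2]
print(per_class_label_interp(labels, positions, 3))  # [0, 0, 2]
```

The naive result contains class 1, which never appears in the source labels; the per-class variant cannot produce it, at the cost of one interpolation pass per class.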

03.
bioRxiv (Bioinfo) 2026-06-19

Perturbation Curve models continuous transcriptional response trajectories and improves prediction of genetic modulations

Single-cell CRISPR screens such as Perturb-seq have revolutionized functional genomics by revealing biological causality. However, although perturbation assignments are typically represented as discrete labels, the effective strength of a perturbation in each cell is often continuous and variable. Current analytical frameworks struggle to decouple the variability in perturbation strength from the diversity of downstream responses. Here, we present Perturbation Curve (PertCurve), a nonlinear, curve-based computational framework that models the trajectories of transcriptomic responses by explicitly incorporating diverse perturbation magnitudes and strengths. By ordering cells by perturbation strength, we demonstrate that PertCurve accurately recapitulates response magnitudes and reveals distinct modularity and asynchrony patterns in downstream gene behaviors. These patterns are categorized into archetypes, including proportional, sensitive, and threshold responses. By applying this framework across CRISPRi/a modalities, we identify universal response patterns in viral infection, apoptosis, and proliferation genes, and reveal previously overlooked context-specific regulatory features in cell differentiation. Finally, incorporating PertCurve into perturbation prediction models and evaluation metrics enhances predictive performance, delivering actionable insights for refining established models.
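The core idea of ordering cells by perturbation strength can be sketched as follows. This is not PertCurve: it replaces the paper's nonlinear curve model with a running mean, and the archetype rule (how much of the total response the curve has reached at the midpoint of the strength axis) is a hypothetical heuristic with an assumed tolerance `tol`.

```python
def response_curve(strengths, expression, window=3):
    """Order cells by perturbation strength and smooth one gene's expression."""
    order = sorted(range(len(strengths)), key=lambda i: strengths[i])
    xs = [strengths[i] for i in order]
    ys = [expression[i] for i in order]
    half = window // 2
    smoothed = []
    for k in range(len(ys)):
        lo, hi = max(0, k - half), min(len(ys), k + half + 1)
        smoothed.append(sum(ys[lo:hi]) / (hi - lo))  # running mean
    return xs, smoothed


def archetype(curve, tol=0.1):
    """Hypothetical rule: share of the total change reached by the curve midpoint."""
    total = curve[-1] - curve[0]
    if abs(total) < 1e-12:
        return "no response"
    early = (curve[len(curve) // 2] - curve[0]) / total
    if early < tol:
        return "threshold"      # flat at low strength, then changes
    if early > 1 - tol:
        return "sensitive"      # saturates already at low strength
    return "proportional"       # changes steadily with strength


print(archetype([0, 1, 2, 3, 4]))  # proportional
print(archetype([0, 0, 0, 0, 4]))  # threshold
print(archetype([0, 4, 4, 4, 4]))  # sensitive
```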

04.
arXiv (CS.AI) 2026-06-16

Retrieve, Don't Retrain: Extending Vision Language Action Models to New Tasks at Test Time

arXiv:2606.15631v1

Extending a vision-language-action (VLA) policy to a new task typically requires task-specific teleoperated demonstrations and per-task fine-tuning, making adaptation costly in both data collection and compute. In this paper, we show that this target-side per-task adaptation cost can be replaced by retrieval. Our retrieval-augmented policy is trained once on paired demonstrations from the target embodiment (query) and a cheaper embodiment (pool, e.g., human-hand video), then frozen. New tasks are added at deployment by appending pool-side demonstrations to a retrieval pool. The frozen policy conditions on retrieved trajectories at every control step, so new tasks are absorbed by indexing data rather than updating parameters. Fine-tuning is needed only to take on a new, unseen embodiment, not for each new task. We show that retrieval improves policies across backbones, including standard VLA policies, but its effect is especially pronounced in Cosmos Policy, a video-generation-based world-action model (WAM). In this setting, retrieval supplies coarse task progression, while the WAM's future-image objective provides an additional visual consistency signal that strengthens the retrieval-conditioned actions. On PushT, we study how retrieval provides a reusable high-level motion prior for cross-embodiment generalization to unseen goal angles, while on RoboTwin 2.0 our method outperforms cross-embodiment baselines on unseen tasks, and we additionally demonstrate the method on a real robot.
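The "retrieve, don't retrain" idea, where new tasks are absorbed by indexing data rather than updating parameters, can be illustrated with a toy retrieval pool. Everything here is a hypothetical sketch: cosine similarity over hand-written task embeddings stands in for the paper's retrieval model, and `frozen_policy` simply returns the next waypoint of the retrieved demonstration instead of running a VLA or world-action model.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class RetrievalPool:
    """Hypothetical pool of (task embedding, pool-side demonstration) pairs."""

    def __init__(self):
        self.keys, self.demos = [], []

    def add(self, key, demo):
        # Adding a task only appends data; no policy parameters change.
        self.keys.append(key)
        self.demos.append(demo)

    def retrieve(self, query, k=1):
        ranked = sorted(range(len(self.keys)),
                        key=lambda i: cosine(query, self.keys[i]), reverse=True)
        return [self.demos[i] for i in ranked[:k]]


def frozen_policy(step, retrieved):
    # Placeholder for the frozen, retrieval-conditioned policy: follow the
    # waypoint of the top retrieved trajectory at this control step.
    demo = retrieved[0]
    return demo[min(step, len(demo) - 1)]


pool = RetrievalPool()
pool.add([1.0, 0.0], [(0, 0), (-1, 0), (-2, 0)])   # "push left"
pool.add([0.0, 1.0], [(0, 0), (0, 1), (0, 2)])     # "push up"
# A new task is added at deployment time by indexing one more demonstration.
pool.add([0.7, 0.7], [(0, 0), (1, 1), (2, 2)])     # "push diagonally"
print(frozen_policy(1, pool.retrieve([0.6, 0.8])))  # (1, 1)
```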

05.
arXiv (CS.AI) 2026-06-16

Infant Spontaneous Movement Noise Improves Exploration in Deep RL

arXiv:2606.16590v1

Exploration in deep reinforcement learning (RL) is commonly implemented as temporally uncorrelated white noise. However, recent works show that temporally correlated colored noise can improve exploration efficiency by producing smooth trajectories with better coverage of the state space. We inquire whether action noise inspired by infant spontaneous movements can also improve exploration in deep RL. We find that the power spectral densities of babies' end-effector velocities follow a colored noise process where the spectral exponent increases with age. Inspired by this developmental pattern, we introduce a mechanism that progressively increases the temporal auto-correlation of exploration noise during RL training, matching the infant statistics. Experiments across several RL environments show that infant-inspired noise produces structured exploratory behavior and can improve learning efficiency compared to conventional exploration strategies. These findings suggest that human motor and cognitive development can provide useful guidance for designing learning mechanisms in artificial agents. Our code is available at https://github.com/trieschlab/baby-noise-rl.
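The two ingredients, colored noise with power spectral density proportional to $1/f^{\beta}$ and a schedule that raises $\beta$ during training, can be sketched with the standard spectral-shaping method. The linear schedule and its endpoints (`beta_start=0.0`, white noise, to `beta_end=2.0`, Brownian-like noise) are assumptions for illustration, not the infant statistics reported in the paper, and the naive inverse DFT is only meant for short sequences. The linked repository contains the authors' actual implementation.

```python
import cmath
import math
import random


def colored_noise(beta, n, rng):
    """Sample n steps of noise whose power spectrum scales roughly as 1/f**beta."""
    spectrum = [0j] * n  # DC component stays zero
    for k in range(1, n // 2 + 1):
        amplitude = (k / n) ** (-beta / 2)  # power ~ amplitude**2 ~ f**-beta
        c = complex(rng.gauss(0, 1), rng.gauss(0, 1)) * amplitude
        if 2 * k == n:
            c = complex(c.real, 0.0)  # Nyquist bin must be real
        spectrum[k] = c
        spectrum[n - k] = c.conjugate()  # Hermitian symmetry -> real signal
    # Naive inverse DFT (O(n^2)); fine for short exploration horizons.
    signal = [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real
        for t in range(n)
    ]
    std = math.sqrt(sum(x * x for x in signal) / n) or 1.0
    return [x / std for x in signal]  # zero mean, unit variance


def beta_schedule(step, total_steps, beta_start=0.0, beta_end=2.0):
    """Hypothetical linear schedule: exploration noise grows more correlated."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return beta_start + (beta_end - beta_start) * progress


rng = random.Random(0)
for step in (0, 50_000, 100_000):
    beta = beta_schedule(step, 100_000)
    episode_noise = colored_noise(beta, 64, rng)  # add sigma * noise[t] to actions
```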

06.
arXiv (CS.CL) 2026-06-12

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Search agents (large language models augmented with search tools) have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static knowledge, making them vulnerable to test-set contamination and parametric memorization. Consequently, models can achieve high scores through fact recall rather than genuine retrieval, obscuring true browsing competence via reasoning shortcuts. In this paper, we introduce EvoBrowseComp, an evolving benchmark of 400 English and 400 Chinese contamination-free complex questions synthesized via live-web traversal. To collect these questions, we design a three-agent collaborative framework: (1) a QA synthesis agent that retrieves fresh knowledge from the live web to synthesize QA pairs; (2) an information filtering agent that filters retrieved knowledge in terms of credibility and popularity to block parametric shortcuts; and (3) a high-level guidance agent that formalizes questions into reasoning graphs to reduce logical redundancy and shortcuts in synthesized QA pairs. Because the framework supports fully automated synthesis, EvoBrowseComp can be regularly updated to prevent data contamination and maintain temporal freshness. Extensive experiments confirm that the benchmark is highly challenging and requires broad horizontal search. EvoBrowseComp thus establishes a scalable paradigm for auto-updatable, high-difficulty benchmarking that keeps pace with both evolving world knowledge and advancing agent capabilities.
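The information-filtering stage can be sketched as a simple predicate over retrieved facts. The `Fact` fields, the 0 to 1 credibility and popularity scores, and the thresholds are all hypothetical; the sketch only illustrates the stated goal that a fact should be recent, credible, and not so popular that a model could answer from parametric memory.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Fact:
    text: str
    published: date
    credibility: float  # hypothetical score in [0, 1]
    popularity: float   # hypothetical score in [0, 1]


def filter_facts(facts, knowledge_cutoff, min_credibility=0.7, max_popularity=0.3):
    """Keep fresh, credible, low-popularity facts to block parametric shortcuts."""
    return [
        f for f in facts
        if f.published > knowledge_cutoff       # newer than model training data
        and f.credibility >= min_credibility    # trustworthy source
        and f.popularity <= max_popularity      # unlikely to be memorized
    ]


facts = [
    Fact("Old, famous fact", date(2019, 5, 1), 0.9, 0.9),
    Fact("Fresh niche fact", date(2026, 3, 2), 0.8, 0.1),
    Fact("Fresh rumour", date(2026, 3, 9), 0.2, 0.1),
]
print([f.text for f in filter_facts(facts, knowledge_cutoff=date(2025, 12, 31))])
# ['Fresh niche fact']
```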

07.
arXiv (quant-ph) 2026-06-12

Efficient certification of intractable quantum states with few Pauli measurements

arXiv:2511.07300v2

Efficient verification of quantum computational resources is crucial as experiments advance toward fault-tolerance. Universal quantum computation can be achieved by consuming resource states through simple Pauli measurements, yet a significant gap remains between states that are easy to certify and those required for universality. We focus on Clifford-enhanced Product States, a class of resource states obtained by applying Clifford circuits to a product of single-qubit, potentially magic, states. While essential for universal computation, the certification of such states has previously relied on query oracles that are #P-hard to implement, leaving their efficient, oracle-free verification an open challenge. In this work, we demonstrate that such classically intractable resource states can be efficiently verified using only Pauli measurements. Our protocol achieves sample- and time-efficiency in both i.i.d. and adversarial settings. This work fills a gap in Pauli-based certification, providing a new practical pathway to verify resource states that drive universal Pauli-based quantum computation.
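A standard identity (not the paper's certification protocol) shows why Pauli measurements pair naturally with this class of states. Write a Clifford-enhanced product state as $|\psi\rangle = C\bigotimes_{i=1}^{n}|\phi_i\rangle$ with $C$ a Clifford circuit. Because Clifford conjugation maps an $n$-qubit Pauli operator $P$ to another Pauli operator up to a sign, each Pauli expectation value factorizes into single-qubit terms:

```latex
C^{\dagger} P\, C = \pm\, Q_1 \otimes \cdots \otimes Q_n
\quad\Longrightarrow\quad
\langle\psi|P|\psi\rangle = \pm \prod_{i=1}^{n} \langle\phi_i|Q_i|\phi_i\rangle ,
\qquad Q_i \in \{I, X, Y, Z\}.
```

Any single Pauli expectation can therefore be computed from the description of $C$ and the $|\phi_i\rangle$, even though the state as a whole is described above as classically intractable.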

08.
arXiv (CS.CL) 2026-06-16

AdaPLD: Adaptive Retrieval and Reuse for Efficient Model-Free Speculative Decoding

Speculative decoding accelerates generation by verifying multiple drafted tokens in a single target-model forward pass, reducing sequential decoding iterations. Model-free variants avoid auxiliary draft models by reusing text and model states already available during generation, but their speedup depends on the reliability of the constructed drafts. We identify two limitations of existing reuse-based methods: lexically anchored retrieval has limited recall under surface-form variation, and deterministic span copying can be brittle when the retrieved context does not uniquely determine the continuation. We propose AdaPLD, a training-free method that adaptively improves both retrieval and draft construction. AdaPLD preserves high-precision lexical reuse while using semantic similarity to recover additional reuse opportunities when lexical matching fails. It further constructs branched reuse hypotheses to account for continuation uncertainty, rather than relying on a single copied span. Across diverse benchmarks, AdaPLD reduces target-model forward passes and achieves up to $3.10\times$ decoding speedup.
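The lexical-reuse baseline that AdaPLD builds on (prompt-lookup-style drafting) can be sketched in a few lines: match the trailing n-gram earlier in the context, copy what followed it as a draft, and keep several matches as branches. AdaPLD's semantic-similarity fallback and its own branch construction are not reproduced; `ngram`, `draft_len`, and `max_branches` are illustrative parameters.

```python
def lexical_drafts(tokens, ngram=2, draft_len=3, max_branches=2):
    """Propose draft continuations by matching the trailing n-gram in the context."""
    suffix = tokens[-ngram:]
    drafts = []
    # Scan earlier positions, most recent first (the suffix itself is skipped).
    for start in range(len(tokens) - ngram - 1, -1, -1):
        if tokens[start:start + ngram] == suffix:
            continuation = tokens[start + ngram:start + ngram + draft_len]
            if continuation and continuation not in drafts:
                drafts.append(continuation)
            if len(drafts) == max_branches:
                break
    return drafts


def accepted_length(draft, target_tokens):
    """Length of the draft prefix that matches the target model's own tokens."""
    n = 0
    for d, t in zip(draft, target_tokens):
        if d != t:
            break
        n += 1
    return n


context = "the cat sat . the cat ran . the cat".split()
branches = lexical_drafts(context)
# In real decoding, one target forward pass verifies the drafted tokens;
# here the target's greedy tokens are simply given.
target = ["sat", ".", "a"]
best = max(branches, key=lambda d: accepted_length(d, target))
print(branches, best, accepted_length(best, target))
```

With a single copied span, the most recent match (`ran . the`) would be rejected at the first token; keeping a second branch recovers two accepted tokens.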

09.
arXiv (CS.LG) 2026-06-11

Seeing Below the Limit of Detection: A Censored-Poisson Bayesian Latent-Growth Change-Point Detector (the Span Detector) for Serial ctDNA in HR+/HER2- Metastatic Breast Cancer

arXiv:2606.11876v1

Circulating tumour DNA (ctDNA) carries evidence of drug resistance months before imaging shows it, but the earliest evidence lives below the assay's limit of detection (LoD): a nascent subclone is detected only intermittently, producing a flickering sequence of faint detects and non-detects. Commercial liquid biopsies treat each draw as an independent snapshot and a non-detect as nothing. We argue a non-detect is a left-censored observation, and the pattern of non-detects and faint detects over time carries actionable evidence of growth before any single value is trustworthy. We introduce Span, a censored-Poisson Bayesian latent-growth change-point detector that models the binary detection process, accumulates a sequential generalised-likelihood-ratio statistic for an upward change-point in the per-variant detection rate, and raises a competing-risks alarm with calibrated false-alarm control. Span has no learned weights, so there is nothing to overfit. On a synthetic cohort of HR+/HER2- metastatic breast cancer on first-line CDK4/6-inhibitor plus endocrine therapy, at a matched 10% false-alarm rate, Span roughly doubles the fraction of impending progressions caught three months ahead (indolent regime: 25% vs 11% for the snapshot), with a falsifiable dose-response: large for indolent emergence, vanishing for fast emergence. A value-trajectory baseline performs identically to the snapshot, isolating the gain to the censored detection model. The survival backbone matches a Cox baseline on real breast-cancer data (GBSG-2, n=686; C-index 0.67 vs 0.68), and on a real longitudinal cohort with clean biomarkers (PBC2, n=312) the same pipeline correctly declines to win, a falsifiable boundary test confirming the mechanism is regime-specific. All ctDNA trajectories are synthetic.
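The sequential generalised-likelihood-ratio idea can be illustrated on the binary detect/non-detect sequence alone. This is a deliberately simplified stand-in, not Span: it uses a Bernoulli model with a known baseline detection rate `p0` instead of the censored-Poisson Bayesian latent-growth model, and the alarm `threshold` is an arbitrary illustrative value rather than one calibrated to a false-alarm rate.

```python
import math


def bernoulli_llr(detects, draws, p0):
    """Log-likelihood ratio for an upward shift in detection rate (MLE p >= p0)."""
    p = max(detects / draws, p0)
    llr = 0.0
    if detects:
        llr += detects * math.log(p / p0)
    if draws - detects:
        llr += (draws - detects) * math.log((1 - p) / (1 - p0))
    return llr


def glr_statistic(history, p0):
    """Maximise the LLR over every candidate change-point in the history."""
    return max(
        bernoulli_llr(sum(history[k:]), len(history) - k, p0)
        for k in range(len(history))
    )


def first_alarm(history, p0, threshold):
    """Index of the first draw at which the GLR statistic exceeds the threshold."""
    for t in range(1, len(history) + 1):
        if glr_statistic(history[:t], p0) > threshold:
            return t - 1
    return None


# 1 = faint detect, 0 = non-detect (kept as informative, not discarded)
draws = [0, 0, 1, 0, 0, 1, 1, 1]
print(first_alarm(draws, p0=0.1, threshold=3.0))  # 6
```

Isolated faint detects do not trigger the alarm; two consecutive detects after a run of non-detects do, because the statistic accumulates the whole pattern rather than judging each draw alone.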

10.
arXiv (quant-ph) 2026-06-19

Distinguishing quantum processes with bounded coherent memory

arXiv:2606.19511v1

Distinguishing multi-time quantum processes is a fundamental task underlying the diagnosis, benchmarking, and learning of temporally correlated quantum dynamics. The standard benchmark for distinguishing two processes is the strategy-norm distance, which optimizes over arbitrary adaptive probing strategies but can require large coherent memory and time-dependent control. We introduce machines for autonomous distinction ($\mathsf{MAD}$s): probing strategies that apply the same quantum instrument at each time step, retain the full classical outcome record, and carry a coherent memory of dimension $d_A$. Optimizing over these strategies defines a memory-parametrized distinguishability measure, $d^{(N)}_{\mathsf{MAD}}(\mathbf{P}^N,\mathbf{Q}^N;d_A)$. We show that the resulting hierarchy is monotone in coherent memory and complete at finite times. Specifically, any admissible $N$-step probing strategy can be compiled into a single $\mathsf{MAD}$ with an internal counter and sufficiently large coherent memory, so the hierarchy saturates the strategy-norm benchmark. For recurrent processes generated by repeated system–environment interactions, we derive a single-step description that separates the generation of new distinguishing information from the propagation and decay of information generated at earlier times. Numerical results in a repeated-interaction model show that increasing coherent memory systematically improves the $\mathsf{MAD}$ success probability and closes the gap to the strategy-norm distance while remaining substantially more tractable to evaluate. $\mathsf{MAD}$ distinguishability therefore provides an operational and scalable framework for quantifying what can be learned about genuinely multi-time quantum processes with bounded coherent memory.
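The monotonicity and completeness claims can be written compactly. The symbol $D_{\mathrm{strat}}$ for the strategy-norm benchmark is our notation, and its normalization is assumed to match the paper's convention:

```latex
d^{(N)}_{\mathsf{MAD}}(\mathbf{P}^N,\mathbf{Q}^N;d_A)
\;\le\;
d^{(N)}_{\mathsf{MAD}}(\mathbf{P}^N,\mathbf{Q}^N;d_A')
\;\le\;
D_{\mathrm{strat}}(\mathbf{P}^N,\mathbf{Q}^N)
\qquad \text{for } d_A \le d_A' ,
```

with the second inequality becoming an equality once $d_A'$ is large enough to host the compiled $N$-step strategy together with its internal counter.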

11.
arXiv (CS.CV) 2026-06-16

BioAutoML-NAS: An End-to-End AutoML Framework for Multimodal Insect Classification via Neural Architecture Search on Large-Scale Biodiversity Data

Insect classification is important for agricultural management and ecological research, as it directly affects crop health and production. However, this task remains challenging due to the complex characteristics of insects, class imbalance, and large-scale datasets. To address these issues, we propose BioAutoML-NAS, the first BioAutoML model to use multimodal data (images and metadata); it applies neural architecture search (NAS) to images to automatically learn the best operation for each connection within each cell. Multiple cells are stacked to form the full network, each extracting detailed image feature representations. A multimodal fusion module combines image embeddings with metadata, allowing the model to use both visual and categorical biological information to classify insects. An alternating bi-level optimization training strategy jointly updates network weights and architecture parameters, while zero operations remove less important connections, producing sparse, efficient, and high-performing architectures. Extensive evaluation on the BIOSCAN-5M dataset demonstrates that BioAutoML-NAS achieves 96.81% accuracy, 97.46% precision, 96.81% recall, and a 97.05% F1 score, outperforming state-of-the-art transfer learning, transformer, AutoML, and NAS methods by approximately 16%, 10%, and 8% respectively. Further validation on the Insects-1M dataset obtains 93.25% accuracy, 93.71% precision, 92.74% recall, and a 93.22% F1 score. These results demonstrate that BioAutoML-NAS provides accurate, confident insect classification that supports modern sustainable farming.
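Learning the best operation for each connection with architecture parameters is usually done with a DARTS-style continuous relaxation: each edge outputs a softmax-weighted mixture of candidate operations, and the edge is discretized after search. The sketch below assumes that relaxation, uses toy operations in place of convolutions, and adopts a hypothetical rule that an edge whose `zero` weight dominates is pruned; it is not the BioAutoML-NAS code. In DARTS, network weights and architecture parameters are typically updated alternately on training and validation batches; that outer loop is omitted here.

```python
import math

# Toy candidate operations on a feature vector (stand-ins for conv/pool ops).
OPS = {
    "zero": lambda x: [0.0 for _ in x],
    "identity": lambda x: list(x),
    "scale2": lambda x: [2.0 * v for v in x],
}


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixed_op(x, alphas):
    """Continuous relaxation: weight every candidate op by softmax(alpha)."""
    weights = softmax([alphas[name] for name in OPS])
    outputs = [op(x) for op in OPS.values()]
    return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(len(x))]


def derive_edge(alphas):
    """Discretize after search; a dominant 'zero' op removes the connection."""
    best = max(OPS, key=lambda name: alphas[name])
    return None if best == "zero" else best


print(derive_edge({"zero": 0.1, "identity": 0.5, "scale2": 1.0}))  # scale2
print(derive_edge({"zero": 2.0, "identity": 0.5, "scale2": 1.0}))  # None
```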

12.
arXiv (CS.CL) 2026-06-12

HKVM-RAG: Key-Value-Separated Hypergraph Evidence Organization for Multi-Hop RAG

Multi-hop RAG poses a data-engineering problem beyond passage matching: under fixed retrieval budgets, a system must organize retrieved text into evidence units that expose answer chains. Dense retrievers score passages independently, while graph-based memories make associations explicit but often rely on pairwise or entity-centered keys that fragment multi-hop evidence. We present HKVM-RAG, a key-value-separated evidence-organization layer. It assembles answer-path hyperedges from cached passage-level LLM evidence tuples and uses them as retrieval keys, while retaining passage text as answer values. To isolate key-space design, our fixed-substrate protocol holds the tuple cache, candidate passages, reader, and evaluation budget constant across pairwise graph and hypergraph variants. Weighted hypergraph key-value retrieval (WHG-KV) improves over KG-PPR by +3.426 F1 on 2WikiMultiHopQA and +3.592 F1 on MuSiQue; HotpotQA shows that higher structured support coverage need not yield standalone answer-F1 gains. We therefore study WHG-KV as an evidence-control signal rather than a dense-retrieval replacement. Oracle and train-to-dev analyses identify support selection as repairable, and a dense-aware controller combines frozen ColBERTv2 and HKVM rank/score features using out-of-fold HKVM predictions. It reaches 88.846, 65.073, and 85.810 F1 on the three benchmarks, improving over ColBERTv2 by +11.084, +6.763, and +5.966 F1. Source-level ablations show that matched non-WHG structured signals do not match the WHG-KV gains. These results provide bounded evidence that key-value-separated hypergraph organization can serve as a reusable evidence-control mechanism for multi-hop RAG.
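Key-value separation means the structure used for matching (here, a hyperedge grouping the terms of an answer path) is distinct from what is returned to the reader (the original passage text). A minimal sketch, assuming hypothetical term-set keys and a simple weighted-overlap score in place of the paper's retrieval model:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    key_terms: frozenset   # terms from cached evidence tuples (the retrieval key)
    passage_ids: tuple     # passages whose text is returned (the value)
    weight: float = 1.0


def retrieve(query_terms, hyperedges, passages, budget=2):
    """Score hyperedges against the query, then return passage text as values."""
    scores = {}
    for edge in hyperedges:
        overlap = len(edge.key_terms & query_terms)
        if overlap == 0:
            continue
        score = edge.weight * overlap
        for pid in edge.passage_ids:
            scores[pid] = max(scores.get(pid, 0.0), score)
    ranked = sorted(scores, key=lambda pid: (-scores[pid], pid))
    return [passages[pid] for pid in ranked[:budget]]


passages = {
    "p1": "Alice was born in Lyon.",
    "p2": "Lyon is in France.",
    "p3": "Bob plays chess.",
}
edges = [
    # One hyperedge spans both hops of the answer path Alice -> Lyon -> France.
    Hyperedge(frozenset({"alice", "born", "lyon", "france"}), ("p1", "p2")),
    Hyperedge(frozenset({"bob", "chess"}), ("p3",)),
]
print(retrieve({"alice", "france"}, edges, passages))
# ['Alice was born in Lyon.', 'Lyon is in France.']
```

The second-hop passage never mentions "alice", so a retriever that scores passages independently against the query could miss it; the shared hyperedge key returns both.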

13.
medRxiv (Medicine) 2026-06-11

Computer Vision Scoring of Figure Copy and Recall

Objective. Figure copy and recall tests are sensitive measures of visuoconstruction and visual episodic memory, but their clinical use is constrained by labor-intensive manual scoring. We developed and validated an automated, element-level scoring pipeline using Vertex AI object detection for the tablet-based figure copy and recall tasks in the California Cognitive Assessment Battery (CCAB). The automated scoring pipeline duplicated the scoring procedures used by expert manual raters. Methods. A normative sample of 2,011 community-dwelling adults aged 18-90 completed figure copy and delayed recall trials at baseline, with subsamples retested at 1 day and at 6, 18, and 30 months. Participants completed the drawings with their index finger on a tablet computer, with finger position digitized to analyze the speed and timing of individual drawing strokes. A convolutional object-detection model trained on the Vertex AI AutoML Vision platform identified each of twelve canonical figure elements in rendered drawings. Separate element presence and location scores were computed after homographically warping drawings onto a canonical template to produce trial-level Element, Location, and Total scores. To evaluate inter-rater agreement, Vertex AI and expert human raters independently scored 1,500 randomly selected drawings, including a common subset of 100 drawings scored by Vertex AI and all human raters. Results. Vertex-human agreement on Total scores (r = 0.966) was virtually indistinguishable from human-human agreement (mean r = 0.971), as was agreement on Element presence scores (mean r = 0.959 vs. r = 0.963). Location-score agreement (r = 0.951) was slightly below the human-human mean (r = 0.972), owing to pixel-level analysis by Vertex AI that human raters could not perform. The Vertex pipeline showed no preferential agreement with the single expert rater who categorized Elements for training.
Automated scores showed strong demographic gradients: age effects on Recall (r = -0.32) were approximately twice those in the Copy condition (r = -0.16). A Memory Cost score (Recall - Copy) showed a monotonic age-related decline from +0.40 z in the youngest participants to -0.54 z in the oldest. Kinetic analysis revealed significant age-related changes in drawing speed and efficiency. Overnight test-retest reliability was high (Recall r = 0.72), and the Recall trial showed a large overnight learning effect (Δ = +1.18) that continued with repeated testing up to 30 months (Δ = +0.75).
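The location-scoring step warps a drawing onto the canonical template with a homography (a 3x3 projective transform) before comparing element positions. A minimal sketch of that comparison, assuming the homography `H` has already been estimated and using a hypothetical distance tolerance `tol` in template units; the CCAB pipeline's actual scoring rule is not specified here.

```python
import math


def apply_homography(H, x, y):
    """Map point (x, y) with a 3x3 homography using homogeneous coordinates."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def location_correct(detected_center, template_center, H, tol):
    """Hypothetical rule: element is well placed if its warped centre is within tol."""
    x, y = apply_homography(H, *detected_center)
    return math.hypot(x - template_center[0], y - template_center[1]) <= tol


# Scale-and-shift warp (a special case of a homography).
H = [[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]]
print(apply_homography(H, 1.0, 1.0))                      # (3.0, 3.0)
print(location_correct((1.0, 1.0), (3.0, 3.5), H, 0.6))   # True
```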

14.
bioRxiv (Bioinfo) 2026-06-11

SPARK: A Systems-level Computational Framework for Reconstructing Transcriptomic State Organisation in Lung Adenocarcinoma

Lung adenocarcinoma (LUAD) exhibits substantial molecular heterogeneity, which complicates tumour stratification and limits the ability of mutation-centric models to capture tumour behaviour and predict patient outcomes. This study investigates whether coordinated transcriptomic programs can provide a systems-level representation of tumour states. Bulk RNA-sequencing data from the TCGA-LUAD cohort were analysed to reconstruct pathway-level transcriptomic organisation using a stability-optimised network framework (SPARK). This analysis identified eight transcriptomic modules representing coordinated biological processes active across tumours. Module activity scores were subsequently used to derive a composite Transcriptomic Risk Score through elastic-net Cox proportional hazards modelling. The resulting risk score showed a significant association with overall survival in the discovery cohort and improved prognostic discrimination beyond clinical variables. An independent evaluation in the CPTAC-LUAD cohort confirmed the prognostic signal and preserved risk stratification across patient groups. Unsupervised clustering of module activity further revealed three transcriptomic patient groups characterised by distinct biological programs, genomic alteration patterns, and survival outcomes. Single-cell analysis also demonstrated that the identified transcriptomic modules reflect coordinated organisation of the tumour-immune-stromal ecosystem across cellular compartments. Together, these findings suggest that LUAD heterogeneity can be organised into coordinated transcriptomic programs with measurable clinical relevance, providing a systems-level framework for representing tumour molecular states.
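Turning module activities into a composite risk score follows the usual Cox linear-predictor pattern: a weighted sum of module scores, with weights taken from the fitted model. The sketch below assumes that module activity is the mean z-score of a module's genes and that the elastic-net Cox coefficients are already fitted (the value shown is made up); the median split is one common way to form risk groups, not necessarily the paper's.

```python
import statistics


def zscore(values):
    mu, sd = statistics.mean(values), statistics.pstdev(values)
    return [(v - mu) / sd if sd else 0.0 for v in values]


def module_activity(expr, genes):
    """Mean z-scored expression of a module's genes, per sample (assumed definition)."""
    z = [zscore(expr[g]) for g in genes]
    return [sum(col) / len(genes) for col in zip(*z)]


def risk_scores(activities, coefs):
    """Cox linear predictor: sum over modules of beta_m * activity_m."""
    n = len(next(iter(activities.values())))
    return [sum(coefs[m] * activities[m][i] for m in coefs) for i in range(n)]


def stratify(scores):
    cut = statistics.median(scores)
    return ["high" if s > cut else "low" for s in scores]


expr = {"g1": [1.0, 2.0, 3.0], "g2": [2.0, 4.0, 6.0]}      # genes x samples
activities = {"M1": module_activity(expr, ["g1", "g2"])}
risk = risk_scores(activities, {"M1": 0.5})                # hypothetical beta
print(stratify(risk))                                      # ['low', 'low', 'high']
```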

15.
arXiv (quant-ph) 2026-06-11

Permutation-Invariant N-body gates via Tavis-Cummings Hamiltonian

arXiv:2506.03453v3

Global control provides a promising route to implementing multi-qubit gates without individual qubit addressing. This is especially appealing for permutation-invariant (PI) gates, whose symmetry is often broken when they are compiled into individually addressed one- and two-qubit gates. Important examples include SWAP, $\sqrt{iSWAP}$, and the $n$-qubit controlled-Z gate, which is equivalent, up to two single-qubit Hadamard gates, to the multi-qubit Toffoli gate. Motivated by this global-control perspective, we show that all PI unitaries on an arbitrary number of qubits can be realized using the Tavis-Cummings (TC) interaction, the multi-qubit version of the Jaynes-Cummings interaction, together with global uniform z and x fields. Here, the $n$ qubits are identically coupled to a single bosonic mode (oscillator), which is initialized in and returned to its vacuum state. A corollary is that all PI states, including GHZ and Dicke states, can be prepared using the same global control. For $n=2$ qubits, a case that is particularly important in quantum computing, we also find explicit pulse sequences for implementing all PI qubit unitaries that conserve angular momentum in the z direction, using only the TC interaction and global z fields. This includes controlled-Z, SWAP, and $\sqrt{iSWAP}$.
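For reference, the standard resonant Tavis-Cummings interaction for $n$ identical qubits coupled with strength $g$ to one bosonic mode (annihilation operator $a$), together with the global fields mentioned above, reads as follows. The symbols $g$, $\Omega_z$, $\Omega_x$ are our notation, and the paper's exact conventions may differ:

```latex
H_{\mathrm{TC}} = g \sum_{j=1}^{n} \left( a\,\sigma_j^{+} + a^{\dagger}\sigma_j^{-} \right),
\qquad
H_{z} = \Omega_z \sum_{j=1}^{n} \sigma_j^{z},
\qquad
H_{x} = \Omega_x \sum_{j=1}^{n} \sigma_j^{x}.
```

Each Hamiltonian is a symmetric sum over qubits, so it commutes with every qubit permutation; sequences built from them can therefore only generate permutation-invariant unitaries. $H_{\mathrm{TC}}$ conserves the total excitation number $a^{\dagger}a + J_z$ with $J_z = \tfrac12\sum_j \sigma_j^z$, and $H_z$ commutes with $J_z$, so when the mode starts and ends in vacuum the resulting qubit unitary conserves $J_z$, consistent with the $n=2$ construction that uses only these two ingredients.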

16.
arXiv (CS.CV) 2026-06-18

Quantification of Uncertainty with Adversarial Models in Medical Image Segmentation

Reliable pixel-level uncertainty quantification holds the potential to transform clinical workflows by enabling high-fidelity longitudinal monitoring and distinguishing true pathological changes from artifacts. Ideally, these models provide the stability required for critical treatment planning and surgical intervention. However, standard deep learning models often suffer from miscalibration, yielding overconfident predictions that mask underlying vulnerabilities at subtle pathological boundaries. To address this, we propose QUAM-SM, a post-hoc framework using targeted adversarial search to identify "adversarially fragile" pixels. By actively seeking perturbations that expose predictive instability, our method highlights regions where decisions are most vulnerable to being flipped. Importantly, the framework disentangles epistemic uncertainty from aleatoric uncertainty. Experiments on two public datasets with multiple expert annotations demonstrate that QUAM-SM outperforms both standard and recent uncertainty estimation approaches in terms of reliability and boundary sensitivity. Code is available at https://github.com/HanaJebril/quam_sm
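The notion of an "adversarially fragile" pixel can be illustrated with a toy per-pixel linear classifier, where the worst-case perturbation of the model parameters within a box of radius `eps` has a closed form. This is a hypothetical illustration of targeted flip search, not QUAM-SM, which operates on a segmentation network; the scalar model `(w, b)` and `eps` are assumptions.

```python
def targeted_flip(x, w, b, eps):
    """True if some |dw|, |db| <= eps flips the prediction for pixel feature x."""
    logit = w * x + b
    toward_other_class = -1.0 if logit > 0 else 1.0
    # Worst case for a linear model: push both parameters toward the boundary,
    # which moves the logit by eps * (|x| + 1) in the targeted direction.
    worst = logit + toward_other_class * eps * (abs(x) + 1.0)
    return (worst > 0) != (logit > 0)


def fragility_map(pixels, w, b, eps):
    """Per-pixel flags; fragile pixels sit close to the decision boundary."""
    return [targeted_flip(x, w, b, eps) for x in pixels]


print(fragility_map([2.0, 0.05, -3.0], w=1.0, b=0.0, eps=0.1))  # [False, True, False]
```

Only the pixel near the boundary is flagged, even though a softmax over its logit alone would still report a definite class, which is the kind of hidden instability the abstract targets.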

17.
arXiv (CS.CV) 2026-06-17

EmbodiTTA: Resource-Efficient Test-Time Adaptation for Embodied Visual Systems

Continual test-time adaptation (CTTA) continuously adapts a deployed model on every incoming batch of data. Although this achieves strong accuracy, existing CTTA approaches are poorly suited to resource-constrained edge devices because of their substantial memory overhead and energy consumption. In this work, we first introduce a novel paradigm, on-demand TTA, which triggers adaptation only when a significant domain shift is detected. We then present OD-TTA, an on-demand TTA framework for accurate and efficient adaptation on edge devices. OD-TTA comprises three techniques: 1) a lightweight domain shift detection mechanism that activates TTA only when needed, drastically reducing overall computation overhead; 2) a source domain selection module that chooses an appropriate source model for adaptation, ensuring high and robust accuracy; and 3) a decoupled Batch Normalization (BN) update scheme that enables memory-efficient adaptation with small batch sizes. Extensive experiments show that OD-TTA achieves comparable or better performance while substantially reducing energy and computation overhead, making TTA practical on edge devices.
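An on-demand trigger needs a cheap shift signal. One common proxy, assumed here and not necessarily OD-TTA's detector, is the mean prediction entropy of each batch tracked with an exponential moving average; adaptation fires only when it rises well above a source-domain baseline. `ratio` and `momentum` are hypothetical settings.

```python
import math


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


class ShiftDetector:
    """EMA of batch prediction entropy compared against a source baseline."""

    def __init__(self, source_entropy, ratio=1.5, momentum=0.9):
        self.source_entropy = source_entropy
        self.ratio = ratio
        self.momentum = momentum
        self.ema = source_entropy

    def update(self, batch_probs):
        """Return True when adaptation should be triggered for this batch."""
        batch_h = sum(entropy(p) for p in batch_probs) / len(batch_probs)
        self.ema = self.momentum * self.ema + (1 - self.momentum) * batch_h
        return self.ema > self.ratio * self.source_entropy


detector = ShiftDetector(source_entropy=0.2, momentum=0.5)
print(detector.update([[0.99, 0.01]]))  # confident batch: False
print(detector.update([[0.5, 0.5]]))    # uncertain batch: True
```

On batches where `update` returns False, the device runs plain inference and skips backpropagation entirely, which is where the computation and energy savings of an on-demand scheme come from.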

18.
arXiv (math.PR) 2026-06-12

Fourier Dimensions of Mandelbrot Cascades under Minimal Integrability

arXiv:2606.08703v2

This note announces exact Fourier dimension formulas for canonical Mandelbrot cascade measures under the minimal Kahane–Peyrière integrability condition and records the canonical $b$-adic extension on cubes. In the dyadic interval setting, the theorem is proved in a balanced vector-weight model allowing dependence between sibling weights. Almost surely on non-extinction, the Fourier, energy, and $L^2$ dimensions all equal the energy exponent. The scalar specialization gives the canonical Mandelbrot–Kahane Fourier dimension formula under the minimal integrability condition. On the circle, the endpoint formula is given by the endpoint lower local dimension exponent. For the $b$-adic Mandelbrot cascade on cubes, the Fourier dimension is the minimum of 2 and the energy exponent, with the universal Fourier barrier at dimension two providing the high-dimensional obstruction.

19.
medRxiv (Medicine) 2026-06-12

Reduced nighttime smartphone use among cohabiting partners: a longitudinal study under the lens of social control of health behaviors theory

Objective: We examined the link between cohabitation with a partner and nighttime smartphone use through the lens of social control of health behavior theory. Background: Nighttime smartphone use is a behavioral risk factor for sleep problems. While previous research has predominantly focused on individual-level risks of sleep disturbances, the role of social context remains underexplored. Theoretical frameworks, specifically social control of health behavior theory, suggest that social relationships regulate health-related behaviors; however, it is unclear how far this regulation extends to modern digital behaviors among couples. Method: We analyzed survey data from three waves of the SmartSleep Study (2018, 2020, and 2023; total N = 25,028), including a longitudinal follow-up subset (N = 1,003). We tested multivariate associations between living with a partner, changes in cohabitation status, and frequent nighttime smartphone use by fitting generalized linear mixed-effects models. Additionally, we mapped the interplay between indicators of social integration, social support, smartphone use, and sleep quality using hierarchical clustering of non-linear correlations. Results: Cohabiting participants had lower odds of frequent nighttime smartphone use than those living alone (OR = 0.66; 95% CI: 0.61, 0.72). These lower odds were driven primarily by cohabitation with a partner (OR = 0.49; 95% CI: 0.36, 0.66). Longitudinal analysis supported these findings, showing that sustained cohabitation was associated with less frequent nighttime use (OR = 0.56; 95% CI: 0.38, 0.82). Clustering analysis revealed that indicators of social integration and support clustered with favorable sleep quality. Conclusion: Our findings suggest that the health-protective effects of cohabitation with a partner extend to digital behaviors.
Consistent with social control of health behavior theory, the presence of a partner appears to reduce frequent nighttime smartphone use, highlighting the critical importance of considering social context when addressing digital health hygiene and promoting sleep.
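The reported odds ratios translate into changes in odds as (OR - 1) × 100%, and because odds ratios are estimated on the log scale, their standard error can be recovered approximately from a 95% CI as (ln U - ln L) / (2 × 1.96). A small helper applying these standard conversions to the figures above (it does not re-analyse the data, and assumes the reported CIs are symmetric on the log scale):

```python
import math


def percent_change_in_odds(odds_ratio):
    return (odds_ratio - 1.0) * 100.0


def log_or_standard_error(lower, upper, z=1.96):
    """Approximate SE of log(OR) from a CI that is symmetric on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


for label, odds_ratio, lo, hi in [
    ("any cohabitation", 0.66, 0.61, 0.72),
    ("partner", 0.49, 0.36, 0.66),
    ("sustained cohabitation", 0.56, 0.38, 0.82),
]:
    print(f"{label}: {percent_change_in_odds(odds_ratio):+.0f}% odds, "
          f"SE(log OR) ~ {log_or_standard_error(lo, hi):.3f}")
```

For example, OR = 0.49 for cohabiting with a partner corresponds to 51% lower odds of frequent nighttime use; the wider intervals for the partner and longitudinal estimates show up directly as larger standard errors.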

20.
arXiv (CS.CL) 2026-06-16

SING: Synthetic Intention Graph for Scalable Active Tool Discovery in LLM Agents

Large language model (LLM) agents increasingly rely on agent harnesses that manage context, tools, and multi-turn execution, making tools a central interface for acting in realistic digital environments. As harness-connected tool ecosystems expand to hundreds or thousands of APIs, services, and task-specific skills, exhaustive tool schema injection becomes costly and imposes a closed-world assumption that limits agents to a predefined static inventory. Retrieval-augmented tool selection offers a natural alternative, but existing one-shot retrieval methods often fail to align isolated tool descriptions with the agent's true task intention, especially in long-horizon tasks where required capabilities emerge through decomposition, observations, and newly induced subgoals. We propose SING, an intention-aware active tool discovery framework that builds an intention-tool graph linking user intentions, tool capabilities, and tool collaboration patterns, and dynamically retrieves tools according to evolving task states. Using a unified corpus of 7,471 tools, we evaluate SING on three real-world tool-use benchmarks. SING improves Global Recall@5 by up to 59.8% and downstream success rate by up to 28.9% over baselines, while reducing full-corpus tool-schema exposure by 99.8%, demonstrating that intention-aware graph structure enables more accurate and context-efficient tool discovery in large-scale agentic ecosystems.
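An intention-tool graph can be sketched as two weighted edge sets, intention-to-tool and tool-to-tool collaboration, with retrieval re-run whenever the agent's current intentions (subgoals) change, so only the top-k tool schemas are exposed instead of the full corpus. The class, weights, and one-hop expansion rule with `collab_decay` below are hypothetical; SING's graph construction and scoring are not reproduced.

```python
from collections import defaultdict


class IntentionToolGraph:
    def __init__(self):
        self.intent_to_tools = defaultdict(dict)  # intention -> {tool: weight}
        self.collab = defaultdict(dict)           # tool -> {tool: weight}

    def link_intent(self, intent, tool, weight=1.0):
        self.intent_to_tools[intent][tool] = weight

    def link_collab(self, a, b, weight=1.0):
        self.collab[a][b] = weight
        self.collab[b][a] = weight

    def retrieve(self, intents, k=5, collab_decay=0.5):
        """Score tools for the current intentions, then expand one collaboration hop."""
        scores = defaultdict(float)
        for intent in intents:
            for tool, w in self.intent_to_tools.get(intent, {}).items():
                scores[tool] += w
        for tool, s in list(scores.items()):  # snapshot before expanding
            for other, w in self.collab.get(tool, {}).items():
                scores[other] += collab_decay * s * w
        return sorted(scores, key=lambda t: (-scores[t], t))[:k]


g = IntentionToolGraph()
g.link_intent("book flight", "search_flights")
g.link_intent("book flight", "pay", 0.5)
g.link_collab("search_flights", "seat_map")
print(g.retrieve(["book flight"], k=3))  # ['search_flights', 'pay', 'seat_map']
# A new subgoal emerges mid-task; retrieval is simply re-run with it.
g.link_intent("check visa", "visa_rules")
print(g.retrieve(["book flight", "check visa"], k=3))  # ['search_flights', 'visa_rules', 'pay']
```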

21.
bioRxiv (Bioinfo) 2026-06-11

Tumour evolution as ground truth for cancer whole-genome sequencing

Cancer genomes are shaped by evolutionary processes that couple mutagenesis, clonal selection, chromosomal instability, spatial growth and treatment response into structured genomic patterns, yet current benchmarking strategies largely ignore this evolutionary dependency. Here, we present SCOUT, a large-scale synthetic whole-genome sequencing resource of over 200 samples, designed for systematic benchmarking of tumour genomic analysis and evolutionary inference under controlled evolutionary ground truth. Unlike conventional task-specific simulations, SCOUT models tumour evolution as a latent generative process that simultaneously shapes mutations, copy-number alterations, variant allele frequencies, mutational signatures and clonal architectures. SCOUT recapitulates key features of solid and haematological malignancies, including driver mutations, chromosomal instability, intratumour heterogeneity, spatial sampling and treatment-associated evolutionary dynamics in tumour and matched-normal longitudinal and multi-region sequencing designs. Using SCOUT, we benchmarked widely used methods for somatic variant detection, copy-number analysis, mutational signature inference and tumour evolutionary reconstruction. Across analytical tasks, performance deteriorated in low-purity, highly subclonal and structurally complex tumours, while spatial sampling bias and hypermutation generated spurious evolutionary signals that confounded tumour interpretation across multiple inference layers. Evolutionary simulations further distinguished lineage-restricted genetic bottlenecks from multi-lineage resistance dynamics associated with tumour plasticity. Tumour purity consistently exerted a stronger effect on inference accuracy than sequencing depth. Together, our results establish evolutionary ground truth as a prerequisite for reproducible benchmarking and biologically interpretable analysis of cancer whole-genome sequencing data.
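Why purity can matter more than depth is visible in the standard expected variant-allele-fraction relation for a somatic SNV, VAF = ρ · m · CCF / (ρ · C_t + 2(1 - ρ)), with purity ρ, mutation multiplicity m, cancer cell fraction CCF, tumour copy number C_t and a diploid normal. The sketch combines it with a binomial read model and an assumed minimum of `min_alt` supporting reads; it is an illustration, not SCOUT's simulator.

```python
from math import comb


def expected_vaf(purity, ccf=1.0, multiplicity=1, tumour_cn=2):
    """Expected variant allele fraction for a somatic SNV (diploid normal)."""
    return purity * multiplicity * ccf / (purity * tumour_cn + 2 * (1 - purity))


def detection_probability(depth, vaf, min_alt=3):
    """P(at least min_alt variant reads) under a binomial read-sampling model."""
    miss = sum(comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k) for k in range(min_alt))
    return 1.0 - miss


# Subclonal mutation (CCF 0.3): doubling depth vs. raising purity.
for purity, depth in [(0.2, 60), (0.2, 120), (0.6, 60)]:
    vaf = expected_vaf(purity, ccf=0.3)
    print(purity, depth, round(vaf, 3), round(detection_probability(depth, vaf), 3))
```

Raising purity from 0.2 to 0.6 triples the expected VAF of this subclonal variant (0.03 to 0.09), whereas doubling depth leaves the VAF unchanged and only adds reads.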

22.
arXiv (CS.CL) 2026-06-16

Long-Context Modeling via GSS-Transformer Hybrid Architecture with Learnable Mixing

Modeling long-range dependencies remains a central challenge in natural language processing. Transformer architectures achieve strong performance via self-attention but scale quadratically ($O(N^2)$) with sequence length, while State Space Models (SSMs) scale linearly ($O(N)$) but suffer from a selective recall bottleneck, struggling to retrieve precise information from compressed states. This creates a fundamental tradeoff between efficiency and perplexity. To address it, we propose the Parallel Hybrid Architecture (PHA), which runs Gated State Spaces (GSS), Grouped Query Attention (GQA), and Feed-Forward Networks (FFNs) as independent parallel branches fused by a learnable mixing mechanism. Instead of forcing SSMs to approximate attention or serializing the two paradigms, PHA allows each branch to specialize: GSS captures global context, attention performs selective retrieval, and the FFN provides complementary processing. On WikiText-103, PHA achieves 16.51 PPL at 125M parameters, outperforming Hedgehog (16.70) and H3-125M (23.70). Scaling to 180M parameters yields 16.42 PPL, comparable to the pure-attention baseline, while delivering 24% higher throughput and up to 40% lower memory usage at long contexts. On OpenWebText, our 125M model achieves 19.72 PPL, outperforming standard Transformers (20.60) and GSS hybrid baselines (19.80). These results demonstrate that separating sequence modeling paradigms into parallel specialists enables Transformer-level perplexity with substantially improved efficiency for long-context language modeling.
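The parallel-branch idea can be sketched as follows. The abstract does not say how the learnable mixing is parameterized. This sketch assumes one scalar logit per branch, normalized with a softmax. The toy lambdas only stand in for the GSS, GQA and FFN branches, and `ParallelMix` is a hypothetical name.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Branch = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ParallelMix:
    """Apply every branch to the same input and fuse the outputs with
    learnable, softmax-normalized weights (an assumed parameterization)."""

    def __init__(self, branches: Sequence[Branch]):
        self.branches = list(branches)
        # One learnable logit per branch; all zeros gives uniform mixing at init.
        self.logits = [0.0] * len(self.branches)

    def weights(self) -> List[float]:
        return softmax(self.logits)

    def __call__(self, x: Vector) -> Vector:
        # Each branch sees the same input independently (parallel, not stacked).
        outputs = [branch(x) for branch in self.branches]
        w = self.weights()
        return [sum(wi * out[d] for wi, out in zip(w, outputs))
                for d in range(len(x))]


# Toy stand-ins for the GSS, attention and FFN branches (hypothetical).
mix = ParallelMix([
    lambda x: [2 * v for v in x],   # "GSS"
    lambda x: [v + 1 for v in x],   # "attention"
    lambda x: [0.0 for _ in x],     # "FFN"
])
```

With zero logits each branch receives weight 1/3. During training, gradients on `logits` would let the model shift weight toward the most useful branch. A per-channel or input-dependent gate would be an equally plausible reading of "learnable mixing".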

23.
arXiv (CS.CL) 2026-06-12

HyperTool: Beyond Step-Wise Tool Calls for Tool-Augmented Agents

Tool-augmented LLM agents commonly rely on step-wise atomic tool calls, where each invocation, observation, and value transfer is exposed in the main reasoning trace. This creates an execution-granularity mismatch: locally deterministic tool workflows are unfolded into repeated model-visible decisions, consuming context and forcing the model to manage low-level dataflow in the trace. We introduce HyperTool, a unified executable MCP-style tool interface that changes the model-visible unit of tool execution. A model invokes HyperTool with a code block that can call existing tools through their original schemas, manipulate returned values, and pass intermediate results locally, folding deterministic tool subroutines into a single outer call. To train models to use this interface, we synthesize HyperTool-format trajectories from cross-tool compositional tasks and verify them in real MCP environments. On MCP-Universe, HyperTool improves average accuracy from 15.69% to 35.29% on Qwen3-32B and from 9.93% to 33.33% on Qwen3-8B, and the trained models surpass GPT-OSS and Kimi-k2.5 in average accuracy, showing that HyperTool can substantially improve multi-step tool use.
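The following is a minimal sketch of the execution-granularity contrast. It uses a hypothetical two-tool registry (`search_flights`, `get_seat_map`) in place of real MCP tools. For simplicity, the model-written code block is a Python function rather than a code string run in a sandbox. The sketch shows only that intermediate values stay local and a single outer call enters the model-visible trace. It is not HyperTool's actual interface.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical tools standing in for MCP tools with their original schemas.
TOOLS: Dict[str, Callable[..., Any]] = {
    "search_flights": lambda origin, dest: [
        {"id": "F1", "price": 320},
        {"id": "F2", "price": 210},
    ],
    "get_seat_map": lambda flight_id: {"flight_id": flight_id, "free_seats": 12},
}


class Trace:
    """Records every event that enters the model-visible reasoning trace."""

    def __init__(self) -> None:
        self.visible: List[Tuple[str, Any]] = []


def step_wise(trace: Trace, origin: str, dest: str) -> Any:
    """Atomic calls: every invocation and its observation is exposed."""
    flights = TOOLS["search_flights"](origin, dest)
    trace.visible.append(("search_flights", flights))
    # In a real agent the model would read this observation and pick a flight.
    cheapest = min(flights, key=lambda f: f["price"])
    seats = TOOLS["get_seat_map"](cheapest["id"])
    trace.visible.append(("get_seat_map", seats))
    return seats


def hypertool_call(trace: Trace,
                   block: Callable[[Dict[str, Callable[..., Any]]], Any]) -> Any:
    """Run a model-written block locally; only its final value is exposed."""
    result = block(TOOLS)
    trace.visible.append(("hypertool", result))
    return result


def model_block(tools: Dict[str, Callable[..., Any]]) -> Any:
    # Intermediate values (the flight list, the chosen id) never leave this block.
    flights = tools["search_flights"]("SFO", "NRT")
    cheapest = min(flights, key=lambda f: f["price"])
    return tools["get_seat_map"](cheapest["id"])
```

Both paths return the same seat map. The step-wise path places two observations in the trace, including the full flight list, while the folded path places one.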

24.
arXiv (CS.CL) 2026-06-11

DEFINED: A Data-Efficient Computational Framework for Fine-Grained Creativity Assessment in Debate Scenarios

Human creativity has emerged as a critical competency in the era of large language models. Assessing creativity in complex, open-ended environments is a grand challenge in data mining, currently hindered by a reliance on simple standardized tasks and the scarcity of fine-grained expert data. As an ecologically valid assessment context, debate reflects multiple dimensions of creativity, encompassing both divergent and convergent thinking. Moreover, debate is a data-rich domain with a large volume of publicly accessible material. Current mainstream automated scoring methods are poorly suited to complex settings such as debate, which therefore still relies on costly human evaluation. To address this, this paper proposes DEFINED, a data-efficient computational framework for fine-grained creativity assessment in debate scenarios. DEFINED operationalizes debate creativity through a hierarchical eight-dimensional metric system, implemented via a pre-trained autoregressive language model with a hierarchical scoring head that supports both fine-grained and coarse-grained evaluation. Statements and their associated expert scores were obtained from authentic debate competitions, and a constrained data augmentation strategy was employed to address the elite bias inherent in the original data. DEFINED adopts a mixed-granularity training strategy that enables robust learning from limited fine-grained supervision annotated by trained graduate experts. To validate ecological validity beyond synthetic benchmarks, we also conduct an empirical study with debate-naive participants, using these authentic data as a qualitative case study for mid-to-low proficiency populations. Across our evaluation protocol, the scoring model produces accurate and stable scores, outperforming prompt-based large language model evaluators and existing debate scoring methods.
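One reading of the mixed-granularity strategy is a loss that always supervises the coarse score and adds fine-grained terms only when per-dimension expert labels exist. This sketch assumes squared-error losses and a coarse score equal to the mean of the eight fine-grained scores. The abstract states neither choice, and the function names are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence

NUM_DIMS = 8  # eight fine-grained creativity dimensions, per the abstract


def coarse_from_fine(fine: Sequence[float]) -> float:
    """Hypothetical aggregation: coarse score = mean of the fine scores."""
    return mean(fine)


def mixed_granularity_loss(pred_fine: Sequence[float],
                           target_coarse: float,
                           target_fine: Optional[Sequence[float]] = None,
                           fine_weight: float = 1.0) -> float:
    """Squared-error loss: the coarse term is always active; the fine term
    is added only for samples that carry per-dimension expert labels."""
    if len(pred_fine) != NUM_DIMS:
        raise ValueError("expected 8 fine-grained predictions")
    loss = (coarse_from_fine(pred_fine) - target_coarse) ** 2
    if target_fine is not None:
        if len(target_fine) != NUM_DIMS:
            raise ValueError("expected 8 fine-grained targets")
        loss += fine_weight * mean((p - t) ** 2
                                   for p, t in zip(pred_fine, target_fine))
    return loss
```

Under this design, coarsely labelled samples still train the shared head through the aggregate. The scarce fine-grained labels then shape how that aggregate is distributed across dimensions.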

25.
arXiv (CS.CV) 2026-06-15

HumP-KD: A Hybrid Uncertainty-Aware Multi-Stage Progressive Knowledge Distillation Framework for Efficient Fire Classification

Real-time fire classification systems require models that are simultaneously accurate, computationally efficient, and deployable on resource-constrained hardware. This work proposes HumP-KD, a Hybrid Uncertainty-aware Multi-stage Progressive Knowledge Distillation framework for efficient fire classification. Two datasets are used: FlameVision (8,600 images) and Dataset-II (31,309 images). Various CNN and transformer baselines are evaluated under standard preprocessing, online augmentation, and Gaussian-noise and motion-blur robustness conditions. The proposed HumP-KD model distills knowledge from two frozen heterogeneous transformer teachers, Swin-Tiny and ViT-Base, along with their Meta-MLP ensemble, into a lightweight MobileViT-S student via three tightly integrated components. Hierarchical Progressive Knowledge Distillation employs a Hierarchical Feature Builder that generates a fused spatial attention mask to selectively guide distillation toward discriminative regions. Multi-Stage Knowledge Distillation progressively activates three distillation stages over the course of training. On Dataset-II, HumP-KD achieves a mean F1 score of $0.9876 \pm 0.0063$ across 10 independent trials, significantly outperforming the MobileViT-S baseline trained without distillation ($0.9537 \pm 0.0351$), with statistical significance confirmed by both an independent t-test ($p = 0.0195$) and a Wilcoxon signed-rank test ($W = 1$, $p = 0.0039$). The proposed method also demonstrates strong generalization across datasets and robustness under degraded visual conditions. The student model has only 4.94M parameters and a 19.01 MB model size, a $5.7\times$ parameter reduction relative to Swin-Tiny and a $17.5\times$ reduction relative to ViT-Base, while running at 37.72 FPS on CPU, making it suitable for real-time deployment.
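For orientation, the sketch below shows a plain multi-teacher version of standard temperature-scaled knowledge distillation. The teachers' softened distributions are averaged. The student is then trained on a weighted sum of two terms: the KL divergence to that average and the cross-entropy on the hard label. This is a generic baseline, not HumP-KD itself. The simple teacher average stands in for the paper's Meta-MLP ensemble, and the temperature and weight values are illustrative assumptions.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled, numerically stable softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float],
                  eps: float = 1e-12) -> float:
    """KL(p || q) for discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def multi_teacher_kd_loss(student_logits: Sequence[float],
                          teacher_logits_list: Sequence[Sequence[float]],
                          label: int,
                          temperature: float = 4.0,
                          alpha: float = 0.5) -> float:
    """alpha * T^2 * KL(teacher_avg || student) + (1 - alpha) * CE(label)."""
    n_classes = len(student_logits)
    soft_teachers = [softmax(t, temperature) for t in teacher_logits_list]
    # Simple average of teacher distributions (assumed stand-in for Meta-MLP).
    ensemble = [sum(st[c] for st in soft_teachers) / len(soft_teachers)
                for c in range(n_classes)]
    student_soft = softmax(student_logits, temperature)
    # T^2 keeps the soft-target gradient scale comparable across temperatures.
    kd = kl_divergence(ensemble, student_soft) * temperature ** 2
    ce = -math.log(softmax(student_logits)[label] + 1e-12)
    return alpha * kd + (1.0 - alpha) * ce
```

When the student already matches the teacher average, the distillation term vanishes and only the hard-label cross-entropy remains.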