Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (quant-ph) 2026-06-12

Generalized two-qubit Hamiltonian for Projective Quantum Feature Maps

arXiv:2606.13641v1 Announce Type: new Abstract: Projected quantum feature maps provide a strategy for using quantum processors as feature generators for classical machine-learning models. Building on counterdiabatic Ising-glass and one-dimensional Heisenberg PQFMs, we introduce a generalized two-qubit Hamiltonian-based PQFM that provides a unified way to encode classical features through local Pauli fields and pairwise two-qubit Pauli interactions. This construction allows distinct classical variables to be embedded along different Pauli axes of the same qubit, increasing the information density of shallow circuits while remaining compatible with hardware constraints. We develop and implement these methods in pqfmlib, a publicly available Python library for constructing, executing, and benchmarking Hamiltonian-based PQFMs.We then benchmark the generalized Hamiltonian PQFMs against reference PQFMs on four biomedical classification datasets under a nested cross-validation protocol with paired statistical tests. Quantum features are generated using both IBM quantum processors with up to 156 qubits and statevector simulations. Our results show that the generalized two-qubit Hamiltonian family provides the most consistent pattern of statistically supported gains over matched classical baselines, although the performance of all methods depends on the dataset, encoding strategy, measured observables, and hardware conditions. These findings support generalized Hamiltonian PQFMs as a promising route toward near-term quantum utility.

02.
Nature (Science) 2026-06-10

Mitochondria tethered to the nucleus secure its energy supply

Direct interactions between the cell’s powerhouses and nuclear pores might channel energy straight into the nucleus, fuelling cell division and differentiation. Direct interactions between the cell’s powerhouses and nuclear pores might channel energy straight into the nucleus, fuelling cell division and differentiation.

03.
arXiv (CS.CV) 2026-06-15

Towards Physically Realizable Adversarial Attenuation Patch against SAR Object Detection

Deep neural networks have demonstrated excellent performance in SAR target detection tasks but remain susceptible to adversarial attacks. Existing SAR-specific attack methods can effectively deceive detectors; however, they often introduce noticeable perturbations and are largely confined to digital domain, neglecting physical implementation constrains for attacking SAR systems. In this paper, a novel Adversarial Attenuation Patch (AAP) method is proposed that employs energy-constrained optimization strategy coupled with an attenuation-based deployment framework to achieve a seamless balance between attack effectiveness and stealthiness. More importantly, AAP exhibits strong potential for physical realization by aligning with signal-level electronic jamming mechanisms. Experimental results show that AAP effectively degrades detection performance while preserving high imperceptibility, and shows favorable transferability across different models. This study provides a physical grounded perspective for adversarial attacks on SAR target detection systems and facilitates the design of more covert and practically deployable attack strategies. The source code is made available at https://github.com/boremycin/SAAP.

04.
arXiv (CS.CV) 2026-06-17

Structured Adversarial Camouflage via Voronoi Diagrams

Pixel-wise adversarial patches are computationally heavy and often visually detectable, limiting utility in security-critical systems. We present adversarial Voronoi camouflage that optimizes only seed-point locations under fixed, printable palettes using a soft assignment, producing structured, splinter camouflage-like patterns without additional regularization. Evaluated on person detection with COCO-style AP@[.5:.95], naive placement (Inria -> COCO) performs comparably bad, while garment-level application via segmentation mask (3DPeople) results in a significant AP drop. The attack transfers to out-of-domain backgrounds and across detector families (YOLOv9/10/11/12), indicating robustness in black-box settings. Repainting with different palettes largely nullifies the effect, and single-color tweaks show limited tolerance (

05.
arXiv (CS.LG) 2026-06-12

Forecasting Is Not Attribution: Localizing Decoder Bypass in Graph-Based Neural Marketing Mix Models

arXiv:2606.12687v1 Announce Type: new Abstract: Marketing mix models are used to forecast business outcomes and to attribute those outcomes to marketing channels, but these goals are not equivalent. We study a failure mode in graph-based neural MMM called attribution bypass: a high-capacity decoder can obtain low forecasting error through target autoregression, dense communication, co-movement, context, or latent memory while failing to route counterfactual sensitivity through the graph used as the attribution object. We introduce DICE-MMM as a bounded diagnostic and training framework. We do not claim that observational neural MMM identifies causal effects. Instead, DICE separates three questions often conflated in graph-based MMM: graph recovery, forecasting accuracy, and whether the trained decoder's perturbation-induced influence is graph aligned. Stage 1 trains a graph encoder with a restricted graph-mediated decoder. Stage 2 freezes the selected encoder and trains a graph-safe latent decoder whose cross-node communication must pass through the supplied graph. Decoder use is evaluated with CIG, AR-CIG, and graph-swap tests. Across controlled R/d/T swaps and an external multi-graph rawlog stress test, DICE improves stable graph recovery over CausalMMM. The experiments show that forecasting accuracy is not an attribution certificate: in a sparse-target benchmark, no-graph and full-graph decoders achieve MSE@7 around 0.004 while AR-CIG nAUPRC remains near or below zero, whereas an oracle graph reaches 0.807 +/- 0.129 at comparable MSE. Frozen graph-swap localizes the bottleneck: the same DICE-hard-trained decoder moves from nAUPRC -0.044 +/- 0.006 under learned graph inputs to 0.894 +/- 0.027 with the oracle graph. The contribution is a stress test and failure-localization framework showing that low MSE can hide attribution bypass and that the unresolved bottleneck is graph-support selection, not forecasting or decoder capacity.

06.
arXiv (CS.CL) 2026-06-25

PhoneBuddy: Training Open Models for Agentic Phone Use

Phones are becoming an important execution surface for general-purpose agents, but training open models for reliable phone use remains difficult because the environment that matters at deployment, real devices running real apps, is slow, stateful, side-effectful, and hard to reset or verify, while scalable mock environments only approximate real behavior. We present PhoneBuddy, a training recipe and open-model line for agentic phone use that combines a real-app environment with a mock-app environment, PhoneWorld, which reconstructs runnable mock apps from real GUI usage structure. PhoneBuddy first builds a shared supervised fine-tuning stage from trajectories collected in both environments, then compares real-app RL against mixed RL across both environments. Across a 150-task human evaluation on real phones spanning apps, mini-apps, and cross-app workflows, task success rate improves from 36.67\% after supervised fine-tuning to 40.67\% after real-app RL and 45.33\% after mixed RL. On AndroidWorld, the same progression rises from 60.3\% to 77.2\% to 83.2\%. These results show that mock-app training is not a replacement for real-app RL, but a complementary source of scalable, resettable, and automatically checked interaction. The gains are strongest on app and mini-app tasks, while long-horizontal cross-app workflows remain an important open challenge.

07.
arXiv (CS.LG) 2026-06-15

Utility-Constrained Policy Optimization

arXiv:2606.14029v1 Announce Type: new Abstract: Constrained MDPs (CMDPs) are a widely adopted framework for incorporating safety into RL agents; however, the framework does not support risk-sensitive constraints. This can be problematic: For example, CMDPs allow for optimal solutions that, in order to satisfy the risk-neutral constraints, mix infrequent catastrophic behaviors and frequent, overly conservative ones. Moreover, prior empirical results suggest that enforcing stricter, risk-sensitive constraints can improve performance even under risk-neutral evaluation. The natural framework to incorporate risk-sensitive constraints is utility-constrained MDPs (UCMDPs), but no practical solutions for this problem existed. In this work, we introduce a simple yet powerful methodology for UCMDPs and constrained RL. Besides allowing for risk-sensitive constraints, our framework does not require us to fix constraint limits in advance of training the agent, provided that a sensible range is known. This increases policy flexibility and, in practice, allows for adjustments to these limits at no extra training cost. Besides benefiting from the generality of the framework, our agent shows strong performance in practice, consistently matching or outperforming existing baselines in several Safety Gymnasium benchmark tasks.

09.
arXiv (CS.LG) 2026-06-19

MassSpecGym in the Wild: Uncovering and Correcting Evaluation Pitfalls in AI-Driven Molecule Discovery

arXiv:2606.19624v1 Announce Type: new Abstract: Reliable benchmarking is critical for developing machine learning models for tandem mass spectrometry (MS/MS) based molecule discovery. Subtle issues in experimental design and model evaluation procedures can degrade the trustworthiness of such benchmarks and lead to erroneous conclusions. We conduct a thorough review of model evaluation issues in the recent MS/MS machine learning literature, using the standard MassSpecGym benchmark suite as a case study to illustrate the impact of these issues. We find evaluation issues in at least 17 of 26 papers reporting MassSpecGym benchmark results in the first year of its adoption. We isolate three classes of failures: (i) data leakage, (ii) shortcut learning, and (iii) implementation bugs and metric divergence. Through extensive experimentation and code replication, we quantify the impact of these issues and show how they corrupt the evaluation standards MassSpecGym was designed to enforce. We distill our findings into recommendations generalizable to MS/MS challenges, benchmarks, and custom evaluation setups. We also release MassSpecGym v1.5, an implementation of our recommendations in the MassSpecGym benchmarking suite which addresses the failure modes identified in this audit. MassSpecGym v1.5 is publicly available at https://github.com/pluskal-lab/MassSpecGym.

10.
arXiv (CS.CL) 2026-06-19

From 50K to 8.2 Million in 24 Hours: Vozinha's Algorithmic Consecration and the Multilingual Making of World Cup Visibility

We present a multilingual computational discourse analysis of how language constructed the algorithmic consecration of Vozinha, the 40-year-old Cape Verde goalkeeper, after Spain 0-0 Cape Verde at the 2026 FIFA World Cup. The study contributes a multilingual corpus in Portuguese, Spanish, English, and French; a nine-frame narrative taxonomy with cue-based frame annotation; a reproducible annotation pipeline combining LLM-assisted suggestion with human validation; and an analysis of cross-lingual narrative diffusion across discourse phases. We treat the platform follower count itself, narrated as "50k to 8M", as a linguistic object: a circulating and narratable proof of visibility rather than a mere measurement. The follower-growth timeline is used only as contextual metadata: we reconstruct a conservative phase structure, not a continuous API-native series, and type every datapoint by value class, confidence, and evidence type. The only exact primary scraper anchor is 8,235,652 followers at 2026-06-16 15:47 UTC; all other figures are reported as estimated ranges or thresholds, including an estimated pre-match baseline of 45k-56k. Findings suggest that distinct languages carried distinct frames: Portuguese mobilization, Spanish crisis, English nation-making, and a shared platform-metric spectacle through which peripheral athletic performance became globally visible. As a v0.1 pilot, the paper releases the corpus schema, frame taxonomy, annotation guidelines, hashed visual-evidence log, and typed timeline, while flagging full double annotation and inter-annotator agreement as planned work.

11.
arXiv (CS.AI) 2026-06-24

Minimisation of Quasar-Convex Functions Using Random Zeroth-Order Oracles

arXiv:2505.02281v3 Announce Type: replace-cross Abstract: This paper explores the performance of a random Gaussian smoothing zeroth-order (ZO) scheme for minimising quasar-convex (QC) and strongly quasar-convex (SQC) functions in both unconstrained and constrained settings. For the unconstrained problem, we establish the ZO algorithm's convergence to a global minimum along with its complexity when applied to both QC and SQC functions. For the constrained problem, we introduce the new notion of proximal-quasar-convexity and prove analogous results to the unconstrained case. Specifically, we derive complexity bounds and prove convergence of the algorithm to a neighbourhood of a global minimum whose size can be controlled under a variance reduction scheme. Beyond the theoretical guarantees, we demonstrate the practical implications of our results on several machine learning problems where quasar-convexity naturally arises, including linear dynamical system identification and generalised linear models.

12.
arXiv (CS.AI) 2026-06-16

Integrating Reasoning and Generalization in Text-to-SQL via Self-Enhanced Fine-Tuning

arXiv:2606.15598v1 Announce Type: new Abstract: Text-to-SQL aims to translate natural language questions into executable SQL queries over structured databases, enabling non-expert users to access data intuitively. While recent advances in large language models (LLMs) have shown promise in this task, existing LLM-based approaches often struggle to strike a balance between strong reasoning capabilities and robust generalization. To address these limitations, we propose CoTE-SQL to enhance the LLM-based text-to-SQL generation with three key innovations: (i) self-enhanced reasoning traces distilled from LLMs without human annotation, (ii) structured chain-of-thought (CoT) prompting with modular decomposition and examples retrieval, and (iii) error-aware revision based on SQL execution feedback. Extensive experiments on the Spider and Bird benchmarks demonstrate that CoTE-SQL achieves new state-of-the-art performance among methods built on open-source LLMs with comparable model sizes on Bird (53.39% EX / 59.02 VES) and strong results on Spider (79.60% EX / 77.19 VES), with especially significant gains on complex queries. Results highlight the effectiveness of combining self-enhancement, structured reasoning, and execution-time feedback within an LLM-based framework for text-to-SQL design.

13.
arXiv (CS.AI) 2026-06-19

Improving End-to-End Speech Recognition for Dysarthric Speech through In-Domain Data Augmentation

arXiv:2606.19797v1 Announce Type: cross Abstract: Dysarthric speech recognition is crucial for facilitating effective communication among individuals with dysarthria. However, accurately recognizing dysarthric speech poses significant challenges due to varying severity levels and limited data availability. In this paper, we explore data augmentation techniques for dysarthric automatic speech recognition (ASR) systems by fine-tuning the End-to-End pre-trained Wav2Vec2 model, with a specific focus on severity levels. To address the challenges of data scarcity and the need for extensive data in fine-tuning pre-trained ASR systems for dysarthric speech, we investigate four prominent data augmentation methods: Speaking-Rate Modification (SRM), Pitch Modification (PM), Formant Modification (FM), and vocal tract Length Perturbation (VTLP), tailored to different aspects of dysarthria. The study uses individually fine-tuned Wav2Vec2 models for each severity class as baseline systems. Additionally, we conducted severity-specific fine-tuning of the ASR model using augmented data. Results demonstrate distinct efficacy patterns for each augmentation technique across severity levels. The best WERs were achieved with SRM ($s$=0.8) for low (9.02\%) and medium (38.11\%) severities, and with PM ($\tau$=0.8) for high severity (55.15\%), reflecting relative improvements of 30.02\%, 16.64\%, and 15.47\%, respectively. These results confirm the effectiveness of the augmentation methods in improving dysarthric ASR performance.

14.
medRxiv (Medicine) 2026-06-17

Performance of five risk stratification tools for paediatric pneumonia against WHO scores using data from the PediCAP trial in sub-Saharan Africa

Background Risk stratification tools for childhood pneumonia have been proposed to improve identification of children at highest risk of death, particularly in low-resource settings. However, their added value over the WHO Integrated Management of Childhood Illness (IMCI) criteria and danger signs remains uncertain. Methods We conducted a secondary analysis of a multi-country randomised controlled trial of children without HIV hospitalised with pneumonia in Mozambique, South Africa, Uganda, Zambia, and Zimbabwe. We evaluated the performance of five published risk scores alongside WHO IMCI severity classification and danger signs. Discrimination for (1) in-hospital mortality, (2) 28-day mortality, and (3) 28-day readmission or death was assessed using area under the receiver operating characteristic curve (AUC). Comparative performance and clinical utility were examined. Results Of the 1010 participants, 18 (1.8%) died in hospital, 22 (2.2%) died in hospital or in the 7 days post-discharge, and 63 (6.2%) died or were readmitted by day 28. Univariate case-fatality rates were highest for variables associated with malnutrition, convulsions, and hypoxaemia. All risk scores demonstrated moderate discrimination for in-hospital and in-hospital+7-day mortality (AUC range approximately 0.75-0.84), with no meaningful differences between models, and performed similarly to the WHO danger signs and IMCI severity classification. In contrast, all approaches performed poorly in predicting 28-day readmission or death (AUC approximately 0.54-0.58). No risk score consistently outperformed simple clinical criteria. Conclusions In this multi-country dataset, we found no evidence that published paediatric pneumonia risk scores meaningfully outperform WHO IMCI-based clinical assessment for predicting mortality. The relatively small number of mortality events limits precision, and modest differences cannot be excluded. These findings suggest that, in low-resource settings, strengthening implementation of existing WHO clinical criteria may be more effective than adopting more complex prediction tools.

15.
arXiv (CS.LG) 2026-06-25

Learning Dynamical Systems from Multiple Sparse Datasets: A Hierarchical Bayesian Modeling Approach

arXiv:2606.24966v1 Announce Type: new Abstract: Estimating parameters of dynamical systems from sparse, noisy, and irregularly sampled data is often severely ill-conditioned. When multiple related datasets are available, they provide additional information if the shared structure and variability are properly modeled. We propose a hierarchical Bayesian framework for probabilistic meta-learning in dynamical systems, modeling dataset-specific parameters as draws from a shared population distribution. A numerical ODE solver is embedded within gradient-based MCMC to enable efficient posterior inference of the shared population and dataset-specific parameter distribution. Experiments show improved predictive performance over unpooled methods, highlighting the potential for data-efficient system identification in settings with sparse data.

16.
arXiv (math.PR) 2026-06-15

Hierarchical symmetry selects log-Poisson cascades: classification, uniqueness, and stability

arXiv:2604.01632v2 Announce Type: replace Abstract: Within i.i.d. multiplicative cascades, a single axiom – the hierarchical symmetry, a linear contraction on incremental scaling exponents – is shown to be necessary and sufficient for the cascade multiplier to be log-Poisson. We prove: (1) a characterization theorem determining the log-Poisson law with explicit parameters, within the class of all multipliers with finite lattice moments; (2) a classification theorem locating the log-Poisson class inside the log-infinitely-divisible family and identifying the mechanism by which every rival sub-family fails the symmetry; (3) a stability theorem with sharp constants – $(1+\beta)^{1/2}$ when the limiting increment is known, $\sqrt{2}$ when it is fitted – and (4) an unconditional propagation theorem transferring the bound to the multiplier distribution at the sharp rate $\Theta(\sqrt{\varepsilon})$, with a matching lower bound. Beyond independence, the classification extends exactly at the level of asymptotic statistics (limiting cumulant generating function, large deviations, multifractal spectrum) and provably not at the level of laws: an explicit stationary ergodic Markov multiplier satisfies the symmetry exactly with a non-log-Poisson marginal, while exchangeable multipliers collapse to the i.i.d. log-Poisson cascade and finite-state Markov multipliers cannot satisfy the symmetry at all. In the continuous category of exactly scale-invariant log-infinitely-divisible multifractal random measures, no finite moment window of structure-function exponents identifies the cascade class, whereas at the level of the scale-invariance generator the symmetry selects exactly the Barral-Mandelbrot compound Poisson cascade, with scale-ratio-free stability constants. The proofs reduce to second-moment identities on [0,1] via the change of variables $u = e^{kx}$, boundedness of the multiplier, and multiplicative couplings.

17.
bioRxiv (Bioinfo) 2026-06-19

Accurate detection of tumor clonality and ongoing expansion mode from genomic data

Recent evidence shows that despite considerable effort, currently available algorithms for estimating intra-tumor heterogeneity (ITH) remain limited. We developed DECODE (Deciphering Cancer Origin from DNA Evolution), a novel mutation clustering method that incorporates the impact of sample-specific sequencing coverage and mutation calling biases. On synthetic data, DECODE outperformed existing methods across multiple clonality metrics and accurately detected and characterized the neutral tail in the site frequency spectrum (SFS), which encodes the tumor's ongoing expansion mode. In acute myeloid leukemia, accounting for the neutral tail enabled DECODE to yield more parsimonious clonal decompositions that align more closely with known subclonal dynamics that drive relapse. Applied to data from The Cancer Genome Atlas, DECODE not only detected a neutral SFS tail in most samples across tumor types but also uncovered a clinically meaningful link between ITH and survival in low-grade glioma. By jointly inferring clonality and expansion mode, DECODE provides two complementary and prognostically relevant readouts of tumor evolution from single tumor genomic samples.

18.
bioRxiv (Bioinfo) 2026-06-18

Bioinf-Farma: supervised integration of epitope prediction and recombinant protein developability for automated vaccine candidate prioritization

Vaccine antigen discovery requires prioritizing protein candidates according to both immunogenic potential and recombinant expression feasibility. These properties are typically evaluated using separate computational tools, requiring researchers to integrate heterogeneous outputs through ad hoc workflows. Here, we present BIOINF-farma, a modular platform integrating epitope prediction and developability assessment for rational antigen selection within a unified environment. Candidates can be submitted as amino acid sequences or three-dimensional structures. When experimental structures are unavailable, BIOINF-farma automatically searches for models in AlphaFold DB or performs structure prediction using Boltz-2, ensuring a standardized structural representation for downstream analyses. Antigenicity is quantified by combining structure-based conformational epitope signals (MLCE/REBELOT-BEPPE) and sequence-based linear epitope propensity scores (BepiPred 3.0) into a protein-level Antigenicity Score, with a classification threshold optimized on a manually curated validation dataset. Developability is evaluated through two supervised Random Forest meta-learners that integrate three solubility predictors (DeepSoluE, SoluProt, Protein-Sol) and three thermal stability predictors (TemStaPro, ProLaTherm, BertThermo), whose outputs are combined into an Expression Efficiency Score (EES). By integrating complementary predictive signals, the meta-learning framework achieves greater accuracy and robustness than individual predictors while maintaining performance across a broad range of sequence identities. The Antigenicity Score effectively discriminates antigenic from non-antigenic proteins with a large effect size, whereas EES successfully distinguishes soluble from insoluble outcomes on an independent panel of recombinant proteins expressed in Escherichia coli. BIOINF-farma jointly assesses antigenicity and expression feasibility within a single framework. Its modular architecture facilitates the incorporation of future predictive methods, while its web-based interface makes the full pipeline accessible to users without programming expertise, supporting rapid candidate triage in vaccine research and emerging pathogen responses.

19.
arXiv (CS.CL) 2026-06-24

Same Lesson, Different Story: Cross-Lingual Reconstruction of Cultural Narratives in Large Language Models

The evaluation of cultural grounding context becomes complex when multiple cultures convey the same moral lesson. This challenge is particularly relevant to large language models (LLMs), which produce narratives across a wide range of languages and cultural contexts. However, it remains uncertain whether these models preserve culturally grounded meaning when equivalent moral lessons are conveyed through distinct cultural forms. This study introduces a multilingual evaluation narrative framework that integrates a cross-linguistic collection of 414 proverbs spanning 15 languages and uses four LLMs to generate 13k narratives. By employing semantically equivalent proverbs as culturally grounded prompts, the analysis assesses whether models preserve meaning across languages, how cross-lingual conditioning influences narrative realization, and whether different model families converge on similar interpretations. Results indicate that cross-lingual prompting largely preserves proverb-level semantic meaning while systematically redistributing agency, social positioning, and narrative structure. Additionally, strong inter-model convergence is observed in both monolingual and cross-lingual settings, suggesting that multilingual LLMs rely on shared semantic abstractions despite architectural and linguistic differences. These findings shed light on the need for more comprehensive evaluations of cultural grounding. Relying exclusively on semantic similarity in multilingual narrative assessments may overestimate cultural preservation by neglecting culturally meaningful variations in narrative expression.

20.
arXiv (CS.CV) 2026-06-11

Illumination-Robust Camera-Based Heart-Rate Estimation for Physiological Sensing in Robots

Physiological awareness is important for service, social, and assistive robots that interact with humans in everyday environments. Remote photoplethysmography (rPPG) enables non-contact heart-rate (HR) estimation from an RGB camera, making it a promising sensing modality for robot-mounted vision systems. However, illumination variation remains a major barrier to robust deployment. This paper presents an end-to-end spatial-temporal transformer framework for remote HR estimation on a new dataset with varied illumination. Our estimator integrates PRNet-based 3D face alignment, clip-level illumination augmentation, the Residual Temporal Standardization Module, and controlled hybrid temporal-frequency supervision. The training objective combines a Soft-Shifted Pearson waveform loss with a spectral Kullback-Leibler divergence loss, where a tuned weight ($\mathbf{\beta}$) controls the contribution of frequency-domain heart-rate guidance. Experiments on a static all-level mix protocol covering three illumination levels show that $\mathbf{\beta}=5$ provides the strongest result among the tested beta settings, achieving a best-run HR mean absolute error (MAE) of 0.79 bpm and an HR correlation of 0.982. Compared with the PhysFormer baseline evaluated on our dataset, our estimator reduces HR MAE by 93.6 %, while increasing HR correlation from 0.088 to 0.982, making it usable when illumination varies.

21.
arXiv (CS.LG) 2026-06-19

A deep learning framework for jointly solving transient Fokker-Planck equations with arbitrary parameters and initial distributions

arXiv:2604.06001v2 Announce Type: replace-cross Abstract: Efficiently solving the Fokker-Planck equation (FPE) is central to analyzing complex parameterized stochastic systems. However, current numerical methods lack parallel computation capabilities across varying conditions, severely limiting comprehensive parameter exploration and transient analysis. This paper introduces a deep learning-based pseudo-analytical probability solution (PAPS) that, via a single training process, simultaneously resolves transient FPE solutions for arbitrary multi-modal initial distributions, system parameters, and time points. The core idea is to unify initial, transient, and stationary distributions via Gaussian mixture distributions (GMDs) and develop a constraint-preserving autoencoder that bijectively maps constrained GMD parameters to unconstrained, low-dimensional latent representations. In this representation space, the panoramic transient dynamics across varying initial conditions and system parameters can be modeled by a single evolution network. Extensive experiments on paradigmatic systems demonstrate that the proposed PAPS maintains high accuracy while achieving inference speeds four orders of magnitude faster than GPU-accelerated Monte Carlo simulations. This efficiency leap enables previously intractable real-time parameter sweeps and systematic investigations of stochastic bifurcations. By decoupling representation learning from physics-informed transient dynamics, our work establishes a scalable paradigm for probabilistic modeling of multi-dimensional, parameterized stochastic systems.

22.
arXiv (CS.LG) 2026-06-24

Hardware-Oriented Inference Complexity of Kolmogorov-Arnold Networks

arXiv:2604.03345v2 Announce Type: replace Abstract: Kolmogorov-Arnold Networks (KANs) have recently emerged as a powerful architecture for various machine learning applications. However, their unique structure raises significant concerns regarding their computational overhead. Existing studies primarily evaluate KAN complexity in terms of Floating-Point Operations (FLOPs) required for GPU-based training and inference. However, in many latency-sensitive and power-constrained deployment scenarios, such as neural network-driven non-linearity mitigation in optical communications or channel state estimation in wireless communications, training is performed offline and dedicated hardware accelerators are preferred over GPUs for inference. Recent hardware implementation studies report KAN complexity using platform-specific resource consumption metrics, such as Look-Up Tables, Flip-Flops, and Block RAMs. However, these metrics require a full hardware design and synthesis stage that limits their utility for early-stage architectural decisions and cross-platform comparisons. To address this, we derive generalized, platform-independent formulae for evaluating the hardware inference complexity of KANs in terms of Real Multiplications (RM), Bit Operations (BOP), and Number of Additions and Bit-Shifts (NABS). We extend our analysis across multiple KAN variants, including B-spline, Gaussian Radial Basis Function (GRBF), Chebyshev, and Fourier KANs. The proposed metrics can be computed directly from the network structure and enable a fair and straightforward inference complexity comparison between KAN and other neural network architectures.

23.
arXiv (CS.CL) 2026-06-18

REVES: REvision and VErification–Augmented Training for Test-Time Scaling

Test-time scaling via sequential revision has emerged as a powerful paradigm for enhancing Large Language Model (LLM) reasoning. However, standard post-training methods primarily optimize single-shot objectives, creating a fundamental misalignment with multi-step inference dynamics. While recent work treats this as multi-turn reinforcement learning (RL), conventional approaches optimize over the multi-step trajectories directly, failing to further exploit the high-quality mistakes in intermediate steps that model can learn from correcting them. We propose a two-stage iterative framework that alternates between online data/prompt augmentation and policy optimization. By converting the intermediate steps (``near-miss'' answers) in the successful recovery trajectories into decoupled revision and verification prompts, our approach concentrates training on both effective answer transformation and error identification. This approach enables efficient off-policy data generation and reduces the computational overhead of long-horizon sampling compared to standard multi-turn RL. On LiveCodeBench, using publicly available test cases as feedback, we observe gains of +6.5 points over the RL baseline and +4.0 points over standard multi-turn training. Beyond coding, our approach matches the previously reported SOTA result on circle packing while using the smallest base model (4B) and far fewer rollouts than the much larger evolutionary search systems. Math results under ground-truth verification further confirm improved correction ability. It also generalizes to out-of-distribution constraint-satisfaction puzzles such as n\_queens and mini\_sudoku, where correctness is defined entirely by problem constraints. Code is available at https://github.com/yxliu02/REVES.git.

24.
arXiv (CS.AI) 2026-06-18

Dynamic In-Group Persona Generation for Enhancing Human-AI Rapport

arXiv:2606.18256v1 Announce Type: cross Abstract: LLM-based chatbots are increasingly applied in interpersonal domains such as counseling and peer support, where establishing human-AI rapport is crucial yet remains challenging. In this work, we introduce a novel approach for conditioning LLMs with in-group personas, which (i) first identifies a user's primary concern and brief personal context (e.g., a computer science undergraduate worried about future career prospects), and (ii) generates a synthetic in-group persona that shares a similar primary concern while differing in background and narrative details, such as age or profession (e.g., a junior researcher at an AI startup). Furthermore, we conduct a human-subject study to systematically evaluate the effectiveness of in-group persona agents in enhancing human-AI rapport. We compare our approach against two baseline conditions: a conventional agent without persona conditioning and an agent exhibiting minimal self-disclosure (e.g., "I've felt that too"). Results from post-task questionnaires assessing rapport and user experience indicate that the in-group persona agent significantly improves perceived rapport and personal relevance compared to the baselines, and also yields more positive user experience-most notably higher engagement.

25.
arXiv (CS.CL) 2026-06-12

EvoBrowseComp: Benchmarking Search Agents on Evolving Knowledge

Search Agents – large language models augmented with search tools – have intensified the need for future-proof evaluation benchmarks. Existing benchmarks such as BrowseComp rely on static knowledge, making them vulnerable to test-set contamination and parametric memorization. Consequently, models can achieve high scores through fact recall rather than genuine retrieval, obscuring true browsing competence via reasoning shortcuts. In this paper, we introduce EvoBrowseComp, an evolving benchmark of 400 English and 400 Chinese contamination-free complex questions synthesized via live-web traversal. To collect these questions, we design a three-agent collaborative framework: (1) a QA synthesis agent that retrieves fresh knowledge from the live web to synthesize QA pairs; (2) an information filtering agent that filters retrieved knowledge in terms of credibility and popularity to block parametric shortcuts; and (3) a high-level guidance agent that formalizes questions into reasoning graphs to reduce logical redundancy and shortcuts in synthesized QA pairs. Because the framework supports fully automated synthesis, EvoBrowseComp can be regularly updated to prevent data contamination and maintain temporal freshness. Extensive experiments confirm its great difficulty, requiring broad horizontal search. It establishes a scalable paradigm for auto-updatable, high-difficulty benchmarking that keeps pace with both evolving world knowledge and advancing agent capabilities.