Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-17

A tensor network approach for chaotic time series prediction

arXiv:2505.17740v2 Announce Type: replace Abstract: Making accurate predictions of chaotic time series is a complex challenge. Reservoir computing, a neuromorphic-inspired approach, has emerged as a powerful tool for this task. It exploits the memory and nonlinearity of dynamical systems without requiring extensive parameter tuning. However, selecting and optimizing reservoir architectures remains an open problem. Next-generation reservoir computing simplifies this problem by employing nonlinear vector autoregression based on truncated Volterra series, thereby reducing hyperparameter complexity. Nevertheless, the latter suffers from exponential parameter growth in terms of the maximum monomial degree. Tensor networks offer a promising solution to this issue by decomposing multidimensional arrays into low-dimensional structures, thus mitigating the curse of dimensionality. This paper explores the application of a previously proposed tensor network model for predicting chaotic time series, demonstrating its advantages in terms of accuracy and computational efficiency compared to conventional echo state networks. Using a state-of-the-art tensor network approach enables us to bridge the gap between the tensor network and reservoir computing communities, fostering advances in both fields.

02.
arXiv (CS.LG) 2026-06-18

P$^2$CE: Model-Agnostic Plausible Pareto-Optimal Counterfactual Explanations

arXiv:2606.18418v1 Announce Type: new Abstract: The increasing use of machine learning algorithms in social applications has raised concerns about fairness and transparency, leading to the development of counterfactual explanations. These explanations supports individuals to understand and potentially alter unfavorable decisions in areas such as loan applications, job selections, and more, by providing actionable changes to input features that would lead to a desired outcome. Existing methods often struggle to balance feasibility, plausibility, and computational efficiency. To address this, we introduce P$^2$CE, an algorithm for generating plausible Pareto-optimal counterfactual explanations, offering users a diverse set of optimal trade-offs between different notions of feasibility. P$^2$CE employs an auxiliary isolation forest outlier detector to ensure that explanations are in accordance with the data distribution and leverages SHAP values to obtain optimal results with short computing times, regardless of the underlying model. Our algorithm was empirically evaluated on three datasets, demonstrating superior performance in terms of both solution quality and computational efficiency compared to related techniques.

03.
arXiv (CS.LG) 2026-06-16

Convergence Rate Analysis of the AdamW-style Shampoo: Unifying One-Sided and Two-Sided Preconditioning

arXiv:2601.07326v4 Announce Type: replace-cross Abstract: This paper studies AdamW-style Shampoo, an effective variant of the classical Shampoo that won the external tuning track of the AlgoPerf neural network training competition. Our analysis unifies one-sided and two-sided preconditioning. When the exponents of the two preconditioners sum to $1/2$, we establish the convergence rate $\frac{1}{K}\sum_{k=1}^KE\left[||\nabla f(X_k)||_*\right]\leq O(\frac{\sqrt{m+n}C}{K^{1/4}})$, where $K$ represents the number of iterations, $(m,n)$ denotes the dimensions of the matrix-valued parameters, and $C$ matches the constant appearing in the optimal convergence rate of SGD. Theoretically, the nuclear norm and Frobenius norm satisfy $||\nabla f(X)||_F\leq ||\nabla f(X)||_*\leq \sqrt{\min\{m,n\}}||\nabla f(X)||_F$, which suggests that our convergence rate is analogous to the optimal $\frac{1}{K}\sum_{k=1}^KE\left[||\nabla f(X_k)||_F\right]\leq O(\frac{C}{K^{1/4}})$ convergence rate of SGD in the ideal case where $||\nabla f(X)||_*= \Theta(\sqrt{\min\{m,n\}})||\nabla f(X)||_F$ and $m$ and $n$ are of comparable magnitude. Then, we extend our analysis to settings where the preconditioning exponents do not sum to 1/2, and establish convergence with an explicit but more involved rate.

04.
arXiv (math.PR) 2026-06-17

Analysis of the asymmetric shelf shuffle

arXiv:2606.18047v1 Announce Type: new Abstract: In an asymmetric shelf shuffle, a deck of $n$ cards is dealt sequentially from the bottom and assigned one of the $m$ shelves uniformly at random. The card is placed at the top of the assigned shelf with probability $p$, and at the bottom of the assigned shelf with probability $(1-p)$. Analysis of the shelf shuffle has gained much attention recently, and the case $p=1/2$ was first treated by Diaconis–Fulman–Holmes [Ann. Appl. Prob. 23 (2013), no. 4, 1692–1720]. In this paper, we extend the analysis of the shelf shuffle to general $p\in (0, 1)$. In particular, we study the distribution of cycles, cycle lengths, number of descents, number of valleys, number of inversions, and the RSK shape of a permutation obtained from an asymmetric shelf shuffle. Our results extend the analysis of Diaconis–Fulman–Holmes to arbitrary $p$. Furthermore, our analysis of the distribution of descents and inversions is new even for $p=1/2$.

05.
arXiv (CS.LG) 2026-06-18

A finite-element-inspired bipartite graph learned simulator for manufacturability assessment in large-deformation sheet forming

arXiv:2605.22845v2 Announce Type: replace-cross Abstract: Explicit dynamic finite element (FE) simulations are widely used for large deformation engineering analysis, but repeated simulations remain costly during design space exploration and optimisation. In explicit FE analysis, nodal kinematics and element level deformation measures evolve through coupled node element updates. This motivates graph learned simulators that approximate one step FE state transitions and roll them out autoregressively. However, many mesh based graph surrogates are node centred, which makes element level variables and native nodal elemental exchange less direct to represent. This work proposes CAttBiGNN, a cross attention based bipartite graph neural network for coupled nodal elemental learning. The graph represents FE mesh nodes and elements as distinct entities linked by directed node element edges, enabling nodal displacement increments and element level deformation states to be predicted on their native discretisation domains. An edge aware cross attention processor uses geometric edge embeddings to modulate directional node element message passing. For larger graphs, CAttBiUGNN combines the bipartite processor with graph downsampling and upsampling to improve long-range information propagation. The method is evaluated on dome shaped cold forming and corner shaped hot forming benchmarks. Comparisons with node centred baselines and bipartite and attention ablations show improved accuracy and balance in nodal displacement and elemental thinning prediction during autoregressive rollout. The results indicate that the proposed finite element inspired learned simulator can support manufacturability oriented field prediction and efficient design space exploration in large deformation sheet material forming.

06.
arXiv (CS.CL) 2026-06-12

Examining the Cognitive Gap Between Authors and Peer Reviewers on Academic Paper Novelty

Novelty is a crucial metric for assessing the quality of academic papers. Scholars strive to highlight the novel aspects of their work, particularly in the title, abstract, and introduction. Peer review, serving as the gatekeeper of scientific rigor, rigorously evaluates the novelty of papers, yet a cognitive gap may exist between author self-promotion and reviewer evaluation. To investigate this, we analyzed 15,328 academic papers published in Nature Communications from 2016 to 2021, along with their peer-review comments. We found that both reviewers and authors emphasize result-oriented innovation, with reviewers adopting a more comprehensive evaluation perspective. Furthermore, by examining promotional intensity against inherent paper novelty, we found that its effect depends on the paper's actual innovation level. Highly innovative papers benefit from stronger promotional language, receiving more positive evaluations. We also found that promotional language significantly correlates with reviewer disagreement on novelty specifically for papers of moderate innovativeness, whereas it has negligible impact for papers with either very high or very low novelty. This reveals how promotional language operates most prominently in the gray area of academic evaluation.

07.
arXiv (CS.CV) 2026-06-18

A Multi-Domain Benchmark for Detecting AI-Generated Text-Rich Images from GPT-Image-2

Text-rich images often contain privacy-sensitive, transactional, or decision-relevant information. As recent multimodal image generation models become increasingly capable of synthesizing realistic textual content and structured visual designs, detecting AI-generated text-rich images has become an important challenge for digital trust and content authenticity. Existing benchmarks, however, largely focus on object-centric images and provide limited coverage of scenarios where textual semantics and layout organization are central. In this paper, we introduce a multi-domain benchmark for detecting text-rich images generated by OpenAI's GPT Image 2. The benchmark contains 8,602 images across six representative categories: commercial posters, infographics, academic posters, receipts, tables, and UI screenshots. Using this benchmark, we evaluate five representative AI-generated image detectors in a zero-shot setting and analyze their overall, category-wise, and post-processing robustness. Our results show that detector performance is highly domain-dependent: methods that perform well in some categories often fail on others, and even the strongest conventional detector exhibits severe sensitivity to JPEG compression. We further conduct an exploratory evaluation with a multimodal vision-language model, revealing both its promise and its limitations on structured formats. These findings highlight the need for text- and layout-aware detection methods for modern AI-generated images. Our dataset is released at XXX.

08.
arXiv (CS.LG) 2026-06-12

Masked Neural Detection for Constrained Channel Coding in Molecular Communication

arXiv:2606.12489v1 Announce Type: cross Abstract: Molecular communication (MC) suffers from severe diffusion memory because molecules released for one symbol may arrive during later symbols. Neural sequence detectors, especially sliding bidirectional recurrent neural networks (SBRNNs), can substantially outperform threshold detectors in such channels. This raises a central question for MC channel coding: does a code whose advantage was established under threshold detection retain it when both coded and uncoded transmission are evaluated with neural detection? This letter answers this question for run-length-limited ISI-mitigation (RLIM) codes, a class of constrained codes previously shown to provide large BER gains in MC. Across the tested operating points, the best RLIM-SBRNN receiver beats the best uncoded receiver, chosen between threshold and SBRNN detection, in $46$ of $59$ cases, with a mean gain of $10.36\times$ over those wins. We also propose an RLIM-tailored training mask for compact SBRNN detectors, improving the unmasked RLIM-SBRNN in $227$ of $236$ comparisons with $3.267\times$ mean gain when masking is beneficial. Finally, the compact masked RLIM-SBRNN is competitive with channel-state-aware MLSE despite using no channel knowledge.

09.
arXiv (math.PR) 2026-06-16

Riviera model with egoistical settlers

arXiv:2606.16791v1 Announce Type: cross Abstract: The Riviera model mimics a densifying settlement along the coastline. In the lattice version, houses are built sequentially in empty sites with the constraint that every newly built house has at least one empty neighboring site. The distribution of clusters of adjacent houses does not obey a closed set of evolutionary equations, but the void-cluster-void distribution does. We compute the latter and extract the cluster distribution from it. In the jammed state, when all voids have length one and the evolution ceases, the cluster distribution has a neat form and exhibits a factorial decay with the length of the cluster. To investigate finite systems, we employ a static approach directly treating jammed states. If the coastline is a finite segment, we determine the statistics of the number of empty sites in the jammed state (the average, variance, and higher cumulants). We also study a continuum version in which houses are built along the line so that each newly built house is sufficiently separated from at least one neighboring house.

10.
arXiv (CS.CL) 2026-06-12

Beyond Uniform Tokens: Adaptive Compression for Time Series Language Models

Large language models (LLMs) have enabled time series (TS) analysis by jointly modeling numerical observations and textual context through a shared token interface. However, TS tokens and prompt tokens exhibit fundamentally different information structures, making uniform token processing inefficient. In this paper, we study token efficiency in TS language modeling from an asymmetric-token perspective. We show that TS tokens have highly uneven spectral contributions, where many tokens share redundant frequency patterns while a small subset preserves critical temporal evidence. We also observe that prompt-token influence attenuates with model depth, suggesting that full prompt retention across all layers is unnecessary. Based on these findings, we develop an adaptive token budgeting framework that compresses TS tokens via frequency-domain structure and progressively reduces prompt tokens across layers. Experiments across forecasting, classification, imputation, and anomaly detection demonstrate up to 7.68$\times$ inference acceleration and performance gains in 78\% of evaluated settings, showing the effectiveness of asymmetric token compression for scalable TS foundation models.

11.
arXiv (CS.AI) 2026-06-12

Functional Cache Grafting: Robust and Rapid Code-Policy Synthesis for Embodied Agents

arXiv:2606.13097v1 Announce Type: cross Abstract: Code-writing large language models (CodeLLMs) generate executable code policies for embodied agents by translating natural language goals and environmental constraints into structured control programs. However, policy generation in open-domain embodied environments suffers from two fundamental limitations: (i) delayed decoding caused by repetitive prefill computation over long prompts, and (ii) limited robustness due to fully generative decoding, which often produces API mismatches, missing safety guards, and unstable control logic. To address these limitations, we present FCGraft, a Functional Cache Grafting framework. FCGraft maintains a library of function-level validated code skeletons and their associated prompt-level Transformer key-value (KV) caches, and synthesizes new policies by retrieving relevant functions and grafting their KV caches when a new task is provided. Given retrieved function caches, FCGraft performs cache grafting via stitching, which composes cached function segments into a composite policy, and patching, which locally adapts only the necessary code regions to satisfy task-specific parameters and constraints with minimal additional decoding. By eliminating redundant prefill computation, this approach reduces generation latency, while reusing validated control structures improves robustness over prompt-level caching methods RAGCache, achieving 18.31% higher task success rate and 2.3x faster policy synthesis.

12.
medRxiv (Medicine) 2026-06-12

Sociodemographic and health correlates of reimbursement authorizations for cannabis for medical purposes in Canadian veterans: A cross-sectional study linking the Life After Services Studies 2019 and Health Administrative Databases

Background Evidence on factors associated with cannabis for medical purposes (CMP) authorizations among Veterans Affairs Canada (VAC) clients remains limited and inconsistent, particularly concerning mental health and posttraumatic stress disorder (PTSD), a leading indication for use. We investigated demographic, clinical and service characteristics associated with VAC authorizations for CMP reimbursement. Method We linked VAC administrative CMP program data with responses from the 2019 Life After Services Studies cross-sectional survey of Regular Force veterans released between 1998 and 2018. Multivariable logistic regressions examined associations between CMP reimbursement (yes/no) and demographic, clinical and well-being factors, with analyses stratified by PTSD status. Results Among 1,289 respondents (weighted n=33,131), 18.4% were authorized for CMP reimbursement. Younger age (

13.
arXiv (CS.AI) 2026-06-11

Towards an Inferentialist Account of Information Through Proof-theoretic Semantics

arXiv:2605.05368v5 Announce Type: replace-cross Abstract: Information is one of the most widely-discussed concepts of the current era. However, a great deal of insightful work notwithstanding, it is yet to be given wholly convincing logical or mathematical foundations. Without them, we lack adequate reasoning tools for understanding the complex ecosystems of systems upon which the society depends. We seek to rectify this by taking a first step towards developing an inferentialist semantic theory of information. There are three key interacting components. First, conceptual analysis: the metaphysics of information. Dretske expressed the key concepts of information in terms of intentionality, truth, and transmissibility. We replace truth with inferability, and trace the consequences of this replacement. Second, logic: proof-theoretic semantics (P-tS) provides a mathematical-logical realization of inferentialist reasoning. Using P-tS, we develop the first steps towards a mathematical-logical theory of an inferentialist primitive unit of information, the 'inferon'. This proof-theoretic approach counterpoints the model-theoretic view of information articulated in situation theory. Furthermore, we argue that it facilitates addressing all three components of van Benthem and Martinez's categorization of the understandings of information, as range, as correlation, and as code. Our focus is on information-as-correlation. Third, systems: the P-tS tools we develop provide the basis for a mathematical account of distributed systems modelling – a key tool from informatics for understanding the organization of information processing systems. This yields a reasoning-based theory of information flow in models of distributed systems. Overall, we seek to give a conceptually rigorous mathematical-logical account of information and its role within informatics, grounded in inference and reasoning.

14.
arXiv (CS.CV) 2026-06-18

Benchmarking Large Vision-Language Models on Fine-Grained Image Tasks: From Evaluation to Diagnosis

Recent advancements in Large Vision-Language Models (LVLMs) have demonstrated remarkable multimodal perception and reasoning capabilities. While numerous benchmarks have evaluated LVLMs from holistic or task-specific perspectives, their capabilities on fine-grained image tasks-fundamental to computer vision-remain insufficiently understood. To address this gap, we introduce FG-BMK, a comprehensive fine-grained evaluation benchmark containing 1.01 million questions and 0.28 million images, covering diverse scenarios from common object-centric domains to specialized domains. FG-BMK jointly evaluates dialogue-level fine-grained semantic recognition and feature-level visual discriminability through human-oriented and machine-oriented paradigms, enabling diagnostic analysis of whether LVLM failures arise from insufficient visual representations, weak visual-to-semantic grounding, or limited fine-grained knowledge. Through extensive experiments on a diverse set of representative LVLMs/VLMs, we find that current LVLMs remain inadequate fine-grained recognizers, with failures arising from intertwined bottlenecks in visual representations, semantic grounding, modality alignment, and category-level knowledge. We further analyze training design factors for improving fine-grained capabilities and examine how visual and linguistic perturbations affect LVLM predictions. These findings provide diagnostic insights into the limitations of current LVLMs and offer guidance for future data construction and model design in developing more reliable LVLMs for fine-grained visual tasks. Our code is open-source and available at https://fg-bmk.github.io/.

15.
arXiv (quant-ph) 2026-06-12

A Quantum Algorithm for Random Number Generation

arXiv:2606.13034v1 Announce Type: new Abstract: We present a quantum algorithm for random number generation that achieves a provable quadratic speedup over classical Markov chain mixing, building on the Diaconis-Shahshahani Fourier analysis of the top-to-random card shuffle. The algorithm integrates three quantum primitives into a unified mixing circuit: the Quantum Fourier Transform (QFT), which diagonalizes the Markov transition operator; controlled phase rotations, which encode the shuffle eigenvalue spectrum; and the Grover diffusion operator, which acts as a quantum analogue of the Aldous-Diaconis strong uniform stopping time by reflecting amplitudes about their mean at each iteration. For an n-qubit register, the mixing time is O(\sqrt{n \log n}) iterations. Extending to m qudits of local dimension d reduces this to O(\sqrt{\log_d N}) iterations, where N = d^m, compared to the classical O(n \log n) bound. The qudit formulation further reduces QFT circuit depth from O(\log^2 N) to O(\log_d^2 N) gates per layer by encoding the same N-state space using m = \log_d N subsystems instead of \log_2 N qubits. We validate both variants on IBM superconducting hardware.

17.
arXiv (CS.AI) 2026-06-17

FoundCause: Causal Discovery with Latent Confounders from Observational Data

arXiv:2606.17516v1 Announce Type: cross Abstract: Causal discovery from observational data remains challenging due to the need to recover directed structure and latent confounding without interventions. We propose FoundCause, an amortized causal discovery model trained entirely on synthetic data that maps datasets directly to causal graphs in a single forward pass. By learning from large collections of simulated structural causal models, FoundCause captures transferable statistical patterns that generalize beyond individual datasets. The architecture incorporates several key inductive biases for causal discovery. It uses a permutation-invariant transformer encoder with alternating attention over samples and variables to jointly model cross-variable dependence and per-variable distributions. Pairwise statistical features derived from classical asymmetry measures are injected through statistics-conditioned attention, guiding the model toward known causal signals. A factorized decoder separates edge existence from direction, while a triangular refinement module enables reasoning over higher-order causal motifs such as chains and colliders. In addition, a dedicated confounder module based on learnable latent tokens explicitly models hidden common causes, and the model explicitly handles missing data via its masked input representation. To our knowledge, FoundCause is the first amortized causal discovery approach to explicitly model latent confounding. FoundCause outperforms 11 classical non-amortized methods (e.g., PC, GES, NOTEARS-style optimization) and 4 amortized causal discovery methods on 15 real-world datasets, achieving +9.6% improvement in $F_1$, +1.2% in AUROC, and an 18.9% reduction in structural Hamming distance relative to the strongest non-amortized methods, while performing inference in a single forward pass.

18.
arXiv (CS.CV) 2026-06-15

PMOF: A Dataset and Benchmark for Passenger Monitoring Using Overhead Fisheye Cameras

Autonomous staff-free public transport requires reliable in-vehicle passenger monitoring. However, perception inside moving vehicles is challenged by confined spaces, variable illumination, motion-induced background variation, occlusion, and limited viewpoints. To mitigate these spatial constraints, ceiling-mounted fisheye cameras provide full-scene coverage from a single viewpoint. Yet existing public overhead fisheye datasets are recorded in static environments and do not capture the domain shift introduced by vehicle motion. To fill this gap, we introduce PMOF, Passenger Monitoring using Overhead Fisheye cameras, the first public dataset of top-view fisheye imagery captured inside a moving vehicle, comprising over 19k manually annotated frames. PMOF provides rotated bounding boxes, tracking identifiers, and action labels, supporting object detection, tracking, and action recognition. We benchmark PMOF using YOLO26m-obb models fine-tuned under multiple dataset configurations that combine PMOF with existing overhead fisheye datasets. Cross-domain fine-tuning with custom rotation-aware augmentation achieves 94.8% AP50 on PMOF and 96.5% AP50 on an unseen overhead fisheye dataset from a different domain. Our results highlight the domain gap between static and moving environments and show that incorporating PMOF improves detection performance and advances generalization beyond passenger monitoring to broader fisheye-based person detection tasks. The dataset and code are available at https://swermuth.github.io/pmof/.

19.
bioRxiv (Bioinfo) 2026-06-10

A Unified Spatial AI Framework for Cross-Domain Tissue-State Analysis in Trauma, Oral, and Cardiovascular Pathology

Authors:

Objective: To develop a cross-domain spatial AI framework for identifying conserved tissue-state organisation across trauma, oral disease, and cardiovascular tissue using spatial transcriptomic data. Methods: Four public spatial transcriptomic datasets spanning wound healing, periodontitis, oral squamous cell carcinoma, and cardiac tissue were integrated using recurrence modelling, graph-based spatial learning, fuzzy tissue-state analysis, and tensor decomposition. Cross-domain coupling, spatial fragmentation, recurrence structure, and permutation-based topological validation were evaluated. Results: Six conserved fuzzy tissue states were identified, dominated by extracellular matrix remodelling, fibroblast/stromal activation, endothelial signalling, and inflammatory pathways. Latent embedding analysis demonstrated strong overlap between trauma and oral domains, while cardiovascular tissue exhibited more compact spatial organisation. Oral inflammatory tissue showed the highest fragmentation, whereas cardiovascular tissue demonstrated greater recurrence coherence. Tensor decomposition identified conserved stromal-remodelling programmes across domains. Permutation testing confirmed significantly elevated graph modularity and reduced spatial entropy relative to null distributions. Conclusion: The proposed framework identified conserved spatial tissue-state architecture linking wound healing, oral pathology, and cardiovascular tissue despite differences in tissue origin, pathology, and acquisition technology. Significance: These findings demonstrate the potential of spatial AI for investigating conserved stromal and inflammatory microenvironmental organisation across clinically related disease systems and may support spatial biology research in trauma–oral–systemic health.

20.
arXiv (quant-ph) 2026-06-12

Interference of critical dynamics associated with zero modes

arXiv:2606.13200v1 Announce Type: new Abstract: We study the interference of critical dynamics associated with zero modes (ICDZM) in the generalized Creutz ladders using closed quench paths that pass through two critical points successively. By reading out the final zero-mode transfer probability, we find rich ICDZM interference patterns dependent on the quench path. In particular, when the closed path links two topologically nontrivial phases, the ICDZM pattern may either vanish or exhibit period doubling. Within the framework of WKB analysis, this phenomenon is well clarified by the interference phase accumulated in the quench procedure. We also demonstrate that the zero-mode transfer probability can be detected by the deviation of the boundary particle number from its initial fractional value, which arises from the blending of bulk modes in the critical dynamics. As an edge defect, the zero-mode transfer probability captures both the ICDZM oscillation and the known anomalous defect production in a non-closed quench path. These results identify ICDZM and the corresponding edge defect as probes for critical dynamics associated with topological zero modes.

21.
arXiv (CS.AI) 2026-06-16

CoffeeBench: Benchmarking Long-Horizon LLM Agents in Heterogeneous Multi-Agent Economies

arXiv:2606.16613v1 Announce Type: new Abstract: As LLM agents become capable of increasingly long-horizon tasks, evaluating their performance in economic systems is becoming increasingly important. Unlike existing benchmarks that primarily evaluate a single agent interacting with a passive environment, economic systems are inherently multi-agent, requiring autonomous agents to communicate, negotiate, and transact while pursuing their own objectives over extended periods. We introduce CoffeeBench, a benchmark for evaluating LLM agents in a long-horizon multi-agent economy composed of heterogeneous firms. In CoffeeBench, two farmers, two roasters, and two retailers autonomously operate their businesses over a 90-day simulation, each seeking to maximize cumulative net income through communication and transactions while managing cash, inventory, and pricing. The evaluated model controls one coffee roaster, while the remaining firms are controlled by fixed reference agents. Across several recent open-weight and proprietary LLMs, all models outperform a passive baseline that takes no actions, with most achieving positive net income. Analysis of agent behavior reveals substantial differences in long-horizon economic interaction: higher-performing models communicate more actively with other firms, whereas Claude~Haiku~4.5 exhibits an idle-drift failure mode, repeatedly choosing inaction despite producing coherent assessments and plans. We release our code and agent trajectories to support future research.

22.
arXiv (CS.LG) 2026-06-18

BLADE: Scalable Bi-level Adaptive Data Selection for LLM Training

arXiv:2606.18650v1 Announce Type: new Abstract: As Large Language Model (LLM) datasets scale to trillions of tokens, data selection has emerged as a critical frontier to filter out uninformative noise and construct adaptive learning trajectories. Beyond static heuristic filtering, advanced data selection methods for LLM training largely follow two paradigms, each with fundamental limitations. Influence-based methods provide principled bi-level objectives but require intractable inverse-Hessian computations, while excess-loss methods are computationally efficient but rely on a static reference model that becomes misaligned with the evolving proxy model during training. We propose BLADE (Bi-Level Adaptive Data sElection), a Hessian-free framework for data selection. BLADE reformulates the bi-level optimization problem underlying influence-based methods as a penalized single-level objective via Lagrange multipliers, avoiding inverse-Hessian computation while revealing a principled connection to excess-loss based data selection. The resulting objective recovers an excess-loss form but replaces the static reference model with a dynamic one that stays synchronized with training. Theoretically, we prove that this penalized formulation guarantees first-order convergence. For efficient online batch selection, we instantiate BLADE as a memoryless randomized block-coordinate Frank-Wolfe algorithm. Extensive experiments show that BLADE consistently outperforms state-of-the-art data selection baselines, providing a practical recipe for LLM training.

23.
arXiv (quant-ph) 2026-06-11

Partitioned Iterative Quantum Scheduling of Satellites for Urgent Disaster Response: Case study of Wildfire

arXiv:2606.12310v1 Announce Type: new Abstract: The standard in Earth-observation tasks today is having near real-time access to surface images in response to changing conditions. For instance, as urban environments interface more with wildlands and wildfires become less predictable, their tracking with satellite resources becomes essential. This requires the coordination of increasingly large constellations of satellites, giving rise to challenging computational problems. With wildfire detection and tracking as a backdrop, we investigate the power of special purpose and novel computing paradigms to tackle the ensuing satellite scheduling problems, making a compelling case for quantum algorithms. We bring quantum scheduling algorithms closer to implementation by examining both the emerging iterative quantum algorithm framework, which comes with analytic guarantees compared to some classical algorithms, and distributed quantum computing methods whose relevance is on the rise as utility-scale problems begin to get solved with quantum computers. Drawing strength from several computing fronts, we develop a distributed/parallelization scheme in conjunction with the quantum algorithm design and apply these techniques to real-world datasets for wildfire detection. While our quantum subprocesses are currently too small to see significant quantum advantage, our results validate the utility of these techniques, and continue forging the path toward distributed quantum computing.

24.
arXiv (CS.CV) 2026-06-11

Exploring Adaptive Masked Reconstruction for Self-Supervised Skeleton-Based Action Recognition

Recently, masked skeleton reconstruction models have emerged as strong action representation learners, driving significant progress in self-supervised skeleton-based action recognition. However, existing state-of-the-art methods must predict an exceedingly large number of spatiotemporal patches, significantly prolonging training time. Besides, by treating all spatiotemporal regions equally during reconstruction, these models are distracted from learning the critical motion patterns that underlie action semantics. To address these challenges, we propose Adaptive Masked Reconstruction (AMR), a faster and stronger pre-training framework. We first decouple the decoder from the encoder, enabling flexible prediction of larger spatiotemporal patches and dramatically reducing reconstruction complexity. Given that larger patches contain more complex information, which is challenging to predict and consequently degrades performance, we accordingly introduce an adaptive guidance module. This module identifies regions of high motion informativeness, guiding the model to focus on the most discriminative parts of each patch and alleviating reconstruction difficulty. Experiments on NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD datasets demonstrate that AMR not only accelerates pre-training substantially but also improves downstream recognition accuracy, surpassing current state-of-the-art approaches.

25.
arXiv (CS.CV) 2026-06-12

AudioX-Turbo: A Unified Framework for Efficient Anything-to-Audio Generation

Audio and music generation based on flexible multimodal control signals is a widely applicable topic, with the following key challenges: 1) a unified multimodal modeling framework, 2) large-scale, high-quality training data, and 3) the prohibitive inference cost of multi-step diffusion sampling. As such, we propose AudioX-Turbo, a unified and efficient framework for anything-to-audio generation that integrates varied multimodal conditions (i.e., text, video, and audio signals) in this work. AudioX-Turbo follows a teacher-student paradigm. The teacher AudioX-Base is built on a Multimodal Diffusion Transformer with a Multimodal Adaptive Fusion module that aligns diverse multimodal inputs for high-fidelity synthesis, and is then distilled into the few-step student AudioX-Turbo via Distribution Matching Distillation adapted to flow matching, complemented by a diffusion-based discriminator for high-quality few-step generation. To support the training of AudioX-Turbo, we construct a large-scale, high-quality dataset, IF-caps-Pro, comprising approximately 9.2M samples curated through a two-stage data collection and annotation pipeline. We benchmark AudioX-Turbo across a wide range of tasks, finding that our model achieves superior performance, especially on text-to-audio and text-to-music generation, while operating at only 4 sampling steps and requiring approximately 25x fewer function evaluations (NFE) than multi-step baselines. These results demonstrate that our method is capable of audio generation under flexible multimodal control, showing efficient and powerful instruction-following capabilities. The code and datasets will be available at https://zeyuet.github.io/AudioX-Turbo/.