Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
Nature (Science) 2026-06-10

Deep learning four decades of human migration

Authors:

Human migration is a fundamental driver of global demographic change, shaping population structure, labour markets and social policy across countries1–3. Although long-term migration patterns are often linked to economic development4, they can shift rapidly in response to shocks such as conflict, environmental crises and political change5. Despite its importance, migration remains difficult to measure consistently: existing data are sparse, concentrated in high-income settings and are fragmented across incompatible definitions, temporal resolutions and data types6–8. Past efforts have relied on partial datasets, including flow records, stock estimates and model-based reconstructions with limited coverage9–14. A central challenge is therefore to construct a globally consistent, high-resolution account of migration flows over time. Here we present a new dataset of annual origin-destination migration across 230 countries and regions from 1990 to the present, integrating diverse data sources into a unified modelling framework. By combining official statistics, census-based stocks, net migration estimates and past flow reconstructions, our approach produces temporally detailed and spatially comprehensive estimates that substantially extend existing resources. Using an ensemble of deep recurrent neural networks informed by geographic, economic, cultural and political covariates, we capture both persistent trends and short-term responses to changing conditions—all while propagating uncertainty to generate confidence bounds. Our results outperform existing five-year flow estimates on held-out data and provide finer temporal resolution, revealing previously obscured dynamics in global migration patterns. This framework highlights regions in which uncertainty remains high and data collection is most urgently needed. By releasing all data, code and trained models, we provide a transparent and reproducible foundation for future work. These advances enable a more timely and detailed understanding of human mobility, with implications for research and policy in an increasingly dynamic global system. A global annual migration-flow dataset (1990–2024) is produced using deep-learning models and diverse sources to estimate movements across 230 countries with improved temporal resolution, coverage and uncertainty estimates.

02.
arXiv (CS.LG) 2026-06-12

Physics-Informed Neural Networks for Chemotherapy Pharmacokinetics: Benchmarking the Clinical Estimator and Exposing Parameter Identifiability

arXiv:2606.12658v1 Announce Type: new Abstract: Physics-Informed Neural Networks (PINNs) are an attractive tool for partial-observation problems in biology, where the governing dynamics are known but some compartments cannot be measured. Chemotherapy pharmacokinetics (PK) is a clean instance: drug concentration in plasma is routinely measured, but concentration in tissue – which determines tumour kill and off-target toxicity – is not. We benchmark a PINN against the standard clinical baseline (nonlinear least-squares on the analytical biexponential plasma solution, hereafter NLS) and a physics-agnostic neural baseline (a data-only MLP) on two PK problems. On the linear two-compartment problem, NLS is near-optimal; the PINN matches it to within a small constant factor while also producing the tissue curve in a single training pass, whereas the data-only MLP fails on tissue by roughly 10x. On a Michaelis-Menten extension (saturable elimination), the biexponential closed form no longer exists, so NLS is mis-specified and silently returns meaningless rate constants. The PINN instead exposes a deeper fact: the Michaelis-Menten two-compartment model is non-identifiable from plasma alone, and the PINN reports this honestly by converging to a basin with k12 -> 0. Adding two sparse tissue observations largely resolves identifiability: across five seeds the PINN recovers k21 to within 1% of truth and Vmax, Km to within one standard-deviation bar, while k12 moves in the correct direction (0.02 -> 0.82) but remains ~2 sigma below truth – a recovery the closed-form NLS estimator cannot attempt at all, because its biexponential ansatz describes only plasma. Our claim is not that PINNs beat NLS. It is that PINNs offer a uniform recipe that ties the textbook estimator on the textbook problem, exposes structural identifiability that the textbook estimator hides, and absorbs heterogeneous measurements within a single loss.

03.
arXiv (math.PR) 2026-06-19

Theory of uncertain probability: can we derive the probability density function of uncertain random experiments with continuously changing conditions?

Authors:

arXiv:2606.20169v1 Announce Type: new Abstract: This paper aims to explore the formation mechanism of probability distribution in situations where the differences among random experiments are distinguishable, and these differences continue to evolve along with the dynamic changes in conditions and their mechanisms of action. To this end, we are motivated to devise a new theoretical system – theory of uncertain probability (TUP) with Kolmogorov's system and nonlinear theories as special cases. TUP develops a novel model that integrates probability and uncertainty as well as the known and unknown to more accurately depict numerous typical random phenomena under more realistic assumptions, and thus provides appropriate tools for greater variety of real needs. It also allows for pioneering interpretation of the causal mechanisms underlying many important distributional characteristics and incorporation of pathwise property to distribution model.

04.
arXiv (CS.CL) 2026-06-12

Beyond Uniform Tokens: Adaptive Compression for Time Series Language Models

Large language models (LLMs) have enabled time series (TS) analysis by jointly modeling numerical observations and textual context through a shared token interface. However, TS tokens and prompt tokens exhibit fundamentally different information structures, making uniform token processing inefficient. In this paper, we study token efficiency in TS language modeling from an asymmetric-token perspective. We show that TS tokens have highly uneven spectral contributions, where many tokens share redundant frequency patterns while a small subset preserves critical temporal evidence. We also observe that prompt-token influence attenuates with model depth, suggesting that full prompt retention across all layers is unnecessary. Based on these findings, we develop an adaptive token budgeting framework that compresses TS tokens via frequency-domain structure and progressively reduces prompt tokens across layers. Experiments across forecasting, classification, imputation, and anomaly detection demonstrate up to 7.68$\times$ inference acceleration and performance gains in 78\% of evaluated settings, showing the effectiveness of asymmetric token compression for scalable TS foundation models.

05.
arXiv (CS.AI) 2026-06-15

When Sample Selection Bias Precipitates Model Collapse

arXiv:2606.13732v1 Announce Type: new Abstract: The proliferation of recursive training on synthetic data can alleviate data scarcity but risks model collapse, where repeated training erodes distributional tails and homogenizes outputs. Data selection is widely viewed as a remedy, yet its reliability depends critically on the reference distribution used by the verifier. We show that in low-resource verification regimes, where each verifier observes only a small, fragmented, and biased slice of the target manifold, selection itself becomes biased. This situation naturally arises in low-resource data silos such as healthcare consortia or proprietary financial institutions, where raw data cannot be pooled and local references are inherently incomplete. As a result, selection preferentially retains samples aligned with the local manifold while pruning globally relevant tail modes, turning from a safeguard against collapse into a mechanism that precipitates it. We theoretically prove that such siloed selection accelerates collapse and induces power-law diversity decay. As an initial mitigation, we construct Wasserstein proxy references from multiple silos without sharing raw data. Empirical results confirm that local-reference selection fails on skewed distributions, whereas collaborative proxy references mitigate diversity degradation, suggesting that recursive synthetic-data pipelines require particular caution when real-data coverage is fragmented or scarce.

06.
arXiv (CS.CV) 2026-06-16

Question-Aware Evidence Ledgers for Video Relational Reasoning

The VRR-QA challenge evaluates visual relational reasoning in videos, where answers often depend on implicit spatial relations, event boundaries, target identity, and dialogue context rather than a single salient frame. We present a test-time reasoning pipeline built around a strong GPT-5.5 video QA solver and a set of question-aware evidence ledgers. The initial solver answers each question from a uniform video representation, while routed ledgers are prompted to make the required targets, count units, reference frames, and temporal or spatial scope explicit for counting, spatial, endpoint, viewpoint, and dialogue reasoning. External tools such as open-vocabulary detection, depth cues, pair crops, ASR, and scene-graph ledgers are used only as evidence sources. A conservative gate keeps the current answer unless independent evidence uniquely supports a different option. The final evidence-gated pipeline achieves 92.95% overall accuracy and 93.79% macro accuracy on the challenge test split.

07.
arXiv (CS.CL) 2026-06-11

MA-DLE: Speech-based Automatic Depression Level Estimation via Memory Augmentation

Speech-based automatic estimation of depression levels is essential for enabling early detection and timely intervention, particularly in resource-constrained mental health settings. In recent years, deep learning has demonstrated impressive success across various domains, including affective computing and mental health assessment. Most existing approaches rely on RNN-based architectures (such as LSTM and GRU) to model temporal information for depression estimation. However, the extracted features often emphasize only a few adjacent speech segments, limiting their ability to capture long-range dependencies. To overcome this limitation, we introduce a memory-based feature augmentation method that enhances the representational capacity of GRU-extracted features. Rather than indiscriminately incorporating historical data, our memory bank is designed to selectively integrate two types of components in order to reduce redundancy and irrelevance: (1) historical temporal features that closely resemble the current GRU output, offering complementary contextual information; and (2) dynamic memory features identified based on feature variability, which capture behavioral and emotional fluctuations indicative of depressive symptoms. To effectively fuse the memory-augmented features with GRU outputs, we further design a Hierarchical Attention Fusion (HAF) module. Our method is evaluated on the widely used DAIC-WOZ and E-DAIC datasets, achieving state-of-the-art performance.

09.
arXiv (CS.LG) 2026-06-11

Learning Object Manipulation from Scratch via Contrastive Interaction

arXiv:2606.11525v1 Announce Type: cross Abstract: Contrastive Reinforcement Learning (CRL) has seen recent success in a wide variety of goal-conditioned robotics tasks by learning structured representations of the dynamics. However, despite its success in locomotion and simpler control domains, CRL often struggles in interaction-rich manipulation. We argue that a key source of this difficulty is object-centric interaction, such as contact or grasping, that induces distinct changes in the underlying dynamic modes. In this work, we formulate manipulation dynamics as a piecewise-smooth Markov process and show that interaction-induced mode changes create piecewise nonlinear reachability structures that are difficult for standard CRL energy functions to represent and plan over. Based on this analysis, we introduce Interaction-weighted Resampling (IWR). IWR performs interaction-aware resampling around phases before, during, and after interactions, encouraging the learned representation to preserve the mode boundaries that determine future reachability to capture multi-modal and piecewise nonlinear reachability. Across interaction-centric environments, including 2D dynamic control, robotic manipulation, and robot air hockey, IWR improves both sample efficiency and overall performance over prior CRL methods, with 19.8% average improvement in simulation. Finally, using a sim-to-real pipeline with policies trained by IWR, we demonstrate the first real-world goal-conditioned robot air hockey agent capable of hitting goals, improving success from 25% to 60%. Project Page: IWR-arxiv.github.io.

10.
arXiv (CS.CV) 2026-06-11

MentisOculi: Revealing the Limits of Reasoning with Mental Imagery

Frontier models are transitioning from multimodal large language models (MLLMs) that merely ingest visual information to unified multimodal models (UMMs) capable of native interleaved generation. This shift has sparked interest in using intermediate visualizations as a reasoning aid, akin to human mental imagery. Central to this idea is the ability to form, maintain, and manipulate visual representations in a goal-oriented manner. To evaluate and probe this capability, we develop MentisOculi, a procedural, stratified suite of multi-step reasoning problems amenable to visual solution, tuned to challenge frontier models. Evaluating visual strategies ranging from latent tokens to explicit generated imagery, we find they generally fail to improve performance. Analysis of UMMs specifically exposes a critical limitation: While they possess the textual reasoning capacity to solve a task and can sometimes generate correct visuals, they suffer from compounding generation errors and fail to leverage even ground-truth visualizations. Our findings suggest that despite their inherent appeal, visual thoughts do not yet benefit model reasoning. MentisOculi establishes the necessary foundation to analyze and close this gap across diverse model families.

11.
arXiv (math.PR) 2026-06-11

Heat kernel estimates for Markov processes with blowing-up jump kernels

arXiv:2512.24807v2 Announce Type: replace Abstract: In this paper, we establish sharp two-sided heat kernel estimates for a large class of purely discontinuous symmetric Markov processes on closed subsets $F$ of $\mathbb{R}^d$, whose jump kernels blow up on a Borel subset $\Sigma$ of $F$. We assume that $F\setminus \Sigma$ is a $\kappa$-fat set and is dense in $F$. To the best of our knowledge, this is the first work establishing sharp heat kernel estimates for jump processes whose jump kernels blow up on part of the state space. The jump kernels under consideration take the form $J(x,y)=|x-y|^{-d-\alpha}{\mathcal B}(x,y)$, where $\alpha\in (0,2)$ and the function ${\mathcal B}(x,y)$ blows up at a subset $\Sigma$ of $F$. A fundamental obstacle is that the tails of the jump measures are not uniformly bounded, and hence standard techniques in heat kernel analysis do not provide a priori off-diagonal estimates. To overcome this difficulty, we develop a new approach based on weighted integral estimates for the heat kernel that are sensitive to both the blow-up behavior of the jump kernel and the geometry of $F\setminus \Sigma$. Examples of processes falling within our general framework include traces of isotropic $\alpha$-stable processes in $C^{1,\rm Dini}$ sets, processes in Lipschitz sets arising in connection with the nonlocal Neumann problem, and a large class of resurrected self-similar processes in the closed upper half-space.

12.
arXiv (quant-ph) 2026-06-17

The Standard Model, The Exceptional Jordan Algebra, and Triality

Authors:

arXiv:2006.16265v2 Announce Type: replace-cross Abstract: Jordan, Wigner and von Neumann classified the possible algebras of quantum mechanical observables, and found they fell into 4 "ordinary" families, plus one remarkable outlier: the exceptional Jordan algebra. We point out an intriguing relationship between the complexification of this algebra and the standard model of particle physics, its minimal left-right-symmetric $SU(3)\times SU(2)_{L}\times SU(2)_{R}\times U(1)$ extension, and $Spin(10)$ unification. This suggests a geometric interpretation, where a single generation of standard model fermions is described by the tangent space $(\mathbb{C}\otimes\mathbb{O})^{2}$ of the complex octonionic projective plane, and the existence of three generations is related to $SO(8)$ triality.

13.
arXiv (CS.AI) 2026-06-12

EpiBench: Verifiable Evaluation of AI Agents on Epigenomics Analysis

arXiv:2606.13602v1 Announce Type: new Abstract: We introduce EpiBench, a verifiable benchmark for short-horizon epigenomics analysis. EpiBench evaluates whether agents can make well-defined analysis decisions from realistic workflow states and return deterministically gradable answers. The benchmark includes 106 evaluations across CUT\&Tag/CUT\&RUN, ATAC-seq, ChIP-seq, and DNA methylation workflows. Across 5,088 valid trajectories from 16 model-harness pairs, no system passed a majority of attempts: GPT-5.5 / Pi led at 45.0\% (143/318 attempts; 95\% confidence interval (CI), 36.3–53.7), followed by GPT-5.5 / OpenAI Codex at 39.9\% (127/318 attempts; 95\% CI, 31.6–48.3). Claude Opus 4.8 Max / Pi and GPT-5.4 / Pi each passed 39.0\% (124/318 attempts; 95\% CI, 30.2–47.8 and 31.0–47.0, respectively). Performance varies across assay types, and many failed runs still contain parts of the correct answer. Agents often found the right files and computed useful intermediate results, but failed when the task required deeper, assay-specific scientific judgment.

15.
arXiv (CS.LG) 2026-06-17

CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models

arXiv:2606.17464v1 Announce Type: new Abstract: Membership inference attacks (MIAs) are a canonical way to assess a machine learning model's privacy properties. Although several attempts have been made to evaluate MIAs on language models, the extant literature has suffered numerous difficulties in constructing clean evaluations to test new techniques. In particular, subtle distribution shifts between member and non-member sets can undermine the statistical validity of MIAs; recent work has underscored this by showing that "blind" methods with no access to the underlying model can perform far better than published methods on the same benchmarks. This paper constructs a benchmark for principled evaluation of MIAs against LLMs, by leveraging the insight that training data before and after a fixed point during training are drawn from the same distribution. Therefore, all open-source models with intermediate checkpoints and public training data can be converted into MIA testbeds. We apply our framework to a half-dozen published attacks on the Pythia and OLMo family of models, from 70M to 7B parameters. To facilitate further privacy research, we open-source a modular library for designing and implementing attacks in this setting: https://github.com/safr-ai-lab/pandora_llm.

16.
arXiv (math.PR) 2026-06-16

A small noise approximation for Muller's Ratchet

arXiv:2606.15842v1 Announce Type: new Abstract: We consider an infinite system of SDEs with Fleming-Viot noise indexed by $k=0,1,2,\dots$, whose parameters $\alpha,\lambda$, and $\nu$ are the (deleterious) selection coefficient, the (uni-directional) mutation rate, and a quantity which determines the size of the system's fluctuations. The SDE's unique weak solution $X(t) = (X_k(t))_{k=0,1,2,...}$ models what is known in population genetics as Muller's ratchet. Here, $X_k(t)$ stands for the frequency of individuals carrying $k$ deleterious mutations. Since the mutation process is uni-directional, $t\mapsto \inf\{k: X_k(t)> 0\}$ is non-decreasing for almost every path of $X$, and we refer to an increase as a click of Muller's ratchet. A long standing question concerns the clicking rate of Muller's ratchet. Using Duhamel's principle for semigroups, we give a partial answer by approximating $E(\sum_{k=1}^\infty kX_k(t) )$ and $E\big(X_0(t)\big)$ up to $O(1/\nu^2)$ for fixed $\alpha$, $\lambda$ and $t>0$. Our results suggest that $\psi:=\nu \alpha e^{-\lambda/\alpha}$ is a crucial quantity also when the mutation/selection ratio $\theta = \lambda/\alpha$ is moderately large: for large $\nu \alpha$, clicking of the ratchet on the time scale $\frac 1\alpha \log \theta$ becomes rare as soon as $\psi$ becomes large.

17.
arXiv (CS.AI) 2026-06-11

FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning

arXiv:2606.12406v1 Announce Type: cross Abstract: Contact-rich manipulation requires force sensitivity, but many robot arms lack dedicated force sensors due to their high cost. We present Neural External Torque Estimation (NEXT), a data-driven method that estimates external joint torques without needing any dedicated force sensors. NEXT trains in 1 minute from only 10 minutes of free-motion data, yet achieves estimates comparable to dedicated joint-torque sensors. NEXT enables force-feedback teleoperation on low-cost arms and improves policy learning through Force-Informed Re-Sampling Training (FIRST), which up-samples pre-contact and contact segments during behavior cloning. Across five long-horizon tasks, FIRST outperforms prior force-aware policies by over 17% in task progress. Together, NEXT and FIRST bring force-aware teleoperation and policy learning to off-the-shelf robots without additional sensing hardware. Video results and code are available at https://jasonjzliu.com/factr2

18.
arXiv (quant-ph) 2026-06-17

Kinematic properties of the Pauli equation

arXiv:2606.17548v1 Announce Type: new Abstract: Based on the Wigner-Vlasov formalism, this paper investigates the kinematic properties of the Pauli equation. It is shown that the probability current associated with the Pauli equation can be represented as a superposition of two currents with certain expansion coefficients. Each of these currents corresponds to a particular component of the spinor. The expansion coefficients effectively serve as weighting functions that determine the probability contribution of the corresponding spinor component. Therefore, each spin projection corresponds to its own probability flux. A new system of the Hamilton-Jacobi equations and also a system of motion equations in electromagnetic fields are obtained, taking into account the interaction between the spin and the magnetic field. To illustrate how these equations can be applied we have investigated the quantum system kinematics in detail using an exact solution of the Pauli equation in the presence of a uniform magnetic field and an asymmetric quadratic potential.

19.
arXiv (CS.AI) 2026-06-15

Elastic Queries Reinforcement Learning: Self-Aware Policy Execution for VLA Models

arXiv:2606.14375v1 Announce Type: cross Abstract: Vision-language-action (VLA) models are powerful action generators for robot manipulation, but they are typically executed with fixed inference and replanning schedules. This rigidity ignores the uneven difficulty of robot control: contact-rich or uncertain states may need more computation and fresher feedback, while easier states can often be handled with fewer inference steps and longer open-loop execution. We propose Elastic Queries Reinforcement Learning (EQRL), a framework that makes each VLA policy query elastic. A lightweight latent-schedule adaptor jointly selects the latent input, denoising budget, and action chunk length, without fine-tuning the underlying VLA model. To make scheduling difficulty-aware, EQRL trains a critic over the joint latent-schedule action and derives a state difficulty signal from critic ensemble disagreement. This signal guides compute toward difficult states, while a learned residual allows task-driven correction. We formulate variable chunk execution as query-level macro-action RL with chunk-dependent discounting and an amortized number-of-function-evaluations (NFE) budget. Across simulation and real-robot manipulation, EQRL reduces amortized inference cost while preserving or improving task success.

20.
arXiv (quant-ph) 2026-06-16

Cosmological Pseudo-Entropy

arXiv:2606.15227v1 Announce Type: cross Abstract: We study pseudo entropy $\mathcal{S}$, a recent generalization of entanglement entropy, for scalar cosmological perturbations in de Sitter space with sound speed $0.024 \leq c_s \leq 1$, and in expanding and contracting FLRW backgrounds with varying equation-of-state parameter $w$. In de Sitter space, $\mathrm{Re}(\mathcal{S})$ grows after horizon exit while $c_s$ controls its onset and saturates at late times. A similar saturation occurs in expanding-accelerating and contracting-decelerating backgrounds. In contrast, expanding-decelerating and contracting-accelerating backgrounds show large early-time $\mathrm{Re}(\mathcal{S})$ followed by oscillations after horizon re-entry. This happens because while the squeezing freezes, the squeezing angle doesn't. Unlike entanglement entropy, pseudo entropy possesses an imaginary part, $\mathrm{Im}(\mathcal{S})$, as well, which can encode the relative phase. $\mathrm{Im}(\mathcal{S})$ decays to zero in de Sitter and expanding-accelerating cases, but forms dense sub-Hubble oscillation bands in expanding-decelerating and contracting-accelerating backgrounds. Compared with entanglement entropy, Krylov complexity, and Nielsen circuit complexity, pseudo entropy captures otherwise hidden phase information; in the unsaturated regime, its slope is $\sqrt{2}$ times that of Nielsen complexity. Unlike circuit complexity, whose saturation bound is $w$-independent, pseudo entropy is sensitive to $w$ during the transition regime, making it a finer information theoretic diagnostic of cosmological dynamics.

21.
arXiv (CS.LG) 2026-06-16

p-PSO: A Penalized Particle Swarm Optimization Technique for Finding D-Optimal Designs with Mixed Factors in Generalized Linear Models

arXiv:2606.15962v1 Announce Type: cross Abstract: Finding D-optimal designs for generalized linear models (GLMs) is challenging due to the dependence of the Fisher information matrix on unknown parameters and the lack of closed-form solutions, particularly when input factors include both discrete and continuous variables. Although classical algorithms and recent metaheuristic approaches have offered partial solutions, there remains a need for robust and computationally efficient methods. In this paper, we propose a penalized Particle Swarm Optimization (PSO) approach, named $p$-PSO. Here we introduce a new, general-purpose penalty formulation for constrained optimization and demonstrate its effectiveness in optimal design problems. The formulation is algorithm-agnostic and applicable to a broad class of black-box optimization methods. Results show that the method is highly efficient, with its primary contribution being a penalty formulation that enables the direct use of an off-the-shelf PSO algorithm and extends naturally to more general constrained optimization tasks.

22.
Nature (Science) 2026-06-10

Gene ancestries reveal diverse microbial associations during eukaryogenesis

The origin of eukaryotes remains a central enigma in biology1. Continuing debates agree on the pivotal role of a symbiosis between an alphaproteobacterium and an Asgard archaeon2,3. However, the nature, timing and contributions of other potential bacterial partners4–6 and the role of interactions with viruses7–9 remain contentious. To address these questions, we used advanced phylogenomic approaches and comprehensive datasets spanning the known diversity of cellular life and viruses. Our analysis provided a revised reconstruction of the last eukaryotic common ancestor (LECA) proteome, in which we traced the phylogenetic origin of each protein family. We found compelling evidence for multiple waves of horizontal gene transfer from diverse bacterial donors, with some likely to have preceded mitochondrial endosymbiosis. We inferred plausible traits of the major donors and their functional contributions to the LECA. Our findings support a contribution of horizontal gene transfers to shaping the proteomes of pre-LECA ancestors and suggest a facilitating role of Nucleocytoviricota viruses. Taken together, our results suggest that ancient eukaryotes may have originated within complex microbial ecosystems through a succession of diverse associations that left a footprint of horizontally transferred genes. Phylogenomic reconstruction of the proteome of the last eukaryotic common ancestor sheds light on the origin of eukaryotes, indicating an important role of horizontal transfer of genes from diverse bacterial and viral donors.

23.
arXiv (CS.CV) 2026-06-18

VidCRAFT3: Camera, Object, and Lighting Control for Image-to-Video Generation

Controllable image-to-video (I2V) generation transforms a reference image into a coherent video guided by user-specified control signals. While precise control over camera motion, object motion, and lighting is essential for high-fidelity creation, existing methods often treat these factors independently. This overlooks the physical coupling among viewpoint, geometry, and illumination in dynamic scenes, leading to visual inconsistencies such as mismatched shadows and perspective drift under simultaneous changes. We present VidCRAFT3, a unified and flexible I2V framework that explicitly models cross-factor interactions among geometry, motion, and illumination, enabling both independent and joint control over camera motion, object motion, and lighting direction. Image2Cloud provides explicit 3D geometric priors for accurate camera motion control. ObjMotionNet encodes sparse object trajectories into multi-scale motion features to guide realistic object motion. A Spatial Triple-Attention Transformer integrates lighting direction through lighting cross-attention for consistent relighting. To address the scarcity of jointly annotated data, we construct the VideoLightingDirection (VLD) dataset with accurate per-frame lighting direction annotations, and introduce a three-stage progressive training strategy that enables robust learning without fully joint annotations. Extensive experiments demonstrate that VidCRAFT3 achieves state-of-the-art performance in control precision and visual coherence across diverse scenarios.

24.
arXiv (CS.AI) 2026-06-11

MPC-Patch-Bench: Security-Aware LLM Code Patch for Multi-Party Computation

arXiv:2606.11416v1 Announce Type: cross Abstract: Repository-level benchmarks for evaluating Large Language Model (LLM) code repair on Secure Multi-Party Computation (MPC) software do not yet exist, and directly transplanting general-purpose benchmarks such as SWE-bench fails on three structural fronts: (i) MPC repositories are dominated by generic Python infrastructure rather than cryptographic logic; (ii) high-value MPC fixes lack the standardized tests rigid extraction pipelines require; and (iii) standard fail-to-pass evaluation is insufficient for code that must also be cryptographically safe. MPC is increasingly deployed for privacy-preserving machine learning, biomedical collaboration, and secure analytics. Existing MPC-specific code-synthesis efforts cover only operator-level or single-framework tasks; evaluating LLM agents on real repository-level MPC repair instead demands MPC-aware data curation and a verifier matched to the security and numerical-fidelity guarantees MPC programs must obey neither of which existing benchmarks provide. We introduce MPC-Patch-Bench, a repository-level benchmark organised around two frameworks. (1)The Data Curation Framework combines a domain-specific curation agent that filters raw pull requests through three cryptographic layers with a human-AI completion engine that synthesizes missing problem statements and Fail-to-Pass/Pass-to-Pass tests, yielding 205 fully verified instances. (2)The MPC Verifier provides dedicated security and numerical-fidelity checks via dynamic differential testing against plaintext oracles and MPC-specific static analysis rules that flag unsafe reveals, insecure arithmetic, and illegal public/private casts. The strongest evaluated LLM functionally resolves only 22.9% of MPC-Patch-Bench tasks; the MPC Verifier further reduces verified resolution to 17.1%, with up to 40% of functionally-passing patches rejected for cryptographic or numerical-fidelity violations.

25.
arXiv (CS.AI) 2026-06-11

Preregistration for Experiments with AI Agents

arXiv:2606.11217v1 Announce Type: cross Abstract: The proliferation of large language models (LLMs) and autonomous AI agents has given rise to a rapidly growing methodological paradigm: "in silico" behavioral experiments. Originally conceived as a way to use AI agents as proxies for human participants in studies of cognition, decision-making, and social dynamics, this approach has taken on new significance – as AI agents increasingly negotiate, transact, and make consequential decisions on behalf of people and organizations, understanding their behavior has become a research priority in its own right. While these experiments with AI agents offer unprecedented advantages in terms of scalability, cost efficiency, and experimental control, they also inherit, and in some cases amplify, methodological vulnerabilities that have long plagued human subjects research. To address these issues, this paper argues that preregistration practices – central to improving the credibility of human subjects experiments – should now be extended to experiments with AI agents. We systematically catalog the researcher degrees of freedom that experiments with AI agents introduce – model selection, prompt wording, settings, and outcome-contingent redesign, for example – and show how the low cost of iteration and lack of reporting norms make these choices both easy to exploit and difficult to detect. We propose a preregistration template tailored to experiments with AI agents and call on conferences, journals, and funding agencies to make preregistration standard practice for this emerging research paradigm.