Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (math.PR) 2026-06-11

Capital Asset Pricing Model with Size Factor and Normalizing by Volatility Index

arXiv:2411.19444v5 Announce Type: replace-cross Abstract: The Capital Asset Pricing Model (CAPM) relates a well-diversified stock portfolio to a benchmark portfolio. We insert size effect in CAPM, capturing the observation that small stocks have higher risk and return than large stocks, on average. For some size-based stock portfolios, dividing their returns by the Volatility Index makes them closer to independent and normal. In this article, we combine these ideas to create a new discrete-time model, which includes volatility, relative size, and CAPM. We fit this model using real-world data, prove the long-term stability, and connect this research to Stochastic Portfolio Theory. We fill important gaps in our previous article on CAPM with the size factor.

02.
arXiv (CS.AI) 2026-06-16

Orchestrated Reality: From Role-Play to Living, Playable Game Worlds – LLM-Driven World Simulation as a Parameterized-Action POMDP

arXiv:2606.16014v1 Announce Type: cross Abstract: Many games rely on storytelling combined with systems that track levelling, NPC behaviour, and consequence simulation; bridging tightly-authored narrative with deeply-simulated worlds – most acute in sandbox and open-world settings – has been prohibitively expensive. LLM-driven worlds open a new path: a single harness can coordinate numerical state, narrative voice, storytelling pacing, and rule logic together. Realising this requires the LLM system to sustain a persistent world (who is where, what has just happened, what is currently true), which today's deployed systems do not: the narrative voice asserts state in free prose without any validated representation, so a fully autonomous game engine remains infeasible. We treat this as an architectural choice, not a limitation of language models, and report work in progress on a framework – orchestrated reality – that makes the world a canonical object owned by a singleton orchestration agent analogous to the tabletop-RPG Game Master (GM). We formalise an LLM-driven game world for a human player as a Parameterized-Action POMDP: state is a tree of canonical JSON entities, actions decompose as $a=(k, x_k)$ (a discrete intent kind plus structured JSON parameters), the agent observes only a narrative projection $o=O(s)$ of state, and the transition kernel $F$ is an LLM-driven Plan-Diff-Validate-Apply (PDVA) pipeline that commits schema-validated, content-hashed JSON deltas. We give the formal model, a JSON-state example, a worked single-turn example, and a catalogue of 15 illustrative incidents drawn from a real deployment showing the framework in action. Empirical validation through a planned human player study – together with multi-NPC concurrent agency and deployment as an RL environment – is situated as future work.

03.
arXiv (CS.AI) 2026-06-19

Process-Verified Reinforcement Learning for Theorem Proving via Lean

arXiv:2606.20068v1 Announce Type: new Abstract: While reinforcement learning from verifiable rewards (RLVR) typically has relied on a single binary verification signal, symbolic proof assistants in formal reasoning offer rich, fine-grained structured feedback. This gap between structured processes and unstructured rewards highlights the importance of feedback that is both dense and sound. In this work, we demonstrate that the Lean proof assistant itself can serve as a symbolic process oracle, supplying both outcome-level and fine-grained tactic-level verified feedback during training. Proof attempts are parsed into tactic sequences, and Lean's elaboration marks both locally sound steps and the earliest failing step, yielding dense, verifier-grounded credit signals rooted in type theory. We incorporate these structured rewards into a GRPO-style reinforcement learning objective with first-error propagation and first-token credit methods that balances outcome- and process-level advantages. Experiments with STP-Lean and DeepSeek-Prover-V1.5 show that tactic-level supervision outperforms outcome-only baselines in most settings, delivering improvements on benchmarks such as MiniF2F and ProofNet. Beyond empirical gains, our study highlights a broader perspective: symbolic proof assistants are not only verifiers at evaluation time, but can also act as process-level reward oracles during training. This opens a path toward reinforcement learning frameworks that combine the scalability of language models with the reliability of symbolic verification for formal reasoning.

04.
arXiv (CS.CL) 2026-06-16

Can Agents Read the Room? Benchmarking Visual Social Intelligence in Multimodal Simulation

Social interaction depends on both language and visible social signals, such as facial expressions, posture, gaze, and emotional shifts. Yet existing social-agent benchmarks are largely text-based and rarely test whether multimodal agents can use visual cues to guide interaction. We introduce \textsc{\benchmarkname{}}, a benchmark evaluating visual social intelligence in multimodal social simulation. It contains 240 scenarios, 585 role instances, and 2,340 role-task instances, combining aligned textual-visual evidence, structured role profiles, and four role-level tasks: expression task, characteristic task, interaction regulation task, and interaction outcome task. Evaluating seven recent MLLMs under verbalized-vision and direct-vision reveals a clear gap between local role enactment and interaction management: role-specific expression and conflict handling are near saturation, whereas interaction regulation and visually grounded outcome achievement remain substantially more difficult. The code is released at https://github.com/JunsWan/AgentViSS, and the dataset is available at https://huggingface.co/datasets/JunsWan/AgentViSS.

05.
arXiv (CS.CV) 2026-06-15

CottonLeafVision: An Explainable and Robust Deep Learning Framework for Cotton Leaf Disease Classification

Globally, cotton is a highly economically beneficial crop, as the textile industry heavily depends on it. So, the precise identification and detection of cotton leaf disease is crucial for economic stability. The development goal of "CottonLeafVision" is to accurately classify and detect cotton leaf disease. With this goal, we have evaluated multiple pretrained Deep Convolutional Neural Networks, including DenseNet201, InceptionV3, and VGG19 on a publicly available cotton leaf disease image dataset. This image dataset includes seven classes, six disease classes, and one healthy class, collected under various field conditions reflecting real-world challenges. Among these pretrained models, with DenseNet201, we have achieved the highest classification accuracy of 98%. To enhance the model reliability and interpretability, we have implemented different techniques and methods such as Gradient-weighted Class Activation Mapping (Grad-CAM), occlusion sensitivity analysis and adversarial training to increase the noise resistance of the model. Finally, we have developed a prototype in order to utilize the model's capabilities on real life agriculture. This paper shows the deep learning model's capabilities to classify the disease in real-life cotton disease management situations.

06.
arXiv (CS.LG) 2026-06-17

Edge Flow: A Tractable and Predictive Continuous-Time Model for Gradient Descent at the Edge of Stability

Authors:

arXiv:2606.18080v1 Announce Type: new Abstract: Gradient descent in deep learning may operate at the edge of stability (EoS), a regime in which the largest eigenvalue of the loss Hessian hovers near the stability threshold $2/\eta$, where $\eta$ is the learning rate. Classical analysis tools such as gradient flow and the descent lemma do not apply here, motivating the search for a continuous-time model valid at EoS. We propose Edge Flow, a system of three coupled ordinary differential equations that provides a tractable, faithful, and predictive model of gradient descent dynamics at EoS. Edge Flow decomposes the dynamics into a center, an oscillation direction, and an oscillation magnitude. The center follows a modified gradient flow on a symmetrized loss; the direction tracks a top eigenvector of the Hessian via Rayleigh quotient dynamics; and the magnitude grows or decays exponentially depending on whether the sharpness exceeds or falls below the threshold $2/\eta$. Crucially, sharpness stabilization emerges from the coupled dynamics via a self-stabilization feedback loop. Discretizing Edge Flow only requires two gradient evaluations and one Hessian–vector product at each iteration. We demonstrate empirically that Edge Flow tracks the dynamics of gradient descent at least as faithfully as previously proposed continuous-time EoS models, while in addition resolving the oscillation of the sharpness at the onset of EoS, and that it provides a principled framework for understanding and mitigating instabilities in this regime.

07.
arXiv (CS.AI) 2026-06-17

Statistical Foundations of LLM-based A/B Testing: A Surrogacy Framework for Human Causal Inference

arXiv:2606.17165v1 Announce Type: cross Abstract: Organizations and researchers show increasing interest in using large language models (LLMs) in place of human participants in A/B tests, in the hope of experimenting faster and at lower cost. We study when a treatment effect estimated on LLM outcomes recovers the effect that would have been measured on the human population of interest. Distributional equivalence between LLM and human outcomes would make any standard estimator valid but is unrealistic. We therefore develop a statistical framework that adapts surrogate endpoint theory to LLMs. The framework shows that calibrating LLM outcomes to human outcomes identifies the average treatment effect under surrogacy and comparability conditions that are jointly weaker than distributional equivalence. When these conditions fail, the effect of interest is only partially identified, and we provide diagnostics that can falsify surrogacy on historical experiments together with a bound on the worst-case bias from limited overlap. We further show that the stochasticity inherent to LLMs introduces both bias and variance, but using an average of multiple draws as the surrogate mitigates both. We illustrate the methods and theory in simulations and an application to A/B tests on Upworthy headlines. A central takeaway from our work is that the validity of LLM outcomes as surrogates can only be falsified for past treatments and never verified for new ones, so human experiments remain indispensable for novel interventions. We discuss the role of LLM choice, prompting, and temperature as design variables, and how to size human experiments for validation.

08.
arXiv (CS.CL) 2026-06-19

What sentiment analysis can't see: Measuring whether customers were helped, and what went wrong, across 70,000 support conversations

Most companies read their customer support data at scale using sentiment analysis, which measures how customers sound rather than whether they were satisfied with the result. We tested a richer alternative on 70,450 support conversations from a leading online fundraising platform: alongside tone, we used GPT-5.4 to estimate each customer's satisfaction and to flag whether they reported a concrete problem, then validated all three readings against the 1-to-5 ratings customers left on the conversations they rated. The satisfaction estimate tracked those ratings far better than sentiment did, correlating at 0.47 against 0.36 and flagging unhappy customers with far fewer false alarms. The structured read also sees what sentiment cannot: tone and satisfaction disagree in 44% of conversations, a single "Neutral" label hides everything from quietly satisfied customers to ones who quietly gave up, and the largest group of all is "tolerated friction," customers who are satisfied but still reporting a fixable problem, a standing issue that no sentiment-based dashboard can surface. The broader finding is that LLM-based annotation can capture far more than the tonality of a customer's language, offering strong potential for new business metrics grounded instead in the customer's state (whether they were satisfied) and the cause of their problem extracted directly from the raw textual data of interactions and feedback.

09.
arXiv (quant-ph) 2026-06-12

Vacuum photon emission and mean electromagnetic field in pair-creating external backgrounds

arXiv:2606.12547v1 Announce Type: cross Abstract: We develop a perturbative description of vacuum radiative processes in quantum electrodynamics with a prescribed external electromagnetic background capable of producing electron-positron pairs. Since the initial vacuum is then unstable and the in- and out-vacua are inequivalent, radiative observables require a real-time formulation beyond the ordinary in-out approach of vacuum-stable QED. Using the Keldysh-Schwinger-Fradkin nonequilibrium technique, we derive the mean number density of emitted photons through the second nonvanishing order in the fine-structure constant. The leading term, of order $\alpha$, reproduces the known vertex and tadpole mechanisms, while the complete order-$\alpha^2$ correction contains interference, loop, and induced-current contributions. We also give an independent derivation based on the spectral decomposition of the identity operator in the in-Fock space, where the photon number density is represented as a sum of squared transition amplitudes and vacuum-disconnected terms are canceled by the optical theorem generalized to an unstable vacuum. In addition, we compute the mean electromagnetic field through order $e^3$, including the electromagnetic dressing of the induced vacuum current, and verify it using the corresponding Schwinger-Dyson equations. The final formulas are expressed in terms of exact solutions and propagators of the Dirac equation in the external background and apply to general spacetime-dependent field configurations.

10.
arXiv (CS.CV) 2026-06-24

Cyclic Denoising Reveals Ultrastable Memories in Diffusion Models

We introduce cyclic denoising – repeated forward and reverse diffusion at controlled noise amplitudes – as an extraction attack for image diffusion models. Inspired by random organization in disordered solids, cyclic denoising exposes regions of the learned distribution that are largely inaccessible to standard sampling. The dynamics drive samples toward attractors with a broad stability spectrum. The deepest attractors are ultrastable: they regenerate after near-total corruption and persist through thousands of noising-denoising cycles. Many of these attractors correspond to memorized training images, including stock photographs, brand watermarks, and web-crawl artifacts. The attack requires only sampler-level control, with no gradients, weight inspection, prompts, captions, or prior knowledge of the training data. Unlike generate-and-filter attacks, which rely on large-scale prompted generation and post-hoc similarity or membership-inference filtering, our main protocol is fully unconditioned. We demonstrate the phenomenon in Stable Diffusion v1.4 and in a pixel-space DDPM, showing consistent behavior across latent- and pixel-space diffusion models. Across noise amplitudes, we observe a yielding-like transition: low-amplitude cycling produces trivial absorbing fixed points or limit cycles, while larger amplitudes induce rearrangements, basin hopping, and long-lived trapping in structured memorized attractor basins. We also observe hierarchical partial absorption, prompt-stabilized basins, and cross-initial-condition universality of the recovered attractor set. Our results therefore show that cyclic denoising is both a physics-inspired probe of generative landscapes and a practical tool for memorization auditing, with implications for privacy, copyright compliance, and model fingerprinting.

11.
arXiv (CS.CL) 2026-06-16

Long-Context Modeling via GSS-Transformer Hybrid Architecture with Learnable Mixing

Modeling long-range dependencies remains a central challenge in natural language processing. Transformer architectures achieve strong performance via self-attention but scale quadratically ($O(N^2)$) with sequence length, while State Space Models (SSMs) scale linearly ($O(N)$) but suffer from a selective recall bottleneck, struggling to retrieve precise information from compressed states. This creates a fundamental tradeoff between efficiency and perplexity. To tackle these challenges, we propose the Parallel Hybrid Architecture (PHA), which runs Gated State Spaces (GSS), Grouped Query Attention (GQA), and Feed-Forward Networks (FFNs) as independent parallel branches fused by a learnable mixing mechanism. Instead of forcing SSMs to approximate attention or serializing the two paradigms, PHA allows each branch to specialize: GSS captures global context, while attention performs selective retrieval, with FFN providing complementary processing. On WikiText-103, PHA achieves 16.51 PPL at 125M parameters, outperforming Hedgehog (16.70) and H3-125M (23.70). Scaling to 180M parameters yields 16.42 PPL, which gives comparable results with the pure attention baseline while delivering 24\% higher throughput and up to 40\% lower memory usage at long contexts. On OpenWebText, our 125M model achieves 19.72 PPL, outperforming standard Transformers (20.60) and GSS hybrid baselines (19.80). These results demonstrate that separating sequence modeling paradigms into parallel specialists enables Transformer-level perplexity with substantially improved efficiency for long-context language modeling.

12.
arXiv (CS.CL) 2026-06-16

Beyond Text-to-SQL: An Agentic LLM System for Governed Enterprise Analytics APIs

Enterprise analytics aims to make organizational data accessible for decision-making, yet non-technical users still face barriers when using traditional business intelligence tools or Text-to-SQL systems. While recent Text-to-SQL approaches based on Large Language Models (LLMs) promise natural language access to structured data, they fall short in enterprise settings where analytics pipelines rely on governed APIs rather than raw databases. In practice, these APIs encapsulate complex business logic to ensure consistency, auditability, and security. However, delegating mathematical or aggregation logic to an LLM introduces reliability and compliance risks. To this end, we present Analytic Agent, an LLM-based agentic system that translates natural language intents into secure interactions with enterprise analytics APIs. Evaluated on 90 real enterprise use cases constructed by domain experts, it reliably interprets user goals, validates permissions, executes governed queries, and generates compliant visualizations through multi-step reasoning and policy-aware orchestration.

13.
arXiv (CS.AI) 2026-06-16

ROSA-RL: Uncertainty-Aware Roundabout Optimized Speed Advisory with Reinforcement Learning

arXiv:2606.16558v1 Announce Type: new Abstract: Roundabouts challenge automated driving in mixed traffic, as heterogeneous and non-deterministic human behavior, unknown driving intentions, and high interaction complexity create uncertainty about whether the conflict zone will be blocked or available at the moment of entry. We present ROSA-RL – uncertainty-aware Roundabout Optimized Speed Advisory with Reinforcement Learning. It enables safe and efficient roundabout entry for automated and human-driven vehicles in mixed traffic through probabilistic conflict forecasting. A Transformer-based model predicts conflict zone occupancy over a five-second horizon, capturing multi-agent interactions to anticipate upcoming conflicts and available gaps. The prediction outputs encode uncertainty in future motion and intent, and augment the state of a classical RL framework, enabling uncertainty-aware speed coordination. Evaluated in simulations grounded in real-world data, ROSA-RL can effectively handle uncertainty and outperform a comparable model-based baseline, closing the gap to an ideal setting assuming fully known occupancy while improving traffic efficiency and safety. The source code of this work is available under: github.com/urbanAIthi/ROSA-RL.

14.
arXiv (CS.CL) 2026-06-11

Grammar-Constrained Decoding Can Jailbreak LLMs into Generating Malicious Code

Large Language Models (LLMs) are increasingly used for code generation, raising concerns that they may be misused to produce malicious code. Meanwhile, Grammar-Constrained Decoding (GCD) has been widely adopted to improve the reliability of LLM-generated code by enforcing syntactic validity. In this paper, we reveal a counterintuitive risk: this reliability-oriented technique can itself become an attack surface. We uncover a new jailbreak attack, termed CodeSpear, that exploits GCD to induce LLMs into generating malicious code. Our experiments show that simply applying a benign code grammar constraint can effectively jailbreak LLMs. To address this vulnerability, we propose CodeShield, a safety alignment approach that robustly preserves safe behavior even under attacker-controlled grammar constraints. CodeShield aligns the model in the code modality by teaching it to generate honeypot code under GCD. Such code is semantically harmless, so it does not implement the malicious request, and structurally diverse, so it is difficult to suppress through grammar tightening. At the same time, CodeShield still preserves natural-language refusals when natural language is available. Experiments on 10 popular LLMs across 4 benchmarks show that CodeSpear outperforms representative jailbreak baselines and increases the attack success rate by more than 30 percentage points on average. CodeShield also restores safety under CodeSpear while preserving benign utility. Our findings reveal a fundamental risk of GCD and call for greater attention to its potential security implications.

15.
arXiv (CS.CL) 2026-06-24

What's Missing in Vision-Language Models? Probing Their Struggles with Causal Order Reasoning

Despite the impressive performance of vision-language models (VLMs) on downstream tasks, their ability to understand and reason about causal relationships in visual inputs remains unclear. Robust causal reasoning is fundamental to solving complex high-level reasoning tasks, yet existing benchmarks often include a mixture of reasoning questions, and VLMs can frequently exploit object recognition and activity identification as shortcuts to arrive at the correct answers, making it challenging to truly assess their causal reasoning abilities. To bridge this gap, we introduce VQA-Causal and VCR-Causal, two new benchmarks specifically designed to isolate and rigorously evaluate VLMs' causal reasoning abilities. Our findings reveal that while VLMs excel in object and activity recognition, they perform poorly on causal reasoning tasks, often only marginally surpassing random guessing. Further analysis suggests that this limitation stems from a severe lack of causal expressions in widely used training datasets, where causal relationships are rarely explicitly conveyed. We additionally explore fine-tuning strategies with hard negative cases, showing that targeted fine-tuning can improve model's causal reasoning while maintaining generalization and downstream performance. Our study highlights a key gap in current VLMs and lays the groundwork for future work on causal understanding.

16.
arXiv (CS.CL) 2026-06-24

Removing Noise, not Finding Gold: Quality Filtering for Large-Scale Pretraining

Large-scale models are pretrained on massive web-crawled datasets containing documents of mixed quality, making data filtering essential. A popular method is Classifier-based Quality Filtering (CQF), which trains a binary classifier to distinguish between pretraining data and a small, high-quality set. It assigns each pretraining document a quality score defined as the classifier's score and retains only the top-scoring ones. We provide an in-depth analysis of CQF. We show that while CQF improves downstream task performance, it does not necessarily enhance language modeling on the high-quality dataset. We explain this paradox by the fact that CQF implicitly filters the high-quality dataset as well. We further compare the behavior of models trained with CQF to those trained on synthetic data of increasing quality, obtained via random token permutations, and find starkly different trends. Our results challenge the view that CQF captures a meaningful notion of data quality.

17.
arXiv (CS.LG) 2026-06-15

Decoupled Latent Optimization of Diffusion Models for Full Waveform Inversion

arXiv:2606.14139v1 Announce Type: new Abstract: Full waveform inversion (FWI) recovers subsurface velocity from seismic recordings by solving a severely ill-posed, nonconvex PDE-constrained optimization. Classical regularizers stabilize the inversion but fail to reproduce realistic geological structures; recent diffusion-prior methods improve realism at the cost of a fragile trade-off between data fidelity and prior consistency. We propose Decoupled Latent Optimization (DLO), which relaxes the standard latent-optimization formulation into a quadratic-penalty objective over an auxiliary physical variable and a latent variable. The data-fidelity gradient acts in physical space, the diffusion sampler contributes only through a decoded prior sample, and the standard smoothed-velocity initialization of classical FWI is preserved. On the OpenFWI benchmark, DLO outperforms classical regularizers and existing diffusion-based methods under clean, noisy, and missing-trace acquisitions. The prior, trained on 70*70 OpenFWI models, transfers directly to the Marmousi and Overthrust benchmarks, where DLO recovers intricate fault structures and remains robust to initialization smoothing and measurement noise.

18.
Nature (Science) 2026-06-24

A <i>Streptomyces</i> megacluster encodes synergistic biotin-targeting antibiotics

Authors:

Natural products remain a major source of antibiotics, but discovery efforts have traditionally treated biosynthetic gene clusters as sources of individual bioactive molecules1–5. Increasing evidence has suggested that microorganisms can instead encode coordinated multi-metabolite systems, yet the genetic architectures and biological logic of such systems remain poorly understood6–12. Here we show that Streptomyces spp. encode a highly conserved biosynthetic megacluster that produces four structurally distinct natural product families—stravidins, acidomycin, dapamycins, and 2-methyl-7-keto-8-aminopelargonic acid (α-Me-KAPA)—alongside the biotin-binding protein streptavidin. These components converge on bacterial biotin metabolism through complementary mechanisms, including enzyme inhibition, prodrug activation, cofactor mimicry and biotin sequestration. The encoded metabolites are co-produced and act synergistically across Gram-negative and mycobacterial species, with stravidin S2 and α-Me-KAPA showing enhanced efficacy in combination in a mouse model of multidrug-resistant Escherichia coli infection. This megacluster reveals a genetically encoded chemical arsenal that functions as a naturally evolved combination therapy against a conserved metabolic pathway. More broadly, our findings suggest that higher-order biosynthetic architectures may represent an overlooked reservoir of antibiotic mechanisms and support a shift from discovering isolated natural products to reconstructing native synergistic systems. In Streptomyces spp., a conserved biosynthetic gene megacluster produces an arsenal of distinct antimicrobials that converge on bacterial biotin biosynthesis as a naturally evolved combination therapy.

19.
arXiv (CS.CV) 2026-06-11

Vision Transformers for Face Recognition Need More Registers

Recent advances in Vision Transformers (ViTs) for face recognition (FR) have moved beyond the standard CLS-token paradigm. In this paradigm, a special classification token (CLS) is prepended to the patch embeddings and used as a representation of the input for downstream tasks. An alternative approach, Concatenated Patch Embeddings (CPE), instead leverages all patch tokens by concatenating them into a single vector, which is then projected into a compact face representation. CPE has been shown to improve recognition performance in comparison to CLS-based ones, but our qualitative analysis of attention maps showed the presence of artifacts that limit their interpretability. To address this issue, we incorporate register tokens, learnable tokens concatenated to the initial patch embeddings, and processed jointly through the ViT encoder blocks. This mechanism has been shown to produce more structured and interpretable attention maps compared to baseline ViT. We empirically demonstrate that these artifacts consistently appear across various ViT backbones, including small and large models, and that introducing register tokens effectively mitigates them. Adding four or eight registers significantly enhances interpretability, with eight registers providing the highest verification accuracies and smoothest attention structures. Our resulting model, ViT-8R, corresponds to a CPE-based ViT-B architecture augmented with eight register tokens achieves state-of-the-art performance among ViT-based FR models on large-scale IJB-B and IJB-C benchmarks. Also, ViT-8R produces substantially clearer attention maps compared with the baseline model, which offer deeper insight into the model's attention behavior (https://github.com/TaharChettaoui/ViT-FR-Registers)

20.
arXiv (CS.CL) 2026-06-16

A Practical Evaluation Method for Long-Form Simultaneous Speech-to-Speech Translation

Simultaneous speech-to-speech translation (SimulS2ST) enables real-time cross-lingual communication, but existing evaluation has focused largely on short or pre-segmented speech rather than long-form, continuous input. Prior approaches are difficult to reproduce and make assumptions that do not hold for end-to-end systems. We present a practical evaluation method for long-form SimulS2ST. Given source speech, pre-segmented source transcripts, and reference translations, we run automatic speech recognition (ASR) and forced alignment on the generated target speech to recover token-level timestamps, then apply a sentence-embedding-based aligner to match the target text to its corresponding source sentences. This enables sentence-level computation of latency and quality metrics, including YAAL and xCOMET, which are then aggregated into final system-level scores. Experiments on representative SimulS2ST systems show that the method is effective in practice and reveal that current systems suffer from substantial latency accumulation on long speech.

21.
arXiv (CS.CL) 2026-06-15

Multi-component Causal Tracing in Large Language Models

Causal tracing systematically intervenes on a large language model's (LLM's) internal representations to uncover and quantify the causal pathways linking specific inputs or computations to specific metrics of interest, quantifying the LLM's behavior. Building on previous single-component or single-layer studies, this paper presents a unified framework for causally tracing multiple components simultaneously. This framework systematically identifies the subsets of components (e.g., attention heads and multi-layer perceptron neurons) most critical to a desired target performance metric (e.g., accuracy and fairness). This is achieved by incorporating flexible interventions applied to a wide range of desired metrics. To address the combinatorial complexity of the multi-component problem, an efficient algorithm is designed that leverages soft interventions and a carefully designed metric transformation, converting the combinatorial search problem into a continuous one that can be solved efficiently under proper constraints, thereby generating proper binary decisions for selecting components. Experimental results demonstrate that the proposed method efficiently identifies subsets of the model's components that have a high impact on the target metric, outperforming existing baseline approaches. Our code is available at https://github.com/ZiruiYan/multi-component-causal-tracing.

22.
arXiv (CS.AI) 2026-06-16

Protein Design with Agent Rosetta: A Case Study for Specialized Scientific Agents

arXiv:2603.15952v2 Announce Type: replace Abstract: Large language models (LLMs) are capable of emulating reasoning and using tools, creating opportunities for autonomous agents that execute complex scientific tasks. Protein design provides a natural testbed: although machine learning (ML) methods achieve strong results, these are largely restricted to canonical amino acids and narrow objectives, leaving unfilled need for a generalist tool for broad design pipelines. We introduce Agent Rosetta, an LLM agent paired with a structured environment for operating Rosetta, the leading physics-based heteropolymer design software, capable of modeling non-canonical building blocks and geometries. Agent Rosetta iteratively refines designs to achieve user-defined objectives, combining LLM reasoning with Rosetta's generality. We evaluate Agent Rosetta on design with canonical amino acids, matching specialized models and expert baselines, and with non-canonical residues – where ML approaches fail – achieving comparable performance. Critically, prompt engineering alone often fails to generate Rosetta actions, demonstrating that environment design is essential for integrating LLM agents with specialized software. Our results show that properly designed environments enable LLM agents to make scientific software accessible while matching specialized tools and human experts.

23.
arXiv (CS.CV) 2026-06-17

MOCHI: Motion Enhancement of Collaborative Human-object Interactions

Collaborative human-object interaction shows dynamic and complex movements that require mutual anticipation and continuous adjustment between participants and the shared object. Modeling such collaborative multi-human object interaction (MHOI) scenarios requires high-quality data acquisition as a foundational step; however, this is challenging due to the inherent complexity of MHOI where human-human and human-object interactions occur simultaneously. Such complexity leads to noisy MHOI captures characterized by several artifacts: contact misalignment between hands and objects, motion jitter and temporal inconsistencies in the captured sequences, and missing or incomplete finger-level articulation details. To address these challenges, we present MOCHI (MOtion Enhancement of Collaborative Human-object Interactions), a two-stage framework for enhancing noisy MHOI data. Our approach first generates physically plausible hand grasps through optimization from noisy body input, producing grasps that are both physically plausible and semantically consistent with the body pose, where these optimized grasps are extended into complete hand-object interaction sequences. Consequently, the full-body motion for all participants are refined through a diffusion-based noise optimization framework that uses single-person motion priors. During the optimization process, we introduce optimization objectives to encode human-object and human-human interaction information within these single-person priors. Experimental results demonstrate the effectiveness of our pipeline across diverse MHOI data, either acquired by existing capture methods or synthesized by generative models. We further show robustness of our system across varying numbers of participants and types of interactions, and demonstrate various applications including keyframe-based MHOI creation and data augmentation through varying object geometries.

24.
arXiv (quant-ph) 2026-06-12

Coarse-grained quantum thermodynamics: Observation-dependent quantities, observation-independent laws

arXiv:2507.15918v2 Announce Type: replace Abstract: In both classical and quantum thermodynamics, physical quantities are typically assigned objective values defined independently of our observations. We then refer to the 'work performed by a gas', or the 'entropy of the gas', regardless of how they are evaluated. Here, we question this conception in the context of quantum thermodynamics, estimating how the definition of pivotal thermodynamic quantities is affected by experimental instruments of limited precision. We find that the coarse-grained thermodynamic quantities frequently lead to different conclusions from those drawn in fine-grained scenarios. For instance, the irreversibility of a process, or its work payoff, can significantly vary with the instrument precision. We show nonetheless that coarse-grained thermodynamic quantities satisfy the same relations (i.e., the second law inequality, the relation between dissipation and distinguishability of a process from its time-reverse, and the quantum work fluctuation theorems) as their fine-grained counterparts. These results highlight the observation-independence of relations linking thermodynamic quantities which are themselves observation-dependent.

25.
PLOS Medicine 2026-05-15

Spatial transcriptomic-metabolic features of tumor foci and tumor capsule in microvascular invasion with hepatocellular carcinoma: A spatial multi-omics study

Authors:

by Zhi-Hui Luo, Na Wang, Jingwei Zhao, Fei Long, Si Wu, Wei Zhong, Wei-Ming Chen, Bicheng Wang, Kun Wang, Yufeng Yuan, Jingjiao Zhou, Chunhui Yuan, Fubing Wang Background Microvascular invasion (MVI) is closely related to the recurrence and metastasis of hepatocellular carcinoma (HCC), but the underlying cellular mechanism remains largely elusive. This study aims to elucidate the regional cellular discrepancy between MVI-positive (MVI+) and MVI-negative (MVI−) HCC by integrating Spatial transcriptomics (ST) and spatial metabolomics (SM). Methods and findings ST and SM were performed on six tissue samples from four patients (including 2 MVI+, 2 MVI−, and 2 paratumor tissues), with the integration of 79 public single-cell RNA sequencing datasets of HCC. Patient identity was used as a covariate in the linear equation for regional differentially expressed gene analysis with the ST data. Clinical validation was conducted through multiplex immunofluorescence staining in 79 patients, together with external validation in the cancer genome atlas (TCGA)-liver hepatocellular carcinoma (LIHC) cohort (n = 299) and an independent microarray dataset (n = 62). For cell-type-specific metabolic profiling, spatial transcriptomic-metabolic registration was performed. The functional roles of key metabolites were further validated in vitro using inflammatory cancer-associated fibroblasts (iCAFs) derived from hepatic stellate cells (HSCs) and primary CAFs through co-culture models and various functional assays assessing cell proliferation, migration, and invasion. In the tumor lesion, a malignant STMN1+HMGN2+GPC3+ cell subtype enriched in MVI+ HCC was identified, which exhibited enhanced proliferative activity and was associated with poor prognosis. This finding was further confirmed in a local cohort of 79 patients, where multiplex immunofluorescence staining for the three genes (STMN1, HMGN2, and GPC3) showed significantly higher expression in the MVI+ group than in the MVI− group (p = 0.046). Integrated SM analysis further revealed that this cell population underwent metabolic reprogramming characterized by suppressed glycerolipid metabolism. In the tumor capsule, iCAFs-related genes were downregulated in MVI+ cases, and iCAFs were located distally from the tumor boundary. Spatial metabolite mapping showed a strong correlation between taurine and iCAFs, and functional assays demonstrated that taurine promotes HCC proliferation and migration by suppressing iCAF activity. One limitation of this study is the small sample size of spatial omics data, which hinders a more complete molecular functional analysis of the STMN1+HMGN2+GPC3+ cell subtype and iCAFs in MVI+ HCC. Larger-scale ST cohorts are required to further validate and expand the findings of this study. Conclusions This integrative spatial atlas proposes a hypothesis that there exists a highly proliferative and metabolically reprogrammed malignant cell subtype in the tumor lesion of MVI+ HCC, and that taurine in the tumor capsule modulates iCAF activity to influence tumor progression. The exploratory results provide mechanistic insights into MVI-related HCC progression and offer potential avenues for targeted therapeutic intervention of MVI+ HCC.