论文广场 - AcademicHub

01.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2509.18588

UniECG: Understanding and Generating ECG in One Unified Model

作者:

Jiarui Jin ↗Haoyu Wang ↗Xiang Lan ↗Jun Li ↗Hongyan Li ↗Shenda Hong ↗

Electrocardiogram (ECG) interpretation is a fundamental skill in medical education, yet students often need more than static examples to connect waveform evidence with diagnostic reasoning. This paper presents UniECG as a step toward interactive ECG education. UniECG supports two complementary learning interactions: given an ECG signal or image, it generates an evidence-based explanation; given a textual learning objective, it generates a corresponding ECG signal example for case-based learning. The model follows a two-stage design. First, it learns grounded ECG explanation from ECG signal–image–text data. Second, it introduces special ECG generation tokens and aligns their hidden representations with a pretrained text-conditioned ECG diffusion model, enabling controllable signal-level ECG generation. We evaluate UniECG through grounded ECG explanation and generation-oriented qualitative analysis, examining its potential to support explanation and case-based learning. UniECG is intended as an educational aid and a research step toward interactive AI-assisted ECG learning, rather than a clinically validated diagnostic system.

阅读与讨论 → 访问原文 →

02.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2508.07797

Power Battery Detection

作者:

Xiaoqi Zhao ↗Peiqian Cao ↗Chenyang Yu ↗Zonglei Feng ↗Lihe Zhang ↗Hanqi Liu ↗Jiaming Zuo ↗Youwei Pang ↗Jinsong Ouyang ↗Weisi Lin ↗Georges El Fakhri ↗Huchuan Lu ↗…

Power batteries are essential components in electric vehicles, where internal structural defects can pose serious safety risks. We conduct a comprehensive study on a new task, power battery detection (PBD), which aims to localize the dense endpoints of cathode and anode plates from industrial X-ray images for quality inspection. Manual inspection is inefficient and error-prone, while traditional vision algorithms struggle with densely packed plates, low contrast, scale variation, and imaging artifacts. To address this issue and drive more attention into this meaningful task, we present PBD5K, the first large-scale benchmark for this task, consisting of 5,000 X-ray images from nine battery types with fine-grained annotations and eight types of real-world visual interference. To support scalable and consistent labeling, we develop an intelligent annotation pipeline that combines image filtering, model-assisted pre-labeling, cross-verification, and layered quality evaluation. We formulate PBD as a point-level segmentation problem and propose MDCNeXt, a model designed to extract and integrate multi-dimensional structure clues including point, line, and count information from the plate itself. To improve discrimination between plates and suppress visual interference, MDCNeXt incorporates two state space modules. The first is a prompt-filtered module that learns contrastive relationships guided by task-specific prompts. The second is a density-aware reordering module that refines segmentation in regions with high plate density. In addition, we propose a distance-adaptive mask generation strategy to provide robust supervision under varying spatial distributions of anode and cathode positions. The source code and datasets will be publicly available at \href{https://github.com/Xiaoqi-Zhao-DLUT/X-ray-PBD}{PBD5K}.

阅读与讨论 → 访问原文 →

03.

medRxiv (Medicine) 2026-06-10 DOI: HASH:430f39b6646b957b0b40bc4a6d3739bf

A risk-of-contagion index using a Bayesian based model for the COVID-19 epidemic in Mexico

作者:

Corona-Moreno ↗Acuna-Zegarra ↗M. A ↗Santana-Cibrian ↗Velasco-Hernandez ↗J. X ↗

During the COVID-19 pandemic, limited testing capacity and reporting delays complicated epidemic surveillance and decision-making in Mexico. We calibrated textit{covidestim}, a Bayesian nowcasting model, to estimate the total SARS-CoV-2 infections from reported cases and deaths using Mexican surveillance data. Disease-progression distribution priors were calibrated using Mexico City records and validated through comparisons with national seroprevalence surveys, hospitalization data, and annual reported severe-case rates across all states. Using the reconstructed estimates of active infections, we implemented an event-based risk framework that quantifies the probability of encountering at least one infectious individual in gatherings of different sizes. This probability was subsequently translated into a four-level epidemiological traffic-light indicator and computed at both state and municipality levels. The resulting estimates revealed substantial spatial heterogeneity that is obscured by state-level aggregation, particularly in states with marked differences between urban and rural municipalities. To evaluate consistency with public-health indicators, we compared the proposed risk classification with the official Mexican epidemiological traffic-light system, considering interpretable gathering sizes relevant to public-health decision making. Weekly reports derived from this framework were delivered to policymakers in the State of Queretaro in Mexico, as an anticipation tool for school reopening and public-space management. This demonstrates that this Bayesian reconstruction of infections combined with event-based risk metrics can provide an interpretable and generalizable municipality-level complement to routine surveillance systems, particularly in regions with limited testing capacity and heterogeneous local transmission dynamics.

阅读与讨论 → 访问原文 →

04.

arXiv (CS.AI) 2026-06-12 DOI: arXiv:2606.13241

Brick: Spatial Capability Routing for the Mixture-of-Models (MoM) Paradigm

作者:

Francesco Massa ↗Marco Cristofanilli ↗

arXiv:2606.13241v1 Announce Type: new Abstract: Defining query difficulty is one of the hardest problems in deployment engineering. Existing LLM routers rely on surface features such as domain labels, keywords, and token count, ignoring the within-domain variance that actually determines model success. Frontier models cost ten to one hundred times more than local open-weight models, so at production scale even small per-request savings become a direct cloud-bill lever. We present Brick, a multimodal router that scores each model on six capability dimensions, combines this with a per-query difficulty estimate, and dispatches via a cost-penalized geometric rule. A continuous preference knob lets operators slide between max-quality and max-saving profiles at deploy time. On a benchmark of 5,504 queries, Brick at max-quality reaches 76.98% accuracy, beating the best single model (75.02%) and all tested routers. At a neutral cost-quality profile, Brick achieves 74.11% accuracy at 4.71x lower cost than always using the strongest model. At min-cost, it cuts cost 22.15x with 11.85 points accuracy loss. Median latency drops from 51.2s to 22.8s.

阅读与讨论 → 访问原文 →

05.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2606.11770

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning

作者:

Chao Lei ↗Yanbei Jiang ↗Markus Hiller ↗Zhijian Zhou ↗Xunye Tian ↗Krista A. Ehinger ↗Nir Lipovetzky ↗

arXiv:2606.11770v1 Announce Type: new Abstract: Spatial reasoning remains a challenge for Multimodal Large Language Models (MLLMs), as it requires reliable multi-hop inference over both intermediate states and state transitions. Current studies often leave intermediate states unverified and treat state transitions as implicit processes, which limits reliability in multi-hop spatial reasoning. To address this, we propose State-aware Visualization-of-Thought (SVoT), a reinforcement learning framework that generates interleaved, verifiable intermediate states and visualizations. SVoT integrates transition reasoning chains into the generation processes, enabling the model to verify action preconditions and effects through interleaved textual and visual reasoning. We train SVoT via Group Relative Policy Optimization (GRPO), instantiating verification through reward design and evaluating the efficacy of different fine-grained rewards. As existing benchmarks reduce state transitions to single-variable updates, substantially simplifying the problems, we establish five domains by extending classical environments and introducing two novel domains, Pacman and Gather, that require multi-object interactions and numerical reasoning. These domains support systematic evaluation of multi-hop spatial reasoning with quantitative verification of generated intermediate states and transition reasoning. SVoT with transition-aware supervision achieves state-of-the-art performance across the introduced domains, yielding up to a 65% absolute accuracy gain on out-of-distribution test sets.

阅读与讨论 → 访问原文 →

06.

arXiv (CS.CL) 2026-06-16 DOI: arXiv:2606.15345

Beyond Monolingual Deep Research: Evaluating Agents and Retrievers with Cross-Lingual BrowseComp-Plus

作者:

Yuheng Lu ↗Qingcheng Zeng ↗Heli Qi ↗Puxuan Yu ↗Fuheng Zhao ↗Rui Yang ↗Hitomi Yanaka ↗Naoto Yokoya ↗Weihao Xuan ↗

Deep research agents are increasingly evaluated on their ability to search for evidence, reason over retrieved sources, and produce grounded answers. Existing browsing benchmarks, however, largely assume that the user's query and the supporting evidence are written in the same language, leaving open whether agentic search systems can operate when relevant evidence appears in another language. We introduce XBCP (Cross-lingual BrowseComp-Plus), a controlled benchmark that preserves the English question-and-answer space of BrowseComp-Plus but varies the languages of the supporting documents. XBCP instantiates two complementary settings: in the cross-lingual setting, each query is paired with evidence in a single assigned language. In the multilingual setting, the full evidence corpus is distributed equally and randomly across 12 languages spanning high-resource and low-resource regimes. We evaluate four deep research agents using sparse and dense multilingual retrievers, measuring answer accuracy, evidence recall, search behavior, calibration, citation fidelity, and oracle retrieval. Results reveal substantial degradation when evidence is translated. Even strong, dense retrievers lose evidence recall, and agents become less calibrated and cite evidence less reliably. Notably, accuracy remains lower even when all gold evidence is supplied directly. These findings suggest that cross-lingual deep research exposes both retrieval failures and an independent, agent-side difficulty in integrating language-mismatched evidence.

阅读与讨论 → 访问原文 →

07.

arXiv (CS.CV) 2026-06-16 DOI: arXiv:2606.16193

Cascaded Sparse Autoencoders Learn Multi-Level Visual Concepts in Multimodal LLMs

作者:

Yusong Zhao ↗Hengyi Wang ↗Tanuja Ganu ↗Akshay Nambi ↗Hao Wang ↗

Multimodal Large Language Models (MLLMs) have demonstrated strong performance on vision-language tasks, yet their internal visual representations remain difficult to interpret. Sparse Autoencoders (SAEs) provide a scalable way to decompose dense model activations into sparse, interpretable features. However, existing SAE architectures primarily recover flat feature dictionaries and are less suited for explicit multi-level concept organization. In this paper, we introduce cascaded sparse autoencoders (CSAEs) for learning hierarchical visual concepts in MLLMs. Rather than nesting or stacking SAE sparse activation codes, CSAEs train a second-level SAE directly on the decoder weights of the first-level SAE, treating learned low-level feature directions as inputs for higher-level abstraction. This design enables CSAEs to learn "concepts of concepts" while avoiding drawbacks from the shared-prefix coupling of nesting, Matryoshka-style hierarchies and the bottlenecks of naively stacked SAEs. Experiments across Qwen3-VL, Gemma-3, and LLaVA on multiple visual datasets show that CSAEs improve interpretability in terms of hierarchical concept coherence over state-of-the-art SAE baselines. Results on concept steering further demonstrate that the learned concept groups support effective group-level interventions in MLLM outputs.

阅读与讨论 → 访问原文 →

08.

arXiv (CS.CV) 2026-06-17 DOI: arXiv:2606.17256

Contrastive Action-Image Pre-training for Visuomotor Control

作者:

Yuvan Sharma ↗Dantong Niu ↗Anirudh Pai ↗Zekai Wang ↗Zhuoyang Liu ↗Baifeng Shi ↗Stefano Saravalle ↗Boning Shao ↗Ruijie Zheng ↗Jing Wang ↗Konstantinos Kallidromitis ↗Yusuke Kato ↗…

Existing vision encoders for robotics face a fundamental bottleneck: robotic datasets lack the scale necessary for large-scale pre-training. Prior work circumvents this data scarcity by turning to internet-scale image and language data or egocentric human video. While these models show promise, neither paradigm learns from paired vision and action data, which downstream visuomotor control policies require. However, robot trajectories, the most direct source of this paired signal, are not available at pre-training scale, motivating us to extract action signals from abundant human video instead. To this end, we introduce CAIP (Contrastive Action-Image Pre-training), a vision encoder that treats human hand poses from large-scale egocentric video as a proxy for end-effector actions. By extracting 3D hand keypoints, a representation that aligns naturally with downstream robot action spaces, CAIP learns a unified action-image representation through a contrastive objective. Leveraging 32,041 hours of egocentric human video and only 88 hours of robotic manipulation data, CAIP outperforms state-of-the-art vision encoders including DINOv2, SigLIP, MVP, and R3M. Evaluated on a challenging real-world dexterous manipulation setup using Dexmate Vega and Sharpa Wave hands, CAIP yields performance gains of more than 30% on tasks involving folding, pouring, and fine-grained manipulation. Our results show that our method of contrastive action-centric pre-training yields a scalable path to achieving robust visual representations better suited for physical interaction.

阅读与讨论 → 访问原文 →

09.

medRxiv (Medicine) 2026-06-15 DOI: HASH:486e0317cb1f7c4621164ab070011df8

Midwifery Practice in Conflict Contexts: Lived Experiences from Somalia and Nigeria

作者:

Elnakib ↗Ngozi Iwu ↗Mohammed ↗H. A ↗Mary ↗Tappis ↗Charity ↗Rejoice Helma ↗Israel-Isah ↗Kazeem Olalekan ↗Odonye ↗Ahmed ↗…

Background: Midwives are a central cadre in the health system, particularly in conflict-affected settings where they are sometimes the primary or even only skilled providers available. Yet, despite their critical role, there is limited qualitative evidence capturing their lived experiences and how these shape workforce entry, retention, and overall well-being. Methods: Drawing on a phenomenological research methodology, this qualitative study was embedded within a larger prospective longitudinal cohort of midwifery students and graduates in Somalia and Nigeria. We conducted focus group discussions with graduate midwives (n=48 in Nigeria; n=63 in Somalia) to explore their experiences transitioning into the workforce and their realities working in health systems impacted by conflict and violent insecurity. Data were analysed using inductive thematic analysis. Results: Five themes emerged from the data: (1) job search and workforce entry, which was described as fraught with challenges and shaped by a set of formal systems in Nigeria but informal networks and structural barriers in Somalia (2) working conditions that were marked by resource scarcity, infrastructural challenges, and heavy and unreasonable workloads, (3) safety, security and coping strategies that differed across the two contexts but reflected persistent exposure to violence and a reliance on ad hoc and personal coping in lieu of systematic protection, (4) community perceptions of midwives, shaped and constrained by social and gender norms and (5) mental health and emotional wellbeing, highlighting stress, burnout and moral injury experienced by this cadre. Conclusion: Our findings highlight the profound challenges faced by midwives working in conflict-affected settings, and they shine a light on the urgent need to support and invest in this critical and predominantly female health workforce.

阅读与讨论 → 访问原文 →

10.

arXiv (CS.AI) 2026-06-11 DOI: arXiv:2606.11215

The Environmental Cost of LLMs in AIED: Reporting and Practices

作者:

Sabrina C. Eimler ↗Lukas Erle ↗Daniel Flood ↗Aditi Haiman ↗Luca H\"ackert ↗Andr\'e Helgert ↗Lachlan McGinness ↗B\"usra Yapici ↗

arXiv:2606.11215v1 Announce Type: cross Abstract: Large Language Model (LLM) usage in recent years has become increasingly widespread in the Artificial Intelligence in Education (AIED) community. While LLMs offer unique avenues for learners and educators, using LLMs comes with computational and environmental costs. These costs are mostly hidden due to a lack of standardised procedures to measure and report these impacts. To address this gap, we first conducted a literature review of all papers published as part of the AIED 2025 conference proceedings, determining if and how computational or environmental costs of LLMs are reported. Most projects use LLMs, but few report computational resources used and almost none discuss environmental impacts of LLMs as an ethical concern. To address this lack of standardised reporting practices, we propose an open-source method for systematically measuring and reporting the computational expense of LLMs and environmental impact of running Machine Learning (ML) AIED systems. We provide software solutions to measure the carbon footprint for both local and cloud based hardware. We also provide an easy-to-use formula to calculate the computational expense of frontier LLMs even when the exact number of parameters is not known. Overall, we hope to motivate colleagues to use our method to strive for more transparent reporting of hidden costs of using LLMs in the AIED community.

阅读与讨论 → 访问原文 →

11.

arXiv (quant-ph) 2026-06-12 DOI: arXiv:2606.13447

Quantum optical photoelectron interferometry

作者:

arXiv:2606.13447v1 Announce Type: new Abstract: We present a general theoretical framework for multiphoton processes driven by quantum light fields, establishing a direct link between photon statistics and photoelectron observables. Our results show that the autocorrelation and cross-correlation functions, which quantify the underlying photon statistics, are directly mapped onto the resulting photoelectron spectra. Although our framework is broadly applicable, we demonstrate specifically in the example of reconstruction of attosecond beating by interference of two-photon transitions (RABBIT) the influence of the light statistical properties. In this approach, the amplitude, contrast and phase of the oscillations of the sideband signal as a function of pump-probe delay reveal the quantum nature of light. We analyze these observables across several quantum configurations, including correlated infrared and harmonic modes, as well as the uncorrelated case with non-classical harmonic statistics, thereby establishing a general framework for quantum-light RABBIT spectroscopy. We compare the analytical theory with numerical simulations for the case of classical harmonics and an infrared field in a squeezed coherent state, obtaining excellent agreement. Our results reveal how the interplay between classical and quantum correlations dictates the coherence of the photoemission process, providing a new window into the quantum-optical foundations of attosecond science.

阅读与讨论 → 访问原文 →

12.

arXiv (math.PR) 2026-06-16 DOI: arXiv:2606.14830

Pricing Excess-of-Loss Reinsurance and CAT Bonds under Climate Uncertainty: A Cox Process Framework with Temperature-Dependent Stochastic Intensity

作者:

Nader Karimi ↗Foad Shokrollahi ↗

arXiv:2606.14830v1 Announce Type: cross Abstract: This paper develops a climate-aware pricing framework for excess-of-loss (XL) reinsurance contracts and catastrophe (CAT) bonds under non-stationary catastrophe risk. Catastrophe arrivals are modeled as a Cox process whose stochastic intensity depends exponentially on a temperature-related climate index. To represent climate dynamics, the index is modeled as a mean-reverting Ornstein–Uhlenbeck process around a time-dependent warming trend. Within this setting, aggregate losses follow a compound Cox structure with lognormal severities. Pricing is performed under a reduced-form risk-adjusted measure, which provides a tractable valuation approach for XL reinsurance layers and binary zero-coupon CAT bond payoffs in an incomplete market setting. Because catastrophe losses are not dynamically replicable, the framework emphasizes scenario-based valuation rather than model-independent no-arbitrage bounds. A Monte Carlo valuation scheme is implemented to quantify the economic implications of climate-dependent catastrophe intensity. The numerical results show that climate dependence materially changes the loss-generation mechanism and affects the valuation of catastrophe-linked contracts. In the baseline calibration, the climate-aware model increases the excess-of-loss reinsurance premium and lowers the CAT bond price relative to the stationary benchmark. Furthermore, our analysis of the 99.5\% Tail Value-at-Risk (TVaR) indicates that stationary benchmarks may underestimate economic capital requirements by approximately 13.7\% compared to the climate-aware framework, highlighting the potential regulatory relevance of the proposed model. This finding highlights that benchmark design is critical for interpreting climate-pricing effects.

阅读与讨论 → 访问原文 →

13.

arXiv (CS.LG) 2026-06-11 DOI: arXiv:2603.12901

A theory of learning data statistics in diffusion models, from easy to hard

作者:

Lorenzo Bardone ↗Claudia Merger ↗Sebastian Goldt ↗

arXiv:2603.12901v2 Announce Type: replace-cross Abstract: While diffusion models have emerged as a powerful class of generative models, their learning dynamics remain poorly understood. We address this issue first by empirically showing that standard diffusion models trained on natural images exhibit a distributional simplicity bias, learning simple, pair-wise input statistics before specializing to higher-order correlations. We reproduce this behaviour in simple denoisers trained on a minimal data model, the mixed cumulant model, where we precisely control both pair-wise and higher-order correlations of the inputs. We identify a scalar invariant of the model that governs the sample complexity of learning pair-wise and higher-order correlations that we call the diffusion information exponent, in analogy to related invariants in different learning paradigms. Using this invariant, we prove that the denoiser learns simple, pair-wise statistics of the inputs at linear sample complexity, while more complex higher-order statistics, such as the fourth cumulant, require at least cubic sample complexity. We also prove that the sample complexity of learning the fourth cumulant is linear if pair-wise and higher-order statistics share a correlated latent structure. Our work describes a key mechanism for how diffusion models can learn distributions of increasing complexity.

阅读与讨论 → 访问原文 →

14.

arXiv (CS.CV) 2026-06-12 DOI: arXiv:2602.00122

VDE Bench: Evaluating The Capability of Image Editing Models to Modify Visual Documents

作者:

Hongzhu Yi ↗Yujia Yang ↗Yuanxiang Wang ↗Tong Li ↗Zhenyu Guan ↗Tianyu Zong ↗Jiahuan Chen ↗Chenxi Bao ↗Tiankun Yang ↗Haopeng Jin ↗Yixuan Yuan ↗Xinming Wang ↗…

In recent years, image editing models have made significant progress, enabling users to manipulate visual content in a flexible and interactive manner through natural language instructions. However, an important yet underexplored research direction remains dense visual document image editing, which involves modifying textual content within images while faithfully preserving the original text style and background context. Existing methods primarily focus on English scenarios and images with relatively sparse text, and thus cannot adequately address dense, structurally complex documents or non-Latin scripts such as Chinese. To bridge this gap, we propose VDE Bench (Visual Doc Edit Bench), a rigorously human annotated and evaluated benchmark specifically designed to assess the performance of image editing models on bilingual Chinese-English and complex visual document editing tasks. The benchmark comprises a high quality dataset of 942 instruction based image editing samples, whose seed images encompass dense Chinese and English text documents including academic papers, posters, presentation slides, examination materials, and newspapers. Furthermore, we introduce a novel evaluation framework that systematically quantifies editing performance at the OCR parsing level, thereby enabling fine grained assessment of text modification accuracy. Based on this benchmark, we conduct a comprehensive evaluation of representative image editing models. Human verification demonstrates a high degree of consistency between human judgments and automated evaluation metrics. VDE Bench constitutes the first systematic benchmark for evaluating the performance of image editing models on bilingual dense text visual documents.

阅读与讨论 → 访问原文 →

15.

arXiv (CS.CL) 2026-06-18 DOI: arXiv:2606.19051

Which Sections of a Research Paper Best Reveal Its Research Methods? Evidence from Library and Information Science

作者:

Qiuyu Fang ↗Jiayi Hao ↗Chengzhi Zhang ↗

Research methods are essential carriers of knowledge contribution in academic papers. Automatic multi-label classification of research methods can support knowledge services such as method retrieval, review generation, and research intelligence analysis. While existing studies primarily rely on titles and abstracts, abstracts often provide only limited methodological information, whereas utilizing full-text content faces challenges related to excessive length and information redundancy. Therefore, this paper proposes a segment combination strategy by partitioning the full-text content according to its physical postion. Using an annotated corpus of 1,954 full-text articles from three representative journals in Library and Information Science (JASIST, LISR, and JDoc), we evaluate the classification performance of various segments and their combinations across multiple models. Experimental results indicate that methodological information is distributed unevenly within the full-text content, with the middle-to-late and final segments exhibiting greater discriminative power. Furthermore, integrating bibliographic metadata with cross-segment combination strategies effectively enhances classification performance.

阅读与讨论 → 访问原文 →

16.

arXiv (CS.LG) 2026-06-16 DOI: arXiv:2606.05878

TS-ICL: A Flexible Time-Indexed Foundation Model for Time Series via In-Context Learning

作者:

Etienne Le Naour ↗Tahar Nabil ↗Adrien Petralia ↗

arXiv:2606.05878v2 Announce Type: replace Abstract: Foundation models mark a profound paradigm shift in time series modeling, with task-specific models being superseded by general-purpose zero-shot models. Yet, current approaches primarily focus on forecasting, while real-world time series are often irregularly and partially observed, requiring models that can jointly forecast, impute missing values, and handle degraded sampling conditions. To address these challenges, we introduce TS-ICL, a novel probabilistic In-Context Learning encoder–regressor Transformer that unifies forecasting and imputation. TS-ICL formulates time series tasks as timestamp-aligned regression and naturally incorporates covariates by training on synthetic dependency structures generated from a novel causal data prior. Empirically, TS-ICL achieves a new state-of-the-art in imputation, while remaining competitive with leading forecasting foundation models across both univariate and covariate-aware benchmarks. It shows particularly strong performance in forecasting with partially observed look-back windows.

阅读与讨论 → 访问原文 →

17.

arXiv (CS.CL) 2026-06-17 DOI: arXiv:2504.03991

Algorithmic Prompt Generation for Diverse Human-like Teaming and Communication with Large Language Models

作者:

Siddharth Srikanth ↗Varun Bhatt ↗Boshen Zhang ↗Werner Hager ↗Charles Michael Lewis ↗Katia P. Sycara ↗Aaquib Tabrez ↗Stefanos Nikolaidis ↗

Understanding how humans collaborate and communicate in teams is essential for improving human-agent teaming and AI-assisted decision-making. However, relying solely on data from large-scale user studies is impractical due to logistical, ethical, and practical constraints, necessitating synthetic models of multiple diverse human behaviors. Recently, agents powered by Large Language Models (LLMs) have been shown to emulate human-like behavior in social settings. But, obtaining a large set of diverse behaviors requires manual effort in the form of designing prompts. On the other hand, Quality Diversity (QD) optimization has been shown to be capable of generating diverse Reinforcement Learning (RL) agent behavior. In this work, we combine QD optimization with LLM-powered agents to iteratively search for prompts that generate diverse team behavior in a long-horizon, multi-step collaborative environment. We first show, through a human-subjects experiment, that humans exhibit diverse coordination and communication behavior in this domain. We then present a series of experiments showing that our approach captures behaviors that are difficult to observe without large-scale data collection, and a follow-up user study to show that these generated behaviors are human-like. Our findings highlight the combination of QD and LLM-powered agents as an effective tool for studying teaming and communication strategies in multi-agent collaboration.

阅读与讨论 → 访问原文 →

18.

arXiv (CS.CV) 2026-06-19 DOI: arXiv:2606.19682

Vortex: Multi-Modal Fusion System for Intelligent Video Retrieval

作者:

Duc-Tho Nguyen ↗Hieu-Hoc Tran-Minh ↗Khanh-Hoa Lam ↗Hoang-Nhut Ly ↗Huu-Phuc Huynh ↗Thanh-Tien Tran ↗Trung-Nghia Le ↗

This paper presents Vortex, the multimodal video retrieval system developed by our team, FocusOnFun, for the Ho Chi Minh City AI Challenge 2025, designed to advance intelligent multimedia search and temporal reasoning. The system integrates adaptive keyframe extraction, multimodal metadata generation from vision-language and speech models, and a hybrid retrieval strategy that fuses CLIP and SigLIP2 embeddings through Reciprocal Rank Fusion to balance global and fine-grained semantics. To enhance interactivity, Vortex incorporates Rocchio-based relevance feedback and a multi-stage temporal search mechanism for sequential event alignment. Built on Milvus and Elasticsearch, the architecture enables scalable indexing and efficient retrieval. Evaluated in the official competition, our FocusOnFun team's system achieved a score of 79.6/88 (90.5\%) in the Preliminary Round and was further evaluated in the Final Round, achieving an `Excellent' overall performance with `Outstanding' results in the question-answering (QA) task. This demonstrating the complementary strengths of CLIP and SigLIP2 and confirming the effectiveness of the hybrid retrieval approach. The system establishes a robust foundation for future research in intelligent, context-aware, and interactive video retrieval.

阅读与讨论 → 访问原文 →

19.

arXiv (CS.LG) 2026-06-12 DOI: arXiv:2305.08175

ResidualPlanner+: a scalable matrix mechanism for marginals and beyond

作者:

Guanlin He ↗Yingtai Xiao ↗Levent Toksoz ↗Zeyu Ding ↗Danfeng Zhang ↗Daniel Kifer ↗

arXiv:2305.08175v5 Announce Type: replace-cross Abstract: Noisy marginals are a common form of confidentiality protecting data release and are useful for many downstream tasks such as contingency table analysis, construction of Bayesian networks, and even synthetic data generation. Privacy mechanisms that provide unbiased noisy answers to linear queries (such as marginals) are known as matrix mechanisms. We propose ResidualPlanner and ResidualPlanner+, two highly scalable matrix mechanisms. ResidualPlanner is both optimal and scalable for answering marginal queries with Gaussian noise, while ResidualPlanner+ provides support for more general workloads, such as combinations of marginals and range queries or prefix-sum queries. ResidualPlanner can optimize for many loss functions that can be written as a convex function of marginal variances (prior work was restricted to just one predefined objective function). ResidualPlanner can optimize the accuracy of marginals in large scale settings in seconds, even when the previous state of the art (HDMM) runs out of memory. It even runs on datasets with 100 attributes in a couple of minutes. Furthermore, ResidualPlanner can efficiently compute variance/covariance values for each marginal (prior methods quickly run out of memory, even for relatively small datasets). ResidualPlanner+ provides support for more complex workloads that combine marginal and range/prefix-sum queries (e.g., a marginal on race, a range query on age, and a combined race/age tabulation that answers age range queries for each race). It even supports custom user-defined workloads on different attributes. With this added flexibility, ResidualPlanner+ is not necessarily optimal, however it is still extremely scalable and outperforms the prior state-of-the-art (HDMM) on prefix-sum queries both in terms of accuracy and speed.

阅读与讨论 → 访问原文 →

20.

arXiv (quant-ph) 2026-06-17 DOI: arXiv:2606.17177

Hybrid Ferromagnet-SNSPDs: Single photon induced order-to-disorder transition in ferromagnets coupled to thin film superconductors

作者:

Leif Bauer ↗Daien He ↗Sathwik Bharadwaj ↗Zubin Jacob ↗

arXiv:2606.17177v1 Announce Type: cross Abstract: The development of midwave and longwave infrared single photon detectors is crucial for their emerging applications in spectroscopy, remote sensing, exoplanet detection, and free space quantum communications. However, existing sensors need to be operated at extremely low temperatures (0.08-0.9K) to reduce dark noise and hence require the use of advanced cryogenics such as dilution refrigerators or $^3$He cryogens, significantly limiting applications. Here we propose a vortex-engineering approach based on a hybrid phase transition in a ferromagnet/superconductor bilayer to increase the operating temperature of infrared single photon detectors up to 3.75K. We show that the introduction of a ferromagnetic layer produces a local magnetic field which impedes vortex crossing in the superconductor, reducing dark noise. When a single photon is incident, the photon-induced hotspot causes an order-to-disorder transition in the ferromagnet, leading to a vortex-induced phase transition in the superconducting layer. By engineering the ferromagnet's Curie temperature to be close to the device's operating temperature, single photon sensitivity can be achieved at increased operating temperatures. We predict at midwave/longwave infrared wavelengths (3-14$\mu$m) the operating temperature can be raised to 3.25-3.75K, enabling significantly simpler cooling systems.

阅读与讨论 → 访问原文 →

21.

arXiv (CS.LG) 2026-06-12 DOI: arXiv:2606.12478

Boltzmann Attention: Learnable Ising Couplings for Cooperative Attention

作者:

Gilhan Kim ↗Daniel K. Park ↗

arXiv:2606.12478v1 Announce Type: new Abstract: Attention mechanisms are central to modern sequence models, yet standard attention computes relevance primarily through individual query–key similarities. Although softmax normalization introduces competition among positions, a standard attention layer does not explicitly parameterize learnable interactions between attention decisions. This limits its ability to directly model cooperative or antagonistic co-attention structure within the attention mechanism itself. We propose Boltzmann attention, an energy-based generalization in which attention patterns are governed by an interacting Ising model. The method augments the usual data-dependent local fields with learnable pairwise couplings, allowing the model to represent inter-position correlations beyond those captured by softmax or sigmoid attention. Experiments on character-level language modeling and synthetic bracket matching show that Boltzmann attention consistently improves over standard softmax attention within a standard Transformer architecture, with the advantage becoming more pronounced as sequence length increases. A four-way ablation confirms that the improvement arises from the learnable pairwise couplings. These results suggest that explicit inter-position interactions provide a principled enhancement for attention-based sequence modeling. Moreover, the Ising formulation opens a natural path toward quantum-computing-based sampling strategies: we demonstrate that diabatic quantum annealing provides a practical training method while maintaining competitive performance with exact Boltzmann computation.

阅读与讨论 → 访问原文 →

22.

arXiv (CS.CL) 2026-06-17 DOI: arXiv:2606.18158

The Measurement Gap in the Automation of EU Law: Benchmarking Doctrinal Legal Reasoning under the EU AI Act

作者:

Mich\`ele Finck ↗

Large language models now produce legal text of at least median quality, yet no existing benchmark can evaluate whether they perform doctrinal legal reasoning, which forms the interpretive core of legal work, rather than the ancillary, paralegal tasks that most current legal-AI evaluations measure. This measurement gap is not only methodological but legal: the EU AI Act makes "appropriate accuracy" a binding requirement for high-risk AI used in the judicial domain, yet that requirement cannot acquire operational content without the very doctrinal-reasoning benchmark the field lacks.

阅读与讨论 → 访问原文 →

23.

bioRxiv (Bioinfo) 2026-06-16 DOI: HASH:3768e0eeba2bfb6c42637cd5568b2ff6

PhenoBIC: operator-free single-cell spatial phenotyping in multiplex imaging data using deep learning of cell staining patterns

作者:

Sankaranarayanan ↗Zhao ↗Hernandez ↗M. G ↗Clemens ↗E. A ↗Smythe ↗K. S ↗Kazerouni ↗A. S ↗Carr ↗L. L ↗…

Multiplex imaging is a valuable tool for spatially examining tissue microenvironments at the single-cell level to uncover biological and clinical insights. However, most multiplex image analysis workflows currently require manual intervention for cell phenotyping, which slows progress, demands human effort, and yields operator-dependent outputs. Here, we developed PhenoBIC, a pre-trained deep learning model for image classification of the multiplexed biomarker signals in a cell (Biomarker Imprint of a Cell) to classify cell phenotypes. We show that PhenoBIC (F1-score ~0.88) outperforms manual gating (widely used) and other machine learning-based computational approaches for cell marker expression classification. We validated this across multiple biomarkers, tissue sampling strategies (whole biopsies and tissue microarrays), multiplex panels, imaging platforms, and tissue types. We have released our in-house training and validation datasets of ~1.4 million manually curated cell expression ground truth labels. We have also open-sourced PhenoBIC and enabled its community-wide deployment via the QuPath interface.

阅读与讨论 → 访问原文 →

24.

arXiv (CS.AI) 2026-06-17 DOI: arXiv:2606.17420

Feynman Kac Reweighted Schrödinger Bridge Matching for Surface-Based Tau PET Harmonization

作者:

Jianwei Zhang ↗Xinyu Nie ↗Jiaxin Yue ↗Yonggang Shi ↗

arXiv:2606.17420v1 Announce Type: cross Abstract: Tau PET imaging is central to tracking Alzheimer's disease progression, but systematic differences between scanners, protocols, and radiotracers across sites introduce nonbiological variability that inflates biomarker variance, reduces sensitivity to disease effects, and can bias downstream clinical assessments. Harmonization methods aim to remove these site-induced shifts while preserving biologically meaningful signal, yet existing approaches struggle when source and target cohorts differ in subgroup composition, risking conflation of site effects with biological variation such as tau-positivity status. We propose the Feynman Kac Reweighted Schröodinger Bridge Matching (FKRSBM) model to address this problem. Rather than routing data through a Gaussian noise prior as in diffusion-based methods, FKRSBM learns a direct stochastic transport process between source and target distributions via entropy-regularized optimal transport. To enforce biologically consistent transport, FKRSBM incorporates a subgroup-aware endpoint proposal derived from a Feynman Kac reweighting of the reference bridge measure, implemented entirely through stratified importance sampling at the data level and requiring no changes to the underlying bridge-matching solver or network architecture. For surface-based neuroimaging, FKRSBM employs a spherical convolutional backbone operating on cortical meshes to perform vertex-level harmonization. We evaluate the method on tau PET SUVR maps, harmonizing PI-2620 data from the HABS-HD cohort into the AV-1451 domain of ADNI. Compared against ComBat, CycleGAN, a diffusion-based method (DF), and unregularized Diffusion Schröodinger Bridge Matching (DSBM), FKRSBM achieves superior distributional alignment, reduced tau-positivity sign mismatch, stronger APOE subgroup alignment, and improved downstream disease classification performance.

阅读与讨论 → 访问原文 →

25.

Nature (Science) 2026-06-09 DOI: HASH:9bd37bb17efd613e53e8d0422d603555

Good recycling starts at home — and benefits the world

作者: 未知作者

New research supports the value of household-level waste separation. But policies must also carefully consider consumer behaviours to maximize the quality of material collected. New research supports the value of household-level waste separation. But policies must also carefully consider consumer behaviours to maximize the quality of material collected.

阅读与讨论 → 访问原文 →

探索全球前沿学术脉络