Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-18

ProductConsistency: Improving Product Identity Preservation in Instruction-Based Image Editing via SFT and RL

Recent advances in instruction-based image editing have enabled models to perform complex visual edits from natural language instructions. However, in product-centric scenarios where preserving product features, branding, and textual elements are critical, current open and closed source models often struggle to maintain this fine-grained object identity. This issue is further compounded by the lack of datasets for instruction-based product image editing with text fidelity constraints, leaving it largely treated as an implicit capability of instruction-based image editing models. In this work, we introduce the ProductConsistency dataset which is designed to improve product-centric image editing. Our approach includes a supervised fine-tuning (SFT) dataset of 87k samples for product editing, a reinforcement learning (RL) dataset with 869 unique product images, and a new benchmark dataset, the ProductConsistency Benchmark, to allow rigorous and standardized evaluation of editing models. To guide RL training, we propose a Cyclic Consistency reward that enforces semantic preservation of product identity by using caption similarity between the original product description and captions generated from the edited image. We fine-tune both Qwen-Image-Edit-2511 and Flux.1-Kontext-dev using our dataset and demonstrate consistent improvements over baseline models in OCR and Perceptual metrics, and MLLM-based evaluations as well, indicating stronger product consistency, text rendering, and overall visual quality; with the Qwen-Image-Edit-2511 model achieving a 5x reduction in the character error rate. The code and pipeline is available at https://anonymous.4open.science/r/ProductConsistency-6FCC/README.md

02.
arXiv (CS.CL) 2026-06-11

Semantic Grading of Written Answers in Low-Resource Language Bangla Using a Fine-Tuned Lightweight Language Model

Bangla is among the world's most widely spoken languages, yet it remains underserved in educational NLP research. In many remote and rural regions, access to qualified subject teachers is limited, and written answers are consequently graded largely by hand, restricting timely and consistent feedback. Automatic assessment is challenging because semantically correct responses can vary substantially in surface form. We present a bilingual (Bangla-English) evaluation system designed for low-resource educational settings that prioritizes semantic correctness over lexical overlap. Our approach fine-tunes a lightweight language model to grade each response using the question, reference answer, and student answer, producing a numeric score and concise, context-grounded feedback suitable for classroom deployment. We also construct a synthetic bilingual dataset to enable controlled training and evaluation. Across proprietary and open-source LLMs evaluated under a unified protocol, our QLoRA-tuned Qwen3-8B confirms consistent improvement by producing the most leakage-resistant feedback (RoRa = 0.819) in synthetic evaluation and the strongest agreement with human scores (rho = 0.936, MAE = 0.725) in a dedicated human study.

03.
arXiv (CS.CV) 2026-06-15

HULFSynth : An INR based Super-Resolution and Ultra Low-Field MRI Synthesis via Contrast factor estimation

We present an unsupervised single image bidirectional Magnetic Resonance Image (MRI) synthesizer that synthesizes an Ultra-Low Field (ULF) like image from a High-Field (HF) magnitude image and vice-versa. Unlike existing MRI synthesis models, our approach is inspired by the physics that drives contrast changes between HF and ULF MRIs. Our forward model simulates a HF to ULF transformation by estimating the tissue-type Signal-to-Noise ratio (SNR) values based on target contrast values. For the Super-Resolution task, we used an Implicit Neural Representation (INR) network to synthesize HF image by simultaneously predicting tissue-type segmentations and image intensity without observed HF data. The proposed method is evaluated using synthetic ULF-like data from generated from standard 3T T$_1$-weighted images for qualitative assessments and paired 3T-64mT T$_1$-weighted images for validation experiments. WM-GM contrast improved by 52% in synthetic ULF-like images and 37% in 64mT images. Sensitivity experiments demonstrated the robustness of our forward model to variations in target contrast, noise and initial seeding.

04.
arXiv (CS.CV) 2026-06-18

Recognizing and Reconstructing a Multi-Unit Floor Plan

Digital twins have a major potential to form a significant part of urban management in emergency planning, as they allow more efficient designing of the escape routes, better orientation in exceptional situations, and faster rescue intervention. Nevertheless, creating the twins still remains a largely manual effort, due to a lack of 3D-representations, which are available only in limited amounts for some new buildings. Thus, in this paper we aim to synthesize 3D information from commonly available 2D architectural floor plans. We propose two novel pixel-wise segmentation methods based on the MDA-Unet and MACU-Net architectures with improved skip connections, an attention mechanism, and a training objective together with a reconstruction part of the pipeline, which vectorizes the segmented plans to create a 3D model. The proposed methods are compared with two other state-of-the-art techniques and several benchmark datasets. On the commonly used CubiCasa benchmark dataset, our methods have achieved the mean F1 score of 0.86 over five examined classes, outperforming the other pixel-wise approaches tested. We have also made our code publicly available to support research in the field.

05.
arXiv (quant-ph) 2026-06-12

Fibonacci Steady-States and Persistent Oscillations in an Ordered Multimode Dicke Model

arXiv:2606.13072v1 Announce Type: new Abstract: Ultracold atoms in multimode optical cavities provide a rich testbed for many-body phenomena enabled by light-mediated interactions. Recent experiments include realizations of spin glasses and associative memories, as described by multimode Dicke models with disordered couplings. However, the properties of multimode Dicke models with ordered coupling geometries remain largely unexplored. In this work, we investigate the stable steady-states of the multimode Dicke model with an ordered nearest-neighbor coupling geometry, where $n_c$ atomic clusters are coupled via $n_c-1$ cavity modes. We show that the number of mean-field stable steady-states in the superradiant phase exhibits Fibonacci scaling with the number of atomic clusters, and that a subset of these steady-states exhibit persistent oscillations. Using both the truncated Wigner approximation and the numerically-exact hierarchy of pure states, we further demonstrate that these features of the stable steady-state solutions persist for finite cluster sizes. Ordered multimode Dicke models, such as the nearest-neighbor coupling geometry considered here, are accessible with current experimental technologies and point toward a broader class of strongly interacting dissipative systems with similarly rich behavior.

06.
arXiv (math.PR) 2026-06-15

Asymptotic analysis of the normal inverse Gaussian cumulative distribution

arXiv:2509.05664v2 Announce Type: replace-cross Abstract: Using a recently derived integral in terms of elementary functions, we derive new asymptotic expansions of the normal inverse Gaussian cumulative distribution function. One of the asymptotic representations is in terms of the normal Gaussian distribution or complementary error function.

07.
arXiv (quant-ph) 2026-06-19

Phase locking nuclear spins in silicon with spin-orbit coupling

arXiv:2606.20340v1 Announce Type: new Abstract: Because they have such long coherence times, nuclear spins have extraordinary potential for use in quantum information processing devices. However, coherent nuclear spin control generally requires external phase references, such as microwave control fields. Here, we phase-lock a $^{29}$Si nuclear spin ensemble in a silicon quantum dot using only the internal electronic spin-orbit coupling as a phase reference. When driven with the quantum-dot electrons, the nuclear spins align themselves to a phase determined by the electronic spin-orbit coupling and the timing of the drive protocol. This enables us to measure the coherent precession and inhomogeneous dephasing of the nuclear spins. We corroborate our results with detailed numerical simulations of the many-body electron nuclear system. Our work opens new routes for coherently controlling solid-state nuclear spin ensembles.

08.
medRxiv (Medicine) 2026-06-15

Quality Improvement Based Implementation and Evaluation of a Decision Aid for Patients with Nephrolithiasis

Introduction Patients with nephrolithiasis face challenges in making a high-quality, preference sensitive decision. Our prior work established feasibility and patient acceptance of a software-based decision aid (DA). The objectives for this study were to identify implementation strategies for the DA in routine care and determine whether DA implementation enhances decisional quality for patients. Methods New nephrolithiasis patients were recruited from the institution Medical Center from June 2018 to April 2024 to receive a software-based pre-visit DA that measured care preferences and used decision analysis to rank treatments. The RE-AIM framework and Plan-Do-Study-Act (PDSA) cycles were used to improve implementation outcomes. Patients completed survey instruments evaluating decisional conflict, shared decision-making, care satisfaction, and treatment choice following their provider visit. These metrics were compared in the DA cohort (n=81) to those in a usual care cohort (n=78) with Wilcoxon rank-sum and Chi-square (or Fishers exact) tests. Results Implementation data revealed sustained reach and progressive improvement in fidelity. The DA cohort reported higher decisional quality relative to controls (p=0.003) and reported greater support/advice to make a choice (p=0.005). The DA cohort more often discussed options with their doctor (87.5% vs 69.2%, p=0.005) and were more likely to be promoters of their provider (p

09.
arXiv (CS.CL) 2026-06-16

SHARD: Safe and Helpful Alignment via Self-Reframing Distillation

Large language models often struggle with sensitive prompts. They may refuse outright, provide generic safety boilerplate, or fail to address the user's legitimate informational needs that can be answered safely. We introduce SHARD, a self-reframing distillation method to improve safe-helpfulness. It first rewrites sensitive prompts to surface benign intent using philosophical guidelines, then reframes its original responses into safe, more helpful ones, and finally fine-tunes the model on its self-reframed responses. Across DNA and the English subset of LINGUASAFE, SHARD improves helpfulness for most model families while preserving safety. It also remains competitive with distillation from a larger teacher model, suggesting that models can internalize safe and helpful behavior elicited from their own. Warning: This paper contains content that may be offensive or harmful.

10.
PLOS Medicine 2026-05-22

Differences in tuberculosis prevalence by sex in low- and middle-income countries over 1993–2025: A systematic review and meta-analysis

by Nicole A. Swartwood, Nanki Singh, Seyed Alireza Mortazavi, Melike Hazal Can, Hening Cui, Do Kyung Ryuk, Peter MacPherson, Katherine C. Horton, Nicolas A. Menzies Background Global and national initiatives to combat tuberculosis (TB) have expanded over recent years. Despite this, the TB burden remains high in some population groups, with men recognized as having elevated TB risks. Summary measures of sex differences in TB prevalence were last estimated in 2016. Since then, many additional prevalence surveys have been conducted, including in the highest TB burden countries. We conducted a systematic review of sex-stratified TB prevalence survey data published over 1993–2025, to provide updated estimates of male-to-female (M:F) TB prevalence ratios and determine whether sex-related disparities in TB burden have closed over time. Methods and findings We identified surveys reporting community-representative, sex-stratified estimates of pulmonary TB prevalence in low- and middle-income countries (LMICs), including surveys from an earlier review (covering January 1993–March 2016) and a new systematic review (covering 1st December 2015–13th October 2025). This review was prospectively registered with PROSPERO (CRD42024503853) and included searches of PubMed, Embase, Global Health, the Cochrane Library, Africa Index Medicus, LILACS, and SciELO. We extracted data on bacteriologically confirmed and smear-positive TB prevalence among adults (aged ≥ 15 years), stratified by sex. Risk of bias was evaluated using eight criteria specific to prevalence surveys. We fit multi-level Bayesian regression models with study- and country-level random effects to estimate the M:F ratio of TB prevalence (male prevalence divided by female prevalence), overall and for key subgroups. In meta-regression analyses, we estimated how prevalence ratios varied over time and according to known TB risk factors and TB case definitions.We identified 10,124 publications and extracted data from 100 eligible studies representing 102 unique prevalence surveys and 4,658,310 participants (45.6% male) in 33 LMICs. TB prevalence was higher in men than women in 90/102 of the included surveys, with a pooled M:F prevalence ratio of 2.02 (95% credible interval (CrI): 1.71, 2.34) for bacteriologically confirmed TB and 2.38 (95% CrI: 1.91, 2.90) for smear-positive TB. Time trend analyses showed a 2.0% (95% CrI: −0.2, 4.5%) average annual change in the M:F ratio of bacteriologically confirmed TB over the study period. The M:F prevalence ratio was estimated to be higher for countries with greater excess HIV prevalence among men, and countries with greater gender equity (as measured by the United Nation’s Gender Development Index). The estimated M:F prevalence ratio was also higher for surveys that did not restrict testing to individuals reporting TB symptoms. Study limitations include heterogeneity in survey methods and definitions, as well as limited data from the Americas, Eastern Mediterranean, and Europe WHO world regions and post-COVID-19 period. Conclusions Men in LMICs consistently experience TB at a higher prevalence than women. Time trend estimates are uncertain, but consistent with widening sex differences in TB prevalence over the last three decades, despite efforts to address the risk factors underlying this excess TB burden.

11.
arXiv (CS.CL) 2026-06-18

RCEM: Robust Conversational Search EMbedder in Distributional Shift

We propose RCEM, a Robust Conversational search EMbedder that is additionally equipped with LLM's query reformulation capability without losing base model's generalization. Unlike prior conversational dense retrieval approaches that learn direct conversation-to-passage matching, RCEM aligns conversations, prepended by special token, to LLM-rewritten queries, while preserving the original embedding space. The unchanged embedding space automatically maps the rewritten-query to the relevant passages. As a result, RCEM (1) reduces overfitting by simplifying the alignment task from long passages to shorter rewritten queries, (2) eliminates the need for conversation-to-passage relevance labels for training, and (3) maintains its original embedding space that allows conversational queries against indexes built by original embedder without rebuilding them. Extensive experiments show that RCEM consistently outperforms prior approaches, achieving up to 30% improvement under distributional shift.

12.
arXiv (CS.CL) 2026-06-12

Why Sampling Is Not Choosing: Intentionality, Agency, and Moral Responsibility in Large Language Models

Recent advances in large language models (LLMs) have prompted claims that such systems exhibit agency or qualify as moral agents. This paper argues that these attributions are misguided. We maintain that moral responsibility requires commitment-bearing agency grounded in intrinsic intentionality and self-attributed action, and that such agency constitutes the form of free will relevant to responsibility. Although LLMs generate coherent and normatively evaluable outputs, their operation is fully characterized by probabilistic input-output mappings learned from data. Their apparent intentionality is derived rather than intrinsic, and their outputs are neither owned as commitments nor guided by reasons. Variability introduced by stochastic sampling does not amount to choice or authorship. We address objections from the intentional stance, functionalism, compatibilism, and the presence of moral reasoning in model outputs, arguing that none suffice to establish genuine agency.

13.
arXiv (CS.CV) 2026-06-12

Cross-Modal Masked Compositional Concept Modeling for Enhancing Visio-Linguistic Compositionality

Contrastively trained vision-language models like CLIP, have made remarkable progress in learning joint image-text representations, but still face challenges in compositional understanding. They often exhibit a "bag-of-words" behavior–struggling to capture the object relations, attribute-object bindings, and word order dependencies. This limitation arises not only from the reliance on global, single-vector representations for optimization, but also from the insufficient exploitation and modeling of the rich compositional information inherently present in paired image text data. In this work, we propose MACCO (MAsked Compositional Concept MOdeling), a framework that masks compositional concepts in one modality and reconstructs them conditioned on the full contextual information from the other, enabling the model to capture and align cross-modal compositional structures more effectively. To facilitate this process, we introduce two auxiliary objectives that jointly align and regularize masked features both inter-modally and intra-modally. Extensive experiments on five compositional benchmarks, along with in-depth analyses, demonstrate that our approach not only significantly enhances compositionality in VLMs but also improves their ability to capture syntactic structure and linguistic information. Additionally, the improved compositionality also benefits text-to-image generation and multimodal large language model. Code is available at https://github.com/hiker-lw/MACCO.

14.
arXiv (CS.CL) 2026-06-18

Language Models as Interfaces, Not Oracles: A Hybrid LLM-ML System for Pediatric Appendicitis

Large language models (LLMs) can make clinical decision support more accessible by interpreting free-text documentation, but their direct use as diagnostic engines is limited by sensitivity to prompts, information order, and plausible but incorrect outputs. Structured machine-learning models offer more stable risk prediction, yet they require tabular inputs that are difficult to integrate with narrative clinical workflows. We present ClaMPAPP (Clinical Language-assisted Machine-learning Pipeline for Appendicitis), a hybrid system that uses an LLM as an interface rather than as the final decision-maker. ClaMPAPP extracts schema-constrained clinical features from note-like narratives, applies deterministic plausibility checks, and passes validated features to an XGBoost classifier trained on clinical, laboratory, and ultrasound variables. We evaluated ClaMPAPP on two independent pediatric appendicitis cohorts from German hospitals and compared it with end-to-end LLM baselines, including open-source and proprietary models. To preserve ground truth while testing free-text input, narratives were generated from structured electronic health records through template rendering and constrained LLM rewriting, with additional sentence-order permutation to assess positional robustness. ClaMPAPP achieved the strongest overall diagnostic performance in both internal and external validation while minimizing missed appendicitis cases, the key safety concern in acute triage. End-to-end LLMs showed unstable sensitivity-specificity trade-offs and greater degradation under narrative reordering. These results support an LLM-as-interface, ML-as-predictor design that separates natural-language usability from predictive inference and provides a more auditable pathway for clinical decision support.

15.
arXiv (CS.AI) 2026-06-12

The AI Legal Specialist: A Juridically Autonomous Professional Profile for AI Governance

arXiv:2606.12415v1 Announce Type: cross Abstract: The rapid global expansion of artificial intelligence regulation has generated, across multiple jurisdictions, a demand for legal expertise dedicated to AI that the market has addressed in a fragmented manner. Data protection officers extend their remit beyond data protection law; privacy lawyers reposition themselves toward AI; compliance officers add AI chapters to their existing manuals. This paper argues that none of these adaptive responses adequately covers the professional space opened by the emerging global AI regulatory landscape, of which the EU Artificial Intelligence Act (Regulation (EU) 2024/1689) is the most comprehensive instance, alongside the Council of Europe Framework Convention on AI, the United States executive and sectoral framework, and analogous initiatives in the United Kingdom, Canada, Brazil, China, Japan, Singapore, and beyond. A distinct professional profile is required: the AI Legal Specialist, conceived as a jurist – understood broadly to encompass any professional with advanced legal training – operating at the intersection of legal interpretation and AI governance. The profile is juridically autonomous: it derives its existence from the structure of regulatory obligations generated wherever AI is subject to substantive regulation, rather than from any technical standard or the extension of adjacent roles. The paper provides a juridically grounded definition of the profile, argues for its autonomy from adjacent figures and international standards, proposes a reference competence architecture aligned with the European e-Competence Framework (e-CF, EN 16234-1) as a methodological choice, and articulates the conditions for its operational measurement through key performance indicators. The contribution is intended as a foundation for international standardization of the profile and as a reference for practice, curricula, and adoption across jurisdictions.

16.
arXiv (CS.CV) 2026-06-11

Anatomically Conditioned Recurrent Refinement for Topology-Aware Circle of Willis Segmentation

Segmenting the Circle of Willis (CoW) from Magnetic Resonance Angiography (MRA) is challenging due to complex topology and thin vascular structures that are prone to fragmentation. Standard Convolutional Neural Networks (CNNs) often fail to capture these topological constraints, resulting in "broken vessel" artifacts. To address this, we propose the Anatomically Conditioned Recurrent Refinement U-Net (AC2RUNet). Our architecture decouples segmentation into two streams: a Static Stream that extracts invariant anatomical features and a lightweight Dynamic Stream that iteratively refines topological errors over time. We further introduce a dynamic curriculum learning strategy that transitions from high-recall geometric supervision to topology-aware constraints. Validated on the TopCoW dataset, AC2RUNet substantially reduces Hausdorff Distance (4.72 mm vs 9.17 mm) and Betti number errors (0.19 vs 0.40), improving topological connectivity over the nnU-Net baseline while maintaining comparable volumetric Dice.

17.
arXiv (CS.AI) 2026-06-12

Existence Precedes Value: Joint Modeling of Observational Existence and Evolving States in Time Series Forecasting

arXiv:2606.13571v1 Announce Type: cross Abstract: Real-world time series are often highly incomplete and irregular due to sensor dormancy, transmission delays, and event-driven sampling, making reliable forecasting fundamentally challenging. Existing methods have evolved from impute-then-forecast pipelines to continuous-time models such as Neural ODEs and continuous-time graph networks. While these approaches improve the modeling of historical irregularity, they still rely on an implicit oracle assumption at inference time: the timestamps of future valid observations are presumed to be known in advance. This assumption limits practical relevance, since in many real systems the more fundamental question is not only what the future value will be, but also whether a valid observation will occur at all. In this paper, we propose Timeflies, a unified framework that reformulates forecasting as a joint problem of future observability inference and value estimation. To explicitly model the interaction between observation dynamics and state evolution, Timeflies adopts an observation stream and a value stream, coupled through three dedicated modules for reliability-aware embedding, observation-guided dependency modeling, and joint prediction. We further construct Shadow, a benchmark that combines natural missingness from public datasets with real-world industrial data, and introduce the Observation-Value Joint Entropy (OVJE) metric to comprehensively evaluate this coupled predictability. Extensive experiments show that Timeflies consistently outperforms existing methods, highlighting the importance of explicitly modeling future observability in time series forecasting with missing values. Code and dataset are available in https://github.com/ant-intl/Timeflies.

18.
arXiv (CS.LG) 2026-06-11

Minimal surfaces, Knots, and Neural Networks

arXiv:2605.26234v2 Announce Type: replace-cross Abstract: A recent conjecture by Joel Fine posits a relationship between the coefficients of the HOMFLY polynomial of a knot $K$ in the 3-sphere $S^3$, and the signed count of minimal surfaces in hyperbolic 4-space $\mathrm{H}^4$ meeting the sphere at infinity at $K$, with prescribed genus and self-intersection number. In this paper, we develop a novel machine learning framework based on Physics-Informed Neural Networks (PINNs) to solve the minimal surface equation in hyperbolic space. We utilise this framework to test Fine's Conjecture by constructing near-minimal surfaces bounding various families of knots in $S^3$. Furthermore, we develop an algorithmic method to find self-intersections and compute their sign. For every knot analysed, the computationally discovered minimal surfaces and their self-intersection numbers perfectly align with the predictions of Fine's Conjecture, providing empirical evidence for it.

19.
arXiv (CS.CV) 2026-06-16

Revealing Artifacts via Noise Amplification: A Novel Perspective for AI-Generated Video Detection

With the rapid advancement of video generation models, distinguishing between AI-generated and authentic videos has emerged as a challenging endeavor. The majority of existing research endeavors concentrate on the development of detectors for identifying samples generated by generative adversarial networks. Nevertheless, the detection of AI-generated videos, particularly those produced by text-to-video models, still remains an uncharted territory. Although state-of-the-art text-to-video models can generate realistic visual content similar to real videos, they fall short of generating the details of the images and the changes in details within the videos. Inspired by this, we address AI-generated video detection from a novel perspective of bit-planes, which can effectively describe the details or noises in images or videos. To this end, we propose a simple yet effective approach called Noise Amplification. This approach first extracts noise signals based on bit-planes, then amplifies these noise signals, and finally feeds them into the discriminator networks for video fake classification. Noise amplification is comprehensively constructed by incorporating three aspects: pixel-level intensity enhancement, region-level spatial amplification, and frame-level temporal aggregation. To evaluate methods of AI-generated video detection in challenging scenarios, we also introduce a benchmark named HardGVD. Extensive experiments on both the large-scale dataset GenVidBench and HardGVD show that our simple approach significantly outperforms state-of-the-art methods.

20.
arXiv (CS.AI) 2026-06-15

Low-Burden LLM-Based Preference Learning: Personalizing Assistive Robots from Natural Language Feedback for Users with Paralysis

arXiv:2604.01463v2 Announce Type: replace-cross Abstract: Physically Assistive Robots require personalized behaviors to ensure user safety and comfort. However, traditional preference learning methods, like exhaustive pairwise comparisons, cause substantial physical and cognitive fatigue for users with severe motor impairments. To solve this, we propose a low-burden, offline framework that translates unstructured natural language feedback directly into deterministic robotic control policies. To safely bridge the gap between ambiguous human speech and robotic code, our pipeline uses Large Language Models (LLMs) grounded in the Occupational Therapy Practice Framework. This clinical reasoning decodes subjective user reactions into explicit physical and psychological needs, which are then mapped into transparent decision trees. Before deployment, an automated "LLM-as-a-Judge" verifies the code's structural safety. We validated this system in a simulated meal preparation study with 10 adults with paralysis. Results show our natural language approach significantly reduces user workload compared to traditional baselines. Additionally, occupational therapists confirmed the generated policies are safe and accurately reflect user preferences.

21.
PLOS Computational Biology 2026-06-16

Evolution and the ultimatum game: An agent-based model with interbirth intervals and population structure

by Jeffrey C. Schank, Matt L. Miller The ultimatum game (UG) is widely used to study mutually beneficial exchanges, fairness, and prosocial behavior across different societies. However, human behavior in UG experiments does not align with the game-theoretical prediction that proposers should offer the least positive amount and responders should accept such offers. Instead, proposers make generous offers that are greater than the minimum responders are willing to accept, resulting in generous offers with wide offer-acceptance gaps. Numerous evolutionary models of the UG have been created and studied to explain human behavior, particularly generous offers made in UG experiments. These models have recently faced criticism for lacking biological realism and not adequately explaining the data. Here, we present an agent-based model inspired by our hunter-gatherer ancestors and with a biologically more realistic selection process. We assume that (1) agents exist in group-structured and group-clustered populations, where reproduction (2) depends on resource accumulation, but (3) is limited by interbirth intervals. We ran simulations to assess whether this biologically more realistic model evolves patterns of behavior consistent with patterns in the data from meta-analyses of human behavior in the UG. For the proposed model, we show that generous offers robustly evolve, as well as the difficult-to-explain offer-acceptance gaps, only in group-structured populations with interbirth intervals. We demonstrate that these results are robust and may help explain variation in data across societies. We discuss how interbirth intervals interact with group structure to modulate offer and rejection costs, favoring the evolution of generous offers, offer-acceptance gaps, and other patterns in the data on human behavior in the UG. We also discuss why weak selection and/or high mutation rate models cannot explain all the patterns in UG experimental data. We discuss biological realism and conclude that group structure and interbirth intervals may be essential for explaining prosocial behavior across societies.

22.
arXiv (CS.CL) 2026-06-16

Risk-Aware LLM Agents for Geospatial Data Retrieval: Design and Preliminary Adversarial Evaluation

We present an LLM-driven framework for retrieving remote sensing data from cloud-based geospatial catalogues using natural language queries. The system converts user intent into structured API calls, enabling efficient access to satellite imagery and environmental datasets. The architecture integrates three agents: Guardrail for safety and policy enforcement, General-QA for intent interpretation, and Recommender-Analyst for schema-aware API call generation. This coordinated design ensures reliable, semantically aligned interaction with external data services. The modular framework is portable across platforms through API schema substitution and supports applications in environmental monitoring, disaster response, and climate analysis. It establishes a scalable interface between user intent and geospatial infrastructure, enabling streamlined and automated Earth observation workflows. Preliminary experiments under adversarial multi-turn settings show that prompt-level safety instructions improve robustness, although rare high-impact failures persist in API manipulation scenarios and highlight the need for adaptive, system-level defenses that balance safety, usability, and cost efficiency, which motivates the use of our intercept-level Guardrail agent.

23.
arXiv (CS.CV) 2026-06-16

Latent Action Pretraining Through World Modeling

Vision-Language-Action (VLA) models have gained popularity for learning robotic manipulation tasks that follow language instructions. State-of-the-art VLAs, such as OpenVLA and $\pi_{0}$, were trained on large-scale, manually labeled action datasets collected through teleoperation. More recent approaches, including LAPA and villa-X, introduce latent action representations that enable unsupervised pretraining on unlabeled datasets by modeling abstract visual changes between frames. Although these methods have shown strong results, their large model sizes make deployment in real-world settings challenging. In this work, we propose LAWM, a model-agnostic framework to pretrain imitation learning models in a self-supervised way, by learning latent action representations from unlabeled video data through world modeling. These videos can be sourced from robot recordings or videos of humans performing actions with everyday objects. Our framework is able to transfer learned knowledge across tasks, environments, and embodiments. It outperforms models pretrained with ground-truth robot actions and other similar pretraining methods on the LIBERO benchmark and real-world setup, while being efficient and practical for real-world settings.

24.
arXiv (CS.CL) 2026-06-16

GePBench: Evaluating Fundamental Geometric Perception for Multimodal Large Language Models

Geometric shapes play important roles in both physical world and human cognition. While multimodal large language models (MLLMs) have made significant advancements in visual understanding, their abilities to recognize geometric shapes and their spatial relationships, which we term geometric perception, are not explicitly and systematically explored. To address this gap, we introduce GePBench, a novel benchmark specifically designed to assess the geometric perception capabilities of MLLMs. Our extensive evaluations reveal that even the current state-of-the-art MLLMs exhibit significant deficiencies in geometric perception tasks. Furthermore, we show that models trained with GePBench data demonstrate considerable improvements on a wide range of downstream tasks, highlighting the critical role of geometric perception in enabling advanced multimodal applications. Our code and datasets are available at \href{https://github.com/Changhao-Xiang/GePBench}{https://github.com/Changhao-Xiang/GePBench}.

25.
arXiv (CS.AI) 2026-06-12

LLMs as ASP Programmers: Self-Correction Enables Task-Agnostic Nonmonotonic Reasoning

arXiv:2604.27960v2 Announce Type: replace Abstract: Recent large language models (LLMs) have achieved impressive reasoning milestones but continue to struggle with high computational costs, logical inconsistencies, and sharp performance degradation on high-complexity problems. While neuro-symbolic methods attempt to mitigate these issues by coupling LLMs with symbolic reasoners, existing approaches typically rely on monotonic logics (e.g., SMT) that cannot represent defeasible reasoning – essential components of human cognition. We present "LLM+ASP," a framework that translates natural language into Answer Set Programming (ASP), a nonmonotonic formalism based on stable model semantics. Unlike prior "LLM+ASP" approaches that require manually authored knowledge modules, domain-specific prompts, or evaluation restricted to single problem classes, our framework operates without any per-task engineering and applies uniformly across diverse reasoning tasks. Our system utilizes an automated self-correction loop where structured feedback from the ASP solver enables iterative refinement. Evaluating across six diverse benchmarks, we demonstrate that: (1) stable model semantics allow LLMs to naturally express default rules and exceptions, outperforming SMT-based alternatives by significant margins on nonmonotonic tasks; (2) iterative self-correction is the primary driver of performance, effectively replacing the need for handcrafted domain knowledge; (3) compact in-context reference guides substantially outperform verbose documentation, revealing a "context rot" phenomenon where excessive context hinders constraint adherence.