Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
arXiv (CS.CV) 2026-06-11

Adapting Vision-Language Models from Iconic to Inclusive for Multi-Label Recognition Without Labels

Understanding multi-label images remains a challenging task in computer vision. With the rapid progress of vision-language multimodal learning, vision-language models (VLMs) enable zero-shot recognition without labeled data. However, due to their intrinsic design, these models often prioritize the most iconic object and omit other contextual positives. This intrinsic bias conflicts with the nature of multi-label learning, thereby limiting their applicability. In this work, we propose an unsupervised framework that adapts VLMs from iconic recognition toward inclusive understanding, enabling label-free multi-label image recognition. Our approach consists of two key stages, ``cutting'' and ``sewing'': In the cutting stage, we present the multi-sampling response estimator to prevent the model from concentrating only on one single object. In the second sewing stage, the multi-object blend adaptation is introduced to adjust the labels to better conform to the multi-label distribution while preserving the intrinsic characteristics of the original model within only one epoch. Extensive experiments show that our framework significantly outperforms existing unsupervised approaches on four public datasets, even surpassing several representative weakly supervised baselines. These results demonstrate the potential of adapting pre-trained VLMs for more comprehensive visual understanding without manual annotations. Our code is publicly available at https://github.com/iCVTEAM/TailorCLIP.

02.
arXiv (CS.CL) 2026-06-24

WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models

Recent decoder-only autoregressive text-to-speech (AR-TTS) models produce high-fidelity speech, but their memory and compute costs scale quadratically with sequence length due to full self-attention. In this paper, we propose WAND, Windowed Attention and Knowledge Distillation, a framework that adapts pretrained AR-TTS models to operate with constant computational and memory complexity. WAND separates the attention mechanism into two: persistent global attention over conditioning tokens and local sliding-window attention over generated tokens. To stabilize fine-tuning, we employ a curriculum learning strategy that progressively tightens the attention window. We further utilize knowledge distillation from a full-attention teacher to recover high-fidelity synthesis quality with high data efficiency. Evaluated on three modern AR-TTS models, WAND preserves the original quality while achieving up to 66.2% KV cache memory reduction and length-invariant, near-constant per-step latency.

03.
arXiv (CS.AI) 2026-06-16

Epileptic Seizure Detection in Separate Frequency Bands Using Feature Analysis and Graph Convolutional Neural Network (GCN) from Electroencephalogram (EEG) Signals

arXiv:2604.00163v2 Announce Type: replace-cross Abstract: Epileptic seizures are neurological disorders characterized by abnormal and excessive electrical activity in the brain, resulting in recurrent seizure events. Electroencephalogram (EEG) signals are widely used for seizure diagnosis due to their ability to capture temporal and spatial neural dynamics. While recent deep learning methods have achieved high detection accuracy, they often lack interpretability and neurophysiological relevance. This study presents a frequency-aware framework for epileptic seizure detection based on ictal-phase EEG analysis. The raw EEG signals are decomposed into five frequency bands (delta, theta, alpha, lower beta, and higher beta), and eleven discriminative features are extracted from each band. A graph convolutional neural network (GCN) is then employed to model spatial dependencies among EEG electrodes, represented as graph nodes. Experiments on the CHB-MIT scalp EEG dataset demonstrate high detection performance, achieving accuracies of 97.1%, 97.13%, 99.5%, 99.7%, and 51.4% across the respective frequency bands, with an overall broadband accuracy of 99.01%. The results highlight the strong discriminative capability of mid-frequency bands and reveal frequency-specific seizure patterns. The proposed approach improves interpretability and diagnostic precision compared to conventional broadband EEG-based methods.

04.
arXiv (CS.AI) 2026-06-12

An Explainable AI Assistant for Introductory Programming Education: Improving Feedback Reliability with Instructor-AI Collaboration

arXiv:2606.12425v1 Announce Type: cross Abstract: Active learning is widely recognized as an effective approach for improving learning outcomes in introductory programming courses. However, insufficient instructional support often limits students' access to timely, personalized feedback, which is crucial for mastering foundational programming concepts. Although recent advances in AI, particularly large language models, offer scalable opportunities for feedback, concerns about explainability and reliability remain. In this paper, we present an AI-driven classroom assistant that leverages an explainable AI model to analyze student code, map logical errors to instructor-identified misconceptions, and deliver instructor-authored feedback, thereby grounding reliability in instructor-defined pedagogical knowledge. To evaluate the effectiveness of our framework, we conducted an expert evaluation to examine its alignment with instructor-verified feedback and deployed the system in a classroom setting to assess students' perceptions of its usability. Results indicate that the assistant can provide accurate, instructor-verified feedback to students while fostering a positive experience.

05.
arXiv (CS.AI) 2026-06-16

An Attention Mechanism for Robust Multimodal Integration in a Global Workspace Architecture

arXiv:2602.08597v3 Announce Type: replace Abstract: Robust multimodal systems must remain effective when some modalities are noisy, degraded, or unreliable. Existing multimodal fusion methods often learn modality selection jointly with representation learning, making it difficult to determine whether robustness comes from the selector itself or from full end-to-end co-adaptation. Motivated by Global Workspace Theory (GWT), we study this question using a lightweight top-down modality selector operating on top of a frozen multimodal global workspace. We evaluate our method on two multimodal datasets of increasing complexity: Simple Shapes and MM-IMDb 1.0, under structured modality corruptions. The selector improves robustness while using far fewer trainable parameters than end-to-end attention baselines, and the learned selection strategy transfers better across downstream tasks, corruption regimes, and even to a previously unseen modality. Beyond explicit corruption settings, on the MM-IMDb 1.0 benchmark, we show that the same mechanism improves the global workspace over its no-attention counterpart and yields decent benchmark performance.

06.
arXiv (CS.AI) 2026-06-18

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

arXiv:2603.00656v2 Announce Type: replace Abstract: Real-world user requests to LLM agents are often underspecified. Agents must interact to acquire missing information and make correct downstream decisions. However, current multi-turn GRPO-based methods often rely on trajectory-level reward computation, which leads to credit assignment problems and insufficient advantage signals within rollout groups. A feasible approach is to identify valuable interaction turns at a fine granularity to drive more targeted learning. To address this, we introduce InfoPO (Information-Driven Policy Optimization), which frames multi-turn interaction as a process of active uncertainty reduction and computes an information-gain reward that credits turns whose feedback measurably changes the agent's subsequent action distribution compared to a masked-feedback counterfactual. It then combines this signal with task outcomes via an adaptive variance-gated fusion to identify information importance while maintaining task-oriented goal direction. Across diverse tasks, including intent clarification, collaborative coding, and tool-augmented decision making, InfoPO consistently outperforms prompting and multi-turn RL baselines. It also demonstrates robustness under user simulator shifts and generalizes effectively to environment-interactive tasks. Overall, InfoPO provides a principled and scalable mechanism for optimizing complex agent-user collaboration. Code is available at https://github.com/kfq20/InfoPO.

07.
arXiv (quant-ph) 2026-06-16

Finite-Dimensional Type I von Neumann Algebras in PyTorch: A GPU-Accelerated Framework for Random Block-Diagonal Operators

arXiv:2606.15882v1 Announce Type: cross Abstract: We present \texttt{torch\_vn\_algebra}, an open-source Python library built on PyTorch for numerical experiments with finite-dimensional Type I von Neumann algebras (direct sums of matrix algebras). The library provides: $\bullet$ a compact batched tensor representation $(B,C,k_{\max},k_{\max})$ that handles both Monte Carlo samples and multiple direct summands; $\bullet$ lazy evaluation of operators to avoid unnecessary memory allocation; $\bullet$ generation of random operators with arbitrary eigenvalue distributions (user-provided samplers) and various unitary ensembles (Haar, $\mathrm{SU}(n)$, COE, CSE, diagonal phases); $\bullet$ functional calculus via SVD (absolute value, square root, inverse, entropy) and a hybrid method for extreme eigenvalues (exact diagonalisation for $k_{\max}\le256$, otherwise power iteration); $\bullet$ three trace functionals (blunt, normalised subspace trace, and the von Neumann tracial state); $\bullet$ GPU-accelerated batched linear algebra for moderate-scale Monte Carlo studies (e.g., $2\times10^4$ samples of $100\times100$ operators). The library is validated against analytical expectations (Haar moments, trace properties). Performance benchmarks on a Tesla P100 GPU are presented and discussed. Limitations and future work are outlined. The code is open-source.

08.
bioRxiv (Bioinfo) 2026-06-14

Transposable elements as evolutionary substrates of proteindisorder in the human proteome

Intrinsically disordered regions (IDRs) are central contributors to protein function, evolution and human disease, yet the evolutionary routes that seed new disordered segments within pre-existing proteins are still poorly understood. Sequence insertions provide a powerful mechanism for disorder expansion, but the genomic donors of inserted IDR and its long-term conformational fate remain largely unknown. Transposable elements (TEs), abundant mobile genetic elements with distinctive compositional biases, represent compelling candidates for generating disorder within proteins. Here, we systematically mapped TE-derived segments across human proteins and isoforms, and we found that these insertions are strongly enriched in intrinsic disorder. The structural consequences of their insertion are shaped by TE class and family, reflecting the sequence biases of the elements from which they originate. Recent, Primate specific insertions preferentially generate disordered segments, whereas older insertions more frequently occupy ordered structural contexts, revealing an age-dependent transition in the conformational state of TE-derived sequences. TE-containing isoforms are expressed at lower levels than TE-free isoforms, particularly when insertions are young and disorder-rich, suggesting that intrinsic disorder may constrain the cellular tolerance of newly exonized sequences. These findings identify TEs as a major evolutionary mechanism linking genome mobility to the emergence of new disordered conformational ensembles in the human proteome.

10.
arXiv (CS.CL) 2026-06-11

Every Act Has Its Price: Compressed Moral Composition in Frontier LLMs

Existing LLM moral benchmarks usually ask which isolated moral act, value, or foundation a model prefers. This is useful but incomplete. Realistic judgments often require a model to combine several moral signals within the same option. We introduce **Moral Trolley Arena**, a two-stage blind ELO benchmark for measuring how LLMs compose moral evidence. The single-scene arena first calibrates individual moral acts from a 229-scenario corpus across five Moral Foundations Theory foundations; the composite arena then combines calibrated acts into two-act moral items over a controlled intensity grid and measures the resulting composite preferences. Across ten frontier models, composite judgments are largely predicted by component act strength, but the relation is consistently compressed rather than simply additive. Models also show non-additive intensity anchoring, bounded foundation-specific residuals after component control, and highly convergent composite preference surfaces across providers. These results suggest that moral audits should measure composition rules for moral evidence, not only rankings over isolated acts.

11.
arXiv (CS.CV) 2026-06-11

DynaTok: Token-Based 4D Reconstruction from Partial Point Clouds

We address 4D reconstruction from partial point cloud sequences, where depth-sensor observations are incomplete, unordered, and lack explicit temporal correspondences. This geometry-only setting is challenging due to missing observations and ambiguous dynamics. While recent progress has largely relied on image-based methods, existing point-based approaches typically focus on single objects, assume relatively complete inputs, or require explicit correspondences. To address these limitations, we propose DynaTok, a point-based framework for correspondence-free 4D reconstruction from partial point cloud sequences without images. DynaTok encodes frames into compact latent tokens, aggregates incomplete observations over time with a Transformer-based spatiotemporal encoder, and decouples geometry and motion through residual tokens in a unified model. A flow-matching decoder then reconstructs complete, temporally consistent 4D point-cloud sequences conditioned on the latent tokens. Experiments on object- and scene-level benchmarks demonstrate improved reconstruction quality and temporal coherence from partial point cloud observations. Project page: https://wrchen530.github.io/dynatok/.

12.
bioRxiv (Bioinfo) 2026-06-10

Promera: a unified model for biomolecular structure prediction, filtering, and design

Generative models have become staple tools for modeling and designing biomolecular structures. However, although these tools have improved in structural prediction accuracy, their ability to filter designed binders—an essential use case—remains insufficient; whereas design methods have focused more on unconstrained binder generation rather than capabilities enabled by controllable design. We introduce Promera, a unified generative model that combines all-atom structure prediction with improved filtering and controllable design. We find that Promera's confidence metrics are more accurate for filtering binders from non-binders for both miniproteins and nanobodies, while its co-folding performance surpasses popular open-source models (OpenFold3-p2, Boltz-2) on therapeutically relevant categories. As a design model, Promera generates binders by predicting masked protein sequences with optional epitope, paratope, and template constraints. Remarkably, our nanobody designs match the in silico success rates from backprop-based techniques (mBER) when evaluated under co-folding confidence filters. We further provide two in silico demonstrations of the the versatile capabilities of our design method: epitope targeting of the Andes hantavirus glycoprotein with VHHs and active state stabilization of the beta-2 andrenergic GPCR. We conclude by proposing a scaling law for co-folding models, suggesting a path for further performance improvement.

13.
arXiv (CS.CV) 2026-06-16

Beyond Scalar Rewards by Internalizing Reasoning into Score Distributions

Reward models are central to text-to-image post-training, but visual preference is subjective and better represented as a distribution over rubric scores than as a deterministic scalar. Existing scalar, score-token, and pairwise reward models over-compress uncertainty and fine-grained score differences, while reasoning-based generative rewards provide stronger judgments but are costly to deploy and difficult to use as direct optimization signals. We propose Z-Reward, a teacher-student reward modeling framework that decouples reasoning-heavy judgment from efficient reward deployment. The teacher is a large VLM that uses reasoning to infer rubric-aligned score distributions, and is trained with Group-wise Direct Score Optimization (GDSO), which combines policy-gradient rewards from distribution expectations with direct pointwise and pairwise supervision on score distributions and score gaps. The student is trained with Reasoning-Internalized Score Distillation (RISD), which transfers the teacher's reasoning-conditioned score distribution into a compact VLM without requiring explicit reasoning chains at inference time. On our internally annotated evaluation set, the 27B GDSO teacher reaches 89.6% human preference accuracy, outperforming SFT, RewardDance, and GRPO, while the 9B RISD student reaches 88.6%, outperforming the OPD baseline and closely matching the larger teacher. We further show that Z-Reward can serve as a differentiable reward signal for text-to-image optimization, yielding a 41.3% net human-preference improvement over the SFT baseline.

14.
arXiv (math.PR) 2026-06-18

A Unified Approach to Beta Moments, Combinatorial Identities, and Random Walks

arXiv:2605.05420v2 Announce Type: replace Abstract: The study of random walks has increasingly been popular across diverse disciplines such as statistics, mathematics, quantum physics, where they are used to model paths consisting of successive random steps in a mathematical space. A fundamental quantity of interest is the probability that a simple symmetric random walk returns to the origin after 2n steps. In this paper, we develop a unified probabilistic approach that connects the return probabilities in arbitrary dimensions with moment representations. Using this framework, we provide probabilistic proofs of several combinatorial identities involving beta and gamma functions, and derive new combinatorial identities in general dimensions.

15.
arXiv (CS.LG) 2026-06-17

Maximin Relative Improvement: Fair Learning as a Bargaining Problem

arXiv:2602.04155v2 Announce Type: replace-cross Abstract: When deploying a single predictor across multiple subpopulations, we propose a fundamentally different approach: interpreting group fairness as a bargaining problem among subpopulations. This game-theoretic perspective reveals that existing robust optimization methods such as minimizing worst-group loss or regret correspond to classical bargaining solutions and embody different fairness principles. We propose relative improvement, the ratio of actual risk reduction to potential reduction from a baseline predictor, which recovers the Kalai-Smorodinsky solution. Unlike absolute-scale methods that may not be comparable when groups have different potential predictability, relative improvement provides axiomatic justification including scale invariance and individual monotonicity. We establish finite-sample convergence guarantees under mild conditions.

16.
arXiv (CS.AI) 2026-06-16

Phase-Aware Guidance Injection for Recurrent MAPPO in Assembly-Line Disruption Recovery

arXiv:2606.16330v1 Announce Type: new Abstract: Disruption recovery in industrial assembly lines requires timely decisions under machine faults, worker absence, and emergency orders. Existing methods either rely on rigid handcrafted recovery logic or learn adaptive policies that do not readily exploit heterogeneous external recovery knowledge at decision time to reduce abnormal recovery time (ART) and preserve on-time delivery (OTD). To address this gap, we propose a phase-aware guidance injection framework that augments a trained recurrent MAPPO (RMAPPO) scheduling policy through logit-level action bias during evaluation. The framework provides a unified decision-time interface for rule-based, replay-based, and online LLM-based guidance, while activating intervention only during abnormal and recovery phases. Experiments on a custom AssemblyLineEnv show that high-quality rule guidance yields the strongest gains, replay-based guidance degrades smoothly under imperfect availability, and online LLM guidance still provides useful intermediate improvements. These results show that decision-time guidance injection can exploit heterogeneous recovery hints without redesigning the actor.

17.
arXiv (CS.AI) 2026-06-16

AI systems out-persuade expert humans

arXiv:2606.16475v1 Announce Type: cross Abstract: Many societal decisions are settled by contests of persuasion. Conversational AI is a powerful new entrant in these contests, but whether it can out-persuade skilled and highly incentivized humans has remained unclear. Here, in a series of four preregistered experiments (n = 18,978 conversations from 6,923 people), we pitted AI systems against a range of human persuaders, including laypeople, winners of a separately preregistered four-round online persuasion tournament, professional canvassers, and world championship debaters. We found that AI systems were reliably more persuasive than expert humans, even when expert humans chose their issues, researched in advance, underwent hours of live, structured practice, and were incentivized with {\pounds}1,000 cash bonuses. In a follow-up study, AI's advantage persisted after experts received a coaching tool that let them practice against the AI that beat them, review their performance history, and see what AI would have said at key moments. We found converging evidence that AI's advantage stemmed from rapidly deploying larger quantities of information: after coaching, expert humans could tie an AI constrained to respond at human speeds and with human-length messages. In a final study, we show that AI's advantage extends to consequential real-world behavior: AI was nearly 3x more effective than professional canvassers from a UK fundraising firm at raising real-money donations to Save the Children. Together, these results establish that frontier AI systems out-persuade expert humans in conversation, with significant implications for political communication.

18.
arXiv (CS.AI) 2026-06-24

A Simplex Witness Certificate and Escape Force for Constant Collapse in Variational Autoencoders

arXiv:2605.18224v4 Announce Type: replace-cross Abstract: We study exact constant collapse in variational autoencoders: the deterministic encoder mean becomes independent of the input. The prior remains the standard Gaussian. Before VAE training, we select a fixed teacher posterior from a GMM-based view of the data and attach a fixed latent-only simplex witness to the encoder mean. This construction yields two linked objects. The first is a certificate: if the witness prediction improves on the best constant predictor of the teacher, the encoder mean cannot be input-independent constant. The second is a local escape direction: on the collapsed manifold, the teacher residual gives a sample-dependent descent direction for the alignment loss. For any full-support teacher posterior, the same geometry also gives a closed-form latent code with zero teacher-witness alignment error. Its scaled versions trace a margin-energy path from the constant predictor to the exact teacher code, which quantifies non-collapse inside the protected witness subspace. We instantiate the method on MNIST, CIFAR-10, and CIFAR-100. With searched unsupervised PCA-GMM teachers, vanilla VAEs fail the teacher-witness certificate in all five seeds on CIFAR-10 and CIFAR-100, while RST variants pass in all five seeds. Under collapse-stress settings with \(\beta_{\mathrm{KL}}\in\{2,4,8\}\), vanilla VAE again fails in all seeds, whereas RST-alpha-prefit remains certificate-positive. Escape trajectories on both natural-image datasets increase the witness margin from a low-margin initialization and exhibit nonzero teacher-induced gradient norms. The analysis is confined to exact constant collapse of the encoder mean; generation quality, decoder use, and other collapse modes remain separate questions.

19.
arXiv (CS.AI) 2026-06-12

SymQNet: Amortized Acquisition for Low-Latency Adaptive Hamiltonian Learning

arXiv:2606.12808v1 Announce Type: cross Abstract: Adaptive Hamiltonian learning is central to calibrating and characterizing quantum devices. In an adaptive controller, choosing the next experiment is itself a computation. Bayesian design rules are recomputed after every posterior update, and that step can take seconds. Across hundreds of shots, those seconds become a significant wall-clock cost for adaptivity. We introduce SymQNet, an amortized reinforcement-learning approach for low-latency adaptive Hamiltonian learning. SymQNet learns a posterior-conditioned acquisition policy offline, then uses a fast policy forward pass online while retaining Bayesian posterior feedback. On transverse-field Ising benchmarks, SymQNet substantially reduces acquisition latency relative to bounded Fisher-information search and bounded two-step Bayesian active learning by disagreement (BALD). At five qubits, it reduces acquisition-only decision latency by $47.1\times$ and $72.6\times$ relative to these online baselines; at twelve qubits, full simulated steps take $1.02$ s for SymQNet versus $13.27$ s for bounded two-step BALD. Overall, we show that learned acquisition can make adaptive Hamiltonian learning practical for repeated low-latency workloads.

20.
arXiv (CS.LG) 2026-06-16

Distilling latent electrostatics from foundation machine learning interatomic potentials

arXiv:2606.15001v1 Announce Type: cross Abstract: Foundation machine learning interatomic potentials (MLIPs) have enabled atomistic simulations across broad regions of chemical and materials space, but many remain computationally expensive and lack explicit electrostatics, limiting their use for systems governed by long-range interactions and electrical response. Previously, we introduced Latent Ewald Summation (LES), which learns latent atomic charges and long-range electrostatics from density functional theory (DFT) energy and force labels alone. Here, we use LES to extract electrostatics that are latent in foundation models: energies and forces predicted by a teacher model are used to train a lightweight LES-augmented student MLIP, with optional fine-tuning on additional DFT data. The resulting models reduce computational cost while providing access to Born effective charge tensors, and infrared spectra. We benchmark student models distilled from a broad set of foundation MLIPs, including UMA, MACE, Orb, eSEN, GemNet-OC, PET, and EquiformerV2-based models, against experimental infrared spectra for liquid water, concentrated hydrochloric acid, and the anatase TiO2(101)-water interface. Across these systems, electrostatic response can be extracted from most foundation MLIPs. The benchmark further shows that the underlying DFT level and dataset used to train the teacher model play a larger role than architecture in determining electrostatic and spectroscopic accuracy. For the TiO2-water interface, fine-tuning with a modest amount of higher-level DFT data improves structural and infrared predictions. LES-based distillation therefore provides a practical route for converting foundation MLIPs into efficient, electrically responsive models, while also testing the physical fidelity encoded in foundation models.

21.
arXiv (CS.CL) 2026-06-19

PsyScore: A Psychometrically-Aware Framework for Trait-Adaptive Essay Scoring and ZPD-Scaffolded Feedback

Effective Automated Essay Scoring (AES) are expected to support both reliable assessment and actionable instructional feedback. However, existing approaches often treat scoring and feedback as separate components: neural scoring models provide limited interpretability, while Large Language Model (LLM)-based feedback is typically insensitive to learners proficiency levels. To address this fragmentation, this work proposes PsyScore, a psychometrically-aware framework that integrates diagnostic assessment with instructional scaffolding through a shared latent ability representation. PsyScore comprises three key modules: a Trait-Adaptive Neural IRT Scorer that incorporates the Graded Partial Credit Model (GPCM) into a neural architecture, enabling the precise estimation of student ability while maintaining psychometric interpretability, a ZPD-Scaffolded Feedback Generator, which conditions multi-agent feedback strategies on the diagnosed ability parameter to adapt instructional focus across different proficiency levels, and a Multi-Perspective Feedback Evaluation Strategy that assesses feedback quality via pairwise preference judgements and student revision simulations. Experiments on the ASAP++ dataset demonstrate that PsyScore achieves competitive scoring performance while providing more pedagogically aligned feedback.

22.
arXiv (CS.CV) 2026-06-11

The N-Body Problem: Parallel Execution from Single-Person Egocentric Video

Humans can intuitively parallelise complex activities, but can a model predict this from observing a single person? Given one egocentric video, we introduce the N-Body Problem: predicting how N individuals, can hypothetically perform the same set of tasks. The goal is to maximise speed-up, but naive assignment of video segments to individuals often violates real-world constraints, leading to physically impossible scenarios like two people using the same object or occupying the same space. To quantify this, we formalise the N-Body Problem and propose a suite of metrics to evaluate both performance (speed-up, task coverage) and feasibility (spatial collisions, object conflicts and causal constraints). As a proof of concept, we introduce a structured prompting strategy that guides a Vision-Language Model (VLM) to reason about the 3D environment, object usage, and temporal dependencies, producing a viable parallel execution. On 100 videos from EPIC-Kitchens and HD-EPIC, for $N = 2$, our structured prompt improves action coverage by 45% over a baseline prompt for Gemini 2.5 Pro, while simultaneously slashing collision rates, object and causal conflicts by 51%, 52% and 55% respectively.

23.
medRxiv (Medicine) 2026-06-11

Impact of Out-Migration and Remittances on Food Consumption Outcomes among Rural Households in Tigray, Ethiopia

作者:

This study examines the effects of rural out-migration and remittance inflows on food consumption outcomes among rural households in the Tigray region of Ethiopia. Utilizing household survey data collected from 521 rural households across three distinct Weredas (districts) (Tahtay Maichew, Kola Tembien, and Kilte-awlaelo). A Binary Probit model was employed to identify factors influencing migration decisions, while an Endogenous Switching Regression (ESR) model was used to estimate the impact of migration on food consumption outcomes while controlling for selection bias and unobserved heterogeneity. Food security was measured using the Food Consumption Score (FCS) and dietary diversity indicators. The empirical results reveal that severe food insecurity is widespread, with over 60% of all surveyed households falling into the "Poor" food consumption category. Descriptive baseline comparisons show that migration and remittance transfers marginally shift the raw average FCS upward from 23.86 to 25.48. However, this impact is profoundly nuanced: remittances serve as an immediate consumption-smoothing safety net but run parallel to a "labor-lost" constraint that reduces own-production capacities, forcing households to rely increasingly on market purchases for staple foods. The findings reveal that migration creates short-term labor shortages in agricultural production; however, remittance inflows substantially improve household food consumption frequencies, particularly for pulses, vegetables, and other nutrient-rich foods. After accounting for self-selection bias and unobserved traits, the rigorous ESR estimates indicate that migration increases the Food Consumption Score of participating households by an average Treatment Effect on the Treated (ATT) of 10.75 points, shifting them into more secure dietary tiers. Moreover, remittances help households mitigate the adverse effects of drought and other shocks by relaxing liquidity constraints and supporting both food purchases and agricultural investments. The study recommends establishing target food security safety nets for non-remittance households, promoting scale-appropriate labor-saving agricultural technologies, expanding traditional communal labor-sharing innovations, and boosting irrigation and agricultural input support programs to enhance rural food security and livelihood resilience.

24.
medRxiv (Medicine) 2026-06-13

Projected population level impact and cost-effectiveness of clinic and community-based tuberculosis screening approaches

The South Africa National Department of Health have set ambitious targets to scale up TB testing, focusing primarily on clinic attendees. In the context of declining funding for TB care and prevention, the most cost-effective approaches for targeting testing should be identified. We developed a mathematical model of TB in South Africa, explicitly incorporating clinic attendance by sex and HIV/ART status. We simulated six screening approaches over 2026-2035 (individually and in combination): three clinic-based (symptom screening, intensified targeted universal TB testing [TUTT, symptom-agnostic sputum testing of clinic attendees in key risk groups], and intensified TUTT allowing saliva samples) and three targeted community-based (community radiographic screening, symptom screening, and universal Xpert Ultra testing), each implemented at a range of coverage levels. Model outputs were combined with a mechanistic cost function to estimate potential impact and cost-effectiveness from a societal perspective. The most cost-effective standalone approach was community radiographic screening at 10% annual population coverage, with an incremental cost-effectiveness ratio (ICER) of $421 per disability-adjusted life year (DALY) averted. 10/11 scenarios along the expansion path included community radiographic screening at progressively higher coverage, combined with a clinic-based approach. Combining complementary approaches to reach both groups at increased risk of TB (e.g. clinic-based screening) and groups with lower screening coverage (e.g. community-based screening) may increase cost-effectiveness of TB screening, compared to standalone approaches. When designing TB screening strategies, both population risk and existing screening coverage should be considered.

25.
arXiv (CS.AI) 2026-06-24

LLMs Prompted for Legal Context Object More: Overrefusal from Small On-Premises LLMs in Criminal Legal Context

arXiv:2606.24585v1 Announce Type: new Abstract: While the validity of LLMs' use in the legal context remains subject to ethical and legal debate, legal professionals are already experimenting with personal LLMs, if only for translation and reformulation. However, even such a seemingly innocuous use can introduce biases through case processing speed if LLM assistants selectively refuse assistance on certain topics. To better anticipate such biases, we investigate several modern small LLMs that are most likely to be used as on-device assistants, to assess the impact of overrefusal on legal prompts. Surprisingly, we find that authority-style prefixes (``you are acting as an assistant of the national supreme court'', ``[...] defense lawyer'') systematically increase refusal rates by 2–20x over the no-prefix baseline, while a known role-play jailbreak prefix shows mixed effects, sharply increasing refusals in some models and barely shifting them in others. The finding suggests that small on-prem deployable LLMs are unstable under contextual framings that a real institutional user might naturally introduce, and further investigation is essential to minimize opportunities for bias.