Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
medRxiv (Medicine) 2026-06-17

Differential Determinants of Past Behavior and Future Intention Regarding Voluntary Blood Donation: A Cross-Sectional Study of Knowledge, Attitudes, and Practices in Qingdao, China

Background A persistent gap between motivation and action threatens voluntary blood supply. This study examined the publics knowledge, attitudes, and practices (KAP) regarding blood donation, with a particular focus on identifying the different determinants of past blood donation behavior and future willingness to donate. Methods Convenience sampling was used to conduct a cross-sectional survey among 1,058 eligible people in Qingdao, China, between July and November 2025. Data were collected via a self-designed KAP questionnaire. To find independent characteristics linked to previous behavior and future intention, respectively, multivariable binary logistic regression was used. Results Overall, 37.0% of participants (n=391) had a lifetime donation history, while 39.2% (n=415) intended to donate in the next 12 months. Past behavior was positively associated with older age (36-45 years: OR=6.84; 95% CI: 3.21-14.58), higher education (OR=2.06; 95% CI: 1.33-3.17), and interpersonal interaction channels (OR=1.45; 95% CI: 1.01-2.09) but hindered by safety concerns (OR=0.23; 95% CI: 0.16-0.34). Conversely, future intention was positively correlated with male sex (OR=1.69; 95% CI: 1.24-2.29), prior donation history (OR=2.69; 95% CI: 1.87-3.86), having family members or friends in need of blood (OR=2.75; 95% CI: 1.96-3.85), and traditional media exposure (OR=3.33; 95% CI: 2.18-5.10). Higher education was adversely correlated with future intention (OR=0.55; 95% CI: 0.38-0.79). Conclusion There is a substantial disparity between donation motivation and action. The determinants of past behavior and future intention are asymmetric, suggesting that stage-specific interventions are required, using social mobilization for initiating first-time donations, while employing family reciprocity and authoritative communication to sustain long-term engagement.

02.
arXiv (CS.LG) 2026-06-18

Fair Online Resource Allocation

arXiv:2606.18679v1 Announce Type: cross Abstract: We study the problem of fair online resource allocation, motivated by applications such as refugee resettlement and airline scheduling, where agents arrive sequentially and must be assigned to facilities with limited capacities. We introduce a model that maximizes the overall welfare subject to resource constraints and a Lipschitz fairness requirement, which ensures that similar agents arriving in the same batch receive similar expected outcomes. We first analyze the offline problem, proving that the value of the optimal fair allocation is at least an $\Omega(1/\gamma)$ fraction of the optimal unfair allocation, where $\gamma$ is the fairness coefficient, thereby bounding the price of fairness. For the online setting, we propose an algorithm based on dual mirror descent that enforces fairness constraints within batches while estimating optimal dual variables. We prove that this algorithm achieves sublinear regret relative to the optimal offline fluid benchmark. Finally, we validate our theoretical results using real-world data from the Refugee Economies Programme, demonstrating the algorithm's performance and examining the trade-offs between welfare maximization and fairness enforcement.

03.
arXiv (CS.LG) 2026-06-19

Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates

arXiv:2606.19549v1 Announce Type: new Abstract: Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This late feedback is costly: adapters that are strong in isolation can interfere destructively once their updates are combined. We ask whether this outcome can be anticipated. We formalize adapter mergeability as the degree to which an adapter preserves its single-task utility after merging, and show that it can be forecast from signals measured in the first few percent of training – chiefly how the low-rank updates and their gradients align across tasks and how much they disturb shared representations. We package these signals into MergeProbe, a lightweight predictor that estimates pairwise and set-level retention and turns the estimate into a concrete decision: merge directly, reweight, prune, or route. On MERGE-PEFT, a five-domain benchmark spanning math, code, science, instruction following, and safety, MergeProbe attains the best average and worst-case retention among strong interference-aware merge baselines while adding far less deployment overhead than full task routing. This turns LoRA merging from a post-hoc engineering step into an anticipatory measurement problem.

04.
Nature (Science) 2026-06-08

Fifty years since a simple equation described the chaos of biology

An exploration of chaos theory in population dynamics showed that unpredictable systems can often be modelled using surprisingly simple mathematics. An exploration of chaos theory in population dynamics showed that unpredictable systems can often be modelled using surprisingly simple mathematics.

05.
arXiv (CS.CV) 2026-06-11

SceneMiner: Identity-Preserving Multi-Task Fine-Tuning for Unified BEV Scene Mining

Mining hard, safety-critical scenes from driving logs is bottlenecked by the absence of difficulty labels, and no single proxy, collision risk, trajectory ambiguity, or semantic rarity suffices to find such scenes on its own. We present SceneMiner, a unified, camera-only bird's-eye-view pipeline that emits complementary mining signals from a frozen vision-language backbone in a single forward pass, with no LiDAR or radar: a retrieval embedding for text-prompted scenario search, a multi-label scene-tag distribution, and a continuous physics-based risk score (a motion forecast is a byproduct, not a contribution). Building such a multi-head model exposes our central finding, a failure mode we term cross-task interference: adding or upgrading one head shifts a shared activation stream and degrades weight-frozen sibling heads, so freezing parameters alone is insufficient. Our contribution, identity-preserving multi-task fine-tuning, removes this interference by zero-initializing every new sub-module and freezing every parameter that feeds the shared stream. The mining heads are thereby preserved bit-identically while training only ~102k parameters. The tagging head reaches mAP 0.4614 (micro-F1 0.5557) on 20 scene tags by pooling each scene into 32 visual tokens, and the embedding head supports text-prompted retrieval, validated qualitatively. Code is available at: https://anonymous.4open.science/r/sceneminer_anonymous-64E5

06.
arXiv (quant-ph) 2026-06-15

Quantitative and Optimal Device-Independent Lower Bounds on Detection Efficiency

arXiv:2511.19302v2 Announce Type: replace Abstract: This paper examines a quantitative and optimal lower bound on the detector efficiency in a (2,2,2) Bell experiment within a fully device-independent framework, whereby the detectors used in the experiment are uncharacterized. We provide a tight lower bound on the minimum efficiency required to observe a desired Bell-CHSH violation using the Navascués-Pironio-Acín (NPA) hierarchy, confirming tightness up to four decimal places with numerical optimization over explicit quantum realizations. We then introduce the effect of dark counts and demonstrate how to quantify the minimum required efficiency to observe a desired CHSH violation with an increasing dark count error. Finally, to obtain an analytical closed-form expression of the minimum efficiency, we consider the set of no-signaling behaviors that satisfy the Tsirelson bound, which are easier to characterize than the quantum set. Using such behaviors, we find a simple closed-form expression for a lower bound on the minimum efficiency which is monotonically increasing with the CHSH violation, though the analytically obtained lower bounds are meaningfully below the numerically tight lower bound.

07.
arXiv (CS.CV) 2026-06-11

Cross-Modal Benchmarking for Robotic Perception in Natural Environments

Natural environments present a complex challenge to robotics perception systems. Current models, particularly vision foundation models, are largely trained on structured, urban environments leading to weaknesses in their perception for field robotics tasks. We showcase the limitations of current models using our recently released WildCross benchmark, a new cross-modal benchmark for place recognition and metric depth estimation in large-scale natural environments. WildCross comprises over 476K sequential RGB frames with semi-dense depth and surface normal annotations, each aligned with accurate 6DoF pose and synchronized dense lidar submaps. In this work, we provide an expanded analysis of the benchmark results from the recent WildCross benchmark, with particular emphasis on expanded metric depth estimation experiments. Access to the code repository and dataset for this work can be found at https://csiro-robotics.github.io/WildCross.

08.
arXiv (CS.CV) 2026-06-16

Planning with Unified Multimodal Models

With the powerful reasoning capabilities of large language models (LLMs) and vision-language models (VLMs), many recent works have explored using them for decision-making. However, most of these approaches rely solely on language-based reasoning, which limits their ability to reason and make informed decisions. Recently, a promising new direction has emerged with unified multimodal models (UMMs), which support both multimodal inputs and outputs. We believe such models have greater potential for decision-making by enabling reasoning through generated visual content. To this end, we propose Uni-Plan, a planning framework built on UMMs. Within this framework, a single model simultaneously serves as the policy, dynamics model, and value function. In addition, to avoid hallucinations in dynamics predictions, we present a novel approach self-discriminated filtering, where the generative model serves as a self-discriminator to filter out invalid dynamics predictions. Experiments on embodied decision-making tasks show that Uni-Plan substantially improves success rates compared to VLM-based methods, while also showing strong data scalability, requiring no expert demonstrations and achieving better performance under the same training-data size. This work lays a foundation for future research in reasoning and decision-making with UMMs.

09.
arXiv (CS.AI) 2026-06-17

Agentic Discovery of Non-Canonical Antimicrobial Peptides with AMPGAN v3

arXiv:2606.17127v1 Announce Type: cross Abstract: Antimicrobial resistance causes to over a million deaths annually. Antimicrobial peptides (AMPs) are a promising solution, but generative AMP models are not yet ready to design peptides with non-natural amino acids and/or chemical modifications, which are essential for real-world peptide drugs. We present AMPGAN v3, a multi-objective conditional GAN that expands the generative vocabulary to D-amino acids and N/C-terminus modifications such as amidation. By separating adversarial and activity-aware supervision across two specialized discriminators, AMPGAN v3 substantially improves training stability and outperforms prior generative AMP models on external classifiers. We validated five candidates spanning three structural classes in vitro; two showed activity against Gram-positive strains, with the best candidate reaching MIC 8 {\mu}g/mL against B. subtilis. To support downstream curation, we further present PepCraft, a multi-agent framework for end-to-end AMP discovery in which a Planning Agent orchestrates specialized executors for generation, filtering, and verification. Its prioritization recommendations align with our in vitro outcomes. Together, these contributions let us examine, on a small but real scale, how generative and agentic AI compose in therapeutic peptide discovery. Code: https://github.com/marszzibros/AMPGANv3

10.
arXiv (CS.AI) 2026-06-18

A Hybrid LSTM–Vision Transformer Architecture for Predicting HRRR Forecast Errors

arXiv:2606.19026v1 Announce Type: cross Abstract: Forecast errors in high-resolution numerical weather prediction (NWP) systems are often linked to unresolved planetary boundary layer (PBL) processes, convection, terrain-induced circulations, and other vertically structured atmospheric phenomena. Previous work demonstrated that Long Short-Term Memory (LSTM) networks can successfully predict forecast errors in the High-Resolution Rapid Refresh (HRRR) model using mesonet observations, but we believe performance degradation is linked to periods of complex vertical atmospheric evolution. To address this limitation, we develop a hybrid LSTM-Vision Transformer (LSTM-ViT) framework that combines temporal sequence learning from surface observations with atmospheric profiles from the New York State Mesonet profiler network. The LSTM-ViT framework is trained to predict HRRR hourly precipitation, 10 m wind speed, and 2 m temperature forecast errors at individual mesonet stations. Across all three predictors, incorporation of profiler-derived atmospheric structure improves forecast error prediction skill relative to the baseline LSTM architecture, with the largest gains occurring at shorter forecast lead times and during periods of enhanced PBL activity. Improvements are particularly pronounced for precipitation forecast error, where the LSTM-ViT framework achieves approximately a twofold increase in predictive skill relative to the baseline LSTM while better capturing convectively driven error evolution and reducing degradation associated with PBL processes. These results demonstrate that combining temporal sequence learning with vertically informed attention mechanisms provides a physically meaningful pathway for improving forecast error prediction in operational NWP systems. Our research offers forecasters enhanced guidance regarding model bias and forecast confidence.

11.
medRxiv (Medicine) 2026-06-15

Supporting people to access social security payments through the Special Rules for End of Life: a qualitative study of the perspectives of patients, carers and health care professionals

Background: People living with terminal illness face a double financial burden from additional costs and loss of earning for themselves and their carers. Social security benefits are intended to help alleviate some of this financial pressure, and in the UK and other countries people are eligible for fast-tracked access to financial support via the Special Rules for End of Life. One in 3 people who are eligible miss out on this support, yet there is limited evidence on the reasons for this take-up deficit. Objectives: The aim of this study is to understand the barriers and facilitators to claiming benefits for terminally ill people from the perspectives of patients, carers, and health care professionals. Methods: This is a qualitative study combining i) focus groups with healthcare professionals recruited via professional networks and social media, and ii) interviews with patients and carers recruited in hospital and hospice settings. We analysed the data using Practical Thematic Analysis Results: Fifty-five multidisciplinary healthcare professionals participated in 11 focus groups, and we interviewed 10 patients and carers. We constructed five descriptive themes to summarise the data: Navigating priorities and uncertainty; positive impacts alongside a sense of shame and stigma; talking about money, difficulties and dividends; everybodys, yet nobodys, responsibility; and sticking points in the system. Conclusion: The themes reveal several challenges that may contribute to people not taking up this financial support. However, discussions about access to benefits were also seen as a core part of holistic care, a positive way to offer support and a gateway to other discussions about end-of-life care preferences and decisions. Recommendations for policy and practice include evaluating the adoption of a diagnostic rather than a prognostic eligibility criteria, integrating discussions about benefits into existing processes such as advance care planning, and improving education and support for clinicians.

14.
arXiv (CS.LG) 2026-06-18

UST-GNN: A Unified Spatial–Topological Graph Neural Network Framework for Urban Analytics–Demonstrated through a Case Study on Urban Health Prediction

arXiv:2504.04739v3 Announce Type: replace Abstract: Understanding how social, demographic, environmental, and spatial factors jointly shape urban outcomes is essential for sustainable urban development and evidence-based policy. Traditional statistical approaches often struggle to capture complex non-linear relationships, while many machine learning methods overlook the joint roles of spatial autocorrelation and network topology in urban systems. Recent advances in GeoAI have addressed these challenges only partially, often treating spatial effects, graph structure, evaluation, and interpretability separately. We present UST-GNN, a unified spatial–topological graph neural network framework that integrates neighbourhood connectivity, heterogeneous urban features, and positional/locational embeddings into a single representation. Using the MedSAT dataset, which contains over 150 environmental and socio-demographic variables and six prescription outcomes across 4,835 neighbourhoods in Greater London, UST-GNN outperforms strong statistical, geographically enhanced, and graph Machine Learning baselines, improving out-of-sample $R^2$ by 8.4–13.2\% under strict spatial cross-validation. We further introduce a lightweight principal-component module to interpret learned node embeddings geographically and relate them to policy-relevant covariates. The resulting analyses recover established patterns, offer new perspectives on debated associations, and reveal novel predictors warranting further causal investigation. Together, these findings demonstrate the value of graph-based spatial machine learning for urban health analytics, environmental inequality assessment, and evidence-based urban policy. Beyond predictive gains, UST-GNN provides a unified GeoAI analytical pipeline that can be embedded into urban digital twin workflows for scenario testing, monitoring, and data-informed decision-making for healthier, more sustainable cities.

15.
arXiv (CS.LG) 2026-06-18

On the Stability of the Jacobian Matrix in Deep Neural Networks

arXiv:2506.08764v3 Announce Type: replace Abstract: Deep neural networks are known to suffer from exploding or vanishing gradients as depth increases, a phenomenon closely tied to the spectral behavior of the input-output Jacobian. Prior work has identified critical initialization schemes that ensure Jacobian stability, but these analyses are typically restricted to fully connected networks with i.i.d. weights. In this work, we go significantly beyond these limitations: we establish a general stability theorem for deep neural networks that accommodates sparsity (such as that introduced by pruning) and non-i.i.d., weakly correlated weights (e.g. induced by training). Our results rely on recent advances in random matrix theory, and provide rigorous guarantees for spectral stability in a much broader class of network models. This extends the theoretical foundation for initialization schemes in modern neural networks with structured and dependent randomness.

16.
arXiv (CS.AI) 2026-06-15

Hybrid Open-Ended Tri-Evolution Makes Better Deep Researcher

arXiv:2606.13710v1 Announce Type: new Abstract: Deep research and agent evolution serve as de-facto tasks for AI agents in real-world applications toward artificial general intelligence. The former enables autonomous retrieval and integration of information in open-ended environments to tackle open-ended research tasks, yet it is constrained by the static parametric deep research capabilities of agent systems. The latter allows agents to autonomously interact with the environment to gain experiences that evolve model capabilities. However, its effectiveness has been widely validated only on verifiable tasks with standard answers, leaving a gap with open-ended research tasks. To bridge these two critical tasks, we propose the Hybrid Open-Ended Tri-Evolution (HOTE) framework, which leverages hybrid-mode reinforcement learning to facilitate the collaborative evolution of a proposer, solver and judge based on web-scale knowledge, moving toward autonomous evolving agents in open-ended tasks and environments. Extensive experiments on three long-form deep research benchmarks demonstrate that the 8B model trained via HOTE surpasses the strongest static open 8-32B models as well as those trained by state-of-the-art deep research training methods with less time overhead, and further verify that the evolution of all three modules in HOTE is indispensable.

17.
arXiv (CS.LG) 2026-06-19

A Solver-Free Training Method for Predict-then-Optimize

arXiv:2606.19587v1 Announce Type: cross Abstract: We propose a scalable method for training prediction (machine learning) models in the predict-then-optimize paradigm, where model outputs serve as coefficients for a subsequent linear optimization task. Directly minimizing the empirical decision regret is intractable for linear programming and combinatorial optimization since the decision mapping is piecewise constant, and the gradients are zero almost everywhere. While existing methods address this by smoothing the differentiation process, they suffer from scalability issues, since a computationally expensive solver call is required for every gradient evaluation. To address this, we propose a decision-focused learning pipeline based on a measure transformation principle, which yields a new surrogate loss that is completely optimization-solver-free during training. We establish theoretical guarantees, including Fisher consistency and excess risk bounds. Empirically, our method achieves decision quality competitive with state-of-the-art methods while reducing training time by orders of magnitude.

18.
Nature (Science) 2026-06-17

Probing picometre-scale interlayer deformations via hyperbolic polaritons

作者:

The resilience of van der Waals (vdW) materials to large strain fields makes them an ideal platform for tuning electronic, optical and magnetic properties1–4. Although in-plane strain is readily mapped, non-invasive and quantitative characterization of out-of-plane strain remains a formidable challenge, particularly for picometre-scale deformations buried at interfaces. Here we demonstrate a polaritonic optical method that uses the mid-infrared out-of-plane hyperbolic polaritons (oHPs) mode to detect interlayer deformations in prototypical vdW polar insulator–hexagonal boron nitride (hBN). This method uses the softening mechanism of out-of-plane transverse optical (oTO) phonons induced by interlayer strain, enabling highly sensitive detection of picometre-scale deformations. Although these oTO phonon modes are typically spectroscopically ‘dark’, their strain response is activated through the oHPs, achieving an atomic displacement sensitivity of about 10 pm (about 8 × 10−7 times the probing wavelength), enabling ultradeep-subwavelength mechanical interlayer deformation detection. This is experimentally validated in both planar hBN and at the buried interface of quantum dot–hBN nanotube heterostructures. This polariton-based picometrology bridges nanomechanics and photonics, providing a non-destructive lens to visualize hidden stress landscapes with atomic precision. A new polaritonic optical method that uses the mid-infrared out-of-plane hyperbolic polaritons mode is described and experimentally validated to allow the examination of picometre-scale interlayer deformations, providing a bridge between nanomechanics and photonics.

19.
arXiv (CS.CV) 2026-06-18

Domain Generalizable Adaptation of 3D Vision-Language Models via Regularized Fine-Tuning

Domain adaptation remains a central challenge in 3D vision, especially for multimodal foundation models that align 3D point clouds with visual and textual data. While these models demonstrate strong general capabilities, adapting them to downstream domains with limited data often leads to overfitting and catastrophic forgetting. To address this, we introduce ReFine3D, a regularized fine-tuning framework designed for domain-generalizable tuning of 3D large multimodal models (LMMs). ReFine3D combines selective layer tuning with two targeted regularization strategies: multi-view consistency across augmented point clouds and text diversity through synonym-based prompts generated by large language models. Additionally, we incorporate point-rendered vision supervision and a test-time augmentation mechanism with confidence-based aggregation to further enhance robustness. Extensive experiments across different 3D domain generalization benchmarks show that ReFine3D improves base-to-novel class generalization by 1.36%, cross-dataset transfer by 2.43%, robustness to corruption by 1.80%, and few-shot accuracy by up to 3.11%, outperforming prior state-of-the-art methods with minimal added computational overhead.

20.
arXiv (quant-ph) 2026-06-17

Quantum conditional entropies from convex trace functionals

arXiv:2410.21976v4 Announce Type: replace Abstract: We study geometric properties of trace functionals that generalize those in [Zhang, Adv. Math. 365:107053 (2020)], arising from a novel family of conditional entropies with applications in quantum information. Building on new convexity results for these functionals, we establish data-processing inequalities and additivity properties for our entropies, demonstrating their operational significance. We further prove completeness under duality, chain rules, and various monotonicity properties for this family. Our proofs draw on tools from complex interpolation theory, multivariate Araki–Lieb and Lieb–Thirring inequalities, variational characterizations of trace functionals, and spectral pinching techniques.

21.
Nature (Science) 2026-06-12

Daily briefing: How Venus flytraps snap shut

作者:

Softening cells enable flytraps to shut with astonishing speed. Plus, the cutting-edge science happening at the World Cup and why scientists shouldn’t ignore the Pope’s AI message. Softening cells enable flytraps to shut with astonishing speed. Plus, the cutting-edge science happening at the World Cup and why scientists shouldn’t ignore the Pope’s AI message.

22.
arXiv (CS.LG) 2026-06-17

Reducing Learner Redundancy in Boosting via Residual Orthogonalization

arXiv:2606.17567v1 Announce Type: new Abstract: While sequential residual fitting is the bedrock of standard boosting frameworks, it inherently breeds learner redundancy by repeatedly revisiting correlated error components. To address this bottleneck, we propose a shift from residual fitting to residual orthogonalization and introduce SCBoost. Our framework tackles redundancy through two complementary mechanisms: Spectral Residual Projection (SRP) and Covariance-Regularized Weighting (CRW). During training, SRP projects each residual target onto the orthogonal complement of the historical prediction subspace, forcing successive learners to capture only novel empirical innovations. During aggregation, CRW optimizes ensemble weights on a validation set with an explicit covariance penalty to mitigate remaining correlations. Theoretically, we provide a finite-sample geometric characterization proving that SRP yields an exact additive residual-energy decomposition. Furthermore, under an isotropic-noise assumption, we rigorously establish the conditions under which this projection improves the effective Signal-to-Noise Ratio. Extensive experiments across ten benchmark datasets demonstrate that SCBoost delivers strong out-of-the-box performance, particularly in accuracy and F1 score. This work reinterprets boosting through a geometric lens, suggesting that explicit redundancy control is a principled and necessary step toward more efficient ensemble architectures.

23.
arXiv (CS.AI) 2026-06-16

ARB4WM: An Adversarial Robustness Benchmark for World Models in Continuous Control

arXiv:2606.16605v1 Announce Type: new Abstract: World models are widely used in robotic and agentic engineering control systems due to their ability to learn latent dynamics for planning and decision-making. As these systems are increasingly deployed in safety-critical settings, understanding their robustness under adversarial conditions has become essential. However, existing evaluations lack a unified benchmark for testing adversarial threats across the policy, value, and latent-dynamics levels of world-model agents. To fill this gap, we present ARB4WM, a unified evaluation framework for pre-deployment robustness and risk assessment of world-model agents under visual perturbations. ARB4WM defines five white-box loss objectives across these three levels and studies their effects when combined with single-step or multi-step perturbation strategies and temporal attack modes, including full-frame, half-sequence, and sparse-frame exposure. Specifically, we evaluate four Dreamer-style agents across 20 tasks from MetaWorld and the DeepMind Control Suite under different loss objectives, perturbation strategies, and temporal attack modes. Results show that attacks targeting value estimation, latent representations, and RSSM dynamics can be as damaging as direct policy disruption, and that early or frequent perturbations are especially harmful, while input-level defenses provide limited recovery under adaptive attacks. These findings suggest that safety, risk, and reliability assessment for world models should cover multiple component-oriented attack objectives and temporal exposure protocols rather than relying solely on action-space robustness. Source code is available at https://github.com/zaoanguai/ARB4WM.

24.
arXiv (CS.AI) 2026-06-19

BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

arXiv:2606.19651v1 Announce Type: new Abstract: Three-dimensional (3D) brain MRI is central to clinical neurology and neuro-oncology, where generative models could augment under-represented cohorts, simulate disease trajectories, and support privacy-preserving data sharing. Latent diffusion has been the go-to solution for modeling imaging data, but it places two competing demands on the tokenizer: encoder embeddings must retain the clinical information that downstream tasks act on, and the decoder must reconstruct anatomically faithful volumes. Existing reconstruction-driven tokenizers achieve the second at the expense of the first. To address this, we introduce a fully volumetric masked-autoencoder (MAE) based tokenizer for 3D brain MRI latent diffusion, decoupling encoder and decoder: a frozen 3D MAE encoder produces clinically informative embeddings, while a dedicated CNN decoder reconstructs voxels from a linear projection of those embeddings. We pretrain the encoder on 35,309 volumes from 18 public cohorts spanning four modalities, ten disease categories, and 200+ acquisition sites, and demonstrate its dual utility in two settings. First, on a 23-task linear-probing benchmark, the encoder outperforms or matches SOTA models (i.e., BrainIAC, BrainSegFounder, and MedicalNet) on 21 of 23 tasks. Second, a conditional diffusion transformer (DiT) trained on these clinically informative embeddings supports both conditional generation across six variables and patient-specific longitudinal forecasting. Together these results establish a single 3D brain-MRI embedding space capable of both downstream clinical tasks and controllable generation.

25.
arXiv (CS.AI) 2026-06-12

Humor Style Drives Laughter, Topic Shapes Acceptability: Evaluating Bilingual Personal and Political Robot-Delivered AI Jokes

arXiv:2606.13256v1 Announce Type: cross Abstract: Humor plays a central role in human social relationships, and recent advances in computational humor create new opportunities for integrating humor into human-robot interaction (HRI). While large language models (LLMs) can generate diverse forms of humor, it remains unclear how humor style, joke content, and language preference shape perceptions of robot-delivered humor in group settings. In this exploratory study, we employed a mixed factorial design in which participants evaluated AI-generated jokes delivered by a robot in a university classroom. We examined the effects of humor type (Affiliative, Self-Enhancing, Aggressive, Self-Defeating) and joke content (person-related vs. political) on perceived funniness and appropriateness, as well as preferred language. Results show that humor type significantly influences funniness, with Aggressive and Affiliative humor rated higher, while joke content primarily affects appropriateness, with person-related jokes preferred over political ones. Language preference was shaped by both joke content and participants' self-reported fluency and humor practices.