Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.AI) 2026-06-16

An Integrated System for Real-Time Student Assessment and Career Guidance Using Neural Networks in Computing Disciplines

arXiv:2606.15831v1 Announce Type: new Abstract: Many undergraduate students in Computer Science (CS) and Software Engineering (SWE) struggle to identify suitable career paths, particularly when their academic performance, abilities, and interests do not fully align. To address this issue, this study proposes an AI-driven Student Assessment and Career Prediction System that integrates a Career Guidance Expert (CGE) system with a Web-Based Student Assessment (WBSA) platform. Within the integrated framework, CGE uses AI to personalize career recommendations and also helps students, after graduation, identify suitable jobs, research domains, and higher-study opportunities aligned with their skills and interests. The WBSA platform further strengthens interaction between students and faculty through assessments, personalized tasks, mentorship activities, and a secure real-time chat application. The CGE system employs a Multilayer Perceptron (MLP) model trained on real-world academic and extracurricular data collected from university students using snowball sampling, achieving a validation accuracy of 94.71% in predicting personalized career paths. A pre-survey was conducted across universities to evaluate the proposed model before deployment. The WBSA system was developed as a modern web application using technologies such as Node.js, Next.js, and PostgreSQL to ensure scalability, responsiveness, and secure data management. Supported by a secure cloud-based infrastructure, the platform provides reliable performance while helping graduates select a suitable career path in the IT sector. In addition, a post-survey involving both students and faculty was conducted to gather feedback and further improve the overall effectiveness and usability of the system.

02.
arXiv (quant-ph) 2026-06-16

Adaptively secure unitary designs with constant non-Clifford cost

arXiv:2510.08129v2 Announce Type: replace Abstract: Randomness is a fundamental resource in quantum information, with crucial applications in cryptography, algorithms, and error correction. A central challenge is to construct unitary $k$-designs that closely approximate Haar-random unitaries while minimizing the costly use of non-Clifford operations. In this work, we present a protocol able to generate unitary $k$-designs on $n$ qubits, secure against any adversarial quantum measurement, with a system-size-independent number of non-Clifford gates. Our construction applies a $k$-design only to a subsystem of size $\Theta(k)$, independent of $n$. This "seed" design is then "diluted" across the entire $n$-qubit system by sandwiching it between two random Clifford operators. The resulting ensemble forms an $\varepsilon$-approximate unitary $k$-design on $n$ qubits. We prove that this construction achieves full quantum security against adaptive adversaries using only $\tilde{O}(k^2 \log\varepsilon^{-1})$ non-Clifford gates. If one requires security only against polynomial-time adaptive adversaries, the non-Clifford cost decreases to $\tilde{O}(k + \log^{1+c} \varepsilon^{-1})$. This is optimal, since we show that at least $\Omega(k)$ non-Clifford gates are required in this setting. Compared to existing approaches, our method significantly reduces non-Clifford overhead while strengthening security guarantees to adaptive security as well as removing artificial assumptions between $n$ and $k$. These results make high-order unitary designs practically attainable in near-term fault-tolerant quantum architectures.
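Schematically, the construction described above can be written as follows. Here $m$ is the seed subsystem size, $V$ is drawn from a $k$-design on those $m$ qubits, and $C_1, C_2$ are independent random $n$-qubit Clifford operators; the symbols are our own shorthand for the abstract's description, not necessarily the paper's notation. Because Clifford gates contribute no non-Clifford cost, the non-Clifford count of $U$ is that of $V$ alone, which does not depend on $n$.

```latex
U \;=\; C_2\,\bigl(V \otimes \mathbb{1}_{n-m}\bigr)\,C_1,
\qquad V \sim k\text{-design on } m = \Theta(k) \text{ qubits},
\qquad C_1, C_2 \sim \mathrm{Cl}_n .
```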

03.
medRxiv (Medicine) 2026-06-24

Towards a Robust cell-free DNA Isolation Protocol for NGS Applications in a Clinical Molecular Diagnostics Setting

Cell-free DNA (cfDNA), released from apoptotic and necrotic cells into body fluids, represents a non-invasive source of genetic information for disease prediction, diagnosis, and monitoring. However, its low physiological abundance makes cfDNA highly susceptible to pre-analytical influences. In particular, genomic DNA (gDNA) released from lysed white blood cells (WBCs) can contaminate plasma and compromise downstream cfDNA analyses. This study evaluated the impact of different blood collection tubes and isolation methods on cfDNA stability and yield. Blood samples from 13 healthy donors were collected using cfDNA-stabilizing tubes (Cell-Free DNA BCT, Streck; S-Monovette cfDNA Exact, Sarstedt) and stored at room temperature for 1, 5, or 10 days before plasma isolation. CfDNA was extracted using either a magnetic bead-based method or a silica column-based approach. DNA quantity and quality were assessed by fluorometric quantification, automated fragment analysis, and gene-specific quantitative PCR. Streck-based workflows maintained stable cfDNA yields and characteristic mononucleosomal fragmentation profiles across all storage times. In contrast, Sarstedt tubes showed reduced cfDNA concentrations after 5 days and a pronounced increase at 10 days, accompanied by high-molecular-weight DNA patterns consistent with WBC lysis. These trends were largely independent of the extraction method. Overall, the results demonstrate that blood collection tube chemistry critically influences cfDNA integrity during delayed processing. Streck tubes, particularly when combined with QIAamp, provided the most robust and reproducible workflow for routine molecular diagnostics, whereas Sarstedt tubes produced physiologically implausible results after extended storage.

04.
bioRxiv (Bioinfo) 2026-06-15

SMS: Symmetric Mediation Statistics for Powerful High-Dimensional Mediation Analysis

Background: Mediation analysis of high-dimensional features, particularly molecular-level omics features, provides important opportunities to uncover biological mechanisms underlying human health and disease. However, two central statistical challenges remain: testing the composite-null hypothesis and maintaining power when the exposure-mediator and mediator-outcome associations differ substantially in statistical significance. Existing methods typically rely on accurate estimation of the proportions of the three null types or on the maximum of the two association p-values; they may not always control the false discovery rate (FDR) well and may have limited power under imbalanced significance. Methods: We propose SMS, a new statistical framework based on symmetric mediation statistics. By exploiting symmetry, SMS calibrates the composite null distribution as a whole for FDR control. It also allows flexible combinations of the two association p-values, including the maximum, which enables the construction of an omnibus test. Moreover, it permits direct use of effect-size estimates, bypassing the need to compute p-values. Results: SMS controlled the FDR across a wide range of simulation scenarios while achieving a substantial sensitivity gain, often around 20 percentage points, over existing methods including HDMT, DACT, and DEI-B. Applications to a metabolomics dataset and a DNA methylation dataset further corroborated these findings. Notably, SMS discovered five plausible mediators in the metabolomics dataset that were missed by all existing methods considered.
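For reference, the max-p approach mentioned above can be sketched in a few lines: each candidate mediator is scored by the larger of its exposure-mediator and mediator-outcome p-values, and the Benjamini-Hochberg step-up procedure is applied to those maxima. This is a conventional baseline of the kind SMS is compared against, not SMS itself; the function names and the choice of BH are our assumptions. The max-p statistic is valid under the composite null but can be very conservative when most mediators are null on both paths, which is one reason such baselines can lose power.

```python
def benjamini_hochberg(pvals, q=0.05):
    """Return the (sorted) indices rejected by the BH step-up procedure at level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value clears its BH threshold rank * q / m
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])


def maxp_mediation(p_exposure_mediator, p_mediator_outcome, q=0.05):
    """Max-p test of the composite null 'alpha = 0 or beta = 0' for each mediator."""
    p_max = [max(a, b) for a, b in zip(p_exposure_mediator, p_mediator_outcome)]
    return benjamini_hochberg(p_max, q)


# Hypothetical p-values for four candidate mediators.
print(maxp_mediation([0.001, 0.5, 0.0001, 0.002], [0.002, 0.0001, 0.9, 0.003]))
```

Mediators 1 and 2 are strongly significant on one path only, so the max-p rule correctly declines to call them mediators.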

05.
arXiv (quant-ph) 2026-06-12

Quantum Error Correction Codes for Truncated SU(2) Lattice Gauge Theories

arXiv:2511.13721v2 Announce Type: replace Abstract: We construct two quantum error correction codes for pure SU(2) lattice gauge theory in the electric basis truncated at the electric flux $j_{\max}=1/2$, which are applicable on quasi-1D plaquette chains, 2D honeycomb and 3D triamond and hyperhoneycomb lattices. The first code converts Gauss's law at each vertex into a stabilizer while the second only uses half of the vertices and is locally the carbon code. Both codes are able to correct single-qubit errors. The electric and magnetic terms in the SU(2) Hamiltonian are expressed in terms of logical gates in both codes. The logical-gate Hamiltonian in the first code exactly matches the spin Hamiltonian for gauge singlet states found in previous work.

06.
arXiv (CS.LG) 2026-06-18

Beyond Prediction: Tail-Aware Scheduling for LLM Inference

arXiv:2606.18431v1 Announce Type: new Abstract: LLM serving exhibits extreme length variability, making size-based scheduling difficult in practice. Recent LLM schedulers approximate SJF/SRPT using predicted decode lengths or ranks and primarily report mean-centric metrics such as TTFT and TBT. We show that these prediction-driven policies can be fragile under distribution shifts, bursty arrivals, and GPU memory pressure, while offering limited control over the tail latency (P90-P99) that dominates user experience, even with perfect decode-length knowledge. We introduce a distribution-aware, prediction-free scheduling framework that replaces explicit length prediction with soft priority boosting driven by lightweight statistical signals. Our design co-optimizes scheduling and cache-aware preemption to account for memory-coupled decode dynamics across workload mixes. Evaluated on production and open-source traces, our method reduces P99 TTLT by up to 35-50% relative to SRPT with perfect length knowledge and reduces TTFT by 34-47% across workloads, including reasoning-heavy and chat-heavy tasks. These results demonstrate a robust alternative for optimizing tail latency in online LLM serving.
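The contrast between length prediction and prediction-free prioritization can be illustrated with a toy dispatcher. The sketch below is a hypothetical stand-in, not the paper's framework: it ranks queued requests by the decode tokens they have already generated (attained service, which needs no length prediction) and subtracts an aging bonus proportional to waiting time, so long-running requests are not starved into the tail. `Request`, `priority`, `pick_next`, and `boost_per_sec` are illustrative names.

```python
from dataclasses import dataclass


@dataclass
class Request:
    rid: str
    arrival: float      # arrival time in seconds
    generated: int = 0  # decode tokens produced so far (attained service)


def priority(req: Request, now: float, boost_per_sec: float) -> float:
    """Lower is served first: attained service minus a waiting-time boost."""
    return req.generated - boost_per_sec * (now - req.arrival)


def pick_next(queue: list[Request], now: float, boost_per_sec: float = 10.0) -> Request:
    """Choose the next request to decode without predicting output lengths."""
    return min(queue, key=lambda r: priority(r, now, boost_per_sec))


queue = [Request("long", arrival=0.0, generated=100), Request("short", arrival=19.0, generated=5)]
print(pick_next(queue, now=20.0, boost_per_sec=0.0).rid, pick_next(queue, now=20.0).rid)
```

With `boost_per_sec=0` this reduces to least-attained-service ordering, which favors short requests but can starve long ones; a positive boost eventually lets an old request overtake newer short ones.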

07.
arXiv (CS.CV) 2026-06-16

Task-Instructed Causal Routing of Vision Foundation Models for Multi-Task Learning

Vision foundation models (VFMs) have demonstrated strong robustness and transferability across a wide range of visual tasks. However, each model typically encodes strong inductive biases shaped by its pre-training objective and data domain, resulting in fragmented yet complementary visual knowledge. As a result, a single model often struggles to capture the diverse visual representations required across multiple dense prediction tasks. To address this limitation, we propose TIGER (Task-Instruction-Guided Expert Routing), a framework that coordinates multiple heterogeneous VFMs for multi-task dense prediction. Instead of naively aggregating expert features, TIGER leverages natural-language task instructions to guide a routing network that assigns token-level expert weights conditioned on task semantics, enabling adaptive integration of complementary expert features. TIGER further introduces a counterfactual loss that aligns routing decisions with each expert's causal contribution by measuring prediction changes when experts are excluded, encouraging more reliable and interpretable routing. We evaluate TIGER on two multi-task dense prediction benchmarks, NYUD-v2 and Pascal Context, where it consistently outperforms recent multi-task learning baselines while keeping all VFMs frozen. These results demonstrate that combining instruction-guided expert routing with counterfactual causal alignment enables effective coordination of heterogeneous vision foundation models.
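To make the routing and counterfactual ideas concrete, here is a minimal single-token sketch in plain Python: routing logits become softmax weights over experts, expert features are fused by a weighted sum, and an expert's counterfactual contribution is measured as how far the fused feature moves when that expert is excluded and the remaining weights are renormalized. The function names and the Euclidean distance are our assumptions; TIGER's routing network, instruction conditioning, and loss are not reproduced here.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # exp(-inf) == 0.0 removes a masked expert
    total = sum(exps)
    return [e / total for e in exps]


def fuse(expert_feats, logits):
    """Weighted sum of one token's feature vectors from each expert."""
    weights = softmax(logits)
    dim = len(expert_feats[0])
    return [sum(w * f[d] for w, f in zip(weights, expert_feats)) for d in range(dim)]


def exclusion_effect(expert_feats, logits, e):
    """Distance between the fused feature with and without expert e."""
    masked = list(logits)
    masked[e] = -math.inf  # exclude expert e; softmax renormalizes the rest
    return math.dist(fuse(expert_feats, logits), fuse(expert_feats, masked))


feats = [[1.0, 0.0], [0.0, 1.0]]  # two hypothetical experts, 2-D features
print(exclusion_effect(feats, [0.0, 0.0], 1))
```

An expert whose removal barely changes the fused feature has little causal contribution for that token; how the paper turns this signal into a training loss is not specified here.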

08.
arXiv (CS.CL) 2026-06-19

How LLMs Fail and Generalize in RTL Coding for Hardware Design?

Translating sequential programming priors into the parallel temporal logic of hardware design remains a crucial bottleneck for large language models (LLMs). To investigate this, we introduce a new error taxonomy grounded in problem solvability and inspired by cognitive theory. Our taxonomy categorizes failures into syntactic, semantic, solvable functional, and unsolvable functional types. Evaluations reveal a strict empirical ceiling on the VerilogEval benchmark, where frontier models plateau at a 90.8% initial pass rate. These plateaus are defined by unsolvable functional errors, exposing persistent knowledge gaps immune to test-time compute scaling. Furthermore, we expose a striking surface convergence gap: optimization readily eliminates syntax errors but concurrently exacerbates deeper functional failures. Our findings demonstrate that alignment techniques merely teach models to compile. While repeated sampling strategies can patch solvable errors, register-transfer level (RTL) coding capacity remains strictly bounded by pretraining knowledge. Addressing the challenges in current LLM-based hardware generation pipelines requires more research on model reasoning rather than alignment interventions.
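One way to operationalize the four-way taxonomy is to combine the compile and simulation outcomes of a single attempt with whether any of several repeated samples for the same problem passes. The mapping below is our assumption, not the paper's exact procedure: "syntactic" means the code fails to parse, "semantic" means it parses but fails elaboration (for example, references to undeclared signals), and a functional failure counts as "solvable" if at least one repeated sample passes the testbench.

```python
from enum import Enum


class Failure(Enum):
    SYNTACTIC = "syntactic"
    SEMANTIC = "semantic"
    SOLVABLE_FUNCTIONAL = "solvable functional"
    UNSOLVABLE_FUNCTIONAL = "unsolvable functional"


def classify(parses: bool, elaborates: bool, passes_tests: bool, any_sample_passes: bool):
    """Map one attempt's outcomes to a failure type; None means the attempt succeeded."""
    if not parses:
        return Failure.SYNTACTIC
    if not elaborates:
        return Failure.SEMANTIC
    if passes_tests:
        return None
    # Compiles but simulates incorrectly: solvable only if resampling ever fixes it.
    return Failure.SOLVABLE_FUNCTIONAL if any_sample_passes else Failure.UNSOLVABLE_FUNCTIONAL


print(classify(parses=True, elaborates=True, passes_tests=False, any_sample_passes=False))
```

Under this mapping, the ceiling described above corresponds to problems whose attempts all land in `UNSOLVABLE_FUNCTIONAL`, which more sampling cannot move.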

09.
arXiv (CS.CV) 2026-06-16

A Multi-Center Benchmark for Abdominal Disease Diagnosis and Report Generation from Non-Contrast CT

Multiphasic contrast-enhanced CT (CECT) is widely used for abdominal lesion characterization, yet it carries inherent risks of contrast-induced nephropathy, escalates acquisition burden, and heavily contributes to radiologist workload. To address these challenges, we introduce a novel multi-center benchmark for multi-organ abdominal disease diagnosis and automated radiology report generation, which learns to synthesize contrast-enhanced findings from single-phase non-contrast CT (NCCT). To support this, we curated a large-scale dataset of paired NCCT-CECT studies and their corresponding contrast-enhanced radiology reports from two centers, partitioned into internal sets and an external validation cohort. Under a unified evaluation protocol, we benchmarked five contemporary deep learning architectures encompassing chest-specific, abdomen-specific, and general-purpose multimodal domains. Extensive experiments demonstrate that NCCT retains diagnostic signals, achieving an average multi-organ AUC of 69.1% on the internal cohort and 63.1% on the external cohort, respectively. By releasing this dataset and standardized benchmark publicly, this study aims to catalyze future research into safer, resource-efficient, and globally accessible contrast-free abdominal imaging workflows. Code is available at: https://github.com/xmed-lab/TriALS-Report.
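The headline metric is an average of per-organ AUCs. A minimal sketch of that computation, assuming one binary label and one predicted score per study for each organ and an unweighted (macro) mean across organs, uses the rank interpretation of AUC: the probability that a randomly chosen positive outscores a randomly chosen negative, with ties counted as one half. The benchmark's exact averaging scheme is not stated in the abstract, and the organ names below are placeholders.

```python
def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC; a tie between a positive and a negative counts 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_organ_auc(per_organ):
    """per_organ maps organ name -> (scores, labels); returns the unweighted mean AUC."""
    aucs = [auc(s, y) for s, y in per_organ.values()]
    return sum(aucs) / len(aucs)


print(mean_organ_auc({"liver": ([0.9, 0.1], [1, 0]), "kidney": ([0.1, 0.9], [1, 0])}))
```

The pairwise loop is quadratic in cohort size; for large cohorts a rank-based formulation gives the same value in O(n log n).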

10.
arXiv (quant-ph) 2026-06-19

Resolving problems with the continuum limit in coherent-state path integrals

arXiv:2602.02466v2 Announce Type: replace Abstract: The paper solves the problem of the continuum limit in bosonic thermal coherent-state path integrals. For this purpose, exact discrete versions of the path integral are constructed for three different orderings of the Hamiltonian: normal, anti-normal, and symmetric (Weyl order). Their continuum versions are then tested on the harmonic oscillator, which singles out the symmetric ordering as a possibly correct choice for all polynomial Hamiltonians. Mathematical subtleties identified in this simple case serve as a clue to the general solution. Finally, a general justification for the symmetric order is provided by deriving the continuum path integral from the exact discrete case using a renormalization procedure in the imaginary-time frequency domain. While the role of Weyl order has been recognized before, the paper provides the missing proof of its suitability for every polynomial Hamiltonian and simplifies the previously established construction by referring only to creation and annihilation operators (without position and momentum operators).
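As a concrete instance of why the ordering matters, take the oscillator Hamiltonian $\hat H = \omega\,\hat a^\dagger \hat a$ (with $\hbar = 1$; this normalization is our choice for illustration). Using $\hat a \hat a^\dagger = \hat a^\dagger \hat a + 1$, the c-number symbol $H(\alpha^*,\alpha)$ that enters the path integral differs between the three orderings by constant shifts:

```latex
\begin{aligned}
\text{normal:}      &\quad \hat H = \omega\,\hat a^\dagger \hat a
  &&\Rightarrow\; H_N(\alpha^*,\alpha) = \omega\,|\alpha|^2,\\
\text{anti-normal:} &\quad \hat H = \omega\,\bigl(\hat a \hat a^\dagger - 1\bigr)
  &&\Rightarrow\; H_A(\alpha^*,\alpha) = \omega\,\bigl(|\alpha|^2 - 1\bigr),\\
\text{Weyl:}        &\quad \hat H = \tfrac{\omega}{2}\bigl(\hat a^\dagger \hat a + \hat a \hat a^\dagger\bigr) - \tfrac{\omega}{2}
  &&\Rightarrow\; H_W(\alpha^*,\alpha) = \omega\,\bigl(|\alpha|^2 - \tfrac12\bigr).
\end{aligned}
```

A continuum path integral that uses the same measure with each symbol would therefore give partition functions differing by factors $e^{\beta\omega}$ or $e^{\beta\omega/2}$ relative to the normal-ordered one, so at most one prescription can reproduce the exact discrete result.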

11.
arXiv (CS.LG) 2026-06-18

Quantifying and Auditing LLM Evaluation via Positive–Unlabeled Learning

arXiv:2606.19057v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly used as judges for scalable evaluation, yet such LLM-as-a-Judge systems exhibit systematic biases that are decoupled from semantic quality, most notably verbosity bias. Meanwhile, human supervision is costly and typically selective, yielding reliable positive judgments but leaving most outputs unlabelled and potentially mixed in quality. We formulate LLM evaluation under selective human supervision as a positive-unlabelled learning problem and propose a geometric auditing framework based on Partial Optimal Transport. By aligning a small set of human-verified positives with a reliable subset of unlabelled outputs in a fixed embedding space, our method identifies human-consistent preferences and corrects biased judges without retraining. Experiments demonstrate improved alignment with human preferences, increased robustness to presentation biases, and interpretable confidence estimates, offering a scalable and statistically grounded alternative to existing LLM-as-a-Judge pipelines.

12.
arXiv (CS.LG) 2026-06-17

Overcoming the Incentive Collapse Paradox

arXiv:2603.27049v2 Announce Type: replace-cross Abstract: AI-assisted task delegation is increasingly common, yet human effort in such systems is costly and typically unobserved. Recent work (Bastani and Cachon, 2025; Sambasivan et al., 2021) shows that accuracy-based payment schemes suffer from incentive collapse: as AI accuracy improves, sustaining positive human effort requires unbounded payments. We study this phenomenon in a budget-constrained principal-agent framework with strategic human agents whose output accuracy depends on unobserved effort. Our first contribution is a general impossibility result showing that incentive collapse is not merely a limitation of simple linear payments, but arises for any payment rule based only on observed task accuracy. To overcome this barrier, we propose a sentinel-auditing payment mechanism that enforces a strictly positive and controllable level of human effort at finite cost, independent of AI accuracy. Building on this incentive-robust foundation, we develop an incentive-aware active statistical inference framework that jointly optimizes (i) the auditing rate and (ii) active sampling and budget allocation across tasks of varying difficulty to minimize the final statistical loss under a single budget. Experiments demonstrate improved cost-error tradeoffs relative to standard active learning and auditing-only baselines.
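The collapse mechanism can be seen in a deliberately simple toy model, which is our own assumption and not the paper's framework: the AI is correct with probability `a`; a human who exerts effort at cost `c` catches a fraction `q` of the AI's errors; and the human is paid `p` per correct final answer. Effort raises expected pay by $p(1-a)q$, so it is worthwhile only if $p \ge c / ((1-a)q)$, which grows without bound as $a \to 1$.

```python
def min_payment_for_effort(ai_accuracy: float, catch_rate: float, effort_cost: float) -> float:
    """Smallest per-correct-answer payment that makes effort worthwhile (toy model).

    Effort lifts accuracy from a to a + (1 - a) * q, so the expected extra pay
    p * (1 - a) * q must cover the effort cost c.
    """
    headroom = (1.0 - ai_accuracy) * catch_rate
    if headroom <= 0:
        return float("inf")  # no errors left to catch: no finite payment suffices
    return effort_cost / headroom


# Required payments of about 10, 20 and 200 as AI accuracy rises.
for a in (0.8, 0.9, 0.99):
    print(a, round(min_payment_for_effort(a, catch_rate=0.5, effort_cost=1.0), 2))
```

The sentinel-auditing mechanism described above is meant to remove exactly this dependence on $(1-a)$; its details are not reproduced here.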

13.
arXiv (CS.CV) 2026-06-16

GridVQA-X: A Framework for Evaluating Multimodal Explainability Methods

With the increasing development of Vision-Language Models, it becomes imperative that their predictions are readily explainable to relevant stakeholders. However, the field of explainability has not kept pace with the multimodal surge. While recent Multimodal Explainable AI (MxAI) methods generate explanations to attribute the interaction between different modalities, current evaluation protocols lack the ground truth required to distinguish between true cross-modal reasoning (e.g., spatial composition) and shallow cross-modal shortcuts (e.g., Bag-of-Words attribute matching). It remains unknown whether MxAI methods faithfully capture synergistic interactions or merely hallucinate reasoning on models acting as simple feature detectors. In this paper, we introduce GridVQA-X, the first diagnostic framework specifically designed to evaluate cross-modal explainability. Unlike natural datasets, GridVQA-X leverages a closed-world synthesis logic to generate unique, mathematically guaranteed explanations. We utilize this controlled environment to train paired ground-truth models on identical architectures: $M_{pure}$, which learns robust spatial-relational reasoning, and $M_{spur}$, which is structurally forced to rely on cross-modal shortcuts. This behavioral divergence creates a rigorous testbed: a faithful explainer must report distinct reasoning pathways for each model. Our findings reveal that widely used methods fail to distinguish between models relying on genuine spatial-relational reasoning and those exploiting cross-modal shortcuts, highlighting a critical gap in capturing true cross-modal synergy and misrepresenting how multimodal models actually make decisions.
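The pure-versus-shortcut distinction can be illustrated on a toy closed-world scene. In the hypothetical sketch below (not the GridVQA-X generator), a scene places colored objects on grid cells and a question asks whether one color lies left of another. The "pure" rule answers from positions; the "spurious" rule answers only from which colors are present, a bag-of-words style shortcut. Because the scene is synthetic, the ground-truth evidence for the pure rule (the two cells compared) is known exactly.

```python
# A scene maps color -> (row, col); every name here is illustrative.

def pure_answer(scene, a, b):
    """Spatial-relational rule: is color a strictly left of color b?"""
    return scene[a][1] < scene[b][1]


def spurious_answer(scene, a, b):
    """Shortcut rule: says 'yes' whenever both colors merely appear."""
    return a in scene and b in scene


def ground_truth_explanation(scene, a, b):
    """The cells a faithful explanation of the pure rule must point to."""
    return {scene[a], scene[b]}


scene = {"red": (0, 2), "blue": (1, 0)}
print(pure_answer(scene, "red", "blue"), spurious_answer(scene, "red", "blue"))
```

Scenes where the two rules disagree, as here, are exactly the cases where a faithful explainer should attribute the two models' answers to different evidence.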

14.
arXiv (CS.AI) 2026-06-24

Page image classifier fine-tuned on century-spanning archives of scanned documents for further content-specific processing

arXiv:2606.07558v2 Announce Type: replace-cross Abstract: Purpose: Digitization projects in the humanities produce vast, heterogeneous archives of historical documents, making manual sorting impractical at scale. This work addresses the need for an automated system to classify scanned page images by visual content type (text, tables, and graphics), enabling content-specific downstream processing such as Optical Character Recognition (OCR) or structured data extraction. Methods: An image classification system was developed and evaluated on a dataset of over 48,000 annotated historical page images from century-old Czech archaeological archives, refined through four successive annotation stages with domain-expert review. A Random Forest Classifier baseline was established using hand-crafted image features. Subsequently, deep learning architectures were fine-tuned and compared: Convolutional Neural Networks (EfficientNetV2, RegNetY), Vision and Document Image Transformers (ViT, DiT), and multimodal CLIP models. An 11-category label scheme was designed collaboratively with domain experts and evaluated via five-fold cross-validation. Results: The feature-based baseline achieved approximately 75% accuracy. Fine-tuned CNNs and Transformers substantially outperformed it, with RegNetY-16GF achieving 99.16% and ViT-large 99.12% Top-1 accuracy on the held-out test set. CLIP ViT-B/16 reached 99.14% with optimized text descriptions. Conclusion: Image-only models, particularly RegNetY-16GF, deliver near-perfect classification accuracy and produce consistent labels across 649,508 unlabeled archival pages with over 90% inter-model agreement. Fine-tuned CLIP, despite competitive test-set accuracy, showed under 65% agreement with image-only models on unlabeled data, making it less suitable for deployment. The final models, annotated dataset, and software are publicly available under open-source licenses.
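Inter-model agreement on unlabeled pages, used above to judge deployability, is the share of pages on which two models predict the same category. A minimal sketch, with hypothetical label strings:

```python
def agreement(labels_a, labels_b):
    """Fraction of pages on which two models assign the same category."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    same = sum(a == b for a, b in zip(labels_a, labels_b))
    return same / len(labels_a)


print(agreement(["text", "table", "text", "graphics"], ["text", "text", "text", "graphics"]))
```

Unlike accuracy, agreement needs no ground truth, which is why it can be computed over the unlabeled pages; it measures consistency between models, not correctness.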

15.
arXiv (CS.CV) 2026-06-11

Tac-DINO: Learning Vision-Tactile Features with Patch Alignment

Touch is the primary medium through which humans interact with the environment. Currently, tactile learning mainly focuses on image-level pretraining or alignment. However, tactile signals correspond to local object contact, while research into scale alignment and holographic matching remains limited, and suitable datasets and benchmarks are also lacking. To bridge this gap, we first build a data collection system and acquire a large-scale tactile dataset with over 20K tactile contacts from 505 real-world objects. Building on this dataset, we design a Vis-Tac Holographic Matching Benchmark to evaluate vision-tactile local-to-global alignment ability. We then propose Vision-Tactile Patch Alignment (VTPA) methods for vision-tactile representation learning. Experiments show that these methods outperform both methods without alignment and methods that align with whole-object images.
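Local-to-global matching of the kind the benchmark evaluates can be pictured as locating, among an object image's patch embeddings, the patch most similar to a tactile embedding. The sketch below assumes the embeddings already exist and uses cosine similarity; the vectors, function names, and similarity choice are illustrative and do not represent VTPA's training objective.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def best_matching_patch(tactile_emb, patch_embs):
    """Index of the image patch whose embedding is closest to the tactile one."""
    return max(range(len(patch_embs)), key=lambda i: cosine(tactile_emb, patch_embs[i]))


patches = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]  # hypothetical 2-D patch embeddings
print(best_matching_patch([0.1, 0.9], patches))
```

A patch-level objective would train the encoders so that the contacted patch wins this comparison; whole-image alignment gives no such local signal.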

16.
arXiv (CS.LG) 2026-06-18

Learning Augmented Exact Exponential Algorithms

arXiv:2606.18807v1 Announce Type: cross Abstract: The field of learning-augmented algorithms has demonstrated that machine-learned predictions can bypass worst-case lower bounds across a wide range of problems. So far, however, the focus has been almost exclusively on polynomial-time algorithms, where predictions improve competitive ratios, approximation guarantees, or running times. In this paper, we raise the question of whether predictions can push the frontier of exact exponential-time algorithms for NP-hard problems. We answer this question affirmatively by proposing a general approach that augments an entire family of state-of-the-art exact algorithms for a variety of subset selection problems. We show that a noisy predictor that is only marginally better than random guessing suffices to provably reduce the search space, and that the resulting runtime speedup scales smoothly with the prediction quality. Importantly, our algorithms require only pairwise independence of predictions or, alternatively, do not require knowledge of the predictor's accuracy; both are strictly weaker and more realistic assumptions than those typically made.
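A simple way to see how an imperfect prediction can shrink an exponential search is to enumerate candidate subsets in order of their distance from a predicted solution. The sketch below is an illustrative stand-in (exhaustive search over $k$-subsets of $\{0,\dots,n-1\}$ with a hypothetical `is_solution` oracle), not the paper's augmentation of state-of-the-art exact algorithms: it still visits every $k$-subset in the worst case, but when the prediction is close it stops after far fewer checks.

```python
from itertools import combinations


def search_near_prediction(n, k, predicted, is_solution):
    """Try k-subsets of range(n) in increasing distance from `predicted`.

    Returns (solution or None, number of candidates checked).
    """
    pred = frozenset(predicted)
    if len(pred) != k:
        raise ValueError("prediction must contain exactly k elements")
    outside = [i for i in range(n) if i not in pred]
    checked = 0
    for d in range(k + 1):  # d = number of elements swapped relative to the prediction
        for drop in combinations(sorted(pred), d):
            for add in combinations(outside, d):
                candidate = (pred - set(drop)) | set(add)
                checked += 1
                if is_solution(candidate):
                    return candidate, checked
    return None, checked


target = frozenset({1, 4, 7})
print(search_near_prediction(10, 3, {1, 4, 8}, lambda s: s == target))
```

With one wrong element out of three, the target is found after 21 checks, versus up to $\binom{10}{3} = 120$ for unordered enumeration.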

17.
PLOS Medicine 2026-05-27

Sequential chemo-immunotherapy followed by standard versus reduced thoracic radiotherapy for older and/or frail stage III non-small-cell lung cancer: A randomized open-label cohort trial

Authors: Wei-Xiang Qi, Shuyan Li, Mengdi Wang, Huan Li, Feifei Xu, Lei Yao, Biao Yu, Linlin Chen, Gang Cai, Cheng Xu, Xianwen Sun, Zhiyao Bao, Jiayi Chen, Yi Xiang, Shengguang Zhao

Background: The appropriateness of concurrent chemoradiotherapy (cCRT) for older or clinically vulnerable stage III unresectable non-small-cell lung cancer (NSCLC) patients remains contentious. Furthermore, the survival implications of de-escalating thoracic radiotherapy (RT) intensity in this population have not been conclusively elucidated.

Methods and findings: We conducted a phase II randomized, open-label, two-cohort (non-comparative) trial at a tertiary hospital in China (NCT05557552). Between September 30, 2022 and April 30, 2024, we enrolled 56 older and/or frail patients with stage III NSCLC who were ineligible for cCRT. The primary endpoint was the 1-year progression-free survival (PFS) rate estimated using the Kaplan–Meier method. Secondary endpoints included objective response rate (ORR), overall survival (OS), and safety. In the intention-to-treat (ITT) set, which included all 56 randomized patients who received at least one dose of study treatment, the 1-year PFS was 84.3% (95% confidence interval [CI] [70.3%, 98.3%]) in the standard RT group and 70.7% (95% CI [54.3%, 87.1%]) in the reduced RT group. In the per-protocol set (53 patients), the 1-year PFS was 82.9% (95% CI [68.9%, 98.8%]) in the standard RT group and 73.4% (95% CI [58.3%, 92.4%]) in the reduced RT group, with a median follow-up of 24 months. Among 56 patients in the safety analysis set, 71.4% of patients experienced grade 3/4 adverse events (AEs) in the standard RT group and 53.6% in the reduced RT group. One patient (3.6%) in the reduced RT group and three patients (10.7%) in the standard RT group experienced grade 5 AEs. The main limitations are the non-comparative design, small sample size, and lack of power to establish non-inferiority or superiority.
Conclusion: The current study suggests that reduced RT combined with sequential chemo-immunotherapy might be feasible for older/frail patients intolerant to cCRT, with numerically similar survival outcomes. These exploratory findings warrant confirmation in larger, adequately powered randomized trials. Trial registration: The trial was registered on ClinicalTrials.gov on Sep 30, 2022. ClinicalTrials.gov NCT05557552
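The primary endpoint, 1-year PFS, is a Kaplan–Meier (product-limit) estimate. A minimal sketch follows, assuming follow-up times in months and an event flag of 1 for progression or death and 0 for censoring; it omits the confidence intervals reported above (commonly obtained with Greenwood's formula), and the cohort in the example is invented for illustration.

```python
def kaplan_meier(times, events, t):
    """Product-limit estimate of S(t), the probability of no event by time t.

    times: follow-up time per patient; events: 1 = event observed, 0 = censored.
    Patients censored at an event time are still counted as at risk there.
    """
    surv = 1.0
    event_times = sorted({x for x, e in zip(times, events) if e and x <= t})
    for u in event_times:
        at_risk = sum(1 for x in times if x >= u)
        deaths = sum(1 for x, e in zip(times, events) if e and x == u)
        surv *= 1.0 - deaths / at_risk
    return surv


# Hypothetical cohort: months to progression/death and event flags.
times = [3, 6, 6, 9, 14, 15, 18, 24]
events = [1, 1, 0, 1, 0, 1, 0, 0]
print(round(kaplan_meier(times, events, 12), 3))  # prints 0.6
```

Censored patients reduce the risk set without counting as events, which is why the estimate differs from the naive fraction of patients still progression-free.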

18.
medRxiv (Medicine) 2026-06-22

Referral pathways, ETAT triage acuity, and inpatient outcomes among children presenting to a national tertiary paediatric emergency unit in Ghana: a prospective cohort study

Emergency referral systems in sub-Saharan Africa are fragmented, and children reaching tertiary facilities through different referral pathways often arrive in advanced clinical states. Prospective data simultaneously characterising referral patterns, triage acuity at presentation, diagnostic case mix, and inpatient mortality at a national tertiary paediatric emergency unit are lacking from West Africa. This prospective cohort study enrolled 675 consecutively presenting children aged one month to 12 years at the Paediatric Emergency Unit (PEU) of Korle Bu Teaching Hospital, Accra, Ghana, from February to December 2019. The primary outcome was all-cause inpatient mortality. Key variables collected included referral status and facility tier, Emergency Triage Assessment and Treatment (ETAT) triage category, ICD-10 diagnostic classification, Oyedeji socioeconomic classification, and time from symptom onset to PEU registration. Crude odds ratios were computed for all candidate predictors. Multivariable logistic regression was conducted using complete case analysis (n = 613). Of 675 children, 63.0% (n = 425) were referred from another health facility; referred children had higher ETAT emergency triage category rates than self-presenting children (32.7% vs 27.6%, p < 0.001). Overall inpatient mortality was 9.9% (67/675). Mortality varied by referral source: 16.7% among secondary/regional hospital referrals, 11.0% among lower-tier facility referrals (district, municipal, CHAG, polyclinic, private, health centre, and maternity home facilities combined, n = 356), 7.6% among self-presenting children, and 7.4% among tertiary referrals. Overall, 30.8% of children were classified as ETAT emergencies on arrival, with a case fatality rate of 21.6%. The three most common diagnostic domains were respiratory conditions (17.2%), blood and haematological disorders (17.0%), and digestive presentations (16.4%).
Inpatient mortality was highest in neoplastic disease (33.3%, n = 30) and circulatory presentations (31.0%, n = 29). In the primary multivariable analysis (n = 613, 51 events; events-per-variable ratio 4.2), no referral tier was independently associated with inpatient mortality after adjustment. Referral from secondary/regional hospitals showed a borderline non-significant association (adjusted odds ratio 3.09, 95% CI 0.96 to 9.90, p = 0.058). School-going children (60-119 months) had higher odds of inpatient death than infants (adjusted odds ratio 5.56, 95% CI 1.16 to 26.53, p = 0.032), as did adolescents (adjusted odds ratio 10.01, 95% CI 2.15 to 46.69, p = 0.003). ETAT emergency category and lower socioeconomic status were not independently significant in this model. A pre-specified sensitivity analysis using the full analytic cohort (n = 674, events-per-variable ratio 6.7) with collapsed referral categories did not confirm any referral tier association; ETAT emergency category and lower SES were independently associated in the sensitivity model. All multivariable estimates should be regarded as exploratory. This prospective cohort provides simultaneous characterisation of referral patterns, ETAT triage acuity, diagnostic case mix, and inpatient mortality at a national tertiary paediatric emergency unit in West Africa. The referral-mortality gradient and high ETAT emergency category proportion document the severity of illness arriving through different referral pathways at this facility. The association between secondary/regional hospital referral and inpatient mortality is hypothesis-generating and requires replication in an adequately powered multicentre study before any service-level conclusions can be drawn.
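Crude odds ratios of the kind computed for each candidate predictor come from a 2×2 table of exposure against outcome. A sketch with Woolf's log-scale 95% confidence interval follows; the counts in the example are invented for illustration and are not taken from the study.

```python
import math


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Woolf 95% CI for a 2x2 table (all counts must be > 0).

    a = exposed deaths, b = exposed survivors,
    c = unexposed deaths, d = unexposed survivors.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    return or_, math.exp(math.log(or_) - z * se), math.exp(math.log(or_) + z * se)


or_, lo, hi = crude_odds_ratio(10, 20, 5, 40)  # hypothetical counts
print(round(or_, 2), round(lo, 2), round(hi, 2))
```

Events per variable, the ratio reported for the multivariable models, is the number of deaths divided by the number of estimated parameters; values well below the common rule of thumb of 10 signal a risk of overfitting, consistent with the authors' caution that the multivariable estimates are exploratory.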

19.
arXiv (CS.CL) 2026-06-12

ChiKhaPo: A Large-Scale Multilingual Benchmark for Evaluating Lexical Comprehension and Generation in Large Language Models

Existing benchmarks for large language models (LLMs) are largely restricted to high- or mid-resource languages, and often evaluate performance on higher-order tasks in reasoning and generation. However, considerable evidence indicates that LLMs lack basic linguistic competence in the vast majority of the world's 3800+ written languages. We introduce ChiKhaPo, consisting of 8 subtasks of varying difficulty designed to evaluate the lexical comprehension and generation abilities of generative models. ChiKhaPo draws on existing lexicons, monolingual data, and bitext, and provides coverage for 2700+ languages for 2 subtasks, surpassing any existing benchmark in terms of language coverage. We further show that 6 state-of-the-art (SOTA) models struggle on our benchmark, and discuss the factors contributing to performance scores, including language family, language resourcedness, task, and comprehension versus generation directions. With ChiKhaPo, we hope to enable and encourage the massively multilingual benchmarking of LLMs.
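Lexicon-based scoring of word-level generation can be sketched as checking whether a model's output for each source word appears among the lexicon's accepted translations. The function below and its case-insensitive exact-match rule are assumptions for illustration; the abstract does not specify ChiKhaPo's per-subtask metrics.

```python
def lexicon_accuracy(predictions, lexicon):
    """Share of lexicon-covered source words whose predicted translation is accepted.

    predictions: source word -> model output
    lexicon: source word -> set of accepted translations
    """
    scored = [w for w in predictions if w in lexicon]
    if not scored:
        return 0.0
    hits = sum(
        predictions[w].strip().lower() in {t.lower() for t in lexicon[w]}
        for w in scored
    )
    return hits / len(scored)


lexicon = {"perro": {"dog", "hound"}, "gato": {"cat"}}  # hypothetical entries
print(lexicon_accuracy({"perro": " Dog", "gato": "kitten", "casa": "house"}, lexicon))
```

Words missing from the lexicon ("casa" here) are skipped rather than counted as errors, since there is no reference to judge them against.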

20.
arXiv (CS.AI) 2026-06-24

Render-FM: Feedforward Model for Real-time Photorealistic Volumetric Rendering

arXiv:2505.17338v3 Announce Type: replace-cross Abstract: Photorealistic volumetric rendering of CT scans greatly benefits clinical workflows, yet neural approaches such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) require prohibitive per-scan optimization (hours for NeRF, about 30 minutes for 3DGS), making them impractical in clinical settings. We propose Render-FM, a feedforward model that eliminates this bottleneck by directly regressing 6D Gaussian Splatting (6DGS) parameters from a CT volume in a single 2.8-second forward pass, a 500x speedup over per-scan optimization. To bridge the domain gap between natural scene reconstruction and medical volumetric rendering, we introduce Anatomy-Guided Priming (AGP), which incorporates segmentation masks and transfer functions as structural and appearance priors, information that existing Gaussian splatting methods overlook. Built on an nnU-Net-inspired 3D U-Net trained on diverse CT scans, Render-FM predicts per-voxel 6DGS parameters and supports immediate real-time rendering. Unlike per-scan methods, it generalizes to unseen anatomies and novel transfer functions, and it enables compositional organ visualization with zero additional preparation time. Optional 89-second fine-tuning further improves quality, surpassing per-scan optimized baselines. Project page: https://gaozhongpai.github.io/renderfm/.

21.
arXiv (CS.AI) 2026-06-17

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

arXiv:2606.18111v1 Announce Type: cross Abstract: Fairness is an important aspect of decision-making in multi-objective reinforcement learning (MORL), where policies must ensure both optimality and equity across multiple, potentially conflicting objectives. While single-policy MORL methods can learn fair policies for fixed user preferences using welfare functions such as the generalized Gini welfare function (GGF), they fail to provide the diverse set of policies necessary for dynamic or unknown user preferences. To address this limitation, we formalize the fair optimization problem in multi-policy MORL, where the goal is to learn a set of Pareto-optimal policies that ensure fairness across all possible user preferences. Our key technical contributions are threefold: (1) We show that for concave, piecewise-linear welfare functions (e.g., GGF), fair policies remain in the convex coverage set (CCS), which is an approximated Pareto front for linear scalarization. (2) We demonstrate that non-stationary policies augmented with accrued reward histories, as well as stochastic policies, improve fairness by dynamically adapting to historical inequities. (3) We propose three novel algorithms: GGF integrated with multi-policy multi-objective Q-Learning (MOQL), state-augmented multi-policy MOQL for learning non-stationary policies, and a novel extension of the latter for learning stochastic policies. We evaluate our algorithms across various domains and compare them against state-of-the-art MORL baselines. The empirical results show that our methods learn a set of fair policies that accommodate different user preferences.
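For readers unfamiliar with the generalized Gini welfare function, here is a minimal Python sketch of its standard definition: the per-objective returns are sorted in ascending order and combined with strictly decreasing positive weights, so the worst-off objective receives the largest weight. The function names and weight values are illustrative assumptions, not taken from the paper's code.

```python
from typing import Sequence


def ggf(values: Sequence[float], weights: Sequence[float]) -> float:
    """Generalized Gini welfare: weighted sum of the objective values sorted
    in ascending order, using strictly decreasing positive weights."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per objective")
    if any(w <= 0 for w in weights) or any(a <= b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be positive and strictly decreasing")
    # Smallest value is paired with the largest weight, which favors equity.
    return sum(w * v for w, v in zip(weights, sorted(values)))


def linear_scalarization(values: Sequence[float], prefs: Sequence[float]) -> float:
    """Preference-weighted sum of objective values (no reordering)."""
    return sum(p * v for p, v in zip(prefs, values))


# Hypothetical return vectors of two policies over two objectives.
unequal, balanced = [4.0, 0.0], [2.0, 2.0]
print(linear_scalarization(unequal, [0.5, 0.5]), linear_scalarization(balanced, [0.5, 0.5]))
print(ggf(unequal, [0.7, 0.3]), ggf(balanced, [0.7, 0.3]))
```

With equal preferences, linear scalarization scores both return vectors identically, whereas GGF prefers the balanced one. Because GGF is a minimum over weighted sums of permuted values, it is concave and piecewise linear, which is the property the first contribution relies on.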

22.
arXiv (CS.LG) 2026-06-11

Hierarchical Probabilistic Conformal Prediction for Distributed Energy Resources Adoption

arXiv:2411.12193v4 Announce Type: replace-cross Abstract: The rapid growth of distributed energy resources (DERs) presents both opportunities and operational challenges for electric grid management. Accurately predicting DER adoption is critical for proactive infrastructure planning, but the inherent uncertainty and spatial disparity of DER growth complicate traditional forecasting approaches. Moreover, the hierarchical structure of distribution grids demands that predictions satisfy statistical guarantees at both the circuit and substation levels, a non-trivial requirement for reliable decision-making. In this paper, we propose a novel uncertainty quantification framework for DER adoption predictions that ensures validity across hierarchical grid structures. Leveraging a multivariate Hawkes process to model DER adoption dynamics and a tailored split conformal prediction algorithm, we introduce a new nonconformity score that preserves statistical guarantees under aggregation while maintaining prediction efficiency. We establish theoretical validity under mild conditions and demonstrate through empirical evaluation on customer-level solar panel installation data from Indianapolis, Indiana that our method consistently outperforms existing baselines in both predictive accuracy and uncertainty calibration.
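The split conformal step mentioned in this abstract can be sketched in Python under simple assumptions. This sketch uses the plain absolute residual as the nonconformity score rather than the paper's new aggregation-preserving score, and the function names are hypothetical. If the calibration and test points are exchangeable, the resulting intervals have marginal coverage of at least 1 - alpha.

```python
import math
from typing import List, Sequence, Tuple


def conformal_quantile(scores: Sequence[float], alpha: float) -> float:
    """Finite-sample-corrected quantile used by split conformal prediction:
    the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: the interval is unbounded.
        return math.inf
    return sorted(scores)[k - 1]


def split_conformal_intervals(
    y_cal: Sequence[float],
    pred_cal: Sequence[float],
    pred_new: Sequence[float],
    alpha: float,
) -> List[Tuple[float, float]]:
    """Prediction intervals from a held-out calibration split, using the
    absolute residual |y - prediction| as the nonconformity score."""
    scores = [abs(y - p) for y, p in zip(y_cal, pred_cal)]
    q = conformal_quantile(scores, alpha)
    return [(p - q, p + q) for p in pred_new]


# Toy calibration set: residuals 1, 2, ..., 19 and alpha = 0.25.
print(split_conformal_intervals(list(range(1, 20)), [0.0] * 19, [100.0], 0.25))
```

In this toy example, the 15th smallest of 19 residuals gives q = 15, so a point prediction of 100 yields the interval (85, 115). The paper instead designs a new nonconformity score so that such guarantees still hold after circuit-level predictions are aggregated to the substation level. That score is not reproduced here.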

23.
arXiv (CS.CV) 2026-06-15

Mirage Probes: How Vision Models Fake Visual Understanding

Vision-language models (VLMs) can answer image-based questions confidently, and often correctly, even when no image is provided. This mirage behavior inflates benchmark scores without reflecting visual grounding. Prior work treats this as a single failure mode. We argue it is two. Using Mirage Probes, a contrastive probing framework that pairs paraphrased question variants with matched mirage and non-mirage labels on the same image, we show that mirage behavior is linearly decodable from internal activations across residual stream, MLP, post-attention, and attention-head sites in two open-source VLMs. We demonstrate that a Naive Bayes text baseline cannot recover this signal, ruling out surface lexical confounds. Cross-benchmark separability patterns, together with a novel Prior Harnessing Index (PHI) measuring how much a model can answer from text alone, expose two distinct regimes: textual biases, where the model answers from language priors without engaging visual representations, and spurious images, where it constructs false visual content in latent space and answers as if grounded. The distinction has direct mitigation consequences: text-distribution cleaning can address the first regime but cannot reach the second, since spurious-image mirages live in the model's visual representations rather than its text. Faithful visual grounding will require interventions at the representational level.

24.
arXiv (CS.AI) 2026-06-16

AgenticRec: A Recommendation-Oriented Agentic Framework with Progressive Tool-Integrated Reasoning Optimization

arXiv:2603.21613v2 Announce Type: replace-cross Abstract: Recommender agents built on Large Language Models offer a promising paradigm for personalized recommendation. However, existing agents typically suffer from a misalignment between their tool-integrated reasoning trajectories and recommendation feedback, limiting their ability to distinguish fine-grained user preferences. To address these challenges, we propose AgenticRec, an agentic recommendation framework that formulates recommendation as a tool-integrated reasoning process over a recommendation-oriented tool suite. Built upon this framework, we further develop a dedicated two-stage training paradigm tailored for recommender agents. In the first stage, Recommendation-Oriented Trajectory Activation optimizes the agentic recommendation ability under implicit feedback. In the second stage, Progressive Preference Refinement further refines the agent through bidirectional preference reasoning over self-bootstrapped hard pairs, progressively sharpening preference boundaries. Theoretical analysis and extensive experiments demonstrate the effectiveness of AgenticRec. Our code is available at https://anonymous.4open.science/r/AgenticRec-FB16.

25.
arXiv (CS.CL) 2026-06-24

FALCON: Transforming Cyber Threat Intelligence into Deployable IDS Rules with Self-Reflection

Signature-based Intrusion Detection Systems (IDS) detect malicious activity by matching network or host events against predefined rules. Security analysts manually develop these rules from Cyber Threat Intelligence (CTI). As threats evolve, this manual pipeline faces two bottlenecks. First, before authoring a new rule, an analyst must reconcile the incoming CTI with the existing rule base and determine whether to create, update, or retire a rule. This process is challenging because of the representational differences between the CTI and rule formats. This gap limits the effectiveness of keyword- and embedding-based search, making rule reconciliation cognitively demanding and, in turn, contributing to "rule bloat". Second, automated verification of a new rule is inherently difficult, as zero-day threats lack ground truth from simulated testing. Hence, standard metrics cannot prove that a rule semantically adheres to the CTI, and the use of LLMs leads to non-deterministic behavior. To address these challenges, we introduce FALCON, an agentic framework for CTI-grounded rule retrieval, generation, and validation. At its core, a novel CTI-Rule semantic scorer quantifies the functional alignment between a CTI and a rule; the same signal drives a retriever that surfaces relevant deployed rules and a ground-truth-free validator that scores generated ones. Around it, a generation pipeline produces deployable rules from CTI in real time and refines them through self-reflective syntactic, semantic, and performance validators. Across network (Snort) and host-based (YARA) platforms on a purpose-built CTI-Rule dataset, FALCON attains a mean relevance of approximately 0.72, with 84% inter-rater agreement among cybersecurity analysts, underscoring the promise of real-time security automation.