Academic Intelligence · Curated Daily

探索全球前沿学术脉络

AcademicHub 汇聚顶级期刊与预印本平台的实时文献。定制您的专属科研雷达,利用大语言模型自动生成交叉领域文献分析简报。

01.
medRxiv (Medicine) 2026-06-24

Development and Validation of Machine Learning Models for Predicting Initiation of Emergency Dialysis in Advanced Chronic Kidney Disease

Background: Initiation of emergency dialysis, often requiring temporary catheter owing to unprepared definitive vascular access, is associated with infectious and vascular complications and suggests advanced chronic kidney disease (CKD) care gaps. Previous studies focused on kidney failure or dialysis timing. This study aimed to predict initiation of emergency dialysis using machine learning and baseline data. Methods: This retrospective cohort study used the Japan Medical Data Center claims data (2014-2022). Adults with an estimated glomerular filtration rate (eGFR)

02.
medRxiv (Medicine) 2026-06-24

Multifactorial tuberculosis severity score for people living with HIV based on the Rand Appropriateness Method

Background In people living with Human Immunodeficiency Virus (PLWH), tuberculosis (TB) is the leading cause of death and is often associated with substantial morbidity. Better identifying PLWH with severe forms of TB could help target early interventions to reduce mortality and severe morbidity. Existing TB severity assessment tools may be sub-optimal for assessing disease severity in PLWH, since they incompletely integrate key determinants of disease severity. We aimed to develop a consensus-based TB severity score tailored to PLWH. Methods We developed a multifactorial TB severity score (TBSS) for PLWH using a modified Delphi process with a multidisciplinary group of international TB experts as the second part of a RAND/UCLA Appropriateness Method, following a previously published systematic review. Results Eight of 15 invited experts (53%) participated in both Delphi rounds. Of 62 candidate factors, 15 reflecting TB-related characteristics, host-related characteristics as well as characteristics related to both TB and host were rated as having high appropriateness for inclusion in the final TBSS. The total score ranges from 0 (no severity) to 61 (highest severity). Conclusion This study represents a first step towards the development of a multidimensional TB severity assessment tool for PLWH. However, its clinical usefulness, feasibility, and added value compared with existing severity scores remain to be demonstrated through validation studies before routine implementation can be considered. Key words: tuberculosis, HIV, severity.

03.
medRxiv (Medicine) 2026-06-12

Room-Specialized Mixture-of-Experts for In-Home ADL Recognition with Ambient Sensors

Monitoring activities of daily living (ADLs) in the home is a promising approach for tracking dementia progression in older adults. While ambient sensor-based ADL systems are well-studied, most existing ADL recognition systems rely on globally trained models that ignore the spatial organization of in-home activities. In real deployments, where training data are sparse and highly home-specific, global transformer models may fail to capture room-dependent behavioral structure. We propose a deterministic Mixture of Experts (MoE) architecture for in-home ADL recognition, in which each expert is a compact transformer specialized to one room of the home (bedroom, kitchen, bathroom, living area). Input segments are routed using a deterministic gating strategy based on room-level motion activity and time-of-day priors for sleep-related behaviors. Unlike learned routing networks, the proposed gate encodes domain knowledge about where ADLs are likely to occur, reducing model complexity under limited per-home training data. By decomposing ADL recognition into room-specific activity spaces, the proposed architecture reduces competition between dominant and low-frequency activities under highly imbalanced residential data. We evaluated the system on data collected via low-cost ambient sensors (motion, light, temperature, humidity) and Raspberry Pi edge devices across five homes, with ground-truth ADL labels provided by participants and caregivers. Across the five homes, the proposed MoE consistently outperformed global transformer, 1D CNN, and Random Forest baselines, achieving macro-F1 scores ranging from 0.60 to 0.88, highlighting the importance of home-specific modeling in real-world deployments. These findings suggest that room-aware expert specialization may provide a practical and interpretable strategy for low-data ADL recognition in real-world residential environments.

04.
arXiv (CS.AI) 2026-06-12

WISE: A Long-Horizon Agent in Minecraft with Why-Which Reasoning

arXiv:2606.12852v1 Announce Type: new Abstract: Rapid advances have been made in developing general-purpose embodied agent in environments like Minecraft through the adoption of LLM-augmented hierarchical approaches. Despite their promise, low-level controllers often become performance bottlenecks due to repeated execution failures. We argue that a key limitation is not only the lack of episodic memory, but also the decoupling of what-where-when memory from which-why reasoning. To address this, we propose WISE (Which-Why Informed Semantic Explorer), a long-horizon agent framework with an enhanced low-level controller equipped with a Causal Event Graph that augments episodic memory with explicit causal structure linking observations to task relevance. Unlike prior work such as MrSteve, which relies on feature similarity for retrieval, WISE enables robust recall under viewpoint changes and supports opportunistic task reordering through causal reasoning. Building on this memory, we propose an Opportunistic Task Scheduler that dynamically re-prioritizes subtasks when causally relevant opportunities are detected. We further equip WISE with a multi-scale progressive exploration strategy to provide spatially comprehensive observations for downstream reasoning. Experiments show that WISE largely improves task success and efficiency on long-horizon sparse tasks, particularly in settings requiring adaptive decision-making.

05.
arXiv (CS.AI) 2026-06-16

Who Drifted: the System or the Judge? Anytime-Valid Attribution in LLM Evaluation Pipelines

作者:

arXiv:2606.15474v1 Announce Type: new Abstract: Continuous evaluation of LLM products relies on a strong LLM judge treated as ground truth: a cheap monitor scores every interaction and a team is paged when the score drifts down. But the judge is itself a model behind an API, and a silent version bump or scoring-prompt update changes how it scores – so every drift alarm is ambiguous between a worse product and a changed judge. We resolve the ambiguity with a fixed, human-labeled anchor set that the current judge re-scores at a steady interleave, a second betting e-process on the judge-versus-human gap, and a guard-window rule returning a verdict in {none, system, judge}. We prove anytime-validity, one-way identification (only the judge can move the anchors), an attribution race whose design law is that the anchors must out-run the main process they guard, and process orthogonality. On two real judge changes, a silent version bump is detected as judge drift in 60/60 runs with zero judge-to-system misattribution, and a contaminating strict-prompt change is correctly attributed on 110 of 120 runs at guard width 300 – while the industry-default rolling z-test false-alarms on 75% of drift-free streams. Every experiment replicates on a second domain (TL;DR summarization) with nothing re-tuned, and where the domains differ the differences are the ones the race predicts: the strict-prompt change shifts scores harder there, so the anchors fire faster and attribution becomes perfect (240/240). The monitor runs at approximately 0.64 of the cost of strong-judging every item, or 0.21 in a cheaper-but-deafer regime.

06.
arXiv (CS.AI) 2026-06-18

Maturing Markov Decision Processes: Decision Making under Increasing Information and Shrinking Action Sets

arXiv:2606.18820v1 Announce Type: cross Abstract: Sequential decision problems often exhibit an asymmetric evolution of information and decision flexibility: as a decision cycle unfolds, the agent receives richer information while feasible actions expire due to operational cutoffs, commitments, or resource constraints. Standard MDP formulations typically flatten this structure into stage-dependent state descriptions and action masks, thereby obscuring the nested information–action asymmetry that determines which decisions are urgent and which can be deferred. We introduce Maturing Markov Decision Processes (MMDPs), a formulation built around this information–action asymmetry. We characterize one of its key consequences through an expiring-action priority principle, which identifies the actions that must be resolved before the next stage. Motivated by this structure, we develop a structure-aware reinforcement learning framework with stage-aware policy design, expiring-action abstraction, and search-augmented learning with distillation. Experiments on a controlled multi-supplier replenishment problem, simplified cash-management environments of increasing complexity, and a production-scale simulator show that explicitly modeling this asymmetry improves learning efficiency and becomes increasingly valuable as decision problems scale.

07.
arXiv (quant-ph) 2026-06-24

Syndrome aware mitigation of logical errors

arXiv:2512.23810v2 Announce Type: replace Abstract: Broad applications of quantum computers will require error correction (EC). However, hardware roadmaps indicate that physical qubit numbers will remain limited in the foreseeable future, leading to residual logical errors that constrain the size and accuracy of achievable computations. Recent work suggested logical error mitigation (LEM), which applies known error mitigation (EM) methods to logical errors, eliminating their effect at the cost of a runtime overhead. We introduce syndrome-aware logical error mitigation (SALEM), which mitigates logical errors conditioned on the error syndromes measured during error correction. The runtime overhead of SALEM is exponentially lower than that of LEM schemes which do not make use of syndrome data, enabling substantially larger circuit volumes that can be executed accurately. Compared to the routinely used combination of error correction and syndrome rejection (post-selection), SALEM increases the size of reliably executable computations by orders of magnitude. In the practical setting where space and time overheads are fixed and error reduction methods are compared by their resulting estimation errors, we observe a surprising phenomenon: SALEM, which tightly combines EC with EM, can outperform physical EM even above the standard fault-tolerance (pseudo) threshold. Thus, SALEM can make use of EC in regimes of physical error rates where EC is commonly deemed useless.

08.
arXiv (CS.AI) 2026-06-25

Distribution Preference Optimization: A Fine-grained Perspective for LLM Unlearning

arXiv:2510.04773v2 Announce Type: replace-cross Abstract: As Large Language Models (LLMs) demonstrate remarkable capabilities learned from vast corpora, concerns regarding data privacy and safety are receiving increasing attention. LLM unlearning, which aims to remove the influence of specific data while preserving overall model utility, is becoming an important research area. One of the mainstream unlearning classes is optimization-based methods, which achieve forgetting directly through fine-tuning, exemplified by Negative Preference Optimization (NPO). However, NPO's effectiveness is limited by its inherent lack of explicit positive preference signals. Attempts to introduce such signals by constructing preferred responses often necessitate domain-specific knowledge or well-designed prompts, fundamentally restricting their generalizability. In this paper, we shift the focus to the distribution-level, directly targeting the next-token probability distribution instead of entire responses, and derive a novel unlearning algorithm termed Distribution Preference Optimization (DiPO). We show that the requisite preference distribution pairs for DiPO, which are distributions over the model's output tokens, can be constructed by selectively amplifying or suppressing the model's high-confidence output logits, thereby effectively overcoming NPO's limitations. We theoretically prove the consistency of DiPO's loss function with the desired unlearning direction. Extensive experiments demonstrate that DiPO achieves a strong trade-off between model utility and forget quality. Notably, DiPO attains the highest forget quality on the TOFU benchmark, and maintains leading scalability and sustainability in utility preservation on the MUSE benchmark.

09.
medRxiv (Medicine) 2026-06-23

Linking mpox wastewater surveillance with reported clinical cases in three countries in Sub-Saharan Africa

The emergence of the novel monkeypox virus (MPXV) clade Ib in the Democratic Republic of the Congo (DRC) and neighboring countries in late 2023 highlighted the need for rapid, scalable surveillance approaches to support outbreak detection and response. As part of the ODIN-Mpox project, wastewater surveillance (WWS) systems were established as an emergency public health measure in three Sub-Saharan African countries (DRC, Tanzania, and Burkina Faso) to evaluate the feasibility of wastewater-based monitoring for mpox and strengthen local surveillance capacity. Between January 2025 and April 2026, 117 wastewater samples were collected from selected sites and analyzed for MPXV DNA using targeted qPCR assays. Clinical mpox data were obtained from national surveillance systems and WHO reports to assess epidemiological linkages between wastewater detections and reported infections. Six wastewater samples tested positive for MPXV DNA. During the study period, DRC experienced the highest disease burden, with weekly reported cases peaking at about 3,000 in January 2025, while Tanzania reported a peak of 20 weekly cases in March 2025. No confirmed clinical cases were reported in Burkina Faso. No clear relationship was observed between reported case numbers and qPCR Ct values in positive wastewater samples. Despite the low detection frequency, the project demonstrated the operational feasibility of implementing MPXV wastewater surveillance in resource-limited settings and established laboratory capacity for environmental monitoring of emerging infectious diseases. Given the early stage of WWS implementation in the region, the study identified opportunities for further system strengthening, including optimization of sample processing and reporting workflows, improved access to laboratory supplies, and enhanced integration of environmental and clinical surveillance data streams. These findings highlight the value of WWS as a complementary component of integrated public health surveillance systems and emphasize the need for continued investment in laboratory capacity, harmonized methodologies, governance frameworks, and knowledge exchange to enhance outbreak preparedness and response in low-resource settings.

10.
arXiv (quant-ph) 2026-06-25

Nonlocal Topological Maxwell Demon Teleporting Ergotropy via Surface-Code Quantum Error Correction

arXiv:2605.14924v4 Announce Type: replace Abstract: Surface-code quantum error correction has recently achieved logical error rates below the physical threshold on superconducting processors, establishing topologically ordered states as experimentally accessible resources. Whether these resources can support thermodynamic operations beyond fault-tolerant computation remains open. We introduce a nonlocal Maxwell demon protocol that transfers ergotropy between spatially separated quantum batteries using only local operations and classical communication over a shared surface code. Alice expends ergotropy to encode a logical qubit and transmits a classical syndrome record to Bob, who decodes via minimum-weight perfect matching and conditionally charges his battery, with no direct energy exchange across the channel. Active syndrome monitoring exponentially suppresses logical errors below the topological threshold $p_th \approx 0.013$, converting physical qubits directly into recoverable ergotropy. For finite-size codes at distance $L = 7$, net extracted work changes sign at a thermodynamic critical error rate $p_c \approx 0.014 > p_th$, a physically significant finite-size effect relevant to near-term devices. Causality enforces an irreducible quadratic infrastructure cost $W_bulk \propto N^2$, strictly satisfying the second law at all separations and defining a fundamental thermodynamic horizon $N_max \approx 78$ beyond which positive net work extraction is impossible regardless of code distance or decoder quality.

11.
arXiv (CS.AI) 2026-06-15

From Sorting Algorithms to Scalable Kernels: Bayesian Optimization in High-Dimensional Permutation Spaces

arXiv:2507.13263v4 Announce Type: replace-cross Abstract: Bayesian Optimization (BO) is a powerful tool for black-box optimization, but its application to high-dimensional permutation spaces is severely limited by the challenge of defining scalable representations. The current state-of-the-art BO approach for permutation spaces relies on an exhaustive $\Omega(n^2)$ pairwise comparison, inducing a dense representation that is impractical for large-scale permutations. To break this barrier, we introduce a novel framework for generating efficient permutation representations via kernel functions derived from sorting algorithms. Within this framework, the Mallows kernel can be viewed as a special instance derived from enumeration sort. Further, we introduce the Merge Kernel , which leverages the divide-and-conquer structure of merge sort to produce a compact, $\Theta(n\log n)$ to achieve the lowest possible complexity with no information loss and effectively capture permutation structure. Our central thesis is that the Merge Kernel performs competitively with the Mallows kernel in low-dimensional settings, but significantly outperforms it in both optimization performance and computational efficiency as the dimension $n$ grows. Extensive evaluations on various permutation optimization benchmarks confirm our hypothesis, demonstrating that the Merge Kernel provides a scalable and more effective solution for Bayesian optimization in high-dimensional permutation spaces, thereby unlocking the potential for tackling previously intractable problems such as large-scale feature ordering and combinatorial neural architecture search.

12.
arXiv (CS.LG) 2026-06-12

Epistemic Uncertainty Is Not the Reducible Kind

作者:

arXiv:2606.12646v1 Announce Type: cross Abstract: The standard taxonomy of predictive uncertainty defines epistemic uncertainty as the part removable by collecting more data, while the standard measure identifies it with a mutual-information term. We prove the definition and the measure are extensionally inconsistent. On an explicit construction, the measure assigns all uncertainty to the epistemic class, yet no quantity of training data reduces it. Reducibility is instead a property of the pair (uncertainty, acquisition class), and the dichotomy resolves into three parts: aleatoric, sample-reducible epistemic, and mechanism-reducible epistemic uncertainty. An exact identity for the value of an observation shows that in-distribution data never reduces mechanism-irreducible uncertainty and generically increases it. Ensemble disagreement, the deployed epistemic estimate, tracks the training procedure rather than the epistemic term. It collapses to zero beneath a positive truth under consistent training, and equals hyperparameter-scaled initialization noise under interpolation. A finite-sample falsification test and seed-swept experiments confirm the theory.

13.
arXiv (CS.LG) 2026-06-15

Mitigating Heterogeneity-Induced Drift in Hierarchical Sign-Based Federated Learning

arXiv:2602.02355v2 Announce Type: replace-cross Abstract: Hierarchical federated learning (HFL) is well suited for large-scale wireless and Internet of Things systems, where devices communicate with nearby edge servers before reaching the cloud. In these environments, uplink bandwidth and latency impose strict communication constraints, making aggressive gradient compression essential. One-bit sign-based stochastic gradient descent methods provide an attractive solution in flat federated settings, but their behavior in hierarchical edge–cloud architectures remains insufficiently understood, especially under inter-cluster data heterogeneity. To address this gap, we develop a sign-based HFL framework in which devices transmit binary stochastic-gradient signs to edge servers, edge servers apply majority voting, and the cloud periodically aggregates edge models. Our analysis reveals that inter-cluster heterogeneity induces a persistent bias term in the convergence bound, reflecting the drift of edge models toward local objectives. This term cannot be removed by increasing the number of training rounds or by tuning standard hyperparameters alone. We therefore propose \(\mathtt{DC-HierSignSGD}\), a drift-corrected sign-based HFL algorithm in which devices apply a cloud-assisted gradient correction before taking the sign. We show that this pre-sign correction mitigates the non-vanishing heterogeneity-induced bias while preserving binary device–edge communication during the repeated local sign-update steps. Experiments under severe inter-cluster heterogeneity demonstrate that \(\mathtt{DC-HierSignSGD}\) improves the stability and accuracy of sign-based HFL and achieves performance comparable to full-precision hierarchical SGD with substantially lower device–edge communication.

14.
arXiv (CS.LG) 2026-06-15

ORCA: A Platform for Open-Source Dexterity Research

arXiv:2606.14561v1 Announce Type: cross Abstract: Robotics manipulation research increasingly focuses on two-finger parallel grippers for their effectiveness, affordability, and ease of teleoperation. Grippers are nonetheless limited by their form factor, often requiring bimanual setups even for simple reorientation tasks. Anthropomorphic hands are a more natural platform for dexterous robot learning – closer to the human hand, and capable of learning from human video – yet they remain hard to use in learning research: even where open and accessible hand hardware exists, the software for control, simulation, teleoperation, and retargeting is scattered in one-off code bases, and largely disconnected from the robot-learning ecosystem. In this work, we introduce the \orca~learning stack, an open-source research stack for dexterity as a first-class robot learning domain. Our \orca~stack unifies low-level control, simulation, teleoperation from a range of consumer platforms, and hand retargeting, behind a single interface, and integrates natively with popular robot-learning frameworks such as \lerobot, so dexterous hand researchers can leverage the same data, training, and evaluation pipelines used for non-dexterous robot learning. We demonstrate a complete end-to-end workflow, collecting expert demonstrations of an in-hand reorientation task by teleoperation with a consumer-grade VR headset, training an autonomous policy with \lerobot, and evaluating the learned policy in a fully reproducible and observable setup. We open-source the entire stack as a shared, reproducible foundation for dexterous-manipulation research.

15.
arXiv (CS.CV) 2026-06-16

FlexPooling with Simple Auxiliary Classifiers in Deep Networks

In computer vision, the basic pipeline of most convolutional neural networks consists of multiple feature extraction layers, where the input signal is downsampled to a lower resolution in each subsequent layer. This downsampling process is commonly referred to as pooling, which is an essential operation in CNNs. Pooling improves robustness against transformations, reduces the number of trainable parameters, increases the receptive field, and lowers computation time. Since pooling is a lossy process but remains important for extracting high-level information from low-level representations, it is important to preserve the most prominent information from previous activations to improve network discriminability. Standard pooling is usually performed using dense pooling methods, such as max pooling or average pooling, or through strided convolutional kernels. In this paper, we propose a simple yet effective adaptive pooling method, called FlexPooling, which generalizes average pooling by learning a weighted average over activations jointly with the rest of the network. We further show that attaching Simple Auxiliary Classifiers (SAC) to the CNN improves performance and demonstrates the effectiveness of the proposed method compared with standard pooling methods. Experiments on multiple popular image classification datasets show that FlexPooling consistently outperforms baseline networks, achieving approximately 1 to 3 percent improvement in accuracy.

16.
arXiv (CS.CV) 2026-06-16

Towards Global AI-Driven Cervical Cancer Screening

The global elimination of cervical cancer is a key public health goal set by the World Health Organization (WHO), with screening programs reducing mortality by up to 80%. However, access to experts and biopsy services is limited in low- to middle-income countries (LMICs). Deep learning (DL)-based algorithms offer promising support for screening, but most existing approaches have been developed and validated on private datasets from single countries. We present the first DL-based approach to cervical cancer screening validated on data from multiple countries. Technically, we phrase the problem of detecting and classifying lesions in colposcopy images as a multi-task learning problem, in which we simultaneously perform image-level classification and lesion segmentation. Our model was trained on a private data set of acid stain colposcopy images with manually generated lesion segmentation masks and corresponding histopathological results, employing extensive data augmentation to address image variability. In an in-distribution validation with pathology results serving as ground truth, our algorithm outperformed medical experts (Balanced Accuracy: 0.68 vs 0.64) in CIN1- (Cervical intraepithelial neoplasia grade 1 or lower) versus CIN2+ (grade 2 or higher) classification. External validation on four colposcopy data sets from four countries featuring radical differences in prevalence and patient characteristics yielded superior performance of our method compared to baseline methods. Performance variability across countries was high with AUC values ranging from 0.54 - 0.80. Overall, algorithm performance varied with age, transformation zone (cervical area most prone to lesion development), presence of comorbidities and pathognomonic signs, with comorbidities having by far the largest negative effect. Future work should focus on improving model robustness and generalizability.

17.
arXiv (quant-ph) 2026-06-16

Sub-Poissonian Statistics and Quantum Non-Gaussianity from High-Harmonic Generation

arXiv:2602.10882v4 Announce Type: replace Abstract: Quantum technologies are powered by platforms to generate complex non-classical states of matter or light to realize applications. We investigate the non-classical properties of high-harmonic generation in semiconductors, an emerging photonic platform. Measuring the click statistics of three double-digit orders, we evaluate witness operators to certify the non-classicality of the generated states. We show that higher-order harmonics driven by a coherent laser are squeezed and entangled. The properties of the emission are well retrieved with an entangled Gaussian state model, obtained by numerical state optimization to multiple observables. Additionally, we perform inter-order heralded measurements to engineer the quantum state of the emission. The heralded states have distinct properties, showing sub-Poissonian photon statistics. Further, we witness the generation of a quantum non-Gaussian state, a resource highly relevant for quantum information. With this, we establish high-harmonic generation as a platform for generating quantum optical resources.

18.
arXiv (quant-ph) 2026-06-19

GPU-accelerated semidefinite programming for causal games

arXiv:2606.20519v1 Announce Type: new Abstract: The process matrix formalism describes quantum correlations in scenarios without a fixed causal order between local laboratories. Operational signatures of such correlations can be investigated through causal games. A paradigmatic example is the Guess-Your-Neighbour's-Input game, in which two parties attempt to guess each other's inputs. Correlations compatible with any definite, or probabilistically mixed, causal order cannot achieve a winning probability exceeding $1/2$. The best process-matrix strategy currently known attains a value of approximately $0.6218$ using local dimension $d=5$, while the strongest known dimension-independent upper bound is $0.7592$. In this work, we investigate whether increasing the local dimension beyond $d = 5$ can narrow this gap. To this end, we employ a see-saw optimization scheme in which each step is formulated as a semidefinite program. For scalability, we develop a custom implementation of the SCS solver in which the dominant computational cost, the projection onto the positive-semidefinite cone, is offloaded to a GPU, yielding a six-fold speedup. Using this implementation, we explore local dimensions up to $d = 8$, and we do not find significant improvements over the value at $d=5$. Our results suggest that either qualitatively different strategies are required to approach the known upper bound, or that the bound itself is not tight.

19.
arXiv (CS.LG) 2026-06-19

PU-UNet: Stable Multiplicative Interactions for Medical Image Segmentation

arXiv:2606.20035v1 Announce Type: cross Abstract: Many dense prediction networks rely on additive feature transformations and model higher-order feature interactions only implicitly. Product units provide an explicit mechanism for multiplicative feature modeling, but their logarithmic–exponential formulation can cause numerical instability, which has limited their use in deep dense prediction networks. In this work, we propose Product-Unit U-Net (PU-UNet), a residual U-Net that integrates stable product-unit residual blocks into rich low-resolution stages for medical image segmentation. The proposed formulation combines smooth positivity mapping with log-domain clipping, enabling stable multiplicative feature learning with negligible computational overhead. On ISIC 2018, Kvasir-SEG, and BUSI, PU-UNet achieves Dice scores of 0.942, 0.959, and up to 0.925, respectively. Compared with a matched Residual U-Net baseline, PU-UNet consistently improves Dice and IoU while keeping parameters, FLOPs, and inference latency nearly unchanged, and reduces the image-level false-positive rate on normal BUSI cases from 0.077 to zero. Ablation studies suggest that the gains are associated with product-unit interactions, are strongest under low-resolution placement, and benefit from the proposed stabilization design. These results suggest that stable product-unit residual learning can be an effective way to enhance U-Net-style segmentation networks with explicit multiplicative interactions.

20.
arXiv (CS.AI) 2026-06-17

Online LLM Selection via Constrained Bandits with Time-Varying Demand

arXiv:2606.17489v1 Announce Type: cross Abstract: Large Language Models (LLMs) are increasingly deployed in edge-cloud inference systems to handle diverse user tasks with heterogeneous accuracy, latency, and cost profiles. Selecting the appropriate LLM for each incoming task is critical for ensuring service quality and efficient resource utilization. However, model heterogeneity, stochastic and unknown performance characteristics, and time-varying task demands make static selection strategies inadequate. Real-world deployments often impose hard resource budgets such as monetary expenditure limits, along with soft service-level requirements such as latency guarantees. These constraints introduce additional challenges for online decision-making. We formulate this problem as a constrained stochastic bandit learning task, where the learner sequentially selects models under both packing-type (hard) and covering-type (soft) constraints, while adapting to time-varying task demand. The learner operates without access to the underlying reward, cost, or latency distributions and must rely on partial feedback. We develop a novel online learning algorithm that leverages confidence-bound estimates and demand predictions to balance reward maximization with long-term constraint satisfaction. We provide theoretical guarantees showing sublinear regret and sublinear covering constraint violations compared to an offline benchmark with full information. Experimental results on synthetic workloads demonstrate the effectiveness and robustness of our approach in dynamic, resource-constrained environments.

21.
arXiv (CS.AI) 2026-06-25

Measurable Majorities Are Not Finitely Axiomatizable

arXiv:2606.25954v1 Announce Type: cross Abstract: This theoretical note studies the finite axiomatizability of strict majority reasoning in finite social decision frames. Moss and Pedersen (2026) introduce a coherence criterion that characterizes exactly when qualitative majority judgments are representable by a finitely additive measure. The question addressed here is whether that coherence criterion can be replaced, in the finite setting, by any bounded finite fragment. We prove that it cannot. For every $k\ge 1$, we construct a maximal standard frame whose shortest coherence violation has length exactly $2k+2$. Hence there is no uniform finite bound on the incoherence index of social decision frames, resolving Conjecture 5.7 stated by Moss and Pedersen (2026). The construction is geometric, in the sense that it proceeds via orthogonality and dimension in rational vector spaces, and self-contained: it isolates a symmetric family of half-sized voting blocs and extends it to a maximal frame in which every shorter balanced obstruction is excluded. Along the explicit infinite sequence of universe sizes obtained in the construction, this also establishes the middle-layer family predicted by Conjecture B.25 by Moss and Pedersen (2026). Together with the soundness and completeness theorem for the Moss-Pedersen minimal logic for strict majorities, this establishes that measurable social decision frames are not finitely axiomatizable in that language.

22.
arXiv (CS.AI) 2026-06-24

Decentralized Coordination of Autonomous Traffic Through Advanced Air Mobility Corridors

arXiv:2606.23832v1 Announce Type: cross Abstract: The use of dedicated corridors for Advanced Air Mobility (AAM) traffic is one of the most commonly proposed pathways to integrating them into existing airspace operations. Most prior research has focused on the design of networks of AAM corridors and conflict resolution for aircraft within corridors. It is also generally believed that while attractive from an implementation perspective, corridor-based operations may be inefficient, especially in the absence of centralized traffic management. In this paper, we show that contrary to this belief, it is possible for autonomous aircraft to learn to self-organize into corridor flows in decentralized settings. We illustrate our approach using scenarios in which fixed-wing aircraft need to safely and efficiently traverse (1) a single corridor with metering after the exit, (2) a sequence of two consecutive corridors, and (3) a corridor that splits into two. We find that in decentralized settings with only local information, the aircraft are able to conform to the corridor boundaries more than 94% of the time and reach their goal in a relatively efficient manner. Furthermore, tactical interventions to handle violations of the separation minimum are needed only infrequently in low- and medium-density settings. However, such tactical interventions become more frequently necessary only when traffic density is high.

23.
arXiv (CS.LG) 2026-06-17

HeteRo-Select: Informativeness as the Participation Driver in Heterogeneous Federated Learning

arXiv:2508.06692v2 Announce Type: replace Abstract: Federated learning systems typically allocate gradient compression by link speed. This is sensible when bandwidth and data informativeness align. However, under non-IID data, these signals often decorrelate or invert. A bandwidth-driven allocator then risks compressing the most informative gradients hardest. We propose HeteRo-Select, a framework that replaces bandwidth with a per-client informativeness score as the primary driver of compression. The score jointly governs three decisions per round: client selection, compression ratio, and server aggregation weight, with bandwidth retained only as a hard ceiling. Score-proportional selection provably reduces the effective heterogeneity of the chosen subset; score-proportional compression provably lowers aggregate top-$k$ error at fixed traffic. Under the exact FedCG simulation protocol, HeteRo-Select delivers a $1.78\times$ speedup and an $18.2\%$ reduction in traffic on CIFAR-10. The same configuration, unchanged, scales from a $7{,}850$-parameter logistic regression to an $11.27$M-parameter ResNet-18, hitting the accuracy target on three of four benchmarks. When bandwidth and informativeness are deliberately anti-correlated, the method still achieves the target accuracy with less traffic than the normal-bandwidth run.

24.
arXiv (CS.CV) 2026-06-19

HEad and neCK TumOR (HECKTOR) 2025: Benchmark of Segmentation, Diagnosis, and Prognosis in Multimodal PET/CT

Head and neck cancers (HNC) represent a significant global health burden, with accurate tumor delineation being essential for effective radiotherapy planning. The complexity of the oropharyngeal anatomy, combined with the heterogeneous appearance of tumors on imaging, makes manual segmentation time-intensive and subject to inter-observer variability. Beyond segmentation, predicting long-term clinical outcomes, such as recurrence-free survival (RFS), and determining human papillomavirus (HPV) status from noninvasive imaging, remain challenging yet clinically valuable goals. The HECKTOR 2025 challenge addresses these needs by establishing a comprehensive benchmark for automated HNC analysis using multimodal PET/CT imaging and electronic health records. Building on previous editions (2020-2022), this challenge features an expanded multi-institutional dataset comprising over 1,100 patients from 10 centers worldwide. Participants were tasked with three complementary objectives: (1) segmenting primary gross tumor volumes (GTVp) and metastatic lymph nodes (GTVn), (2) predicting recurrence-free survival, and (3) classifying HPV status. The challenge attracted 35 registered teams, with 15 final submissions evaluated on a held-out test set. Top-performing algorithms achieved a mean Dice similarity coefficient of 0.75 for segmentation, a concordance index of 0.66 for survival prediction, and a balanced accuracy of 0.56 for HPV classification. This paper presents a comprehensive analysis of the submitted methodologies, evaluates their performance across different lesion characteristics, and discusses their implications for clinical translation in automated oncology workflows and decision support systems.

25.
arXiv (CS.AI) 2026-06-11

A Survey of Reasoning and Agentic Systems in Time Series with Large Language Models

arXiv:2509.11575v3 Announce Type: replace Abstract: Time series reasoning treats time as a first-class axis and incorporates intermediate evidence directly into the answer. This survey defines the problem and organizes the literature by reasoning topology with three families: direct reasoning in one step, linear chain reasoning with explicit intermediates, and branch-structured reasoning that explores, revises, and aggregates. The topology is crossed with the main objectives of the field, including traditional time series analysis, explanation and understanding, causal inference and decision making, and time series generation, while a compact tag set spans these axes and captures decomposition and verification, ensembling, tool use, knowledge access, multimodality, agent loops, and LLM alignment regimes. Methods and systems are reviewed across domains, showing what each topology enables and where it breaks down in faithfulness or robustness, along with curated datasets, benchmarks, and resources that support study and deployment (https://github.com/blacksnail789521/Time-Series-Reasoning-Survey). Evaluation practices that keep evidence visible and temporally aligned are highlighted, and guidance is distilled on matching topology to uncertainty, grounding with observable artifacts, planning for shift and streaming, and treating cost and latency as design budgets. We emphasize that reasoning structures must balance capacity for grounding and self-correction against computational cost and reproducibility, while future progress will likely depend on benchmarks that tie reasoning quality to utility and on closed-loop testbeds that trade off cost and risk under shift-aware, streaming, and long-horizon settings. Taken together, these directions mark a shift from narrow accuracy toward reliability at scale, enabling systems that not only analyze but also understand, explain, and act on dynamic worlds with traceable evidence and credible outcomes.