Academic Intelligence · Curated Daily

Explore the Global Frontiers of Academic Research

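AcademicHub aggregates real-time literature from top journals and preprint platforms. Customize your own research radar and use large language models to automatically generate cross-disciplinary literature analysis briefings.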

01.
arXiv (math.PR) 2026-06-24

Critical Erdős-Rényi digraph: all eigenvectors away from zero are delocalized

arXiv:2606.24887v1 Announce Type: new Abstract: We consider the adjacency matrix of the directed Erdős-Rényi graph. When the expected degree is larger than the logarithm of the number of vertices, so that the graph is connected, we show that all eigenvectors are completely delocalized. Below this critical scale, we prove eigenvector delocalization if the corresponding eigenvalue is away from zero. This contrasts with the undirected or Hermitian setting, where large eigenvalues have localized eigenvectors [arXiv:2005.14180]. Our results also hold for sparse random matrices with independent entries, which can be viewed as weighted Erdős-Rényi digraphs.

02.
arXiv (CS.CL) 2026-06-11

Redesign Mixture-of-Experts Routers with Manifold Power Iteration

The router is the cornerstone component of Mixture-of-Experts (MoE) models. Serving as expert proxies, the rows of the router matrix compute their similarity to the MoE inputs to determine which subset of experts is activated. Ideally, each router row encodes its expert matrix into a representative vector, such that its dot product with a token better reflects token-expert affinity. However, there exist no design principles to enforce this condensation. In this paper, we propose to align each router row with the principal singular direction of the associated expert, as this direction provides the most expressive mathematical description of a matrix. Based on this principle, we propose a router redesign with Manifold Power Iteration (MPI). Specifically, it introduces a "Power-then-Retract" paradigm, where a power iteration step is performed on the router weights, followed by a retraction that imposes a norm constraint to ensure both efficiency and stability. Theoretically, we show that MPI drives router rows to converge toward the principal singular directions of their associated experts. Empirically, we pretrain MoE models across scales from 1B to 11B parameters to confirm that this alignment facilitates more effective MoE models.
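The "Power-then-Retract" idea can be illustrated with a textbook power iteration. This is a minimal sketch and not the authors' MPI update: it assumes the expert is a single matrix applied as `expert @ x`, that the router row lives in the input space (so the target is the top right singular vector), and that the retraction is plain renormalisation to unit length. The names `power_then_retract`, `expert`, and `row` are hypothetical.

```python
import numpy as np

def power_then_retract(router_row, expert, steps=20):
    """Generic power iteration on expert^T expert with renormalisation.

    Assumption: 'retract' is modelled as projection back onto the unit sphere.
    """
    r = router_row / np.linalg.norm(router_row)
    for _ in range(steps):
        r = expert.T @ (expert @ r)   # power step: amplify the dominant direction
        r = r / np.linalg.norm(r)     # retraction: restore the unit-norm constraint
    return r

# Hypothetical 3x2 expert matrix; expert^T expert has eigenvalues 9 and 1.
expert = np.array([[2.0, 1.0],
                   [1.0, 2.0],
                   [0.0, 0.0]])
row = power_then_retract(np.array([1.0, 0.0]), expert)

# Compare against the principal right singular vector from an SVD.
top_right_singular = np.linalg.svd(expert)[2][0]
alignment = abs(row @ top_right_singular)  # 1.0 means perfectly aligned (up to sign)
```

Because each step only needs two matrix-vector products and a normalisation, a single step per training update would be cheap, which is consistent with the efficiency motivation stated in the abstract.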

03.
medRxiv (Medicine) 2026-06-18

Looked but didn't see: inattentional blindness and yes-bias confabulation in vision-language models

Previous work showed that many participants fail to notice a gorilla in a video of people playing basketball. Another study found that 83% of trained radiologists failed to report a gorilla figure inserted into a chest CT nodule-search task, even though eye-tracking revealed that most observers had foveated the figure. We ask whether a similar phenomenon exists in contemporary vision-language models (VLMs). We find that (i) VLMs are capable of spotting the gorilla in both still-frame images and videos of lung CT scans; (ii) models display inattentional blindness, which varies according to model generation and type of stimulus presented; (iii) Gemini-3.1-Pro outperforms most other flagship and open-weight VLMs at identifying the presence or absence of the gorilla. We additionally ran a segmentation experiment utilizing two different model classes: a generalist (SAM 3), which found the gorilla but produced little to no results for anatomy-based prompts; a medical specialist (BiomedParse), which produced more promising anatomy-based results but flagged "gorilla" on gorilla-free control videos on 82% of frames. The behavioral signature of inattentional blindness reproduces in VLMs, but a unique confabulation failure mode means that any "did the model see X" claim requires signal-detection analysis with a matched-control false-alarm baseline.
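The closing point about signal-detection analysis can be made concrete with the standard sensitivity index d', which subtracts the false-alarm baseline (in z-units) from the hit rate. The sketch below is illustrative only: the counts are hypothetical and not taken from the study, and the log-linear correction (adding 0.5 to each cell) is one common convention, assumed here to keep rates away from 0 and 1.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Log-linear correction so that rates of exactly 0 or 1 stay finite.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical model A: says "gorilla" on most gorilla-free controls (yes-bias).
yes_biased = d_prime(hits=95, misses=5, false_alarms=82, correct_rejections=18)

# Hypothetical model B: same hit rate, but few false alarms on controls.
selective = d_prime(hits=95, misses=5, false_alarms=5, correct_rejections=95)

# A model answering at chance has d' = 0.
chance = d_prime(hits=50, misses=50, false_alarms=50, correct_rejections=50)
```

Both hypothetical models "see" the gorilla in 95 of 100 positive trials, yet the yes-biased one has a much lower d', which is why a matched-control false-alarm rate is needed before crediting detection.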

04.
arXiv (CS.LG) 2026-06-16

Information Gap and Feasibility-Aware Inference in Binomial Logistic Mixtures

arXiv:2606.15665v1 Announce Type: cross Abstract: This paper studies the information gap between mixture detection and label recovery in binomial logistic mixtures. Standard likelihood-based criteria such as the Bayesian information criterion (BIC) can detect the presence of two components, but this does not guarantee that the corresponding labels are recoverable. We show that this gap is intrinsic to binomial logistic mixtures with a fixed number of trials: observed-data evidence for mixture structure and per-observation information for label recovery have different local orders in the component separation, and only the former accumulates with the sample size. As a result, there exists a detectable-but-unrecoverable regime in which BIC selects two components while the posterior labels remain essentially uninformative. To address this issue, we propose two feasibility-aware inference procedures: a recoverability-aware BIC with a posterior-entropy penalty and an entropy-regularized estimator that mitigates the tendency of the maximum likelihood estimator to produce overly separated components and overly concentrated posterior responsibilities. Numerical experiments confirm the predicted gap and demonstrate that the proposed methods avoid misleading component selections and improve the calibration of posterior label probabilities.

05.
arXiv (CS.AI) 2026-06-25

Safe Learning Control with Optimality and Stability Guarantees

arXiv:2501.15373v2 Announce Type: replace-cross Abstract: Merely pursuing performance may adversely affect safety, while a conservative policy for safe exploration will degrade the performance. How to guarantee both safety and performance in learning-based control problems is an interesting yet challenging issue. This paper aims to enhance system performance with a safety guarantee by solving reinforcement learning (RL)-based optimal control problems for nonlinear systems subject to high-relative-degree state constraints and unknown time-varying disturbance/actuator faults. A new type of control barrier functions (CBFs), termed high-order reciprocal-based control barrier function, is proposed to handle high-relative-degree constraints, which extends the design of CBFs to enforce robust safety without knowing the disturbance bound. The concept of gradient similarity is proposed to quantify the relationship between safety and performance. Finally, gradient manipulation and adaptive mechanisms are introduced in the model-based safe RL framework to enhance the performance with a safety guarantee. Two simulation examples illustrate the efficacy of the proposed algorithms.

06.
arXiv (CS.LG) 2026-06-12

The Urysohn Machine: A Metric-Topological Model of Computation

arXiv:2508.14143v2 Announce Type: replace Abstract: We introduce the Urysohn Machine, an effective model of classification-oriented computation in which metric separation, frontier structure, and contraction are explicit parts of the computational state. Its basic object is a Urysohn Triple: a support region, a target partition, and a separating classifier stored in a reusable Metric Library. The topological foundation is a constructive Urysohn Realization theorem for finite simplicial settings. It builds separators from dyadic ladders of nested polyhedral regions and equips their frontiers with a chain-level calculus: frontiers are cycles, and shells between levels have boundaries given by differences of frontiers. This construction yields two related complexity measures: decision-boundary width, the geometric measure of a single classifier's boundary, and Urysohn width, the total frontier mass represented by a library or realization. We prove an Amortized Separation Theorem showing that approximating a boundary of width to accuracy requires a number of simple basis triples proportional to boundary width and inversely proportional to resolution, under explicit boundary-footprint assumptions. We also introduce a contrastive separation operator whose graph-cut functional consistently estimates decision-boundary width from sampled metric data, while its Laplacian spectrum certifies class-component structure and conductance. Finally, we analyze the dynamic Urysohn ladder and prove four guarantees: separability under quotient collapse, stability of committed frontiers, bounded capacity under contraction, and scalability with quotient distance. Together, these results give a metric-topological account of classification complexity, amortized inference, and compositional reuse that preserves classical computability while exposing geometric structure hidden by purely symbolic descriptions.

07.
medRxiv (Medicine) 2026-06-11

Large-scale proteomics and timing of hypertensive disorders of pregnancy

Background: Hypertensive disorders of pregnancy (HDP) may first be diagnosed antepartum, during labor, or postpartum. We utilized untargeted large-scale proteomics to identify pathways associated with HDP based on timing of onset. Methods: We performed a nested case-control study comparing differential protein expression, from the SomaScan 7K platform, based on timing of onset of HDP versus controls (referent) using first-trimester samples from the NuMoM2b-Heart Health Study, a multi-site cohort that followed nulliparous individuals from the first trimester. Associations of proteins with timing of onset of HDP, adjusted for covariates, were assessed using logistic regression with q-value-based false discovery rates, and pathway enrichment and differential expression analyses were conducted. Results: Of 1628 individuals included, 678 had HDP, of which 67% manifested antepartum (AP), 29% intrapartum (IP), and 3% postpartum (PP). After adjusting for covariates, compared to controls, 698 proteins, 39 proteins, and 144 proteins were differentially expressed in those with HDP according to AP, IP, and PP onset, respectively. There was little overlap in individual protein expression based on timing of HDP. Pathway enrichment and graphical summary analyses suggested distinct processes. Specifically, there was downregulation of angiogenic proteins in AP HDP, downregulation of immune-related proteins in IP HDP, and upregulation of complement activation promoting fibrotic changes leading to cardiac dysfunction in PP HDP. Conclusion: There are differences in first-trimester protein expression based on whether HDP first manifests AP, IP, or PP. This raises the possibility that there may be distinct mechanistic phenotypes that could uniquely inform diagnostic and therapeutic targets for HDP.

08.
arXiv (CS.LG) 2026-06-12

Self-Guidance: Enhancing Neural Codecs via Decoder Manifold Alignment

arXiv:2606.12940v1 Announce Type: cross Abstract: Neural speech codecs based on Vector-Quantized VAEs (VQ-VAEs) are core audio tokenizers for speech LLMs, yet their reconstruction fidelity is bottlenecked by quantization error. Modifying the quantizer or increasing model capacity are common fixes, but they complicate downstream language modeling. Our core idea is to align the decoder's internal feature manifolds when processing both the quantized tokens and their original continuous embeddings, using a lightweight feature-mapping loss. This requires minimal training overhead and no inference-time changes. Applied to XCodec2, self-guidance improves all reconstruction metrics, achieving state-of-the-art low-bitrate performance. Notably, it enables a 4x codebook reduction without fidelity loss, which downstream TTS experiments show significantly improves LLM-based synthesis by simplifying the token modeling space. Multiple statistical observations and visualizations corroborate the enhanced internal manifold alignment in the decoder. Extensive experiments confirm its generality across various inductive biases. Self-guidance thus establishes an efficient, broadly applicable method for high-fidelity neural audio coding.

09.
arXiv (CS.CV) 2026-06-16

Leptomeningeal Collateral Detection on DSA via Vessel-Graph Neural Networks

Leptomeningeal collaterals (LMCs) are an important prognostic factor in acute ischemic stroke. Existing automated methods rely on CT angiography (CTA), but individual LMCs are often too small to be resolved on CTA, limiting these methods to coarse collateral scoring. Digital subtraction angiography (DSA) visualizes individual collaterals at superior resolution, yet current assessment remains subjective, relying on manual grading scales that suffer from poor inter-rater agreement. We present a framework that formulates collateral detection as the classification of individual vessel segments on a graph derived from DSA. A hybrid graph-pixel architecture combines a topology-aware graph branch with a dense pixel branch, fused in a shared node-probability space. In a five-fold cross-validation setting, the fused model achieves a PR-AUC of 0.434, outperforming the graph-only (0.403) and pixel-only (0.362) baselines. To our knowledge, this is the first method to enable the individualization of LMCs in DSA, allowing for precise per-vessel quantitative assessment. This integration shifts DSA assessment toward objective evaluation, supporting future biomarker and pattern discovery for individual LMCs.

11.
arXiv (CS.LG) 2026-06-16

Generative Molecular Design with Steerable and Granular Synthesizability Control

arXiv:2505.08774v2 Announce Type: replace-cross Abstract: Designing molecules that are both property-optimal and readily synthesizable is a central challenge in drug discovery. Existing works that do consider synthesizability can jointly output predicted synthesis routes for generated molecules. However, there has been minimal attention in addressing the ease of synthesis and with flexibility to incorporate desired reaction constraints. On the other hand, virtual screening searches for commercially available compounds, but imposes challenges when scaling to ultra-large (billion-size and beyond) chemical spaces. Here, we propose a generative design framework that unifies synthesis-constrained molecular design and ultra-large-scale virtual screening through steerable and granular synthesizability control. Generated molecules satisfy arbitrary multi-parameter optimization objectives with predicted synthesis routes satisfying mix-and-match constraints: including or avoiding certain reactions, incorporating specific building blocks, and minimizing synthesis route length. In an end-to-end in-house campaign targeting BRD4, we designed molecules synthesizable with specific selected reactions and building blocks, synthesized all six selected compounds, and identified two micromolar binders. We further demonstrate that reaction control enables efficient navigation of ultra-large make-on-demand chemical spaces to identify property-optimal candidates. By applying our framework to Chemspace's Freedom 4.0 make-on-demand space (142 billion molecules), we generated ~320k molecules (0.00023% of the library) on a single consumer-grade GPU (with only 8 GB GPU memory) and identified a micromolar Wee1 binder amongst 60 synthesized candidates. The single unified framework thus enables generating novel synthesizable molecules and retrieving catalogue-ready candidates, offering a flexible solution to mitigating the synthesizability bottleneck.

12.
arXiv (CS.LG) 2026-06-17

Credibility-Weighted Pricing of Autonomous Vehicle Liability Under Operational Design Domain Shift

arXiv:2606.17451v1 Announce Type: new Abstract: Automated Driving System deployments create a foundational ratemaking challenge: sparse experience, shifting operational design domains, and non-stationary risk across software releases. We propose a hierarchical Bayesian credibility framework pooling across cities, software versions, and territories via a learned ODD-similarity kernel, nesting Bühlmann-Straub as a limiting case. Demonstrated on 648 verified-engaged Waymo crashes across four U.S. metros from the NHTSA Standing General Order database against 116 million matched miles, city-aggregate credibility weights are moderate (0.12-0.46), partial pooling decisively outperforms no pooling, and a power analysis shows the learned kernel's advantage becomes detectable at approximately twelve deployed cities.
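For readers unfamiliar with credibility weights, the classical Bühlmann-Straub form that the abstract names as a limiting case can be sketched as follows. This shows only the textbook formula Z = w / (w + k), not the paper's hierarchical Bayesian model or its learned kernel; the exposure, `k`, and rate values are hypothetical.

```python
def credibility_weight(exposure, k):
    """Classical credibility factor Z = w / (w + k).

    exposure: risk's own exposure w (e.g. million miles driven)
    k: ratio of expected process variance to variance of hypothetical means
    """
    return exposure / (exposure + k)

def credibility_rate(own_rate, pooled_rate, exposure, k):
    """Blend a risk's own experience with the pooled (collective) rate."""
    z = credibility_weight(exposure, k)
    return z * own_rate + (1 - z) * pooled_rate

# Hypothetical city: 10 million miles of exposure, k = 30.
z = credibility_weight(exposure=10.0, k=30.0)
rate = credibility_rate(own_rate=8.0, pooled_rate=4.0, exposure=10.0, k=30.0)
```

With these made-up inputs the weight is 0.25, so the blended rate sits a quarter of the way from the pooled rate toward the city's own experience; sparse exposure pulls the estimate toward the pool.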

13.
arXiv (CS.LG) 2026-06-19

Influence-Guided Concolic Testing of Transformer Robustness

arXiv:2509.23806v2 Announce Type: replace-cross Abstract: Concolic testing for neural networks alternates concrete execution with constraint solving to search for inputs that flip model decisions. We present a concolic tester for Transformer classifiers that uses SHAP estimates to rank pending path predicates by their impact on the current prediction. To support self-attention with multiple heads in execution backed by SMT solving, we implement attention semantics in pure Python that are compatible with the solver and make the softmax boundary explicit by concretizing exponentiation arguments. We evaluate our method on CIFAR-10 across three compact Transformer classifiers, ResNet18, and VGG16 under a one-pixel budget and a 900s horizon. Across the 500 model–input pairs in this matched comparison, our method achieves 60% success, compared with 15% for a differential evolution baseline that treats the model as a black box. In the primary two-layer Transformer branch-ordering study, SHAP-based predicate prioritization raises success from 56% to 60% and reduces median attack time by 51%. These results show that influence-guided path exploration can make concolic testing a practical way to find adversarial examples in Transformer models.

14.
arXiv (quant-ph) 2026-06-25

Imposing Constraints on Driver Hamiltonians and Mixing Operators: From Theory to Practical Implementation

arXiv:2407.01975v3 Announce Type: replace Abstract: Driver Hamiltonians and Mixing Operators that satisfy constraints are an important part of ansatz construction for many quantum algorithms. In this manuscript, we give general algebraic expressions for finding Hamiltonian terms and, analogously, unitary primitives that satisfy constraint embeddings, and use these to give complexity characterizations of the related problems. We prove that knowing if operators exist that enforce classical constraints is NP-Complete in the general case, but give algorithmic procedures with worst-case polynomial runtime to find any operators with a constant locality bound; a useful result since many constraints imposed admit local operators to enforce them in practice. We then give algorithmic procedures to turn these algebraic primitives into Hamiltonian drivers and unitary mixers that can be used for Constrained Quantum Annealing (CQA) and Quantum Alternating Operator Ansatz (QAOA) constructions by tackling practical problems related to finding an appropriate set of reduced generators and defining corresponding drivers and mixers accordingly. We consider a new QAOA approach based on the maximally disjoint subset as well as higher order constraint satisfaction terms for 1-in-3 SAT, which dramatically outperform the X-mixer.

15.
arXiv (CS.CV) 2026-06-11

OSCS-SupCon: Orthogonal Sigmoid-based Common and Style Supervised Contrastive Learning for Robust Feature Disentanglement

Supervised Contrastive Learning (SupCon) has achieved strong performance by explicitly modeling pairwise relationships among samples. However, existing SupCon-based methods suffer from two key limitations: negative-sample dilution induced by the standard InfoNCE loss, and feature-space entanglement caused by the lack of explicit constraints separating category-relevant (common) and category-irrelevant (style) features. These limitations reduce feature discriminability and generalization ability. To address these issues, we propose OSCS-SupCon (Orthogonal Sigmoid-based Common and Style Supervised Contrastive Learning), a unified framework that combines a sigmoid-based pairwise contrastive objective with explicit orthogonality constraints. Specifically, we introduce a sigmoid-based contrastive loss with two learnable parameters, temperature and bias, which adaptively modulate pairwise decision boundaries and alleviate negative-sample dilution. Furthermore, we enforce orthogonality between common and style feature subspaces via a linear projection with ReLU nonlinearity, thereby reducing feature overlap and improving disentanglement of style-irrelevant representations. Extensive experiments on six benchmark datasets demonstrate that OSCS-SupCon consistently outperforms state-of-the-art supervised contrastive learning methods across multiple backbone architectures. In particular, on the fine-grained CUB200-2011 dataset with a ResNet-18 backbone, the proposed method achieves a 3.4% improvement in classification accuracy over CS-SupCon, highlighting its robustness and generalization capability. Ablation studies further confirm the effectiveness of each component.
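A sigmoid-based pairwise objective of the kind described above can be sketched in a few lines. This is an assumed form, modelled on the general idea of scoring each pair independently with a sigmoid, and is not the OSCS-SupCon loss itself: `temperature` and `bias` are fixed constants here rather than learned, the orthogonality constraint is omitted, and the toy features are hypothetical.

```python
import numpy as np

def sigmoid_pairwise_loss(features, labels, temperature=10.0, bias=-5.0):
    """Mean of -log sigmoid(sign * (temperature * cosine_sim + bias)) over pairs.

    sign is +1 for same-class pairs and -1 for different-class pairs, so each
    pair is an independent binary decision (no softmax over negatives).
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                                   # cosine similarities
    same = labels[:, None] == labels[None, :]
    sign = np.where(same, 1.0, -1.0)
    logits = sign * (temperature * sim + bias)
    loss = np.logaddexp(0.0, -logits)               # stable -log(sigmoid(logits))
    off_diag = ~np.eye(len(labels), dtype=bool)     # ignore self-pairs
    return loss[off_diag].mean()

# Hypothetical 2-D features forming two tight clusters.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matching = sigmoid_pairwise_loss(feats, np.array([0, 0, 1, 1]))
loss_mismatched = sigmoid_pairwise_loss(feats, np.array([0, 1, 0, 1]))
```

Because every pair contributes its own term, adding more negatives does not shrink the weight of any individual pair through a shared normaliser, which is the intuition behind the "negative-sample dilution" argument.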

16.
arXiv (CS.LG) 2026-06-16

Multi-User Dueling Bandits: A Fair Approach using Nash Social Welfare

arXiv:2605.01961v2 Announce Type: replace Abstract: Learning from human preference data is becoming a useful tool, from fine-tuning large language models to training reinforcement learning agents. However, in most scenarios, the model is trained on the average preference of all human evaluators, which, under large variations of preferences, can be unfair to minority groups. In this work, we consider fairness in dueling bandits, a standard framework for online learning from preference data. We assume that each user has a (potentially distinct) Condorcet winner, which is an arm preferred to every other arm. Using these user-specific Condorcet winners as reference points, we evaluate and score arms according to their performance relative to the corresponding winner. To promote fairness across heterogeneous users, we adopt the well-established Nash Social Welfare objective, which maximizes the product of user utilities, thereby inherently penalizing inequality and preventing the marginalization of any single user. Within this framework, we construct a hard instance to establish a regret lower bound of $\Omega(T^{2/3}\min(K,D)^\frac{1}{3})$ for a time horizon $T$, $K$ arms, and $D$ users, which, to the best of our knowledge, is the first result quantifying the cost of fairness in dueling bandits with heterogeneous preferences. We then present the Fair-Explore-Then-Commit and Fair-$\epsilon$-Greedy algorithms with a Condorcet winner identification phase. We further derive their regret upper bounds that match the lower-bound dependence on $T$ up to logarithmic factors.
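The difference between averaging preferences and the Nash Social Welfare objective can be seen in a toy example. The utilities below are hypothetical and the sketch only compares the two selection rules; it does not implement the bandit algorithms from the paper.

```python
import math

def average_utility(utilities):
    """Utilitarian objective: mean utility across users."""
    return sum(utilities) / len(utilities)

def nash_social_welfare(utilities):
    """Nash Social Welfare: product of user utilities."""
    return math.prod(utilities)

# Hypothetical per-user utilities for two arms and three users.
# Arm A is great for two users but nearly useless for the third.
arm_utilities = {
    "A": [1.0, 1.0, 0.1],
    "B": [0.6, 0.6, 0.6],
}

best_by_average = max(arm_utilities, key=lambda a: average_utility(arm_utilities[a]))
best_by_nsw = max(arm_utilities, key=lambda a: nash_social_welfare(arm_utilities[a]))
```

Here the average favours arm A (0.7 versus 0.6), while the product favours arm B (0.216 versus 0.1), since a single near-zero utility drags the product down.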

17.
arXiv (CS.AI) 2026-06-25

ATMA: Length-Invariant Language Modeling via Polar Attention and Gated-Delta Compression Memory

arXiv:2606.25156v1 Announce Type: cross Abstract: Modern large language models based on softmax scaled-dot-product attention are constrained by their training sequence length: as the key-value sequence grows, softmax probability mass can dilute across a wider distribution, inducing activation shift and long-context performance collapse. Moreover, long-context language modeling faces a structural tension: a sliding-window attention core maintains a bounded local representation and low perplexity but is blind to long-range dependencies, while full-context attention preserves global recall but suffers from out-of-distribution perplexity explosion. To resolve these limitations, we introduce ATMA, a hybrid convolutional-attention architecture that integrates a novel three-channel attention mechanism. ATMA factorizes the attention mixing step into: (1) a count-blind, unit-vector direction channel, (2) a bounded magnitude channel driven by the participation ratio of effective matches over an extreme-value-corrected null sink, and (3) a long-term recurrent compression memory optimized via a gated-delta fast-weights rule. Neither the Polar Attention core nor the recurrent memory is sufficient alone; their combination enables monotonic perplexity reduction and high-fidelity long-range retrieval simultaneously. We evaluate ATMA using a 100-run factorial ablation sweep, demonstrating that the combined Polar + memory model maintains induction needle-in-a-haystack retrieval accuracy above 90% out to 64K tokens (32 times the training length of 2K) while its document perplexity improves monotonically, outperforming softmax-based memory baselines which collapse at extreme context lengths. Code: https://github.com/kreasof-ai/atma

18.
arXiv (CS.LG) 2026-06-17

MiniFool – Physics-Constraint-Aware Minimizer-Based Adversarial Attacks in Deep Neural Networks

arXiv:2511.01352v2 Announce Type: replace Abstract: In this paper, we present a new algorithm, MiniFool, that implements physics-inspired adversarial attacks for testing neural network-based classification tasks in particle and astroparticle physics. While we initially developed the algorithm for the search for astrophysical tau neutrinos with the IceCube Neutrino Observatory, we apply it to further data from other science domains, thus demonstrating its general applicability. Here, we apply the algorithm to the well-known MNIST data set and, furthermore, to Open Data from the CMS experiment at the Large Hadron Collider. The algorithm is based on minimizing a cost function that combines a $\chi^2$ based test-statistic with the deviation from the desired target score. The test statistic quantifies the probability of the perturbations applied to the data based on the experimental uncertainties. For our studied use cases, we find that the likelihood of a flipped classification differs for both the initially correctly and incorrectly classified events. When testing changes of the classifications as a function of an attack parameter that scales the experimental uncertainties, the robustness of the network decision can be quantified. Furthermore, this allows testing the robustness of the classification of unlabeled experimental data.
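The cost function described above, a $\chi^2$ term plus a deviation from the target score, can be illustrated on a toy model. This is a sketch under strong assumptions and not the MiniFool implementation: the classifier is a hypothetical two-feature logistic model, the deviation term is assumed to be a weighted squared difference, and the minimizer is plain gradient descent.

```python
import numpy as np

def score(x, w, b):
    """Toy classifier score: logistic regression output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(x @ w + b)))

def cost(x_adv, x, sigma, w, b, target, weight):
    """chi^2 of the perturbation plus weighted squared distance to the target score."""
    chi2 = np.sum(((x_adv - x) / sigma) ** 2)
    return chi2 + weight * (score(x_adv, w, b) - target) ** 2

def attack(x, sigma, w, b, target, weight=100.0, lr=0.002, steps=2000):
    """Minimise the cost by gradient descent, starting from the unperturbed input."""
    x_adv = x.copy()
    for _ in range(steps):
        s = score(x_adv, w, b)
        grad = 2 * (x_adv - x) / sigma**2 + weight * 2 * (s - target) * s * (1 - s) * w
        x_adv -= lr * grad
    return x_adv

# Hypothetical event: two features with different measurement uncertainties.
x = np.array([0.0, 0.0])
sigma = np.array([0.1, 0.5])
w, b = np.array([1.0, 1.0]), -1.0

x_adv = attack(x, sigma, w, b, target=1.0)
delta = x_adv - x
cost_before = cost(x, x, sigma, w, b, target=1.0, weight=100.0)
cost_after = cost(x_adv, x, sigma, w, b, target=1.0, weight=100.0)
```

In this toy setting the perturbation concentrates on the feature with the larger uncertainty, since moving it costs less $\chi^2$; scaling `sigma` up or down plays the role of the attack parameter mentioned in the abstract.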

19.
arXiv (quant-ph) 2026-06-19

Local controllability of heralded quantum linear optics

arXiv:2606.19470v1 Announce Type: new Abstract: Photonic linear optical networks provide a versatile platform for quantum information processing and quantum state engineering. However, the set of states that can be generated using passive linear optics alone is fundamentally constrained by bosonic symmetries. Heralding, based on conditional measurements on auxiliary modes, is a widely used technique to overcome these limitations and effectively enlarge the set of accessible states. Despite the widespread use of heralding, it is often unclear how specific ancillary resources impact the overall reachability of the target space. In this work, we investigate the local controllability of photonic states in linear optical networks by analyzing the rank of the Jacobian of the output state with respect to the underlying unitary circuit, which provides a quantitative measure of the dimension of the accessible tangent space at a given configuration. Our analysis ranges from passive linear optics to heralded linear optics, where auxiliary resources and conditional measurements are included. Within this framework, we quantify how different resources enlarge the locally accessible state space beyond that of passive linear optics and determine the resources required for the Jacobian rank to reach its maximal value, thereby achieving full local controllability. As maximal local rank is a necessary condition for global reachability, our framework offers a systematic tool to assess and compare the accessible state space of measurement-based photonic architectures, and to establish practical criteria for the resources needed in high-dimensional quantum state engineering.

20.
arXiv (CS.AI) 2026-06-24

Fix Initial Programs and Iteratively Refine Repair Instructions Toward Non-Elimination Multi-Turn Program Correction

arXiv:2604.23989v2 Announce Type: replace-cross Abstract: Recent work on large language models (LLMs) has emphasized the importance of scaling inference compute. From this perspective, the state-of-the-art method Scattered Forest Search (SFS) has been proposed, employing Monte Carlo Tree Search with carefully crafted initial seeds and textual optimization for multi-turn program correction. However, its complexity makes it unclear what factors contribute to improvements in inference performance. To address this problem, we analyze SFS and propose a simpler method, Iterative Refinement of Repair Instructions (IRRI), which fixes initial programs and iteratively refines repair instructions. Because of the simplicity of IRRI, we theoretically establish the non-elimination of IRRI using Oracle-Guided Inductive Synthesis (OGIS). Experiments on several program generation benchmarks suggest that IRRI achieves inference performance comparable to state-of-the-art methods. These results indicate that, even without complex search structures, refining initial programs with high-quality repair instructions alone can effectively improve inference performance.

21.
Nature Medicine 2026-06-11

Clinical Profile and Genomic Characterization of the 2026 Bundibugyo Virus Index Case in Uganda

Bundibugyo virus disease (BVD) remains a high-consequence threat in Eastern and Central Africa, where cross-border mobility, nonspecific early symptoms, and delayed recognition can obscure transmission. In this case report, we describe Uganda’s 2026 BVD index case: a male patient who traveled from the Democratic Republic of the Congo to Uganda and was admitted to a private hospital in Kampala on 11 May 2026 after more than two weeks of vomiting and diarrhea, with epigastric pain, weakness, and hiccups. He deteriorated rapidly, developing acute kidney injury, pulmonary edema, hepatic dysfunction, hypoxemia, delirium, atrial flutter, possible disseminated intravascular coagulation, and multiorgan failure, and died on 14 May. A posthumous EDTA whole-blood specimen tested at the Central Emergency Response and Surveillance Laboratory was positive for orthoebolavirus RNA and confirmed as Bundibugyo virus (BDBV) by RT-qPCR. Sequencing achieved 99% genome coverage at ≥100× depth. The 2026 BDBV genome formed a distinct lineage approximately equidistant from the 2007–2008 Butalya and 2012 Isiro variants, differing by 216–227 nucleotides (~1.2% sequence divergence). Here, we demonstrate the value of fatality surveillance, private-sector surveillance, diagnostic optimization through national specimen referral, and rapid molecular-genomic diagnostics for early detection, transmission chain interruption, and public health response coordination.

22.
arXiv (CS.AI) 2026-06-16

PAL-Bench: Evidence-Grounded Profile Reconstruction from Longitudinal Personal Albums

arXiv:2606.16175v1 Announce Type: new Abstract: Longitudinal personal albums are weak-schema multimodal databases: noisy perceptual records whose key facts require joins across faces, text, timestamps, locations, and repeated events. Existing visual, video, document, and lifelog benchmarks test sub-problems, but not album-scale profile reconstruction with social identity binding and evidence citation. Benchmarking this task is difficult because the ground truth needed for evaluation–owner profiles, social graphs, face-name maps, and evidence provenance–is private state that real albums cannot safely release. We introduce PAL-Bench, a controlled benchmark for evidence-grounded reconstruction under a public-record contract. Its Evidence Compiler builds latent private worlds, programs target-level evidence paths, renders album pixels, re-measures them through perception pipelines, and exports audited public/private views. Agents receive only perception-derived public records; targets, identifier maps, and evidence paths remain hidden. PAL-Bench contains 50 synthetic users, 36,659 public photo records, and 2,799 targets over owner facts, identities, and relations. A privacy-preserving audit with 10 participants confirms that PAL-Bench evidence structures match real private albums, though equivalent releases remain privacy-prohibitive. Across seven systems and two compute-matched diagnostics, a seven-metric protocol reveals a gap between plausible profile summarization and faithful social reconstruction: systems recover some owner facts but struggle with recurring identities and evidence citation. PAL-TRACE, a reference framework that freezes identity bindings before owner-fact mining, performs best but leaves hard identity resolution far from solved. PAL-Bench provides a testbed for perceptual entity resolution, multimodal data integration, temporal evidence aggregation, and provenance-aware structured prediction.

23.
arXiv (CS.CL) 2026-06-12

SkillChain: Closing the Loop on Skill Evolution for Image-Based E-Commerce AI Assistants

Image-based AI assistants are now deployed at production scale on e-commerce platforms, where a single uploaded image can trigger fundamentally different user intents: product search, style recommendation, visual encyclopedia, or utility tool calls, each demanding its own response format, tool invocation, and domain knowledge. Without per-intent behavioral constraints, LLM-based systems conflate these heterogeneous modes and fall short of domain quality standards, while the breadth and dynamism of the intent space render manual engineering infeasible. To address this, we present SkillChain, which closes the production feedback loop on Skill evolution, automating the lifecycle of Skills through three stages: Skill Creator for bootstrapping from task specs and trajectories, Route Optimizer for routing alignment, and Body Refiner for iterative Skill Body refinement via dual-path LLM-Judge evaluation. Deployed on a production-scale e-commerce image assistant, SkillChain substantially improves aggregate response quality, with the strongest gains on structural compliance and content quality; a one-week online A/B experiment further confirms significant gains in user engagement, content consumption, and long-term retention.

24.
bioRxiv (Bioinfo) 2026-06-17

An Integrated Framework for Transcriptomic Characterization and Lorentzian Hyperbolic Visualization of a High-Risk Topological Branch in Alzheimer's Disease

Alzheimer's disease (AD) is a highly heterogeneous brain disorder in which molecular alterations vary across brain regions, disease stages, and patient subgroups. This study introduces an integrated analytical framework for characterizing transcriptomic variation associated with a high-risk topological branch, which was identified based on Lorentz distance in postmortem Brodmann area 36 samples from the Mount Sinai Brain Bank cohort, where over 70% of samples were in Braak stages V-VI. The framework integrates weighted gene co-expression network analysis, repeated stability-based differential expression analysis, network-level gene filtering, Gene Ontology enrichment, and nested stratified cross-validation to evaluate whether topological branch-associated genes capture biologically meaningful signals and carry predictive information for high-Braak group status. The identified gene sets were functionally enriched for neuronal development, neuron projection organization, synaptic signaling, vesicle fusion, and regulated synaptic release, suggesting that the high-risk topological branch reflects biologically relevant transcriptomic programs linked to neurodegenerative progression. Nested cross-validation further showed that the selected genes achieved measurable internal predictive performance for distinguishing high-Braak samples. As a second methodological contribution, we introduced a Lorentzian hyperbolic variant of t-distributed stochastic neighbor embedding (Lorentz t-SNE) to explore latent non-Euclidean structure in transcriptomic data. This method embeds samples in hyperbolic space, providing an alternative to Euclidean embeddings for representing hierarchical or nonlinear structures. Compared with conventional Euclidean embeddings, the proposed Lorentz t-SNE revealed a more localized organization of high-Braak samples. Together, these results demonstrate the utility of the proposed analytical framework and Lorentz t-SNE for investigating heterogeneous, potentially non-Euclidean organization in AD transcriptomes.
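
The abstract does not define its "Lorentz distance", so the following is only an illustrative sketch under a stated assumption: points live on the unit hyperboloid (Lorentz model of hyperbolic space) and distance is the geodesic distance arccosh(-<x, y>_L), where <x, y>_L = -x0*y0 + x1*y1 + ... is the Lorentzian inner product. The helper names `lift`, `lorentz_inner`, and `lorentz_distance` are hypothetical and are not taken from the paper.

```python
import math

def lift(v):
    """Lift a Euclidean vector v onto the unit hyperboloid.

    The time-like coordinate x0 = sqrt(1 + |v|^2) guarantees <x, x>_L = -1.
    """
    return [math.sqrt(1.0 + sum(c * c for c in v))] + list(v)

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum of products of the other coordinates."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))

def lorentz_distance(x, y):
    """Geodesic distance on the hyperboloid (assumed definition, see lead-in)."""
    # Clamp to 1.0 so rounding error cannot push the argument outside acosh's domain.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))

origin = lift([0.0, 0.0])
point = lift([math.sinh(1.0), 0.0])  # lies at hyperbolic distance 1 from the origin
d = lorentz_distance(origin, point)
```

Under this assumption, distances grow roughly logarithmically in the Euclidean norm of the lifted vector, which is why hyperbolic embeddings are often used for hierarchical structure.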

25.
arXiv (quant-ph) 2026-06-19

Measuring Rényi entropy with an Echo Protocol

arXiv:2504.05237v3. We present efficient and practical protocols to measure the second Rényi entropy, whose exponential is known as the purity. Our approach is based on expressing the purity in terms of transition probabilities generated by an echo-type forward-backward evolution sequence, making it applicable to quantum many-body systems. Notably, our approach does not rely on random-noise averaging, a feature that can be extended to protocols to measure out-of-time-order correlation functions, as we demonstrate. By way of example, we show that our protocols can be practically implemented in superconducting qubit-based platforms, as well as in cavity-QED trapped ultra-cold gases.
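
For orientation, the quantities named in the abstract can be computed directly for a small density matrix. This is a minimal sketch of the standard definitions only, not of the echo protocol itself: the purity is Tr(rho^2) and, in the usual sign convention, the second Rényi entropy is S_2 = -ln Tr(rho^2), so the purity equals exp(-S_2). The function names `purity` and `renyi2` and the two example states are illustrative assumptions.

```python
import math

def purity(rho):
    """Return Tr(rho^2) for a square density matrix given as nested lists."""
    n = len(rho)
    # Tr(rho^2) = sum over i, j of rho[i][j] * rho[j][i]
    total = sum(rho[i][j] * rho[j][i] for i in range(n) for j in range(n))
    # For a Hermitian rho the result is real; drop any zero imaginary part.
    return total.real if isinstance(total, complex) else float(total)

def renyi2(rho):
    """Second Rényi entropy S_2 = -ln Tr(rho^2), in nats."""
    return -math.log(purity(rho))

pure_state = [[1.0, 0.0], [0.0, 0.0]]        # |0><0|, purity 1
maximally_mixed = [[0.5, 0.0], [0.0, 0.5]]   # identity / 2, purity 1/2

s2_pure = renyi2(pure_state)
s2_mixed = renyi2(maximally_mixed)
```

A pure state has zero second Rényi entropy, while the maximally mixed qubit reaches ln 2, the largest value for a two-level system.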