Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.LG) 2026-06-16

deFOREST: Fusing Optical and Radar satellite data for Enhanced Sensing of Tree-loss

arXiv:2510.14092v2 Announce Type: replace-cross Abstract: In this paper we develop a deforestation detection pipeline that incorporates optical and Synthetic Aperture Radar (SAR) data. A crucial component of the pipeline is the construction of anomaly maps of the optical data, which is done using the residual space of a discrete Karhunen-Lo\'{e}ve (KL) expansion. Anomalies are quantified using a concentration bound on the distribution of the residual components for the nominal state of the forest. This bound does not require prior knowledge on the distribution of the data. This is in contrast to statistical parametric methods that assume knowledge of the data distribution, an impractical assumption that is especially infeasible for high dimensional data such as ours. Once the optical anomaly maps are computed they are combined with SAR data, and the state of the forest is classified by using a Hidden Markov Model (HMM). We test our approach with Sentinel-1 (SAR) and Sentinel-2 (Optical) data on a $92\,km \times 92\,km$ region in the Amazon forest. The results show that both the hybrid optical-radar and optical only methods achieve high accuracy that is superior to the recent state-of-the-art hybrid method. Moreover, the hybrid method is significantly more robust in the case of sparse optical data that are common in highly cloudy regions.

02.
arXiv (CS.CV) 2026-06-16

FrameOracle: Learning What to See and How Much to See in Videos

Vision-language models (VLMs) advance video understanding but operate under tight computational budgets, making performance dependent on selecting a small, high-quality subset of frames. Existing frame sampling strategies, such as uniform or fixed-budget selection, fail to adapt to variations in content density or task complexity. To address this, we present FrameOracle, a lightweight, plug-and-play module that predicts both (1) which frames are most relevant to a given query and (2) how many frames are needed. FrameOracle is trained via a curriculum that progresses from weak proxy signals, such as cross-modal similarity, to stronger supervision with FrameOracle-41K, the first large-scale VideoQA dataset with validated keyframe annotations specifying minimal sufficient frames per question. Extensive experiments across five VLMs and six benchmarks show that FrameOracle reduces 16-frame inputs to an average of 10.4 frames without accuracy loss. When starting from 64-frame candidates, it reduces inputs to 13.9 frames on average while improving accuracy by 1.5%, achieving state-of-the-art efficiency-accuracy trade-offs for scalable video understanding.

03.
arXiv (math.PR) 2026-06-12

Data-driven subsampling rates for diffusion parameter estimation of SDEs

arXiv:2606.13615v1 Announce Type: new Abstract: We study the problem of diffusion parameter estimation for stochastic differential equation (SDE) models in scenarios where data and model are compatible only on specific scales that have yet to be determined. We introduce a simple and efficient method for selecting suitable rates at which given time series data should be subsampled in order to ensure that the statistical structure of the subsampled data is consistent with the behavior of the SDE model on an infinitesimal scale. Our approach is based on analyzing the statistics of the lengths of monotonically increasing or decreasing segments in the subsampled data sequence, which we refer to as monotone runs. As an analytical foundation, we prove for a large class of SDEs with additive noise that the lengths of monotone runs at an infinitesimal scale are approximately geometrically distributed with success probability $1/2$. This universal characterization is employed to derive an automated method for selecting appropriate subsampling rates for given time series data that is directly applicable in real-world scenarios and does not rely on an asymptotic framework of multiscale diffusions. The approach is demonstrated using an application from industrial mathematics concerning surrogate models for fiber lay-down curves in production processes of nonwoven textiles.

04.
arXiv (quant-ph) 2026-06-12

Achieving Heisenberg limit under noisy conditions with quantum Zeno dynamics and dynamical decoupling

arXiv:2606.13205v1 Announce Type: new Abstract: Quantum Zeno dynamics (QZD) and dynamical decoupling (DD) are useful tools that enable the effective suppression of noise in quantum systems. We consider the problem of when (i) noise can be suppressed and (ii) Heisenberg limit (HL) can be achieved in quantum metrology, and prove necessary and sufficient conditions for when QZD and DD are useful for achieving these two goals. We also show that in the Markovian regime, there are scenarios where preventing errors using QZD/DD may enable HL to be achieved where current QEC methods may not. Finally, we demonstrate that the combination of both techniques can allow individually imperfect QZD and DD strategies to saturate HL.

05.
arXiv (quant-ph) 2026-06-15

Local correlations in long-range dual-unitary kicked Hamiltonian chains

arXiv:2606.13857v1 Announce Type: new Abstract: Many-body Floquet models with exact space–time symmetry, such as the kicked Ising spin chain (KIC), provide natural examples of systems with dual-unitary dynamics. The requirement of exact space–time symmetry is, however, highly restrictive, as it permits only nearest-neighbor interactions. Based on a pair of Hadamard matrices, we construct a wide family of dual-unitary kicked spin chains with long-range interactions. We show that local two-point correlations in such models propagate along the light-cone edges \( |n| = r|t| \), where \(r\) is the interaction range, and can be derived analytically for operators with local support. This approach is illustrated using the example of a kicked Ising spin chain with next-to-next-neighbor interactions.

06.
arXiv (CS.CV) 2026-06-16

Position: The Systemic Lack of Agency in Visual Reasoning

This paper argues that a systemic lack of Agency constrains the implicit reasoning capabilities of current Vision-Language Models (VLMs). Implicit reasoning refers to the ability to autonomously discover and utilize hidden visual evidence to bridge information gaps, rather than merely relying on explicitly specified targets. This capacity underlies human visual understanding and everyday reasoning. We argue that this limitation arises from a tendency to approach visual reasoning primarily as passive semantic retrieval, rather than as active, situated reasoning that depends on autonomous visual exploration. As a result, most existing benchmarks primarily assess Passive Capacity, leaving this aspect of reasoning largely unmeasured. To address this gap, we introduce the Visual Implicit Reasoning Diagnosing Benchmark (V-IRD), which targets this missing quadrant by requiring models to derive answers strictly through autonomous visual analysis. Our results show that, despite strong retrieval abilities, prominent VLMs struggle to utilize reference objects and to attend to visual evidence that requires self-directed inquiry. Simply put, strong semantic recognition does not equate to active visual exploration, revealing a critical gap in current VLMs. More information can be found at https://haoychen.github.io/Implicit-Reasoning/

07.
arXiv (math.PR) 2026-06-15

Universality for Products of Random Matrices with i.i.d. Entries and the Fuss–Catalan Number

arXiv:2606.14450v1 Announce Type: cross Abstract: Let \((w_{ij})_{i,j\ge1}\) be a single infinite array of independent identically distributed real- or complex-valued entries of mean zero, variance \(\sigma^2\), and finite fourth moment. Set \(W_n=(w_{ij})_{1\le i,j\le n}\) and \(X_n=n^{-1/2}W_n\). For every fixed \(k\ge1\), we identify the almost sure limiting operator norm of several fixed products built from this family. Define the \(k\)-th freeness coefficient by \[ \gamma_k:=\sqrt{\frac{(k+1)^{k+1}}{k^k}}. \] Then we prove \[ \|X_n^k\|\to\sigma^k\gamma_k \qquad almost surely. \] The same limit holds for products sampled with replacement from any fixed finite pool of independent copies of \(X_n\); in particular, it holds for the product of \(k\) independent copies. Thus, the freeness coefficient captures the non-commuting characteristic between large random matrices %powers and independent or fixed-pool sampled products under the finite fourth moment assumption. The improvement of the classical Bai–Yin-type power estimate from the scale \(\sigma^k(k{+}1)\) to \(\sigma^k \sqrt{k{+}1}\) is a direct corollary of our result. The main technical challenge is to prove the upper bound using a high-moment expansion of %the upper bound is proved by a high-moment expansion of \(\E\Tr((X_n^kX_n^{*k})^m)\). The leading zero-defect trace words are tree-like and are counted by the Fuss–Catalan number \[ F_{k,m}= \frac1{km+1}\binom{(k+1)m}{m}. \] The combinatorial tool helps to devise a defect-sensitive global enumeration: if \(L=km\) and \[ r=(L+1-v)+(L-q), \] then the number of admissible word classes with defect \(r\) is at most \(F_{k,m}(Cm)^{Dr}\). This polynomial-in-\(m\) loss, with degree proportional to the defect, is summable in the logarithmic moment range.

08.
arXiv (CS.LG) 2026-06-12

Optimal Spatio-Temporal Decoupling for Bayesian Conformal Prediction

arXiv:2605.00432v2 Announce Type: replace Abstract: Online conformal prediction must balance fast adaptation to distribution shift against stable coverage: feedback-driven methods react quickly but become volatile, while strongly discounted Bayesian methods lag and inflate intervals at tight coverage. We introduce State-Adaptive Bayesian Conformal Prediction (SA-BCP), which forms the predictive quantile as a gated convex combination of long-term temporal inertia and local spatial evidence from a kernel density estimate, controlled by a single interpretable evidence threshold $K$. We establish three results: (i) asymptotic marginal validity of the resulting intervals; (ii) a closed-form expression for the MSE-optimal threshold, $K^*_{\mathrm{MSE}}=\alpha(1-\alpha)/M^{\mathcal{T}}$, trading the coverage-indicator (Bernoulli) variance against the temporal structural bias $M^{\mathcal{T}}$; and (iii) a rolling-origin procedure for selecting $K$ online – consistent under stationarity, with $O(\sqrt{T\log N})$ regret against the best fixed $K$ and, for a segmented variant, a sublinear dynamic-regret bound under bounded drift. Across four financial-volatility and weather datasets, three target coverage levels, and eight baselines (including the strongest recent conditional-quantile methods, SPCI and KOWCPI), SA-BCP attains at-or-above-nominal coverage in most settings while producing substantially sharper intervals – up to roughly $3\times$ lower Winkler score than discounted Bayesian CP at the tightest coverage – and a coverage-matched audit confirms these efficiency gains are not an artifact of under-coverage. We disclose one principal limitation: a volatility-specialized conformal-GARCH competitor remains more efficient on its home volatility-base series, though it does not transfer across domains.

09.
arXiv (quant-ph) 2026-06-11

Diffusive Relaxation of Participation Entropy in U(1)-symmetric Dynamics

arXiv:2606.11561v1 Announce Type: new Abstract: Participation entropy (PE) quantifies the spread of a many-body wavefunction across configuration space. While PE relaxes rapidly in generic chaotic systems, we show that $\mathrm{U}(1)$ conservation laws slow it down by imprinting with the slow hydrodynamic modes. Using a cluster expansion around equilibrium, we show that, after local density inhomogeneities decay, the leading PE deficit is dominated by squared connected density correlations. The long time relaxation is therefore controlled by diffusive correlation spreading, giving $\Delta S(t)\sim t^{-1/2}$ in the hydrodynamic regime and crossing over to $\sim \exp[-O(t/L^2)]$ when $t\geq L^2$. We confirm this entropy correlation relation using exact computation and infinite system tensor network simulations in various quantum $\mathrm{U}(1)$ conserving circuits. Our results establish PE as a sensitive probe of hydrodynamic memory and suggest that slow relaxation is a generic consequence of conservation laws.

10.
medRxiv (Medicine) 2026-06-22

Generative Artificial Intelligence in Psychotherapy Practice: A Global Online Survey of Mental Health Professionals' Adoption

Background: Generative artificial intelligence (GenAI) tools, including large language model (LLM)-based platforms such as ChatGPT, Google Gemini, and Microsoft Copilot, are being adopted across healthcare settings with increasing speed. Despite the increasing popularity of GenAI, empirical data on the extent and nature of adoption by mental health clinicians in routine psychotherapy practice globally remain scarce. Objective: This study aimed to characterize current use patterns of GenAI tools among a global sample of practicing mental health professionals, including prevalence of use, specific tools employed, clinical and administrative purposes served, perceived effect on workload, and the institutional context shaping adoption (e.g., encouragement, prohibition, and training). Methods: We administered a cross-sectional online survey to a global convenience sample of licensed mental health professionals who provide psychotherapy as part of the scope of their practice (i.e., psychotherapists, psychologists, counsellors, nurses, and psychiatrists). Participants were recruited via professional networks, purposely avoiding the use of social media platforms. Within the survey, we captured GenAI use behaviors in psychotherapy contexts, and demographic and professional background data. Descriptive statistics were analyzed for all variables. Multivariate logistic regression was used to examine demographic and professional predictors of GenAI use. Results: A total of 766 mental health professionals who provide psychotherapy from 30 countries completed the survey. Of these, 54.6% (n=418) reported having purposely used at least one GenAI tool in psychotherapy clinical practice. ChatGPT was the most frequently used tool (354/418, 84.7%). The most commonly reported clinical purpose was assisting with treatment planning (175/418, 41.9%), followed by managing administrative tasks (173/418, 41.4%) and generating psychoeducational materials for clients (166/418, 39.7%). 82.8% of AI users reported that these tools reduced their overall work burden. Only 18.1% (139/766) of respondents reported institutional encouragement to use AI tools, while 81.1% (621/766) reported not having received any professional training on AI use. Predictors of AI adoption included younger age and rural practice setting. Conclusions: In this global convenience sample survey, GenAI use among mental health professionals in psychotherapy settings is widespread, concentrated in a wide variety of clinical and administrative tasks. Formal training and institutional guidance substantially lag behind current adoption patterns. These findings highlight an urgent need for evidence-based competency frameworks, regulatory clarity, and professional education to support safe and ethically informed integration of AI into clinical mental health practice.

11.
medRxiv (Medicine) 2026-06-11

Plasma protein prioritisation in rheumatoid arthritis reveals druggable targets and shared biology with cardiovascular diseases

Abstract Background Rheumatoid arthritis (RA) is an autoimmune inflammatory disease with complex and incompletely understood molecular mechanisms. Understanding circulating proteins associated with RA may improve understanding of disease biology and clarify its pathological links with cardiometabolic comorbidities. Methods A proteome-wide two-sample Mendelian randomisation (MR) drug target analysis was conducted using plasma proteins measured in 54,219 participants from the UK Biobank Pharma Proteomics Project as exposures and RA and cardiometabolic diseases as the outcomes. Summary statistics for RA included 53,663 cases and 1,070,200 controls. Colocalisation analysis was performed to confirm shared single causal variants and prioritise RA proteins supported by both MR and colocalisation. The prioritised proteins were then evaluated in the Accelerating Medicines Partnership RA Phase II synovial single-cell dataset for cell-type expression patterns. Druggability was then assessed followed by analysis of genetic overlap between RA-associated proteins and cardiometabolic diseases. Results 37 plasma proteins had a causal effect on RA risk, supported by combined evidence from MR and conditional colocalisation. In synovial tissue, TPPP3, RARRES2, AKAP12, and GGT5 were predominantly expressed in stromal and endothelial cell clusters. Druggability assessment identified IFNGR2, IL6R, CD40, and FCGR2B as Tier 1 targets. However, several biologically relevant proteins, including RARRES2, AKAP12, TPPP3, and SNX2, had limited available druggability data. Genetic overlap analysis demonstrated shared protein signals between RA and cardiovascular diseases, including overlap of RARRES2 and TPPP3 with coronary artery disease (CAD) and FCGR2B with atrial fibrillation (AF). To approximate the therapeutic effect of target inhibition, the direction of effect estimates for proteins showing overlap between RA-CAD and RA-AF was reversed. Conclusion This study identified circulating proteins involved in RA pathogenesis and reveals shared mechanisms between RA and cardiovascular diseases. While some proteins showed clear translational potential targets, several prioritised proteins had limited available druggability information and could not be confidently classified. Addressing these gaps may help identify new targets relevant to RA management. Future work should also use phenome-wide MR studies to evaluate potential on-target adverse effects of protein inhibition across RA-CAD and RA-AF.

12.
arXiv (CS.AI) 2026-06-24

A Simplex Witness Certificate and Escape Force for Constant Collapse in Variational Autoencoders

arXiv:2605.18224v4 Announce Type: replace-cross Abstract: We study exact constant collapse in variational autoencoders: the deterministic encoder mean becomes independent of the input. The prior remains the standard Gaussian. Before VAE training, we select a fixed teacher posterior from a GMM-based view of the data and attach a fixed latent-only simplex witness to the encoder mean. This construction yields two linked objects. The first is a certificate: if the witness prediction improves on the best constant predictor of the teacher, the encoder mean cannot be input-independent constant. The second is a local escape direction: on the collapsed manifold, the teacher residual gives a sample-dependent descent direction for the alignment loss. For any full-support teacher posterior, the same geometry also gives a closed-form latent code with zero teacher-witness alignment error. Its scaled versions trace a margin-energy path from the constant predictor to the exact teacher code, which quantifies non-collapse inside the protected witness subspace. We instantiate the method on MNIST, CIFAR-10, and CIFAR-100. With searched unsupervised PCA-GMM teachers, vanilla VAEs fail the teacher-witness certificate in all five seeds on CIFAR-10 and CIFAR-100, while RST variants pass in all five seeds. Under collapse-stress settings with \(\beta_{\mathrm{KL}}\in\{2,4,8\}\), vanilla VAE again fails in all seeds, whereas RST-alpha-prefit remains certificate-positive. Escape trajectories on both natural-image datasets increase the witness margin from a low-margin initialization and exhibit nonzero teacher-induced gradient norms. The analysis is confined to exact constant collapse of the encoder mean; generation quality, decoder use, and other collapse modes remain separate questions.

13.
arXiv (CS.AI) 2026-06-16

Intelligence Is Not the Bottleneck: Validating an LLM First-Pass Manuscript Score Against Peer-Review Outcomes

arXiv:2606.15887v1 Announce Type: cross Abstract: Large language model (LLM) systems are increasingly proposed to assist peer review, yet most evaluations judge the prose of machine-generated review text, not the validity of the numeric score a system assigns. We validate AIPR, which reads a submitted manuscript and emits five 0-100 quality dimensions and a weighted overall score, against the public decision outcomes of a major machine learning venue. AIPR grades by prompting alone, with no fine-tuning on reviews or decisions. Across 300 ICLR submissions with public decision tiers and reviewer ratings, graded under a frozen pipeline with hypotheses pre-registered before any score met any outcome, the overall score separates rejected from accepted submissions (AUROC 0.82, 95% CI 0.78-0.87), rises monotonically across tiers, and tracks the mean reviewer rating. The signal is strongest where we claim it: the lowest-scoring fifth is rejected far above the base rate, with oral papers absent. The validity comes mostly from the model: a one-paragraph prompt on the same model discriminates almost as well as the full pipeline (the small gap favours the pipeline but does not meet the pre-declared criterion, p = 0.09). What the engineering adds is reliability and a grounded review: AIPR's score barely moves across repeated runs (0.7 vs. 2.8 points within-paper SD) where the bare prompt swings, and the same pass returns a rubric-structured, evidence-grounded review rather than a bare number, with the human keeping the decision.

14.
arXiv (CS.LG) 2026-06-15

Conformal calibration and look-elsewhere effect in anomaly detection for new-physics searches

arXiv:2606.13780v1 Announce Type: cross Abstract: Machine-learned anomaly detection is reshaping searches for new physics, but it has outrun the statistics used to interpret it. A raw anomaly score has no calibrated meaning, a model that scans many regions inflates the look-elsewhere effect, and the asymptotic significances the field relies on are blind to the background mismodelling that anomaly detectors are especially prone to. We propose a calibration layer, built on conformal prediction, that turns any anomaly score into a defensible significance with distribution-free, finite-sample guarantees. Conformal prediction converts scores into valid local p-values, weighted and Mondrian variants repair the sideband-to-signal-region exchangeability failures that resonant searches suffer, and a Gross-Vitells step carries the result through to a look-elsewhere-aware global significance. The layer does two things at once. It exposes miscalibration that the standard pipeline cannot see, and it corrects it without retraining the detector. On public LHC Olympics data, a classifier develops a substructure-mass correlation that makes sideband-calibrated background p-values anti-conservative. Taken at face value, this manufactures a $\sim 46\sigma$ excess from background sculpting alone, which the label-free weighted correction removes, restoring an honest null. When run as a blind wide-mass bump hunt, the standard asymptotic and unweighted procedures fabricate $\gtrsim10\sigma$ excesses and $\approx5\sigma$ excesses even in signal-free windows, while the conformal layer raises no false alarms and its global false-positive rate is verified on background-only pseudoexperiments. The result is an auditable, detector-agnostic path from an uncalibrated score to a trials-factor-aware significance, ready to be folded into experimental anomaly searches.

15.
arXiv (CS.AI) 2026-06-12

From AGI to ASI

arXiv:2606.12683v1 Announce Type: new Abstract: Over the last decade, building human-level artificial general intelligence has moved from far-fetched speculation to being a concrete next-decade target for many of the largest AI organisations. Achieving this goal would have profound and far-reaching impacts on human society, which raises many complex questions for the decade ahead. This report investigates how AI itself might continue to develop in a post-AGI world along the continuum of machine intelligence. The endpoint of this continuum, Universal AI, is theoretically well understood, which provides some formal grounding for the main focus of this report: the transition from human-level AGI to artificial general superintelligence, which, intuitively, can be understood as a system that is more intelligent and cognitively capable than large organisations of humans. After characterizing ASI, the report discusses four potential pathways from AGI to ASI: scaling AGI, AI paradigm shifts, recursive improvement, and ASI emerging from large-scale multi-agent collectives. The report then discusses possible frictions and bottlenecks along these pathways. Determining whether the impact of these frictions will be negligible or substantial raises a number of concrete open research questions. Due to large uncertainties for predicting ASI progress, it cannot be ruled out that AI progress might continue to accelerate over the next years. This could imply that the image of a single transformative step change, caused by the introduction of human-level AGI into our society, could be inaccurate. More apt might be the prospect of a series of transformative societal changes caused by AI-enabled progress and breakthroughs across many areas of science and technology. Preparing for this prospect requires a massively interdisciplinary endeavour of global scope and interest.

16.
medRxiv (Medicine) 2026-06-23

Association between the hemoglobin albumin lymphocyte and platelet score and chronic kidney disease: insights from patient data and animal models

Introduction The hemoglobin, albumin, lymphocytes and platelets (HALP) score, a novel nutritional and inflammatory biomarker, has been used in various chronic disease studies. However, the relationship between the HALP score and chronic kidney disease (CKD) remains poorly elucidated. This study aimed to explore the possible association between the HALP score and CKD. Methods Our analysis encompassed 25,160 adult participants drawn from NHANES cycles spanning 2009 through 2018. Weighted multivariable logistic regression and generalized additive models (GAMs) were employed to evaluate the independent associations between the HALP score and CKD, albuminuria, and low-estimated glomerular filtration rate (eGFR). Threshold effects were examined using two-piecewise linear regression. Subgroup and sensitivity analyses were performed to assess robustness. Receiver operating characteristic (ROC) curve analyses were applied to compare the discriminative capacity of the HALP score with the prognostic nutritional index (PNI), systemic immune-inflammation index (SII), lymphocyte-to-monocyte ratio (LMR), and platelet-to-lymphocyte ratio (PLR). The clinical findings were further validated in a 5/6 nephrectomy rat model. Results After adjustment for multiple confounders, higher HALP scores were inversely associated with the risk of CKD (OR = 0.97, 95% CI: 0.94-0.99) and albuminuria (OR = 0.97, 95% CI: 0.93-0.99). However, after full adjustment for demographic characteristics, physical examination indices and laboratory parameters (Model 3), the correlation between the HALP score and low-eGFR was no longer statistically significant. Non-linear analyses revealed a threshold effect, with CKD risk declining as the HALP score increased up to an inflection point of 52.43 (OR = 0.97, 95% CI: 0.95-0.99), beyond which no further protective effect was observed. A similar threshold effect was identified for albuminuria. Subgroup and interaction analyses indicated no meaningful effect modification by age, sex, BMI, hypertension, or diabetes. Sensitivity analyses confirmed the robustness of the results. ROC analysis demonstrated that the HALP score showed superior discriminative ability for CKD and albuminuria compared with PNI, SII, LMR, and PLR. In the animal experiment, CKD model rats exhibited significantly lower HALP scores than controls. Inverse correlations were observed between the HALP score and serum creatinine (Scr), blood urea nitrogen (BUN), and urinary albumin-to-creatinine ratio (UACR), with UACR showing the strongest correlation, which was consistent with the clinical findings. Conclusion Lower HALP scores are independently associated with increased prevalence of CKD and albuminuria. As an affordable and readily measurable biomarker, the HALP score may facilitate CKD risk assessment.

17.
arXiv (CS.CL) 2026-06-24

Automatic Part-of-Speech Tagging of Arabic-English Dictionary Senses through WordNet

This paper proposed an algorithm for part-of-speech (POS) tagging senses of a bilingual dictionary. The algorithm is applied on the Al-Mawrid Arabic-English dictionary. The tagging task is accomplished by transferring the POS tags of the English translation equivalences (TEs) to the dictionary senses after dis-ambiguities process. The English POS tags of senses are acquired from the Princeton WordNet. POS tagging of bilingual dictionary senses is prerequisite to link a bilingual dictionary to WordNet and/or standardizing that dictionary into WordNet-LMF format where the synset (set of synonyms), not word, is the basic brick. The registered accuracy is high though the cost is little. Building NLP/HLT tools needs linguistic experts, large investments, and long time. For statistical approach, we need large annotated corpora and for rule-based approach, we need large lexicon that contains rich linguistic and world knowledge. That motivates the appearance of what are called resource-light approaches to develop natural language processing (NLP) tools for poor-resource languages.

18.
arXiv (quant-ph) 2026-06-15

Note on the local calculation of decoherence of quantum superposition in the static black holes

arXiv:2606.14178v1 Announce Type: cross Abstract: We investigate the decoherence of a quantum spatial superposition of a static particle in Schwarzschild and Reissner-Nordstr\"{o}m black holes. By treating the particle as a localized classical source coupled to a quantum scalar field, we reformulate the decoherence process in the Danielson-Satishchandran-Wald (DSW) gedankenexperiment through coherent state generation and derive the local expression for the decoherence functional in terms of the Wightman function. In the long-time limit, the decoherence rate is shown to be characterized by the low-frequency behavior of the Wightman function. We then employ the asymptotic matching method to calculate the analytical expressions of the Wightman functions in the Boulware, Unruh, and Hartle-Hawking vacua. We show that the decoherence behavior depends on the quantum state of the environmental field. While the Boulware vacuum gives vanishing decoherence for a static superposition, the thermal effects associated with Hawking radiation in the Unruh and Hartle-Hawking vacua can induce nonvanishing decoherence.

19.
arXiv (CS.AI) 2026-06-18

MIDS: Detecting Stealthy Masquerade and Tampering Attacks on CAN Bus via Bidirectional Mamba

arXiv:2606.18599v1 Announce Type: cross Abstract: The Controller Area Network (CAN) protocol is the primary communication standard for Electronic Control Units (ECUs) in modern vehicles, but its lack of encryption and authentication exposes it to a range of security threats. Existing intrusion detection systems are largely tuned to fabrication-style attacks (DoS, fuzzing, ID spoofing realised by frame injection), in which detection signals such as per-ID inter-arrival statistics are readily available. We instead address the harder masquerade setting[b37], in which an internal adversary substitutes a legitimate frame in-situ at its original transmission slot, preserving traffic periodicity and rendering traffic-statistic defences ineffective. We propose the Mamba Intrusion Detection System (MIDS), an innovative dual-stream framework that processes CAN identifiers and payloads in parallel and reconstructs their joint temporal semantics through bidirectional selective state-space modelling. To evaluate MIDS, we collected over 100 million CAN frames from a physical Tesla Model 3 across three driving regimes and synthesised 54 masquerade attack variants spanning ID-only, data-only, and combined modifications. MIDS attains an F1 of 96.94\% on this dataset, exceeding the strongest reproducible baseline by more than 8 percentage points, while sustaining a 1.147~ms single-window inference latency – ample headroom for real-time onboard deployment. To verify generalisation, we further evaluate MIDS on four public benchmarks (ROAD, CrySyS, OTIDS, CT\&T) covering both masquerade and injection scenarios; MIDS attains F1 from 93.70\% to 99.61\%, outperforming the strongest of eight reproduced baselines by up to 13.94 percentage points under a unified 5-fold protocol.

20.
arXiv (CS.AI) 2026-06-17

Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI

arXiv:2603.18104v5 Announce Type: replace Abstract: Prevailing AI training assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer complexity, and structural degradation of geometric properties through training are consequences of this arithmetic substrate. This paper develops an alternative training architecture grounded in three prior results: the Dimensional Type System and Deterministic Memory Management framework (Haynes 2026), which establishes stack-eligible gradient allocation and exact quire accumulation as design-time verifiable properties; the Program Hypergraph (Haynes 2026), which establishes grade preservation through geometric algebra computations as a type-level invariant; and the b-posit bounded-regime design (Jonnalagadda et al. 2025), which makes posit arithmetic tractable across hardware targets conventionally considered inference-only. Their composition enables depth-independent training memory bounded to approximately twice the inference footprint, grade-preserving weight updates, and exact gradient accumulation, applicable uniformly to loss-function-optimized and spike-timing-dependent neuromorphic models. We introduce *Bayesian distillation*, a mechanism by which the latent prior structure of a general-purpose model is extracted through the ADM training regime, resolving the data-scarcity bootstrapping problem for domain-specific training. For deployment, we introduce *warm rotation*, an operational pattern in which an updated model transitions into an active inference pathway without service interruption, with correctness formalized through PHG certificates and signed version records. The result is a class of domain-specific AI systems that are smaller and more precise than general-purpose models, continuously adaptive, verifiably correct with respect to the physical structure of their domains, and initializable from existing models.

21.
arXiv (CS.AI) 2026-06-24

The Measurable Majority

arXiv:2606.23853v1 Announce Type: cross Abstract: This paper studies strict majority reasoning in finite electorates using so-called $social decision frames$: finite sets of voters equipped with distinguished families of coalitions interpreted as those voting blocs evaluated to form a strict majority. A coherence criterion for qualitative majority judgments is identified and shown to give an exact characterization for representability of strict majorities by finitely additive measures. In addition, a minimal natural logic for reasoning about strict majorities is shown to be sound and complete. These developments motivate examination of associated combinatorial questions concerning incoherence in finite families of sets; partial results and a conjecture are given. Finally, the results of this paper are applied to correct a classical representation theorem for weak qualitative probability structures due to Patrick Suppes and to establish a May-type characterization for ordinary strict majority rule for social decision frames.

22.
arXiv (CS.CV) 2026-06-15

Generation of Maximal Snake Polyominoes Using a Deep Neural Network

Maximal snake polyominoes are difficult to study numerically in large rectangles, as computing them requires the complete enumeration of all snakes for a specific rectangle size, which corresponds to a brute force algorithm. This hinders the study of maximal snakes in larger rectangles. Moreover, most enumerable snakes lie in small rectangles, obscuring large-scale patterns. In this paper, we investigate the contribution of a deep neural network to the generation of maximal snake polyominoes from a data-driven training, where the maximality and adjacency constraints are not encoded explicitly, but learned. To this extent, we experiment with a denoising diffusion model, which we referred as Structured Pixel Space Diffusion (SPS Diffusion). We find that SPS Diffusion generalizes from small rectangles to larger ones, generating valid snakes up to 28x28 squares and producing maximal snake candidates on squares close to the current computational limit. The model is, however, prone to errors such as branching, cycles, or multiple snake components. Overall, the diffusion model is promising and suggests that complex combinatorial objects can be understood by deep neural networks, which is useful in their investigation.

23.
arXiv (CS.CL) 2026-06-24

Few shot chain-of-thought driven reasoning to prompt LLMs for open ended medical question answering

In this paper, we propose a modified version of the MedQA-USMLE dataset, named MEDQA-OPEN, which contains open-ended medical questions without options to mimic clinical scenarios, along with clinician-approved reasoned answers. Additionally, we implement a prompt driven by Chain of Thought (CoT) reasoning, CLINICR, to mirror the prospective process of incremental reasoning, reaching a correct response to medical questions. We empirically demonstrate how CLINICR outperforms the state-of-the-art 5-shot CoT-based prompt (Liévin et al., 2022). We also present an approach that mirrors real-life clinical practice by first exploring multiple differential diagnoses through MCQ-CLINICR and subsequently narrowing down to a final diagnosis using MCQ-ELIMINATIVE. Finally, emphasizing the importance of response verification in medical settings, we utilize a reward model mechanism, replacing the elimination process performed by MCQ-ELIMINATIVE.

24.
arXiv (quant-ph) 2026-06-24

Ultra-Low-Rate Information Reconciliation: Repetition Coding or Dedicated Codes?

arXiv:2606.23726v1 Announce Type: new Abstract: We compare repetition-based ultra-low-rate information reconciliation with dedicated ultra-low-rate codes for CV-QKD. Repetition coding offers a favorable performance-complexity trade-off, incurring only a moderate error-rate penalty while reducing decoding complexity by $2\times$, making it attractive for implementation-constrained systems.

25.
arXiv (CS.CL) 2026-06-12

G-Long: Graph-Enhanced Memory Management for Efficient Long-Term Dialogue Agents

While Large Language Models (LLMs) have advanced open-domain dialogue systems, maintaining long-term consistency remains a challenge due to inherent limitations in long-context reasoning and the inefficiency of processing extensive raw text. Existing approaches typically rely on either unstructured memory storage, which is prone to information loss, or computationally expensive LLMs that incur high latency. To address these limitations, we propose G-Long, a graph-enhanced framework that utilizes a fine-tuned small Language Model (sLM) for structured triplet extraction and associative retrieval, significantly reducing operational costs. Furthermore, we introduce the novel attention-aware importance scoring mechanism that leverages the intrinsic cross-attention signals of a T5 summarizer to identify salient memories. Extensive experiments across diverse benchmarks demonstrate that G-Long achieves state-of-the-art performance in both response generation and memory retrieval, yielding performance gains of up to 9.8% in response quality on MSC and 40.8% in retrieval recall on LME, while significantly minimizing computational overhead.