Academic Intelligence · Curated Daily

Explore the Frontier of Global Academia

AcademicHub aggregates real-time literature from top journals and preprint platforms. Build your personal research radar and let large language models compile cross-disciplinary analysis briefings automatically.

01.
arXiv (CS.CL) 2026-06-15

Efficiency-Performance Trade-offs in Neural Speaker Diarization via Structured Pruning and Low-Bit Quantization

Streaming speaker diarization is crucial for time-critical medical dispatch, but deploying it on resource-constrained hardware requires smaller, faster models. Using SIMSAMU, a dataset of simulated medical-dispatch conversations, we evaluate streaming behavior before compressing the segmentation model with pruning and low-bit quantization. We characterize performance across a range of streaming latency budgets and find that additional buffering is not consistently beneficial, while very low-latency operating points can substantially degrade performance. Our study shows that model compression trades performance for memory footprint, and we highlight an operating point where FP16 reduces model size by half with essentially unchanged real-time factor, at a cost of a 40\% relative DER increase against the baseline. This work characterizes the trade-offs for real-time deployment and contributes to speech technology that can enable reliable human communication in time-critical contexts.

02.
arXiv (CS.AI) 2026-06-15

AI Receptivity or AI Adoption Breadth? A Tool-Specific Reanalysis of the Lower-Literacy/Higher-Usage Link

arXiv:2606.13734v1 Announce Type: new Abstract: Recent evidence reported by Tully, Longoni, and Appel (2025) suggests that lower artificial intelligence (AI) literacy predicts greater receptivity toward AI. We revisit this claim using the public data from Study 3 of that article, which measures past usage of five AI tool categories on a five-point frequency scale. We first reproduce the negative association between AI literacy and aggregate AI usage using OLS on participant-level averages, binary logit, ordered logit, and multinomial logit specifications. We then show that the aggregate relationship masks substantial heterogeneity by tool type. In our demographic-adjusted primary specification, AI literacy does not significantly predict text AI usage (ordered-logit $\beta$ = -0.090, p = .387), whereas it remains a strong predictor of non-text AI adoption ($\beta$ = -0.377, p < .001). The non-text effect is also robust under Tully et al.'s original Study 3 control specification ($\beta$ = -0.502, p < .001). Binary, ordered-logit, and multinomial specifications suggest that the non-text relationship is primarily an adoption/non-adoption pattern rather than evidence of intensive use: the demographic-adjusted odds ratio of ever having used a non-text AI tool is 0.68. Thus, in the study that measures self-reported past usage rather than stated preferences, the evidence does not support a simple claim that lower AI literacy predicts greater receptivity to AI in general. It points instead to a narrower pattern of broader adoption across lower-penetration, non-text AI tools.

03.
arXiv (CS.LG) 2026-06-12

Adaptive Weighted Averaging

arXiv:2606.12763v1 Announce Type: new Abstract: We study the problem of selecting the largest among $n$ unknown values $x_1,\dots,x_n$ given only a single unbiased estimate $y_i$ for each $x_i$. We design strategies that are simultaneously admissible (not uniformly dominated by any other strategy) and also never worse than a given baseline such as uniform random selection. We provide an application to stochastic optimization, where we obtain online-to-batch conversion bounds with a desirable "no-compromise" guarantee: they are never worse than standard random iterate selection, and yet can be significantly better in benign settings.

04.
arXiv (CS.CV) 2026-06-15

ZipSplat: Fewer Gaussians, Better Splats

Feed-forward 3D Gaussian Splatting methods reconstruct a scene from posed or pose-free images in a single forward pass, yet current approaches predict one Gaussian per input pixel, tying the representation budget to camera resolution rather than scene complexity. A flat wall and a richly textured object thus produce equally many Gaussians despite very different geometric needs. We propose ZipSplat, a token-based feed-forward model that decouples Gaussian placement from the pixel grid. A multi-view backbone extracts dense visual tokens, and k-means clustering compresses them into a compact set of scene tokens. Cross- and self-attention refine these tokens, and a lightweight MLP decodes each into a group of Gaussians with unconstrained 3D positions. Because clustering is applied at inference, a single trained model spans the quality-efficiency curve without retraining. ZipSplat operates without ground-truth poses or intrinsics, yet sets a new state of the art on DL3DV and RealEstate10K with ${\sim}6{\times}$ fewer Gaussians than pixel-aligned methods, surpassing the best pose-free baseline by 2.1dB and 1.2dB PSNR, respectively. It further generalizes zero-shot to Mip-NeRF360 and ScanNet++, outperforming all comparable baselines. Our project page is at https://veichta.com/zipsplat.

05.
arXiv (CS.CV) 2026-06-16

Physics-Driven Zero-Shot MRI Reconstruction with Non-local Image Priors

Zero-Shot Self-Supervised Learning (ZS-SSL) has emerged as a promising paradigm for accelerated Magnetic Resonance Imaging (MRI) reconstruction, eliminating the reliance on fully-sampled external datasets. However, learning solely from a single under-sampled scan suffers from supervision scarcity and optimization instability, often leading to overfitting or artifacts. To address these challenges, we propose a robust physics-driven ZS-SSL framework that synergizes physical consistency with image-domain non-local priors. Our method introduces three core innovations: (1) a Coil Sensitivity Map (CSM)-Guided Dynamic Repository, which stabilizes the training trajectory by filtering physically inconsistent artifacts based on coil sensitivity constraints; (2) a SPIRiT-based regularization, which enforces k-space self-consistency via a learned correlation kernel and stochastic masking; (3) a Non-Local Self-Similarity (NSS) Pixel Bank, which leverages the high-fidelity reference established by the former modules to explicitly mine non-local anatomical similarities, thereby augmenting supervision in the image domain. Extensive experiments on the FastMRI dataset demonstrate that our approach achieves state-of-the-art performance, particularly under high acceleration factors, effectively bridging the gap between zero-shot learning and supervised methods. The code is available at https://github.com/Zolento/NS-SSL.

06.
arXiv (CS.LG) 2026-06-16

SDVDiag: Multimodal Causal Discovery for Online Diagnosis in Software-defined Vehicles

arXiv:2606.15559v1 Announce Type: cross Abstract: The transition toward software-defined vehicles concentrates an increasing share of vehicle functionality into distributed software services, where failures propagate through service dependencies and the surface symptom is often several causal hops away from the underlying defect. Existing approaches to causal root-cause analysis in such systems address this only partially: they typically reason over a single observability modality and operate in an offline, operator-driven mode that does not match the demands of continuous vehicle operation. This paper presents SDVDiag, a multimodal causal-discovery pipeline that fuses log-based and metric-based service representations into a shared embedding space before graph construction, coupled with an anomaly-driven trigger that converts the diagnostic platform from a manually operated batch tool into a continuously running online system. Evaluation on an Autonomous Valet Parking testbed shows that the multimodal pipeline produces sparser causal graphs than a metrics-only baseline (134 vs. 182 edges on average) and consistently outperforms it in edge-weighted reward against an expert knowledge graph at every stage of human-feedback refinement, showing a 2.4-fold improvement over the baseline after 60 feedback queries. An end-to-end fault-injection scenario further demonstrates that the integrated trigger correctly recovers a true root cause located two causal hops upstream of the observable symptom.

07.
arXiv (math.PR) 2026-06-17

LP-Based Algorithms for Scheduling in a Quantum Switch

Authors:

arXiv:2603.27812v2 Announce Type: replace-cross Abstract: We consider scheduling in a quantum switch with stochastic entanglement generation, finite quantum memories, and decoherence. The objective is to design a scheduling algorithm with polynomial-time computational complexity that stabilizes a nontrivial fraction of the capacity region. Scheduling in such a switch corresponds to finding a matching in a graph subject to additional constraints. We propose an LP-based policy, which finds a point in the matching polytope, which is further implemented using a randomized decomposition into matchings. The main challenge is that service over an edge is feasible only when entanglement is simultaneously available at both endpoint memories, so the effective service rates depend on the steady-state availability induced by the scheduling rule. To address this, we introduce a single-node reference Markov chain and derive lower bounds on achievable service rates in terms of the steady-state nonemptiness probabilities. We then use a Lyapunov drift argument to show that, whenever the request arrival rates lie within the resulting throughput region, the proposed algorithm stabilizes the request queues. We further analyze how the achievable throughput depends on entanglement generation rates, decoherence probabilities, and buffer sizes, and show that the throughput lower bound converges exponentially fast to its infinite-buffer limit as the memory size increases. Numerical results illustrate that the guaranteed throughput fraction is substantial for parameter regimes relevant to near-term quantum networking systems.

08.
arXiv (CS.CV) 2026-06-18

Spiking Pyramid Wavelet Transformation for High-efficient and Low-energy Image Restoration

Spiking neural networks (SNNs) have garnered significant interest in computer vision due to their potential for efficiency and biological inspiration. While spiking CNN-based methods have shown promise for image restoration (IR) tasks, their performance is constrained by the inherent receptive field limitations of CNN operations. In the paper, we explore the benefits of discrete wavelet transformation and propose a spiking pyramid wavelet-based model (SPWM) for high-efficient and low-energy target. Specifically, we develop a spiking dual pyramid wavelet (SDPW) block to model long-range dependency and exploit the properties of the degradation in the wavelet domain. Experimental results on several benchmarks demonstrate that SPWM significantly lowers computational costs and energy consumption while maintaining image quality. Our method showcases the potential of SNNs in the field of IR, offering new insights for future applications of resource-limited devices.

10.
medRxiv (Medicine) 2026-06-16

Higher Population Coverage with Typhoid Conjugate Vaccine is Needed to Induce Herd Protection: Evidence from a Cluster-Randomized Trial in Urban Bangladesh

Introduction: A cluster randomized trial (CRT) in Bangladesh found that Vi-tetanus toxoid (Vi-TT) vaccine conferred 85% protection to vaccinees at 18 months of follow-up; however, it failed to confer significant herd protection to non-vaccinees. Methods: In the CRT, children aged 9 months to

11.
arXiv (CS.AI) 2026-06-17

SP-GCRL: Influence Maximization on Incomplete Social Graphs

arXiv:2605.12513v2 Announce Type: replace-cross Abstract: Influence maximization (IM) in real platforms is challenged by incomplete, noisy social graphs and non-stationary diffusion dynamics. We propose SP-GCRL, a social-propagation-aware graph contrastive reinforcement learning framework that learns end-to-end seed selection under partial observability.We first introduce a social-propagation-aware nonlinear diffusion function to model reinforcement/diminishing effects and probability drift under repeated exposure; we then construct dual structural views and perform contrastive learning to obtain node representations robust to missing edges and weak ties, while replacing expensive strategy metrics with a GAT-based regression surrogate to improve efficiency and scalability; finally, we use DDQN to learn an end-to-end seed selection policy on top of these representations. Experiments on multiple real-world networks show that SP-GCRL achieves significant gains over heuristic and learning-based baselines across budgets and topologies, while maintaining strong large-scale scalability.

12.
arXiv (CS.AI) 2026-06-12

Different Layers, Different Manifolds: Module-Wise Weight-Space Geometry in Transformer Optimization

arXiv:2606.13276v1 Announce Type: cross Abstract: Weight-space geometry plays a central role in neural network optimization, yet manifold constraints are often applied uniformly across all weight matrices. In this work, we ask whether different transformer modules prefer different manifold geometries. We study Manifold Muon for GPT-2 pretraining and compare layer-wise assignments of Stiefel and DGram constraints across attention and MLP blocks. Our results show a clear asymmetry: constraining attention layers with Stiefel geometry while assigning DGram geometry to MLP layers gives the best performance among the tested configurations, whereas the inverted assignment and all-DGram configuration become unstable under the shared hyperparameter setting. We trace this failure to singular value growth in DGram-constrained attention weights, which can amplify attention logits and induce softmax saturation. These findings suggest that symmetry-aware and geometry-aware optimization for transformers should be module-specific rather than uniform.

13.
arXiv (CS.CL) 2026-06-15

Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation

The remarkable achievements of Large Language Models (LLMs) have led to the emergence of a novel recommendation paradigm – Recommendation via LLM (RecLLM). Nevertheless, it is important to note that LLMs may contain social prejudices, and therefore, the fairness of recommendations made by RecLLM requires further investigation. To avoid the potential risks of RecLLM, it is imperative to evaluate the fairness of RecLLM with respect to various sensitive attributes on the user side. Due to the differences between the RecLLM paradigm and the traditional recommendation paradigm, it is problematic to directly use the fairness benchmark of traditional recommendation. To address the dilemma, we propose a novel benchmark called Fairness of Recommendation via LLM (FaiRLLM). This benchmark comprises carefully crafted metrics and a dataset that accounts for eight sensitive attributes1 in two recommendation scenarios: music and movies. By utilizing our FaiRLLM benchmark, we conducted an evaluation of ChatGPT and discovered that it still exhibits unfairness to some sensitive attributes when generating recommendations. Our code and dataset can be found at https://github.com/jizhi-zhang/FaiRLLM.

14.
arXiv (CS.AI) 2026-06-15

Korzhinskii-Net: Physics-Informed Neural Network for Sub-Surface Mineral Prospectivity Modelling

Authors:

arXiv:2606.13695v1 Announce Type: cross Abstract: Mineral prospectivity modelling (MPM) underpins exploration economics, yet most operational pipelines reduce to data-driven classifiers trained on shallow surface proxies. Such models are blind to the subsurface physics that actually localises ore: heat advection, fluid flow, and lithology-dependent precipitation. We present Korzhinskii-Net, a 2-D radial physics-informed neural network (PINN) that couples Darcy flow, advective-diffusive heat transport, and a softplus-saturated reaction rate into a single differentiable forward model, weakly supervised by surface and remote-sensing proxies. The network is named after Dmitri S. Korzhinskii (1899-1985), whose theory of infiltration metasomatism provides the physical scaffold. We evaluate Korzhinskii-Net on five ore provinces spanning four commodity classes – Norilsk (Ni-Cu-PGE), Pechenga (Ni-Cu sulphide), Udokan (sandstone-hosted Cu), Sukhoi Log (orogenic Au), and Mirny (kimberlitic diamond) – under a fair, leakage-controlled 5-fold cross-validation protocol with hard ring-shaped negatives. Korzhinskii-Net attains a mean PR-AUC of 0.885 versus 0.281 for the strongest classical baseline (gradient boosting), and a mean fractional rank of 0.019 versus 0.413. The improvement is consistent across all five provinces and four commodity systems, suggesting that physics-informed differentiable simulators, even when constrained only by global open-data proxies, can recover localisation patterns that pure feature-based learners systematically miss. We release the full pipeline and evaluation harness as open source.

16.
arXiv (CS.AI) 2026-06-19

Triangular Consistency as a Universal Constraint for Learning Optical Flow

arXiv:2606.19938v1 Announce Type: cross Abstract: We propose triangular consistency as a first-principled constraint for optical flow, which is agnostic to network architecture, supervision type, and dataset, and applies to both image-pair and multi-frame settings. This simple but powerful constraint is to compose two flows to induce a third flow and enforce consistency among the three. The composed flows may arise from (i) image pairs, yielding cycle consistency; (ii) multiple video frames, producing longer-range motion through temporal chaining; or (iii) image pairs combined with controlled synthetic transformations, which becomes data augmentation. This triangular consistency introduces negligible computational overhead and requires no additional annotations. Since it is derived directly from the geometry of optical flow, it does not rely on model-specific assumptions and serves as a ``universal'' plug-and-play component for optical flow training. Experiments show consistent improvement across supervised, unsupervised, and transfer learning settings.

17.
arXiv (CS.CV) 2026-06-18

CAOA – Completion-Assisted Object-CAD Alignment

Accurately aligning CAD models to their corresponding objects in indoor RGB-D scans is a central challenge in 3D semantic reconstruction. The task requires estimating a 9-Degree-of-Freedom (DoF) pose-position, rotation, and scale along three axes-but is hindered by noisy and incomplete scans, as well as segmentation errors that cause geometric distortions. We present Completion-Assisted Object-CAD Alignment (CAOA), a method that integrates a semantically and contextually aware point cloud completion module with a symmetry-aware relative pose estimation algorithm, enabling precise alignment of CAD models to scanned objects. Existing completion methods are typically trained and evaluated on synthetic datasets, which often fail to generalize to real-world scans. To bridge this gap, we introduce a synthetic data generation strategy tailored to indoor scenes, significantly reducing the synthetic-to-real domain gap-validated through quantitative comparisons with widely used completion datasets. In addition, we release S2C-Completion, an expert-annotated dataset of over 8,500 object-CAD pairs from Scan2CAD, created for real-world indoor single-object completion and intended as a new benchmark for this task. For object-CAD alignment, we incorporate symmetry information via a symmetry-aware loss, improving robustness to symmetric ambiguities. On the Scan2CAD benchmark, CAOA achieves a 17% accuracy improvement over state-of-the-art methods.

18.
bioRxiv (Bioinfo) 2026-06-16

FlowBench: separating planning, fault recovery and interpretation in agentic bioinformatics

Agentic large language model (LLM) systems are being deployed in bioinformatics faster than they are understood, and single-metric evaluations conflate capabilities that fail independently. We introduce FlowBench, a benchmark that decomposes agentic bioinformatics performance into planning, fault recovery, biological interpretation, and end-to-end output-fidelity. Existing systems achieve high plan completeness, but their closed, single-provider designs prevent attribution of performance to scaffolding versus the underlying model. We therefore built FlowAgent, a modular, provider-agnostic framework whose components can be selectively disabled and whose backbone model can be swapped across providers on a shared harness, and used it to evaluate 23 models from three main providers. Three findings emerge. First, generating a valid workflow plan from a named toolchain is largely solved, whereas inferring an appropriate toolchain from biological intent alone is uniformly difficult regardless of model tier, compressing all models into a narrow 44-57% pass-rate band. Second, ablation shows that the dependency-structured plan and a completeness-reflection step drive performance, while adding a same-context validator-driven retry makes structural quality worse. Third, fault recovery and data-grounded interpretation remain unsolved. Models frequently propose fixes that force a clean exit while leaving the underlying data invalid, and data-grounded interpretation lags internal-knowledge recall by a consistent margin. Safety does not emerge from capability, and reasoning-tier models were among the least reliable at recognising unrecoverable faults. Once planning saturates, agent architecture and refusal calibration, not model scale, are the productive frontier.

19.
arXiv (CS.AI) 2026-06-16

AQ4SViT: An Automated Quantization Framework with Search Gating Policy for Compressing Spiking Vision Transformers

arXiv:2606.15523v1 Announce Type: cross Abstract: Spiking Vision Transformers (SViTs) have emerged as alternative low-power ViT models, but their large sizes hinder their deployments on resource-constrained embedded AI systems. To address this, state-of-the-art works proposed quantization techniques to compress SViT models, but their manual, human-guided approach needs a huge design time and power/energy consumption to find the appropriate quantization setting for each given network, making this approach not scalable for quantizing multiple networks. Toward this, we propose AQ4SViT, a novel automated quantization framework for SViTs that can provide quick quantization settings with good trade-offs between accuracy and memory. To achieve this, AQ4SViT employs the following key ideas: quantization search strategy that evaluates the quantization setting candidates while considering the accuracy constraint; and search gating policy that quickly evaluates and selects promising quantization candidates by leveraging membrane potential drift as a performance proxy. In the search gating policy, AQSViT employs two search algorithm variants to provide trade-off options: Greedy search, which performs fast but may lead to local optima; and Beam search, which performs slower but has better performance in finding global optima selection due to a wider search space. Experimental results show that AQ4SViT-Greedy quickly finds the appropriate quantization settings, achieving up to 6.6x faster search time and up to 82.5% memory saving compared to the state-of-the-art; while AQ4SViT-Beam further reduces the memory footprint by up to 90% compared to the state-of-the-art, but with 4.5x longer search time; all these results are obtained while maintaining high accuracy within 1.5% from the original/non-quantized models on the ImageNet dataset. These results highlight that AQ4SViT framework offers advancements toward SViT deployments on embedded AI systems.

20.
arXiv (CS.CV) 2026-06-12

BSViT: A Burst Spiking Vision Transformer for Expressive and Efficient Visual Representation Learning

Spiking Vision Transformers (S-ViTs) offer a promising framework for energy-efficient visual learning. However, existing designs remain limited by two fundamental issues: the restricted information capacity of binary spike coding and the dense token interactions introduced by global self-attention. To address these challenges, this work proposes BSViT, a burst spiking-driven Vision Transformer featuring a Dual-Channel Burst Spiking Self-Attention (DBSSA) mechanism. DBSSA encodes queries with binary spikes and keys with burst spikes to enhance representational capacity. The value pathway adopts dual excitatory and inhibitory binary channels, enabling signed modulation and richer spike interactions. Importantly, the entire attention operation preserves addition-only computation, ensuring compatibility with energy-efficient neuromorphic hardware. To further reduce spike activity and incorporate spatial priors, a patch adjacency masking strategy is introduced to restrict attention to local neighborhoods, resulting in structure-aware sparsity and reduced computational overhead. In addition, burst spike coding is systematically integrated across the network to increase spike-level representational capacity beyond conventional binary spiking. Extensive experiments on both static and event-based vision benchmarks demonstrate that BSViT consistently outperforms existing spiking Transformers in accuracy while maintaining competitive energy efficiency.

21.
arXiv (CS.AI) 2026-06-18

DecNefSimulator: A Modular, Interpretable Framework for Decoded Neurofeedback Simulation Using Generative Models

arXiv:2511.14555v4 Announce Type: replace-cross Abstract: Decoded Neurofeedback (DecNef) is a promising non-invasive approach to brain modulation with wide-ranging applications in neuromedicine and cognitive neuroscience. However, progress in DecNef research remains constrained by subject-dependent learning variability, reliance on indirect measures to quantify progress, and the high cost and time demands of experimentation. We present DecNefSimulator, a modular and interpretable simulation framework that formalizes DecNef as a machine learning problem. Beyond providing a virtual laboratory, DecNefSimulator enables researchers to model, analyze and understand neurofeedback dynamics. Using latent variable generative models as simulated participants, DecNefSimulator allows direct observation of internal cognitive states and systematic evaluation of how different protocol designs and subject characteristics influence learning. We demonstrate how this approach can (i) reproduce empirical phenomena of DecNef learning, (ii) identify conditions under which DecNef feedback fails to induce learning, and (iii) guide the design of more robust and reliable DecNef protocols in silico before human implementation. In summary, DecNefSimulator bridges computational modeling and cognitive neuroscience, offering a principled foundation for methodological innovation, robust protocol design, and ultimately, a deeper understanding of DecNef-based brain modulation.

22.
arXiv (CS.CL) 2026-06-12

FENCE: A Financial and Multimodal Jailbreak Detection Dataset

Jailbreaking poses a significant risk to the deployment of Large Language Models (LLMs) and Vision Language Models (VLMs). VLMs are particularly vulnerable because they process both text and images, creating broader attack surfaces. However, available resources for jailbreak detection are scarce, particularly in finance. To address this gap, we present FENCE, a bilingual (Korean-English) multimodal dataset for training and evaluating jailbreak detectors in financial applications. FENCE emphasizes domain realism through finance-relevant queries paired with image-grounded threats. Experiments with commercial and open-source VLMs reveal consistent vulnerabilities, with GPT-4o showing measurable attack success rates and open-source models displaying greater exposure. A baseline detector trained on FENCE achieves 99 percent in-distribution accuracy and maintains strong performance on external benchmarks, underscoring the dataset's robustness for training reliable detection models. FENCE provides a focused resource for advancing multimodal jailbreak detection in finance and for supporting safer, more reliable AI systems in sensitive domains. Warning: This paper includes example data that may be offensive.

23.
arXiv (CS.CL) 2026-06-16

Data Augmentations for Data-Constrained Language Model Pretraining

As AI labs approach a data ceiling where compute capacity outpaces the rate of new high-quality text generation, language model pretraining is shifting toward a data-constrained, compute-abundant regime that demands productive multi-epoch training on fixed corpora. Standard autoregressive (AR) pretraining overfits severely in this setting, reaching its optimum early and then continuously deteriorating. We investigate data augmentation as a regularizer to mitigate this overfitting and enable productive training for hundreds of epochs on the same data. We introduce three orthogonal categories of augmentation for AR pretraining: token-level noise (masking, random replacement), sequence permutations (right-to-left prediction, Fill-in-the-Middle), and target offset prediction ($x_{t+i}$ for $i > 1$). Through systematic ablations, we find that individual augmentations delay overfitting and lower validation loss relative to the baseline, with random token replacement achieving the best minimum loss among individual methods. Combining augmentation categories further lowers the minimum validation loss. Our experiments demonstrate that data augmentations mitigate AR pretraining's data inefficiency and offer a promising solution to the data-constrained regime. All code and data are available at https://github.com/michaelchen-lab/data-augmentations-for-pretraining

24.
arXiv (CS.CL) 2026-06-19

A BART-based approach with hierarchical strategy for Vietnamese abstractive multi-document summarization

In this technical report, we focus on solving the challenge of Vietnamese multi-document abstractive summarization, introduced in the International Workshop on Vietnamese Language and Speech Processing (VLSP) 2022. We choose to follow the popular hierarchical approach, i.e. condensing each document followed by aggregation and summarization. We propose a novel yet simple strategy to shorten documents that is driven by the golden summary, thus ensuring high correlation between stages of the hierarchical approach. Our method achieves a ROUGE2-F1 score of 0.2468 on the VLSP's public test set, and can produce fluent and concise summaries. Additionally, we utilize external sources for extra data, which greatly enhances the quantity of data for Vietnamese multi-document summarization. The additional data is made available for the community.

25.
arXiv (CS.LG) 2026-06-16

A spectral audit framework reveals task-dependent aperiodic reliance across EEG and ECG deep learning

arXiv:2606.08583v2 Announce Type: replace Abstract: Deep learning on physiological time series is interpreted through domain-specific features – oscillatory rhythms in EEG, morphological complexes in ECG – yet these signals sit atop a broadband aperiodic 1/f-like envelope that covaries with arousal, age, and pathology. We introduce a spectral audit framework combining aperiodic/periodic decomposition, phase-preserving Fourier interventions, sham controls, and simulation validation. Aperiodic reliance was task-dependent and architecture-general: across six neural architectures, flattening drops exceeded 0.42 balanced-accuracy points for sleep-wake classification, reached 0.07-0.13 for clinical abnormality detection, and remained minimal for motor imagery. Six of seven EEG foundation models showed FDR-significant aperiodic reliance on clinical EEG; age/sex and recording-era controls reduced but did not eliminate the effect. Applying the audit to PTB-XL ECG revealed neural drops of 0.32–0.36 persisting after demographic matching, confirming this confound class extends beyond EEG. Aperiodic controls should become standard for interpretable physiological time-series deep learning.