Abstract

A K-means algorithm based on characteristics of density applied to network intrusion detection

Jing Xu, Dezhi Han, Kuan-Ching Li, Hai Jiang

K-means algorithms are a group of popular unsupervised algorithms widely used for cluster analysis. However, the results of traditional K-means clustering algorithms are greatly affected by the initial clustering center, with unstable accuracy and low speed, which makes the algorithm hard to meet the requirements for Big Data. In this paper, a modernized version of the K-means algorithm based on density to select the initial seed of clustering is proposed. Firstly, Kd-tree is used to divide the hyper-rectangle space, so those points close to each other are grouped into the same sub-tree during data pre-processing, and the generalized information is stored in the tree structure. Besides, an improved Kd-tree nearest neighbor search is used in the K-means algorithm to prune the search space and optimize the operation for speedup. The clustering results show that the clusters are stable and accurate when the numbers of clusters and iterations are constant. Experimental results in the network intrusion detection case show that the improved version of the K-means algorithms performs better in terms of detection rate and false rate.