Security Attack using Machine Learning K-means Clustering

In this article, I will show you, step by step, how to help prevent security attacks by using K-means clustering, a machine learning technique, to identify the IPs behind them.

What is Clustering?

Clustering is one of the most common data analysis techniques. It reveals the structure of a dataset by identifying similar data points and grouping them together.

What is K-Means Clustering?

K-means clustering is a simple unsupervised learning algorithm that is used to solve clustering problems. Its main purpose is to partition a set of observations into K clusters, with each observation belonging to the cluster whose centroid (mean) is nearest.

The way the K-means algorithm works is as follows:

  1. Specify the number of clusters K.
  2. Initialize the centroids by randomly selecting K data points.
  3. Assign each data point to its closest centroid.
  4. Recalculate the position of each of the K centroids as the mean of the data points assigned to it.
  5. Repeat steps 3 and 4 until the centroids no longer move.
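The five steps above can be sketched in plain Python. This is a minimal illustration rather than a production implementation: the function name `kmeans`, the `max_iter` and `seed` parameters, and the sample points are all my own assumptions, not part of the original project.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 2: pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid
        # (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Step 4: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[c])
        # Step 5: stop once no centroid has moved.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Made-up 2-D points forming two loose groups (step 1: K = 2).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Because the starting centroids are chosen at random, K-means can settle on different groupings for different seeds, which is why libraries usually run it several times and keep the best result.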

What is a Security (DDoS) Attack?

A distributed denial-of-service (DDoS) attack is a cyber-attack in which the attacker seeks to make a machine or network resource unavailable to its intended users by temporarily or indefinitely interrupting the services of an Internet-connected host. In the distributed form, the attack traffic comes from many different sources at once.

This type of attack takes advantage of the specific capacity limits that apply to any network resource, such as the infrastructure that enables a company’s website. The DDoS attack sends a large volume of requests to the targeted web resource so that it exceeds those limits and can no longer serve legitimate users.

Steps to Follow:

Create a system that helps a server in the following ways:

  1. The system keeps a log of every client hit or request to the server. For example, a web server's log files can be found under /var/log/httpd/.
  2. The log data is then used to find unusual patterns in client requests.

Next, we create a program to cluster the data.

The first step reads the log file and converts it into CSV format using the pandas library.
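Here is a rough sketch of how that conversion could look. I am assuming the log uses the Apache common log format. The sample lines, the regular expression, the helper name `log_to_dataframe`, and the column names other than IP and Status_code are made up for illustration. A real run would read lines from a file under /var/log/httpd/ instead of the hard-coded list.

```python
import io
import re
import pandas as pd

# Made-up sample lines in Apache common log format (an assumption about the format).
SAMPLE_LINES = [
    '203.0.113.5 - - [10/Mar/2021:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.6 - - [10/Mar/2021:13:55:40 +0000] "GET /about.html HTTP/1.1" 404 196',
    '198.51.100.9 - - [10/Mar/2021:13:55:41 +0000] "GET / HTTP/1.1" 200 512',
    '198.51.100.9 - - [10/Mar/2021:13:55:41 +0000] "GET / HTTP/1.1" 200 512',
]

# One named group per field: client IP, timestamp, request line, status, size.
LOG_PATTERN = re.compile(
    r'(?P<IP>\S+) \S+ \S+ \[(?P<Time>[^\]]+)\] '
    r'"(?P<Request>[^"]*)" (?P<Status_code>\d{3}) (?P<Size>\S+)'
)

def log_to_dataframe(lines):
    """Parse log lines into a DataFrame, skipping lines that do not match."""
    matches = (LOG_PATTERN.match(line) for line in lines)
    rows = [m.groupdict() for m in matches if m]
    return pd.DataFrame(rows, columns=["IP", "Time", "Request", "Status_code", "Size"])

df = log_to_dataframe(SAMPLE_LINES)

# Write the CSV to an in-memory buffer; pass a file path to to_csv()
# instead to save it on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Every parsed field is still a string at this point; the status code becomes numeric once the CSV is read back with pandas.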


Next, we read the converted log into a pandas DataFrame, keeping the IP and Status_code columns and dropping the columns we do not need.
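A small sketch of that step might look like this. The CSV text below is invented sample data, and the extra column names (Time, Request, Size) are my assumption about what the converted log contains.

```python
import io
import pandas as pd

# Made-up CSV content standing in for the file produced from the server log.
CSV_TEXT = """IP,Time,Request,Status_code,Size
203.0.113.5,10/Mar/2021:13:55:36 +0000,GET /index.html HTTP/1.1,200,2326
203.0.113.6,10/Mar/2021:13:55:40 +0000,GET /about.html HTTP/1.1,404,196
198.51.100.9,10/Mar/2021:13:55:41 +0000,GET / HTTP/1.1,200,512
"""

# read_csv accepts a file path as well as a file-like object.
dataset = pd.read_csv(io.StringIO(CSV_TEXT))

# Drop the columns that are not used for clustering, leaving IP and Status_code.
dataset = dataset.drop(columns=["Time", "Request", "Size"])
print(dataset)
```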

We then scale the data to a fixed range so that no single feature dominates the distance calculation.
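One way to do the scaling is scikit-learn's `MinMaxScaler`, which maps each column to the range 0 to 1 by default. I am not certain which scaler the full code uses, so treat this as one possible choice. The `Requests` column (a per-IP request count) and the sample values are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical numeric features: requests per IP and the status code returned.
features = pd.DataFrame({
    "Requests": [3, 5, 250, 4],
    "Status_code": [200, 404, 200, 200],
})

scaler = MinMaxScaler()  # default feature_range is (0, 1)
scaled = scaler.fit_transform(features)  # NumPy array, same shape as features
print(scaled)
```

Scaling matters for K-means because it relies on distances: without it, a column with large values (such as a request count in the hundreds) would outweigh a column with small ones.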

Here is the main part: we create clusters of the different IPs and their status codes using the K-Means algorithm, which helps us identify the attacking IPs so that they can be blocked.
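The clustering step could be sketched with scikit-learn's `KMeans` as follows. The per-IP summary table, the `Requests` column, the choice of two clusters, and the rule of treating the busier cluster as suspicious are all assumptions of mine for illustration; the right number of clusters and the blocking rule depend on your own traffic.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Made-up per-IP summary: request count and the status code returned.
summary = pd.DataFrame({
    "IP": ["203.0.113.5", "203.0.113.6", "203.0.113.7",
           "198.51.100.9", "198.51.100.10"],
    "Status_code": [200, 200, 200, 404, 404],
    "Requests": [12, 9, 15, 980, 1040],
})

# Scale both features to 0..1 so neither dominates the distance.
features = MinMaxScaler().fit_transform(summary[["Requests", "Status_code"]])

# Two clusters: ordinary clients versus unusually heavy clients.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
summary["Cluster"] = model.fit_predict(features)

# Heuristic: the cluster with the higher mean request count is the suspicious one.
suspect_cluster = summary.groupby("Cluster")["Requests"].mean().idxmax()
suspect_ips = summary.loc[summary["Cluster"] == suspect_cluster, "IP"].tolist()
print(suspect_ips)
```

The list in `suspect_ips` is only a set of candidates. It is worth reviewing them before adding any firewall rule, since a busy but legitimate client can land in the same cluster.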

We create a scatter plot to visualize the IPs and status codes as clusters.
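A scatter plot along these lines can be drawn with matplotlib. The data frame below is invented, with cluster labels typed in by hand to keep the sketch short. Encoding each IP as an integer position on the x-axis is my own choice, since an IP address is text and cannot be plotted directly on a numeric axis.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up clustered data; in practice the Cluster column comes from K-means.
clustered = pd.DataFrame({
    "IP": ["203.0.113.5", "203.0.113.6", "203.0.113.7",
           "198.51.100.9", "198.51.100.10"],
    "Status_code": [200, 200, 200, 404, 404],
    "Cluster": [0, 0, 0, 1, 1],
})

# Encode each IP as an integer position so it can go on a numeric axis.
clustered["IP_index"] = pd.factorize(clustered["IP"])[0]

fig, ax = plt.subplots(figsize=(8, 4))
points = ax.scatter(clustered["IP_index"], clustered["Status_code"],
                    c=clustered["Cluster"], cmap="coolwarm")
ax.set_xticks(clustered["IP_index"])
ax.set_xticklabels(clustered["IP"], rotation=45, ha="right")
ax.set_xlabel("IP")
ax.set_ylabel("Status code")
ax.set_title("IPs and status codes by cluster")
fig.tight_layout()
# Call fig.savefig("clusters.png") or plt.show() to view the result.
```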

We create a bar graph to visualize which IP sends the most requests.
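Counting requests per IP and plotting them as bars could look like this. The request list is made up, and `value_counts()` is simply one convenient way to get the counts.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up request log: one row per request, identified by client IP.
requests = pd.DataFrame({
    "IP": ["198.51.100.9"] * 6 + ["203.0.113.5"] * 2 + ["203.0.113.6"],
})

# Requests per IP, sorted from most to least frequent.
counts = requests["IP"].value_counts()
top_ip = counts.idxmax()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(counts.index, counts.values)
ax.set_xlabel("IP")
ax.set_ylabel("Number of requests")
ax.set_title("Requests per IP")
fig.tight_layout()
# Call fig.savefig("requests_per_ip.png") or plt.show() to view the result.
print(top_ip, counts.iloc[0])
```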

Github Repo for Full Code —

So far, we have discussed one use case of K-means clustering.

Other use cases of K-means clustering include:

~ Call Record detail analysis

~ Document classification

~ Customer segmentation

~ Delivery Store Optimization

Thank you for reading this Article……😊😍


