Implementation of Data Mining using the Clustering Method (Case: Region of the Actors of Theft Crime by Province)

Theft is a behavior that harms the targeted victims and can cause casualties. This study aims to group areas of theft crime by province using data mining techniques. The data were obtained from the Indonesian statistical center (Badan Pusat Statistik) and cover 34 provinces. The grouping technique used is K-Means. The data are divided into 3 clusters: C1, areas with high theft crime rates; C2, areas with ordinary theft crime rates; and C3, areas with low theft crime rates. Data processing is done with the help of RapidMiner software. The k-means analysis places 17 provinces in Indonesia in the cluster with the highest theft crime rate (C1), namely: Aceh, North Sumatra, West Sumatra, Riau, Jambi, South Sumatra, Lampung, DKI Jakarta, West Java, Central Java, East Java, Banten, West Nusa Tenggara, East Nusa Tenggara, South Kalimantan, South Sulawesi and Papua. The study concludes that half of the provinces in Indonesia (17 of 34) still have high rates of theft crime.


Introduction
Theft is an unlawful behavior that can cause casualties. Theft often occurs in city centers or shopping centers such as markets, because crowded places can trigger perpetrators to commit theft. Perpetrators operate individually or in groups. This is triggered by the desire to fulfill increasingly high economic needs. Theft occurs frequently in every area, and the theft crime rate increases every year. Other factors that cause theft are rising unemployment, economic crisis, and poor environmental and social conditions.
Based on these problems, the researchers want to analyze the areas with the highest theft crime rates by province using data mining techniques. Data mining offers several techniques for solving such problems. Data mining is a method for processing data in order to find hidden patterns in the processed data. Processing old data with data mining methods produces new knowledge, and the results can be used as information for future decisions [1]-[3]. Data mining techniques include (1) Classification, (2) Clustering, (3) Estimation and (4) Association [4]-[7].
For this case the researchers used the k-means clustering technique to group data on theft crime cases by province in Indonesia. The advantages of k-means are that it uses a simple principle, can be explained in non-statistical terms, runs relatively fast, and is very flexible and easily adaptable. This has also been shown by several previous researchers who solved problems using the K-Means method. One of them is [6], titled Implementation of Data Mining on Rice Imports by the Major Country of Origin Using Algorithm Using K-Means Clustering Method. That study states that k-means can be applied to the grouping of rice imports. The result is an assessment based on a rice import index with three clusters: a high-import cluster of 2 countries, namely Vietnam and Thailand; a medium-import cluster of 4, namely China, India, Pakistan and other; and a low-import cluster of 4 countries, namely USA, Taiwan, Singapore and Myanmar. The results of that research can serve as input on rice imports from the main countries of origin. Based on this, research using the k-means method to group the regions of theft crime by province can answer the problem formulation, which is to analyze and test the k-means method on theft crime cases by province in Indonesia.

K-Means Method
K-Means is a data analysis (data mining) method that performs modeling without supervision (unsupervised) and groups data by partitioning. The purpose of the k-means method is to minimize an objective function set in the clustering process, by minimizing the variation among the data within a cluster and maximizing the variation between the data in different clusters [1], [8].
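The objective function is not written out explicitly here. A common choice, assumed for illustration only, is the within-cluster sum of squared Euclidean distances, where $K$ is the number of clusters, $C_k$ is the set of data in cluster $k$ and $\mu_k$ is its centroid (all symbols are notation introduced for this sketch):

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Under this assumption, minimizing $J$ keeps the data in each cluster close to their centroid. Because the total variation of a fixed data set is constant, lowering the within-cluster variation raises the between-cluster variation by the same amount.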

Steps of the K-Means Method
The method is generally carried out with the following basic algorithm:
a) Determine the number of clusters.
b) Allocate the data to clusters randomly.
c) Calculate the centroid (average) of the data in each cluster.
d) Allocate each data item to the nearest centroid (average).
e) Return to step c if any data still move between clusters, if the change in a centroid value is above the specified threshold, or if the change in the objective function is above the specified threshold [9], [10].
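The steps above can be sketched in Python as follows. This is a minimal illustration, not the procedure run in RapidMiner: the function names (`kmeans`, `mean_point`, `squared_distance`), the squared Euclidean distance, the `max_iter` cap and the rule that an empty cluster keeps its previous centroid are all assumptions added for the sketch. It uses only the first stopping condition of the last step (no data item changes cluster).

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Component-wise average of a non-empty list of points."""
    n = len(points)
    return tuple(sum(column) / n for column in zip(*points))


def kmeans(data, k, max_iter=100, seed=0):
    """Hypothetical helper following steps a) to e) of the basic algorithm."""
    # Step a: the number of clusters k is chosen by the caller.
    if not 0 < k <= len(data):
        raise ValueError("k must be between 1 and the number of data items")

    # Step b: random allocation; cycling through 0..k-1 before shuffling
    # guarantees that every cluster starts with at least one member.
    rng = random.Random(seed)
    labels = [i % k for i in range(len(data))]
    rng.shuffle(labels)

    centroids = [None] * k
    for _ in range(max_iter):
        # Step c: centroid (average) of the data in each cluster.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = mean_point(members)

        # Step d: allocate each data item to the nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in data
        ]

        # Step e: stop when no data item changes cluster, else repeat from c.
        if new_labels == labels:
            break
        labels = new_labels

    return labels, centroids


# Made-up one-dimensional points, used only to exercise the function.
data = [(1.0,), (2.0,), (100.0,), (103.0,)]
labels, centroids = kmeans(data, k=2)
print(labels, centroids)
```

Because the starting allocation is random, different seeds can number the clusters differently, and on less well-separated data they can also lead to different final groupings.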

Data source
The research data were obtained from the Indonesian statistical center (https://www.bps.go.id/) and concern theft crimes by province in Indonesia. The figures come from the crime statistics published on the site and were then compiled for each province. The data used are from 2008, 2011 and 2018 and cover 34 provinces. For each province, the values are averaged. The research data are as follows:
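The averaging step can be sketched as below. The province names and figures are placeholders invented for the example, not values from the BPS data, and `average_per_province` is a hypothetical helper name.

```python
# Placeholder figures, NOT the BPS data:
# province -> theft cases recorded in 2008, 2011 and 2018.
cases_by_year = {
    "Province A": (1200, 1500, 900),
    "Province B": (300, 450, 150),
}


def average_per_province(cases):
    """Average the yearly values of each province (hypothetical helper)."""
    return {
        province: sum(values) / len(values)
        for province, values in cases.items()
    }


averages = average_per_province(cases_by_year)
print(averages)
```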

Centroid Data
The starting points of the clusters are determined by taking the highest value for the area of high theft crime (C1), the average value for the area of normal theft crime (C2) and the smallest value for the area of low theft crime (C3). The centroids for the first iteration are as follows:
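A minimal sketch of this starting rule, assuming that each province is represented by a single averaged value and that the "average value" for C2 is the mean over all provinces. The figures are placeholders, not the BPS data, and `initial_centroids` is a hypothetical helper name.

```python
# Placeholder per-province averages, NOT the BPS data.
province_averages = {
    "Province A": 1200.0,
    "Province B": 300.0,
    "Province C": 600.0,
}


def initial_centroids(averages):
    """Starting centroids: highest (C1), mean (C2) and lowest (C3) value."""
    values = list(averages.values())
    return {
        "C1": max(values),                # high theft crime
        "C2": sum(values) / len(values),  # assumption: mean over all provinces
        "C3": min(values),                # low theft crime
    }


start = initial_centroids(province_averages)
print(start)
```

Unlike a random start, this rule is deterministic, so repeated runs on the same data begin from the same three centroids.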

Clustering Data
The first iteration assigns each data item to the cluster whose centroid is closest. Using the average values of theft crime for 2008, 2011 and 2018 by province, the first iteration produced the following grouping into the 3 clusters. The cluster of high theft crime areas (C1) contains 4 provinces: South Sumatra, Lampung, West Java, East Java. The cluster of normal theft crime areas (C2) contains 14 provinces: Aceh, North Sumatra, West Sumatra, Riau, Jambi, DKI Jakarta, Central Java, Banten, West Nusa Tenggara, East Nusa Tenggara, Central Kalimantan, South Kalimantan, South Sulawesi, Papua. The cluster of low theft crime areas (C3) contains 16 provinces: Bengkulu, Kep. Bangka Belitung, Kep. Riau, DI Yogyakarta, Bali, West Kalimantan, East Kalimantan, North Kalimantan, North Sulawesi, Central Sulawesi, Southeast Sulawesi, Gorontalo, West Sulawesi, Maluku, North Maluku, West Papua. The calculation of the cluster centers and the data grouping in the first iteration are shown in the following table:
Based on table 4, the process continues until the result of an iteration is the same as that of the previous iteration. The centroid values keep changing from one iteration to the next. The second and subsequent iterations are carried out with the help of RapidMiner software. With RapidMiner, the process ends at the eighth iteration, whose result is the same as that of the seventh. The calculation of the last iteration and the final grouping of theft crime cases by province are shown in the following table:
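The assignment rule and the stopping rule described in this section can be sketched as follows. This is an illustration under stated assumptions, not a reproduction of the RapidMiner run: the values are placeholders rather than the BPS data, each province is assumed to be represented by one averaged value, distance is the absolute difference, and the names `assign` and `cluster_until_stable` are hypothetical.

```python
def assign(values, centroids):
    """Index of the nearest centroid for each value (absolute difference)."""
    return [
        min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
        for v in values
    ]


def cluster_until_stable(values, centroids, max_iter=100):
    """Iterate until an iteration gives the same grouping as the previous one."""
    centroids = list(centroids)
    labels = assign(values, centroids)  # first iteration
    iterations = 1
    while iterations < max_iter:
        # Recompute each centroid as the average of its current members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[c] = sum(members) / len(members)
        new_labels = assign(values, centroids)
        iterations += 1
        if new_labels == labels:  # same grouping as the previous iteration
            break
        labels = new_labels
    return labels, centroids, iterations


# Placeholder averaged values, NOT the BPS data.
values = [1200.0, 1000.0, 650.0, 600.0, 300.0, 250.0]
# Starting centroids: highest value (C1), mean (C2), lowest value (C3).
start = [max(values), sum(values) / len(values), min(values)]

labels, centroids, iterations = cluster_until_stable(values, start)
print(labels, centroids, iterations)
```

The returned `iterations` counts the final confirming pass as well, matching the convention used above, where the process is said to end at the iteration whose result repeats the previous one.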