Implementation of the K-Medoids Algorithm for Data Clustering of Covid 19 Cases in West Java

The Covid 19 pandemic has affected Indonesia for almost 15 months since March 2020, and the virus has spread to all provinces in Indonesia. Various efforts have been made to reduce or prevent the spread of the coronavirus, including the implementation of large-scale social restrictions (PSBB) in various areas, including West Java province. This study aims to cluster the Covid 19 case data in West Java, recapitulated daily by district/city, as of May 20, 2021. The K-medoids algorithm is used to form 3 clusters based on seven variables: discarded close contacts, discarded suspects, probable completed, probable died, total positive, positive recovered, and positive died. The data were processed both by manual calculation following the stages of the K-medoids algorithm and with the RapidMiner application, yielding a high cluster of 6 districts/cities, a medium cluster of 19 districts/cities, and a low cluster of 2 districts/cities. The results of the analysis are expected to provide information about the distribution and mapping of clusters in West Java province.


Introduction
In December 2019, the coronavirus outbreak began to infect residents of Wuhan, in China's Hubei province [1], spreading through the respiratory tract. The virus spread rapidly worldwide and reached Indonesia in March 2020 [2] [3]. The spread of Covid 19 has significantly affected the economy, education, tourism, and other sectors [4] [5] [6], requiring people to practice social distancing and restricting activities that involve large numbers of people. One province with a fairly high number of Covid 19 cases is West Java, with a total of 302,335 confirmed cases: 28,938 people in isolation/care, 269,351 people who have completed isolation/recovered, and 4,046 deaths [7]. This study applies the K-Medoids algorithm, a partitional clustering method from data mining, to cluster Covid 19 case data by district/city in West Java into k clusters whose representative objects best characterize the objects in the data set. Several previous studies have addressed Covid 19, some of them using the K-Medoids algorithm. In 2020, Sindi et al. clustered the spread of Covid 19 across the provinces of Indonesia into 3 clusters using the K-Medoids algorithm [8]. Samudi et al. discussed the use of instructional media during the Covid 19 pandemic [9], and Windarto et al. combined clustering and classification methods for Covid 19 cases in Indonesia [10].

Data on the Distribution of Covid 19 Cases
This research requires data on the distribution of Covid 19 cases in West Java by district/city. The dataset consists of seven variables, namely discarded close contacts, discarded suspects, probable completed, probable died, total positive, positive recovered, and positive died, recapitulated daily up to 20 May 2021 and obtained from the website of the West Java Covid 19 Information and Coordination Center (Pikobar) [7]. The data were prepared for processing with the K-medoids method in the following stages:
a) Data Selection: the data set was obtained from the Pikobar website [7] according to the data requirements of the processing.
b) Dataset Selection: the researchers selected the data to be processed by date, grouped by district/city in West Java; the values are cumulative up to the date selected by the researchers.
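The dataset selection step can be pictured as filtering the daily cumulative recap to a single date and keeping one row of the seven variables per district/city. This is a minimal sketch under stated assumptions: the record layout, the field names such as `district` and `date`, the district names "A" and "B", and the numbers are hypothetical placeholders, not the actual Pikobar export format or data.

```python
from datetime import date

# The seven clustering variables named in the text (field names are hypothetical).
VARIABLES = [
    "discarded_close_contact", "suspect_discarded", "probable_completed",
    "probable_died", "total_positive", "positive_recovered", "positive_died",
]


def select_snapshot(records, snapshot_date):
    """Keep the cumulative recap for one date: one feature row per district/city."""
    rows = {}
    for rec in records:
        if rec["date"] == snapshot_date:
            # Values are already cumulative, so the chosen date's row is used as-is.
            rows[rec["district"]] = [rec[v] for v in VARIABLES]
    return rows


# Placeholder records with made-up numbers, for illustration only.
records = [
    {"district": "A", "date": date(2021, 5, 19), **{v: 1 for v in VARIABLES}},
    {"district": "A", "date": date(2021, 5, 20), **{v: 2 for v in VARIABLES}},
    {"district": "B", "date": date(2021, 5, 20), **{v: 3 for v in VARIABLES}},
]
snapshot = select_snapshot(records, date(2021, 5, 20))
```

For the study itself, the snapshot date would be 20 May 2021 and the result would hold 27 rows, one per district/city.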

Data Mining
Data mining is the process of extracting useful information from data stored in a database, using an algorithm chosen for the purpose at hand [11] [12] [13].

Clustering
Clustering is a technique for grouping data based on the similarity of objects [8]. Clustering uses a distance equation to measure the similarity between objects, and in algorithms such as K-Medoids the initial cluster centers can be selected randomly [14].
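For reference, the Euclidean distance that the K-Medoids steps use to compare two objects can be written as follows. The symbols are introduced here for illustration: x_{ia} is the value of variable a for object i, and p is the number of variables used in the clustering.

```latex
d(x_i, x_j) = \sqrt{\sum_{a=1}^{p} \left( x_{ia} - x_{ja} \right)^{2}}
```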

K-Medoids Algorithm
K-Medoids is a clustering algorithm that selects k objects from the data set as cluster representatives (medoids) so as to minimize the total dissimilarity between each object and the medoid of its cluster. The algorithm therefore starts by determining the initial cluster centers used to group the data. The K-Medoids calculation proceeds as follows:
a) Randomly select k objects from the data set as the initial medoids.
b) Assign each non-medoid object to the closest cluster based on the Euclidean distance.
c) In each cluster, randomly select a non-medoid object as a candidate for the new medoid.
d) Calculate the distance of each object to each candidate medoid, and sum the distances of the objects to their nearest candidates to obtain the new total distance.
e) Calculate the total deviation S = new total distance - old total distance. If S < 0, replace the medoids with the candidate objects to obtain k new medoids.
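Steps (a) to (e) can be sketched in Python as follows. This is a minimal, hypothetical implementation, not the RapidMiner operator used in the study: it works on plain lists of numeric feature vectors, fixes a random seed for reproducibility, and, because the text does not say how often steps (c) to (e) are repeated, it assumes they repeat until a candidate set fails to reduce the total distance (S >= 0) or a hypothetical `max_iter` limit is reached.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(data, medoids):
    """Assign each object to its nearest medoid (steps b and d).

    Returns the cluster label of every object and the total distance,
    i.e. the sum of each object's distance to its assigned medoid.
    """
    labels, total = [], 0.0
    for point in data:
        dists = [euclidean(point, data[m]) for m in medoids]
        nearest = min(range(len(medoids)), key=dists.__getitem__)
        labels.append(nearest)
        total += dists[nearest]
    return labels, total


def k_medoids(data, k, seed=0, max_iter=100):
    """Hypothetical K-Medoids sketch following steps (a) to (e) of the text."""
    rng = random.Random(seed)
    # (a) Pick k distinct objects at random as the initial medoids.
    medoids = rng.sample(range(len(data)), k)
    # (b) Assign every object (medoids map to themselves) to the closest medoid.
    labels, total = assign(data, medoids)
    for _ in range(max_iter):
        # (c) In each cluster, pick a random non-medoid member as the candidate medoid.
        candidates = []
        for j, m in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == j and i != m]
            candidates.append(rng.choice(members) if members else m)
        # (d) Recompute the assignment and the total distance with the candidates.
        new_labels, new_total = assign(data, candidates)
        # (e) Total deviation S = new total distance - old total distance.
        s = new_total - total
        if s >= 0:
            break  # assumption: stop once the candidates do not improve the clustering
        medoids, labels, total = candidates, new_labels, new_total
    return medoids, labels, total


# Toy 2-D points (not Covid 19 data), just to show the call.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8), (20, 1), (20, 2), (21, 1)]
medoids, labels, total = k_medoids(points, k=3, seed=1)
print(medoids, labels, round(total, 3))
```

Because the swap candidates are random and the loop stops at the first non-improving candidate set, different seeds can give different clusterings; running several seeds and keeping the one with the lowest total distance is a common way to reduce that sensitivity.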

Results and Discussion
The Covid 19 data for the 27 districts/cities in West Java were selected and categorized by the parameters to be calculated, namely the number of confirmed positives, the number recovered, the number of deaths, the number in isolation/care, and the number of close contacts. The normalized results are shown in Table 1. Besides manual calculation, the K-Medoids clustering was also performed with the RapidMiner application. Processing the research dataset in RapidMiner produced 3 clusters: cluster 0 with 6 districts/cities, cluster 1 with 19 districts/cities, and cluster 2 with 2 districts/cities.
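The text does not state which normalization produced Table 1. As an illustration only, the sketch below assumes min-max normalization, which rescales each variable to the range [0, 1] so that large counts such as total positive cases do not dominate the Euclidean distance; the function name is hypothetical.

```python
def min_max_normalize(rows):
    """Scale each column of a list of feature rows to [0, 1] (min-max normalization)."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    normalized = []
    for row in rows:
        normalized.append([
            # A constant column carries no information; map it to 0.0 to avoid dividing by zero.
            (x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)
        ])
    return normalized


# Toy rows (not the study's data): two variables for three districts/cities.
print(min_max_normalize([[0, 10], [5, 20], [10, 30]]))
```

The normalized rows would then be passed to the clustering step in place of the raw counts.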

Conclusion
Based on the research conducted and the implementation and testing carried out, the Covid 19 data in West Java can be grouped into 3 clusters. Cluster 0, with a high level of Covid 19 spread, contains the districts/cities of Bandung City, Bogor, Ciamis Regency, Karawang, Bekasi, and Bandung. Cluster 1, with a moderate level of spread, includes the districts/cities of Garut, Tasikmalaya, Kuningan, Cirebon, Majalengka, Sumedang, Indramayu, Subang, Purwakarta, West Bandung, Pangandaran, Sukabumi City, Cirebon, Cimahi, Tasikmalaya, and Banjar. Cluster 2, with a low level of spread, contains the districts/cities of Bekasi and Depok.