Classification of Generations by Population and Region in Indonesia Using the K-Means Algorithm

Differences in year of birth lead to the classification of the population into several generations. This classification matters because each generation has its own characteristics and traits. This research groups the provinces of Indonesia by the number of residents in each generation. The data come from the 2020 population census conducted by the central statistics agency (BPS). The grouping uses the K-Means algorithm with 3 clusters. The calculation yields 25 provinces in cluster 1, 3 provinces in cluster 2, and 6 provinces in cluster 3. According to the 2020 census, the largest population group is Generation Z, followed by the Millennial generation, while the Pre-Boomer generation is the smallest. The resulting mapping of the 34 provinces can inform efforts to improve communication between generations and to provide public facilities that every generation can use.


Introduction
The 2020 population census was conducted by the central statistics agency (BPS) from February to September 2020 [1]. The census recorded a population of 270,203,917 people, which can be classified into generations based on year of birth. According to the census results, Indonesia's population is dominated by Generation Z, born between 1997 and 2012, followed by the Millennial generation, born from 1981 to 1996. The classification of population groups follows the literature of William H. Frey. Understanding how each generation is distributed in Indonesia can support good communication between generations. Against this background, generation grouping is needed to make it easier to see how the generations are distributed across the provinces of Indonesia. Previous research offers several methods for clustering [2][3][4]. Clustering is a data analysis method that solves problems by grouping data [5][6][7]. For the calculation, the researchers use K-Means, a data mining algorithm for grouping data [8]. The data used in this research are divided into Post Generation Z, Generation Z, Millennial, Generation X, Boomer, and Pre-Boomer populations across 34 provinces, and the provinces are grouped by whether their generation populations are very dominant, dominant, or less dominant.

Research and Methodology
The research draws on the total demographic data for Indonesia recorded in the 2020 census and published on the BPS website, together with books and journals related to the problem. The K-Means algorithm is then used in the calculation with 3 specified clusters: very dominant, dominant, and less dominant.

Data Collection Stages
The researchers use secondary data from the population census, which was conducted from February to September 2020 either online or by BPS officers. The data can be accessed on the BPS website.

Stages of Data Processing and Analysis
The generation data for the 34 provinces are processed first so that the clusters can be determined. The clustering process divides the provinces into 3 classes based on the data provided. The data are then analyzed by calculating the weight of each record against a randomly selected initial centroid for each cluster.
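The calculation at this stage can be written out as follows. This is a sketch that assumes Euclidean distance, the usual choice for K-Means; the source does not name the distance measure, and the symbols are introduced here for illustration only. Let x_i be the record for province i with m attributes (the six generation counts), c_k the centroid of cluster k, and S_k the set of records currently assigned to cluster k.

```latex
% Distance from province record x_i to centroid c_k (assumed Euclidean)
d(x_i, c_k) = \sqrt{\sum_{j=1}^{m} \left( x_{ij} - c_{kj} \right)^2}

% Each record is assigned to the cluster with the nearest centroid
\operatorname{cluster}(x_i) = \arg\min_{k \in \{1, 2, 3\}} d(x_i, c_k)

% Each centroid is then recomputed as the mean of its assigned records
c_k = \frac{1}{|S_k|} \sum_{x_i \in S_k} x_i
```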

Stages of Application of K-Means Algorithm Method
The K-Means algorithm is completed in several stages:
a) Determine the number of clusters. Here 3 clusters are formed from the available data: very dominant, dominant, and less dominant.
b) Determine the initial cluster centers randomly. For the initial data, the values come from West Sumatra Province, Riau Islands Province, and South Kalimantan Province. The resulting cluster values can be seen in table 2.
c) For each row, calculate the distance to each cluster center and assign the row to the closest cluster. This stage can be seen in table 3.
d) Determine the new center of each cluster and repeat the calculation until the cluster membership of the data no longer changes. The final result then gives the number of provinces in each cluster.
This can be seen from the processing results with RapidMiner in figures 1, 2, and 3.
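The stages above can be sketched in a few lines of Python. This is a minimal illustration, not the RapidMiner process used in the study: the function name `kmeans`, the toy data, and the use of Euclidean distance are all assumptions, and the two-column rows stand in for the six generation counts per province. The initial centroids are taken from three chosen rows, in the same way that the study starts from three chosen provinces.

```python
import math


def kmeans(points, init_indices, max_iter=100):
    """Basic K-Means; initial centroids are copied from the given rows."""
    centroids = [list(points[i]) for i in init_indices]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Stage c: assign every row to its nearest centroid (Euclidean).
        new_labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Stage d: stop once no row changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Stage d: recompute each centroid as the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy rows (NOT the BPS census data).
points = [
    (1.0, 1.2), (0.9, 1.0),
    (5.0, 5.1), (5.2, 4.9),
    (9.0, 9.1), (8.8, 9.3),
]

# Stages a and b: three clusters, with initial centers taken from rows 0, 2, 4.
labels, centroids = kmeans(points, init_indices=(0, 2, 4))
print(labels)
```

Fixing the starting rows makes the run repeatable. With truly random starting rows, K-Means can settle on different clusters from one run to the next.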

Results and Discussion
The grouping of generations across Indonesia begins with the random selection of centroid data from the 34 provinces in the BPS data. Once the centroids are determined, the distance from each provincial record to each centroid is calculated from the available data, and each record is assigned to the cluster with the closest centroid, giving 3 clusters. The results of the calculation can be seen in table 3. To perform the calculation with the RapidMiner application, the data are imported into the application, adjusting the data types and specifying the id attribute, as seen in figure 1.

Figure 1. Transformation Data Process
After reading the data, the next step is to determine the clustering results with K = 3 in the RapidMiner application, which produces the cluster data output in figure 2.
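Once every province has a cluster label, the size of each cluster can be tallied. The sketch below assumes hypothetical labels: the province names and cluster names are placeholders, not the RapidMiner output reported in this paper.

```python
from collections import Counter

# Hypothetical cluster label per province; placeholders only, not the
# actual clustering output described in the text.
assignments = {
    "Province A": "cluster_0",
    "Province B": "cluster_0",
    "Province C": "cluster_1",
    "Province D": "cluster_2",
    "Province E": "cluster_0",
}

# Count how many provinces fall into each cluster.
sizes = Counter(assignments.values())

for cluster, count in sorted(sizes.items()):
    print(f"{cluster}: {count} provinces")
```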

Conclusion
Based on the research, the following conclusions can be drawn: a. The K-Means algorithm is able to map the generation data of the 34 provinces in Indonesia into 3 clusters: the very dominant cluster has 25 provinces, the dominant cluster has 6 provinces, and the less dominant cluster has 3 provinces. b. The researchers suggest further research on the public facilities a province provides so that they can be accessed by every generation.