Case Based Reasoning using K-Nearest Neighbor with Euclidean Distance for Early Diagnosis of Personality Disorder

Abstract. A personality disorder is a condition in which an extreme personality causes the sufferer to have unhealthy thought patterns and behaviors that differ from those of other people. The personality disorders discussed in this study consisted of 10 diseases with 300 case data and 68 symptoms. Basic Health Research (Riskesdas) 2018 data show that more than 19 million people aged 15 years and over were affected by mental-emotional disorders. Data from Statistics Indonesia in 2019 show that the population of Indonesia is around 265 million people, while according to the Indonesian Clinical Psychologist Association, the number of verified professional psychologists is 1,599 clinical psychologists out of a total membership of 2,078 as of January 2019. This figure does not meet the standard of the World Health Organization (WHO), under which one psychologist serves 30 thousand people. This shows that Indonesia still lacks around 28,970 psychologists. Because professional psychologists are unequally distributed and patients far outnumber the available psychologists, a diagnosis takes a long time to obtain. Moreover, patients often do not know enough about the symptoms they feel. This study aims to produce a system for diagnosing personality disorders. It applies case-based reasoning, which solves new problems by drawing on previous cases, and uses K-Nearest Neighbor to classify data by closest distance, calculated with the Euclidean Distance. The algorithm was tested with a Confusion Matrix. Testing on 60 case data using K-Nearest Neighbor and the Euclidean Distance with K=3 showed that all 60 had 100% similarity to cases with a personality disorder. Testing on 10 new cases that were not in the knowledge base showed that 9 cases had 100% similarity to a previous case, while the remaining case had 90% similarity.


Introduction
A personality disorder is a type of mental disorder. The condition is characterized by an extreme personality that causes sufferers to have thought patterns and behaviors that differ from those of other people [1]. This behavior usually emerges during adolescence or early adulthood, a period in which a person experiences significant biological, psychological, and social changes. Personality disorders are divided into 10 types, namely paranoid personality disorder, schizoid personality disorder, passive-aggressive personality disorder, antisocial personality disorder, narcissistic personality disorder, histrionic personality disorder, borderline personality disorder, avoidant personality disorder, obsessive-compulsive personality disorder, and dependent personality disorder.
Based on data from the 2018 Basic Health Research (Riskesdas) issued by the Ministry of Health, the prevalence of mental-emotional disorders in people aged over 15 years increased from 6% to 9.8% per province between 2013 and 2018 [2]. Data from Statistics Indonesia (BPS) in 2019 showed that the Indonesian population is around 265 million people. Meanwhile, according to the Indonesian Clinical Psychologist Association (Ikatan Psikologi Klinis, abbreviated as IPK), the number of verified professional psychologists is 1,599 clinical psychologists out of a total of 2,078 members as of January 2019. This number does not meet the standard of the World Health Organization (WHO), under which 1 psychologist serves 30 thousand people. This shows that Indonesia still lacks around 28,970 psychologists.
Professional psychologists are also unequally distributed, and their number in Indonesia does not meet the WHO standard. As a result, psychologists need a long time to provide a diagnosis, since patients outnumber psychologists. Furthermore, patients often lack sufficient knowledge of the symptoms they feel.
This study utilized data from a previous study entitled "Case Base Reasoning for the Diagnosis of Personality Disorders by Utilizing Bayesian Probabilistic." The data consisted of 300 case data and 68 symptom data [3]. Case data were divided into training data and testing data with a ratio of 80% training data to 20% testing data. The method used in this study was Case-Based Reasoning (CBR) with K-Nearest Neighbor (KNN). CBR solves a new problem by finding previous cases in which a similar problem occurred and adapting their information and solutions to the new case [4]. K-Nearest Neighbor was used to classify data according to the closest distance [5]. The system aims to streamline the work of psychologists in providing an initial diagnosis to patients and to help sufferers find solutions for early treatment of personality disorder.

Case Base Reasoning
Case-Based Reasoning (CBR) is a knowledge-based approach to solving problems by utilizing previous experience stored in a case base [6]. The knowledge stored in the case base reduces the probability of errors and makes it possible to analyze errors in previous cases [7]. CBR has diagnostic capabilities: it can provide information automatically based on knowledge of previous cases that have been revised to fit the new case. Thus, the knowledge in the case base can continue to grow to solve future problems [8]. In general, problem-solving with CBR has 4 stages, namely retrieve, reuse, revise, and retain [9]. The retrieve stage is critical: if no case similar to the new one can be retrieved, the remaining CBR stages cannot continue [10]. The stages form a cycle, as shown in Figure 1, CBR Stages.
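The four stages described above can be sketched in a few lines of Python. This is a minimal illustration, not the system built in this study: the `Case` class, the function names, the numeric symptom encoding, and the optional `expert_diagnosis` argument are all assumptions introduced here, and retrieval uses a single nearest case for brevity.

```python
from dataclasses import dataclass
from math import dist


@dataclass
class Case:
    symptoms: list   # symptom weight vector (hypothetical encoding)
    diagnosis: str


def retrieve(case_base, query):
    """Retrieve: return the stored case closest to the query symptoms."""
    return min(case_base, key=lambda c: dist(c.symptoms, query))


def reuse(retrieved):
    """Reuse: propose the retrieved case's diagnosis for the new problem."""
    return retrieved.diagnosis


def revise(proposed, expert_diagnosis=None):
    """Revise: an expert may correct the proposal; otherwise keep it."""
    return expert_diagnosis if expert_diagnosis is not None else proposed


def retain(case_base, query, diagnosis):
    """Retain: store the solved case so the case base keeps growing."""
    case_base.append(Case(list(query), diagnosis))


def cbr_cycle(case_base, query, expert_diagnosis=None):
    """Run retrieve, reuse, revise, and retain once for a new case."""
    if not case_base:
        # Without a retrievable case, the later stages cannot proceed.
        raise ValueError("cannot retrieve from an empty case base")
    proposed = reuse(retrieve(case_base, query))
    final = revise(proposed, expert_diagnosis)
    retain(case_base, query, final)
    return final
```

Calling `cbr_cycle(case_base, query)` returns a diagnosis and appends the solved case, which mirrors how the case base grows over time.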

K-Nearest Neighbor
K-Nearest Neighbor is a simple classification method for new data whose class is not yet known [12]. The new data is classified by finding the k objects at the closest distance to it, where k is specified in advance [13]. K is generally chosen to be an odd number to reduce the chance of ties in the classification process [14]. K-Nearest Neighbor is also known as a lazy learning method because the training samples must be present in memory when the classification process runs [15]. The distance to the nearest neighbors is calculated using the Euclidean distance, which measures the distance between two points and here determines the similarity of cases from their symptom weight values [16]. The Euclidean distance provides a valid distance measure for this purpose [17].
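For reference, the standard Euclidean distance between a new case and a stored case can be written as follows. The symbols are notation assumed here rather than taken from the source: x_i and y_i are the weights of symptom i in the new case and the stored case, and n is the number of symptoms.

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} \left( x_i - y_i \right)^2}
```

A smaller value of d(x, y) means the two cases are more similar; two cases with identical symptom weights have distance 0.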

Confusion Matrix
Classification model testing is done to find out how well the system classifies data, by checking which objects it predicts correctly and incorrectly. A confusion matrix is used to measure the performance of the classification method [18]. As presented in Table 4, TP is the number of positive data classified correctly by the system, while TN is the number of negative data classified correctly. FP is the number of negative data incorrectly classified as positive, while FN is the number of positive data incorrectly classified as negative [19].
a) Accuracy measures how accurately the system classifies data [20]. It is the share of all assessed data that the system identifies correctly [21].
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (2)$$
b) Precision is the number of positive data classified correctly divided by the total data classified as positive [22].
c) Recall is the proportion of actual positive data that the system can recover [23].
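The verbal definitions of precision and recall correspond to the standard formulas below, written in the same TP, FP, and FN notation as the accuracy equation. The source's own numbered equations for these two metrics are not available here, so equation numbers are omitted.

```latex
\text{Precision} = \frac{TP}{TP + FP} \times 100\%
\qquad
\text{Recall} = \frac{TP}{TP + FN} \times 100\%
```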

Result and Discussion
The system required case data, disease data, and symptom data for the early diagnosis of personality disorder. These were then processed into data ready to be implemented in the system. Symptom data consisted of 68 symptoms and disease data consisted of 10 diseases.

Decision Flow
A decision flow was drawn up to describe how decisions are made with the case-based reasoning method. The case-based reasoning stage consisted of 4 steps, namely retrieve, reuse, revise, and retain. Data classification used the k-nearest neighbor method, and distance measurement used the Euclidean distance. The k-nearest neighbor calculation used 300 case data, comprising 240 training data and 60 testing data. After the data were divided into training data and testing data, the k-nearest neighbor calculations were performed with the Euclidean distance.
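The 80/20 division of the 300 cases can be illustrated with a short sketch. The source does not state how cases were assigned to each part, so the seeded random shuffle, the `split_cases` function, and the placeholder case IDs below are assumptions for illustration only.

```python
import random


def split_cases(cases, train_ratio=0.8, seed=0):
    """Shuffle a copy of the cases and split it into training and testing parts."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is repeatable
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


cases = list(range(300))            # placeholder IDs standing in for the 300 cases
train, test = split_cases(cases)
print(len(train), len(test))        # 240 60
```

If the 10 diseases are unevenly represented, a stratified split per disease may be preferable to a plain shuffle.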

K-Nearest Neighbor Calculation
The k-nearest neighbor calculation was done to find the nearest neighbors between cases using the Euclidean distance. After the distances were calculated, the next step was to sort them from smallest to largest and take the closest ones for K=3. Distances with the same value were placed in the same rank. The distance that satisfied K=3 belonged to disease 1, namely paranoid personality disorder. The next step was to determine the categories included in K=3. Category values were taken from the closest distances, namely the 1st, 2nd, and 3rd. Distances ranked beyond the 3rd were not included in the K=3 category. Based on the Yes category table for K-NN in column 6, there were 35 cases of paranoid personality disorder. After the categories had been determined, the next step was to determine the classification, which was taken as the majority disease. The result was paranoid personality disorder with 35 cases.
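The sort-and-vote procedure described above can be sketched as a textbook K-Nearest Neighbor classifier. This is a simplified illustration under stated assumptions: it takes exactly the k closest cases rather than reproducing the study's handling of equal distances, and the `knn_classify` function and the binary symptom vectors are hypothetical, not the study's data.

```python
from collections import Counter
from math import dist


def knn_classify(training, query, k=3):
    """Return the majority label among the k training cases nearest to query."""
    # Sort training cases by Euclidean distance to the query, closest first.
    ranked = sorted(training, key=lambda case: dist(case[0], query))
    # Count the disease labels of the k closest cases and take the majority.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical binary symptom vectors (1 = symptom present), not the study's data.
training = [
    ([1, 1, 0, 0], "paranoid"),
    ([1, 1, 1, 0], "paranoid"),
    ([0, 0, 1, 1], "schizoid"),
    ([0, 1, 1, 1], "schizoid"),
    ([1, 0, 0, 0], "paranoid"),
]
print(knn_classify(training, [1, 0, 1, 0]))   # paranoid
```

Here the two closest cases lie at distance 1 and are both paranoid, so the majority vote is paranoid whichever case fills the third slot.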

Testing of New Cases
Testing of new cases was conducted using data not found in the training data or testing data. Ten new cases were used.
a) Calculation of new Euclidean distance
Based on the Yes category table for K-NN in column 6, there were 12 cases of paranoid personality disorder. After the categories had been determined, the next step was to determine the classification based on the majority disease. The classification result for the test data was paranoid personality disorder with a total of 12 cases.

Algorithm Testing of the System
Accuracy testing is performed to measure the performance of the classification method [24]. This test used the Confusion Matrix to calculate accuracy, precision, and recall.
a) Testing of Testing Data
The results of accuracy testing with the confusion matrix on the 20% testing data (60 case data) can be seen in Table 7.
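The three metrics can be checked with a small calculation. The sketch below uses the counts reported in the Conclusion for the 10 new cases: 9 cases positive in both actual and predicted values (TP) and 1 case positive in actual value but negative in prediction (FN). FP = 0 and TN = 0 are assumptions made here; they are consistent with the reported 100% precision and 90% accuracy over 10 cases. The function name is hypothetical.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, and recall as percentages."""
    total = tp + tn + fp + fn
    accuracy = 100 * (tp + tn) / total
    # Guard against division by zero when nothing is predicted or actually positive.
    precision = 100 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100 * tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


accuracy, precision, recall = classification_metrics(tp=9, tn=0, fp=0, fn=1)
print(f"{accuracy:.0f}% {precision:.0f}% {recall:.0f}%")   # 90% 100% 90%
```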

Conclusion
This study has implemented case-based reasoning with the K-Nearest Neighbor and Euclidean Distance methods to produce a system that can diagnose personality disorders. K-Nearest Neighbor is used to classify data by closest distance, calculated with the Euclidean Distance. Testing on 60 case data using K-Nearest Neighbor and the Euclidean Distance with K = 3 showed that all 60 have 100% similarity to cases of personality disorder. Testing on 10 new cases that were not in the knowledge base showed that 9 cases had 100% similarity to a previous case, while the remaining case had 90% similarity. System testing with the Confusion Matrix shows that 9 cases are positive for both the actual and predicted values, while 1 case is positive for the actual value and negative for the prediction. This gives 90% accuracy, 100% precision, and 90% recall.