The Application of Data Mining in Determining Timely Graduation Using the C4.5 Algorithm

Graduating on time is one element of higher education accreditation assessment. At the Strata 1 (undergraduate) level, students are declared to have graduated on time if they complete their studies in at most eight semesters, or four years. BAN-PT sets a timely graduation standard of at least 50%; if the standard is not met, the accreditation score is reduced. This problem encourages Universitas Simalungun Pematangsiantar to carry out evaluations and strategic steps to increase student graduation rates so that the BAN-PT targets can be achieved. For this reason, it is necessary to know in advance the pattern of students who tend not to graduate on time. In this study, the C4.5 algorithm is proposed to predict student graduation. The algorithm processes a student profile dataset of 150 records. Each record carries a graduation status label whose value is categorical: on time or late. The attributes used are student name, gender, student status, and GPA. The result of the C4.5 algorithm is a decision tree model that is easy to analyze, even for non-specialists. This model maps the patterns of students who are likely to graduate on time or late.


Introduction
Students are intellectual figures with high mobility and are among the biggest assets a country has. When students dedicate their knowledge and experience to the wider community to the fullest, they can help society change for the better. Students continue their education in higher education institutions in the hope that they can complete it well. This is not always the case: they face various problems in the study process. Obstacles encountered along the way can delay the completion of their studies, which in turn affects their graduation. Student graduation can be seen from the level of admission, ethics, activeness in the teaching and learning process, and academic achievement. In tertiary education institutions, information on student graduation rates is very important for improving the services that make students comfortable so that they can graduate on time.
For this reason, a system is needed that can support decisions and help the relevant parties determine the graduation pattern of a student. Many techniques in computer science can solve complex problems of this kind [1]-[7]. One approach is to use artificial intelligence [8]-[13], several branches of which can solve pattern-finding cases. One of them is data mining [14]. With the data mining process, patterns or rules can be found and used to produce information by applying a decision tree technique. One well-known data mining technique is the C4.5 algorithm. It is used here because it produces rules in the form of patterns that can determine whether a student graduates on time or not. This choice is supported by previous research that solved a similar problem with the C4.5 classification method: in [15], the C4.5 algorithm was applied to predict the recruitment of prospective new employees and measured the success rate of prospective new employees at 71%. It is expected that the results of this study can provide a solution, specifically for Universitas Simalungun Pematangsiantar, in determining the graduation pattern of its students, which in turn helps improve the quality of the institution.

Data Mining
Data mining is a scientific discipline that studies methods for extracting knowledge or finding patterns from data where the results of data processing can be used to make decisions in the future [16].

Classification
In classification, the target variable is categorical. For example, income can be classified into three categories: high income, medium income, and low income [17].

C4.5 Method (Decision Tree)
The decision tree is one of the most popular classification methods because it converts data, expressed in tabular form with attributes and records, into decision trees and decision rules [17]. In addition, the decision tree combines data exploration and modeling, so it is well suited both as an initial step in the modeling process and as the final model of several other techniques.
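The entropy and gain values used later in the paper are not defined in the text. The standard definitions are given below as a reference. The notation is an assumption and does not come from the source: S is the set of cases at a node, n is the number of classes (two here: on time and late), p_i is the proportion of cases in S that belong to class i, and S_v is the subset of S in which attribute A takes the value v.

```latex
\[
\mathrm{Entropy}(S) = -\sum_{i=1}^{n} p_i \log_2 p_i
\]
\[
\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
\]
```

C4.5 as usually described ranks attributes by gain ratio, which divides the gain by the split information of the attribute. The calculations reported in this paper refer to the gain values only.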

Data source
The data in this study were collected through interviews, observation, and literature study. Data collection activities focused on determining the timely graduation of Universitas Simalungun students. The data obtained are then processed with the C4.5 classification algorithm, taking the value of each attribute in the data to determine timely graduation. The proposed research method using the C4.5 algorithm to determine timely graduation is as follows.

Figure 1. Proposed Research Methods
Collect the student data that will be used to determine on-time graduation.
Calculate the preference values (entropy and gain) for each student data attribute.
Produce a decision tree.
Generate data on the students who graduate on time and those who do not.
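The four steps in Figure 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure used in the study, which relied on RapidMiner. The records, the attribute names (status, gender, gpa) and the functions build_tree and classify are hypothetical. The sketch selects attributes by plain information gain on categorical values and omits the gain ratio, continuous-value thresholds and pruning that a full C4.5 implementation includes.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain(rows, attr, target):
    """Information gain obtained by splitting rows on a categorical attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def build_tree(rows, attrs, target):
    """Step 3: recursively pick the highest-gain attribute as the next node."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # leaf: pure node or no attributes left
    best = max(attrs, key=lambda a: gain(rows, a, target))
    if gain(rows, best, target) <= 0:
        return majority  # leaf: no attribute separates the classes
    rest = [a for a in attrs if a != best]
    branches = {}
    for value in sorted({r[best] for r in rows}):
        subset = [r for r in rows if r[best] == value]
        branches[value] = build_tree(subset, rest, target)
    return {"attribute": best, "branches": branches, "default": majority}


def classify(tree, row):
    """Step 4: follow the branches until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attribute"]], tree["default"])
    return tree


# Step 1: hypothetical records, NOT the 150 records used in the study.
rows = [
    {"status": "working", "gender": "male", "gpa": "low", "graduation": "late"},
    {"status": "working", "gender": "female", "gpa": "low", "graduation": "late"},
    {"status": "working", "gender": "male", "gpa": "high", "graduation": "on time"},
    {"status": "student", "gender": "female", "gpa": "high", "graduation": "on time"},
    {"status": "student", "gender": "male", "gpa": "high", "graduation": "on time"},
    {"status": "student", "gender": "female", "gpa": "low", "graduation": "on time"},
    {"status": "student", "gender": "male", "gpa": "low", "graduation": "on time"},
]

# Steps 2 and 3: gains are computed inside build_tree at every node.
tree = build_tree(rows, ["status", "gender", "gpa"], "graduation")

# Step 4: predict the graduation status of a new, hypothetical student.
prediction = classify(tree, {"status": "working", "gender": "female", "gpa": "low"})
print(tree)
print(prediction)
```

The tree is stored as nested dictionaries so that each path from the root to a leaf can be read directly as an if-then rule.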

Results and Discussion
The dataset of the study consists of the following attributes: student name, student status, gender, achievement index for semesters 1 to 8, grade point average, and graduation status. The data were converted to the Microsoft Excel 2007 format. The collected data are used as input to build rule models with the C4.5 algorithm in RapidMiner, which displays an overview of the rule models for determining on-time graduation.

Research Dataset
Data modeling with the C4.5 algorithm uses the decision tree method. The data used are 150 records of Universitas Simalungun students.

Classification Analysis of the C4.5 Method
The following are the steps in forming a decision tree using the C4.5 algorithm. First, count the total number of cases, the number of cases with decision Yes, and the number of cases with decision No, then compute the entropy of all cases and of the cases partitioned by each attribute. The resulting entropy and gain values for each attribute are shown in the Node 1 gain calculation table. From the entropy and gain results in Table 1, the student status attribute has the highest gain value, 0.21599063. The student status attribute therefore becomes the root (first node) of the decision tree, with two attribute values: working and student.
Next, count the number of cases, the number of cases with decision Yes, the number of cases with decision No, and the entropy of all cases for the working value of the student status attribute, then calculate entropy and gain again. The resulting entropy and gain values for each attribute are shown in the Node 1.1 gain calculation table. In Table 3, the gender attribute has a gain value of 0.015910807. Therefore, no further calculation is needed for the values of this attribute. After the entropy and gain calculations are complete, the decision tree is formed by modeling the rules in RapidMiner.
Based on the decision tree formed, the rule model in text form is as follows:
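The node-level calculation described above (count the cases, count the Yes and No decisions, then compute entropy and gain per attribute) can be illustrated with a short Python sketch. The counts below are hypothetical and are not the values from the paper's tables, so the resulting gains will not match the reported 0.21599063 or 0.015910807. The function and variable names are also assumptions made for illustration.

```python
from math import log2


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts, e.g. [yes, no]."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)


def gain(parent_counts, partitions):
    """Information gain of splitting a node into the given partitions.

    parent_counts: [yes, no] counts at the node.
    partitions: {attribute value: [yes, no]} counts for each attribute value.
    """
    total = sum(parent_counts)
    remainder = sum(
        (sum(child) / total) * entropy(child) for child in partitions.values()
    )
    return entropy(parent_counts) - remainder


# Hypothetical node: 10 cases, 6 with decision Yes (on time) and 4 with No (late).
node = [6, 4]

# Hypothetical [yes, no] counts for each value of two candidate attributes.
candidates = {
    "student_status": {"working": [1, 3], "student": [5, 1]},
    "gender": {"male": [3, 2], "female": [3, 2]},
}

gains = {name: gain(node, parts) for name, parts in candidates.items()}
root = max(gains, key=gains.get)  # attribute with the highest gain becomes the node

for name, value in gains.items():
    print(f"{name}: gain = {value:.6f}")
print("selected attribute:", root)
```

In this toy example the gender split leaves the Yes/No proportions unchanged in both branches, so its gain is zero and the student status attribute is selected, which mirrors the kind of comparison made in the gain calculation tables.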

Conclusion
Based on the results of the research on determining timely graduation at Universitas Simalungun Pematangsiantar, it can be concluded that: a) A rule model was obtained that shows the relationships between the gender, student status, and semester 1 to 8 achievement index attributes; in the resulting model, the timely graduation rules are based on the cumulative achievement index. b) The problem of determining timely graduation at Universitas Simalungun Pematangsiantar can be solved by applying a data mining technique, namely the C4.5 algorithm. c) Classifying student data at Universitas Simalungun Pematangsiantar with the C4.5 algorithm can support the process of determining timely graduation carried out by the university's administration.