With the emergence of massive educational data, education data mining techniques have drawn considerable interest from scholars seeking to explore the relationship between students’ achievements and other factors. In this study, a data set on students’ achievements in Portuguese at two secondary schools in Portugal is selected for education data mining; it covers personal information as well as social and school-related factors. To analyze the relationship between students’ achievements and these factors, this study proposes an ensemble model based on weighted voting for predicting students’ achievements in Portuguese in the final period. First, the raw data are preprocessed using basic methods, including dummy coding, correlation analysis, standardization, and normalization. Second, outlier adaption based on the isolation forest algorithm is applied to the data set to enhance the robustness of the ensemble model. Finally, two base classifiers, i.e., gradient boosting decision tree and extreme gradient boosting, are integrated to form the ensemble model. Experiments verify the superiority of the proposed model by comparing it with five base classifiers: gradient boosting decision tree, adaptive boosting, extreme gradient boosting, random forest, and decision tree. The experimental results demonstrate that the ensemble model outperforms the five base classifiers in classification and confirm the validity of the outlier adaption based on the isolation forest algorithm.
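The weighted-voting step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the weights are chosen, how many classes are predicted, or whether voting acts on labels or probabilities, so everything below is an assumption: the function name `weighted_soft_vote`, the soft-voting (probability-averaging) scheme, the two-class setup, the weights, and the probability values are all hypothetical and made up for illustration. It should not be read as the authors' implementation.

```python
from typing import List, Sequence


def weighted_soft_vote(
    probas: Sequence[Sequence[Sequence[float]]],
    weights: Sequence[float],
) -> List[int]:
    """Combine class probabilities from several base classifiers.

    probas[m][i][k] is classifier m's probability that sample i is in class k.
    weights[m] is the (hypothetical) voting weight of classifier m.
    Returns the index of the winning class for each sample.
    """
    if len(probas) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    n_samples = len(probas[0])
    n_classes = len(probas[0][0])
    labels = []
    for i in range(n_samples):
        # Weighted average of the classifiers' probabilities for each class.
        scores = [
            sum(w * p[i][k] for w, p in zip(weights, probas)) / total
            for k in range(n_classes)
        ]
        # Pick the class with the highest combined score
        # (ties go to the lowest class index).
        labels.append(max(range(n_classes), key=scores.__getitem__))
    return labels


# Made-up probabilities for three students and two assumed classes (fail, pass),
# standing in for the outputs of the two base classifiers named in the abstract.
gbdt_proba = [[0.70, 0.30], [0.40, 0.60], [0.55, 0.45]]
xgb_proba = [[0.60, 0.40], [0.20, 0.80], [0.30, 0.70]]

predictions = weighted_soft_vote([gbdt_proba, xgb_proba], weights=[0.4, 0.6])
print(predictions)  # [0, 1, 1]
```

In this toy example the third student is classified differently by the two base classifiers, and the larger weight on the second classifier decides the outcome. Shifting the weights toward the first classifier would flip that prediction, which is why the choice of weights matters in a scheme of this kind.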
Published in | Science Journal of Education (Volume 9, Issue 2) |
DOI | 10.11648/j.sjedu.20210902.16 |
Page(s) | 58-62 |
Creative Commons |
This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
Copyright |
Copyright © The Author(s), 2021. Published by Science Publishing Group |
Keywords | Ensemble Model, Education Data Mining, Prediction, Students’ Achievements |
APA Style
Shuai Zhang, Jie Chen, Wenyu Zhang, Qiwei Xu, Jiaxuan Shi. (2021). Education Data Mining Application for Predicting Students’ Achievements of Portuguese Using Ensemble Model. Science Journal of Education, 9(2), 58-62. https://doi.org/10.11648/j.sjedu.20210902.16
@article{10.11648/j.sjedu.20210902.16,
  author = {Shuai Zhang and Jie Chen and Wenyu Zhang and Qiwei Xu and Jiaxuan Shi},
  title = {Education Data Mining Application for Predicting Students’ Achievements of Portuguese Using Ensemble Model},
  journal = {Science Journal of Education},
  volume = {9},
  number = {2},
  pages = {58-62},
  doi = {10.11648/j.sjedu.20210902.16},
  url = {https://doi.org/10.11648/j.sjedu.20210902.16},
  eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.sjedu.20210902.16},
  year = {2021}
}
TY  - JOUR
T1  - Education Data Mining Application for Predicting Students’ Achievements of Portuguese Using Ensemble Model
AU  - Shuai Zhang
AU  - Jie Chen
AU  - Wenyu Zhang
AU  - Qiwei Xu
AU  - Jiaxuan Shi
Y1  - 2021/04/26
PY  - 2021
N1  - https://doi.org/10.11648/j.sjedu.20210902.16
DO  - 10.11648/j.sjedu.20210902.16
T2  - Science Journal of Education
JF  - Science Journal of Education
JO  - Science Journal of Education
SP  - 58
EP  - 62
PB  - Science Publishing Group
SN  - 2329-0897
UR  - https://doi.org/10.11648/j.sjedu.20210902.16
VL  - 9
IS  - 2
ER  -