Big Data Mining for Chronic Disease Prediction using Principal Component Analysis and eXtreme Gradient Boosting

Shruti Bhargava Choubey; T. Chitra; J. Jasmine Hephzipah

doi:10.34293/gkijaret.v1i1.2024.1

Authors

Shruti Bhargava Choubey Associate Professor, Department of Electronics and Communication Engineering, Sreenidhi Institute of Science and Technology, Hyderabad, 501301, Telangana, India
T. Chitra Assistant Professor, Department of Electronics and Communication Engineering, Christian College of Engineering and Technology, Oddanchatram, 624619, Tamil Nadu, India
J. Jasmine Hephzipah Associate Professor, Department of Electronics and Communication Engineering, R.M.K. Engineering College, Kavaraipettai, 601206, Tamil Nadu, India

DOI:

https://doi.org/10.34293/gkijaret.v1i1.2024.1

Keywords:

Big Data, Data Mining, Disease Detection, Chronic Disease, Machine Learning

Abstract

Big data mining has transformed health care by applying advanced analytics to very large data sets, substantially improving disease detection and management. This paper proposes a new approach to chronic disease detection that combines feature selection based on Principal Component Analysis (PCA) with the eXtreme Gradient Boosting (XGB) classifier. PCA projects complex medical data, whether biomarkers or clinical parameters, onto a smaller set of dimensions called principal components, capturing the essential variability while discarding less important information. By reducing the dimensionality of the dataset in this way, the method focuses only on the most important features, which makes the analysis more efficient and accurate. After the features are extracted using PCA, XGB performs the classification that identifies and predicts chronic diseases. The boosting algorithm in XGB is powerful and can model complex relationships between data points accurately, which greatly enhances model performance. The approach achieved a precision of 98.8%, a recall of 98%, and an F1-score of 98.7%, which is sufficient to distinguish reliably between healthy and chronic conditions. The proposed model is also efficient: it reduces processing time to just 8 seconds, which makes it practical for real-world use.
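To make the PCA projection step described in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes only two features, so that the principal component can be found in closed form, and it skips the feature scaling that a real pipeline would normally apply first. The function name `first_principal_component` and the sample readings are hypothetical and were chosen for illustration only.

```python
import math


def first_principal_component(rows):
    """Return (direction, scores, explained_ratio) for 2-feature rows.

    Assumes at least two rows and non-zero total variance.
    """
    n = len(rows)
    mean_a = sum(a for a, _ in rows) / n
    mean_b = sum(b for _, b in rows) / n
    centered = [(a - mean_a, b - mean_b) for a, b in rows]

    # Entries of the 2x2 sample covariance matrix [[s_aa, s_ab], [s_ab, s_bb]].
    s_aa = sum(a * a for a, _ in centered) / (n - 1)
    s_bb = sum(b * b for _, b in centered) / (n - 1)
    s_ab = sum(a * b for a, b in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (s_aa + s_bb) / 2
    spread = math.sqrt(((s_aa - s_bb) / 2) ** 2 + s_ab ** 2)
    largest, smallest = half_trace + spread, half_trace - spread

    # Eigenvector belonging to the largest eigenvalue.
    if s_ab != 0:
        direction = (largest - s_bb, s_ab)
    elif s_aa >= s_bb:
        direction = (1.0, 0.0)
    else:
        direction = (0.0, 1.0)
    length = math.hypot(*direction)
    direction = (direction[0] / length, direction[1] / length)

    # Project each centered row onto the principal direction.
    scores = [a * direction[0] + b * direction[1] for a, b in centered]
    explained_ratio = largest / (largest + smallest)
    return direction, scores, explained_ratio


# Hypothetical readings for two correlated clinical parameters (not from the paper).
readings = [(5.1, 120.0), (5.6, 128.0), (6.2, 141.0), (6.9, 150.0), (7.4, 163.0)]
direction, scores, explained_ratio = first_principal_component(readings)
print("direction:", direction)
print("scores:", scores)
print("explained variance ratio:", round(explained_ratio, 4))
```

Each record is reduced from two values to a single score, and the explained variance ratio indicates how much of the original variability that score retains. In a full pipeline of the kind the abstract describes, the reduced features would then be passed to a gradient-boosted classifier. With many features, a library routine such as scikit-learn's `PCA` followed by xgboost's `XGBClassifier` would be the usual choice, although the abstract does not state which implementation was used.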
