Big Data Mining for Chronic Disease Prediction using Principal Component Analysis and eXtreme Gradient Boosting
DOI: https://doi.org/10.34293/gkijaret.v1i1.2024.1
Keywords: Big Data, Data Mining, Disease Detection, Chronic Disease, Machine Learning
Abstract
Big data mining has transformed health care by applying advanced analytics to very large data sets, substantially improving disease detection and management. This paper proposes an approach to chronic disease detection that combines PCA-based feature selection with an XGB classifier. PCA projects complex medical data, such as biomarkers and clinical parameters, onto a lower-dimensional set of principal components that capture the essential variability while discarding less important information. Reducing the dimensionality in this way lets the analysis focus on the most important features, making it more efficient and accurate. After the features are extracted with PCA, XGB performs the classification that identifies and predicts chronic diseases. The boosting algorithm in XGB can model complex relationships between data points, which enhances model performance. The approach achieved a precision of 98.8%, a recall of 98%, and an F1-score of 98.7%, which is sufficient to distinguish reliably between healthy and chronic conditions. The proposed model is also efficient, reducing processing time to just 8 seconds, which makes it practical for real-world application.
License
Copyright (c) 2024 GK International Journal of Advanced Research in Engineering and Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.