Statistical analysis of non-normal distributions in high-dimensional data


Noor Saeejil
Shaymaa Qasim Mohsin

Abstract

This study addresses the challenges of analyzing high-dimensional insurance data characterized by non-normal distributions. By employing advanced statistical techniques, including transformation methods and dimensionality reduction strategies like Principal Component Analysis (PCA), we normalize skewed variables and simplify complex datasets effectively. Generalized Linear Models (GLMs) are used to identify significant predictors of policy charges, revealing how demographic factors such as age, sex, BMI, and lifestyle choices influence premium costs. The findings support personalized pricing strategies that reflect individual risk profiles more accurately. This research contributes to the development of sophisticated statistical models capable of managing complex datasets in the insurance sector, enhancing both operational efficiency and customer satisfaction.
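The three steps named in the abstract (transforming skewed variables, reducing dimension with PCA, and fitting a GLM for charges) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' method or data: the dataset is synthetic, the log transform stands in for the unspecified "transformation methods", the GLM is assumed to be a Gamma model with log link and a single predictor, and the function names (`skewness`, `first_principal_component`, `fit_gamma_log_glm`) are hypothetical.

```python
import math
import random

def skewness(xs):
    """Sample skewness: third central moment over variance to the power 1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

def first_principal_component(rows, iters=200):
    """Leading eigenvector of the correlation matrix, found by power iteration."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) for j in range(p)]
    z = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    corr = [[sum(r[a] * r[b] for r in z) / n for b in range(p)] for a in range(p)]
    v = [1.0] * p
    for _ in range(iters):
        w = [sum(corr[a][b] * v[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[a] * sum(corr[a][b] * v[b] for b in range(p)) for a in range(p))
    # The trace of a correlation matrix is p, so eigenvalue / p is the variance share.
    return v, eigenvalue / p

def fit_gamma_log_glm(x, y, iters=25):
    """Gamma GLM with log link and one predictor, fitted by IRLS.

    For this family and link the IRLS weights are constant, so each
    iteration is ordinary least squares on the working response z.
    """
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b0, b1 = math.log(sum(y) / n), 0.0
    for _ in range(iters):
        eta = [b0 + b1 * xi for xi in x]
        z = [e + (yi - math.exp(e)) / math.exp(e) for e, yi in zip(eta, y)]
        zbar = sum(z) / n
        b1 = sum((xi - xbar) * (zi - zbar) for xi, zi in zip(x, z)) / sxx
        b0 = zbar - b1 * xbar
    return b0, b1

# Synthetic, illustrative data only (not the study's dataset).
rng = random.Random(0)
n = 2000
age = [rng.uniform(18, 64) for _ in range(n)]
bmi = [22 + 0.1 * a + rng.gauss(0, 4) for a in age]
smoker = [1.0 if rng.random() < 0.2 else 0.0 for _ in range(n)]
true_b0, true_b1, shape = 7.0, 0.03, 2.0  # assumed "true" coefficients
charges = [rng.gammavariate(shape, math.exp(true_b0 + true_b1 * a) / shape) for a in age]

skew_raw = skewness(charges)
skew_log = skewness([math.log(c) for c in charges])
pc1, pc1_share = first_principal_component(list(zip(age, bmi, smoker)))
b0_hat, b1_hat = fit_gamma_log_glm(age, charges)

print(f"skewness raw={skew_raw:.2f}, after log={skew_log:.2f}")
print(f"PC1 loadings={[round(v, 2) for v in pc1]}, variance share={pc1_share:.2f}")
print(f"Gamma GLM: intercept={b0_hat:.3f}, age coefficient={b1_hat:.4f}")
```

With a log link, the fitted age coefficient reads as an approximate proportional change in expected charges per year of age, which is the kind of interpretation that risk-based pricing relies on. The log transform shrinks the skewness in magnitude here but can overshoot into mild negative skew, so the transformed variable is worth checking before it is treated as normal.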

Article Details

How to Cite
Saeejil, N., & Mohsin, S. (2025). Statistical analysis of non-normal distributions in high-dimensional data. The Gulf Economist, 42(64), 239–272. Retrieved from https://tge.uobasrah.edu.iq/index.php/tge/article/view/178
Author Biographies

Noor Ali Saeejil, University of Basra / College of Administration and Economics

Shaymaa Qasim Mohsin, University of Basra / College of Administration and Economics
