Bulletin of Abai KazNPU, "Physical and Mathematical Sciences" series, No. 3(79), 2022
The presence of outliers can violate the statistical assumptions on which we intend to build the
model. It is therefore essential to identify outliers and understand their origin before applying
any kind of treatment. For example, outliers can be a valuable source of information in fraud
detection; as a result, it would be a bad idea to replace them with the mean or median value.
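
As an illustration of identifying outliers before deciding how to treat them, the following minimal sketch flags suspicious values without altering them. The interquartile-range (IQR) rule, the multiplier k = 1.5, the function name flag_outliers_iqr, and the sample amounts are assumptions chosen for illustration; the source does not prescribe a particular detection method.

```python
import statistics


def flag_outliers_iqr(values, k=1.5):
    """Mark values lying outside [Q1 - k*IQR, Q3 + k*IQR].

    The IQR rule and k = 1.5 are illustrative assumptions,
    not a method taken from the article.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]


# Hypothetical transaction amounts; one value is far from the rest.
amounts = [12.0, 15.5, 14.2, 13.8, 16.1, 15.0, 950.0, 14.7]
flags = flag_outliers_iqr(amounts)

# Keep the flagged values for inspection instead of overwriting them.
flagged = [a for a, is_out in zip(amounts, flags) if is_out]
print(flagged)
```

Flagging rather than replacing keeps the original values available for review, which matters when an extreme value may itself be the signal of interest.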
Data mining and data cleansing are considered mutually iterative steps. Data mining includes both
univariate and bivariate analysis and ranges from univariate statistics and frequency distributions
to correlations, cross-tabulations, and data analysis. A univariate exploratory data analysis is
shown in Figure 3.
Figure 3. EDA (one-dimensional view)
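
The univariate and bivariate checks mentioned above can be sketched with the Python standard library alone. The helper names (univariate_summary, pearson, crosstab) and the sample records are hypothetical and serve only to illustrate the kinds of summaries involved.

```python
from collections import Counter
import math
import statistics


def univariate_summary(values):
    """Basic univariate statistics for one numeric variable."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric variables."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def crosstab(rows, cols):
    """Cross-tabulation: counts for each (row, column) category pair."""
    return Counter(zip(rows, cols))


# Hypothetical records, invented for this example.
age = [25, 32, 47, 51, 38, 29]
income = [30, 42, 61, 68, 50, 35]
region = ["north", "south", "north", "south", "north", "north"]
defaulted = ["no", "no", "yes", "no", "no", "yes"]

summary = univariate_summary(age)      # univariate statistics
frequencies = Counter(region)          # frequency distribution
r = pearson(age, income)               # bivariate: correlation
table = crosstab(region, defaulted)    # bivariate: cross-tabulation
```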
After exploratory data analysis (EDA), the data is processed to improve its quality [10]. Data cleansing
requires good business understanding and data awareness so that the data can be correctly interpreted. It is
an iterative process designed to detect irregularities and to replace, modify, or remove them as needed. The
two main problems with dirty data are missing values and outliers; both can strongly affect the accuracy of
the model, so careful intervention is needed.
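
One possible treatment of missing values is sketched below: each gap is filled with the median of the observed values, and an indicator records which entries were filled. Median imputation, the function name impute_median, and the sample data are assumptions for illustration; the appropriate treatment depends on why the values are missing.

```python
import statistics


def impute_median(values):
    """Fill None entries with the median of the observed values.

    Returns the imputed list and a parallel list of booleans
    marking which positions were originally missing.
    """
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    imputed = [fill if v is None else v for v in values]
    was_missing = [v is None for v in values]
    return imputed, was_missing


# Hypothetical income column with two missing entries.
incomes = [42.0, None, 38.5, 51.0, None, 40.0]
imputed, was_missing = impute_median(incomes)
```

Keeping the indicator alongside the imputed column lets a later modelling step distinguish observed values from filled ones.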