Bulletin of Abai KazNPU, «Physical and Mathematical Sciences» series, No. 3 (79), 2022


Variable Transformations
Scorecard development describes how to turn the data into a scorecard model, assuming that data preparation and
the initial variable selection (filtering) are complete and that the filtered training dataset is available for
model building. The development process consists of four main parts: variable transformation, model training
using logistic regression, model testing, and scaling. Figure 4 illustrates the scorecard development process.
A standard scorecard model based on logistic regression is an additive model. Accordingly, specific
transformations of the variables are required.

Figure 4. The scorecard development process
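The additive structure and the scaling step can be illustrated with a minimal sketch. It assumes a commonly used scaling convention that the text does not spell out: the score is a linear function of the log-odds, fixed by a base score at given base odds and by the number of points that doubles the odds (PDO). The coefficients, variable names, and scaling parameters below are hypothetical and chosen only for illustration.

```python
import math

def scaling_params(base_score=600.0, base_odds=50.0, pdo=20.0):
    """Return (factor, offset) so that score = offset + factor * ln(odds).

    Assumed convention: base_score points at base_odds, and the odds
    double every pdo points. All three defaults are illustrative.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return factor, offset

def log_odds(intercept, coefs, x):
    """Additive logistic regression predictor: b0 + sum(b_i * x_i)."""
    return intercept + sum(coefs[name] * x[name] for name in coefs)

def score_points(intercept, coefs, x, factor, offset):
    """Split the total score into one additive contribution per variable.

    The intercept and the offset are spread evenly across the variables,
    so the per-variable points sum exactly to the total score.
    """
    n = len(coefs)
    return {
        name: (coefs[name] * x[name] + intercept / n) * factor + offset / n
        for name in coefs
    }

# Hypothetical fitted model and one applicant with already transformed inputs.
intercept = 1.5
coefs = {"age_t": 0.8, "income_t": 1.1}
applicant = {"age_t": 0.4, "income_t": -0.2}

factor, offset = scaling_params()
points = score_points(intercept, coefs, applicant, factor, offset)
total = sum(points.values())
print(points, round(total, 1))
```

Because the model is additive on the log-odds scale, the total score is simply the sum of the per-variable points, which is what makes a scorecard readable as a table of points.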

Training, Scaling and Evaluation of the Model
Model evaluation is the final step in the model building process. It consists of three distinct stages:
evaluation, testing, and acceptance.
The key metrics assessed are statistical measures, including model accuracy, complexity, error rate,
model fit statistics, variable statistics, significance values, and odds ratios [11].
The choice of test metric depends on the type of classifier. The most common metrics for binary
classification problems are the gains chart, the lift chart, the ROC curve, and the Kolmogorov-Smirnov
chart. The ROC curve is the most common tool for visualizing model performance. It is a multi-purpose
tool used for:
- applying the champion-challenger methodology to choose the best-performing model;
- testing the performance of the model on unseen data and comparing it with performance on the training data;
- choosing the optimal threshold, which maximizes the true positive rate while minimizing the false
positive rate.
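Of the charts listed above, the gains and lift charts are the simplest to compute by hand. The sketch below is one possible formulation, assuming that label 1 marks the event being predicted and that a higher score means a higher predicted probability of that event; the data are made up for illustration.

```python
def cumulative_gain_and_lift(y_true, scores, fraction):
    """Cumulative gain and lift for the top `fraction` of cases ranked by score.

    gain: share of all positives captured in the selected top group.
    lift: positive rate in the top group divided by the overall positive rate.
    """
    # Rank labels from highest to lowest score.
    ranked = [y for _, y in sorted(zip(scores, y_true),
                                   key=lambda pair: pair[0], reverse=True)]
    n_top = max(1, round(fraction * len(ranked)))
    captured = sum(ranked[:n_top])
    total_pos = sum(ranked)
    gain = captured / total_pos
    lift = (captured / n_top) / (total_pos / len(ranked))
    return gain, lift

# Illustrative labels (1 = event) and model scores.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

for fraction in (0.25, 0.5, 1.0):
    gain, lift = cumulative_gain_and_lift(y_true, scores, fraction)
    print(fraction, gain, lift)
```

A lift above 1 in the top-ranked groups means the model concentrates events there better than random selection would; at a fraction of 1.0 the lift is 1 by construction.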
The ROC curve is created by plotting sensitivity against the probability of a false alarm (the false
positive rate) at all possible thresholds. A desirable property of the ROC curve is that it shows
performance across the full range of thresholds. Different types of business problems call for
different thresholds, depending on the business strategy.
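The construction of the ROC curve can be sketched in a few lines. The example assumes that label 1 is the positive class and that a case is predicted positive when its score is at or above the threshold. The threshold rule shown (maximizing the difference between the true positive rate and the false positive rate) is only one possible choice and is an assumption here; in practice the threshold would follow the business strategy.

```python
def roc_points(y_true, scores):
    """Return a list of (threshold, fpr, tpr), one entry per distinct score.

    The first entry uses an infinite threshold, so nothing is predicted
    positive and the curve starts at (0, 0).
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(float("inf"), 0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= thr and y == 0)
        points.append((thr, fp / neg, tp / pos))
    return points

def best_threshold(points):
    """Pick the point with the largest tpr - fpr (an assumed criterion)."""
    return max(points, key=lambda p: p[2] - p[1])

# Illustrative labels (1 = positive) and model scores.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

points = roc_points(y_true, scores)
for thr, fpr, tpr in points:
    print(thr, fpr, tpr)
print("selected:", best_threshold(points))
```

Plotting the `fpr` values on the horizontal axis against the `tpr` values on the vertical axis gives the ROC curve; each point corresponds to one candidate threshold.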
The area under the ROC curve (AUC) is a useful measure of the predictive ability of a classifier.
In credit risk, an AUC of 0.75 or higher is considered the industry standard and a prerequisite
for model acceptance. Figure 5 shows the performance metrics of the model.
Figure 5. Performance indicators of the model
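The AUC can be computed without drawing the curve, using its equivalent rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below applies this to made-up data and checks the result against the 0.75 acceptance level mentioned above; the function and constant names are illustrative.

```python
ACCEPTANCE_AUC = 0.75  # acceptance level quoted in the text for credit risk

def auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Illustrative labels (1 = positive) and model scores.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]

model_auc = auc(y_true, scores)
print(model_auc, "accepted" if model_auc >= ACCEPTANCE_AUC else "rejected")
```

The same function can be applied to the training data and to unseen data separately; a large gap between the two values would suggest that the model does not generalize well.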

Acceptance of usefulness is the critical stage at which the data scientist has to sell the model back to
the business and “defend” it. The key evaluation criterion is the financial benefit of the model,
which is why benefit analysis is central to the presentation of the results. Data scientists should
make every effort to present the results concisely, so that the findings and conclusions are easy to
follow and understand. Failure to do so can lead to rejection of the model and, consequently, to
failure of the project.
