Инновации. Наука. Образование
For continuous placement problems, the algorithms developed so far use only the best-known and
most easily understood metrics and distance measures. The arsenal of models needs to be
expanded, and universal methods for solving problems with various distance measures need to be
constructed. In particular, there is an urgent need to improve methods for solving grouping
problems with large amounts of input data.
Deterministic algorithms for automatic grouping and placement have existed since the
middle of the 20th century. The computational complexity of these algorithms depends
exponentially on the size of the input, so their efficiency is determined by the amount of input
data, and their use is justified only for a relatively narrow range of problems. Nevertheless, the
accuracy and stability of the results they produce motivate the theoretical construction of
efficient algorithms for automatic grouping that retain this stability.
The clustering problem is a special case of the unsupervised learning problem. It reduces
to splitting a given set of data objects into subsets in such a way that the elements of each
subset differ significantly, in some set of properties, from the elements of all other subsets.
A data object is usually treated as a point in a multidimensional metric space, each dimension
of which corresponds to some property (attribute) of the object, and the metric is a function of
the values of these properties. The choice of the clustering algorithm and of the metric depends
on the types of the dimensions of this space, which can be either numerical or categorical,
because attributes of different types differ in nature.
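As an illustration of how a distance function can combine numerical and categorical attributes, the following minimal Python sketch uses a Gower-style dissimilarity: numerical differences are scaled by an attribute range, and a categorical attribute contributes 0 on a match and 1 otherwise. The function name `mixed_distance`, the `ranges` parameter, and the sample attributes are assumptions made for this illustration and do not come from the article. It is one common choice among many, not necessarily the one intended here.

```python
from typing import Sequence, Union

Value = Union[float, str]  # numerical or categorical attribute value


def mixed_distance(a: Sequence[Value], b: Sequence[Value],
                   ranges: Sequence[float]) -> float:
    """Gower-style dissimilarity between two objects with mixed attributes.

    ranges[i] is the assumed spread (max - min) of numerical attribute i;
    it is ignored for categorical attributes (pass 0.0 there).
    """
    if not (len(a) == len(b) == len(ranges)) or len(a) == 0:
        raise ValueError("objects and ranges must be non-empty and equally long")
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if isinstance(x, str) or isinstance(y, str):
            # Categorical attribute: simple matching (0 if equal, 1 otherwise).
            total += 0.0 if x == y else 1.0
        else:
            # Numerical attribute: absolute difference scaled by the range.
            total += abs(x - y) / r if r > 0 else 0.0
    # Average over attributes so the result lies in [0, 1]
    # when every numerical difference is within its range.
    return total / len(a)


# Hypothetical objects: (height in cm, weight in kg, colour).
obj_a = (170.0, 60.0, "red")
obj_b = (180.0, 80.0, "blue")
attr_ranges = (40.0, 50.0, 0.0)
d = mixed_distance(obj_a, obj_b, attr_ranges)
```

Scaling by the range keeps an attribute measured in large units from dominating the sum, which is one way to account for the differing nature of the attribute types.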
Most algorithms for these problems require the number of groups (clusters) to be specified
in advance. Determining this number is a separate, difficult problem that is solved using
various criteria. Another approach is to solve a series of problems, that is, a set of problems
that differ only in the number of groups (clusters) into which the objects are divided. The
results of solving these problems can then be analyzed using any criterion.
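The idea of solving a series of problems can be sketched in Python as follows. The sketch runs a plain k-means (Lloyd) procedure for each number of clusters in a given range and records the resulting sum of squared distances, which can then be examined with any criterion. The choice of k-means, the names `kmeans` and `solve_series`, and the sample points are assumptions made for this illustration. The article does not prescribe a particular algorithm here.

```python
import random
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: Sequence[Point], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[Point], float]:
    """Plain Lloyd iteration; returns the centers and the sum of squared errors."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centers = rng.sample(list(points), k)  # initial centers drawn from the data
    for _ in range(iters):
        # Assignment step: attach every point to its nearest center.
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, sse


def solve_series(points: Sequence[Point], k_values: Iterable[int],
                 seed: int = 0) -> Dict[int, float]:
    """Solve the same problem for each k and collect the objective values."""
    return {k: kmeans(points, k, seed=seed)[1] for k in k_values}


# Hypothetical data: two well-separated groups of three points each.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
series = solve_series(sample, range(1, 7))
```

The dictionary `series` maps each number of clusters to its objective value, so a criterion such as a sharp drop between consecutive values can be applied afterwards without rerunning the clustering.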
The clustering method is the way in which distances between clusters are calculated. The
following main clustering methods are distinguished:
− Between-groups linkage
− Within-groups linkage
− Nearest neighbor