This article presents a comparative analysis of methods for extracting knowledge from texts for the purpose of building ontologies. It reviews lexical, statistical, machine learning, deep learning, and ontology-oriented extraction approaches. The study concludes with recommendations for choosing the most effective methods depending on the specifics of the task and the type of data being processed.
Keywords: ontology, knowledge extraction, text classification, named entities, machine learning, semantic analysis, model
The article presents existing methods of data dimensionality reduction for training machine learning models of natural language. The concepts of text vectorization and word-form embedding are introduced. The text classification task is formulated, and the stages of classifier training are outlined. A neural network classifier is designed. A series of experiments is conducted to determine how reducing the dimensionality of word-form embeddings affects the quality of text classification. The evaluation results of the trained classifiers are compared.
Keywords: natural language processing, vectorization, word-form embedding, text classification, data dimensionality reduction, classifier
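To make the idea of reducing embedding dimensionality concrete, the following is a minimal sketch that uses principal component analysis (PCA). The abstract does not name the reduction methods studied, so the choice of PCA, the function names, and the toy vectors are all assumptions made for illustration only.

```python
import math


def center(rows):
    """Subtract the per-dimension mean from every vector."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix of already centered vectors."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_component(matrix, iterations=200):
    """Dominant eigenpair by power iteration.

    Caveat: the uniform start vector must not be orthogonal to the
    dominant eigenvector, otherwise the iteration cannot find it.
    """
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * mv[i] for i in range(d))
    return eigenvalue, v


def pca_reduce(rows, k):
    """Project vectors onto their first k principal components."""
    centered = center(rows)
    d = len(centered[0])
    cov = covariance(centered)
    components = []
    for _ in range(k):
        value, vec = top_component(cov)
        components.append(vec)
        # Deflation: remove the component just found from the matrix.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


# Hypothetical 3-dimensional "embeddings" that all lie on one line,
# so a single component preserves all of their variance.
toy_embeddings = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0],
                  [3.0, 6.0, 9.0], [4.0, 8.0, 12.0]]
reduced = pca_reduce(toy_embeddings, 1)
```

In practice a numerical library would replace the hand-written power iteration; the pure-Python form is used here only to keep the sketch self-contained.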
The article provides a brief description of existing methods for vectorizing natural-language texts. Evaluation by the word similarity method is described. A comparative analysis of several vectorization models is carried out. The process of selecting data for evaluation is described. The results of evaluating the models' performance are compared.
Keywords: natural language processing, vectorization, word-form embedding, semantic similarity, correlation
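Evaluation by word similarity, as mentioned in the abstract above, is commonly done by correlating a model's similarity scores for word pairs with human ratings. The sketch below assumes cosine similarity and Spearman rank correlation, neither of which the abstract specifies; the vectors, word pairs, and ratings are invented toy data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (dev_x * dev_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical 2-dimensional embeddings and human similarity ratings.
embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
human_ratings = {
    ("cat", "dog"): 9.0,
    ("car", "truck"): 8.5,
    ("dog", "truck"): 2.0,
    ("cat", "car"): 1.0,
}

pairs = list(human_ratings)
model_scores = [cosine(embeddings[a], embeddings[b]) for a, b in pairs]
human_scores = [human_ratings[p] for p in pairs]
rho = spearman(model_scores, human_scores)
```

A higher `rho` indicates that the model orders word pairs more like the human annotators do, which is why a rank-based coefficient is a natural fit for comparing several vectorizers on the same pair list.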
The article presents ways to improve the classification accuracy of normative reference information using hierarchical clustering algorithms.
Keywords: machine learning, artificial neural network, convolutional neural network, normative reference information, hierarchical clustering, DIANA
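DIANA, listed among the keywords above, is a divisive hierarchical clustering algorithm: it starts with all objects in one cluster and repeatedly splits the widest one. The following is a minimal sketch of that procedure over a precomputed dissimilarity matrix. How the article combines clustering with the classifier is not stated in the abstract, so the function names and toy data are purely illustrative.

```python
def average_dissimilarity(i, group, dist):
    """Mean dissimilarity from object i to the other members of group."""
    others = [j for j in group if j != i]
    if not others:
        return 0.0
    return sum(dist[i][j] for j in others) / len(others)


def diameter(cluster, dist):
    """Largest pairwise dissimilarity inside a cluster."""
    return max(dist[i][j] for i in cluster for j in cluster)


def diana_split(cluster, dist):
    """Split one cluster into (remaining, splinter) groups."""
    remaining = list(cluster)
    if len(remaining) < 2:
        return remaining, []
    # Seed the splinter group with the most "isolated" object.
    seed = max(remaining,
               key=lambda i: average_dissimilarity(i, remaining, dist))
    splinter = [seed]
    remaining.remove(seed)
    while len(remaining) > 1:
        best, best_gain = None, 0.0
        for i in remaining:
            to_remaining = average_dissimilarity(i, remaining, dist)
            to_splinter = sum(dist[i][j] for j in splinter) / len(splinter)
            gain = to_remaining - to_splinter
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no object is closer to the splinter group
            break
        splinter.append(best)
        remaining.remove(best)
    return remaining, splinter


def diana(n_items, dist, n_clusters):
    """Divide items 0..n_items-1 until n_clusters clusters exist."""
    clusters = [list(range(n_items))]
    while len(clusters) < n_clusters:
        widest = max(clusters, key=lambda c: diameter(c, dist))
        if diameter(widest, dist) == 0:
            break  # nothing left to separate
        remaining, splinter = diana_split(widest, dist)
        clusters.remove(widest)
        clusters.extend([remaining, splinter])
    return clusters


# Hypothetical one-dimensional objects; dissimilarity is absolute difference.
points = [0, 1, 2, 10, 11]
dist = [[abs(a - b) for b in points] for a in points]
two_clusters = diana(len(points), dist, 2)
```

Stopping at a fixed number of clusters is one simple design choice; a full DIANA run continues until every object is alone and records the whole hierarchy.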