Evaluation of the coherence of Polish texts using neural network models
Authors:
- Sergii Telenyk,
- Sergiy Pogorilyy,
- Artem Kramov
Abstract
Coherence evaluation of texts is a natural language processing task. Evaluating a text’s coherence means estimating its semantic and logical integrity; this property can be used in multidisciplinary applications (SEO analysis, medicine, detection of fake texts, etc.). In this paper, different state-of-the-art coherence evaluation methods based on machine learning models are analyzed. The effectiveness of different methods for estimating the coherence of Polish texts is investigated. The impact of a text’s features on the output coherence value is analyzed using different variants of a semantic similarity graph. Two neural networks, based on LSTM layers and on a pre-trained BERT model respectively, have been designed and trained to estimate the coherence of input texts. The results obtained may indicate that both lexical and semantic components should be taken into account when evaluating the coherence of Polish documents; moreover, it is advisable to analyze such documents sentence by sentence, taking word order into account. Based on the accuracy achieved by the proposed neural networks, the suggested models may be used to solve typical coherence estimation tasks for a Polish corpus.
- Record ID
- CUT210a40c7c0da4ace9efe403c700e9558
- Journal series
- Applied Sciences-Basel, ISSN 2076-3417, Monthly
- Issue year
- 2021
- Vol
- 11
- No
- 7
- Pages
- [1-15]
- Article number
- 3210
- Other elements of collation
- diagrams; tables; Bibliography (on pp.) - 14-15; Bibliography (number of items) - 28; Abstract designation - Abstr.; Journal numbering - Vol. 11, Iss. 7, Spec. Iss.
- Substantive notes
- Special Issue: Rich Linguistic Processing for Multilingual Text Mining
- Keywords in English
- natural language processing, coherence evaluation, BERT model, LSTM-based neural network, Polish language
- DOI
- DOI:10.3390/app11073210
- URL
- https://www.mdpi.com/2076-3417/11/7/3210
- Language
- eng (en) English
- License
- Score (nominal)
- 100
- Additional fields
- Indexed in: Web of Science, Scopus
- Uniform Resource Identifier
- https://cris.pk.edu.pl/info/article/CUT210a40c7c0da4ace9efe403c700e9558/
- URN
- urn:pkr-prod:CUT210a40c7c0da4ace9efe403c700e9558