Article information

2013, Volume 18, № 6, pp. 62-74

Zagoruiko N.G., Barakhnin V.B., Borisova I.A., Tkachev D.A.

Clusterization of text documents from the database of publications using FRiS-Tax algorithm

This paper describes a successful application of the FRiS-Tax algorithm, which is based on the function of rival similarity, to the clustering of text documents. For this type of task, the advantages of the algorithm over classical clustering algorithms are shown. Rules for choosing the weighting coefficients in the document similarity measure were found a posteriori. A way of using parallel computation in some steps of the FRiS algorithm to speed up text document clustering is proposed. Quantitative estimates of processing time are given, demonstrating the advantage of the parallel implementation at different stages of the program: both in the preliminary analysis of texts, including the calculation of similarity measures, and in some steps of the FRiS-Tax algorithm.
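The abstract names two computational ingredients: a pairwise similarity (distance) matrix over documents, whose rows can be computed independently and therefore in parallel, and the function of rival similarity (FRiS) on which FRiS-Tax is built. The following is a minimal sketch of both, under stated assumptions rather than the authors' implementation. Documents are hypothetical sparse term-weight dictionaries compared by cosine distance. FRiS is taken in its commonly published form F = (r2 - r1) / (r2 + r1), where r1 is the distance from an object to its nearest pillar and r2 the distance to its rival. When only one pillar exists, a hypothetical virtual rival at fixed distance `r_star` is assumed. The greedy pillar search (`select_pillars`) is a simplified illustration of the FRiS idea, not FRiS-Tax itself, and the paper's a posteriori weighting rules are not reproduced.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse term-weight dicts (hypothetical representation)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    if nu == 0.0 or nv == 0.0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (nu * nv)


def distance_matrix(docs, workers=4):
    """Compute the full distance matrix; each row is an independent task."""
    n = len(docs)

    def row(i):
        return [cosine_distance(docs[i], docs[j]) for j in range(n)]

    # Row-wise decomposition; executor.map returns rows in index order.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(row, range(n)))


def fris(r_own, r_rival):
    """Rival similarity in [-1, 1]: near 1 when much closer to its own pillar than to the rival."""
    if r_own + r_rival == 0.0:
        return 0.0
    return (r_rival - r_own) / (r_rival + r_own)


def mean_fris(D, pillars, r_star):
    """Average FRiS of all objects for a given set of pillar indices."""
    total = 0.0
    for z in range(len(D)):
        ds = sorted(D[z][p] for p in pillars)
        r_own = ds[0]
        # Assumption: with a single pillar, a virtual rival sits at distance r_star.
        r_rival = ds[1] if len(ds) > 1 else r_star
        total += fris(r_own, r_rival)
    return total / len(D)


def select_pillars(D, k, r_star=1.0):
    """Greedily add the pillar that maximises mean FRiS (simplified, hypothetical)."""
    pillars = []
    for _ in range(k):
        candidates = [c for c in range(len(D)) if c not in pillars]
        best = max(candidates, key=lambda c: mean_fris(D, pillars + [c], r_star))
        pillars.append(best)
    return pillars


def assign(D, pillars):
    """Label each object with the index of its nearest pillar."""
    return [min(pillars, key=lambda p: D[z][p]) for z in range(len(D))]


if __name__ == "__main__":
    docs = [
        {"cluster": 2, "algorithm": 1},
        {"cluster": 1, "algorithm": 1},
        {"protein": 2, "gene": 1},
        {"protein": 1, "gene": 1},
    ]
    D = distance_matrix(docs, workers=2)
    pillars = select_pillars(D, k=2)
    print(pillars, assign(D, pillars))
```

On the four toy documents, the two "cluster/algorithm" documents end up with one label and the two "protein/gene" documents with another. The thread pool only illustrates that rows of the similarity matrix are independent; in CPython, a real CPU-bound speedup would typically require processes (for example `ProcessPoolExecutor` with a module-level row function) or a compiled backend.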

Keywords: text documents clustering, parallel algorithm for clustering, FRiS-Tax algorithm

Author(s):
Zagoruiko Nikolay Grigoryevich
Dr., Professor
Position: Head of Laboratory
Office: Sobolev Institute of Mathematics SB RAS
Address: 630090, Russia, Novosibirsk, Acad. Koptyug avenue 4
Phone Office: (383)363 46 83
E-mail: zag@math.nsc.ru

Barakhnin Vladimir Borisovich
Dr., Associate Professor
Position: Leading research officer
Office: Federal Research Center for Information and Computational Technologies
Address: 630090, Russia, Novosibirsk, Acad. Lavrentiev avenue 6
Phone Office: (383) 330 78 26
E-mail: bar@ict.nsc.ru
SPIN-code: 1541-0448

Borisova Irina Artemovna
PhD.
Position: Senior Research Scientist
Office: Sobolev Institute of Mathematics SB RAS
Address: 630090, Russia, Novosibirsk, Acad. Koptyug avenue 4
Phone Office: (383)36-34-671
E-mail: biamia@mail.ru

Tkachev Dmitry Alexandrovich
Office: Institute of Computational Technologies SB RAS
Address: 630090, Russia, Novosibirsk, Acad. Lavrentiev avenue 6
Phone Office: (383) 33-07-826
E-mail: relk-tda@yandex.ru


Bibliography link:
Zagoruiko N.G., Barakhnin V.B., Borisova I.A., Tkachev D.A. Clusterization of text documents from the database of publications using FRiS-Tax algorithm // Computational technologies. 2013. V. 18. № 6. P. 62-74
ISSN 1560-7534
© 2024 FRC ICT