Publication information

Authors: Selivanova I.V., Ryabko B.Ya., Guskov A.E.
Title: Classification by Compression: Application of Information-Theory Methods for the Identification of Themes of Scientific Texts
Bibliographic reference: Selivanova I.V., Ryabko B.Ya., Guskov A.E. Classification by Compression: Application of Information-Theory Methods for the Identification of Themes of Scientific Texts // Automatic Documentation and Mathematical Linguistics. - 2017. - Vol. 51. - Iss. 3. - P. 120-126. - ISSN 0005-1055. - EISSN 1934-8371.
External systems: DOI: 10.3103/S0005105517030116; RSCI (РИНЦ): 32812733; WoS: 000409073600005
Abstract: A method for automatic classification of scientific texts based on data compression is proposed. The method is implemented and investigated using data from an archive of scientific texts (arXiv.org) and from the CyberLeninka scientific electronic library (CyberLeninka.ru). Experiments showed that the method correctly identified the themes of scientific texts with a probability of 75-95%; its accuracy depends on the quality of the original data.
Keywords: CyberLeninka; arXiv.org; text compression; information theory; thematic classification of texts; classification
Published: 2017
Physical description: pp. 120-126
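Illustration: the abstract describes theme identification based on data compression but gives no algorithmic detail. The sketch below shows one common form of the general idea, assuming a text is assigned to the theme whose reference corpus needs the fewest extra compressed bytes once the text is appended. The use of zlib, the names `compressed_size`, `classify`, and `TOY_CORPORA`, and the toy corpora themselves are assumptions made for illustration; the compressor and decision rule actually used in the paper are not stated in this record.

```python
import zlib
from typing import Mapping


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (maximum compression level)."""
    return len(zlib.compress(data, 9))


def classify(text: str, corpora: Mapping[str, str]) -> str:
    """Return the label whose corpus 'explains' the text best.

    For each label we measure how many extra compressed bytes are needed
    when the text is appended to that label's corpus. A text that shares
    vocabulary and phrasing with a corpus adds few bytes.
    This is a generic sketch, not the authors' exact procedure.
    """
    sample = text.encode("utf-8")

    def extra_cost(label: str) -> int:
        base = corpora[label].encode("utf-8")
        return compressed_size(base + sample) - compressed_size(base)

    return min(corpora, key=extra_cost)


# Hypothetical toy corpora; real experiments would use large text collections.
TOY_CORPORA = {
    "physics": (
        "the quantum field theory describes particle energy and momentum "
        "in the standard model of particle physics"
    ),
    "biology": (
        "the cell membrane contains protein channels and the gene expression "
        "is regulated by dna transcription factors"
    ),
}

if __name__ == "__main__":
    print(classify("particle energy and momentum in the standard model", TOY_CORPORA))
```

With corpora this small the result is only a demonstration; how well such a rule separates themes depends on the size and quality of the reference texts.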
Cited references:
1. A Frequent Concepts Based Document Clustering Algorithm
By: Baghel, R.; Dhir, D. R.
Int. J. Comput. Appl.  Volume: 4   Issue: 5   Pages: 6-12   Published: 2010
2. Frequent Term-based Text Clustering
By: Beil, F.; Ester, M.; Xu, X.
P 8 ACM SIGKDD INT C  Pages: 436-442   Published: 2002
3. Algorithmic clustering of music based on string compression
By: Cilibrasi, R; Vitanyi, P; de Wolf, R
COMPUTER MUSIC JOURNAL  Volume: 28   Issue: 4   Pages: 49-67   Published: DEC 2004
4. Clustering by compression
By: Cilibrasi, R; Vitanyi, PMB
IEEE TRANSACTIONS ON INFORMATION THEORY  Volume: 51   Issue: 4   Pages: 1523-1545   Published: APR 2005
5. A complex approach to the problem of determining the authorship of the text
By: Khmelev, D. V.
MEZHD K RUSSK YAZ IS  Pages: 426-427   Published: 2001
6. Some effective techniques for naive Bayes text classification
By: Kim, Sang-Bum; Han, Kyoung-Soo; Rim, Hae-Chang; et al.
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING  Volume: 18   Issue: 11   Pages: 1457-1466   Published: NOV 2006
7. Title: [not available]
By: KUKUSHKINA OV
PROBL PEREDACHI INF  Volume: 37   Pages: 96   Published: 2001
8. The similarity metric
By: Li, M; Chen, X; Li, X; et al.
IEEE TRANSACTIONS ON INFORMATION THEORY  Volume: 50   Issue: 12   Pages: 3250-3264   Published: DEC 2004
9. Title: [not available]
By: Li, M.; Vitanyi, P. M. B.
An Introduction to Kolmogorov Complexity and Its Applications  Pages: 637   Published: 1997
Publisher: Springer-Verlag, New York
10. Conditional Complexity of Compression for Authorship Attribution
By: Malyutov, M. B.; Wickramasinghe, C. I.; Li, S.
SFB 649 Discussion Paper No. 57  Pages: 38   Published: 2007
Publisher: Humboldt University, Berlin
11. Title: [not available]
By: Malyutov, M. B.
SPRINGER LECT NOTES  Volume: 4123   Pages: 362-380   Published: 2007
12. Classification of documents in vector space
By: Matyasko, A. A.; Khaustov, V. A.
INF TEKHN SIST 2012  Pages: 140-141   Published: 2012
13. Document clustering using character n-grams: A comparative evaluation with term-based and word-based clustering
By: Miao, Y.; Keselj, V.; Milios, E.
CIKM 05  Pages: 357-358   Published: 2005
14. Title: [not available]
By: Ryabko, B.; Astola, J.; Malyutov, M.
COMPRESSION BASED ME  Published: 2016
Publisher: Springer, New York
15. Graph clustering
By: Schaeffer, Satu Elisa
COMPUTER SCIENCE REVIEW  Volume: 1   Issue: 1   Pages: 27-64   Published: AUG 2007
16. Classification of texts with decision trees and neural networks of direct propagation
By: Shevelev, O. G.; Petrakov, A. V.
Vestn. Tomsk. Gos. Univ.  Volume: 290   Pages: 300-307   Published: 2006
17. A comparison among three neural networks for text classification
By: Wang, Zhan; He, Yifan; Jiang, Minghu
2006 8TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, VOLS 1-4  Book Series: International Conference on Signal Processing   Pages: 1883-+   Published: 2006