Article information

2016, Volume 21, No. 4, p. 3-15

Shokina N.Y., Mocken S.

A text mining system for creating electronic glossaries with application to research of Church Slavonic language

In this paper we present a text mining system for creating electronic glossaries, with application to research on the Church Slavonic language. Preprocessing, the core text mining operation (pattern discovery), and postprocessing (automatic and manual lemmatization) are described in detail. An original pattern recognition algorithm is presented for the initial discovery of text elements and the subsequent identification of the positions of tagged elements (or sets of elements) within a TEI XML entry. The application of software design principles to the creation of the software is briefly described.

Keywords: Digital humanities, text mining, pattern recognition, lemmatization, software engineering, software design, TEI XML, Church Slavonic

Author(s):
Shokina Nina Yurievna
PhD.
Position: Research Scientist
Office: Medical Center University of Freiburg
Address: 79106, Germany, Freiburg, Killianstrasse, 5a
Phone Office: (49761) 270 73930
E-mail: nina.shokina@uniklinik-freiburg.de
SPIN-code: 8680-7439

Mocken Susanne
PhD.
Position: Research Scientist
Office: Computing Center, Albert-Ludwigs-University Freiburg
Address: Germany, Freiburg, Killianstrasse, 5a
E-mail: susanne.mocken@rz.uni-freiburg.de

Bibliography link:
Shokina N.Y., Mocken S. A text mining system for creating electronic glossaries with application to research of Church Slavonic language // Computational technologies. 2016. V. 21. No. 4. P. 3-15
ISSN 1560-7534
© 2024 FRC ICT