Article information

2017, Volume 22, Special issue, p. 99-112

Ustalov D.A.

Concept discovery from synonymy graphs

This paper addresses the problem of automatic concept discovery from synonymy graphs. The purpose of the study is to reuse widely available semi-structured synonymy dictionaries for discovering concepts. To this end, Watset, a novel concept discovery method based on graph clustering, is proposed.

The method assumes that concepts form cliques in the input synonymy graph. Watset has three primary steps. First, it uses word sense induction to deal with ambiguous words. Second, it produces a disambiguated version of the input synonymy graph that represents the synonymy relations between particular word senses. Finally, it clusters this disambiguated graph to produce a set of clusters corresponding to the concepts. The overall time complexity of the method has been assessed and found to be proportional to the number of input words multiplied by the fourth power of the maximum degree of the input graph.
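The three steps can be illustrated with a minimal Python sketch. It is not the published implementation. The abstract does not name the clustering algorithm or the sense-matching measure, so the sketch substitutes connected components for graph clustering and a plain set-overlap count for sense matching. The function names `components` and `discover_concepts` and the toy edge list are hypothetical.

```python
from collections import defaultdict


def components(nodes, adjacency):
    """Connected components restricted to `nodes`.

    Assumption: used here only as a simple stand-in for a real
    graph clustering algorithm.
    """
    seen, result = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adjacency[node] & nodes:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        result.append(component)
    return result


def discover_concepts(edges):
    # Build the undirected synonymy graph (no self-loops assumed).
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)

    # Step 1: word sense induction. Cluster the neighbourhood of each
    # word (the word itself excluded); each cluster is one sense.
    contexts = {}
    senses_of = defaultdict(list)
    for word in sorted(graph):
        clusters = components(set(graph[word]), graph)
        for index, context in enumerate(clusters):
            contexts[(word, index)] = context
            senses_of[word].append((word, index))

    # Step 2: disambiguation. Link each sense to the best-matching
    # sense of every word in its context (overlap count is an assumption).
    sense_graph = defaultdict(set)
    for sense, context in contexts.items():
        target = context | {sense[0]}
        for neighbour in context:
            best = max(senses_of[neighbour],
                       key=lambda s: len(contexts[s] & target))
            sense_graph[sense].add(best)
            sense_graph[best].add(sense)

    # Step 3: cluster the sense graph and map senses back to words.
    sense_clusters = components(set(contexts), sense_graph)
    return [{word for word, _ in cluster} for cluster in sense_clusters]


# Toy synonymy graph in which "bank" is ambiguous.
edges = [
    ("bank", "shore"), ("bank", "riverside"), ("shore", "riverside"),
    ("bank", "lender"), ("bank", "depository"), ("lender", "depository"),
]
concepts = discover_concepts(edges)
print(concepts)
```

In this toy graph the ambiguous word "bank" ends up in two separate clusters, one per sense, which is the effect the disambiguation step is meant to achieve.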

A series of experiments was conducted to evaluate the performance of the proposed method. Watset outperformed four comparable state-of-the-art methods in terms of pairwise recall while remaining comparable in pairwise precision and pairwise F-score on two datasets derived from different Russian gold standards.
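The abstract does not spell out the evaluation procedure. The sketch below shows one common reading of pairwise precision, recall and F-score: every cluster is expanded into unordered word pairs, and the predicted pairs are compared with the gold-standard pairs. The function names and the toy data are hypothetical and do not reproduce the paper's datasets or results.

```python
from itertools import combinations


def pairs(clusters):
    """Expand each cluster into its set of unordered word pairs."""
    return {pair
            for cluster in clusters
            for pair in combinations(sorted(cluster), 2)}


def pairwise_scores(predicted, gold):
    """Return (precision, recall, F-score) over word pairs."""
    predicted_pairs, gold_pairs = pairs(predicted), pairs(gold)
    common = len(predicted_pairs & gold_pairs)
    precision = common / len(predicted_pairs) if predicted_pairs else 0.0
    recall = common / len(gold_pairs) if gold_pairs else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy data: four predicted pairs, three gold pairs, two in common.
predicted = [{"car", "auto", "vehicle"}, {"bank", "shore"}]
gold = [{"car", "auto"}, {"bank", "shore"}, {"bank", "lender"}]
precision, recall, f_score = pairwise_scores(predicted, gold)
print(precision, recall, f_score)
```

Because overlapping clusters are expanded into a set of pairs, a word that belongs to several concepts contributes pairs from each of them without being counted twice.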

The software implementing the proposed approach has been made publicly available for further use.

Keywords: ontology, computational semantics, word sense induction, graph clustering, concept, synset

Author(s):
Ustalov Dmitry Alekseevich
Position: Junior Research Scientist
Office: Krasovskii Institute of Mathematics and Mechanics, Ural Federal University
Address: 620990, Russia, Ekaterinburg, 16 Sofia Kovalevskaya Str.
Phone Office: (343) 362-81-63
E-mail: dau@imm.uran.ru
SPIN-code: 1435-1640

Bibliography link:
Ustalov D.A. Concept discovery from synonymy graphs // Computational technologies. 2017. V. 22. XVII All-Russian Conference of Young Scientists on Mathematical Modeling and Information Technology. P. 99-112
ISSN 1560-7534