Publication information

Individual authors: Zhizhimov O.L.
Title: Technology for Extracting Geographical Names from Text Documents Based on the PostgreSQL
Bibliographic reference: Zhizhimov O.L. Technology for Extracting Geographical Names from Text Documents Based on the PostgreSQL // Data Analytics and Management in Data Intensive Domains: XX International Conference DAMDID/RCDL'2018 (October 9-12, 2018, Moscow, Russia): Conference Proceedings / Edited by Leonid Kalinichenko, Yannis Manolopoulos, Sergey Stupnikov, Nikolay Skvortsov, Vladimir Sukhomlin. - 2018. - Moscow: FRC CSC RAS. - P. 203-206. - ISBN: 978-5-519-65438-8.
Abstract: Extracting geographical names from arbitrary text documents is important when processing large document collections and linking their content to specific geographic regions. In its simplest form, the extraction model is a sequence of operations on the text, with each stage solving its own subtask. These subtasks include: parsing the text; analyzing text elements; handling synonyms and abbreviations; reducing word forms to a normal form according to grammar rules; matching text elements against dictionaries of geographical names; and inserting special tags into the text so that geographical names are identified unambiguously. This work describes a technology that implements these tasks on top of the freely distributed PostgreSQL DBMS. It uses the standard configuration, and all server-side settings are made through documented procedures. The GeoNames gazetteer, OpenStreetMap (OSM) databases, and the OKATO and KLADR classifiers serve as authoritative sources of geographical names.
Keywords: geographical search; PostgreSQL; text processing; name extraction model; full-text search; geographical names
Published: 2018
Physical description: pp. 203-206
Conference: Title: XX International Conference "Data Analytics and Management in Data Intensive Domains"
Abbreviation: DAMDID/RCDL'2018
City: Moscow
Country: Russia
Dates: 2018-10-09 - 2018-10-12
Link: http://damdid2018.frccsc.ru/
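The abstract lists the extraction stages without implementation detail. The following Python sketch is a minimal, self-contained illustration of such a pipeline under stated assumptions; it is not the author's PostgreSQL-based implementation. In PostgreSQL, comparable stages are provided by a text search configuration, where a parser splits text into tokens and a chain of dictionaries (for example synonym, thesaurus, Ispell or Snowball dictionaries) maps tokens to normalized lexemes. Everything specific here is hypothetical: the `SYNONYMS` table, the crude suffix-stripping `normalize` function standing in for a real morphological dictionary, the four-entry `GAZETTEER` with made-up identifiers such as `geo:1`, and the `<geo id="...">` output tag. The point it shows is that gazetteer names and document text pass through the same normalization, so an inflected form such as "Новосибирске" matches the dictionary entry "Новосибирск".

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Stage 1 (parsing): a token is a run of word characters; hyphens split compounds.
TOKEN_RE = re.compile(r"\w+")

# Stage 2 (synonyms/abbreviations): hypothetical table, lowercased token -> words.
SYNONYMS = {"спб": ["санкт", "петербург"]}

# Stage 3 (normalization): a toy suffix stripper standing in for a real
# morphological dictionary. Longest suffixes are tried first.
SUFFIXES = sorted(["ами", "ого", "ому", "ий", "ем", "ом", "ой", "ах", "ям", "ам",
                   "е", "а", "у", "ы", "и"], key=len, reverse=True)
MIN_STEM = 3  # never cut a word below this many characters


def normalize(word: str) -> str:
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


@dataclass
class Lexeme:
    text: str   # normalized form
    start: int  # character offsets of the source token in the original text
    end: int


def lexemes(text: str) -> list[Lexeme]:
    """Run stages 1-3; an expanded abbreviation keeps its token's offsets."""
    result = []
    for m in TOKEN_RE.finditer(text):
        token = m.group().lower()
        for word in SYNONYMS.get(token, [token]):
            result.append(Lexeme(normalize(word), m.start(), m.end()))
    return result


# Stage 4 (dictionary): hypothetical gazetteer; real entries would come from
# GeoNames, OSM, OKATO or KLADR. The identifiers are made up.
GAZETTEER = {
    "Новосибирск": "geo:1",
    "Томск": "geo:2",
    "Санкт-Петербург": "geo:3",
    "Нижний Новгород": "geo:4",
}


def build_index(gazetteer: dict[str, str]) -> dict[tuple[str, ...], str]:
    # Names pass through the same stages as the text, so both sides compare equal.
    return {tuple(lx.text for lx in lexemes(name)): geo_id
            for name, geo_id in gazetteer.items()}


def tag_places(text: str, gazetteer: dict[str, str]) -> str:
    index = build_index(gazetteer)
    max_len = max(len(key) for key in index)
    lxs = lexemes(text)
    spans = []
    i = 0
    while i < len(lxs):
        # Greedy longest match: try the longest lexeme sequence first.
        for n in range(min(max_len, len(lxs) - i), 0, -1):
            key = tuple(lx.text for lx in lxs[i:i + n])
            if key in index:
                spans.append((lxs[i].start, lxs[i + n - 1].end, index[key]))
                i += n
                break
        else:
            i += 1
    # Stage 5 (tagging): insert tags right to left so earlier offsets stay valid.
    for start, end, geo_id in reversed(spans):
        text = f'{text[:start]}<geo id="{geo_id}">{text[start:end]}</geo>{text[end:]}'
    return text


SAMPLE = "Конференция прошла в Новосибирске, а доклад повторили в СПб и Нижнем Новгороде."
print(tag_places(SAMPLE, GAZETTEER))
# Конференция прошла в <geo id="geo:1">Новосибирске</geo>, а доклад повторили в <geo id="geo:3">СПб</geo> и <geo id="geo:4">Нижнем Новгороде</geo>.
```

Matching is greedy longest-match from left to right, so a multi-word name such as "Нижний Новгород" is preferred over any shorter entry starting at the same word, and the abbreviation "СПб" is tagged only because the synonym stage expanded it before lookup.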