Student | Yasith Ariyasena |
Supervisor | Aleksei Tepljakov |
Keywords | bibliographical data, data mining, Estonian research information system, PDF data extraction, research database |
Degree | MSc |
Thesis language | English |
Defense date | May 31, 2021 |
Document link | Download Thesis Document |
Bibliographic Data Mining from Estonian Research Information System
Abstract
Bibliographic metadata is a valuable resource because it is essential for analyzing research output and relations. Extracting it is challenging, however, because the data is often not available in a standard format. This research develops a system for mining bibliographic data from research publications listed in the Estonian Research Information System (ETIS). Since most publications in ETIS do not have a digital object identifier (DOI), one of the main objectives of the project is to generate bibliographic metadata in BibTeX format for every publication, so that it can be used with typesetting systems such as LaTeX to generate bibliographies quickly.
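As a rough illustration of the target output, the following sketch serializes one metadata record into a BibTeX entry. It is not the implementation from the thesis: the function names `make_key` and `to_bibtex`, the citation key scheme, and the sample record are all hypothetical.

```python
def make_key(author_surname: str, year: str, title: str) -> str:
    """Build a citation key such as 'tamm2019automatic' (hypothetical scheme)."""
    first_word = next((w for w in title.split() if w.isalpha()), "untitled")
    return f"{author_surname}{year}{first_word}".lower()


def to_bibtex(entry_type: str, key: str, fields: dict) -> str:
    """Serialize a metadata record as one BibTeX entry, skipping empty fields."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        if value:
            lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Invented sample record, for illustration only.
record = {
    "author": "Tamm, Mari and Kask, Jaan",
    "title": "Automatic Control of Example Systems",
    "journal": "Example Journal",
    "year": "2019",
    "doi": "",  # assumed case: no DOI is known, so the field is omitted
}
key = make_key("Tamm", record["year"], record["title"])
entry = to_bibtex("article", key, record)
print(entry)
```

Skipping empty fields keeps the entry valid for publications without a DOI, which the abstract identifies as the common case in ETIS.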
GROBID and Crossref are then used to parse the PDF files of the research papers and analyze their references in order to generate the BibTeX entries. The system integrates these functions into a web application. The output data is validated and tested with external tools to verify that the system works correctly.
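GROBID returns parsed documents as TEI XML. The sketch below shows one way the reference list could be read from such output, assuming the TEI has already been obtained from a GROBID server. The sample document is invented and heavily simplified, the function name `parse_references` is hypothetical, and the thesis does not state that its system handles this step in this way. A Crossref lookup is not shown.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented, simplified sample; real GROBID output contains many more elements.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><back><div type="references"><listBibl>
    <biblStruct xml:id="b0">
      <analytic>
        <title level="a" type="main">An Invented Paper Title</title>
        <author><persName>
          <forename type="first">Mari</forename><surname>Tamm</surname>
        </persName></author>
      </analytic>
      <monogr>
        <title level="j">Example Journal</title>
        <imprint><date type="published" when="2018"/></imprint>
      </monogr>
    </biblStruct>
  </listBibl></div></back></text>
</TEI>"""


def parse_references(tei_xml: str) -> list:
    """Return one dict of BibTeX-style fields per <biblStruct> in the TEI."""
    root = ET.fromstring(tei_xml)
    refs = []
    for bibl in root.iterfind(".//tei:listBibl/tei:biblStruct", TEI_NS):
        authors = []
        for pers in bibl.iterfind(".//tei:author/tei:persName", TEI_NS):
            surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS)
            forename = pers.findtext("tei:forename", default="", namespaces=TEI_NS)
            authors.append(f"{surname}, {forename}".strip(", "))
        date = bibl.find(".//tei:imprint/tei:date", TEI_NS)
        refs.append({
            # Only article-level titles are read here; books are not handled.
            "title": bibl.findtext("tei:analytic/tei:title", default="", namespaces=TEI_NS),
            "author": " and ".join(authors),
            "journal": bibl.findtext("tei:monogr/tei:title", default="", namespaces=TEI_NS),
            "year": (date.get("when", "") if date is not None else "")[:4],
        })
    return refs


refs = parse_references(SAMPLE_TEI)
print(refs)
```

Each resulting dictionary uses BibTeX field names, so it could be passed to a serializer that writes `.bib` entries.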
Project results
This work resulted in a solution for working with the ETIS database and .bib bibliography files.