Design and development of the database architecture for a bibliographic database

Student Awaiting assignment — project topic available
Supervisors Azer Ramazanli, Aleksei Tepljakov
Keywords Database architecture design, bibliographic database, RESTful API, SQL, NoSQL

This project is devoted to the design and development of a coherent database architecture for a bibliographic database (Wikipedia definition: a database of bibliographic records, an organized digital collection of references to published literature, including journal and newspaper articles, conference proceedings, reports, government and legal publications, patents, books, etc.), which will form part of a planned research assistant toolset. This section describes the tasks, expectations and requirements associated with this specific student project. Students who are interested in this topic and want to know more about how the end product is planned to be used are encouraged to also read the section “The Big Picture – Development of a research assistant toolset” for details.

Project Description

The project consists of two parts: the design of the database architecture, and the implementation of the actual bibliographic database together with an API for interacting with it. In other words, the end result of the project must include an efficiently designed 3-tier database architecture for a bibliographic database and an implementation of the first two layers of that architecture, namely the Data Layer and the Application Layer. As the initial step, the student must conduct research addressing questions including, but not limited to, the following:

  • How is the database architecture of existing academic databases designed, and why?
  • Are there particular nuances, specific to each data abstraction level (internal schema, conceptual schema, external schema), that must be considered when designing a bibliographic database?
  • Which database model (relational, object-oriented, graph, document-based, XML, or perhaps a hybrid) is most suitable, given the specific purpose the database will serve?
  • What are the common known challenges in developing and maintaining bibliographic databases, and what are the state-of-the-art solutions? (For example, one known challenge is the entity resolution problem referred to in the literature as “author name disambiguation”.)
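To make the author name disambiguation question more concrete, the following is a minimal sketch of one of the simplest baseline strategies: author mentions are grouped into “blocks” by a normalized name key, and mentions inside a block are linked when they share a co-author. This is an illustrative assumption, not a state-of-the-art method: the “Given Family” name format, the blocking key, the co-author overlap rule and the functions `name_key` and `disambiguate` are all hypothetical, and co-author names are compared verbatim without any normalization.

```python
import unicodedata
from collections import defaultdict


def name_key(full_name):
    """Blocking key: ASCII-folded family name plus first initial, e.g. 'smith_j'.

    Assumption: names are written 'Given ... Family' (a simplification).
    """
    # Decompose accented characters and drop the combining marks,
    # so 'Müller' and 'Muller' produce the same key.
    decomposed = unicodedata.normalize("NFKD", full_name)
    folded = "".join(c for c in decomposed if not unicodedata.combining(c))
    parts = folded.replace(".", " ").split()
    return f"{parts[-1].lower()}_{parts[0][0].lower()}"


def _find(parent, i):
    # Union-find root lookup with path halving.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def disambiguate(records):
    """Group author mentions into clusters that (hopefully) denote one person."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[name_key(rec["author"])].append(rec)

    clusters = []
    for block in blocks.values():
        parent = list(range(len(block)))
        # Link two mentions of the same block if they share at least one co-author.
        for i in range(len(block)):
            for j in range(i + 1, len(block)):
                if block[i]["coauthors"] & block[j]["coauthors"]:
                    parent[_find(parent, i)] = _find(parent, j)
        groups = defaultdict(list)
        for i, rec in enumerate(block):
            groups[_find(parent, i)].append(rec["id"])
        clusters.extend(sorted(ids) for ids in groups.values())
    return sorted(clusters)


# Hypothetical sample data: one record per (paper, author mention).
SAMPLE_RECORDS = [
    {"id": "p1", "author": "John Smith", "coauthors": {"A. Lee"}},
    {"id": "p2", "author": "J. Smith", "coauthors": {"A. Lee", "B. Kim"}},
    {"id": "p3", "author": "Jane Smith", "coauthors": {"C. Wu"}},
    {"id": "p4", "author": "José Müller", "coauthors": {"D. Roe"}},
]

if __name__ == "__main__":
    print(disambiguate(SAMPLE_RECORDS))  # [['p1', 'p2'], ['p3'], ['p4']]
```

Here “John Smith” and “Jane Smith” collide on the same key, and only the absence of shared co-authors keeps them apart. This is exactly the kind of weakness the literature addresses with richer evidence (affiliations, venues, topics) and learned similarity models.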

After completing the research, the student will be able to design a justified, coherent, purpose-specific architecture and to make the informed decisions necessary for implementing the database and the API.
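As an illustration of how the Data Layer and the Application Layer of a 3-tier architecture might be kept separate, here is a minimal sketch using Python's standard `sqlite3` module. The relational schema (tables `publication`, `author`, `authorship`, `citation`), the class `DataLayer`, the handler `get_publication` and the route `GET /publications/{doi}` are hypothetical choices made only for this example; which database model to use is precisely what the research phase is meant to decide. The handler deliberately returns a status code and a plain dictionary, so the same logic could later be mounted in Django, Flask or FastAPI.

```python
import json
import sqlite3

# Data tier: a hypothetical minimal relational schema (illustrative only).
SCHEMA = """
CREATE TABLE publication (
    id    INTEGER PRIMARY KEY,
    doi   TEXT UNIQUE,            -- stored lower-case; DOIs are case-insensitive
    title TEXT NOT NULL,
    year  INTEGER
);
CREATE TABLE author (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE authorship (          -- many-to-many link that keeps author order
    publication_id INTEGER REFERENCES publication(id),
    author_id      INTEGER REFERENCES author(id),
    position       INTEGER NOT NULL,
    PRIMARY KEY (publication_id, author_id)
);
CREATE TABLE citation (            -- directed edge: citing -> cited
    citing_id INTEGER REFERENCES publication(id),
    cited_id  INTEGER REFERENCES publication(id),
    PRIMARY KEY (citing_id, cited_id)
);
"""


class DataLayer:
    """Owns the database connection; all SQL lives in this tier."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.executescript(SCHEMA)

    def add_publication(self, doi, title, year, authors):
        cur = self.conn.execute(
            "INSERT INTO publication (doi, title, year) VALUES (?, ?, ?)",
            (doi.lower(), title, year))
        pub_id = cur.lastrowid
        for position, name in enumerate(authors, start=1):
            # Naive: every name becomes a new author row (no disambiguation).
            author_id = self.conn.execute(
                "INSERT INTO author (name) VALUES (?)", (name,)).lastrowid
            self.conn.execute("INSERT INTO authorship VALUES (?, ?, ?)",
                              (pub_id, author_id, position))
        return pub_id

    def add_citation(self, citing_id, cited_id):
        self.conn.execute("INSERT INTO citation VALUES (?, ?)", (citing_id, cited_id))

    def publication_by_doi(self, doi):
        return self.conn.execute(
            "SELECT id, doi, title, year FROM publication WHERE doi = ?",
            (doi.lower(),)).fetchone()

    def authors_of(self, pub_id):
        rows = self.conn.execute(
            "SELECT a.name FROM authorship s JOIN author a ON a.id = s.author_id "
            "WHERE s.publication_id = ? ORDER BY s.position", (pub_id,)).fetchall()
        return [name for (name,) in rows]

    def cited_by_count(self, pub_id):
        return self.conn.execute(
            "SELECT COUNT(*) FROM citation WHERE cited_id = ?", (pub_id,)).fetchone()[0]


# Application tier: framework-independent handler that a REST route such as
# the hypothetical GET /publications/{doi} could delegate to.
def get_publication(data, doi):
    row = data.publication_by_doi(doi)
    if row is None:
        return 404, {"error": "publication not found"}
    pub_id, stored_doi, title, year = row
    return 200, {"doi": stored_doi, "title": title, "year": year,
                 "authors": data.authors_of(pub_id),
                 "cited_by": data.cited_by_count(pub_id)}


if __name__ == "__main__":
    db = DataLayer()
    first = db.add_publication("10.1234/demo.1", "Example A", 2020, ["Author One"])
    second = db.add_publication("10.1234/demo.2", "Example B", 2021, ["Author Two"])
    db.add_citation(second, first)
    status, body = get_publication(db, "10.1234/DEMO.1")
    print(status, json.dumps(body))
```

Keeping all SQL inside the data tier means the storage engine, or even the database model, could change without touching the application tier, as long as the method signatures stay the same.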

Requirements

  • Interest in conducting a literature review and research on the specified topics.
  • Ability and desire to work on the development of complex database architectures.
  • Experience with relational (SQL) and/or document-based (NoSQL) databases.
  • Knowledge of and experience with Python 3.
  • Familiarity with RESTful architectures.
  • Experience with Django, Flask or FastAPI.

Please note that meeting all of the requirements listed above is sufficient but not necessary for taking this topic. Note also that the student taking this topic will be expected to have submitted a conference paper, based on the conducted research and development, before the thesis defense.


The Big Picture – Development of a research assistant toolset

Project Description

This project is devoted to the development of an assistant program that helps researchers automate and accelerate the academic research process and discover new research insights. The program, which will be developed as a suite of software tools, is planned to consist of the following components:

  • Bibliographic database, housed on a server installation and built upon data retrieved from academic search engines, DOI registration agencies and open-access research paper aggregators.
  • Reference management tool (Wikipedia definition: software for scholars and authors to use for recording and utilizing bibliographic citations as well as managing project references either as a company or an individual), comprising:
    • Facility for importing the details of publications from our bibliographic database and external sources.
    • User-side database architecture in which full bibliographic references can be entered and recorded.
    • Built-in citation generator producing selected lists of articles (bibliographies) formatted according to established guidelines and standards.
  • Embedded web browser adding the following functionality:
    • Academic search engine integrated with our own bibliographic database and external scholarly databases and search engines.
    • Quick access to full-text articles (when available as open access) and to scholar profiles in external academic sources.
  • Integrated document processor, preferably based on the LaTeX typesetting system.
  • Integrated Markdown editor, preferably with syntax highlighting, cross-referencing, live preview and auto-completion.
  • Tool for constructing and visualizing bibliometric networks (such as publication citation networks and academic social networks) based on citation, co-citation, affiliation and co-authorship relations. Nodes in the generated graphs may represent researchers, publications, publishing venues, conferences, research institutes and universities, and even distinct geographical regions and states.
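The co-citation and co-authorship relations mentioned above can both be derived with the same pair-counting step: two papers are co-cited once for every paper that cites both, and two researchers are linked once for every paper they write together. The following minimal sketch assumes the input is already available as simple Python mappings; the function `co_occurrence_weights` and the sample identifiers are hypothetical. The resulting weighted edge list could then be loaded into any graph library for visualization.

```python
from collections import Counter
from itertools import combinations


def co_occurrence_weights(groups):
    """Count how often each unordered pair of items appears together in a group.

    With groups = reference lists of citing papers, the pairs are co-cited
    papers and the counts are co-citation strengths. With groups = author
    lists of papers, the pairs are co-authors weighted by joint papers.
    """
    weights = Counter()
    for group in groups:
        # sorted(set(...)) gives each pair one canonical (a, b) orientation
        # and ignores duplicate entries within a single group.
        for a, b in combinations(sorted(set(group)), 2):
            weights[(a, b)] += 1
    return weights


# Hypothetical sample data.
references = {            # citing paper -> papers it cites
    "P1": ["A", "B", "C"],
    "P2": ["A", "B"],
    "P3": ["B", "C"],
}
authorship = {            # paper -> its authors
    "A": ["Lee", "Kim"],
    "B": ["Kim", "Lee", "Wu"],
}

if __name__ == "__main__":
    print(co_occurrence_weights(references.values()))
    print(co_occurrence_weights(authorship.values()))
```

The same function would also serve term and concept co-occurrence graphs, with the terms extracted from each document as one group.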


As the end result, the developed toolset is intended to provide researchers with the following features:

  • Access to a voluminous academic database composed of:
    • Metadata of many millions of publications.
    • Hundreds of millions of citation links.
    • Data on many millions of researchers.
    • Data about publishing venues, grants, patents, research concepts, etc.
    • Full-texts of open-access scientific documents.
  • Academic metrics for evaluating publications, venues, conferences, institutes and researchers.
  • Retrieving the semantic profile generated for a requested researcher, including their list of scientific works, contact information, citation statistics, academic efficiency evaluation, affiliations, research interests, personal academic social graph, etc.
  • Discovering top experts and rising stars in specific research areas and concepts.
  • Discovering the most-cited papers and the most active scientists associated with specific publication venues and conferences.
  • Constructing, visualizing and analyzing bibliometric graphs based on search queries.
  • Text-mining-based co-occurrence graphs for terms and concepts extracted from the body of scientific literature.
  • Importing references from files in different reference formats or directly from the embedded web browser.
  • Autocompleting missing metadata, or retrieving complete metadata, of bibliographic entities based on their DOI, ISBN or other associated identifiers in academic sources.
  • Configurable reference types and metadata fields.
  • Highly-customizable searching, filtering, tagging, ranking and note-taking features.
  • Quick-access and linking for open-access full-text articles.
  • Summarization tool to accelerate the literature review process.
  • Storing, loading and managing research in hierarchically organized local document-based databases.
  • Generating and exporting citations and bibliographies automatically formatted in over 10,000 citation styles.
  • Quickly and effectively noting research ideas with the integrated Markdown editor.
  • Cite-while-you-write functionality with the integrated document processor.
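Autocompleting metadata from a DOI or ISBN presupposes that identifiers typed or pasted by users are first reduced to a canonical lookup key. The following is a minimal sketch of that normalization step under stated assumptions: the regular expression is a simplification that matches common modern DOIs but not every registered one, the ISBN function expects a bare ISBN (hyphens and spaces allowed), ISBN-10 values are converted to ISBN-13 with the 978 prefix so that both forms share one key, and the function names are hypothetical. Querying an external registration agency with the normalized key is not shown.

```python
import re

# Simplified DOI pattern: "10." + registrant code + "/" + suffix.
# Assumption: covers common modern DOIs, not every registered DOI.
DOI_PATTERN = re.compile(r"10\.\d{4,9}(?:\.\d+)*/\S+")
DIGITS = "0123456789"


def normalize_doi(raw):
    """Extract a DOI from input such as 'https://doi.org/10.1000/XYZ' or
    'doi:10.1000/xyz' and lower-case it (DOIs are case-insensitive)."""
    match = DOI_PATTERN.search(raw.strip())
    return match.group(0).lower() if match else None


def _isbn13_check_digit(first12):
    # ISBN-13 weights alternate 1, 3, 1, 3, ...
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw):
    """Return a validated ISBN-13 string, or None if the input is invalid."""
    chars = [c for c in raw.upper() if c in DIGITS or c == "X"]
    if len(chars) == 13 and "X" not in chars:
        isbn = "".join(chars)
        return isbn if _isbn13_check_digit(isbn[:12]) == isbn[12] else None
    if len(chars) == 10 and "X" not in chars[:9]:
        # ISBN-10 weights run 10, 9, ..., 1; a final 'X' stands for 10.
        values = [10 if c == "X" else int(c) for c in chars]
        if sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 != 0:
            return None
        first12 = "978" + "".join(chars[:9])
        return first12 + _isbn13_check_digit(first12)
    return None


if __name__ == "__main__":
    print(normalize_doi("https://doi.org/10.1000/XYZ123"))  # 10.1000/xyz123
    print(normalize_isbn("0-306-40615-2"))                  # 9780306406157
```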
