What is an ontology?

Source Persagen.com
Author Dr. Victoria A. Stuart, Ph.D.
Created 2021-04-16
Modified
Summary Glossary of key terms for Persagen.com

Background

An entity is something that exists as itself, as a subject or as an object, actually or potentially, concretely or abstractly, physically or not. It need not be of material existence.

A named entity is a real-world object - such as a person, location, organization, product, etc. - that can be denoted with a proper name. It can be abstract or have a physical existence. Examples of named entities include Barack Obama, New York City, Volkswagen Golf, or anything else that can be named. Named entities can simply be viewed as entity instances (e.g., New York City is an instance of a city).

Ontology is the philosophical study of being, as well as related concepts such as existence, becoming, and reality. Ontology addresses questions like how entities are grouped into categories, and which of these entities exist on the most fundamental level. Ontologists often try to determine what the categories or highest kinds are and how they form a system of categories that encompasses the classification of all entities. Ontology is sometimes referred to as the science of being and belongs to the major branch of philosophy known as metaphysics.

In computer science and information science, an ontology encompasses a representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains of discourse. More simply, an ontology is a way of showing the properties of a subject area and how they are related, by defining a set of concepts and categories that represent the subject.

Simply put, an ontology lets administrators and users manage knowledge through a structured categorization of the underlying data.

Formal definitions of ontology

The definition of ontology most frequently quoted in the semantic web literature is that provided by Tom Gruber, who defined an ontology as "an explicit specification of a conceptualization of a domain of interest."

  • Gruber, T. (2009) "Ontology." In: Encyclopedia of Database Systems, Ling Liu and M. Tamer Özsu (Eds.), Springer-Verlag.

Swartout et al. defined an ontology as "a hierarchically structured set of terms for describing a domain that can be used as a skeletal foundation for a knowledge base."

  • Swartout, B., R. Patil, K. Knight and T. Russ (1997) "Towards Distributed Use of Large-Scale Ontologies." Spring Symposium Series on Ontological Engineering. Stanford University, California. pp. 138-148.

The problem of too much data

    When you run a search, you may be overwhelmed by too many results. An ontology lets you facet those results: organizing them by category and narrowing them to the material you want.

    For example, you may be interested in Persagen.com entries for Michael Levin, an American philosopher and writer at City University of New York,

    ... not Michael Levin, an American developmental and synthetic biologist at Tufts University.

    Similarly, you may wish to focus on LGBT rights groups in Canada,

    ... not the United States.
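As a rough illustration of faceting, the sketch below filters a small in-memory list of entries by selected facet values. The two people come from the example above, but the field names (`occupation`, `affiliation`) and the helper functions are assumptions made for this sketch, not the Persagen.com schema.

```python
from collections import Counter

# Hypothetical entries: the names come from the text above, but the field
# names ("occupation", "affiliation") are assumptions for this sketch.
ENTRIES = [
    {"name": "Michael Levin", "occupation": "philosopher",
     "affiliation": "City University of New York"},
    {"name": "Michael Levin", "occupation": "biologist",
     "affiliation": "Tufts University"},
]


def facet_counts(entries, field):
    """Count how many entries fall under each value of one facet field."""
    return Counter(entry[field] for entry in entries)


def apply_facets(entries, **selected):
    """Keep only the entries whose fields match every selected facet value."""
    return [
        entry for entry in entries
        if all(entry.get(field) == value for field, value in selected.items())
    ]


# A name search alone is ambiguous; one facet value resolves it.
hits = apply_facets(ENTRIES, name="Michael Levin")
narrowed = apply_facets(hits, occupation="philosopher")
print(facet_counts(hits, "occupation"))
print(narrowed[0]["affiliation"])
```

The facet counts show the reader which categories the ambiguous results fall into, and selecting one category reduces the result set to the intended entry.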

    Eureka! - the Persagen ontology

    Eureka, from the Ancient Greek εὕρηκα - "I have found (it)" - is an interjection used to celebrate a discovery or invention.

    The Persagen ontology, Eureka!, is used to classify entities in a grounded hierarchical data structure. "Grounded" means that all entries stem from a common root (ROOT) and extend through the nested classification to the LEAF nodes, each of which is an entity (an idea or concept, a thing, or a named entity). This data structure provides several key attributes.

  • Anything in the universe may be easily and definitively categorized.

  • Being grounded, the relationship among entities is readily apparent.

  • Named entities may be uniquely identified, and thus disambiguated for both humans and machines (machine learning, especially natural language processing).

  • A greater understanding of a domain may be attained through the examination of similar (locally categorized) entities in the ontology. Ontologies facilitate the discovery and visualization of relationships not previously recognized or understood.

  • While some entities may appear under two or more ontological classifications, cross-referencing among ontology entries again facilitates broader understanding of subject matter.

    Here is an example of a grounded ontological structure (illustration only - see also the D3.js visualization (demo), following).
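As a rough stand-in for such a structure, the sketch below models a grounded hierarchy as nested Python dictionaries. The category names are invented for illustration and are not taken from Eureka!; only the ROOT-to-LEAF idea and the two people named earlier come from the text.

```python
# Hypothetical fragment: the category names are illustrative only and are
# not taken from Eureka!. An empty dict marks a LEAF node.
TREE = {
    "ROOT": {
        "Nature": {
            "Biology": {"Michael Levin (biologist)": {}},
        },
        "Society": {
            "Philosophy": {"Michael Levin (philosopher)": {}},
        },
    },
}


def leaf_paths(tree, prefix=()):
    """Yield the full path from the root to every leaf node."""
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


def common_ancestor(path_a, path_b):
    """Return the deepest node shared by two root-to-leaf paths."""
    shared = None
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared = a
    return shared


paths = list(leaf_paths(TREE))
for path in paths:
    print(" > ".join(path))
print(common_ancestor(paths[0], paths[1]))
```

Because every path starts at ROOT, any two entities share at least one ancestor, and the full path serves as a unique identifier even when two leaves carry similar names.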

    Eureka! - Data structure

    Eureka! is a living document, updated multiple times daily in both form and content.

    • The plain-text version of Eureka! is available here. Note:

    For example, here is a representative listing (entries here edited for brevity).

    Eureka! currently (2022-07) consists of approximately 17,330 lines (entries). While it is easy to manage Eureka! in Vim (text editor) as a flat file, a flat file is not the ideal data structure for these data. The ideal web- and JavaScript-friendly data structure is JSON in some form (possibly JSONB, in PostgreSQL). JSON also allows the facile embedding of metadata, which is used extensively at Persagen.com for data annotation, information retrieval, and processing. JSON also facilitates the incorporation of relationships (e.g., parent-child nodes) and the representation of hyperdimensional data (analogous to mathematical tensors, the basis of Google's TensorFlow).
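To make the flat-file versus JSON trade-off concrete, here is a minimal sketch that converts an indented listing into nested JSON with a slot for metadata and explicit parent-child links. The real Eureka! line format is not shown here, so the two-space indentation convention, the sample entries, and the `name` / `meta` / `children` keys are all assumptions.

```python
import json

# Hypothetical flat-file excerpt. The real Eureka! line format is not shown
# in this document; two spaces of indentation per level is an assumption.
FLAT = """\
ROOT
  Society
    Philosophy
    Politics
  Nature
    Biology
"""


def parse_flat(text, indent=2):
    """Convert an indented listing into nested {name, meta, children} nodes.

    Assumes well-formed input: a single root line, and no line indented
    more than one level deeper than the line before it.
    """
    root = None
    stack = []  # stack[d] is the most recent node seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue
        depth = (len(line) - len(line.lstrip(" "))) // indent
        node = {"name": line.strip(), "meta": {}, "children": []}
        if depth == 0:
            root = node
        else:
            stack[depth - 1]["children"].append(node)
        del stack[depth:]  # drop nodes that are no longer ancestors
        stack.append(node)
    return root


tree = parse_flat(FLAT)
print(json.dumps(tree, indent=2))
```

The empty `meta` dictionary on each node is where annotations could be embedded, which a plain indented line cannot carry without inventing extra syntax.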

    The downside is that JSON is considerably more difficult to edit and interact with manually. Resolving this technical challenge is a key focus of Persagen's data engineering. In the meantime, indexing Eureka! in Apache Solr provides a facile user interface for querying those data.
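For orientation, a faceted Solr query can be expressed as a plain URL. The sketch below only builds such a URL and sends nothing over the network. The host, the core name `eureka`, and the field names `name` and `category` are hypothetical; the parameter names (`q`, `fq`, `facet`, `facet.field`, `wt`) are standard Solr query parameters.

```python
from urllib.parse import urlencode

# Assumptions: the host, the core name "eureka", and the field names
# "name" and "category" are hypothetical. The parameter names themselves
# (q, fq, facet, facet.field, wt) are standard Solr query parameters.
SOLR_SELECT = "http://localhost:8983/solr/eureka/select"


def build_query_url(text, category=None, facet_field="category"):
    """Build a Solr select URL that searches by name and facets by category."""
    params = [
        ("q", f'name:"{text}"'),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("wt", "json"),
    ]
    if category is not None:
        # A filter query narrows the results without affecting scoring.
        params.append(("fq", f'category:"{category}"'))
    return f"{SOLR_SELECT}?{urlencode(params)}"


broad_url = build_query_url("Michael Levin")
narrow_url = build_query_url("Michael Levin", category="Philosophy")
print(broad_url)
print(narrow_url)
```

The first URL asks for all matches plus per-category counts; the second adds a filter query to keep only one category.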

    Eureka! - Data visualization

    An earlier draft version of Persagen explored a D3.js visualization of Eureka!. Noting the challenges above and the need to press forward on other areas of development (Persagen is a solo effort), JavaScript (JSON)-based visualizations of Eureka! await further study and exploration.

    [demo] Ontology D3.js visualization with search, pan, zoom

    Alternatives to the D3.js visualization above include displaying those data as a relation graph (nodes plus edges). Platforms under consideration for that approach include NetworkX, Cytoscape (possibly Cytoscape.js), tensors / TensorFlow TensorBoard, and custom solutions.
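A relation graph needs only a node list and an edge list. As a minimal sketch using the standard library alone, the function below flattens a nested hierarchy into those two lists; the sample categories are invented for illustration and the function name is hypothetical.

```python
# Hypothetical nested fragment; the category names are illustrative only.
HIERARCHY = {
    "ROOT": {
        "Society": {"Philosophy": {}, "Politics": {}},
        "Nature": {"Biology": {}},
    },
}


def to_graph(tree):
    """Flatten a nested dict into a node list and a (parent, child) edge list.

    Nodes are visited breadth-first. Node names are used as identifiers,
    so this sketch assumes every name in the hierarchy is unique.
    """
    nodes, edges = [], []
    pending = [(None, name, children) for name, children in tree.items()]
    while pending:
        parent, name, children = pending.pop(0)
        nodes.append(name)
        if parent is not None:
            edges.append((parent, name))
        pending.extend((name, child, sub) for child, sub in children.items())
    return nodes, edges


nodes, edges = to_graph(HIERARCHY)
print(nodes)
print(edges)
```

An edge list in this `(parent, child)` form could then be handed to a graph library, for example through NetworkX's `DiGraph.add_edges_from`. Entities that appear under more than one classification would need unique identifiers rather than bare names, at which point the structure becomes a general graph instead of a tree.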