Abstract

Ontologies have gained considerable popularity and recognition in the semantic web because of their extensive use in Internet-based applications. Ontologies are often considered a key source of semantics and interoperability in artificially intelligent systems. The exponential increase in unstructured data on the web has made automated acquisition of ontologies from unstructured text a prominent research area. Several methodologies exploiting techniques from various fields (machine learning, text mining, knowledge representation and reasoning, information retrieval and natural language processing) have been proposed to bring some level of automation to the process of ontology acquisition from unstructured text. This paper describes the process of ontology learning, classifies ontology learning techniques into three classes (linguistic, statistical and logical) and discusses many algorithms under each category. It also explores ontology evaluation techniques by highlighting their pros and cons. Moreover, it describes the scope and use of ontology learning in several industries. Finally, the paper discusses the challenges of ontology learning along with corresponding future directions.

Introduction

At the start of the 21st century, with the advancement of technologies in different domains, unstructured data on the internet in the form of electronic news and scientific literature grew exponentially. However, the web at that time was neither consistent nor machine-interpretable. If one author wrote about some topic on one website, another author could provide contradictory information about the same topic on another website. In other words, the web was disconnected, inconsistent and dumb. Extracting useful information from such a web was an error-prone process. In order to tackle this problem, the concept of the semantic web was put forward in 2001 (1). The underlying motivation behind this idea was to create a web platform that is highly linked, consistent and intelligent. Ontologies play a fundamental role in implementing the idea of the semantic web.

An ontology is a formal and structured way of representing the concepts and relations of a shared conceptualization (2). More precisely, it can be defined in terms of the concepts, relations, attributes and hierarchies present in a domain. Ontologies can be populated by extracting relevant instances from text, a process called ontology population. However, handcrafting such large ontologies is a difficult task, and it is impossible to build ontologies manually for all available domains (3). Therefore, instead of handcrafting ontologies, the research trend is now shifting toward automatic ontology learning.

Whenever an author writes something in the form of text, he is actually following a domain model in his mind. He knows the meanings behind the various concepts of a particular domain, and using that model, he transfers some of that domain information into the text, both implicitly and explicitly.

Ontology learning is the reverse process: the domain model is reconstructed from the input text by exploiting the formal structure held in the author's mind (4). The entire reconstruction process of the domain model is illustrated in Figure 1.

Figure 1

Ontology learning from text: reverse engineering task.

Figure 2 summarizes the different steps required to build an ontology from unstructured text.

Figure 2

Ontology learning layer cake.

The process of ontology acquisition starts by extracting terms and their synonyms from the underlying text. The corresponding terms and synonyms are then combined to form concepts. After that, taxonomic and non-taxonomic relations between these concepts are found. Finally, axiom schemata are instantiated and general axioms are extracted from the unstructured text. This whole process is known as the ontology learning layer cake.
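To make the layer cake concrete, the following minimal Python skeleton sketches the pipeline as a chain of stubs; all function names are illustrative placeholders, not an actual implementation or library API.

```python
# Hypothetical skeleton of the ontology learning layer cake; every stage is a
# stub (body left as ...) and only the data flow between stages is shown.

def extract_terms(corpus):                 # terms and synonyms
    ...

def form_concepts(terms):                  # group terms/synonyms into concepts
    ...

def extract_taxonomy(concepts, corpus):    # taxonomic (is-a) relations
    ...

def extract_relations(concepts, corpus):   # non-taxonomic relations
    ...

def learn_axioms(concepts, relations):     # axiom schemata and general axioms
    ...

def learn_ontology(corpus):
    terms = extract_terms(corpus)
    concepts = form_concepts(terms)
    taxonomy = extract_taxonomy(concepts, corpus)
    relations = extract_relations(concepts, corpus)
    axioms = learn_axioms(concepts, relations)
    return concepts, taxonomy, relations, axioms
```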

Summary of previous surveys

This section summarizes previous work in the domain of ontology learning and highlights its contributions along with the identified research gaps. Ding and Foo (5) published a survey in 2002 that summarized the characteristics of 12 ontology learning systems. They provided details about various ontology learning algorithms and highlighted different problems that these systems encountered during ontology learning. The findings of this survey can be summarized as follows: (i) most of the ontology learning systems learned the ontology with the help of either seed words or a base ontology instead of building it from scratch, (ii) natural language processing techniques for concept extraction had shown promising results and (iii) relation extraction remained one of the major challenges in natural language processing and hindered ontology learning systems.

In 2003, the OntoWeb Consortium (6) published a report on the 36 most relevant methodologies and tools used for ontology learning from unstructured text. The key points of this survey are as follows: (i) the survey discussed 36 ontology learning systems but lacked a proper classification hierarchy and (ii) most of the systems discussed were semi-automated, and the report gave little exposure to fully automated ontology learning systems.

Shamsfard and Barforoush (14) published another survey report around the same time. They classified and compared various ontology learning systems on the basis of the following three criteria: (i) the starting point of the ontology learning system (i.e. whether the ontology is built from scratch or from a pre-built base ontology), (ii) the kind of ontology needed by the application (e.g. a scientific application may need a short, axiomatized ontology to solve its problems) and (iii) the degree of automation. In their survey, seven prominent ontology learning systems, namely ASIUM (7), Doodle II (8), Hasti (9), Svetlan (10), Syndikate (11), Text-to-Onto (12) and WebKB (13), were analyzed. Critical analysis of this survey leads to the following conclusions: (i) it highlighted research on the extraction of taxonomic relations but did not explore non-taxonomic relation extraction, (ii) most of the explored ontology learning systems needed prior domain knowledge in the form of a base ontology to extract ontologies from unstructured text and (iii) the authors did not mention any fully automatic ontology learning system.

In 2005, Buitelaar et al. (15) presented a survey of selected papers from two ontology learning workshops. They summarized the contents of ontology learning papers from the perspective of the methodologies used for ontology extraction, the evaluation methods and the challenges of various real-life application scenarios. They introduced the phrase 'ontology learning layer cake'.

In 2007, Zhou (16) published a survey that illustrated the process of ontology learning in detail and provided a comprehensive review of open issues and challenges in ontology learning. They proposed a hypothetical model for the development of the ontology learning process. The paper's conclusions are as follows: (i) they suggested an improvement to those ontology learning systems that did not involve users at any level of ontology learning, (ii) they highlighted the importance of knowledge representation in the ontology learning domain and (iii) they elucidated the need to move from coarse relationship classes to fine-grained relationships. After critical analysis, we found that this survey overlooked significant logic-based techniques that are used to form axioms.

In 2011, Hazman et al. (17) published a survey of various ontology learning approaches. They divided ontology learning into two categories, i.e. learning from unstructured and from semi-structured data. One of the key findings of their survey is that natural language processing techniques are considered efficient for learning ontologies from unstructured data, whereas data mining and web content mining techniques are more applicable when learning ontologies from semi-structured data. In their survey, they discussed ontology learning using domain keywords, but ontology building from scratch was not explored. This survey also highlighted the need for and importance of ontology evaluation. They described five levels of ontology evaluation, namely the lexical (vocabulary), hierarchical, contextual, syntactic and structural levels. They concluded that human-based evaluation is possible at all five above-mentioned levels (17).

Our survey differs from existing work in various ways, some of which are highlighted below: (i) previous surveys are outdated and focus on old techniques for ontology learning, whereas this survey considers the latest trends in the different tasks of the ontology learning layer cake; (ii) ontology learning techniques are categorized into three classes and explored at each level of the layer cake shown in Figure 2; (iii) this paper thoroughly examines the industries where ontology learning is being used extensively and highlights prominent work to motivate researchers in the domain of ontology learning; (iv) state-of-the-art ontology learning data sets are also discussed; (v) this survey extensively discusses various evaluation techniques for ontology learning along with their pros and cons and (vi) it not only highlights the challenges but also suggests possible ways to tackle them. We found 200 research papers using Google Scholar by issuing queries built from different combinations of the following phrases: 'ontology learning', 'ontology learning evaluation', 'industrial applications of ontology learning', 'knowledge extraction', 'ontology learning algorithms', 'ontology learning from text', 'ontology learning from unstructured text' and 'ontology learning methods'. After critically analyzing the retrieved articles, we identified 140 research papers as most significant in the context of ontology learning; these are discussed in this survey.

Ontology learning techniques

Over the past decade, various techniques from the fields of natural language processing, machine learning, information retrieval, data mining and knowledge representation have contributed to the improvement of ontology development. Data mining, machine learning and information retrieval provide statistical techniques for extracting domain-specific terms, concepts and associations among them. On the other hand, natural language processing plays a role at almost every level of the ontology learning layer cake by providing linguistic techniques. Formal representation of a developed ontology requires inductive logic programming (ILP) techniques, which provide logic simplification and formal representation algorithms. Although ontology learning techniques can be categorized at several levels, we categorize them into three classes, namely linguistic, statistical and logical. Figure 3 elucidates the various algorithms that fall under these three main categories (linguistic-, statistical- and logical-based) and are used at different levels of the ontology learning layer cake.

Figure 3

Methodology of ontology learning.

For better visualization of the classified techniques, Figure 3 is drawn using three different colors: red, blue and purple. Red represents ontology learning techniques that fall into the linguistic class, while blue and purple represent algorithms that belong to the statistical and logical classes, respectively. The flow of algorithms in Figure 3 shows that ontology learning is a step-by-step process. First, text corpora are preprocessed using linguistic techniques such as part of speech tagging, parsing and lemmatization. After preprocessing, relevant terms and concepts of the domain are extracted. This stage utilizes various natural language processing (NLP) techniques such as syntactic parsing, subcategorization frames and seed word extraction, along with some techniques from the statistical domain such as C/NC value, contrastive analysis, co-occurrence analysis, latent semantic analysis (LSA) and clustering. Besides obtaining concept clusters, taxonomic and non-taxonomic relations among these concepts are also required. For this purpose, an amalgam of NLP techniques and statistical approaches is used, which includes dependency analysis, lexico-syntactic analysis, term subsumption, formal concept analysis (FCA), hierarchical clustering and association rule mining (ARM). It is also worth mentioning that semantic lexicons are used at both the term/concept extraction and relationship extraction stages. In the next step, axioms are formed using ILP. To evaluate the integrity of the developed ontology, different evaluation measures exist. This paper reviews four ontology evaluation techniques (gold standard-based, application-based, data-driven and human-based) along with their merits and demerits.

Linguistic techniques

Linguistic techniques are based on the characteristics of language and play a key role at almost every stage of the ontology learning layer cake. They are mostly used for preprocessing of data as well as in other ontology learning tasks such as term, concept and relation extraction. This section first discusses three linguistic techniques for preprocessing, namely part of speech tagging, parsing and lemmatization. It then discusses linguistic techniques used at the term, concept and relation extraction stages of the ontology learning process. For term and concept extraction, three approaches are discussed: syntactic analysis, subcategorization frames and the use of seed words; for relationship extraction, dependency analysis and lexico-syntactic patterns are discussed. Details of these algorithms are presented in the sections below.

Linguistics for pre-processing

This section discusses part of speech tagging, sentence parsing and lemmatization, the linguistic preprocessing techniques used in almost every ontology learning methodology.

Part of speech tagging is the process of labeling corpus words with their corresponding part of speech tags. The Brill Tagger (18) and TreeTagger (19) are widely used for part of speech tagging because of their good performance. Parsing is a type of syntactic analysis that finds various dependencies between words in a sentence and represents them in the form of a data structure called a parse tree. For sentence parsing, commonly used tools are Principar (20), Minipar (21) and the Link Grammar Parser (22). Some parsers are built on statistical parsing systems, such as the Stanford Parser, which is a lexicalized probabilistic parser (23). Petit et al. (24) used the Stanford CoreNLP API for part of speech tagging. On the other hand, Drymonas et al. (26) used GATE (General Architecture for Text Engineering, https://gate.ac.uk/) (25) and OpenNLP (https://opennlp.apache.org/) to preprocess the corpus for ontology learning. They claimed that the accuracy of ontology learning improved with the use of the OpenNLP-based part of speech tagger and parser. In 2001, two distinctive techniques for part of speech tagging [using WordNet (https://wordnet.princeton.edu/)] and parsing (using an augmented grammar) were introduced in the context of ontology learning (27).

Lemmatization is another linguistic preprocessing technique, which is used to bring terms into their base form. For example, the lemma of 'running' and 'ran' should be 'run'.

It is used to reduce the dimensionality of the data, as it maps the various morphological variants of one term onto a single form. Petit et al. (24) utilized the CoreNLP API (https://stanfordnlp.github.io/CoreNLP/api.html) to lemmatize textual data for ontology learning purposes. On the other hand, Drymonas et al. (26) performed lemmatization using an external WordNet-based Java library. They claimed that preprocessing of the data is important to fetch domain-relevant terms.
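As an illustration of this preprocessing stage, the sketch below tokenizes, POS-tags and lemmatizes a sentence with NLTK; it assumes the relevant NLTK data packages (tokenizer, tagger and WordNet) have already been downloaded, and the example sentence is arbitrary.

```python
# Minimal preprocessing sketch: tokenization, POS tagging and lemmatization.
import nltk
from nltk.stem import WordNetLemmatizer

text = "The patients were running high fevers."

tokens = nltk.word_tokenize(text)      # tokenization
tagged = nltk.pos_tag(tokens)          # part of speech tagging

def to_wordnet_pos(tag):
    # Map Penn Treebank tags onto the coarse POS classes WordNet expects.
    if tag.startswith("V"):
        return "v"
    if tag.startswith("J"):
        return "a"
    if tag.startswith("R"):
        return "r"
    return "n"

lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(word.lower(), to_wordnet_pos(tag))
          for word, tag in tagged]
print(lemmas)   # e.g. 'running' -> 'run', 'fevers' -> 'fever'
```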

The importance of data preprocessing in ontology learning can be seen in the work of Jiang and Tan (28). They compared the performance of two systems [Text-to-Onto and Text-2-Onto (75, 120, 121, 122)] before and after utilizing parsers. To extract concepts for both systems, initial experimentation was done using hand-crafted rules based on part of speech tagging. On the same experimental setup, concepts were then extracted after introducing the Berkeley Parser and the Stanford Parser into the two systems. Their results showed that before using a parser, the performances of the two systems were 47.2 and 74.4%, which rose to 92.8 and 92%, respectively, after applying the above-mentioned parsers. In a nutshell, to obtain higher accuracy in ontology learning tasks, efficient preprocessing of data using good linguistic techniques is a necessity.

Linguistics for knowledge extraction

In ontology learning, linguistic techniques are also used for the extraction of terms, concepts and relations. After thoroughly analyzing the literature, it can be concluded that syntactic structure analysis and subcategorization frames are used for term extraction, while numerous studies (38, 40, 149, 46, 47, 15, 41, 42, 43, 44, 45) used dependency analysis and lexico-syntactic patterns for relation extraction. Besides this, semantic lexicons can also be used for the extraction of concepts and relations. Moreover, the extraction of domain-specific terms and concepts has been improved by introducing seed words into the ontology learning pipeline.

Term/concept extraction To extract terms and concepts using syntactic structures, the corpus is first tagged with parts of speech. This information is used to extract syntactic structures in a sentence, such as noun phrases and verb phrases. These structures are employed to find terms by analyzing the words and modifiers present in them. For example, in ontology learning, the syntactic structure of a noun phrase (NP) can be used to extract potential candidate terms from the corpus. Hippisley et al. (29) used syntactic analysis and employed the head-modifier principle to identify and extract complex terms, in which the head of the complex term takes the role of hypernym. For example, the complex term 'acute appendicitis' will be extracted as a potential term candidate because its head, 'appendicitis', takes the hypernym role. On Chinese text, this technique was able to achieve an accuracy of 83.3%.
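A hedged sketch of this idea follows: it chunks noun phrases with NLTK's RegexpParser and applies a simplified head-modifier heuristic (the last noun of the phrase is treated as the head); the chunk grammar and the example sentence are illustrative choices, not the grammar used in the cited work.

```python
# Noun-phrase based candidate term extraction with a simple head heuristic.
import nltk

grammar = "NP: {<JJ>*<NN.*>+}"   # optional adjectives followed by nouns
chunker = nltk.RegexpParser(grammar)

sentence = "The patient was diagnosed with acute appendicitis."
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

for subtree in chunker.parse(tagged).subtrees():
    if subtree.label() == "NP":
        words = [word for word, tag in subtree.leaves()]
        if len(words) > 1:       # keep multi-word candidates only
            print("candidate term:", " ".join(words), "| head:", words[-1])
# candidate term: acute appendicitis | head: appendicitis
```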

The subcategorization frame is another concept from linguistic theory that can be employed in ontology learning tasks (30, 31). The subcategorization frame of a word describes the words of a certain form that it selects when appearing in a sentence. For example, in the sentence 'Bob writes a letter', the verb 'to write' chooses 'Bob' and 'letter' as its neighboring words, so its subcategorization frame consists of these two words. In other words, a selectional restriction is imposed on the verb 'write': it selects its neighboring words from the classes 'Person' and 'written-communication'. When used in conjunction with clustering techniques, such selectional restrictions can be used to discover concepts (32).

The use of seed words is another common methodology employed to guide many ontology learning tasks (33). Seed words are domain-specific words that provide a basis for other algorithms to extract similar domain-specific terms and concepts (34). This technique ensures that only those terms that are more relevant and semantically closer to the seed words are extracted. Sanchez and Moreno (35) made use of seed words to extract domain-specific documents from the web and used them as a corpus to extract terms and concepts for ontology construction. However, Fraga and Vegetti (36) manually put the seed words in a text file to ease the extraction process.

Relation Extraction

Dependency analysis helps in finding relations between terms by using the dependency information present in parse trees (37). Ciaramita et al. (38) used dependency path information present in parse trees to find relationship patterns. For two specific concepts, they found relations by extracting the shortest path between those concepts in the parse tree. Their approach was able to learn 83.3% correct relations from the corpus. Besides this, it was also used by Sordo et al. (39) as a relation extraction technique. The lexico-syntactic pattern is a rule-based approach that plays its role in the taxonomic and non-taxonomic relation extraction phases of ontology learning. To extract relations, this algorithm makes use of regular expressions. For example, 'NP such as NP, NP, … , NP' is a rule that will extract patterns like 'seasons, such as summer, winter, autumn, and spring'. This type of rule-based approach is quite helpful in extracting is-a relationships, e.g. is_a(summer, season). On the other hand, lexico-syntactic patterns like 'NP is a part of NP' can be used to extract non-taxonomic relationships. In 1998, Hearst (40) introduced an algorithm that enabled the extraction of different types of lexico-syntactic patterns. She extracted 106 relations from the New York Times corpus, of which 61 relations were validated by WordNet; in other words, she obtained a minimum accuracy of 75.55%. Besides this, Sombatsrisomboon et al. (149) used these patterns for the extraction of taxonomic relations. Buitelaar (15), Kaushik and Chatterjee (41), Ismail et al. (42, 43), Panchenko et al. (44) and Atapattu et al. (45) also used these patterns in their work and concluded that lexico-syntactic patterns provide reasonably good precision. However, the manual effort required to produce these patterns from data sets is very extensive. Therefore, Snow et al. (46) made an effort to extract such patterns using machine learning algorithms. Using logistic regression on a training set of known hypernym pairs, they automatically learned dependency paths from parse trees and subsequently used them to extract new relationships from unseen data.
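The snippet below sketches one Hearst-style pattern ('NP such as NP, NP, ... and NP') as a plain regular expression over raw text; real systems typically match over POS-tagged or parsed text, so this is only a minimal illustration.

```python
# Extracting is_a relations with a single lexico-syntactic (Hearst) pattern.
import re

text = "He studied seasons, such as summer, winter, autumn, and spring."

pattern = re.compile(r"(\w+),? such as ((?:\w+,? )*(?:and )?\w+)")

for match in pattern.finditer(text):
    hypernym = match.group(1)
    # split the enumeration on commas and 'and'
    hyponyms = [h.strip() for h in re.split(r",|\band\b", match.group(2)) if h.strip()]
    for hyponym in hyponyms:
        print(f"is_a({hyponym}, {hypernym})")
# is_a(summer, seasons), is_a(winter, seasons), ...
```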

Semantic lexicons are knowledge resources in the ontology domain that play a vital role at different levels of ontology learning (47). Well-known semantic lexicons include WordNet (https://wordnet.princeton.edu/) and the Unified Medical Language System (https://www.nlm.nih.gov/research/umls/). Semantic lexicons can be used to extract terms, concepts and taxonomic and non-taxonomic relations. They offer a wide range of predefined concepts and relations. These concepts are organized into sets of similar words called synsets (sets of synonyms). In (48), Turcato et al. used these synsets for the formation of concepts. Besides this, semantic lexicons also have a number of predefined associations such as hypernymy and meronymy; these have been employed by Navigli et al. (49) for the extraction of taxonomic and non-taxonomic relations.
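As a small example of such a lookup, the sketch below queries WordNet through NLTK for synsets and their predefined hypernym relations (it assumes the WordNet corpus has been downloaded; the query word is arbitrary).

```python
# Looking up synsets and hypernym relations in the WordNet semantic lexicon.
from nltk.corpus import wordnet as wn

for synset in wn.synsets("summer", pos=wn.NOUN):
    print(synset.name(),
          "| synonyms:", synset.lemma_names(),
          "| hypernyms:", [h.name() for h in synset.hypernyms()])
# e.g. summer.n.01 | synonyms: ['summer', 'summertime'] | hypernyms: ['season.n.02']
```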

Statistical techniques

Statistical techniques are based solely on the statistics of the underlying corpora and do not consider the underlying semantics. Most statistical techniques make extensive use of probabilities and are frequently used in the early levels of ontology learning, after linguistic preprocessing. These techniques are mostly used for term extraction, concept extraction and taxonomic relation extraction. Statistical techniques include C/NC value, contrastive analysis, clustering, co-occurrence analysis, term subsumption and ARM. This section briefly discusses these techniques.

Term and concept extraction

The ontology learning layer cake starts with the tasks of term extraction and concept extraction. Some of the techniques used for these tasks, including C/NC value, contrastive analysis and co-occurrence analysis, are discussed below.

C/NC value
C/NC value is used for multi-word terminology extraction. Terminologies are domain-specific multi-word terms, or groups of terms, that can form a valid concept. The C/NC value technique receives various multi-word terms as input and returns a score for each of them. This score is a combination of two values, the C value and the NC value. The C value tends to find groups of terms that are valid in the corpus; in other words, it measures the termhood of multi-word terms. The NC value is a modification of the C value that considers the context of a multi-word term and tries to find longer strings that appear more frequently in the corpus. These longer groups of words can then form the basis of concepts. Mathematically, the C value can be calculated as (50)
$$ Cvalue(a)=\left\{\begin{array}{ll}{\log}_2\left|a\right|\cdot f(a) & \textrm{if}\ \left|a\right|=g\\ {\log}_2\left|a\right|\cdot\left(f(a)-\frac{1}{C(a)}\sum_{k=1}^{C(a)} f\left({b}_k\right)\right) & \textrm{otherwise}\end{array}\right. $$
where
  • g is the maximum candidate term size, in number of words,

  • a is the multi-word candidate term,

  • f(a) is the frequency of a in the corpus,

  • C(a) is the number of longer candidate strings that contain a,

  • bk are the longer candidate strings that contain a.

For example, soft contact lens is a candidate term that also contains the smaller terms soft contact and contact lens. Contact lens is an independent term that can appear on its own in the corpus; therefore, its C value will be higher than that of soft contact. The list of candidate terms is then ranked according to their C value scores.

Once the C value is found, the next step is to incorporate contextual information. Context words within a window of one word (to the left and right of the candidate term) are extracted as a list. These words are then assigned a weight using the formula below:
\begin{equation} \mathrm{weight}\left(\mathrm{t}\right)=\frac{\mathrm{f}\left(\mathrm{t}\right)}{\mathrm{n}} \end{equation}
(1)
where t is the context word, f(t) stands for the number of multi-word terms that have t as a context word and n stands for the total number of multi-word terms.
The weighted context information is then combined with the C value to obtain the NC value. Mathematically, it can be written as
\begin{align} NCValue(a)&=0.8\, CValue(a)\nonumber\\&\quad+0.2\left({\sum}_{t\,\in\, C(a)}{f}_a(t)\, weight(t)\right) \end{align}
(2)

In the above equation,

  • a stands for the candidate term,

  • C(a) is the set of context words of a,

  • t is one such context word from C(a),

  • fa(t) is the frequency of t as a context word of a,

  • weight(t) is the weight calculated in Equation 1,

  • 0.8 and 0.2 are optimized factors provided by Frantzi et al. (50).

Frantzi et al. (50) introduced the C/NC value and claimed that using the C value instead of pure frequency tends to increase the precision of the extracted terms. They also concluded that the use of contextual information in the NC value ensures a higher concentration of real terms at the top of the extracted term list. Drymonas et al. (26) employed the C/NC value to extract multi-word concepts from OHSUMED (51) and a computer science corpus (52) for ontology learning. They evaluated the first 150 extracted terms with the help of domain experts. For the computer science corpus, they obtained 86.67% precision and 89.6% recall, whereas for the medical corpus, 89.7% precision and 91.4% recall were obtained, which showed the effectiveness of this statistical measure. Besides this, in 2016, Yang et al. (53) and Chandu et al. (54) used the C/NC value to develop their automatic question answering frameworks for the BioASQ challenge, where it was used to extract candidate concepts from the biomedical domain. The C/NC value showed promising results in the 2016 BioASQ challenge; therefore, researchers continued to use it in the 2017 challenge as well.
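To make the two scores tangible, the following self-contained sketch computes C and NC values for a handful of toy candidate terms; the frequencies and context counts are invented for illustration, nesting is approximated by substring containment and 0.8/0.2 are the weights from Frantzi et al.

```python
# Toy C/NC value scoring for multi-word candidate terms.
import math
from collections import defaultdict

freq = {                      # candidate term -> corpus frequency (toy values)
    "soft contact lens": 4,
    "contact lens": 9,
    "soft contact": 4,
}
contexts = {                  # candidate term -> context word frequencies
    "contact lens": {"wear": 5, "disposable": 2},
    "soft contact lens": {"wear": 2},
    "soft contact": {},
}

def c_value(term):
    words = term.split()
    # longer candidates containing this term (substring test is a simplification)
    nests = [t for t in freq if term in t and t != term]
    if not nests:
        return math.log2(len(words)) * freq[term]
    return math.log2(len(words)) * (freq[term] - sum(freq[t] for t in nests) / len(nests))

# weight(t): fraction of candidate terms that have t as a context word
appears_with = defaultdict(set)
for term, ctx in contexts.items():
    for w in ctx:
        appears_with[w].add(term)
weight = {w: len(terms) / len(freq) for w, terms in appears_with.items()}

def nc_value(term):
    ctx_score = sum(f * weight[w] for w, f in contexts[term].items())
    return 0.8 * c_value(term) + 0.2 * ctx_score

for t in sorted(freq, key=nc_value, reverse=True):
    print(f"{t:20s} C={c_value(t):5.2f}  NC={nc_value(t):5.2f}")
# 'contact lens' outranks 'soft contact', mirroring the example in the text
```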

Contrastive analysis

The term extraction process can yield terms that are not relevant to the domain of the corpus, and these terms need to be filtered out. Contrastive analysis is a technique that filters the terms obtained through the term extraction procedure (55). In 2003, Navigli et al. (49) introduced two measures for contrastive analysis in the domain of ontology learning, namely domain relevance and domain consensus. These measures use two types of corpora: a relevant corpus (target domain) and non-relevant corpora (contrastive domains). Filtering ensures that only those terms that are more relevant to the target domain are retained.

Domain relevance is used to measure the specificity of a term with respect to the target domain. It assigns scores to terms based on how relevant they are in the target domain and how irrelevant they are in the contrastive domains. For this purpose, a list of contrastive domains (D1, …, Dm) is created. For a term t, the domain relevance in the target domain Dk is measured as (55)
\begin{equation} {DR}_{\left(t,k\right)}=\frac{P\left(t|{D}_k\right)}{\sum_{i=1}^mP\left(t|{D}_i\right)} \end{equation}
(3)
where P(t|Dk) and P(t|Di) are the probabilities of finding term t in the target domain Dk and in the contrastive domain Di, respectively. These probabilities can be estimated from frequencies as
\begin{equation} \hat{P}\left(t|{D}_k\right)=\frac{f\left(t,k\right)}{\sum_{t^{\prime}\in{D}_k}f\left({t}^{\prime },k\right)}. \end{equation}
(4)
On the other hand, domain consensus is used to find the terms that appear across several documents of the target domain Dk. It can be calculated as
\begin{equation} {DC}_{\left(t,k\right)}=\sum_{d\,\in\,{D}_k}{P}_t(d)\cdot\log\frac{1}{{P}_t(d)} \end{equation}
(5)
where Pt(d) stands for the probability of term t in document d of target domain Dk.
The two measures are then integrated using a linear combination formula, stated as
\begin{equation} {FinalScore}_{\left(t,k\right)}=\alpha{DR}_{\left(t,k\right)}+\left(1-\alpha \right){DC}_{\left(t,k\right)} \end{equation}
(6)
where α is an experimental parameter that can vary from 0 to 1. A threshold is then set, and only terms whose score exceeds it are kept.

Guo et al. (56) utilized the domain relevance and domain consensus measures for term extraction. Their algorithm obtained a precision of 70% on Chinese text.
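The toy sketch below computes both measures over two tiny invented 'corpora' (each document is just a list of extracted terms) and combines them with the linear formula above; all domain names, counts and the α value are illustrative.

```python
# Toy domain relevance / domain consensus scoring for contrastive analysis.
import math

corpora = {   # domain -> documents, each a list of extracted terms (toy data)
    "medicine":  [["tumor", "therapy", "tumor"], ["therapy", "computer"]],
    "computing": [["computer", "network"], ["network", "computer", "therapy"]],
}

def p_term_in_domain(term, domain):
    docs = corpora[domain]
    total = sum(len(d) for d in docs)
    return sum(d.count(term) for d in docs) / total

def domain_relevance(term, target):
    denom = sum(p_term_in_domain(term, d) for d in corpora)
    return p_term_in_domain(term, target) / denom if denom else 0.0

def domain_consensus(term, target):
    docs = corpora[target]
    total = sum(d.count(term) for d in docs)
    dc = 0.0
    for d in docs:
        p = d.count(term) / total if total else 0.0
        if p > 0:
            dc += p * math.log(1 / p)   # entropy-style spread over documents
    return dc

alpha = 0.9
for term in ("tumor", "therapy", "computer"):
    score = alpha * domain_relevance(term, "medicine") \
            + (1 - alpha) * domain_consensus(term, "medicine")
    print(f"{term:10s} score={score:.3f}")
# domain-specific terms such as 'tumor' score higher than 'computer'
```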

Co-occurrence analysis

Co-occurrence analysis is a concept extraction technique that locates lexical units occurring together, in order to find implicit associations between terms and concepts and to extract related terms. In documents, co-occurrence appears in different forms, such as phrase-level co-occurrence of two words (e.g. 'real time', 'ballpoint') and co-occurrence via common associations (e.g. 'Steve' and 'Apple'). Various co-occurrence measures are used to determine the associations and relations between terms, such as Mutual Information (57), Chi-Square (58), Cosine Similarity, Dice Similarity (59) and Kullback–Leibler Divergence (60). Suresu and Elamparithi (61) employed co-occurrence analysis to extract domain-related terms for the extraction of concepts. Frikh et al. (62) also used this technique to extract cancer concepts. They used a cancer-related data set containing 52 758 documents, indexed from 26 different websites in the cancer domain. Using the Chi-Square approach, they obtained 67.35% precision and 59.93% recall.
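The following sketch computes a simple sentence-level co-occurrence measure, pointwise mutual information, over a toy collection; the sentences, tokenization and counts are all invented for illustration.

```python
# Sentence-level co-occurrence analysis with pointwise mutual information (PMI).
import math
from collections import Counter
from itertools import combinations

sentences = [
    ["steve", "founded", "apple"],
    ["apple", "released", "a", "phone"],
    ["steve", "presented", "the", "apple", "phone"],
    ["the", "orchard", "grew", "a", "pear", "tree"],
]

word_count, pair_count = Counter(), Counter()
for s in sentences:
    unique = set(s)
    word_count.update(unique)
    pair_count.update(frozenset(p) for p in combinations(sorted(unique), 2))

n = len(sentences)

def pmi(w1, w2):
    joint = pair_count[frozenset((w1, w2))] / n
    if joint == 0:
        return float("-inf")          # never observed together
    return math.log2(joint / ((word_count[w1] / n) * (word_count[w2] / n)))

print(pmi("steve", "apple"))     # positive: the two tend to co-occur
print(pmi("orchard", "phone"))   # -inf: no observed association
```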

LSA

LSA is an algorithm used in ontology learning for concept extraction. It is based on the idea that terms occurring in similar contexts are close in meaning. LSA applies the mathematical technique of singular value decomposition to a term-document matrix in order to reduce the dimensionality of the data while maintaining its similarity structure. A similarity measure (e.g. cosine similarity) is then applied in the reduced space to find words that are similar to each other. Landauer et al. (63) and Lani et al. (64) used latent semantic analysis to find inherent relations by applying correlation techniques to this dimensionally reduced matrix, which eventually led to concept formation.
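A compact numerical sketch of LSA follows: it decomposes a toy term-document count matrix with NumPy's SVD, keeps two latent dimensions and compares terms by cosine similarity; the matrix values and the choice of k = 2 are illustrative.

```python
# LSA sketch: truncated SVD of a term-document matrix + cosine similarity.
import numpy as np

terms = ["car", "automobile", "engine", "flower"]
X = np.array([            # rows = terms, columns = documents (toy counts)
    [2, 0, 1, 0],         # car
    [0, 2, 1, 0],         # automobile
    [1, 1, 2, 0],         # engine
    [0, 0, 0, 3],         # flower
], dtype=float)

U, S, Vt = np.linalg.svd(X, full_matrices=False)
k = 2                                   # keep the two strongest latent dimensions
term_vecs = U[:, :k] * S[:k]            # term representations in the latent space

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(term_vecs[0], term_vecs[1]))   # car vs automobile: high
print(cosine(term_vecs[0], term_vecs[3]))   # car vs flower: low
```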

Clustering for term/concept extraction

Clustering is an unsupervised learning approach in which objects are grouped into a number of clusters in such a way that objects within a group are more similar to each other than to objects in other groups (65). K-means clustering is one approach that clusters similar terms to form concepts. Karoui et al. (66) proposed an unsupervised hierarchical clustering approach named Contextual Concept Discovery (CCD) for ontological concept extraction and compared it with K-means. For the evaluation of their proposed algorithm, they used HTML documents related to the tourism domain. They classified the clusters obtained from these algorithms into three classes: advisable (validated by domain experts), improper (containing more than one concept) and unknown (neither validated by domain experts nor containing any semantic relation). Their proposed CCD approach reduced the proportions of improper and unknown clusters obtained by K-means from 26.28 and 20.51% to 16.66 and 14.81%, respectively. Moreover, they obtained a greater proportion of advisable clusters, i.e. 68.52% compared to the 53.2% obtained with the K-means clustering approach.

Relation Extraction

Statistical techniques are also used to extract taxonomic and non-taxonomic relations from the corpus. For taxonomic hierarchy induction, term subsumption and clustering techniques are used, whereas ARM is used for non-taxonomic relation extraction.

Term subsumption Term subsumption finds hierarchical relations between terms by using the conditional probabilities of those terms in the underlying documents. It looks for terms that are more general in the corpus. The algorithm states that term t is more general than term x if P(t|x) (the probability of term t conditioned on the presence of term x) is higher than P(x|t) (the probability of term x conditioned on term t), i.e. P(t|x) > P(x|t), where P(t|x) and P(x|t) are estimated as
\begin{equation} P\left(t|x\right)=\frac{\textrm{no. of documents that contain both}\ t\ \textrm{and}\ x}{\textrm{no. of documents that contain}\ x}. \end{equation}
(7)
\begin{equation} P\left(x|t\right)=\frac{\textrm{no. of documents that contain both}\ t\ \textrm{and}\ x}{\textrm{no. of documents that contain}\ t}. \end{equation}
(8)

The above equations imply that if term x occurs in a subset of the documents that contain term t, then t is more general than x. In domain ontology learning, Njike-Fotzo and Gallinari (67) employed the term subsumption technique to automatically extract generalization/specialization relationships between concepts extracted from the LookSmart and NewScientist corpora. They compared the generated hierarchies with gold standard hierarchies and claimed that the performance of term subsumption is good, as it generated almost the same hierarchies.
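A minimal sketch of the subsumption test is given below; documents are reduced to sets of already extracted terms, the data are invented and the 0.8 threshold is an illustrative choice rather than a value from the cited work.

```python
# Toy term subsumption: t subsumes x when P(t|x) is high and exceeds P(x|t).
docs = [
    {"disease", "cancer"},
    {"disease", "cancer", "lung cancer"},
    {"disease", "diabetes"},
    {"disease"},
]

def cond_prob(t, x):
    # P(t | x): fraction of documents containing x that also contain t
    with_x = [d for d in docs if x in d]
    if not with_x:
        return 0.0
    return sum(1 for d in with_x if t in d) / len(with_x)

def subsumes(t, x, threshold=0.8):
    return cond_prob(t, x) >= threshold and cond_prob(t, x) > cond_prob(x, t)

print(subsumes("disease", "cancer"))      # True: disease is more general
print(subsumes("cancer", "lung cancer"))  # True
print(subsumes("cancer", "disease"))      # False
```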

FCA

FCA is an interesting approach for building concept hierarchies in ontology learning. It relies on the basic idea that objects are connected with their characteristics (attributes). It takes an object-attribute matrix as input and finds all natural clusters of attributes and objects together, yielding a lattice that organizes concepts and attributes in the form of a hierarchy. Drymonas et al. (72) used FCA for taxonomic relationship extraction and compared it with an agglomerative clustering based approach. On a medical corpus, FCA obtained 47% precision, whereas agglomerative clustering reached a precision of 71%. Their experiments showed that FCA was not only computationally expensive [with a worst-case time complexity of O(2^n)] but also less accurate than agglomerative clustering.
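To illustrate what FCA computes, the naive sketch below enumerates the formal concepts of a tiny hand-made object-attribute context by closing every attribute subset; it is exponential in the number of attributes and intended purely as an illustration, not as a usable FCA implementation.

```python
# Naive formal concept analysis over a small object-attribute context.
from itertools import combinations

context = {                         # object -> set of attributes (toy data)
    "duck":   {"flies", "swims", "lays_eggs"},
    "eagle":  {"flies", "lays_eggs"},
    "salmon": {"swims", "lays_eggs"},
}
attributes = set().union(*context.values())

def extent(attr_set):               # objects having all attributes in attr_set
    return {o for o, attrs in context.items() if attr_set <= attrs}

def intent(objects):                # attributes shared by all given objects
    if not objects:
        return set(attributes)
    return set.intersection(*(context[o] for o in objects))

concepts = set()
for r in range(len(attributes) + 1):
    for subset in combinations(sorted(attributes), r):
        objs = extent(set(subset))
        concepts.add((frozenset(objs), frozenset(intent(objs))))

for objs, attrs in sorted(concepts, key=lambda c: len(c[0])):
    print(set(objs) or "{}", "<->", set(attrs) or "{}")
# e.g. {'duck', 'eagle'} <-> {'flies', 'lays_eggs'}
```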

Hierarchical clustering

In the domain of ontology learning, hierarchical clustering is mostly used to find the taxonomic relations among data elements. It employs similarity measures (such as Cosine Similarity or Jaccard Similarity) to group the terms into clusters for the discovery of concepts and the construction of a hierarchy.

There are two strategies used to build a hierarchy of clusters: (i) agglomerative clustering (bottom-up approach) (68) and (ii) divisive clustering (top-down approach) (69).

  • Agglomerative clustering is a bottom-up approach. It considers every element as an individual cluster and combines the most similar elements into one cluster. The similarity between elements can be found using Cosine (59) or Jaccard similarity measures. This method keeps merging the most similar clusters together until all elements are grouped into one universal cluster. Similarity between clusters is found using one of the following three approaches: (i) single linkage, (ii) complete linkage and (iii) average linkage. Single linkage finds the two closest elements of the two clusters and takes their similarity as the cluster similarity. Complete linkage, on the other hand, uses the similarity of the most dissimilar elements, while average linkage takes the average of the pairwise similarities as the cluster similarity. Agglomerative clustering gives rise to a hierarchy in which elements are gathered as concepts in the form of clusters. A thresholding criterion can be applied to stop the clustering once the most suitable concept clusters have been formed (68); a minimal sketch is given after this list.

  • Divisive clustering is a top-down approach. It considers all elements as one universal cluster and iteratively divides it into smaller clusters to form a hierarchy. The task of splitting a large cluster into smaller ones can be performed using any flat clustering technique; K-means, for instance, is also used on its own to find concepts by forming clusters of terms (70).
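The hedged sketch referenced above clusters a few toy term vectors with SciPy's agglomerative (average-linkage, cosine-distance) routines; the vectors and the number of clusters are invented, and SciPy is assumed to be available.

```python
# Agglomerative clustering of toy term vectors with average linkage.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

terms = ["car", "truck", "bus", "apple", "pear"]
vectors = np.array([              # toy context-feature vectors
    [5, 4, 0, 0],
    [4, 5, 1, 0],
    [4, 4, 1, 1],
    [0, 1, 5, 4],
    [0, 0, 4, 5],
], dtype=float)

Z = linkage(vectors, method="average", metric="cosine")   # build the dendrogram
labels = fcluster(Z, t=2, criterion="maxclust")            # cut it into 2 clusters

for term, label in zip(terms, labels):
    print(term, "-> cluster", label)
# vehicles and fruits end up in separate clusters
```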

Faure and Nédellec (71) used a bottom-up clustering technique to form concepts. They clustered similar terms based on the similarity of the contexts in which these terms were used, merging two similar terms at each step. Their approach was not completely unsupervised, as a user validated the formed clusters at each stage. For experimentation, they used a cooking recipe data set that contained 1500 recipes. Using 50% of the data set they obtained 92.1% accuracy, and with 90% of the data for training they achieved 99.53% accuracy. The quality of their results was high, as most of the clusters they obtained were relevant.

In 2010, Drymonas et al. (72) presented a system for the acquisition of ontologies from unstructured text. To extract taxonomic relations, they employed agglomerative hierarchical clustering and FCA on computer science and medical (OHSUMED) corpora. The precision of agglomerative clustering on both the computer science and medical corpora was 71%, whereas FCA yielded 44 and 47% precision, respectively. This revealed that clustering performed better than FCA on both corpora.

Caraballo (73) used an agglomerative clustering approach to acquire a hierarchy of terms in the form of hypernym–hyponym relationships. She collected data from the Wall Street Journal corpus, which contained 50 000 distinct nouns. The list of extracted hypernym–hyponym relations was given to different users for validation. For the best hypernym, 33% of her results were verified by all judges, and for any randomly chosen hypernym, 47.5% were verified. In 2004, Maedche and Staab (60) presented an overview of clustering methods used for the construction of ontologies. They obtained ontologies from different sources using hierarchical clustering methods. After experimentation, they claimed that divisive clustering is computationally complex, which is why it is not frequently used for taxonomy induction (74).

Cimiano and Staab (75) proposed an innovative oracle-guided hierarchical agglomerative clustering algorithm to learn concept hierarchies. This approach utilized hypernyms obtained from WordNet to guide the clustering process. For a given pair of terms that behave similarly in the corpus, if one is a hypernym of the other, they are tagged as parent–child; if both terms share the same hypernym, they are added as siblings under that hypernym. Cimiano and Staab (75) compared their approach with the agglomerative clustering introduced by Caraballo (73). They claimed that their clustering technique outperformed plain agglomerative clustering with a best F-measure of 21.4% in the tourism domain. However, in the finance domain, agglomerative clustering performed better, with an F-measure of 18.51%.

ARM

ARM is a data mining approach that is used to discover hidden relations, associations and patterns among different elements in a database. In the domain of ontology, ARM is mostly used for non-taxonomic relation extraction. The idea came from market basket analysis, in which a seller wants to learn what customers mostly buy and which types of products are bought together. To answer these questions, it is important to learn the associations between the items I in a database D, where I = i1, i2, …, in is the set of items and D = t1, t2, …, tm is the set of transactions.

In ARM, rules are found that predict the co-occurrence of elements or items in databases. A rule is an implication between two item sets, X → Y, where X and Y are non-empty subsets of I (the set of items) such that X ∩ Y is empty. For ARM, the following two algorithms are commonly used:
  • Apriori algorithm

  • Frequent pattern (FP) growth algorithm

Apriori algorithm: In 1994, Agrawal and Srikant proposed this algorithm. It is used for mining the item sets that occur frequently in a data set and for discovering associations between the elements of those frequent item sets. The algorithm is based on two steps:

  1. Generation of frequent item sets having support > minsup, where the support of an item set is the fraction of transactions that contain it and minsup is a threshold value used to filter out rarely occurring item sets:
\begin{equation} s\left(X\rightarrow Y\right)=\frac{\textrm{no. of transactions containing}\ X\cup Y}{\textrm{no. of transactions in}\ D} \end{equation}
    (9)
  2. Generation of association rules (ARs) from those frequent item sets and pruning them on the basis of a confidence measure. The confidence c of a rule X → Y determines how often the items in item set Y appear in transactions that contain X. Mathematically, it can be written as

\begin{equation} c\left(X\rightarrow Y\right)=\frac{s\left(X\cup Y\right)}{s\left(X\right)}. \end{equation}
(10)

To perform the first step, frequent individual items are collected from the database and then extended by adding one element at a time until no further frequent item sets are found. For the second step, all possible splits of each frequent item set into an antecedent and a consequent are formed and pruned on the basis of the confidence measure; those rules that fulfill a minimum confidence criterion are kept.

In the domain of ontology learning, various term selection techniques are employed to extract terms. By treating these extracted terms as items, relationships between them can then be found using ARM, as sketched below.
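The following compact, self-contained sketch runs the two Apriori steps over toy 'transactions' (sets of terms co-occurring in a document); the terms, the minimum support of 0.5 and the minimum confidence of 0.7 are all arbitrary illustrative choices.

```python
# Minimal Apriori: frequent item set generation followed by rule generation.
from itertools import combinations

transactions = [
    {"aspirin", "headache", "fever"},
    {"aspirin", "headache"},
    {"aspirin", "headache", "fever"},
    {"ibuprofen", "headache"},
]
min_support, min_confidence = 0.5, 0.7

def support(itemset):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Step 1: level-wise frequent item set generation.
items = {frozenset([i]) for t in transactions for i in t}
frequent, current = [], {i for i in items if support(i) >= min_support}
while current:
    frequent.extend(current)
    current = {a | b for a in current for b in current
               if len(a | b) == len(a) + 1 and support(a | b) >= min_support}

# Step 2: rule generation X -> Y, pruned by confidence.
for itemset in (f for f in frequent if len(f) > 1):
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, r)):
            rhs = itemset - lhs
            confidence = support(itemset) / support(lhs)
            if confidence >= min_confidence:
                print(set(lhs), "->", set(rhs), f"(conf={confidence:.2f})")
```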

FP growth algorithm: In 2000, Han et al. (155) presented the FP growth algorithm, which scans the database only twice in order to find frequent patterns. It was developed to avoid the repeated scanning of the database required for candidate set generation, which is a time- and resource-consuming task. In this method, a compact tree-like structure is built to store information about the frequent patterns occurring in the database. From these patterns, relations are extracted in the same way as in the Apriori algorithm.

In the first scan, the FP growth algorithm calculates the frequency count of each individual item in the database and places the counts in a frequency table. In the second scan, each transaction of the database is sorted in descending order of the frequencies of its elements. Using these sorted transactions one by one, a tree-like structure is built in which each node holds an element and its frequency information. This structure is called the frequent pattern (FP) tree.

For the generation of rules from the frequent item sets, the same confidence-based approach defined above for the Apriori algorithm is used.
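Because a full FP-growth implementation is lengthy, the hedged sketch below instead shows how the same frequent item sets and rules could be obtained with the third-party mlxtend library (assumed to be installed, together with pandas); the transactions and thresholds are the same toy values as above.

```python
# FP-growth and rule generation via the mlxtend library (assumed available).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

transactions = [
    ["aspirin", "headache", "fever"],
    ["aspirin", "headache"],
    ["aspirin", "headache", "fever"],
    ["ibuprofen", "headache"],
]

encoder = TransactionEncoder()
onehot = pd.DataFrame(encoder.fit(transactions).transform(transactions),
                      columns=encoder.columns_)

frequent = fpgrowth(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```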

In 2016, Idoudi et al. (77) used ARM for ontology enrichment. They used the Apriori algorithm to generate rules. For the evaluation of the generated rules, Liu et al.'s (76) operators were used, which classified these rules into three categories: known (already present in knowledge bases), unexpected (extracted rules that are new but not validated) and novel (extracted rules that are new and validated). For experimentation, they collected data from 1000 patient records of the Charles Nicolle hospital in Tunisia and learned 1500 rules from them. In their experiments, 68% of the rules were categorized as known, 31% as novel and 1% as unexpected.

AR mining was used by Drymonas et al. (72) for the non-taxonomic relation extraction task of ontology development. They used a computer science corpus and the OHSUMED data set for experimentation. Non-taxonomic relation extraction using ARM achieved precisions of 72.5 and 71.8%, respectively, which is reasonably good.

Paiva et al. (78) used the FP growth algorithm of ARM to enrich ontologies by finding frequent item sets and generating rules from them; as described above, a compact tree-like structure is built to find the frequent item sets.

Ghezaiel et al. (79) presented an ontology enrichment process based on the following two steps: (i) extraction of new concepts and development of relations or associations between them and (ii) finding the most suitable place for the novel association rules in existing knowledge bases. Their work mainly focused on discovering new ways to extract new terms and relations alongside the concepts already existing in the ontology.

Paiva (80) discovered association relations between medical concepts extracted from a data set containing information about the treatment of breast cancer. Their work focused on finding semantic relations among the concept pairs that were associated with each other. Similarly, Maedche and Staab (12) mined associations and built ontologies using textual data existing in the form of documents or in other forms such as web usage, web user profiles and web structure. For relation extraction, they used a generalized ARM approach to extract non-taxonomic relations. For evaluation, they used 2234 HTML documents with 16 million words. They obtained 51 000 linguistically related pairs, which contained 284 concepts and 88 non-taxonomic relations.

Fatemi et al. (81) employed ARs along with manually discovered concepts for the extraction of new concepts. Their work clearly showed that association rules play a key role in deriving interesting correlations and associations present in the data. They performed experiments on the TRECVid 2005 video corpus containing 43 907 shots and 449 manually annotated concepts, and discovered 287 new concepts using AR mining.

d’Amato et al. (82) suggested extending ontologies using the knowledge extracted from textual data. They extracted knowledge by discovering hidden patterns or associations among the concepts using ARs and proposed new axioms related to those concepts. The main idea of their work was to transform the generated patterns or associations into formal rules. After the formation of rules, they used operators to differentiate redundant rules from non-redundant ones.

ILP

ILP is a discipline of machine learning that derives hypotheses from background knowledge and a set of positive and negative examples using logic programming. In the domain of ontology, ILP is used at the final stage of the ontology layer cake, where general axioms are acquired from axiom schemata together with positive and negative examples and background knowledge.

Lima et al. (83) employed an ILP technique to populate an ontology from the web. In their work, they utilized two sources of evidence: WordNet (a semantic similarity measure) and domain-independent linguistic patterns. They used the patterns for the identification of class instance candidates. Both of these evidence resources were combined as background knowledge for the automatic acquisition of rules on the basis of ILP. They extracted 2100 sentences using the Bing Search Engine API and evaluated performance with and without WordNet, obtaining best precisions of 96% and 98%, respectively.

Fortuna et al. (84) presented an innovative approach, namely onto term extraction, for the acquisition of topic ontologies from textual documents. Their methodology successfully utilized ILP to generate the ontology of topics. For the experimentation of their proposed approach, they used papers indexed in the database of ILPnet2 publications as documents.

Seneviratne and Ranasinghe (85) described the use of ILP as a learning approach for the acquisition of ontological relations in a multi-agent system. In this system, one agent used ILP for the rule learning process, while another agent used the learned rules to identify new relations. For the evaluation of their proposed approach, they used Wikipedia web pages related to birds.

Lisi et al. (86) used an ILP approach for relational learning, since a huge amount of conceptual knowledge has been made available in the form of ontologies, usually formalized in Description Logics (DLs). In their work, they considered the problem of combining ontologies and relational data and proposed ingredients of ILP as a solution to it. Their proposed approach was based on the deductive and expressive power of a knowledge representation framework combining DLs and datalog, which allowed the tight integration of DLs and disjunctive datalog with negation. They claimed that their approach laid the foundation of an extension of relational learning, known as onto-relational learning, for ontologies.

Lisi et al. (87) described a logic-based computational approach to induce fuzzy ontologies automatically using ILP. They illustrated the usefulness of their approach by applying the proposed method to the tourism domain. Their approach was a good contribution toward the automated management of fuzzy ontology evolution.

To lay out a clear picture of the state-of-the-art ontology learning techniques falling under the different classes (linguistic, statistical and logical), Table 1 summarizes their performance in various domains along with the tools that have been used for their experimentation.

Table 1

Performance Summary of Ontology Learning Techniques

| Stage | Technique | Domain | Performance | Paper | Tools |
|---|---|---|---|---|---|
| Linguistic Techniques | | | | | |
| Preprocessing | Berkeley Parser | Tourism, Sport | Precision = 95.7% | (28) | Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html), CRCTOL (28), https://nlp.stanford.edu/software/lex-parser.shtml, http://nlp.cs.berkeley.edu/ |
| Preprocessing | Stanford Parser | Tourism, Sport | Precision = 90.3% | (28) | as above |
| Term/concept extraction | Syntactic analysis for headword modifier | Chinese Text | Accuracy = 83.3% | (29) | https://github.com/kimduho/nlp/wiki/Head-modifier-principle-(or-relation) |
| Relation Extraction | Lexico-syntactic Parsing | News | Accuracy = 75.5% | (40) | Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html), CRCTOL (28), ASIUM (117, 118, 119) (http://www-ai.ijs.si/~ilpnet2/systems/asium.html), TextStorm/Clouds (27, 123) |
| Relation Extraction | Dependency Analysis | Bioinformatics | Accuracy = 83.3% | (38) | as above |
| Statistical Techniques | | | | | |
| Term Extraction | C/NC Value | Medical | Precision = 89.7% | (26) | OntoGain (72), https://github.com/Neuw84/CValue-TermExtraction |
| Term Extraction | C/NC Value | Computer Science | Precision = 86.67% | (26) | as above |
| Term Extraction | Contrastive Analysis | Chinese Text | Precision = 70% | (56) | OntoLearn (49, 124, 55, 125), CRCTOL (28), OntoGain (72) |
| Term Extraction | Co-occurrence Analysis | Biomedical (Cancer) | Precision = 67.3% | (62) | Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html), https://github.com/gsi-upm/sematch |
| Term Extraction | Clustering (CCD) | Tourism | Accuracy = 68.52% | (66) | ASIUM (117, 118, 119) (http://www-ai.ijs.si/~ilpnet2/systems/asium.html), Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html), https://pythonprogramminglanguage.com/kmeans-text-clustering/ |
| Term Extraction | Clustering (K-means) | Tourism | Accuracy = 53.2% | (66) | as above |
| Relation Extraction | Formal Concept Analysis | Medical | Precision = 47% | (72) | OntoGain (72), https://github.com/xflr6/concepts |
| Relation Extraction | Formal Concept Analysis | Computer Science | Precision = 44% | (72) | as above |
| Relation Extraction | Hierarchical Clustering | Medical | Precision = 71% | (72) | Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html), https://github.com/mstrosaker/hclust |
| Relation Extraction | Hierarchical Clustering | Cooking | Precision = 92.1% | (71) | as above |
| Relation Extraction | Hierarchical Clustering | Finance | F1 Score = 18.51% | (75) | as above |
| Relation Extraction | Hierarchical Clustering | Tourism | F1 Score = 21.4% | (75) | as above |
| Relation Extraction | Association Rule Mining | Medical | Accuracy = 72.5% | (72) | Text2Onto (75, 120, 121, 122) (http://neon-toolkit.org/wiki/1.x/Text2Onto.html) |
| Logical | | | | | |
| | Inductive Logic Programming | English | Accuracy = 96% | (83) | TextStorm/Clouds (27, 123), Syndikate (126, 11), http://pyke.sourceforge.net/ |

In addition, we cite the tools (column: Tools) and reference papers (column: Paper) for each performance benchmark produced by the underlying ontology learning technique in a given domain. Table 1 can serve as a reference point for researchers and practitioners, as it lists the seven most prominent and widely used ontology learning tools together with their respective methodologies. Among these, Text2Onto, ASIUM and CRCTOL are considered hybrid ontology learning tools, as they exploit both linguistic and statistical techniques to extract terms and relations from the underlying corpus, whereas OntoGain and OntoLearn rely solely on statistical methods for their ontology learning tasks. Similarly, TextStorm/Clouds and Syndikate use only logical techniques to acquire concepts and relations.

Evaluation of ontology learning techniques

Assessing the quality of acquired ontologies is an important aspect of semantic web technology, as it allows researchers and practitioners to judge the correctness of the lexical level, the coverage of the concept level, the soundness of the taxonomic level and the adequacy of the non-taxonomic level of the yielded ontologies. Evaluation also makes it possible to refine and remodel the entire ontology learning process when the resulting ontologies do not fit the specific requirements of a user. As discussed earlier, ontology learning is a multi-level process, which makes the evaluation of extracted ontologies difficult. Given this complexity, numerous evaluation techniques have been proposed over the past years, and the area is still under continuous development. All proposed techniques, which are generally classified on the basis of the kind of target ontology and the purpose of evaluation, fall under one of the following categories:

  • Golden standard-based evaluation

  • Application-based evaluation

  • Data-driven evaluation

  • Human evaluation

Table 2 gives an overview of ontology evaluation approaches against various supported evaluation levels of ontology learning.

Table 2

Overview of ontology evaluation approaches

Level | Golden standard | Application-based | Data-driven | Assessment by humans
Lexical, vocabulary, concept and data | x | x | x | x
Hierarchy and taxonomy | x | x | x | x
Other semantic relations | x | x | x | x
Context and application |  | x |  | x
Syntactic | x |  |  | x
Structure, architecture and design |  |  |  | x

This section highlights research work by researchers and practitioners that applies each of the mentioned evaluation techniques, and discusses the advantages, challenges and drawbacks of each approach.

Golden standard-based evaluation

Golden standard-based evaluation compares the resulting ontology with a predefined benchmark or standard ontology. As the gold standard depicts an ideal ontology of a particular domain, comparing the learned ontology against this reference ontology can efficiently validate domain coverage and consistency. The golden standard can be a stand-alone ontology, statistical figures derived from a corpus or a resource formalized by domain experts. Golden standard-based techniques are also known as ontology mapping or ontology alignment. All measures in this category enable frequent, large-scale evaluations at multiple levels. However, obtaining an appropriate gold ontology can be a major challenge, since it needs to have been created under conditions and goals similar to those of the learned ontology. For this reason, most approaches select either human-created taxonomies or reliable taxonomies of a similar domain as the gold standard. It is important to mention that gold standard techniques mostly cover the completeness, conciseness and accuracy factors of learned ontologies.

Maedche and Staab (60) propose a set of similarity measures for ontologies and an empirical evaluation for different phases of ontology learning. They model an ontology as a two-layer architecture comprising a lexical and a conceptual layer. Using this model, they compute the similarity between the learned ontology and a reference ontology prepared by experts in the tourism domain, measuring similarity on the basis of the lexicon, semantic cotopy and reference functions. Moreover, Ponzetto and Strube (88) extracted a taxonomy from Wikipedia and compared it with a couple of gold standard taxonomies. Their technique first uses a denotational 'lexeme-to-concept' mapper to map the extracted ontology and then computes semantic similarity through WordNet using measures such as that of Leacock and Chodorow (89). Zavitsanos et al. (90), Trokanas et al. (91) and Sfar et al. (92) also assess the learned ontology by comparing it with a gold standard ontology; their approaches compute the similarity of two ontologies at the lexical and relational level by transforming the ontological concepts and their attributes into vector representations. Likewise, Kashyap et al. (93) exploited a similar approach, using MEDLINE as the corpus and the MeSH thesaurus as the benchmark to assess their extracted taxonomy. The assessment compares the constructed taxonomy with the benchmark taxonomy using the following two metrics:

  1. Content quality: it computes the extent of overlap between the labels of the two taxonomies in order to measure precision and recall (a minimal sketch of this kind of label-overlap scoring is given after this list).

  2. Structural quality: it computes the structural validity of all labels. For instance, if two labels appear in an ancestor–descendant relationship in the first taxonomy, then they must stand in the same relationship in the other taxonomy.
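
To make these two metrics concrete, the following minimal sketch (in Python, with invented toy labels; it is not the exact formulation of (93)) treats concept labels as sets, scoring content quality as label overlap and structural quality as agreement on ancestor–descendant pairs.

    # Minimal sketch of gold standard-based lexical evaluation: concept labels
    # of the learned and gold taxonomies are treated as sets, and
    # precision/recall/F1 measure their overlap.

    def lexical_overlap(learned_labels, gold_labels):
        """Precision, recall and F1 of learned concept labels against a gold set."""
        learned = {label.lower() for label in learned_labels}
        gold = {label.lower() for label in gold_labels}
        common = learned & gold
        precision = len(common) / len(learned) if learned else 0.0
        recall = len(common) / len(gold) if gold else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    def structural_agreement(learned_pairs, gold_pairs):
        """Fraction of learned ancestor-descendant pairs that also hold in the
        gold taxonomy; pairs are (ancestor, descendant) tuples of labels."""
        learned = {(a.lower(), d.lower()) for a, d in learned_pairs}
        gold = {(a.lower(), d.lower()) for a, d in gold_pairs}
        return len(learned & gold) / len(learned) if learned else 0.0

    if __name__ == "__main__":
        # Hypothetical toy taxonomies, for illustration only.
        learned_concepts = ["disease", "cancer", "therapy", "clinic"]
        gold_concepts = ["disease", "cancer", "therapy", "diagnosis"]
        print(lexical_overlap(learned_concepts, gold_concepts))

        learned_isa = [("disease", "cancer"), ("therapy", "clinic")]
        gold_isa = [("disease", "cancer"), ("therapy", "chemotherapy")]
        print(structural_agreement(learned_isa, gold_isa))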

Treeratpituk et al. (94) used their proposed GraBTax algorithm to construct a taxonomy from a large text corpus and compared it with six topic-specific benchmark taxonomies extracted from Wikipedia.

Application-based evaluation

Application-based evaluation, also referred to as 'task-based evaluation', evaluates a given ontology by exploiting it in a specific application to perform some task. The outcome of that task determines the goodness of the ontology regardless of its structural properties. Task-based methodologies enable the detection of inconsistent concepts and allow the adaptability of a particular ontology to be evaluated by analyzing its performance in the context of various tasks (95). In addition, task-based approaches are often used to evaluate the compatibility between the employed tool and the ontology and to measure the time required to complete the particular task. Application-based evaluation assesses the correctness, coverage, adequacy and wellness of an ontology with reference to other applications. For instance, suppose an ontology is crafted to improve the results of document retrieval; one may collect some sample queries to check whether the application retrieves more relevant documents after utilizing the crafted ontology. The evaluation measures mainly depend on the kind of task: in the case of document retrieval, traditional information retrieval measures such as the F-score can be used (96, 97), as sketched after the list of shortcomings below. Lozano-Tello et al. (98) proposed a technique that enables users to determine the suitability of existing ontologies with respect to the requirements of their systems. Porzel and Malaka (99) evaluated the exploitation of ontological relations in speech recognition, using a human-generated gold standard to compare the outcome of the speech recognition system. It is important to mention that application-based evaluation has several shortcomings, which are highlighted below:

  • The ontology is evaluated after being exploited in a particular way by a specific application for a particular task; therefore, it is hard to generalize its performance.

  • The ontology may be only a minor component of an application, so its impact on the results can be indirect and small.

  • Different ontologies can only be compared if they can all be embedded into the same application for the same task.
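
Returning to the document retrieval example above, the following minimal sketch (with hypothetical document identifiers and relevance judgments; it is not tied to any particular system) computes precision, recall and F-score for a retrieval run with and without an ontology-expanded query, which is how a task-based score for the ontology could be obtained.

    # Minimal sketch of task-based evaluation for document retrieval: the
    # ontology is judged indirectly through the F-score of the retrieval task
    # it supports.

    def retrieval_f_score(retrieved, relevant):
        """Precision, recall and F1 for one query, given sets of document ids."""
        retrieved, relevant = set(retrieved), set(relevant)
        hits = retrieved & relevant
        precision = len(hits) / len(retrieved) if retrieved else 0.0
        recall = len(hits) / len(relevant) if relevant else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        return precision, recall, f1

    if __name__ == "__main__":
        relevant_docs = {"d1", "d2", "d5", "d7"}      # hypothetical gold judgments
        baseline_run = ["d1", "d3", "d4"]             # plain keyword query
        ontology_run = ["d1", "d2", "d5", "d9"]       # ontology-expanded query
        print("baseline:", retrieval_f_score(baseline_run, relevant_docs))
        print("ontology:", retrieval_f_score(ontology_run, relevant_docs))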

Moreover, Haase and Sure (100) assess the quality of a specific ontology by determining the extent to which it enables users to acquire relevant individuals in a particular search. They introduce a cost-based model to estimate the user effort required to reach the desired relevant information; this cost is computed from the complexity of the constructed hierarchy in terms of breadth and depth.

Data-driven evaluation

Data-driven or so-called corpus-based evaluation (96) utilizes existing domain-specific knowledge sources (usually textual corpora) to assess the extent to which a specific ontology covers a particular domain. The major advantage of this approach is that it enables one or more target ontologies to be compared against a specific corpus. Like the golden standard-based approach, it covers similar evaluation criteria comprising the completeness, conciseness and accuracy of learned ontologies. The main requirement of data-driven approaches is a domain-specific corpus, which is, however, much easier to obtain than a good domain-specific benchmark ontology. For instance, Jones and Alani (101) utilized Google as the search engine to find a corpus for a specific user query: after expanding the user query using WordNet, the top 100 pages of the Google results are taken as the evaluation corpus. Many researchers have performed corpus-based evaluation. For example, Brewster et al. (102) describe a number of techniques and methodologies for assessing the structural fit between an ontology and domain knowledge that exists as text corpora. They acquire domain-specific terms from the corpora using latent semantic analysis, and the extent of overlap between these domain-specific terms and the terms appearing in a particular ontology (i.e. concept names) is used to compute the fit between the ontology and the corpus; they also propose a probabilistic methodology to determine the best ontology among all candidate ontologies. Sordo et al. (39) used data-driven evaluation to evaluate music relations extracted from unstructured text. Likewise, Patel et al. (103) assessed the coverage of a specific ontology by retrieving textual data such as concept names and relations from it; the acquired textual data is used as input to a text classification model trained with standard machine learning methodologies. A minimal sketch of the term-overlap idea follows.
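
The following minimal sketch (in Python, with a toy corpus and concept list; term extraction is reduced to simple frequency counting rather than latent semantic analysis) illustrates the basic term-overlap computation behind such a corpus fit score.

    # Minimal sketch of data-driven (corpus-based) evaluation: domain terms are
    # extracted from a corpus by frequency, and coverage is the fraction of
    # those terms that also appear as concept names in the ontology.
    from collections import Counter
    import re

    def top_corpus_terms(documents, k=10, min_len=4):
        """Return the k most frequent word types as a crude stand-in for
        proper term extraction (e.g. latent semantic analysis in (102))."""
        counts = Counter()
        for doc in documents:
            counts.update(w for w in re.findall(r"[a-z]+", doc.lower())
                          if len(w) >= min_len)
        return [term for term, _ in counts.most_common(k)]

    def corpus_fit(ontology_concepts, documents, k=10):
        """Share of frequent corpus terms covered by ontology concept names."""
        concepts = {c.lower() for c in ontology_concepts}
        terms = top_corpus_terms(documents, k)
        covered = [t for t in terms if t in concepts]
        return len(covered) / len(terms) if terms else 0.0

    if __name__ == "__main__":
        corpus = ["Chemotherapy and radiotherapy are common cancer treatments.",
                  "Early diagnosis improves cancer therapy outcomes."]
        concepts = ["cancer", "therapy", "diagnosis", "tumour"]
        print(corpus_fit(concepts, corpus, k=5))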

Human evaluation

Human evaluation of ontologies is generally based on defining and formulating various decision criteria for selecting the best ontology from a specified set of candidate ontologies. A numerical score is assigned after evaluating the ontology against each criterion, and a weighted sum of the criterion scores is then calculated. This kind of evaluation is also called 'criteria-based evaluation' (96). Although criteria-based evaluation has been used in many contexts for selecting the best ontology (e.g. grant applications, tenders etc.), its major shortcoming is the high manual cost in terms of time and effort, and it is therefore not used very often nowadays. Researchers have nevertheless done considerable work on this approach. For example, Burton-Jones et al. (104) proposed a list of 10 criteria comprising richness (the number of syntactic features of the formal language utilized by the ontology), lawfulness (the frequency of syntactical errors), interpretability (whether ontology terms exist in WordNet), clarity (the number of term senses present in WordNet), consistency (the number of inconsistent concepts), accuracy (the number of false statements in the target ontology), comprehensiveness (the total number of concepts in the target ontology, compared with the average over the entire repository of ontologies), authority (the number of ontologies utilizing concepts from the target ontology), history (the number of accesses made to the target ontology in comparison with other candidate ontologies) and relevance (the total number of statements that involve significant syntactic features). Similarly, Fox et al. (105) present a set of criteria that is more inclined toward manual evaluation and assessment of ontologies. Lozano-Tello et al. (106) formulate a set of 117 criteria, grouped in a three-level framework; they assess taxonomies on the basis of multi-level properties comprising cost, design qualities, language properties and tools through the assignment of scores. Moreover, criteria-based evaluation can be further classified into two categories, which are discussed after the sketch below.
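
A minimal sketch of this weighted scoring follows (the criteria names and weights are invented for illustration and are not the exact suite of (104)).

    # Minimal sketch of criteria-based (human) evaluation: each candidate
    # ontology receives a manually assigned score per criterion, and candidates
    # are ranked by the weighted sum of those scores.

    # Hypothetical criteria and weights.
    WEIGHTS = {"interpretability": 0.3, "consistency": 0.3,
               "comprehensiveness": 0.2, "clarity": 0.2}

    def weighted_score(scores, weights=WEIGHTS):
        """Weighted sum of per-criterion scores (each expected in [0, 1])."""
        return sum(weights[c] * scores.get(c, 0.0) for c in weights)

    def rank_candidates(candidates, weights=WEIGHTS):
        """Order candidate ontologies by their weighted criteria score."""
        return sorted(candidates,
                      key=lambda name: weighted_score(candidates[name], weights),
                      reverse=True)

    if __name__ == "__main__":
        # Scores as a human evaluator might assign them to two candidates.
        candidates = {
            "onto_A": {"interpretability": 0.8, "consistency": 0.9,
                       "comprehensiveness": 0.5, "clarity": 0.7},
            "onto_B": {"interpretability": 0.6, "consistency": 0.7,
                       "comprehensiveness": 0.9, "clarity": 0.8},
        }
        for name in rank_candidates(candidates):
            print(name, round(weighted_score(candidates[name]), 3))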

  • Structure-based evaluation

    Structure-based methodologies explore and measure different structural properties in order to evaluate a specified taxonomy, and most proposed structure-based techniques fully automate the entire evaluation process. For example, one may compute the relational density of all existing nodes and the average taxonomic depth (a minimal sketch of such structural metrics is given after this list). Fernández et al. (107) examine the effect of various structural ontology properties on ontology quality; after extensive experimentation, they conclude that richly populated ontologies, with high depth and breadth values, have a higher chance of being correct. Besides, Gangemi et al. (108) assess ontologies on the basis of the presence of cycles in a directed graph.

  • Complex- and Expert-based evaluation

    There are numerous complex- and expert-based evaluation measures, which try to capture various aspects and properties of ontology quality. For instance, Alani and Brewster (109) combine several ontology evaluation measures, such as density, betweenness and class matching, in the 'AKTiveRank' system. Moreover, Guarino and Welty (110) assess ontologies through a system known as 'OntoClean', which is based on a set of notions comprising identity, essence and unity. They exploit these notions to characterize and explore the intended meaning of the classes, relations and properties that prove significant in building a specific ontology.
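
As referenced in the structure-based item above, the following sketch (using an invented toy taxonomy encoded as a parent-to-children dictionary) computes two simple structural indicators, average leaf depth and average branching factor, and checks for cycles in the subconcept graph.

    # Minimal sketch of structure-based evaluation on a taxonomy given as a
    # dictionary mapping each concept to its list of direct subconcepts.

    def leaf_depths(taxonomy, node, depth=0):
        """Depths of all leaves reachable from node (assumes an acyclic graph)."""
        children = taxonomy.get(node, [])
        if not children:
            return [depth]
        depths = []
        for child in children:
            depths.extend(leaf_depths(taxonomy, child, depth + 1))
        return depths

    def average_depth(taxonomy, root):
        depths = leaf_depths(taxonomy, root)
        return sum(depths) / len(depths)

    def average_branching(taxonomy):
        inner = [len(c) for c in taxonomy.values() if c]
        return sum(inner) / len(inner) if inner else 0.0

    def has_cycle(taxonomy):
        """Detect cycles with a depth-first search over the subconcept relation."""
        visiting, done = set(), set()

        def visit(node):
            if node in visiting:
                return True
            if node in done:
                return False
            visiting.add(node)
            cyclic = any(visit(child) for child in taxonomy.get(node, []))
            visiting.discard(node)
            done.add(node)
            return cyclic

        return any(visit(node) for node in taxonomy)

    if __name__ == "__main__":
        # Hypothetical toy taxonomy for illustration.
        taxonomy = {"thing": ["animal", "plant"],
                    "animal": ["dog", "cat"],
                    "plant": ["tree"]}
        print(average_depth(taxonomy, "thing"), average_branching(taxonomy),
              has_cycle(taxonomy))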

Ontology learning data sets

This section summarizes the characteristics of commonly used data sets and systems in ontology learning. For the development of ontologies using ontology learning techniques, data sets containing unstructured domain-specific documents are used. For the biological domain, most researchers use OHSUMED (http://davis.wpi.edu/xmdv/datasets/ohsumed) (111, 112, 113) and the Genia Corpus (http://www.geniaproject.org/genia-corpus) (114, 115) for experimentation. Similarly, in the traveling and tourism domain, the data sets used for ontology learning are Mecklenburg Vorpommern (116, 75) and Lonely Planet (http://www.lonelyplanet.com/destinations) (116, 75). Two large data sets of the news domain, namely the British National Corpus (http://www.natcorp.ox.ac.uk/) (97) and Reuters-21578 (https://archive.ics.uci.edu/ml/datasets/reuters-21578+text+categorization+collection) (113, 97), are also extensively used for the experimentation and evaluation of different ontology learning systems. Table 3 illustrates the characteristics of these and other commonly used data sets.

Table 3

Summary of Popular Datasets

Corpus | No. of documents | Domain | Tokens
Mecklenburg Vorpommern | 1047 | Tourism | 332,000
Lonely Planet | 1801 | Traveling | 1 Million
British National Corpus | 4124 | News | 100 Million
Reuters-21578 | 21,578 | News | 218 Million
OHSUMED | 348,566 | Biological | NA
Genia Corpus | 2000 | Biological | 400,000
Planet Stories | 307 | Stories | NA

Industrial applications of ontology learning

A large amount of unstructured and semistructured data is being generated every second in the world. In terms of data generation statistics, almost 2.5 quintillion bytes of data were generated every day in 2017, which is a humongous amount (https://www.ibm.com/blogs/insights-on-business/consumer-products/2-5-quintillion-bytes-of-data-created-every-day-how-does-cpg-retail-manage-it/). These data are distributed over the internet across various websites in such a way that they are totally disconnected. Storing such a gigantic amount of data requires a lot of resources, and it is extremely difficult to process such data in order to find useful information. This highlights the pressing need for a knowledge representation model that stores such data in a more structured way to enable fast processing and quick retrieval at large scale. The model that enables structured representation of data is known as an ontology.

Ontologies are extensively used in information retrieval, question answering and decision support systems. This section illustrates applications of ontologies in diverse industries such as the oil and gas industry, military, e-government, e-health and e-culture.

Oil and gas industry

The oil and gas industry is one of the most data-intensive industries, generating a huge amount of important data every day. Data are generated from various sources in the form of oil well data, seismic data, drilling and transportation data, customer data and marketing data. Since it is one of the industries that control the balance of power in the world, these data, along with their semantics, are of significant importance, as they can be used to derive very useful information. Soma et al. (127) presented a reservoir management system that uses the semantic web to access and enhance the view of information present in its core knowledge base. Fluor Corporation's Accelerating Deployment of ISO 15926 (ADI) (150, 151) project converts ISO 15926 Part 4 (a resource of the oil and gas industry that has descriptions of plant objects) into RDF/OWL form to make it processable by computer systems. The Norwegian Daily Production Report project implemented an ontology based on the ISO 15926 standard to make data comparison and retrieval easy. Moreover, the workflow and quality of the oil and gas industry can be further improved by integrating semantic web concepts with the Internet of Things.

Military technology

Diverse military technologies such as drones and weaponized mobile robots are producing an exponentially growing amount of battlefield information. Technologists are using the semantic web to manage the massive data load and assist decision analysis during battle by utilizing the significant information produced by all automated military units. In addition, ontologies are being constructed to organize battlefield information for quick retrieval. Halvorsen and Hansen (152) provided an integrated approach to access military information, which uses RDF for representation and serialization between various systems and SPARQL as the communication protocol. This approach can be used for threat detection by reasoning over the information provided as RDF triples (128), as sketched below.
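
As a minimal illustration of this RDF-plus-SPARQL pattern (using the Python rdflib library with an invented namespace and toy facts; it is not the system of (152)), the sketch below stores a few battlefield observations as triples and retrieves the high-threat ones with a SPARQL query.

    # Minimal sketch: storing battlefield facts as RDF triples and querying them
    # with SPARQL via rdflib (pip install rdflib).
    from rdflib import Graph, Namespace, Literal, RDF

    MIL = Namespace("http://example.org/military#")  # hypothetical namespace

    g = Graph()
    g.bind("mil", MIL)

    # A few illustrative triples: two detected units and their classifications.
    g.add((MIL.unit42, RDF.type, MIL.Drone))
    g.add((MIL.unit42, MIL.observedAt, Literal("sector-7")))
    g.add((MIL.unit42, MIL.threatLevel, Literal("high")))
    g.add((MIL.unit43, RDF.type, MIL.SupplyTruck))
    g.add((MIL.unit43, MIL.observedAt, Literal("sector-2")))
    g.add((MIL.unit43, MIL.threatLevel, Literal("low")))

    # SPARQL query: retrieve all high-threat units and where they were observed.
    query = """
    PREFIX mil: <http://example.org/military#>
    SELECT ?unit ?location WHERE {
        ?unit mil:threatLevel "high" .
        ?unit mil:observedAt ?location .
    }
    """

    for unit, location in g.query(query):
        print(unit, "seen at", location)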

To standardize available information, support decision making and exchange information effectively, technologists have introduced diverse ontologies such as MilInfo (129) and Air Tasking Order (ATO) (130); the ATO helps to assign aircraft missions. Besides this, the Tactic Technique and Procedure Ontology (131) and the Battle Management Ontology (132) are further ontologies that assist military decision making and shared information access. Another possible ontology is a soldier ontology (http://rdf.muninn-project.org/ontologies/military.html), which can be generated from the data of both on-duty and retired soldiers; this type of ontology can help in selecting soldiers for specific missions and keeping track of retired senior soldiers.

E-government

Incorporating ontologies and the semantic web in e-government portals can be very fruitful. Instead of relying only on text, the underlying ontology can be used to extract information that is semantically more relevant to the query. Such portals are more efficient than traditional search portals, which do not consider semantics, and various governmental departments can keep their knowledge bases in sync by using the underlying ontologies.

Rui et al. (133) presented the concept of a semantic information portal that utilizes a semantic search algorithm; they not only proposed but also implemented the algorithm to retrieve semantically correct results for queries. On the other hand, Haav (134) described a process with which ontologies can be created for e-governmental data. By making use of these ontologies and their semantics, governments can manage their resources effectively and improve planning and development policies.

E-business and E-commerce

E-business and e-commerce have also started utilizing the power of the semantic web to make important business decisions and to develop smart systems for end users by handling the massive available data efficiently using ontologies. GoodRelations is one such ontology, introduced by Hepp (135); it is essential for any semantics-based web platform, as it models various e-commerce concepts like products, prices, discount offers and sales offers. LIB2CO, created by Akanbi (136), is another integrated semantic web platform that offers two major agents: a search agent that retrieves semantically correct results for consumer queries by analyzing the metadata attached to products, and an ontology agent whose task is to organize all products into an ontology so that the search agent can find them effectively.

Ontologies are also helpful in commerce matchmaking, where the most compatible services and goods are selected for the user. Paoloucci et al. (137) developed such a system, which comprises various ontologies and a matchmaker. Besides this, a security ontology developed by Ekelhart et al. (138) plays its part in the security infrastructure of ontology-based e-commerce and e-business.

E-health and life sciences

The e-health and life sciences industries are also seeking to capture patient data electronically for better processing and quick retrieval. To make these data useful for artificial intelligence applications, the semantics behind the data need to be captured to enable automatic decision making.

The European Patient Summary (153) is one such project whose backbone is semantic web technology. Besides this, ontologies and semantics have also been used by Podgorelec and Pavlic (139) to store and integrate data about Mitral Valve Prolapse syndrome. Kim and Choi (140) presented an electrocardiography ontology for heart diseases and used it to create a knowledge base. Ganguly et al. (141) also worked on e-health ontologies by addressing the issue of mismatch between conceptual hierarchies in ontologies. Other applications of ontology learning for e-health are present in the form of ontologies such as the Human Phenotype Ontology (142), the Translational Medicine Ontology (143) and SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms) (144).

Multimedia and E-culture

Annually, a huge amount of multimedia content is released on the internet, including more than 2500 movies and 1 million songs. The metadata attached to this multimedia content, along with its semantics, can prove very helpful for multimedia companies, as they can use it to build precise and accurate recommendation systems for their customers.

Retrieving relevant images, video content and songs is one of the tasks that can be accomplished using ontologies and semantics. Fan and Li (145) used an ontology-based reasoning system to retrieve images relevant to queries. Besides this, an animal ontology has been used by Wang et al. (146) to retrieve and annotate animal images. Liu et al. (147) used a reverse engineering process to generate an image ontology from image data. Ontologies have also found application in video annotation and retrieval by utilizing the semantics of events happening in the video; Ballan et al. (148) presented one such framework for the annotation and retrieval of video content.

Investigative and digital journalism

The semantic web and the use of numerous ontologies have taken journalism to the next level by enabling journalists to explore hidden and previously inaccessible information through deeper search. For instance, the Panama Papers are a gigantic collection of documents that contains information about organizations and individuals who dodge sanctions and taxes; unfortunately, this information was initially inaccessible to journalists. The company Ontotext (https://ontotext.com/) constructed an ontology from these documents to give them more structure and meaning and enabled a querying mechanism using SPARQL. Similarly, Trump World Data is another result of investigative journalism that has been transformed into structured form for easy information access.

Future directions

Ontology learning is a multidisciplinary task that extracts important terms, concepts, attributes and relations from unstructured text by borrowing techniques from different domains like text classification, natural language processing, machine learning etc. These domains are research-intensive and still developing. Natural language processing has various bottlenecks such as part of speech tagging, relation extraction from unstructured text, co-reference resolution and named entity recognition. From the results discussed in the section entitled Linguistics for pre-processing, it can be concluded that techniques like PoS tagging and parsing can lead toward the development of better ontologies. With the advancement of NLP techniques, improved PoS taggers and parsers are being introduced that need to be integrated into ontology learning systems for better performance. In text classification, researchers are developing new algorithms to select highly discriminative features among the classes. There are many term selection algorithms available in these domains [Bi-Normal Separation, Normalized Difference Measure, Odds Ratio, Poisson Ratio, Balanced Accuracy Measure (ACC2) and Distinguishing Feature Selection (154)] that need to be introduced into ontology learning for the extraction of terms and concepts; a minimal sketch of one such measure is given below.
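
As an illustration of how such a measure could rank candidate terms, the sketch below computes the Balanced Accuracy Measure, taken here as ACC2 = |tpr - fpr|, i.e. the absolute difference between a term's true positive rate and false positive rate over a binary-labeled document collection (toy data; the other listed measures would plug into the same ranking loop).

    # Minimal sketch: ranking candidate terms with the Balanced Accuracy Measure
    # (ACC2 = |tpr - fpr|) over documents labeled as in-domain (True) or not.

    def acc2(term, documents):
        """documents: list of (set_of_tokens, is_positive) pairs."""
        pos = [tokens for tokens, label in documents if label]
        neg = [tokens for tokens, label in documents if not label]
        tpr = sum(term in t for t in pos) / len(pos) if pos else 0.0
        fpr = sum(term in t for t in neg) / len(neg) if neg else 0.0
        return abs(tpr - fpr)

    def rank_terms(candidates, documents):
        """Sort candidate terms by decreasing ACC2 score."""
        return sorted(candidates, key=lambda t: acc2(t, documents), reverse=True)

    if __name__ == "__main__":
        # Hypothetical medical (positive) vs. non-medical (negative) documents.
        docs = [({"tumour", "therapy", "patient"}, True),
                ({"therapy", "diagnosis"}, True),
                ({"travel", "hotel", "patient"}, False),
                ({"flight", "booking"}, False)]
        print(rank_terms(["therapy", "hotel", "patient"], docs))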

As far as machine learning is concerned, ontology learning borrows various techniques from this domain, such as clustering and ARM. However, improvements can be made by incorporating deep learning into these algorithms. Besides this, the exponential growth of textual data on the web is heavily influencing the various methods used at different levels of ontology learning, and it can be said that the future of ontology learning will be led by the immense amount of unstructured web data. We propose the following future directions to further improve the ontology learning process:

  1. Use of social media for data validation

  2. Language independent ontology learning

  3. Scalability of existing ontology learning techniques to handle larger data sets

  4. Use of crowdsourcing and human-based computation games to perform ontology post processing

  5. Development of more formal or heavyweight ontologies

The remainder of this section summarizes five prominent challenges of ontology learning and discusses the above-mentioned future directions in the context of these challenges.

Challenge 1: The immense amount of web data exists in different formats and languages. This leads to the production of conflicting and inconsistent ontologies.

Proposed solution:

To resolve this issue, we propose looking for approaches to integrate and homogenize such data; this field has not yet gained enough attention from the ontology learning community. We also propose the use of cross-language ontologies to resolve such issues. There is a need to develop advanced ontology learning algorithms that are independent of language barriers. Since ontologies are shared conceptualizations, they should be free of lexical information: for example, the concept of an orange should not be portrayed lexically as 'orange' but rather as a language-independent form to which the words for orange in all languages can be mapped.

Challenge 2: Ontology learning is still a developing field in which each task of the ontology learning layer cake is a vast research area that needs improvement. Each stage depends on the results of the previous stage: if one stage produces wrong information, it will affect the later stages and eventually produce low-quality ontologies. For example, if a faulty relation <VladimirPutin> <is-a> <president of Italy> occurs frequently in the data, ontology learning methods will extract it and add it to the final ontology, contaminating the underlying knowledge base.

Proposed solution:

To ensure data validity, we propose the use of the social web and folksonomy (collaborative tagging). We can assess the validity of a learned ontology by asking users of social media to tag extracted concepts and relations as either correct or incorrect. By comparing the number of users tagging each item correct versus incorrect, we can derive a level of trust for the learned ontology, as sketched below.
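
One simple way to turn such tags into a trust score (a sketch assuming independent votes; the relations and counts are invented for illustration) is to keep only those extracted facts whose share of 'correct' votes exceeds a threshold.

    # Minimal sketch: filtering extracted relations by a crowd-derived trust
    # score, computed as the fraction of users who tagged the relation correct.

    def trust(correct_votes, incorrect_votes):
        total = correct_votes + incorrect_votes
        return correct_votes / total if total else 0.0

    def filter_relations(tagged_relations, threshold=0.7):
        """tagged_relations: {relation: (correct_votes, incorrect_votes)}."""
        return [rel for rel, (c, i) in tagged_relations.items()
                if trust(c, i) >= threshold]

    if __name__ == "__main__":
        tagged = {
            ("VladimirPutin", "is-a", "president of Russia"): (48, 2),
            ("VladimirPutin", "is-a", "president of Italy"): (1, 37),
        }
        for rel in filter_relations(tagged):
            print(rel)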

Challenge 3: Scalability of ontology learning techniques to accommodate larger data sets is another major challenge. Most of the techniques and tools used in state-of-the-art ontology learning methodologies are designed for smaller data sets; when applied to bigger data sets, they tend to produce inefficient results.

Proposed solution:

We suggest increased research to scale the present techniques so that they accommodate larger data sets without compromising efficiency and quality. This can be encouraged by introducing community challenges like BioASQ, BioCreative, TREC etc.; the incentives offered in such challenges attract researchers, and the resulting improvements help tackle this challenge.

Challenge 4: The quality of learned ontologies is affected by human intervention; we can say that the quality of a learned ontology is directly proportional to the amount of human intervention, which is why semi-automatic ontology acquisition tends to produce good ontologies. For a fully automatic ontology learning process, a reasonable amount of post-processing is required to boost the quality of the ontology, which is a massive drawback of fully automated acquisition, as it puts a heavy burden on knowledge engineers and domain experts.

Proposed solution:

This post-processing stage must somehow be integrated with the original ontology learning framework. To reduce this overhead, we propose utilizing the extensive research in the fields of crowdsourcing and human-based computation games (games with a purpose). These can help lower the cost of ontology revision by involving non-expert humans and interacting with them to achieve post-processing goals.

Challenge 5: Lastly, we predict a need to shift from lightweight ontologies to more formal, heavyweight ontologies in the future.

Proposed solution:

To tackle this problem, there is a strong need to strengthen axiom learning techniques so that, in the future, formal ontologies can take center stage.

The aforementioned challenges and future directions are summarized in Table 4.

Table 4

Summary of Ontology Learning: Challenges and Future Directions

Challenge | Proposed Solution
1. Diversity of data formats, multi-lingual data | Novel approaches to integrate and harmonize data; cross-language ontologies; advanced algorithms for ontology learning
2. Lack of automatic ontology validation, faulty ontologies | Use of the social web, collaborative tagging and folksonomy; use of search engines for answer validation
3. Scalability of ontology learning techniques | Increased research to accommodate larger data sets; arrangement of community challenges by governing bodies to increase the research scale of ontology learning techniques
4. Requirement of human intervention for better quality of learned ontologies | Automatic post-processing techniques; integration of a post-processing framework with the ontology learning framework to boost ontology quality; use of research in the fields of crowdsourcing and human-based computation games
5. Lack of heavyweight ontologies | Strengthen axiom learning algorithms

Conclusion

This paper summarizes ontology learning techniques along with evaluation measures and highlights applications of ontology learning in various domains. We observed that a hybrid approach comprising both linguistic and statistical techniques produces better ontologies. However, it is difficult to single out the best technique, as the performance of ontology learning techniques is highly dependent on efficient preprocessing of data in the target domain. After critically analyzing the ontology learning literature, the following trends are observed: for term and concept extraction, many researchers prefer statistical techniques, whereas for relation extraction there is an inclination toward agglomerative clustering and ARM. We also reviewed various evaluation techniques for ontology learning and found that the best form of evaluation is human-based evaluation. In addition, we mark the most widely used ontology learning tools along with their respective methodologies and target domains. Applications of ontology learning in industries such as oil and gas, military and e-health are also discussed. Lastly, we provide comprehensive information about ontology learning challenges and propose solutions to further improve the process of ontology learning, showing directions for answer validation, language-independent ontology generation and the use of crowdsourcing for automatic ontology post-processing.

Conflict of interest. None declared.

References

[1]

Maedche
,
A.
and
Staab
,
S.
(
2001
)
Ontology learning for the semantic web
.
IEEE Intell. Syst.
,
16
,
72
79
.

[2]

Gruber
,
T.R.
(
1995
)
Toward principles for the design of ontologies used for knowledge sharing?
Int. J. Hum. Comput. Stud.
,
43
,
907
928
.

[3]

Cullen
,
J.
and
Bryman
,
A.
(
1988
)
The knowledge acquisition bottleneck: time for reassessment?
Expert Systems
,
5
,
216
225
.

[4]

Chen
,
J.
,
Dosyn
,
D.
,
Lytvyn
,
V.
et al.  (
2016
)
Smart data integration by goal driven ontology learning
. In:
INNS Conference on Big Data
,
Springer
,
Thessaloniki, Greece
,
283
292
.

[5]

Ding
,
Y.
and
Foo
,
S.
(
2002
)
Ontology research and development. part 2-a review of ontology mapping and evolving
.
J. Inf. Sci.
,
28
,
375
388
.

[6]

Gómez-Pérez
,
A.
and
Manzano-Macho
,
D.
(
2003
)
A survey of ontology learning methods and techniques
.
Onto Web Deliverable
,
D 1 (5)
.

[7]

Faure
,
D.
and
Nédellec
,
C.
(
1998
)
Asium: learning subcategorization frames and restrictions of selection, Chemnitz, Allemagne
.

[8]

Yamaguchi
,
T.
(
2001
)
Acquiring conceptual relationships from domain-specific texts
. In:
Workshop on Ontology Learning
,
Levanger, Norway,
38
,
69
113
.

[9]

Shamsfard
,
M.
(
2003
) Designing the ontology learning model, prototyping in a persian text understanding system.
Ph.D. Thesis
.
Amir Kabir University
,
Iran, Tehran
.

[10]

de Chalenda
,
G.
and
Brigitte
,
G.
(
2000
)
SVETLAN A System to Classify Nouns in Context.
Workshop on Ontology Learning
.

[11]

Hahn
,
U.
and
Romacker
,
M.
(
2001
) The syndikate text knowledge base generator. In:
Proceedings of the First International Conference on Human Language Technology Research
.
Association for Computational Linguistics
,
San Diego
,
1
6
.

[12]

Maedche
,
A.
and
Staab
,
S.
(
2000
) Discovering conceptual relations from text. In:
ECAI
.
Berlin
,
321
,
27
.

[13]

Craven
,
M.
,
McCallum
,
A.
,
PiPasquo
,
D.
et al.  (
1998
)
Learning to extract symbolic knowledge from the world wide web
.
Technical Report. School of Computer Science, Carnegie-Mellon University, Pittsburgh, PA
.

[14]

Shamsfard
,
M.
and
Barforoush
,
A.A.
(
2003
)
The state of the art in ontology learning: a framework for comparison
.
Knowl. Eng. Rev.
,
18
,
293
316
.

[15]

Buitelaar
,
P.
,
Cimiano
,
P.
and
Magnini
,
B.
(
2005
) Ontology learning from text: an overview. In:
Ontology Learning from Text: Methods, Evaluation and Applications
,
Amsterdam
,
IOS Press
,
123
,
3
12
.

[16]

Zhou
,
L.
(
2007
)
Ontology learning: state of the art and open issues
.
Inf. Technol. Manag.
,
8
,
241
252
.

[17]

Hazman
,
M.
,
El-Beltagy
,
S.R.
and
Rafea
,
A.
A survey of ontology learning approaches.
Database
,
7
,
36
43
.

[18]

Brill
,
E.
(
1992
) A simple rule-based part of speech tagger. In:
Proceedings of the Third Conference on Applied Natural Language Processing
.
Association for Computational Linguistics
,
Trento, Italy
,
152
155
.

[19]

Schmid
,
H.
(
1994
) Probabilistic part-of-speech tagging using decision trees. In:
Proceedings of International Conference on New Methods in Language Processing
,
1
9
(
access date: 11 September 2012
). https://pdfs.semanticscholar.org/bd0b/ab6fc8cd43c0ce170ad2f4cb34181b31277d.pdf.

[20]

Lin
,
D.
(
1994
) Principar: an efficient, broad-coverage, principle-based parser. In:
Proceedings of the Fifteenth Conference on Computational Linguistics
.
Association for Computational Linguistics
,
Kyoto, Japan,
1
,
482
488
.

[21]

Lin
,
D.
(
1998
) Dependency-based evaluation of minipar at lrec, In:
Proceedings of the Workshop on the Evaluation of Parsing Systems
.,
Granada, Spain
. http://www.cs.ualberta.ca/lindek/minipar.htm.

[22]

Temperley
,
D.
,
Sleator
,
D.
and
Lafferty
,
J.
(
1993
)
Parsing english with a link grammar. In:
Third International Workshop on Parsing Technologies
,
Tilburg, Netherlands
.

[23]

Klein
,
D.
and
Manning
,
C.D.
(
2003
)
Accurate unlexicalized parsing. In:
Proceedings of the Forty-first annual meeting of the Association for Computational Linguistics
,
Sapporo, Japan
.

[24]

Petit
,
J.
,
Boisson
,
J.-C.
and
Rousseaux
,
F.
(
2017
)
Discovering cultural conceptual structures from texts for ontology generation. In:
IEEE 2017 Fourth International Conference on Control, Decision and Information Technologies, St. Paul's Bay, Malta, (CoDIT)
.
0225
0229
.

[25]

Cunningham
,
H.
,
Maynard
,
D.
,
Bontcheva
,
K.
et al.  (
2002
) Gate: an architecture for development of robust hlt applications. In:
Proceedings of the Fortieth Annual Meeting on Association for Computational Linguistics
.
Association for Computational Linguistics
,
Philadelphia, Pennsylvania,
168
175
.

[26]

Drymonas
,
E.G.
Ontology learning from text based on multi-word term concepts: the ontogain method
.
M.Sc. Thesis
.
Technical University of Crete
,
Greece
.

[27]

Oliveira
,
A.
,
Pereira
,
F.C.
and
Cardoso
,
A.
(
2001
) Automatic reading and learning from text. In:
Proceedings of the International Symposium on Artificial Intelligence (ISAI), India
.

[28]

Jiang
,
X.
and
Tan
,
A.-H.
(
2010
)
Crctol: a semantic-based domain ontology learning system
.
J. Assoc. Inform. Sci. Technol.
,
61
,
150
168
.

[29]

Hippisley
,
A.
,
Cheng
,
D.
and
Ahmad
,
K.
(
2005
)
The head-modifier principle and multilingual term extraction
.
Nat. Lang. Eng.
,
11
,
129
157
.

[30]

Agustini
,
A.
,
Gamallo
,
P.
and
Lopes
,
G.P.
(
2001
) Selection restrictions acquisition for parsing improvement. in:
International Conference on Applications of Prolog
,
Springer
,
129
143
.

[31]

Gamallo
,
P.
,
Agustini
,
A.
and
Lopes
,
G.P.
Learning subcategorisation information to model a grammar with “co-restrictions”. Modélisation probabiliste du langage naturel
.
TAL. Traitement automatique des langues
,
44
,
93
117
.

[32]

Faure
,
D.
and
Nedellec
,
C.
(
2016
)
Knowledge acquisition of predicate argument structures from technical texts using machine learning: the system asium. In:
International Conference on Knowledge Engineering and Knowledge Management
,
Springer
,
Siguenza, Spain
.
329
334
.

[33]

Belal
,
M.A.E.-F.
,
Abdel-Galil
,
H.
and
Saber
,
Y.M.
(
2016
)
Ontology extraction from text: Related works between arabic and english languages
.
Int. J.
,
4
.

[34]

Hwang
,
C.H.
(
1999
)
Incompletely and imprecisely speaking: using dynamic ontologies for representing and retrieving information. In: KRDB
,
CEUR-WS
,
Linköping, Sweden
,
21
,
14
20
.

[35]

Sanchez
,
D.
and
Moreno
,
A.
(
2004
)
Creating ontologies from web documents
. In:
Recent advances in artificial intelligence research and development
,
IOS Press
,
Amsterdan
,
113
11
18
.

[36]

Fraga
,
A.L.
and
Vegetti
,
M.
(
2017
) Semi-automated ontology generation process from industrial product data standards. In:
III Simposio Argentino de Ontolog’ıas y sus Aplicaciones (SAOA)-JAIIO, Córdoba, Argentina, 46 (Co’rdoba, 2017)
.

[37]

Kang
,
S.
,
Patil
,
L.
,
Rangarajan
,
A.
et al.  (
2015
) Extraction of manufacturing rules from unstructured text using a semantic framework. In:
ASME 2015 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference
.
American Society of Mechanical Engineers
,
Boston,
V01BT02A033
V01BT02A033
.

[38]

Ciaramita
,
M.
,
Gangemi
,
A.
,
Ratsch
,
E.
et al.  (
2005
)
Unsupervised learning of semantic relations between concepts of a molecular biology ontology. In:
IJCAI
,
Morgan Kaufmann Publishers
,
Edinburgh, Scotland, UK,
659
664
.

[39]

Sordo
,
M.
,
Oramas
,
S.
and
Espinosa-Anke
,
L.
(
2015
) Extracting relations from unstructured text sources for music recommendation. In:
International Conference on Applications of Natural Language to Information Systems
,
Springer
,
Passau, Germany
,
369
382
.

[40]

Hearst
,
M.A.
(
1998
)
Automated discovery of wordnet relations, WordNet: an electronic lexical
.
Database
,
131
153
.

[41]

Kaushik
,
N.
and
Chatterjee
,
N.
Automatic relationship extraction from agricultural text for ontology construction
.
Inform. Process. Agri
,
5
,
60
73
.

[42]

Ismail
,
R.
,
Abu Bakar
,
Z.
and
Abd Rahman
,
N.
(
2015
)
Extracting knowledge from English translated Quran using NLP pattern
.
Jurnal Teknologi
,
77
,
67
73
.

[43]

Ismail
,
R.
,
Bakar
,
Z.A.
and
Rahman
,
N. A.
Ontology learning framework for Quran
.
Advanced Science Letters,
23
,
4175
4178
.

[44]

Panchenko
,
A.
,
Faralli
,
S.
,
Ruppert
,
E.
et al.  (
2016
) Taxi at semeval-2016 task 13: a taxonomy induction method based on lexico-syntactic patterns, substrings and focused crawling. In:
Proceedings of the Tenth International Workshop on Semantic Evaluation (SemEval-2016)
.
Association for Computational Linguistics (ACL),
San Diego, California,
1320
1327
.

[45]

Atapattu
,
T.
,
Falkner
,
K.
and
Falkner
,
N.
(
2017
)
A comprehensive text analysis of lecture slides to generate concept maps
.
Comput. Educ.
,
115
,
96
113
.

[46]

Snow
,
R.
,
Jurafsky
,
D.
and
Ng
,
A.Y.
(
2005
) Learning syntactic patterns for automatic hypernym discovery. In:
Advances in Neural Information Processing Systems
,
1297
1304
.

[47]

Sen
,
S.
,
Tao
,
J.
and
Deokar
,
A.V.
(
2015
) On the role of ontologies in information extraction. In:
Reshaping Society through Analytics, Collaboration, and Decision Support
,
Springer
,
Switzerland
,
115
133
.

[48]

Turcato
,
D.
,
Popowich
,
F.
,
Toole
,
J.
et al.  (
2000
) Adapting a synonym database to specific domains. In:
Proceedings of the ACL-2000 Workshop on Recent Advances in Natural Language Processing and Information Retrieval, held in conjunction with the Thirtieth Annual Meeting
.
Association for Computational Linguistics
,
Hong Kong
,
11
,
1
11
.

[49]

Navigli
,
R.
,
Velardi
,
P.
and
Gangemi
,
A.
(
2003
)
Ontology learning and its application to automated terminology translation
.
IEEE Intell. Syst.
,
18
,
22
31
.

[50]

Frantzi
,
K.
,
Ananiadou
,
S.
and
Mima
,
H.
(
2000
)
Automatic recognition of multiword terms: the c-value/nc-value method
.
Int. J. Dig. Libr.
,
3
,
115
130
.

[51]

Hersh
,
W.
,
Buckley
,
C.
,
Leone
,
T.
et al.  (
1994
) Ohsumed: an interactive retrieval evaluation and new large test collection for research. In:
SI-GIR94
,
Springer
,
Dublin, Ireland
,
192
201
.

[52]

Milios
,
E.
,
Zhang
,
Y.
,
He
,
B.
et al.  (
2003
) Automatic term extraction and document similarity in special text corpora. In:
Proceedings of the Sixth Conference of the Pacific. Association for Computational Linguistics
,
Yangon, Myanmar
,
275
284
.

[53]

Yang
,
Z.
,
Zhou
,
Y.
and
Nyberg
,
E.
(
2016
) Learning to answer biomedical questions: Oaqa at bioasq 4b, In:
Proceedings of the Fourth BioASQ Workshop
,
Yangon, Myanmar
,
23
37
.

[54]

Chandu
,
K.
,
Naik
,
A.
,
Chandrasekar
,
A.
et al.  (
2017
)
Tackling biomedical text summarization: Oaqa at bioasq 5b
.
BioNLP
,
2017
,
58
66
.

[55]

Navigli
,
R.
and
Velardi
,
P.
(
2002
) Semantic interpretation of terminological strings. In:
Proceedings of the Sixth International Conference on Terminology and Knowledge Engineering
,
Nancy, France
,
95
100
.

[56]

Guo
,
R.
,
Qiu
,
J.
and
Zhang
,
G.
(
2015
) Web-based chinese term extraction in the field of study. In:
IEEE Eleventh International Conference on Semantics, Knowledge and Grids (SKG)
,
Beijing, China
,
133
139
.

[57]

Xiao
,
L.
,
Ruan
,
C.
,
Yang
,
A.
et al.  (
2016
) Domain ontology learning enhanced by optimized relation instance in dbpedia. In:
LREC
.

[58]

Resnik
,
P.
(
1999
)
Semantic similarity in a taxonomy: An information- based measure and its application to problems of ambiguity in natural language
.
J. Artif. Intell. Res.
,
11
,
95
130
.

[59]

Senellart
,
P.P.
and
Blondel
,
V.D.
(
2003
) Automatic discovery of similar words. In:
Berry
M
(ed).
Survey of Text Mining: Clustering, Classification, and Retrieval
,
Springer
,
UK
.

[60]

Maedche
,
A.
and
Staab
,
S.
(
2002
) Measuring similarity between ontologies. In:
International Conference on Knowledge Engineering and Knowledge Management
,
Springer
,
Siguenza, Spain
,
251
263
.

[61]

Suresu
,
S.
and
Elamparithi
,
M.
(
2016
)
Probabilistic relational concept extraction in ontology learning
.
Int. J. Inform. Technol.
,
2
.

[62]

Frikh
,
B.
,
Djaanfar
,
A.S.
and
Ouhbi
,
B.
(
2011
) A hybrid method for domain ontology construction from the web. In:
KEOD
,
Springer
,
Paris, France
,
285
292
.

[63]

Landauer
,
T.K.
,
Foltz
,
P.W.
and
Laham
,
D.
(
1998
)
An introduction to latent semantic analysis
.
Discourse Process.
,
25
,
259
284
.

[64]

Rani
,
M.
,
Dhar
,
A.K.
and
Vyas
,
O.
(
2017
)
Semi-automatic terminology ontology learning based on topic modeling
.
Eng. Appl. Artificial Intell.
,
63
,
108
125
.

[65]

Berkhin
,
P.
(
2006
) A survey of clustering data mining techniques. In:
Grouping Multidimensional Data
,
Springer
,
United States
,
25
71
.

[66]

Karoui
,
L.
,
Aufaure
,
M.-A.
and
Bennacer
,
N.
(
2007
) Contextual concept discovery algorithm. In:
FLAIRS Conference
,
AAAI Press
,
Key West, Florida, USA
,
460
465
.

[67]

Njike-Fotzo
,
H.
and
Gallinari
,
P.
Learning generalization/specialization relations between concepts–application for automatically building thematic document hierarchies
In:
Coupling approaches, coupling media and coupling languages for information retrieval, LE CENTRE DE HAUTES ETUDES INTERNATIONALES D'INFORMATIQUE DOCUMENTAIRE, ACM,
143
155
.

[68]

Zepeda-Mendoza
,
M. L.
and
Resendis-Antonio
,
O.
(
2013
) Hierarchical agglomerative clustering. In:
Encyclopedia of Systems Biology
,
Springer
,
United States
,
886
887
.

[69]

Dhillon
,
I.S.
,
Mallela
,
S.
and
Kumar
,
R.
(
2003
)
A divisive information-theoretic feature clustering algorithm for text classification
.
J. Mach. Learn. Res.
,
3
,
1265
1287
.

[70]

Ragunath
,
R.
and
Sivaranjani
,
N.
(
2015
)
Ontology based text document summarization system using concept terms
.
ARPN J. Eng. Appl. Sci.
,
10
,
2638
2642
.

[71]

Faure
,
D.
and
Nédellec
,
C.
(
1998
) A corpus-based conceptual clustering method for verb frames and ontology acquisition. In:
LREC Workshop on Adapting Lexical and Corpus Resources to Sublanguages and Applications
,
LREC
,
Granada, Spain
, Vol.
707
,
30
.

[72]

Drymonas
,
E.
,
Zervanou
,
K.
and
Petrakis
,
E.G.
(
2010
)
Unsupervised ontology acquisition from plain texts: the ontogain system
. In:
NLDB
.
Springer
,
Cardiff, United Kingdom
,
277
287
.

[73]

Caraballo
,
S.A.
(
1999
) Automatic construction of a hypernym labeled noun hierarchy from text. In:
Proceedings of the Thirty-seventh annual meeting of the Association for Computational Linguistics on Computational Linguistics. Association for Computational Linguistics
,
ACM
,
Maryland, USA
,
120
126
.

[74]

Savaresi
,
S.M.
,
Boley
,
D.L.
,
Bittanti
,
S.
et al.  (
2002
) Cluster selection in divisive clustering algorithms. In:
Proceedings of the 2002 SIAM International Conference on Data Mining
,
SIAM
,
Arlington, VA, USA
,
299
314
.

[75]

Cimiano
,
P.
and
Staab
,
S.
(
2005
) Learning concept hierarchies from text with a guided agglomerative clustering algorithm, In:
Proceedings of the ICML 2005 Workshop on Learning and Extending Lexical Ontologies with Machine Learning Methods
,
Bonn, Germany
.

[76]

Liu
,
B.
,
Hsu
,
W.
,
Mun
,
L.-F.
et al.  (
1999
)
Finding interesting patterns usinguser expectations
.
IEEE Trans. Knowl. Data Eng.
,
11
,
817
832
.

[77]

Idoudi
,
R.
,
Ettabaa
,
K.S.
,
Solaiman
,
B.
et al.  (
2016
)
Association rules based ontology enrichment
.
Int. J Web Appl.
,
8
,
16
25
.

[78]

Paiva
,
L.
,
Costa
,
R.
,
Figueiras
,
P.
et al.  (
2014
) Discovering semantic relations from unstructured data for ontology enrichment: association rules based approach. In:
Information Systems and Technologies (CISTI), 2014 9th Iberian Conference on.
IEEE
,
Barcelona, Spain
,
1
6
.

[79]

Ghezaiel
,
L.B.
,
Latiri
,
C.C.
and
Ahmed
,
M.B.
(
2012
) Ontology enrichment based on generic basis of association rules for conceptual document indexing. In:
KEOD
,
Springer
,
Barcelona, Spain
,
53
65
.

[80]

Paiva
,
L.M.S.S.
(
2015
) Semantic relations extraction from unstructured information for domain ontologies enrichment.
Ph.D. Thesis in RUN - Universidade NOVA de Lisboa
.

[81]

Fatemi
,
N.
,
Poulin
,
F.
,
Raileany
,
L.E.
et al. 
Using association rule mining to enrich semantic concepts for video retrieval
In:
KDIR 2009-International Conference on Knowledge Discovery and Information Retieval,
INSTICC Press,
Dublin City University
,
6
8
.

[82]

d’Amato
,
C.
and
Learning
,
N.-S.
On extracting rules for: enriching ontological knowledge bases, complementing heterogeneous sources of information, empowering the reasoning process. In:
Neural-Symbolic Learning and Reasoning
,
56
.

[83] Lima, R., Espinasse, B., Oliveira, H. et al. (2013) An inductive logic programming-based approach for ontology population from the web. In: International Conference on Database and Expert Systems Applications, Springer, Prague, Czech Republic, 319–326.

[84] Fortuna, B., Lavrač, N. and Velardi, P. (2008) Advancing topic ontology learning through term extraction. In: Pacific Rim International Conference on Artificial Intelligence, Springer, Hanoi, Vietnam, 626–635.

[85] Seneviratne, M. and Ranasinghe, D. (2011) Inductive logic programming in an agent system for ontological relation extraction. Int. J. Mach. Learn. Comput., 1, 344.

[86] Lisi, F.A. and Esposito, F. (2008) Foundations of onto-relational learning. In: International Conference on Inductive Logic Programming, Springer, Prague, Czech Republic, 158–175.

[87] Lisi, F.A. and Straccia, U. (2013) A logic-based computational method for the automated induction of fuzzy ontology axioms. Fundamenta Informaticae, 124, 503–519.

[88] Ponzetto, S.P. and Strube, M. (2007) Deriving a large scale taxonomy from Wikipedia. AAAI, 7, 1440–1445.

[89] Leacock, C. and Chodorow, M. (1998) Combining local context and WordNet similarity for word sense identification. In: WordNet: An Electronic Lexical Database, 49, 265–283.

[90] Zavitsanos, E., Paliouras, G. and Vouros, G.A. (2011) Gold standard evaluation of ontology learning methods through ontology transformation and alignment. IEEE Trans. Knowl. Data Eng., 23, 1635–1648.

[91] Trokanas, N. and Cecelja, F. (2016) Ontology evaluation for reuse in the domain of process systems engineering. Comput. Chem. Eng., 85, 177–187.

[92] Sfar, H., Chaibi, A.H., Bouzeghoub, A. et al. (2016) Gold standard based evaluation of ontology learning techniques. In: Proceedings of the Annual ACM Symposium on Applied Computing, ACM, Salamanca, Spain, 339–346.

[93] Kashyap, V., Ramakrishnan, C., Thomas, C. et al. (2005) TaxaMiner: an experimentation framework for automated taxonomy bootstrapping. Int. J. Web Grid Serv., 1, 240–266.

[94] Treeratpituk, P., Khabsa, M. and Giles, C.L. (2014) Graph-based approach to automatic taxonomy generation (GrabTax). arXiv preprint arXiv:1307.1718.

[95] Sánchez, D., Batet, M., Martínez, S. et al. (2015) Semantic variance: an intuitive measure for ontology accuracy evaluation. Eng. Appl. Artif. Intell., 39, 89–99.

[96] Dellschaft, K. and Staab, S. (2008) Strategies for the evaluation of ontology learning. Ontol. Learn. Popul., 167, 253–272.

[97] IJntema, W., Sangers, J., Hogenboom, F. et al. (2012) A lexico-semantic pattern language for learning ontology instances from text. Web Semant., 15, 37–50.

[98] Lozano-Tello, A., Gómez-Pérez, A. and Sosa, E. (2003) Selection of ontologies for the semantic web. In: International Conference on Web Engineering, Springer, Munich, Germany, 413–416.

[99] Porzel, R. and Malaka, R. (2004) A task-based approach for ontology evaluation. In: ECAI Workshop on Ontology Learning and Population, IOS Press, Valencia, Spain, 1–6.

[100] Haase, P. and Sure, Y. D3.2.1 Usage tracking for ontology evolution. In: EU-IST Integrated Project (IP), IST-2005-506826 SEKT.

[101] Jones, M. and Alani, H. (2006) Content-based ontology ranking. In: Ninth International Protégé Conference, Stanford, CA, 93.

[102] Brewster, C., Alani, H., Dasmahapatra, S. et al. Data driven ontology evaluation. In: LREC 2004, ELRA - European Language Resources Association, Lisbon, Portugal, 641–644.

[103] Patel, C., Supekar, K., Lee, Y. et al. (2003) OntoKhoj: a semantic web portal for ontology searching, ranking and classification. In: Proceedings of the Fifth ACM International Workshop on Web Information and Data Management, ACM, Seattle, WA, USA, 58–61.

[104] Burton-Jones, A., Storey, V.C., Sugumaran, V. and Ahluwalia, P. (2005) A semiotic metrics suite for assessing the quality of ontologies. Data Knowl. Eng., 55, 84–102.

[105] Fox, M.S., Barbuceanu, M. and Gruninger, M. (1995) An organisation ontology for enterprise modelling: preliminary concepts for linking structure and behaviour. In: Enabling Technologies: Infrastructure for Collaborative Enterprises, 1995, Proceedings of the Fourth Workshop on, IEEE, West Virginia, USA, 71–81.

[106] Lozano-Tello, A. and Gómez-Pérez, A. (2004) OntoMetric: a method to choose the appropriate ontology. J. Database Manag., 2, 1–18.

[107] Fernández, M., Overbeeke, C., Sabou, M. and Motta, E. (2009) What makes a good ontology? A case-study in fine-grained knowledge reuse. In: Asian Semantic Web Conference, Springer, Bangkok, Thailand, 61–75.

[108] Gangemi, A., Catenacci, C., Ciaramita, M. et al. (2006) Modelling ontology evaluation and validation. In: European Semantic Web Conference, Springer, 140–154.

[109] Alani, H. and Brewster, C. (2006) Metrics for ranking ontologies. 4273, 1–15.

[110] Guarino, N. and Welty, C. (2004) An overview of OntoClean. In: Staab, S. and Studer, R. (eds), Handbook on Ontologies, Springer, Berlin, Heidelberg.

[111] Bloehdorn, S., Cimiano, P. and Hotho, A. (2006) Learning ontologies to improve text clustering and classification. In: From Data and Information Analysis to Knowledge Engineering, Springer, Magdeburg, Germany, 334–341.

[112] Dollah, R.B. and Aono, M. (2011) Ontology based approach for classifying biomedical text abstracts. Int. J. Data Eng., 2, 1–15.

[113] Bloehdorn, S. and Hotho, A. (2009) Ontologies for machine learning. In: Handbook on Ontologies, Springer, Berlin, Heidelberg, 637–661.

[114] Zavitsanos, E., Paliouras, G. and Vouros, G. (2008) A distributional approach to evaluating ontology learning methods using a gold standard. In: Third Ontology Learning and Population Workshop, ECAI, Patras, Greece.

[115] Zavitsanos, E., Petridis, S., Paliouras, G. et al. (2008) Determining automatically the size of learned ontologies. In: ECAI, IOS Press, Patras, Greece, 178, 775–776.

[116] Cimiano, P., Hotho, A., Stumme, G. et al. (2004) Conceptual knowledge processing with formal concept analysis and ontologies. In: International Conference on Formal Concept Analysis, Springer, Sydney, NSW, Australia, 189–207.

[117] Faure, D. and Poibeau, T. (2000) First experiments of using semantic knowledge learned by ASIUM for information extraction task using INTEX. In: Ontology Learning ECAI-2000 Workshop, IOS Press, Berlin, Germany, 7–12.

[118] Zhang, D., Wang, B., Wang, N. et al. (2016) A new cognitive model for autonomous ontology learning. In: Intelligent Systems (IS), 2016 IEEE Eighth International Conference on, IEEE, Sofia, Bulgaria, 259–264.

[119] Deb, C.K., Marwaha, S., Arora, A. and Das, M. (2018) A framework for ontology learning from taxonomic data. In: Big Data Analytics, Springer, 29–37.

[120] Staab, S. (2005) Learning concept hierarchies from text with a guided agglomerative clustering algorithm. In: Proceedings of the Workshop on Learning and Extending Lexical Ontologies with Machine Learning Methods, Sydney.

[121] Barbu, E. (2015) Property type distribution in WordNet, corpora and Wikipedia. Expert Syst. Appl., 42, 3501–3507.

[122] Bian, H.-Z. and Ha, S. (2017) Conceptual extraction of domain knowledge graph in different data sources. In: DEStech Transactions on Computer Science and Engineering (ICEIT), Zhuhai, China.

[123] Pereira, F.C., Oliveira, A. and Cardoso, A. (2000) Extracting concept maps with clouds. In: Proceedings of the Argentine Symposium of Artificial Intelligence (ASAI), Buenos Aires, Argentina.

[124] Missikoff, M., Navigli, R. and Velardi, P. (2002) Integrated approach to web ontology learning and engineering. Computer, 35, 60–63.

[125] Jain, S., Jain, N. and Mishra, S. (2015) EHCPRS system as an ontology learning system. In: Computing for Sustainable Global Development (INDIACom), 2015 Second International Conference on, IEEE, New Delhi, 978–984.

[126] Hahn, U. and Romacker, M. (2000) Content management in the SYNDIKATE system – how technical documents are automatically transformed to text knowledge bases. Data Knowl. Eng., 35, 137–159.

[127] Soma, R. (2008) Applying semantic web technologies for information management in domains with semi-structured data. University of Southern California.

[128] Halvorsen, J. and Hansen, B.J. (2011) Integrating military systems using semantic web technologies and lightweight agents. FFI-notat, 1851, 2011.

[129] Valente, A., Holmes, D. and Alvidrez, F.C. (2005) Using a military information ontology to build semantic architecture models for airspace systems. In: Aerospace Conference, IEEE, Big Sky, MT, USA, 1–7.

[130] Frantz, A. and Franco, M. (2005) A semantic web application for the air tasking order. Technical report, Air Force Research Lab, Rome, NY, Information Directorate.

[131] Lacy, L., Aviles, G., Fraser, K. et al. (2005) Experiences using OWL in military applications. In: OWLED, CEUR-WS, Galway, Ireland, 188.

[132] Turnitsa, C. and Tolk, A. (2006) Battle management language: a triangle with five sides. In: Proceedings of the Simulation Interoperability Standards Organization (SISO) Spring Simulation Interoperability Workshop (SIW), IEEE, Huntsville, AL, USA, 27.

[133] Rui, Y., Nengcheng, C. and Zhixue, L. (2006) A new approach to a local e-government portal for information management and deep searching. Wuhan Univ. J. Nat. Sci., 11, 1161–1166.

[134] Haav, H.-M. (2011) A practical methodology for development of a network of e-government domain ontologies. In: Building the e-World Ecosystem, Springer, Berlin, Heidelberg, 1–13.

[135] Hepp, M. (2008) GoodRelations: an ontology for describing products and services offers on the web. Knowl. Eng. Pract. Patterns, 5268, 329–346.

[136] Akanbi, A.K. (2014) LB2CO: a semantic ontology framework for B2C e-commerce transaction on the internet. International Research Journal of Computer Science, 4, 9, arXiv preprint arXiv:1401.0943.

[137] Paolucci, M., Sycara, K., Nishimura, T. et al. (2003) Toward a semantic web e-commerce. In: Proc. of Sixth Int. Conf. on Business Information Systems (BIS 2003), Colorado Springs, USA.

[138] Ekelhart, A., Fenz, S., Tjoa, A. et al. (2007) Security issues for the use of semantic web in e-commerce. In: Business Information Systems, Springer, Berlin, Heidelberg, 1–13.

[139] Podgorelec, V. and Pavlic, L. (2007) Managing diagnostic process data using semantic web. In: Computer-Based Medical Systems, 2007. CBMS'07. Twentieth IEEE International Symposium on, IEEE, Maribor, Slovenia, 127–134.

[140] Kim, K.-H. and Choi, H.-J. (2007) Design of a clinical knowledge base for heart disease detection. In: Computer and Information Technology, 2007. CIT 2007. Seventh IEEE International Conference on, IEEE, Fukushima, Japan, 610–615.

[141] Ganguly, P., Chattopadhyay, S., Paramesh, N. et al. (2008) An ontology-based framework for managing semantic interoperability issues in e-health. In: e-Health Networking, Applications and Services, 2008. HealthCom 2008. Tenth International Conference on, IEEE, Singapore, 73–78.

[142] Köhler, S., Doelken, S.C., Mungall, C.J. et al. (2013) The Human Phenotype Ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res., 42, D966–D974.

[143] Sandun, I., Sumathipala, S. and Ganegoda, G.U. (2017) Self-evolving disease ontology for medical domain based on web. Int. J. Fuzzy Logic Intell. Syst., 17, 307–314.

[144] De Silva, T.S., MacDonald, D., Paterson, G. et al. (2011) Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) to represent computed tomography procedures. Comput. Methods Programs Biomed., 101, 324–329.

[145] Fan, L. and Li, B. (2006) A hybrid model of image retrieval based on ontology technology and probabilistic ranking. In: Web Intelligence, 2006. WI 2006. IEEE/WIC/ACM International Conference on, IEEE, Hong Kong, China, 477–480.

[146] Wang, H., Chia, L.-T. and Liu, S. (2007) Semantic retrieval with enhanced match-making and multi-modality ontology. In: Multimedia and Expo, 2007 IEEE International Conference on, IEEE, Beijing, 516–519.

[147] Liu, X., Shao, Z. and Liu, J. Ontology-based image retrieval with SIFT features. In: Pervasive Computing Signal Processing and Applications (PC-SPA), 2010 First International Conference on, IEEE, Harbin, 464–467.

[148] Ballan, L., Bertini, M., Del Bimbo, A. and Serra, G. (2010) Video annotation and retrieval using ontologies and rule learning. IEEE MultiMedia, 17, 80–88.

[149] Sombatsrisomboon, R., Matsuo, Y. and Ishizuka, M. (2003) Acquisition of hypernyms and hyponyms from the WWW. In: Proceedings of the Second International Workshop on Active Mining, France.

[150] Paap, O. (2006) Accelerating Deployment of ISO 15926 (ADI). Technical report, FIATECH Member Meeting.

[151] Paap, O. and Fluor Corporation (2008) ISO 15926 for interoperability. In: W3C Workshop on Semantic Web in Oil & Gas Industry, Houston, TX, USA.

[152] Halvorsen, J. and Hansen, B.J. (2011) Integrating military systems using semantic web technologies and lightweight agents. FFI-notat, 1851, 2011.

[153] Krummenacher, R., Simperl, E., Cerizza, D. et al. (2009) Enabling the European patient summary through triplespaces. Comput. Methods Programs Biomed., 95, S33–S43.

[155] Rehman, A., Javed, K. and Babri, H.A. (2017) Feature selection based on a normalized difference measure for text classification. Inform. Process. Manag., 53, 473–489.

[156] Pei, J., Han, J., Mao, R. et al. (2000) CLOSET: an efficient algorithm for mining frequent closed itemsets. In: ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, San Diego, CA, USA, 4, 21–30.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.