

Journal Pre-proof
A novel meta-matching approach for ontology alignment using
grasshopper optimization
Zhaoming Lv, Rong Peng

PII: S0950-7051(20)30340-3
DOI: https://doi.org/10.1016/j.knosys.2020.106050
Reference: KNOSYS 106050

To appear in: Knowledge-Based Systems

Received date : 13 January 2020
Revised date : 15 April 2020
Accepted date : 17 May 2020
Please cite this article as: Z. Lv and R. Peng, A novel meta-matching approach for ontology
alignment using grasshopper optimization, Knowledge-Based Systems (2020), doi:
https://doi.org/10.1016/j.knosys.2020.106050.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the
addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive
version of record. This version will undergo additional copyediting, typesetting and review before it
is published in its final form, but we are providing this version to give early visibility of the article.
Please note that, during the production process, errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.

© 2020 Published by Elsevier B.V.


A novel meta-matching approach for ontology alignment using grasshopper optimization

Zhaoming Lv, Rong Peng*

School of Computer Science, Wuhan University, Wuhan, 430072, China

Abstract

Ontology alignment is a fundamental task to support information sharing and reuse in heterogeneous information systems. Optimizing the combination of matchers through evolutionary algorithms is an effective way to align ontologies. However, such methods have two significant shortcomings: weights need to be set manually to combine the matchers, and a reference alignment is required during the optimization process. In this paper, a meta-matching approach, GSOOM, that automatically configures the weights and threshold using the grasshopper optimization algorithm (GOA) is proposed. In this approach, the ontology alignment problem is modeled as optimizing the individual fitness of GOA. A fitness function is proposed which includes two goals: maximizing the number of matchings and maximizing the average similarity score. Since it does not require an expert to provide a reference alignment, it is more suitable for real-world scenarios. To demonstrate the advantages of the approach, we conduct exhaustive experiments on several standard datasets and compare its performance to other state-of-the-art methods. The experimental results illustrate that our approach is more efficient and significantly superior to other metaheuristic-based methods.
Keywords: Ontology matching; Grasshopper optimization algorithm; Ontology alignment evaluation initiative

1. Introduction

Ontologies provide specifications which formalize the objects and their relationships in a certain domain [1,2,3]. Ontologies are usually applied in intelligent information processing and data management [4-9]. The interoperability and integration of ontologies is a key factor in heterogeneous systems [10].
However, due to the subjectivity of ontology developers, many overlapping ontologies appear in the same field. This problem has become a huge challenge for ontology applications [11]. To address it, an ontology matching (or ontology alignment) process needs to be implemented, which can be considered one of the key technologies for knowledge exchange. The aim of ontology alignment is to find the semantically identical concepts between two given ontologies. As a result, the quality of an ontology alignment directly affects the scale of its application.
In recent years, even though many researchers have proposed a variety of ontology alignment methods and solved some problems, developing high-quality matching systems is still a challenge.
In general, finding all mappings among ontologies is hard, or even impossible, since ontologies with tens of thousands of concepts are common. The main difficulty lies in how to combine different similarity measures naturally. These measures can be classified into three major categories: lexical-based, linguistic-based and structural-based. Therefore, devising an efficient search approach can significantly improve the performance of the matching process. In fact, some methods that combine multiple similarity measures have proven effective for matching a pair of ontologies. Evolutionary algorithms have been successfully applied to ontology alignment by optimizing pairs of entities, where the similarity of a pair of entities is obtained by a weighted combination of various similarity measures. However, such methods need fixed weights and a threshold set manually by an expert, and some even need a reference alignment during the alignment process. Furthermore, a reference alignment over few concepts can be built by an expert, but building one over many concepts is a very arduous task. Therefore, how to configure appropriate weight and threshold parameters without a reference alignment during the optimization process is a key issue to be solved.

Since the weights and threshold are difficult to determine in a real scenario, researchers must consider their uncertainty to make the research problem closer to practical application. As is well known, heuristic algorithms can usually find acceptable solutions within a reasonable time in an unknown search space and are widely used [12,13]. Therefore, to solve this problem, a novel metaheuristic algorithm has been exploited. The grasshopper optimization algorithm (GOA) is a nature-inspired metaheuristic that provides a high exploration rate together with efficient convergence ability [14]. It implements well the important principle of balancing exploration and exploitation in optimization. In detail, the high exploration and local-optimum avoidance of this algorithm originate from the high repulsion rate between grasshoppers; exploitation and convergence are encouraged by the attraction forces between the grasshoppers and by the adaptive comfort zone [14]. Therefore, it also has strong local-optimum avoidance. These characteristics make the GOA well suited to multiobjective search-related problems and allow it to outperform other algorithms such as PSO, BA, ABC and BFO. The GOA is simple and effective, and can solve complex problems in fewer iterations. In addition, the feasibility of GOA in practical applications has been proved by many previous studies [15,16,17,18]. These features motivated us to build a simple and efficient ontology alignment model that optimizes a fitness function with two objectives using the grasshopper optimization algorithm, inspired by its good balance between exploration and exploitation. The main contributions of this article can be summarized as follows.
·The ontology alignment problem is formalized as an optimization problem. Specifically, the GSOOM algorithm is proposed to discover mappings between two given ontologies.
·The proposed GSOOM successfully tackles the problem of configuring the weights and threshold automatically.
·The proposed fitness function of GOA successfully tackles the problem of dependency on a reference alignment.
·The proposed GSOOM algorithm has been evaluated on several standard datasets, and the impact of its hyper-parameters on performance has been investigated using real-world ontologies.
The rest of the paper is organized as follows. In Section 2 we survey the state of the art and some existing similar ontology alignment methods. Section 3 introduces the definitions of ontology and ontology alignment and the theory of GOA. Section 4 discusses the different similarity measures that are used for computing the aggregated similarity score between concepts. Section 5 presents the optimization model in a formal way, which is a new approach to ontology alignment using GOA, and analyzes the time and space complexity of the GSOOM algorithm. Section 6 presents the datasets utilized in our experiments and shows the impact of the GOA parameters on the convergence behavior of the GSOOM algorithm. Section 7 summarizes this work and proposes future work.

2 Related works

Over the years, automatic and semi-automatic
ontology alignment methods have been proposed
and evaluated on Ontology Alignment Evaluation
Initiative (OAEI). These methods usually utilize a
variety of matching techniques, including combining
element-level and structural-level, using background
knowledge, and user interaction, and so on. Silva et al. [19] proposed an interactive method using expert feedback together with semantic and structural techniques. The experiments show good matching quality; however, if the expert makes a mistake, the F-measure tends to decrease. Destro et al. [20]
proposed a cross-lingual ontology alignment method
which combines semantic and syntactic similarity
measures by weighting average. The weights and the
threshold are manually set. Tang et al. [21] proposed
a method by using hybrid matching strategies. It has
achieved preferable performances in some tasks.
The weights and the threshold are predefined.
Djeddi et al. [22] proposed a method to tackle the
issue of matching large-scale ontologies by using
various similarity measures of different categories
whose weights and filtering threshold are set
manually in filtering and combining matchers
module. The scale of application of such methods is
restricted.
Evolutionary algorithms (EAs) have also been successfully applied to ontology alignment, where two different categories of methods can be distinguished. In the first category, EAs are used to optimize the ontology alignment problem directly. In this field, Wang et al. [23] proposed GAOM (Genetic Algorithm based Ontology Matching), which defines a global similarity measure function to evaluate each solution. However, the property and structural similarity
of concepts is not considered. Jurgen and Jan [24] utilize a discrete particle swarm optimization algorithm to optimize ontology alignment, called MapPSO. In detail, MapPSO utilizes a fitness function combining lexical, linguistic and structural matchers through weights, where the weights are predefined. Ryma and Khollad [25] came up with a hybrid approach which incorporates the Hill Climbing algorithm within the mutation operator of a genetic algorithm; it accelerates correspondence discovery for large ontologies.
The second category comprises meta-matching methods, in which the optimal hyper-parameters of an alignment system are discovered. One of the most notable is GAOL (Genetics Algorithm for Ontology Alignments), a meta-matching approach proposed by Martinez-Gil et al. [26] to discover the optimal weight configuration for combining various similarity measures. This algorithm optimizes the weights by maximizing the F-measure as the fitness function of a solution; the main drawback of GAOL is that a reference alignment is required. Ginsca and Iftene [27] proposed a similar method in which they optimize the whole similarity aggregation step, including a threshold, as a single unit. The experimental results verified that using an evolutionary algorithm to optimize the weights and threshold is better than the initial configuration. Romero et al. [28] proposed a genetic-algorithm-based method for optimal ontology matching; however, the algorithm still needs a reference alignment during the alignment process. In addition, Acampora et al. [29] utilize an improved memetic algorithm to optimize ontology alignment without a reference alignment; the experimental results demonstrated that the proposed memetic algorithm outperformed the genetic algorithm. In a similar way, Xue and Wang [30] propose an ontology alignment method based on an approximate F-measure that does not use the reference alignment; the match coverage and match ratio are introduced to replace recall and precision, and the experimental results verified the effectiveness of the proposed evolutionary-algorithm-based method. There are also other improved EA-based methods, such as population-based incremental learning (Xue and Chen [31]) and the multi-objective genetic algorithm-II (Biniz and Ayachi [32]), which solve the ontology alignment problem as well.
In this article, a single-objective optimization model is constructed with GOA to solve the problem of configuring the weights and threshold automatically, and a fitness function is proposed which combines two optimization objectives by convex combination.

3 Preliminaries

In this section, the definitions of ontology and ontology matching, adopted from [33,34], are given, and the principle of the novel meta-heuristic algorithm GOA is explained.

3.1 Ontology and Ontology Matching

Definition 1 (Ontology). An ontology O is a 5-tuple (C, OP, DP, I, A), where:
1) C is the set of classes, the fundamental concepts of a domain;
2) OP is the set of object properties, which express the relations between two concepts;
3) DP is the set of data properties, which define the features of the concepts;
4) I is the set of individuals, which represent real-world instances of the modeled concepts;
5) A is the set of annotations, which provide human-readable information such as labels and comments.
Definition 2 (Ontology Matching). An ontology matching process can be seen as a function f which, from two given ontologies to match, O1 and O2, an input alignment A, a set of parameters p and a set of resources s, returns an alignment A' between these ontologies:

$$A' = f(O_1, O_2, A, s, p) \qquad (1)$$

The alignment result A' is a set of correspondences, each of which can be defined as a 4-tuple (e1, e2, v, r), where e1 and e2 are entities of the source ontology O1 and the target ontology O2, respectively; v is the similarity score of the correspondence; and r is a relation such as equivalence (≡), more general (⊒) or less general (⊑). The similarity score of a pair of entities is computed by multiple matchers.
Definition 3 (Similarity Measure). A similarity measure for a pair of entities is a function f(e1, e2) which calculates the similarity score of the two input entities e1 and e2, where the score is in the range [0, 1].
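As an illustration, Definitions 1-3 can be sketched as plain data structures. This is a minimal sketch: the Python names below (Ontology, Correspondence, align) are ours, not the paper's, and for brevity align keeps only the best single-matcher score per source class rather than an aggregated one.

```python
from dataclasses import dataclass

@dataclass
class Ontology:
    # Definition 1: a 5-tuple (C, OP, DP, I, A)
    classes: set          # C: fundamental concepts
    object_props: set     # OP: relations between concepts
    data_props: set       # DP: features of concepts
    individuals: set      # I: real-world instances
    annotations: dict     # A: labels and comments

@dataclass
class Correspondence:
    # Definition 2: a correspondence is a 4-tuple (e1, e2, v, r)
    e1: str               # entity of the source ontology O1
    e2: str               # entity of the target ontology O2
    v: float              # similarity score in [0, 1] (Definition 3)
    r: str = "="          # relation: equivalence, more/less general

def align(o1, o2, matchers):
    """Toy alignment function in the spirit of Definition 2:
    keep the best-scoring target entity for each source class."""
    result = []
    for e1 in o1.classes:
        score, e2 = max((m(e1, e2), e2)
                        for e2 in o2.classes for m in matchers)
        result.append(Correspondence(e1, e2, score))
    return result
```

Here each matcher is any function of two entity names returning a score in [0, 1], as in Definition 3.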

3.2 The grasshopper optimization algorithm
The grasshopper optimization algorithm is a recent heuristic swarm-intelligence optimization algorithm proposed by Saremi et al. in 2017 [14]. It simulates the foraging process in which grasshoppers search for food sources, move towards local food sources and consume food. The algorithm models these behaviors through a position-update formula: the next position of a grasshopper is updated according to its current position, the global target value and the positions of all other grasshoppers. The mathematical model is defined as follows:

$$X_i^d = cw\left(\sum_{j=1, j \neq i}^{N} cu\,\frac{ub_d - lb_d}{2}\, s(r)\, \hat{d}_{ij}\right) + T_d \qquad (2)$$

where ubd and lbd are the upper bound and the lower bound of the dth dimension of the search space, and Td is the optimal target value found so far in the dth dimension. The outer cw is similar to the inertia weight of the PSO algorithm [35]; it balances the algorithm's exploration and exploitation. The inner cu is a decreasing coefficient that shrinks the comfort zone, repulsion zone and attraction zone. cw and cu are calculated as follows:

$$cw = cu = c_{max} - l\,\frac{c_{max} - c_{min}}{L} \qquad (3)$$

where l indicates the current iteration number, L is the maximum number of iterations, and cmax and cmin are the maximum and minimum values of the coefficient.
The function s defines the social forces. It maintains the comfort zone, attraction region and repulsion region through the parameters γ and τ, and is calculated as follows:

$$s(r) = \gamma\, e^{-r/\tau} - e^{-r} \qquad (4)$$

where γ indicates the intensity of attraction and τ is the attractive length scale. The parameter r represents the distance between the ith and jth grasshoppers, computed as follows:

$$d_{ij} = |x_j - x_i| \qquad (5)$$

$$\hat{d}_{ij} = \frac{x_j - x_i}{d_{ij}} \qquad (6)$$

Formula (6) gives the unit vector from the ith grasshopper to the jth grasshopper. In this work, we investigate the impact of each of these GOA parameters on the performance of the GSOOM algorithm. The pseudocode of GOA is summarized in Algorithm 1.

4 Similarity measures

The similarity measures are a fundamental part of ontology alignment; they compute a similarity score for each pair of concepts between the ontologies. A similarity measure can also be called a matcher. Using a broad spectrum of matchers is a good way to cover entities with different characteristics. In our approach, in order to cope with the various types of information in two ontologies, five types of matchers are developed and stored in a repertory. The similarity measures are usually divided into three major categories.

Algorithm 1: Grasshopper Optimization Algorithm (GOA)
Input: the population Xi (i = 1, 2, …, n), cmax, cmin, and the maximum number of iterations (maxIter)
Output: the best individual X
CalculateIndividualFitness(Xi);
X ← selectBestIndividual(Xi);
T ← getBestFitness(X);
while (l < maxIter)
    Update c using Eq. (3);
    for each individual
        Normalize the distances between grasshoppers into the range [1, 4];
        Update the position of the current individual by Eq. (2);
        Boundary check on the current individual;
    end for
    Update T if there is a better solution;
    l = l + 1;
end while
Return X;
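Algorithm 1 and Eqs. (2)-(6) can be sketched as follows. This is a minimal illustration on a toy sphere function, not the authors' implementation: the parameter defaults are ours, and the mapping of distances into roughly [2, 4) via the modulo step is a stand-in for the [1, 4] normalization mentioned in Algorithm 1.

```python
import numpy as np

def s(r, gamma=0.5, tau=1.5):
    # Eq. (4): social force; gamma = intensity of attraction,
    # tau = attractive length scale
    return gamma * np.exp(-r / tau) - np.exp(-r)

def goa(fitness, dim=2, n=20, max_iter=100, lb=-5.0, ub=5.0,
        c_max=1.0, c_min=0.00004, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(lb, ub, (n, dim))        # initial positions
    fit = np.array([fitness(x) for x in X])
    T = X[fit.argmin()].copy()               # best target found so far
    for l in range(1, max_iter + 1):
        c = c_max - l * (c_max - c_min) / max_iter   # Eq. (3)
        X_new = np.empty_like(X)
        for i in range(n):
            total = np.zeros(dim)
            for j in range(n):
                if i == j:
                    continue
                dist = np.linalg.norm(X[j] - X[i])    # Eq. (5)
                r = 2.0 + dist % 2.0                  # distance normalization
                unit = (X[j] - X[i]) / (dist + 1e-12) # Eq. (6)
                total += c * (ub - lb) / 2.0 * s(r) * unit  # inner sum of Eq. (2)
            X_new[i] = np.clip(c * total + T, lb, ub)       # Eq. (2)
        X = X_new
        fit = np.array([fitness(x) for x in X])
        if fit.min() < fitness(T):                    # keep the best-so-far target
            T = X[fit.argmin()].copy()
    return T

best = goa(lambda x: float(np.sum(x ** 2)))   # minimize a toy sphere function
```

Because T is only ever replaced by a strictly better individual, the returned solution is monotonically non-worsening over the iterations.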

4.1 Lexical based measure

To map two entities, a similarity over names or annotations is usually computed by the lexical-based matcher. In detail, the lexical matcher can compute a similarity score for two entities using the concept name, a natural-language label or a comment, which in the OWL representation are attached to an entity through rdfs:label and rdfs:comment. Generally, if the concept makes use of normal vocabulary, the lexical matcher only needs to compare the concept names to obtain the similarity. If the ontology concept adopts random characters or abbreviations, the matcher needs to calculate the similarity score using the label and comment. In our matchers' repertory, the Entity Name Distance Matcher, Entity Label Distance Matcher and Entity Comment Distance Matcher are developed. The first matcher utilizes the String-Matching Ontology Alignment (SMOA) distance measure [36], and the latter two utilize a vector space model (VSM). In detail, the SMOA distance considers that the similarity between two entities relates to their commonalities and their differences. The commonality function is given by the following formula:

$$Comm(s_1, s_2) = \frac{2 \times \sum_i length(MaxComSubString_i)}{length(s_1) + length(s_2)} \qquad (7)$$

where Comm(s1, s2) stands for the commonality and s1 and s2 are the names of the two entities.
The difference function is as follows:

$$Diff(s_1, s_2) = \frac{uLen_{s_1} \cdot uLen_{s_2}}{p + (1 - p)(uLen_{s_1} + uLen_{s_2} - uLen_{s_1} \cdot uLen_{s_2})} \qquad (8)$$

where p ∈ [0, 1) is a parameter used to adjust the importance of the difference component of the SMOA algorithm (typically p = 0.6), and uLens1 and uLens2 represent the lengths of the unmatched substrings of the initial strings s1 and s2. The similarity is therefore a function of both features; the metric combining them is defined by the following equation:

$$Sim(s_1, s_2) = Comm(s_1, s_2) - Diff(s_1, s_2) + winkler(s_1, s_2) \qquad (9)$$

where winkler(s1, s2) is Winkler's improvement of the edit distance:

$$Winkler(s_1, s_2) = Sim + l\,p\,(1 - Sim) \qquad (10)$$

$$Sim = \begin{cases} 0 & \text{if } m = 0 \\ \dfrac{1}{3}\left(\dfrac{m}{|s_1|} + \dfrac{m}{|s_2|} + \dfrac{m - t}{m}\right) & \text{otherwise} \end{cases} \qquad (11)$$

where Sim is the Jaro similarity between s1 and s2, |s1| and |s2| are the lengths of s1 and s2, m stands for the number of matched characters, t stands for the number of character transpositions, l is the length of the common prefix, and p is a constant factor (typically p = 0.1).
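The Jaro-Winkler computation of Eqs. (10)-(11) can be sketched directly from the formulas. This is a minimal sketch: capping the common prefix l at four characters is a common convention that the paper does not state explicitly.

```python
def jaro(s1, s2):
    # Eq. (11): m = matched characters, t = transpositions
    if not s1 or not s2:
        return 0.0
    window = max(len(s1), len(s2)) // 2 - 1
    match1 = [False] * len(s1)
    match2 = [False] * len(s2)
    m = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len(s2), i + window + 1)
        for j in range(lo, hi):
            if not match2[j] and s2[j] == ch:
                match1[i] = match2[j] = True
                m += 1
                break
    if m == 0:
        return 0.0
    # count transpositions among the matched characters
    k = t = 0
    for i in range(len(s1)):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (m / len(s1) + m / len(s2) + (m - t) / m) / 3

def winkler(s1, s2, p=0.1):
    # Eq. (10): boost by the common-prefix length l (capped at 4)
    sim = jaro(s1, s2)
    l = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        l += 1
    return sim + l * p * (1 - sim)
```

For the classic pair "MARTHA"/"MARHTA" this yields a Jaro similarity of 17/18 and a Winkler score of about 0.961.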
The Vector Space Model transforms text content into a vector space and expresses semantic similarity by computing similarity in that space [37]. In the context of ontology, the text is the rdfs:label or rdfs:comment annotation, which is a natural-language sentence or phrase. In detail, consider two entities e1 and e2 and their corresponding labels or comments m1 and m2. For each entity, a set of terms is extracted from the corresponding label or comment; let B = (b1, b2, …, bk) be the vector composed of the terms of m1 and m2 with no duplicates. Then the vector of the ith entity (with i = 1, 2) is represented as follows:

$$D_i = (u_{i1}, u_{i2}, \ldots, u_{ih}) \qquad (12)$$

where each uij stands for the weight of term bj ∈ B. The similarity between the two labels or comments m1 and m2 is the cosine of the angle between D1 and D2:

$$\cos\theta = \frac{D_1 \cdot D_2}{\|D_1\|\,\|D_2\|}$$

and hence the similarity of the two entities e1 and e2 is computed as follows:

$$sim(e_1, e_2) = 1 - \cos\theta \qquad (13)$$
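The cosine computation of Eq. (12) can be sketched as follows. This is a minimal sketch: raw term frequencies stand in for the term weights uij (the paper computes them with TF-IDF), and the function returns the raw cos θ between the two term vectors.

```python
import math
from collections import Counter

def vsm_cosine(text1, text2):
    # Eq. (12): represent each label/comment as a term-weight vector
    # over the merged, duplicate-free term set B
    t1, t2 = text1.lower().split(), text2.lower().split()
    vocab = sorted(set(t1) | set(t2))      # B
    c1, c2 = Counter(t1), Counter(t2)
    d1 = [c1[b] for b in vocab]            # D1 (raw term frequencies)
    d2 = [c2[b] for b in vocab]            # D2
    dot = sum(a * b for a, b in zip(d1, d2))
    norm = (math.sqrt(sum(a * a for a in d1))
            * math.sqrt(sum(b * b for b in d2)))
    return dot / norm if norm else 0.0     # cos(theta)
```

Identical texts give cos θ = 1 and texts with no shared terms give cos θ = 0.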

4.2 Linguistic based measure

When the information about two entities has a different surface form but similar semantics, the lexical-based matcher fails to capture their similarity. Therefore, a linguistic-based matcher is exploited to recognize the semantic information of pairs of entities. This matcher calculates the similarity score by using external linguistic dictionaries or corpora. Before using the external resources, the entity information is normalized by tokenization, lemmatization and stop-word removal in the preprocessing phase. In our work, we use the WordNet lexical database [38] as the external knowledge source for calculating similarity. WordNet is a large lexical database organized as a graph which can provide synonym sets for a given word. In our matchers' repertory, the Entity Name WordNet Matcher is developed. The semantic similarity between two entity names e1 and e2 is calculated as follows:
$$Sim_{sem}(e_1, e_2) = \begin{cases} 1 & e_1 \in synset(e_2) \text{ or vice versa} \\ 0.5 & e_1 \text{ and } e_2 \text{ have a hyponymy-hypernymy relation} \\ 0 & e_1 \in antonyms(e_2) \end{cases} \qquad (14)$$

where synset(e2) is the set of synonyms of entity e2 and antonyms(e2) is the set of antonyms of entity e2.
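Eq. (14) can be sketched with small hand-rolled synonym, hypernym and antonym tables standing in for the WordNet lookups; the word lists below are illustrative only, and unrelated words default to 0.

```python
# Toy thesaurus tables standing in for WordNet (illustrative only)
SYNONYMS  = {"author": {"writer"}, "writer": {"author"}}
HYPERNYMS = {"paper": {"document"}}          # paper IS-A document
ANTONYMS  = {"accept": {"reject"}, "reject": {"accept"}}

def sim_sem(e1, e2):
    # Eq. (14): 1 for synonyms, 0.5 for hyponym/hypernym pairs,
    # 0 for antonyms (and, by default, for unrelated words)
    if (e1 == e2 or e2 in SYNONYMS.get(e1, set())
            or e1 in SYNONYMS.get(e2, set())):
        return 1.0
    if (e2 in HYPERNYMS.get(e1, set())
            or e1 in HYPERNYMS.get(e2, set())):
        return 0.5
    if e2 in ANTONYMS.get(e1, set()):
        return 0.0
    return 0.0
```

In the paper's matcher, the tables are replaced by queries against the actual WordNet graph.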

4.3 Structural based measure

The structural-based matchers compute a
similarity score between two given entities based on
their kinship (parents and children) in the context of
ontology. In our matchers’ repertory, the Hierarchy
Distance Matcher and Numbered Hierarchy
Distance Matcher are developed [39]. The first
matcher computes similarity between two entities
based on the inheritance relation between super
entities. In detail, suppose that ep1 and ep2 are two
super entities of ontology O1 and O2 and e1 and e2
are sub-entities of ep1 and ep2, respectively, and ep1
and ep2 are equivalent relations with similarity score
σ, then the distance of entity e1 and e2 is computed as
follows:
(15)
d similar  1  
The second matcher exploited in this work computes a structural similarity based on the numbers of super entities and sub entities of an entity in the context of the ontology. The calculation formula is as follows:

$$d_{similar}(e_1, e_2) = \frac{abs(e1_{sup} - e2_{sup})}{\max(e1_{sup}, e2_{sup})} + \frac{abs(e1_{sub} - e2_{sub})}{\max(e1_{sub}, e2_{sub})} \qquad (16)$$

where e1sup and e2sup are the numbers of super entities of entities e1 and e2, respectively, and e1sub and e2sub are the numbers of sub entities of entities e1 and e2, respectively.
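Eqs. (15)-(16) can be sketched as follows. This is a minimal sketch: the two-term combination of super- and sub-entity counts is our reading of Eq. (16), and the guard against zero counts is our addition.

```python
def hierarchy_distance(sigma):
    # Eq. (15): distance of two entities whose super entities are
    # equivalent with similarity score sigma
    return 1.0 - sigma

def numbered_hierarchy_distance(sup1, sup2, sub1, sub2):
    # Eq. (16): distance from the counts of super entities
    # (sup1, sup2) and sub entities (sub1, sub2) of two entities;
    # the zero-count guards are our addition
    d_sup = abs(sup1 - sup2) / max(sup1, sup2) if max(sup1, sup2) else 0.0
    d_sub = abs(sub1 - sub2) / max(sub1, sub2) if max(sub1, sub2) else 0.0
    return d_sup + d_sub
```

Two entities with identical counts of super and sub entities get distance 0.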

5 Ontology alignment using the grasshopper optimization algorithm
We now investigate the components of GSOOM. GSOOM first computes the similarity of each concept of the source ontology to the target ontology. Before these similarities can be optimized, we need to model a solution, an evaluation function and an optimization model so that the GOA can be applied to ontology alignment.

5.1 Representation of solution

Each solution of the GOA represents an alignment between the two given ontologies and can be seen as a set containing pairs of entities from the two ontologies. More in detail, the set can be defined as a vector whose elements are similarity scores. Therefore, the optimal solution is obtained by optimizing the weights and threshold for the given two ontologies. Such an optimization process is called meta-matching [40]. The solution vector X is represented as follows:

$$X = (v_1, v_2, \ldots, v_n) \qquad (17)$$

$$v_n = \sum_{i=1}^{m} w_i \cdot sim_i(ce), \quad s.t. \; \sum_{i=1}^{m} w_i = 1$$

where ce stands for a correspondence, wi stands for the ith weight to be aggregated, simi (i ∈ [1, m]) is the ith similarity measure, m is the number of matchers, and vi (i ∈ [1, n]) denotes the similarity score of each correspondence aggregated by weighting.
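The weighted aggregation of Eq. (17) can be sketched as follows; the function names are ours.

```python
def aggregate(weights, matcher_scores):
    # Eq. (17): weighted aggregation of the matcher scores of one
    # correspondence; the weights must sum to 1
    assert abs(sum(weights) - 1.0) < 1e-9
    return sum(w * s for w, s in zip(weights, matcher_scores))

def solution_vector(weights, score_matrix):
    # X = (v1, ..., vn): one aggregated score per candidate
    # correspondence, given a row of matcher scores for each
    return [aggregate(weights, row) for row in score_matrix]
```

Optimizing the weight vector (and the threshold applied to the resulting scores) is exactly what GSOOM delegates to the GOA.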

5.2 The evaluation function of solution

The evaluation function is also the fitness function of the GOA algorithm. In this section, we propose an evaluation function that does not need a reference alignment, so our approach can be applied to real-world scenarios. We note that the ontology matching problem needs to optimize two objectives: maximizing the number of mappings and maximizing the similarity score of the mappings. In detail, to maximize the number of mapped pairs of entities, the maximum matching probability is expressed by CurrMatchRate; to maximize the similarity score of each correspondence, the average similarity score is used in the evaluation function [24]. As a result, the evaluation function F(X) is proposed as follows:

$$F(X) = \alpha \cdot CurrMatchRate + (1 - \alpha)\, f(X) \qquad (18)$$

$$CurrMatchRate = \frac{|X|}{Sum(O_1, O_2)} \qquad (19)$$

$$f(X) = \frac{\sum_{i=1}^{|X|} c_i}{|X|} \qquad (20)$$

where |X| is the number of correspondences of the current solution, Sum(O1, O2) is the number of possible correspondences between the source ontology O1 and the target ontology O2, f(X) is the average similarity score of the current solution, ci is the ith correspondence's similarity score, and α is a coefficient balancing the two objectives.
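Eqs. (18)-(20) can be sketched as a single fitness function. This is a minimal sketch: treating the threshold as a filter that selects which aggregated scores form the current solution X is our interpretation, and the default α is ours.

```python
def fitness(scores, total_candidates, threshold, alpha=0.5):
    # scores: aggregated similarity scores of all candidate
    # correspondences; those below the threshold are filtered out
    kept = [c for c in scores if c >= threshold]   # current solution X
    if not kept:
        return 0.0
    match_rate = len(kept) / total_candidates      # Eq. (19)
    avg_sim = sum(kept) / len(kept)                # Eq. (20)
    return alpha * match_rate + (1 - alpha) * avg_sim   # Eq. (18)
```

Note that no reference alignment appears anywhere in the function, which is the point of Eqs. (18)-(20).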

5.3 The optimization model of the GSOOM algorithm

As discussed before, the representation of the solution and the evaluation function are the heart of the GSOOM algorithm. The next, most critical, step is how to update the historical solutions through the evaluation function. In this section, we define an optimization model that involves a solution vector and a constraint. More in detail, if a newly generated solution is superior to the historical solution, the historical solution is replaced. Such a process is repeated for a number of iterations, and finally the best solution is found under the constraint. The optimization model is defined as follows:

$$Given: \; X = (v_1, v_2, \ldots, v_n), \quad v_n \in [0, 1] \qquad (21)$$
$$Find: \; X_{best} \mid F(X_{best}) \geq F(X)$$

where X is a dynamic n-dimensional vector representing a candidate solution to the problem; vn (n ∈ [1, N]) denotes the similarity score of each correspondence aggregated by weighting; N is the maximum number of correspondences; Xbest is the best solution; and F is the evaluation function presented in Subsection 5.2.
5.4 The GSOOM algorithm
Having defined the optimization model based on the solution vector and the evaluation function, we can apply the GOA ideas described in Section 3.2 to find the best alignment. The overall ontology alignment algorithm is summarized in Algorithm 2.

5.5 Time and space complexity analysis

In this section, we investigate the time and space complexity of the GSOOM algorithm. First, the five different similarity measures are computed for the input ontologies and aggregated. The computation process and the number of entities are two important factors for the time and space complexity; the computational complexity of the optimization process is also an important factor. In the following, we analyze the time and space complexity with regard to these factors.
The computational cost of the lexical similarity measure is determined by the Jaro-Winkler distance. The Jaro-Winkler algorithm needs to compute the number of matched characters and character transpositions between two given strings. Its time complexity is O(n1 n2), where n1 and n2 are the lengths of the input strings. Since the length of an ontology entity's name is almost constant, we assume, in brief, an O(1) time complexity for the similarity of a pair of entities. The total time complexity of computing the similarity matrix is O(n²), where n stands for the maximum number of entities between the two given ontologies. The space complexity of the SMOA measure is O(n²).
The computational cost of the VSM algorithm is dominated by the TF-IDF technique. TF-IDF needs to compute the frequency of a word w in the bag of words and the inverse fraction of the strings that contain w. The time complexity of this operation is O(t1 + t2), where t1 and t2 are the numbers of tokens in the texts d1 and d2, respectively. In brief, assuming t1 is equal to t2, the similarity of a pair of entities has an O(n) time complexity. As a result, the total time complexity of computing the matcher matrix is O(n + n²), where n is the number of entities. The space complexity of the VSM is O(n²).
For the structural similarity, the Hierarchy Distance Matcher needs to find the number of superclasses. Since the superclasses of each concept are stored in a hash table, the time complexity is O(1). Therefore, the total time complexity of computing the matcher matrix is O(n²), where n is the number of entities. Similarly, the time complexity of the Numbered Hierarchy Distance Matcher is also O(n²), and its space complexity is O(n²).
For the semantic similarity, the similarity measure needs to find the least common ancestor in WordNet between any pair of nodes. The depth of a balanced tree over the nodes is at most O(log n) [41]. As a result, the final time complexity of the similarity measure is O(n² log₂ n), and the space complexity is O(n²).
For the second factor, the time complexity of the optimization process: since the GOA needs to initialize the population, this takes O(n) time. Furthermore, in each iteration an optimal individual is selected from the n individuals, so this step takes O(miter n) time, where miter is the maximum number of iterations. The space complexity is O(n).
The total time and space complexity of the GSOOM algorithm are O(n(n + n log₂ n + 1 + miter)) and O(n²), respectively.

Algorithm 2: Grasshopper Search Optimizing Ontology Alignment (GSOOM)
Input: the parameters of GOA (pop_size, cmax, cmin), termination criterion (maxIter), source and target ontologies O1 and O2
Output: BestAlignment
  Pop ← InitializePopulation(pop_size)
  MapSet ← InitializeCorrespondence(O1, O2)
  EvaluateIndividualFitness(Pop)
  X ← SelectBestIndividual(Pop)
  T ← GetBestFitness(X)
  bestAlignment ← ExtractAlignment(X, MapSet)
  while (l < maxIter)
    update c using Eq. (13)
    for each individual in Pop
      normalize the distances between grasshoppers into the range [1, 4]
      update the position of the current individual X by Eq. (12)
      EvaluateIndividualFitness(X)
      update X if it is a better solution
      boundary check on the current individual
    end for
    X ← SelectBestIndividual(Pop)
    bestAlignment ← ExtractAlignment(X, MapSet)
    T ← UpdateBestFitness(X); update T if there is a better solution
    l ← l + 1
  end while
  return bestAlignment
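The loop of Algorithm 2 can be sketched as follows. Since Eqs. (12)-(13) are not reproduced here, the sketch uses the standard GOA update of Saremi et al. that they adapt, applied to a toy one-dimensional objective instead of the alignment fitness; all function and parameter names are ours, with the paper's tuned γ = 0.8 and τ = 1.0 as defaults:

```python
# Minimal sketch of the GOA-driven search loop in Algorithm 2, using the
# standard grasshopper update on a toy 1-D objective. The real algorithm
# searches the space of matcher weights and threshold; this is only a
# structural illustration with invented names.
import math
import random

def s_func(r, gamma=0.8, tau=1.0):
    # social force: attraction intensity gamma, attractive length scale tau
    return gamma * math.exp(-r / tau) - math.exp(-r)

def gsoom_like_search(fitness, lb, ub, pop_size=20, max_iter=60,
                      c_max=1.0, c_min=0.01, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(lb, ub) for _ in range(pop_size)]
    target = max(pop, key=fitness)                    # best individual so far
    for it in range(max_iter):
        # Eq.-(13)-style linear decrease of c
        c = c_max - it * (c_max - c_min) / max_iter
        new_pop = []
        for i, xi in enumerate(pop):
            move = 0.0
            for j, xj in enumerate(pop):
                if i == j:
                    continue
                d = abs(xj - xi)
                r = 1.0 + 3.0 * d / (ub - lb)         # normalize into [1, 4]
                move += c * (ub - lb) / 2.0 * s_func(r) * (
                    (xj - xi) / d if d > 1e-12 else 0.0)
            x_new = min(ub, max(lb, c * move + target))   # boundary check
            new_pop.append(x_new)
        pop = new_pop
        best = max(pop, key=fitness)
        if fitness(best) > fitness(target):           # keep best-so-far target
            target = best
    return target

best = gsoom_like_search(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
```

The decreasing coefficient c plays the same role as in the paper: large values early favor exploration, small values late favor exploitation around the best solution found so far.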

6 Experiments

In this section, we investigate the performance of the GSOOM algorithm on the benchmark and conference datasets, which contain multiple real-world ontologies. First, we investigate the influence of the parameters npop (population size), cmax (maximum value of parameter c), cmin (minimum value of parameter c), γ, τ and miter (maximum number of iterations) of the GOA algorithm on the performance of GSOOM on a real-world dataset. This task takes into account the convergence speed, matching accuracy and resource consumption of the algorithm. As a result, the optimal values of these parameters are determined and configured into the GSOOM algorithm. Subsequently, GSOOM is run on all test datasets to complete the experimental task. Finally, we report the alignment quality of the GSOOM algorithm and compare it with state-of-the-art and similar methods on different datasets. The test datasets and evaluation method are described in Section 6.1.

6.1 Experimental datasets and evaluation method

In the experiments, we take advantage of the datasets from two tracks of the OAEI, i.e., the benchmark and conference tracks.

The benchmark track is wide in feature coverage, progressive and stable, and thus serves the purpose of evaluating the strengths and weaknesses of matchers. The organizers evaluate the performance of the matchers taking part in the OAEI competition, and the results are available on the OAEI's official web site. Therefore, evaluating an ontology alignment algorithm on the benchmark and conference tracks is reasonable and fair. The benchmark track consists of 51 ontologies, all from the same domain. Most of the ontologies are manually modified, e.g., by deleting some natural language labels, comments and structural information, or by replacing concepts with random strings, to determine the strengths and weaknesses of different alignment algorithms [44]. More specifically, these ontologies can be classified into five groups, as shown in Table 1. The five groups fall into three categories, 1XX, 2XX and 3XX, in which the 1XX test cases are used to test concepts, the 2XX test cases are used to compare different modifications, and the 3XX test cases are real-world ontologies from the same field developed by different organizations.

The conference track includes seven ontologies which are suitable for the ontology alignment task because of the heterogeneous character of their origins [45]. These ontologies are described in Table 2.

In order to evaluate our approach in a comprehensive and detailed way, we use precision, recall and F-measure [42], as in the OAEI evaluation. Precision represents how many of the predicted positive samples are genuine positives, and recall indicates how many of the positive samples have been predicted correctly. In ontology matching, precision is the percentage of correctly matched entities among all matched entities, and recall is the percentage of correctly matched entities among all correct matches, defined as follows:

precision = correct_matching_entities / all_entities

recall = correct_matching_entities / all_correct_matching

The F-measure is the harmonic mean of precision and recall:

F-measure = 2 × (precision × recall) / (precision + recall)

Table 1. The description of OAEI benchmark track

Name  Test cases  Description
1XX   101-104     Test ontologies are regular ontologies and others with a restriction
2XX   201-210     Ontologies with the same structure and different lexical and linguistic features
2XX   221-247     Ontologies with the same lexical and linguistic features and different structure
2XX   248-266     Ontologies with different lexical, linguistic and structural features
3XX   301-304     Test ontologies are real-world ontologies
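Precision, recall and F-measure as defined in this section can be computed directly from a predicted alignment and a reference alignment; the entity pairs below are illustrative only:

```python
# Sketch: computing precision / recall / F-measure from a predicted
# alignment and a reference alignment, following the definitions above.
# The entity pairs are invented for illustration.
def evaluate(predicted: set, reference: set):
    correct = predicted & reference
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

pred = {("Paper", "Article"), ("Author", "Writer"), ("Review", "Topic")}
ref = {("Paper", "Article"), ("Author", "Writer"), ("Chair", "Chairman")}
p, r, f = evaluate(pred, ref)   # two of three predictions are correct
```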

6.2 Empirical study of the impact of dynamic parameters

In this section, we investigate the performance of the GSOOM algorithm on the real-world ontology test case 304. In order to determine the best parameters, the important parameters npop, cmax, cmin, γ, τ and miter of the GOA algorithm are set to different values in our empirical evaluations. There are thus six different scenarios, in each of which only one parameter is varied while the values of the other parameters are fixed. These scenarios are shown in Table 3.

Number of population (npop). To investigate the impact of parameter npop on the performance of the GSOOM algorithm, we fix cmax = 1, cmin = 0.00001, γ = 0.5, τ = 1.5 and miter = 10 and change the population size npop from 1 to 60 with steps of size 10. Table 4 shows the results of the first Scenario (I) in Table 3 for the GSOOM algorithm. As can be seen from Table 4, when the population size is npop = 20, our algorithm achieves the best result, making 20 the most suitable value compared to the other values. The results also indicate that the quality of the candidate solutions is better during the algorithm execution; this fact can also be inferred from Table 4. When the population size increases above 20, the performance of the algorithm decreases. This may be because the algorithm has already updated and stored the high-quality information with 20 individuals, whereas more individuals may introduce redundant information, which leads to a decrease in the performance of the algorithm. As a result, npop = 20 is chosen for all tested instances. Note that when the population size is npop = 1, the diversity of the population is strictly limited: since updating a GOA individual's position requires the positions of the other individuals, the current position and the target value, the GOA algorithm cannot update the individual's position, which also indicates that the performance of the GSOOM algorithm cannot be improved.

Table 2. The description of OAEI conference track

Name        Number of Classes  Number of Datatype Properties  Number of Object Properties
Cmt         36                 10                             49
Conference  60                 18                             46
confOf      39                 23                             13
edas        104                20                             30
ekaw        74                 0                              33
iasted      140                3                              38
sigkdd      49                 11                             17

Table 3. We consider six different scenarios for evaluating the impact of the important parameters on the performance of the GSOOM algorithm. In each scenario, five parameters are fixed and only the test parameter is varied to examine its influence on performance.

Scenario  Fixed parameters                               Variable parameter  Values
I         cmax=1, cmin=0.00001, γ=0.5, τ=1.5, miter=10   npop                1, 10, 20, 30, 40, 50, 60
II        npop=20, cmin=0.00001, γ=0.5, τ=1.5, miter=10  cmax                0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1
III       npop=20, cmax=1, γ=0.5, τ=1.5, miter=10        cmin                1, 0.01, 0.001, 0.0001, 0.00001, 0.000001
IV        npop=20, cmax=1, cmin=0.01, τ=1.5, miter=10    γ                   0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0
V         npop=20, cmax=1, cmin=0.01, γ=0.8, miter=10    τ                   1.0, 1.2, 1.4, 1.5, 1.6, 1.8, 2.0
VI        npop=20, cmax=1, cmin=0.01, γ=0.8, τ=1.0       miter               10, 20, 30, 40, 50, 60, 70, 80, 100
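The scenario design in Table 3 is a one-at-a-time sweep: the defaults are kept and a single parameter takes each candidate value. A sketch with a placeholder objective (run_gsoom and its peak at npop = 20 are invented for illustration):

```python
# One-at-a-time parameter sweep mirroring the scenarios in Table 3:
# five parameters stay fixed while the test parameter takes each value.
# run_gsoom is a placeholder for one averaged GSOOM run; its shape
# (peaking at npop = 20) is invented for illustration.
def run_gsoom(**params):
    return 0.6 - abs(params["npop"] - 20) * 0.001

defaults = dict(npop=20, c_max=1.0, c_min=0.00001,
                gamma=0.5, tau=1.5, miter=10)
scenario_I = [1, 10, 20, 30, 40, 50, 60]

results = {}
for value in scenario_I:
    params = dict(defaults, npop=value)   # vary only the test parameter
    results[value] = run_gsoom(**params)

best_npop = max(results, key=results.get)
```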
Maximum value of parameter c (cmax). The parameter cmax is the maximum value of c, an important factor for balancing exploration and exploitation; furthermore, cmax is the main factor affecting the algorithm's exploration capability. In Fig. 1, the second scenario is carried out to investigate the impact of cmax on the performance of GSOOM. cmax is varied between 0.1 and 1 with steps of size 0.1. As can be seen from Fig. 1, a smaller cmax decreases the performance of the GSOOM algorithm, and when cmax increases above 0.5 the performance increases gradually. The reason is that a smaller cmax reduces exploration, so the algorithm enters the local search too early. When cmax = 1.0, our algorithm achieves the best result, making 1.0 the most suitable value compared to the other values. Therefore, cmax = 1.0 is chosen for all tested instances.

Fig. 1. The impact of the maximum value cmax on the performance of the GSOOM corresponding to Scenario (II) in Table 3.

Table 4. The best Target Fitness value for the GSOOM algorithm on the 304 data set for the different population sizes corresponding to Scenario (I) in Table 3.

npop  Target Fitness
1     0.560
10    0.570
20    0.600
30    0.574
40    0.577
50    0.574
60    0.568

Minimum value of parameter c (cmin). The parameter cmin is the minimum value of c, an important factor for balancing exploration and exploitation; furthermore, cmin is the main factor affecting the algorithm's exploitation capability. In Table 5, the third scenario is conducted to investigate the impact of cmin on the performance of GSOOM. cmin is varied between 0.000001 and 1 in steps of a factor of 10. As can be seen from Table 5, a smaller or larger cmin decreases the performance of the GSOOM algorithm. When cmin drops below 0.001, the performance of the algorithm starts to decrease. The reason is that a smaller cmin reduces exploitation, so the algorithm is too slow to enter the local search within a given number of iterations. Note that when cmin = 0.1 or cmin = 1.0, the algorithm converges prematurely, which does not improve its ability to select good candidate solutions. Therefore, cmin = 0.01 is chosen for all tested instances.

Table 5. The best Target Fitness value for the GSOOM algorithm on the 304 data set for the different values of cmin corresponding to Scenario (III) in Table 3.

cmin      Target Fitness
1         0.56
0.1       0.57
0.01      0.60
0.001     0.56
0.0001    0.57
0.00001   0.57
0.000001  0.55

The intensity of attraction (γ). As discussed before, npop = 20, cmax = 1.0 and cmin = 0.01 were determined. In this experiment, we investigate the important parameter γ, the intensity of attraction among grasshoppers illustrated in Eq. (4). Table 6 shows the results of the fourth Scenario (IV) for the GSOOM algorithm. As can be seen from Table 6, when γ = 0.8, our algorithm achieves the best result, which also indicates that a stronger attraction among grasshoppers is beneficial. However, when γ = 1.0, the performance of the algorithm declines significantly. Therefore, γ = 0.8 is chosen as the most suitable value for all tested instances.

Table 6. The best Target Fitness value of the GSOOM algorithm on the 304 data set for the different values of γ corresponding to Scenario (IV) in Table 3.

γ    Target Fitness
0.0  0.550
0.2  0.570
0.4  0.565
0.5  0.562
0.6  0.573
0.8  0.595
1.0  0.563

The attractive length scale (τ). Next, we investigate the second important parameter τ, the attractive length scale among grasshoppers illustrated in Eq. (4). Before that, npop = 20, cmax = 1.0, cmin = 0.01 and γ = 0.8 were determined. We adjusted τ between 1 and 2, and the results are shown in Table 7. As can be seen from Table 7, when τ = 1.0, our algorithm achieves the best result, which also indicates that this attractive length scale is suitable for the GSOOM algorithm when selecting candidate solutions. Therefore, τ = 1.0 is chosen as the most suitable value for all tested instances.

Table 7. The best Target Fitness value of the GSOOM algorithm on the 304 data set for the different values of τ corresponding to Scenario (V) in Table 3.

τ    Target Fitness
1.0  0.599
1.2  0.598
1.4  0.567
1.5  0.593
1.6  0.558
1.8  0.572
2.0  0.574

Maximum number of iterations (miter). In this subsection, the last scenario is conducted to investigate the impact of the maximum number of iterations. We adjust the number of iterations between 10 and 100 with steps of size 10 to determine the minimum suitable value of miter. As can be seen from Table 8, when miter = 60, our algorithm achieves the best performance; as the iteration number increases above 60, the performance of the algorithm hardly changes. Therefore, miter = 60 is chosen as the most suitable value for all tested instances.

Table 8. The best Target Fitness value of the GSOOM algorithm on the 304 data set for the different values of miter corresponding to Scenario (VI) in Table 3.

miter  Target Fitness
10     0.564
20     0.575
30     0.567
40     0.599
50     0.569
60     0.600
70     0.600
80     0.600
90     0.600
100    0.600

6.3 The alignment quality of the GSOOM

In this section, we investigate the alignment quality of the GSOOM algorithm on the benchmark and conference tracks. AP, AR and AF denote the average precision, average recall and average F-measure, respectively. More precisely, the result for each ontology is the average of ten independent runs. The best parameters obtained in the experiments in Section 6.2 are used to configure the GSOOM algorithm. The specific parameters are configured as follows:
· Population size = 20,
· Intensity of attraction γ = 0.8,
· Attractive length scale τ = 1.0,
· cmax = 1,
· cmin = 0.01,
· Termination condition = 60 iterations.
6.3.1 The benchmark track

In this section, we investigate the alignment quality of the GSOOM algorithm on the benchmark track, corresponding to 1XX, 2XX and 3XX in Table 1. As can be seen from Table 9, the GSOOM algorithm provides reasonably good solutions. GSOOM obtains the best matching results on test cases 1XX (101-104) and 2XX (221-247), where the F-measure reaches 100% and 99%, respectively. The reason is that the lexical and linguistic information is provided completely by the developers. Furthermore, GSOOM performs perfectly on test cases 1XX, where both precision and recall reach 100%. In addition, in test cases 2XX (221-247) the hierarchy, individuals, property restrictions and relations between properties are suppressed. In this respect, the performance of GSOOM is not affected, as GSOOM relies on the combination of lexical, linguistic and hierarchy information.
For test cases 2XX (201-210), GSOOM also achieves an acceptable matching quality (the AF value reaches 71%). The performance drops because the lexical and linguistic information is suppressed or replaced by random strings in 201 and 202. However, as Appendix A shows, even though the comments are suppressed or the labels are replaced by synonyms, good matchings are still obtained for ontologies 203-208.
On test cases 2XX (248-266), which have poor lexical and structural information, the GSOOM algorithm obtains the worst results. However, ontologies with such poor lexical, linguistic and structural information rarely occur in the real world.
Having analyzed the alignment quality, we compare GSOOM with the similar method MapPSO, a discrete particle swarm optimisation-based approach. The main difference between GSOOM and MapPSO lies in the parameter determination and the way of optimizing. GSOOM is a meta-matching approach whose parameters are optimized by the standard GOA algorithm; in MapPSO, the
ontology alignment problem is modeled as a direct global optimization of a mapping between two ontologies, and the parameters are assigned according to the user's experience. We compare the performance of our approach with MapPSO on all data sets and consider average precision (AP), average recall (AR) and average F-measure (AF) as quality measures. The matching result for each ontology is shown in Appendix Table C.
Table 10 shows the AP, AR and AF of MapPSO and GSOOM, respectively. As can be seen from Table 10, GSOOM significantly improves the alignment quality on all three indexes. The results indicate that the parameter optimization strategy in GSOOM is vital not only to improve the alignment performance, but also to reduce the dependency on experts' experience.
Table 9. The quality of the GSOOM on benchmark data sets.

              GSOOM
Test cases    AP    AR    AF
101-104       1.00  1.00  1.00
201-210       0.75  0.68  0.71
221-247       0.99  0.99  0.99
248-266       0.12  0.11  0.12
301-304       0.59  0.60  0.59

Table 10. The AP, AR and AF of the GSOOM and MapPSO algorithms on different ontologies.

              MapPSO              GSOOM
Test cases    AP    AR    AF     AP    AR    AF
101-104       1.00  1.00  1.00   1.00  1.00  1.00
201-210       0.65  0.58  0.61   0.75  0.68  0.71
221-247       0.98  0.99  0.98   0.99  0.99  0.99
248-266       0.06  0.05  0.05   0.12  0.11  0.12
301-304       0.49  0.33  0.35   0.59  0.60  0.59

From these experiments, the results obtained by GSOOM and MapPSO are comparable on test sets 101-104 and 221-247, which have enough lexical, linguistic and structural information. However, as can be seen from the data sets 201-210 and 248-266 and the real ontologies 301-304, the performance of the proposed approach is significantly higher than that of MapPSO, which indicates that applying GOA to ontology matching is feasible and effective.
Finally, we select state-of-the-art systems that participated in OAEI (AML, LogMap, LogMapLt, XMap, Lily, CODI, AROMA, TaxoMap) for comparison. The experimental results are shown in Tables 11A, 11B and 12; to facilitate the comparison, the results of GSOOM are listed in each of these tables. The results in Tables 11A and 11B are shown for test cases 101-266, since these systems have no experimental results for 301-304. From Table 11A to Table 12, although the performance of AML, LogMap, LogMapLt, XMap, CODI, AROMA and TaxoMap is higher than that of GSOOM on AP or AR for some data sets, GSOOM outperforms all of them on the index AF. This indicates that the overall performance of GSOOM is the best in Tables 11A, 11B and 12, as AF is a comprehensive indicator.
From Table 11B, the performance of Lily is the best, while the performances of XMap, CODI, AROMA and GSOOM are almost equivalent on test cases 2XX (221-247). Lily combines several effective matching techniques: the Generic Ontology Matching (GOM) method, ontology matching debugging and ontology matching tuning are used to enhance its overall performance [43]. However, it is not clear whether this debugging technique requires reference alignments. In addition, the ontology matching tuning technique is not fully automatic; it is difficult to find typical statistical parameters that distinguish each task from the others. Meanwhile, learning from the test datasets can be time-consuming.
From the perspective of each test case, the following observations can be made:
1) Test cases 101-104: All methods achieve their best performance, excluding AML and TaxoMap. The reason is that the names, labels and comments of the entities in the ontologies to be aligned in this category have not been suppressed. Therefore, the various matchers (whether lexical, linguistic, structural or hybrid) are all able to obtain good results.
2) Test cases 201-210: As the ontologies to be aligned in this category have the same structure but different linguistic and lexical information, GSOOM is superior to the LogMap, LogMapLt, AML, XMap, CODI, AROMA and TaxoMap methods, since GSOOM can automatically find the optimal weight and threshold parameters to combine the matching results from the lexical, linguistic and structural matchers.
3) Test cases 221-247: The performance of GSOOM and Lily is the best in this category. As the ontologies to be aligned have the same lexical and linguistic information but different structure, the lexical and linguistic matchers in GSOOM can find the best matches, and the impact of mismatches identified by the structure matcher can be lessened by lowering its weight.
4) Test cases 248-266: As the lexical, linguistic and structural information of the ontologies is strictly subtracted and scrambled, the performance of all methods on these test cases is not as good as on other test cases. Since GSOOM relies on lexical, structural and linguistic information, its performance degrades largely in this category. However, in real-world scenarios, ontology developers rarely build such ontologies, so the performance degradation in this category is acceptable in our view.
Table 11A. Comparison of the GSOOM approach with the state-of-the-art ontology matching systems.

          AML              LogMap           LogMapLt         GSOOM
Dataset   AP   AR   AF     AP   AR   AF     AP   AR   AF     AP   AR   AF
101       1.00 0.00 0.00   0.94 0.96 0.95   0.56 0.99 0.71   1.00 1.00 1.00
201-202   1.00 0.00 0.00   1.00 0.00 0.00   0.00 0.00 0.00   0.35 0.27 0.31
221-247   1.00 0.67 0.75   0.78 0.86 0.81   0.60 1.00 0.73   0.99 0.99 0.99
248-266   1.00 0.00 0.00   1.00 0.00 0.00   0.00 0.00 0.00   0.12 0.11 0.11
101-266   -    -    0.37   -    -    0.41   -    -    0.36   -    -    0.57

Table 11B. Comparison of the GSOOM approach with the state-of-the-art ontology matching systems.

          XMap             Lily             GSOOM
Dataset   AP   AR   AF     AP   AR   AF     AP   AR   AF
101       0.94 1.00 0.97   1.00 1.00 1.00   1.00 1.00 1.00
201-202   1.00 0.00 0.00   0.99 0.90 0.94   0.35 0.27 0.31
221-247   0.97 0.96 0.96   1.00 1.00 1.00   0.99 0.99 0.99
248-266   1.00 0.00 0.00   0.89 0.44 0.54   0.12 0.11 0.11
101-266   -    -    0.48   -    -    0.80   -    -    0.57

Table 12. Comparison of the GSOOM approach with the state-of-the-art ontology matching systems.

          CODI             AROMA            TaxoMap          GSOOM
Dataset   AP   AR   AF     AP   AR   AF     AP   AR   AF     AP   AR   AF
101-104   1.00 0.99 0.99   1.00 0.98 0.99   1.00 0.34 0.51   1.00 1.00 1.00
201-210   0.73 0.30 0.39   0.86 0.66 0.60   0.88 0.25 0.38   0.75 0.68 0.71
221-247   0.99 0.98 0.97   0.97 0.95 0.96   0.94 0.61 0.61   0.99 0.99 0.99
248-266   0.27 0.00 0.07   0.27 0.00 0.01   0.47 0.00 0.06   0.12 0.11 0.11
301-304   0.95 0.44 0.59   0.80 0.54 0.63   0.71 0.31 0.44   0.59 0.60 0.59
101-304   -    -    0.54   -    -    0.59   -    -    0.39   -    -    0.64

5) Test cases 301-304: The ontologies to be aligned in this category are real ontologies developed by different organizations. The performance of GSOOM in this category is the best. From Appendix Table A, this test achieves a precision of 97% and a recall of 91% on 304. In addition, the GSOOM algorithm reduces the dependency on experts' experience for setting the weight parameters and does not need reference alignments. Therefore, GSOOM is more suitable for ontology alignment in real-world scenarios.
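The weighted combination of matcher results and threshold filtering discussed in this section can be sketched as follows; the similarity matrices, weights and threshold are illustrative stand-ins for the values that GOA optimizes:

```python
# Sketch of combining matcher similarity matrices with weights and then
# filtering candidate pairs by a threshold. The weights and threshold
# shown are illustrative, not the learned GSOOM values.
def combine_and_filter(matrices, weights, threshold):
    n, m = len(matrices[0]), len(matrices[0][0])
    combined = [[sum(w * M[i][j] for w, M in zip(weights, matrices))
                 for j in range(m)] for i in range(n)]
    return {(i, j) for i in range(n) for j in range(m)
            if combined[i][j] >= threshold}

lexical    = [[0.9, 0.1], [0.2, 0.8]]
linguistic = [[0.8, 0.0], [0.1, 0.9]]
structural = [[0.5, 0.4], [0.3, 0.6]]
pairs = combine_and_filter([lexical, linguistic, structural],
                           weights=[0.4, 0.4, 0.2], threshold=0.6)
# pairs == {(0, 0), (1, 1)}
```

Lowering the weight of a matcher (e.g. the structural one in observation 3) directly reduces the influence of its mismatches on the combined matrix, which is exactly the degree of freedom GOA searches over.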

6.3.2 The conference track

The experiment in this section includes 21 matching tasks, each consisting of two ontologies. Each task comprises not only the matching of classes but also the matching of properties. Tables 13A and 13B show the precision, recall and F-measure of the various systems along with their averages. Concerning the average F-measure, GSOOM outperforms all the other approaches except AML, with which it is quite competitive; AML is the best alignment system in this track. In terms of average recall, our ontology alignment system achieves the best performance, by a margin of 4% over AML, the second-best system, and by a margin of 10% over LogMap. In terms of precision, AML and LogMap perform better than GSOOM.
In summary, GSOOM achieves high quality on different datasets. Compared with the discrete MapPSO method, GSOOM significantly improves the alignment quality in precision, recall and F-measure; compared with the other eight methods, GSOOM does not need manually configured weight and threshold parameters, which promotes its scalability and usability.
In conclusion, on the one hand, GSOOM transforms the ontology alignment problem into an optimization problem and optimizes the weights and filter threshold of the matchers by meta-matching, which eliminates the dependency on experts' experience and reference alignments; on the other hand, the experiments demonstrate that GSOOM improves the performance of ontology alignment.

Table 13A. Comparison of the GSOOM approach with the state-of-the-art ontology matching systems.

                   AML              LogMap           LogMapLt         XMap             GSOOM
Datasets           P    F    R      P    F    R      P    F    R      P    F    R      P    F    R
cmt-conference     0.67 0.59 0.53   0.73 0.62 0.53   0.56 0.42 0.33   0.00 0.00 0.00   0.64 0.76 0.93
cmt-confOf         0.90 0.69 0.56   0.83 0.45 0.31   0.67 0.48 0.38   0.88 0.58 0.44   0.80 0.62 0.50
cmt-edas           0.91 0.83 0.77   0.89 0.73 0.62   0.73 0.67 0.62   0.75 0.72 0.69   0.73 0.79 0.85
cmt-ekaw           0.75 0.63 0.55   0.75 0.63 0.55   0.56 0.50 0.45   0.70 0.67 0.64   0.58 0.61 0.64
cmt-iasted         0.80 0.89 1.00   0.80 0.89 1.00   0.80 0.89 1.00   0.80 0.89 1.00   0.67 0.80 1.00
cmt-sigkdd         0.92 0.92 0.92   1.00 0.91 0.83   0.89 0.76 0.67   0.91 0.87 0.83   0.91 0.87 0.83
conference-confOf  0.87 0.87 0.87   0.85 0.79 0.73   0.90 0.72 0.60   0.71 0.75 0.80   0.86 0.83 0.80
conference-edas    0.73 0.69 0.65   0.85 0.73 0.65   0.75 0.62 0.53   0.79 0.71 0.65   0.67 0.74 0.82
conference-ekaw    0.78 0.75 0.72   0.60 0.53 0.48   0.62 0.42 0.32   0.58 0.59 0.60   0.68 0.72 0.76
conference-iasted  0.83 0.50 0.36   0.88 0.64 0.50   0.80 0.42 0.29   0.62 0.45 0.36   0.86 0.57 0.43
conference-sigkdd  0.85 0.79 0.73   0.85 0.79 0.73   0.80 0.64 0.53   0.82 0.69 0.60   0.80 0.80 0.80
confOf-edas        0.92 0.71 0.58   0.77 0.62 0.53   0.58 0.58 0.58   0.91 0.67 0.53   0.82 0.78 0.74
confOf-ekaw        0.94 0.86 0.80   0.93 0.80 0.70   0.77 0.61 0.50   0.76 0.78 0.80   0.85 0.85 0.85
confOf-iasted      0.80 0.57 0.44   1.00 0.62 0.44   1.00 0.62 0.44   0.43 0.52 0.67   0.71 0.63 0.56
confOf-sigkdd      1.00 0.92 0.86   1.00 0.83 0.71   1.00 0.73 0.57   0.80 0.67 0.57   0.83 0.77 0.71
edas-ekaw          0.79 0.59 0.48   0.75 0.62 0.52   0.59 0.50 0.43   0.75 0.62 0.52   0.76 0.73 0.70
edas-iasted        0.82 0.60 0.47   0.88 0.52 0.37   0.88 0.52 0.37   0.57 0.48 0.42   0.69 0.56 0.47
edas-sigkdd        1.00 0.80 0.67   0.88 0.61 0.47   0.88 0.61 0.47   1.00 0.64 0.47   0.80 0.64 0.53
ekaw-iasted        0.88 0.78 0.70   0.78 0.74 0.70   0.60 0.60 0.60   0.58 0.64 0.70   0.70 0.70 0.70
ekaw-sigkdd        0.80 0.76 0.73   0.88 0.74 0.64   0.88 0.74 0.64   0.78 0.70 0.64   0.89 0.80 0.73
iasted-sigkdd      0.81 0.84 0.87   0.78 0.85 0.93   0.73 0.73 0.73   0.68 0.76 0.87   0.71 0.75 0.80
Average value      0.85 0.74 0.68   0.84 0.69 0.62   0.76 0.61 0.53   0.71 0.64 0.61   0.76 0.73 0.72


Table 13B. Comparison of the GSOOM approach with the state-of-the-art ontology matching systems.

                   Lily             CODI             AROMA            MapPSO           GSOOM
Datasets           P    F    R      P    F    R      P    F    R      P    F    R      P    F    R
cmt-conference     0.53 0.56 0.60   0.70 0.56 0.47   0.19 0.24 0.33   0.08 0.13 0.40   0.64 0.76 0.93
cmt-confOf         0.80 0.38 0.25   0.67 0.48 0.38   0.31 0.28 0.25   0.08 0.12 0.25   0.80 0.62 0.50
cmt-edas           0.64 0.58 0.54   0.89 0.73 0.62   0.42 0.40 0.38   0.14 0.23 0.69   0.73 0.79 0.85
cmt-ekaw           0.55 0.55 0.55   0.70 0.67 0.64   0.14 0.16 0.18   0.08 0.14 0.45   0.58 0.61 0.64
cmt-iasted         0.57 0.73 1.00   0.80 0.89 1.00   0.27 0.42 1.00   0.03 0.06 0.50   0.67 0.80 1.00
cmt-sigkdd         0.70 0.64 0.58   0.75 0.75 0.75   0.33 0.37 0.42   0.12 0.19 0.50   0.91 0.87 0.83
conference-confOf  0.67 0.59 0.53   0.67 0.67 0.67   0.33 0.41 0.53   0.15 0.24 0.60   0.86 0.83 0.80
conference-edas    0.41 0.41 0.41   0.69 0.60 0.53   0.25 0.34 0.53   0.07 0.12 0.41   0.67 0.74 0.82
conference-ekaw    0.62 0.63 0.64   0.67 0.50 0.40   0.48 0.52 0.56   0.13 0.21 0.48   0.68 0.72 0.76
conference-iasted  0.67 0.52 0.43   0.67 0.40 0.29   0.22 0.25 0.29   0.06 0.11 0.43   0.86 0.57 0.43
conference-sigkdd  0.71 0.69 0.67   0.77 0.71 0.67   0.47 0.50 0.53   0.15 0.26 0.73   0.80 0.80 0.80
confOf-edas        0.69 0.56 0.47   0.56 0.51 0.47   0.42 0.49 0.58   0.10 0.15 0.32   0.82 0.78 0.74
confOf-ekaw        0.79 0.77 0.75   0.87 0.74 0.65   0.57 0.60 0.65   0.22 0.31 0.55   0.85 0.85 0.85
confOf-iasted      0.46 0.55 0.67   0.67 0.67 0.67   0.35 0.46 0.67   0.04 0.07 0.22   0.71 0.63 0.56
confOf-sigkdd      0.17 0.15 0.14   1.00 0.92 0.86   0.33 0.38 0.43   0.03 0.06 0.29   0.83 0.77 0.71
edas-ekaw          0.67 0.59 0.52   0.75 0.62 0.52   0.42 0.43 0.43   0.05 0.08 0.22   0.76 0.73 0.70
edas-iasted        0.50 0.42 0.37   0.75 0.58 0.47   0.28 0.33 0.42   0.01 0.03 0.11   0.69 0.56 0.47
edas-sigkdd        0.62 0.43 0.33   0.88 0.61 0.47   0.38 0.36 0.33   0.07 0.11 0.33   0.80 0.64 0.53
ekaw-iasted        0.50 0.62 0.80   0.64 0.67 0.70   0.17 0.26 0.50   0.03 0.05 0.30   0.70 0.70 0.70
ekaw-sigkdd        0.50 0.48 0.45   0.90 0.86 0.82   0.45 0.45 0.45   0.03 0.05 0.18   0.89 0.80 0.73
iasted-sigkdd      0.56 0.61 0.67   0.79 0.76 0.73   0.26 0.35 0.53   0.11 0.18 0.47   0.71 0.75 0.80
Average value      0.59 0.55 0.54   0.75 0.66 0.61   0.34 0.38 0.48   0.08 0.14 0.40   0.76 0.73 0.72

7 Conclusions and future work

In this work, we propose a new ontology matching approach, called GSOOM, which exploits the grasshopper optimization algorithm to discover the correspondences between two given ontologies by optimizing the weights for combining matchers. The ontology alignment problem is modeled as an optimization problem, and a fitness function is proposed based on the number of correspondences and the average similarity of the candidate solutions. The grasshopper algorithm is then adapted to optimize this fitness function, and the final alignment is obtained after reaching the termination condition. GSOOM has shown good performance in discovering pairs of corresponding entities between two given ontologies. Furthermore, it offers strong scalability and practicality in comparison to other ontology matching methods.
However, there are multiple directions in which GSOOM can be improved. We plan to investigate the scalability of the algorithm so that it can match large-scale ontologies. Another very important direction is to study broader similarity measures to enhance the performance of matching. At the same time, we will improve the computational time complexity of each matcher, such as the WordNet matcher. Yet another way of improving GSOOM is to use external knowledge as background ontology, which may further improve the performance on the OAEI track.

Acknowledgements
The authors are grateful to Editor in Chief, Dr. Jie
Lu, for her assistance and would like to thank the
anonymous reviewers for their helpful comments.
This work was financially supported by the
National Key Research and Development Plan of
China under Grant No. 2017YFB0503702.

References
[1] P. Ochieng, S. Kyanda, Large-scale ontology matching: state-of-the-art analysis, ACM Comput. Surv. 51(4) (2018) 75:1-75:35.
[2] H. Karimi, A. Kamandi, Ontology alignment using inductive logic programming, in: 4th International Conference on Web Research (ICWR), Tehran, 2018, pp. 118-127.
[3] S. Babalou, M.J. Kargar, S.H. Davarpanah, Large-scale ontology matching: a review of the literature, in: 2016 Second International Conference on Web Research (ICWR), Tehran, 2016, pp. 158-165.
[4] M. Buron, F. Goasdoué, I. Manolescu, M.L. Mugnier, Ontology-based RDF integration of heterogeneous data, in: International Conference on Extending Database Technology (EDBT 2020), Copenhagen, Denmark, March 30-April 2, 2020, pp. 299-310.
[5] T. Hoppe, J.A. Qundus, S. Peikert, Ontology-based entity recognition and annotation, in: The Conference on Digital Curation Technologies (Qurator 2020), Berlin, Germany, January 20-21, 2020, pp. 2-9.
[6] O. Corcho, F. Priyatna, D. Chaves-Fraga, Towards a new generation of ontology based data access, Semantic Web 11(1) (2020) 153-160.
[7] F. Ghorbel, F. Hamdi, E. Métais, N. Ellouze, F. Gargouri, Ontology-based representation and reasoning about precise and imprecise temporal data: a fuzzy-based view, Data Knowl. Eng. 124 (2019) 1-26.
[8] E. Kharlamov, D. Hovland, M.G. Skjæveland, D. Bilidas, E. Jiménez-Ruiz, G. Xiao, A. Soylu, D. Lanti, M. Rezk, D. Zheleznyakov, M. Giese, H. Lie, Y.E. Ioannidis, Y. Kotidis, M. Koubarakis, A. Waaler, Ontology based data access in Statoil, J. Web Semant. 44 (2017) 3-36.
[9] M. Guia, R.R. Silva, J. Bernardino, A hybrid ontology-based recommendation system in e-commerce, Algorithms 12(11) (2019) 239.
[10] A. Annane, Z. Bellahsene, F. Azouaou, C. Jonquet, Building an effective and efficient background knowledge resource to enhance ontology matching, J. Web Semant. 51 (2018) 51-68.
[11] L. Asprino, V. Presutti, A. Gangemi, P. Ciancarini, Frame-based ontology alignment, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI), 2017.
[12] W. Deng, J. Xu, H. Zhao, An improved ant colony optimization algorithm based on hybrid strategies for scheduling problem, IEEE Access 7 (2019) 20281-20292.
[13] W. Deng, H. Zhao, L. Zou, et al., A novel collaborative optimization algorithm in solving complex optimization problems, Soft Comput. 21 (2017) 4387-4398.
[14] S. Saremi, S. Mirjalili, A. Lewis, Grasshopper optimisation algorithm: theory and application, Advances in Engineering Software 105 (2017) 30-47.
[15] A.K. Bhandari, K. Rahul, A novel local contrast fusion-based fuzzy model for color image multilevel thresholding using grasshopper optimization, Appl. Soft Comput. 81 (2019).
[16] H. Hichem, M. Elkamel, M. Rafik, M.T. Mesaaoud, C. Ouahiba, A new binary grasshopper optimization algorithm for feature selection problem, Journal of King Saud University, 2019.
[17] J. Luo, H. Chen, Q. Zhang, Y. Xu, H. Huang, X. Zhao, An improved grasshopper optimization algorithm with application to financial stress prediction, Applied Mathematical Modelling 64 (2018) 654-668.
[18] A. Zakeri, A. Hokmabadi, Efficient feature selection method using real-valued grasshopper optimization algorithm, Expert Syst. Appl. 119 (2019) 61-72.
[19] J. Silva, K. Revoredo, F.A. Baiao, ALIN results for OAEI 2018, in: Proceedings of the 13th International Workshop on Ontology Matching, 2018.
[20] J.M. Destro, G.O. Santos, J.C. Reis, R.S. Torres, A.M. Carvalho, I.L. Marques, EVOCROS: results for OAEI 2018, in: Proceedings of the 13th International Workshop on Ontology Matching, 2018.
[21] Y. Tang, P. Wang, Z. Pan, H. Liu, Lily results for OAEI 2018, in: Proceedings of the 13th International Workshop on Ontology Matching, 2018.
[22] W.E. Djeddi, S.B. Yahia, M.T. Khadir, XMap: results for OAEI 2018, in: Proceedings of the 13th International Workshop on Ontology Matching, 2018.
[23] J. Wang, Z. Ding, C. Jiang, GAOM: genetic algorithm-based ontology matching, in: IEEE Asia-Pacific Conference on Services Computing (APSCC'06), IEEE, 2006, pp. 617-620.
[24] J. Bock, J. Hettenhausen, Discrete particle swarm optimisation for ontology alignment, Inf. Sci. 192 (2012) 152-173.
[25] G. Ryma, M.K. Kholladi, Genetic algorithm with hill climbing for correspondences discovery in ontology mapping, Journal of Information Technology Research 12 (2019) 153-169.
[26] J. Martinez-Gil, E. Alba, J.F.A. Montes, Optimizing ontology alignments by using genetic algorithms, in: International Semantic Web Conference, Workshop on Ontology Matching (OM-2008), 2008, pp. 1-15.
[27] A.L. Ginsca, A. Iftene, Using a genetic algorithm for optimizing the similarity aggregation step in the process of ontology alignment, 2010.
[28] M.M. Romero, J.M. Vazquez-Naya, F.J. Novoa, G. Vazquez, A genetic algorithms-based approach for optimizing similarity aggregation in ontology matching, 7902 (2013) 435-444.
[29] G. Acampora, P. Avella, V. Loia, S. Salerno, A. Vitiello, Improving ontology alignment through memetic algorithms, in: IEEE International Conference on Fuzzy Systems, June 27-30, 2011.
[30] X. Xue, Y. Wang, Optimizing ontology alignment through a memetic algorithm using both MatchFmeasure and unanimous improvement ratio, Artificial Intelligence 223 (2015) 65-81.
[31] X. Xue, J. Chen, Optimizing ontology alignment through hybrid population-based incremental learning algorithm, Memetic Computing 11(2) (2019) 209-217.
[32] M. Biniz, R. El Ayachi, Optimizing ontology alignments by using neural NSGA-II, Journal of Electronic Commerce in Organizations 16 (2018) 29-42.
[33] Z. Feng, W. Li, X. Li, Ontology Engineering and Its Application, first ed., Tsinghua University Press, China, 2007.
[34] J. Euzenat, P. Shvaiko, Ontology Matching, Springer-Verlag, Berlin Heidelberg, 2013.
[35] S. Bashath, A.R. Ismail, Improved particle swarm optimization by fast simulated annealing algorithm, in: International Conference of Artificial Intelligence and Information Technology (ICAIIT), Yogyakarta, Indonesia, 2019, pp. 297-301.
[36] G. Stoilos, G. Stamou, S. Kollias, A string metric for ontology alignment, in: International Semantic Web Conference, 2005, pp. 625-635.
[37] G. Salton, A. Wong, C.S. Yang, A vector space model for automatic indexing, Communications of the ACM 18(11) (1975) 613-620.
[38] G.A. Miller, WordNet: a lexical database for English, Commun. ACM 38(11) (1995) 39-41.
[39] S. Khan, M. Safyan, Semantic matching in hierarchical ontologies, Journal of King Saud University 26 (2014) 247-257.
[40] J. Martinez-Gil, J.F. Aldana-Montes, An overview of current ontology meta-matching solutions, The Knowledge Engineering Review (2010) 1-24.
[41] R. Forsati, M. Shamsfard, Symbiosis of evolutionary and combinatorial ontology mapping approaches, Inf. Sci. 342 (2016) 53-80.
[42] C.J. van Rijsbergen, Information Retrieval, second ed., Butterworths, London, 1979.
[43] P. Wang, W. Wang, Lily results for OAEI 2016, in: Proceedings of the 11th International Workshop on Ontology Matching, 2016.
[44] Ontology Alignment Evaluation Initiative, http://oaei.ontologymatching.org/2019/conference/index.html, OAEI, 2019.
[45] Ontology Alignment Evaluation Initiative, http://oaei.ontologymatching.org/2016/benchmarks/index.html, OAEI, 2016.


Appendix

Table Aa. The quality of alignment for the GSOOM algorithm on benchmark data sets

        GSOOM
ID      Precision  Recall  F-measure
101     1.00       1.00    1.00
103     1.00       1.00    1.00
104     1.00       1.00    1.00
201     0.55       0.42    0.48
202     0.15       0.11    0.13
203     1.00       1.00    1.00
204     1.00       1.00    1.00
205     0.98       0.85    0.91
206     0.90       0.87    0.88
207     0.91       0.81    0.86
208     0.99       0.99    0.99
209     0.40       0.36    0.38
210     0.60       0.34    0.43
221     1.00       1.00    1.00
222     1.00       1.00    1.00
223     1.00       1.00    1.00
224     1.00       1.00    1.00
225     1.00       1.00    1.00
228     1.00       1.00    1.00
230     1.00       1.00    1.00
231     1.00       1.00    1.00
232     1.00       1.00    1.00
233     1.00       1.00    1.00
236     1.00       1.00    1.00
237     1.00       1.00    1.00
238     1.00       1.00    1.00
239     0.97       1.00    0.98
240     0.97       1.00    0.99
241     1.00       1.00    1.00
246     0.97       1.00    0.98
247     0.93       0.85    0.89
248     0.05       0.04    0.05
249     0.17       0.12    0.14
250     0.15       0.15    0.15
251     0.14       0.11    0.12
252     0.12       0.09    0.11
253     0.12       0.08    0.10
254     0.06       0.06    0.06
257     0.15       0.15    0.15
258     0.19       0.14    0.16
259     0.10       0.08    0.09
260     0.15       0.17    0.16
261     0.09       0.09    0.09
262     0.09       0.09    0.09
265     0.15       0.17    0.16
266     0.09       0.09    0.09


Table Ab. The quality of alignment for the GSOOM algorithm on real-world data sets

        GSOOM
ID      Precision  Recall  F-measure
301     0.59       0.59    0.59
302     0.40       0.46    0.42
303     0.40       0.42    0.41
304     0.97       0.91    0.94

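The F-measure reported throughout these tables is the harmonic mean of precision and recall, so any F column can be cross-checked from its P and R neighbours. A minimal sketch of that check (the helper name `f_measure` is illustrative, not from the paper):

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F-measure column)."""
    if precision + recall == 0.0:
        return 0.0  # convention: F is 0 when both P and R are 0
    return 2.0 * precision * recall / (precision + recall)

# Cross-check: GSOOM on test case 304 (Table Ab) reports P = 0.97, R = 0.91;
# the harmonic mean rounds to the tabulated F-measure.
print(round(f_measure(0.97, 0.91), 2))  # 0.94
```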
Table B. The precision (P), recall (R) and F-measure (F) of the other methods

Test    AML              Lily             LogMap           LogMapLt         XMap
cases   P    R    F      P    R    F      P    R    F      P    R    F      P    R    F
101     1.00 0.00 0.00   1.00 1.00 1.00   0.94 0.96 0.95   0.56 0.99 0.71   0.94 1.00 0.97
201     1.00 0.00 0.00   1.00 0.99 0.99   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
202     1.00 0.00 0.00   0.98 0.81 0.89   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
221     1.00 0.34 0.51   1.00 1.00 1.00   0.94 0.95 0.94   0.56 0.99 0.72   0.94 1.00 0.97
222     1.00 0.33 0.50   1.00 1.00 1.00   0.00 0.00 0.00   0.57 0.99 0.72   0.94 0.67 0.78
223     1.00 0.34 0.51   1.00 1.00 1.00   0.94 0.95 0.94   0.56 0.99 0.72   0.94 1.00 0.97
224     1.00 0.34 0.51   1.00 1.00 1.00   0.94 0.95 0.94   0.83 0.99 0.90   0.94 1.00 0.97
225     1.00 0.34 0.51   1.00 1.00 1.00   0.94 0.96 0.95   0.56 0.99 0.72   0.94 1.00 0.97
228     1.00 1.00 1.00   1.00 1.00 1.00   0.85 1.00 0.92   0.32 1.00 0.48   1.00 1.00 1.00
232     1.00 0.34 0.51   1.00 1.00 1.00   0.94 0.95 0.94   0.83 0.99 0.90   0.94 1.00 0.97
233     1.00 1.00 1.00   1.00 1.00 1.00   0.85 1.00 0.92   0.32 1.00 0.48   1.00 1.00 1.00
236     1.00 1.00 1.00   1.00 1.00 1.00   0.85 1.00 0.92   0.67 1.00 0.80   1.00 1.00 1.00
237     1.00 0.33 0.50   1.00 1.00 1.00   0.00 0.00 0.00   0.84 0.99 0.91   0.94 0.67 0.78
238     1.00 0.34 0.51   1.00 1.00 1.00   0.94 0.96 0.95   0.83 0.99 0.90   0.94 1.00 0.97
239     1.00 1.00 1.00   1.00 1.00 1.00   0.85 1.00 0.92   0.32 1.00 0.48   1.00 1.00 1.00
240     1.00 1.00 1.00   1.00 1.00 1.00   0.85 1.00 0.92   0.32 1.00 0.48   1.00 1.00 1.00
241     1.00 1.00 1.00   1.00 1.00 1.00   0.85 1.00 0.92   0.67 1.00 0.80   1.00 1.00 1.00
246     1.00 1.00 1.00   1.00 1.00 1.00   0.85 1.00 0.92   0.67 1.00 0.80   1.00 1.00 1.00
247     1.00 1.00 1.00   1.00 1.00 1.00   0.85 1.00 0.92   0.67 1.00 0.80   1.00 1.00 1.00
248     1.00 0.00 0.00   0.98 0.81 0.89   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
249     1.00 0.00 0.00   0.86 0.64 0.73   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
250     1.00 0.00 0.00   1.00 0.39 0.57   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
251     1.00 0.00 0.00   0.96 0.78 0.86   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
252     1.00 0.00 0.00   1.00 0.84 0.91   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
253     1.00 0.00 0.00   0.87 0.70 0.78   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
254     1.00 0.00 0.00   1.00 0.42 0.60   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
257     1.00 0.00 0.00   1.00 0.12 0.22   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
258     1.00 0.00 0.00   0.71 0.54 0.62   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
259     1.00 0.00 0.00   0.86 0.62 0.72   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
260     1.00 0.00 0.00   1.00 0.41 0.58   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
261     1.00 0.00 0.00   1.00 0.15 0.26   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
262     1.00 0.00 0.00   0.75 0.09 0.16   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
265     1.00 0.00 0.00   1.00 0.03 0.06   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00
266     1.00 0.00 0.00   0.33 0.03 0.06   1.00 0.00 0.00   0.00 0.00 0.00   1.00 0.00 0.00


Table C. The precision (P), recall (R) and F-measure (F) of the other methods

Test    AROMA            CODI             TaxoMap          MapPSO
cases   P    R    F      P    R    F      P    R    F      P    R    F
101     1.00 0.97 0.98   1.00 1.00 1.00   1.00 0.34 0.51   1.00 1.00 1.00
103     1.00 0.98 0.99   1.00 1.00 1.00   1.00 0.34 0.51   1.00 1.00 1.00
104     1.00 0.98 0.99   1.00 0.98 0.99   1.00 0.34 0.51   1.00 1.00 1.00
201     1.00 0.90 0.95   0.88 0.07 0.13   1.00 0.34 0.51   0.47 0.38 0.42
202     0.00 0.00 0.00   0.00 0.00 0.00   1.00 0.01 0.02   0.06 0.05 0.05
203     1.00 0.66 0.80   1.00 0.76 0.86   0.97 0.33 0.49   1.00 1.00 1.00
204     1.00 0.95 0.97   0.79 0.69 0.74   1.00 0.34 0.51   0.98 0.98 0.98
205     1.00 0.90 0.95   0.68 0.18 0.28   1.00 0.34 0.51   0.82 0.66 0.73
206     1.00 0.91 0.95   0.81 0.26 0.39   1.00 0.34 0.51   0.90 0.80 0.85
207     1.00 0.91 0.95   0.87 0.28 0.42   1.00 0.34 0.51   0.87 0.75 0.81
208     0.95 0.42 0.58   0.75 0.51 0.61   0.88 0.29 0.44   0.85 0.74 0.79
209     0.73 0.25 0.37   0.68 0.13 0.22   0.53 0.08 0.14   0.18 0.15 0.16
210     0.91 0.10 0.18   0.82 0.14 0.24   0.41 0.09 0.15   0.35 0.29 0.32
221     1.00 0.98 0.99   1.00 0.96 0.98   1.00 0.34 0.51   1.00 1.00 1.00
222     1.00 0.98 0.99   1.00 1.00 1.00   0.88 0.31 0.46   1.00 1.00 1.00
223     0.95 0.92 0.93   1.00 1.00 1.00   0.88 0.30 0.45   0.98 0.98 0.98
224     1.00 0.95 0.97   1.00 1.00 1.00   1.00 0.34 0.51   1.00 1.00 1.00
225     1.00 0.98 0.99   1.00 0.99 0.99   1.00 0.34 0.51   1.00 1.00 1.00
228     1.00 1.00 1.00   1.00 1.00 1.00   1.00 1.00 1.00   1.00 1.00 1.00
230     0.93 0.93 0.93   1.00 0.97 0.98   0.83 0.35 0.49   0.97 1.00 0.98
231     1.00 0.97 0.98   1.00 1.00 1.00   1.00 0.34 0.51   1.00 1.00 1.00
232     1.00 0.94 0.97   1.00 0.94 0.97   1.00 0.34 0.51   1.00 1.00 1.00
233     1.00 1.00 1.00   1.00 0.88 0.94   1.00 1.00 1.00   1.00 1.00 1.00
236     1.00 1.00 1.00   1.00 1.00 1.00   1.00 1.00 1.00   1.00 1.00 1.00
237     1.00 0.94 0.97   1.00 0.99 0.99   0.88 0.31 0.46   0.99 1.00 0.99
238     0.94 0.91 0.92   0.99 0.99 0.99   0.88 0.30 0.45   0.97 0.97 0.97
239     0.97 1.00 0.98   0.97 1.00 0.98   0.88 1.00 0.94   0.97 1.00 0.98
240     0.82 0.85 0.83   0.94 0.97 0.95   0.88 0.88 0.88   0.91 0.94 0.92
241     1.00 0.97 0.98   1.00 0.88 0.94   1.00 1.00 1.00   1.00 1.00 1.00
246     0.97 0.97 0.97   0.97 1.00 0.98   0.88 1.00 0.94   0.97 1.00 0.98
247     0.79 0.82 0.80   0.97 1.00 0.98   0.88 0.88 0.88   0.88 0.91 0.89
248     0.00 0.00 0.00   0.00 0.00 0.00   1.00 0.01 0.02   0.06 0.05 0.05
249     1.00 0.01 0.02   1.00 0.01 0.02   1.00 0.01 0.02   0.06 0.05 0.05
250     0.00 0.00 0.00   0.00 0.00 0.00   0.00 0.00 0.00   0.04 0.03 0.03
251     0.00 0.00 0.00   0.00 0.00 0.00   1.00 0.01 0.02   0.07 0.05 0.06
252     0.00 0.00 0.00   0.00 0.00 0.00   1.00 0.01 0.02   0.02 0.02 0.02
253     1.00 0.01 0.02   1.00 0.01 0.02   1.00 0.01 0.02   0.08 0.06 0.07
254     0.00 0.00 0.00   0.00 0.00 0.00   0.00 0.00 0.00   0.04 0.03 0.03
257     0.00 0.00 0.00   0.00 0.00 0.00   0.00 0.00 0.00   0.08 0.06 0.07
258     1.00 0.01 0.02   1.00 0.01 0.02   1.00 0.01 0.02   0.08 0.06 0.07
259     1.00 0.01 0.02   1.00 0.01 0.02   1.00 0.01 0.02   0.05 0.04 0.04
260     0.00 0.00 0.00   0.00 0.00 0.00   0.00 0.00 0.00   0.14 0.10 0.12
261     0.00 0.00 0.00   0.00 0.00 0.00   0.00 0.00 0.00   0.04 0.03 0.03
262     0.00 0.00 0.00   0.00 0.00 0.00   0.00 0.00 0.00   0.05 0.03 0.04
265     0.00 0.00 0.00   0.00 0.00 0.00   0.00 0.00 0.00   0.05 0.03 0.04
266     0.00 0.00 0.00   0.00 0.00 0.00   0.00 0.00 0.00   0.07 0.06 0.06
301     0.86 0.64 0.73   0.93 0.24 0.38   0.69 0.31 0.43   0.68 0.61 0.64
302     0.73 0.23 0.35   1.00 0.42 0.59   0.76 0.27 0.40   0.50 0.02 0.04
303     0.69 0.52 0.59   0.92 0.50 0.65   0.48 0.29 0.36   0.00 0.00 0.00
304     0.91 0.78 0.84   0.94 0.61 0.74   0.90 0.37 0.52   0.76 0.68 0.72
Average 0.589            0.54             0.39             0.58

Author contributions
Zhaoming Lv and Rong Peng conceived of the presented idea. Zhaoming Lv carried out the experiment and wrote the original manuscript. Zhaoming Lv revised the manuscript with help from Rong Peng, who directed the work. All authors discussed the results and contributed to the final manuscript.


Conflict of interest statement
We declare that we do not have any commercial or associative interest that represents a conflict of
interest in connection with the work submitted.