DART 2012 Abstracts


Full Papers
Paper Nr: 2
Title:

A New Structure-based Similarity Measure for Automatic Ontology Matching

Authors:

Thi Thuy Anh Nguyen and Stefan Conrad

Abstract: Various ontology matching solutions have been proposed so far. In this paper, we present a method that matches two ontologies using a basic lexical similarity measure (edit distance) to obtain initial mappings and a new structure-based similarity measure to find correspondences among the concepts of the given ontologies. The structural measure determines the similarity value of two concepts from the lexical similarity of their labels and the similarities of their descendants, whose levels in the two graphs may differ. In addition, the efficiency of the structure-based matching method is estimated by using a set of centroid concepts. We evaluate the proposed method on the I3CON 2004 benchmark. The results show that our method has some prominent features for ontology matching.
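
The lexical step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a Levenshtein edit distance normalized by the longer label, and a hypothetical threshold of 0.8 for accepting an initial mapping. The function names, the normalization and the threshold are illustrative assumptions, not details taken from the paper, and the structure-based measure itself is not reproduced here.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lexical_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; normalizing by the longer label is an assumption."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def initial_mappings(labels_a, labels_b, threshold=0.8):
    """Pair up labels whose similarity reaches a (hypothetical) threshold."""
    pairs = []
    for la in labels_a:
        for lb in labels_b:
            score = lexical_similarity(la, lb)
            if score >= threshold:
                pairs.append((la, lb, score))
    return pairs


if __name__ == "__main__":
    print(initial_mappings(["Author", "Paper"], ["Authors", "Article"]))
```

In this sketch only the pair "Author" / "Authors" passes the threshold; such pairs would serve as the seed mappings that a structural measure could then refine.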

Paper Nr: 3
Title:

Visitors and Contributors in Wikipedia

Authors:

Antonio J. Reinoso, Juan Ortega-Valiente, Rocio Muñoz-Mansilla and Gabriel Pastor

Abstract: Wikipedia continues to provide the community with a vast collection of articles that cover almost all the different areas of knowledge. The on-line encyclopedia is built upon altruistic contributions from individuals and organizations, which constitutes an entirely new approach to knowledge compilation and distribution. The progression of Wikipedia to a mass phenomenon has promoted several initiatives devoted to dealing with different issues concerning it, especially the quality of its contents and the authoring of its contributions. However, very little attention has been paid to aspects related to users’ attitudes and behavior when browsing the information it offers. For this reason, this paper aims to find patterns that can describe how users interact and behave when visiting Wikipedia’s pages. We will consider the two most common forms of interaction between Wikipedia and its users: visits and contributions (edits). From these observations we will obtain different metrics, such as the degree of participation or the reluctance exhibited by users, which, in addition, will be used to perform a comparison among the different Wikipedia editions. Our study is based on a sample of the requests that users submit to Wikipedia, which we receive in the form of log lines. The results can help to better understand the nature of the relationship between Wikipedia and its users, and to properly characterize the different interactions between them.

Paper Nr: 4
Title:

Geometric Encoding of Sentences based on Clifford Algebra

Authors:

Agnese Augello, Manuel Gentile, Giovanni Pilato and Giorgio Vassallo

Abstract: Natural language sentences can be represented as vectors in a high-dimensional vector space. Generally, these models are based on bag-of-words approaches, and therefore they do not fully capture the semantics of sentences, which depends both on the semantics of the words and on their order in the phrase. In this work we propose a sub-symbolic methodology to encode natural language sentences considering both of these aspects. The proposed approach exploits the properties of Geometric Algebra rotation operators, called rotors, to code sentences through the rotation of an orthogonal basis of a semantic space. The methodology is based on three main steps: the construction of a semantic space, the association of ad-hoc rotors to sentence bigrams, and finally the coding of the sentence through the application of the obtained rotors to a standard basis in the semantic space.
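
As a rough illustration of why rotations can capture word order, the sketch below applies a sequence of plane rotations to a standard basis. A rotor acts on a vector as a rotation in some plane, so each rotor is represented here only by that action: a pair of axes and an angle. How bigrams are mapped to planes and angles is not stated in the abstract; the triples used below are purely hypothetical stand-ins, and the function names are illustrative.

```python
import math


def rotate_in_plane(vec, i, j, angle):
    """Rotate vec by `angle` in the plane spanned by coordinate axes i and j."""
    out = list(vec)
    c, s = math.cos(angle), math.sin(angle)
    out[i] = c * vec[i] - s * vec[j]
    out[j] = s * vec[i] + c * vec[j]
    return out


def encode(rotations, dim):
    """Apply rotations, in order, to every vector of the standard basis.

    `rotations` is a list of (i, j, angle) triples; in this sketch each
    triple stands in for the rotor of one bigram (a hypothetical mapping).
    """
    basis = [[1.0 if r == c else 0.0 for c in range(dim)] for r in range(dim)]
    for i, j, angle in rotations:
        basis = [rotate_in_plane(v, i, j, angle) for v in basis]
    return basis


if __name__ == "__main__":
    quarter = math.pi / 2
    forward = encode([(0, 1, quarter), (1, 2, quarter)], dim=3)
    backward = encode([(1, 2, quarter), (0, 1, quarter)], dim=3)
    # The two orders send the first basis vector to different axes.
    print([round(x, 6) for x in forward[0]])
    print([round(x, 6) for x in backward[0]])
```

Because rotations in overlapping planes do not commute, the same rotations applied in a different order produce a different rotated basis, which is the property a bag-of-words model lacks.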

Paper Nr: 7
Title:

Predicate Argument Structures for Information Extraction from Dependency Representations - Null Elements are Missing

Authors:

Rodolfo Delmonte

Abstract: State-of-the-art parsers are currently trained on versions of the Penn Treebank converted into dependency representations, which, however, do not include null elements. This is done to facilitate structural learning and to prevent the probabilistic engine from postulating the existence of deprecated null elements everywhere (see R. Gaizauskas, 1995). As a result, however, the semantics of the representation used and produced at runtime is inconsistent, which dramatically reduces its usefulness in real-life applications such as Information Extraction, Q/A and other semantically driven fields, because it hampers the mapping of a complete logical form. What systems have come up with instead are “Quasi”-logical forms, or partial logical forms, mapped directly from the surface representation in dependency structure. We show the most common problems deriving from the conversion and then describe an algorithm, implemented for our converted Italian Treebank, that can be applied to any CONLL-style treebank or representation to produce an “almost complete”, semantically consistent dependency treebank.