
VerbNet.Br database available for download [ZIP]

VerbNet.Br search tool: VerbNet.Br 1.0

Gold Standard [TXT]

Disclaimer: VerbNet.Br is licensed under a Creative Commons Attribution 4.0 International License. This means you can distribute, remix, tweak, and build upon VerbNet.Br, even commercially, as long as you credit us for the original creation. VerbNet.Br license [PDF].

VerbNet.Br was built as part of Carolina Evaristo Scarton’s master’s research, supervised by Sandra Maria Aluísio. The project is being developed at the Center of Computational Linguistics (NILC) at the Universidade de São Paulo (USP). The research is funded by FAPESP (São Paulo Research Foundation) under process number 2010/03785-0.

VerbNet.Br: semi-automatic construction of a domain independent verbal lexicon of Brazilian Portuguese

Abstract: The development of basic linguistic resources, such as lexicons, is a priority for Natural Language Processing (NLP), because these resources support many tasks: describing actions for a simulated environment (Allbeck et al., 2002), building semantic parsers (Shi and Mihalcea, 2005), improving word sense disambiguation (Girju et al., 2005), and others. However, most existing lexical resources are specific to English. VerbNet (Kipper, 2005) is one of them: a domain-independent lexicon that provides semantic and syntactic information about English verbs. VerbNet is based on Levin’s verb classes (Levin, 1993) and has mappings to Princeton WordNet (WordNet.Pr) (Fellbaum, 1998). Few computational studies have been based on Levin’s classes for languages other than English, and Portuguese is no exception: there are only some linguistic studies (Cançado, 1996; Ávila, 2006; Ciríaco, 2007; Moraes, 2008; Godoy, 2009; Amaral, 2010), and they are not available in a computational format. To fill this gap, this research aims to create VerbNet.Br, a lexical resource for Brazilian Portuguese with the same characteristics as VerbNet. Building such a resource manually is expensive and time-consuming, so there is increasing interest in building it through computational techniques. One such technique is machine learning on a training corpus (Merlo et al., 2002; Joanis and Stevenson, 2003; Ferrer, 2004; Kipper et al., 2006; Schulte im Walde, 2006; Sun et al., 2008; Sun and Korhonen, 2009; Sun et al., 2010; Sun and Korhonen, 2011); another is reusing resources developed for another language (English) to build a new aligned resource, taking advantage of the cross-linguistic potential of Levin’s classes (Jackendoff, 1980; Merlo et al., 2002; Sun et al., 2010). This research adopts the latter technique.
We are using the mappings between VerbNet and WordNet.Pr and the alignments between WordNet.Pr and the Brazilian WordNet (WordNet.Br) (Dias-da-Silva et al., 2002; Dias-da-Silva, 2005; Dias-da-Silva et al., 2008; Dias-da-Silva, 2010). The method used to build VerbNet.Br is not language-specific; that is, it may be employed to build similar resources for languages other than Brazilian Portuguese. The method comprises four steps, one manual and three automatic. The first step (manual) consists of translating the diathesis alternations from English into Portuguese. The second step (automatic) searched Brazilian Portuguese corpora (Lácio-Ref (Aluísio et al., 2004), PLN-BR-FULL (Bruckschen et al., 2008) and Revista FAPESP (Aziz and Specia, 2011)) for the diathesis alternations of Brazilian Portuguese verbs, using the tool developed by Zanette (2010). The third step (automatic) defined the candidate members of the VerbNet.Br classes using existing resources (VerbNet, WordNet.Pr and WordNet.Br). Finally, the fourth step (automatic) will combine the three previous steps to select the verbs for VerbNet.Br. The results will be evaluated both intrinsically and extrinsically. Intrinsic evaluation includes quantitative and qualitative measures. The qualitative evaluation consists of (a) manually analyzing some VerbNet classes and translating them into Portuguese to build a gold standard, and (b) comparing the gold standard with the results of the clustering method proposed in this research. The quantitative evaluation will consider the success rate of VerbNet.Br class members. For extrinsic evaluation, we will use VerbNet.Br to develop new metrics for the Coh-Metrix-Port tool (Scarton et al., 2009; Scarton and Aluísio, 2010; Scarton et al., 2010).
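The third and fourth steps can be pictured as a chain of lookups followed by a filter. The Python sketch below is illustrative only, under stated assumptions: the class name, synset IDs, verbs, alternation labels and corpus observations are invented toy data; the plain dictionaries stand in for the real VerbNet/WordNet.Pr mappings and WordNet.Pr/WordNet.Br alignments; the abstract does not say how the fourth step combines the evidence, so the threshold rule in the hypothetical `select_members` function is an assumption; and reading "success rate" as precision and recall against the gold standard is an interpretation, not the project's stated metric.

```python
# Illustrative sketch only: all data below is invented toy data, not VerbNet.Br content.

# Hypothetical: VerbNet class -> English member -> mapped WordNet.Pr synset IDs.
VERBNET_TO_WNPR = {
    "break-45.1": {"break": ["pr:0001"], "shatter": ["pr:0002"]},
}
# Hypothetical: WordNet.Pr synset ID -> aligned WordNet.Br synset ID.
WNPR_TO_WNBR = {"pr:0001": "br:0001", "pr:0002": "br:0002"}
# Hypothetical: WordNet.Br synset ID -> Brazilian Portuguese verbs in the synset.
WNBR_SYNSETS = {
    "br:0001": ["quebrar", "partir"],
    "br:0002": ["estilhaçar", "despedaçar"],
}
# Hypothetical: alternations translated for the class (step 1).
CLASS_ALTERNATIONS = {
    "break-45.1": {"causative-inchoative", "middle", "instrument-subject"},
}
# Hypothetical: alternations attested per verb in the corpora (step 2).
OBSERVED_ALTERNATIONS = {
    "quebrar": {"causative-inchoative", "middle", "instrument-subject"},
    "partir": {"causative-inchoative"},
    "estilhaçar": {"causative-inchoative", "middle"},
    "despedaçar": set(),
}


def candidate_members(vn_class):
    """Step 3 sketch: follow VerbNet -> WordNet.Pr -> WordNet.Br links."""
    candidates = set()
    for pr_ids in VERBNET_TO_WNPR.get(vn_class, {}).values():
        for pr_id in pr_ids:
            br_id = WNPR_TO_WNBR.get(pr_id)
            if br_id is not None:  # skip synsets with no WordNet.Br alignment
                candidates.update(WNBR_SYNSETS.get(br_id, []))
    return candidates


def select_members(vn_class, threshold=0.5):
    """Step 4 sketch (assumed rule): keep candidates attested with at least
    `threshold` of the class's translated alternations."""
    expected = CLASS_ALTERNATIONS.get(vn_class, set())
    if not expected:
        return set()
    selected = set()
    for verb in candidate_members(vn_class):
        attested = OBSERVED_ALTERNATIONS.get(verb, set()) & expected
        if len(attested) / len(expected) >= threshold:
            selected.add(verb)
    return selected


def precision_recall(predicted, gold):
    """Compare selected members with a gold-standard member set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    selected = select_members("break-45.1")
    print(sorted(selected))  # ['estilhaçar', 'quebrar']
    gold = {"quebrar", "partir", "estilhaçar"}  # hypothetical gold standard
    p, r = precision_recall(selected, gold)
    print(f"precision={p:.2f} recall={r:.2f}")  # precision=1.00 recall=0.67
```

With this toy data, `partir` is dropped because only one of the three assumed alternations is attested for it, which lowers recall while precision stays at 1.0. Lowering `threshold` to 0.3 would admit `partir` as well, and in this toy example the selection would then match the gold standard exactly.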



