VerbNet.Br

VerbNet.Br database available for download [ZIP]

VerbNet.Br search tool: VerbNet.Br 1.0

Gold Standard [TXT]

Disclaimer: VerbNet.Br is licensed under a Creative Commons Attribution 4.0 International License. This means you can distribute, remix, tweak, and build upon VerbNet.Br, even commercially, as long as you give us credit for the original creation. VerbNet.Br license [PDF].

VerbNet.Br was built as part of Carolina Evaristo Scarton’s master’s degree, under the supervision of Sandra Maria Aluísio. The project is being developed at the Center of Computational Linguistics (NILC) at the Universidade de São Paulo (USP). The research is funded by FAPESP (São Paulo Research Foundation) under process number 2010/03785-0.

VerbNet.Br: semi-automatic construction of a domain independent verbal lexicon of Brazilian Portuguese

Abstract: The development of basic linguistic resources, such as lexicons, is a priority for Natural Language Processing (NLP) because they are important to many tasks: describing actions for a simulated environment (Allbeck et al., 2002), building semantic parsers (Shi and Mihalcea, 2005), improving word sense disambiguation (Girju et al., 2005), and others. However, most existing lexical resources are specific to English. VerbNet (Kipper, 2005) is one of the lexical resources developed for English. It is a domain-independent lexicon that provides semantic and syntactic information about English verbs. VerbNet is based on Levin’s verb classes (Levin, 1993) and has mappings to Princeton WordNet (WordNet.Pr) (Fellbaum, 1998). There are few computational studies based on Levin classes for languages other than English, and Portuguese is no exception: there are only some linguistic studies (Cançado, 1996; Ávila, 2006; Ciríaco, 2007; Moraes, 2008; Godoy, 2009; Amaral, 2010), and they are not available in a computational format. To fill this gap, this research aims to create VerbNet.Br, a lexical resource for Brazilian Portuguese with the same characteristics as VerbNet. Building this kind of resource manually is very expensive and time-consuming. For this reason, there is increasing interest in building it with computational techniques. One such technique is machine learning on a training corpus (Merlo et al., 2002; Joanis and Stevenson, 2003; Ferrer, 2004; Kipper et al., 2006; Schulte im Walde, 2006; Sun et al., 2008; Sun and Korhonen, 2009; Sun et al., 2010; Sun and Korhonen, 2011). Another is reusing resources developed for another language (English) to build a new aligned resource, taking advantage of the cross-linguistic potential of the Levin classes (Jackendoff, 1990; Merlo et al., 2002; Sun et al., 2010). The latter technique has been adopted in this research.
We are using the mappings between VerbNet and WordNet.Pr and the alignments between WordNet.Pr and the Brazilian WordNet (WordNet.Br) (Dias-da-Silva et al., 2002; Dias-da-Silva, 2005; Dias-da-Silva et al., 2008; Dias-da-Silva, 2010). The method used to build VerbNet.Br is not language-specific; that is, it may be employed to build similar resources in languages other than Brazilian Portuguese. The method comprises four steps, three automatic and one manual. The first step (manual) consists of translating the diathesis alternations from English into Portuguese. The second step (automatic) searched for the diathesis alternations of Brazilian Portuguese verbs in Brazilian Portuguese corpora (Lácio-Ref (Aluísio et al., 2004), PLN-BR-FULL (Bruckschen et al., 2008), and Revista FAPESP (Aziz and Specia, 2011)), using the tool developed by Zanette (2010). The third step (automatic) defined the candidate members of the VerbNet.Br classes by using existing resources (VerbNet, WordNet.Pr, and WordNet.Br). Finally, the fourth step (automatic) will combine the results of the three previous steps in order to select the verbs for VerbNet.Br. The results will be evaluated both intrinsically and extrinsically. Intrinsic evaluation includes quantitative and qualitative measures. The qualitative evaluation consists of (a) manually analyzing some classes of VerbNet and translating them into Portuguese to build a gold standard; and (b) comparing the gold standard to the results of the clustering method proposed in this research. The quantitative evaluation will consider the success rate of VerbNet.Br class members. As for extrinsic evaluation, we will use VerbNet.Br to develop new metrics for the Coh-Metrix-Port tool (Scarton et al., 2009; Scarton and Aluisio, 2010; Scarton et al., 2010).
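As an illustration only, the third step (collecting candidate class members through the VerbNet, WordNet.Pr, and WordNet.Br links) and the comparison against a gold standard can be sketched in Python. This is a minimal sketch under stated assumptions, not the project's implementation: the dictionaries are toy data, every synset identifier is invented, and the function names `candidate_members` and `precision_recall` are hypothetical. Using precision and recall to stand in for the "success rate" mentioned above is also an assumption.

```python
# Toy stand-ins for the three resources. All identifiers are invented for
# illustration and do not reflect the real VerbNet, WordNet.Pr or WordNet.Br data.
VERBNET_TO_WORDNET_PR = {
    "put-9.1": {"pr-synset-put", "pr-synset-place"},
    "run-51.3.2": {"pr-synset-run"},
}
WORDNET_PR_TO_BR = {
    "pr-synset-put": "br-synset-001",
    "pr-synset-place": "br-synset-001",
    "pr-synset-run": "br-synset-002",
}
WORDNET_BR_SYNSETS = {
    "br-synset-001": {"colocar", "pôr"},
    "br-synset-002": {"correr"},
}


def candidate_members(vn_to_pr, pr_to_br, br_synsets):
    """Follow class -> English synset -> Portuguese synset -> verbs."""
    candidates = {}
    for vn_class, pr_ids in vn_to_pr.items():
        verbs = set()
        for pr_id in pr_ids:
            br_id = pr_to_br.get(pr_id)
            if br_id is None:
                continue  # English synset with no Portuguese alignment
            verbs |= br_synsets.get(br_id, set())
        candidates[vn_class] = verbs
    return candidates


def precision_recall(predicted, gold):
    """Compare predicted class members with manually built gold members."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


candidates = candidate_members(
    VERBNET_TO_WORDNET_PR, WORDNET_PR_TO_BR, WORDNET_BR_SYNSETS
)
# Hypothetical gold-standard members for one class.
gold_put = {"colocar", "botar"}
print(sorted(candidates["put-9.1"]))
print(precision_recall(candidates["put-9.1"], gold_put))
```

In the method described above, such candidates would still be filtered in the fourth step using the diathesis alternations found in the corpora; that filtering is not shown here.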

 

References

Allbeck, J., Kipper, K., Adams, C., Schuler, W., Zoubanova, E., Badler, N., Palmer, M. and Joshi, A. (2002): ACUMEN: Amplifying Control and Understanding of Multiple ENtities. In Proceedings of the First International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2002). Bologna, Italy, pp. 191-198.

Aluísio, S. M., Pinheiro, G. M., Manfrim, A. M. P., Genovês Jr., L. H. M. and Tagnin, S. E. O. (2004): The Lácio-web: Corpora and Tools to Advance Brazilian Portuguese Language Investigations and Computational Linguistic Tools. In Proceedings of the 4th International Conference on Language Resources and Evaluation, pages 1779–1782, Lisbon, Portugal.

Amaral, L. L. (2010): Os Verbos de Modo de Movimento no Português Brasileiro. 53f. Trabalho de Conclusão de Curso (Bacharel em Letras) – Faculdade de Letras, Universidade Federal de Minas Gerais, Belo Horizonte.

Ávila, M. C. (2006): Propriedades semânticas e alternâncias sintáticas do verbo: um exercício exploratório de delimitação do significado. 114f. Dissertação (Mestrado em Letras) – Faculdade de Ciências e Letras, Universidade Estadual Paulista, Araraquara.

Aziz, W. and Specia, L. (2011): Fully Automatic Compilation of Portuguese-English and Portuguese-Spanish Parallel Corpora. 8th Brazilian Symposium in Information and Human Language Technology (STIL-2011), Cuiabá, Brazil.

Bruckschen, M., Muniz, F., Souza, J. G. C., Fuchs, J. T., Infante, K., Muniz, M., Gonçalves, P. N., Vieira, R. and Aluísio, S. M. (2008): Anotação Lingüística em XML do Corpus PLN-BR. Série de Relatórios do NILC. NILC-TR-09-08, 39 p.

Cançado, M. (1996): Verbos Psicológicos: Análise Descritiva dos Dados do Português Brasileiro. Revista de Estudos da Linguagem, v. 4, n. 1, pp. 89-114.

Chagas de Souza, P. (2000): A Alternância Causativa no Português do Brasil: Defaults num Léxico Gerativo. 199f. Tese (Doutorado em Linguística) – Faculdade de Filosofia, Letras e Ciências Humanas, Universidade de São Paulo, São Paulo.

Ciríaco, L. S. (2007): A alternância causativo/ergativa no PB: restrições e propriedades semânticas. 114f. Dissertação (Mestrado em Linguística) – Faculdade de Letras, Universidade Federal de Minas Gerais, Belo Horizonte.

Dias-da-Silva, B. C., Oliveira, M. F. d. and Moraes, H. R. d. (2002): Groundwork for the development of the Brazilian Portuguese WordNet. In Proceedings of the Third International Conference on Advances in Natural Language Processing. London, UK, pp. 189–196.

Dias-da-Silva, B. C. (2005): A construção da base da WordNet.Br: conquistas e desafios. In Proceedings of the Third Workshop in Information and Human Language Technology (TIL 2005), in conjunction with XXV Congresso da Sociedade Brasileira de Computação. São Leopoldo, RS, Brazil, pp. 2238–2247.

Dias-da-Silva, B. C., Di Felippo, A. and Nunes, M. G. V. (2008): The automatic mapping of Princeton WordNet lexical-conceptual relations onto the Brazilian Portuguese WordNet database. In Proceedings of the 6th International Conference on Language Resources and Evaluation (LREC 2008), Marrakech, Morocco, pp. 1535-1541.

Dias-da-Silva, B. C. (2010): Brazilian Portuguese WordNet: A Computational Linguistic Exercise of Encoding Bilingual Relational Lexicons. International Journal of Computational Linguistics and Applications, New Delhi, v. 1, n. 1-2, pp. 137-150.

Fellbaum, C. (1998): WordNet: An electronic lexical database. MIT Press. Cambridge, Massachusetts.

Ferrer, E. E. (2004): Towards a semantic classification of Spanish verbs based on subcategorisation information. In Proceedings of the Workshop on Student research (ACLstudent 2004), in conjunction with ACL 2004. Barcelona, Spain.

Girju, R., Roth, D. and Sammons, M. (2005): Token-level disambiguation of VerbNet classes. In Proceedings of the Interdisciplinary Workshop on the Identification and Representation of Verb Features and Verb Classes. Saarbruecken, Germany.

Godoy, L. (2009): Verbos Psicológicos: Análise Descritiva dos Dados do Português Brasileiro. ALFA – Revista de Linguística, v. 53, n. 1, pp. 283-299.

Jackendoff, R. (1990): Semantic Structures. MIT Press. Cambridge, Massachusetts.

Joanis, E. and Stevenson, S. (2003): A general feature space for automatic verb classification. In Proceedings of the 10th conference on European chapter of the Association for Computational Linguistics (EACL 2003). Budapest, Hungary, pp. 163-170.

Kipper, K. (2005): Verbnet: A broad coverage, comprehensive verb lexicon. 146f. Ph.D. Thesis (Philosophy) - University of Pennsylvania, USA.

Kipper, K., Korhonen, A., Ryant, N. and Palmer, M. (2006): Extending VerbNet with Novel Verb Classes. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC 2006). Genoa, Italy.

Levin, B. (1993): English Verb Classes and Alternations: A Preliminary Investigation. The University of Chicago Press.

Merlo, P., Stevenson, S., Tsang, V. and Allaria, G. (2002): A multilingual paradigm for automatic verb classification. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL 2002). Philadelphia, PA, USA, pp. 207-214.

Moraes, H. R. (2008): Aspectos sintaticamente relevantes do significado lexical: estudo dos verbos de movimento. 171f. Tese (Doutorado em Linguística e Língua Portuguesa) – Faculdade de Ciências e Letras, Universidade Estadual Paulista, Araraquara.

Scarton, C. E., Almeida, D. M. and Aluisio, S. M. (2009): Análise da Inteligibilidade de textos via ferramentas de Processamento de Língua Natural: adaptando as métricas do Coh-Metrix para o Português. In Proceedings of the 7th Brazilian Symposium in Information and Human Language Technology (STIL 2009). São Carlos, SP, Brazil, 1 CD-ROM ISSN 2175-6201.

Scarton, C. E. and Aluísio, S. M. (2010): Análise da Inteligibilidade de textos via ferramentas de Processamento de Língua Natural: adaptando as métricas do Coh-Metrix para o Português. Revista Linguamática (Revista para o Processamento Automático das Línguas Ibéricas - ISSN: 1647-0818), v. 2, n. 1, pp. 45-61.

Scarton, C. E., Gasperin, C. and Aluisio, S. M. (2010): Revisiting the Readability Assessment of Texts in Portuguese. In Proceedings of the 12th Ibero-American Conference on Artificial Intelligence (Iberamia 2010). Bahia Blanca, Argentina, pp. 306-315.

Schulte im Walde, S. (2006): Experiments on the Automatic Induction of German Semantic Verb Classes. Computational Linguistics, v. 32, n. 2, pp. 159-194.

Shi, L. and Mihalcea, R. (2005): Putting pieces together: Combining FrameNet, VerbNet and WordNet for robust semantic parsing. In Proceedings of the 6th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing 2005). Mexico City, Mexico, pp. 99-110.

Sun, L., Korhonen, A. and Krymolowski, Y. (2008): Verb class discovery from rich syntactic data. In Proceedings of the 9th international conference on Computational linguistics and intelligent text processing (CICLing 2008). Haifa, Israel, pp. 16-27.

Sun, L. and Korhonen, A. (2009): Improving verb clustering with automatically acquired selectional preferences. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing (EMNLP 2009). Singapore, pp. 638-647.

Sun, L., Korhonen, A., Poibeau, T. and Messiant, C. (2010): Investigating the cross-linguistic potential of VerbNet-style classification. In Proceedings of the 23rd International Conference on Computational Linguistics (COLING 2010). Beijing, China, pp. 1056-1064.

Sun, L. and Korhonen, A. (2011): Hierarchical Verb Clustering Using Graph Factorization. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP). Edinburgh, UK.

Zanette, A. (2010): Aquisição de Subcategorization Frames para Verbos da Língua Portuguesa. Projeto de diplomação, Universidade Federal do Rio Grande do Sul.

Publications:

SCARTON, C., DURAN, M. S. and ALUISIO, S. M. (to appear): Using Cross-linguistic Knowledge to Build VerbNet-style Lexicons: Results for a (Brazilian) Portuguese VerbNet. Accepted to appear in the proceedings of the International Conference on Computational Processing of Portuguese (PROPOR 2014), October 6-9, São Carlos-SP, Brazil.

ZILIO, L., ZANETTE, A. and SCARTON, C. (2014): Automatic Extraction of Subcategorization Frames from Portuguese Corpora. In Aluisio, S. M. and Tagnin, S. E. O. (eds.) New Language Technologies and Linguistic Research: a Two-Way Road. Cambridge Scholars Publishing, pp. 78-96.

SCARTON, C., SUN, L., KIPPER-SCHULER, K., DURAN, M. S., PALMER, M. and KORHONEN, A. (2014): Verb Clustering for Brazilian Portuguese. In Proceedings of the 15th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2014), Kathmandu, Nepal, pp. 25-39.

SCARTON, C. E. (2013): VerbNet.Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil. Master’s dissertation, Instituto de Ciências Matemáticas e de Computação, University of São Paulo, São Carlos. Accessed on 2014-09-29, at http://www.teses.usp.br/teses/disponiveis/55/55134/tde-19042013-160640/

SCARTON, C. and ALUISIO, S. Towards a cross-linguistic VerbNet-style lexicon to Brazilian Portuguese. In: LREC 2012 Workshop on Creating Cross-language Resources for Disconnected Languages and Styles, 2012, Istanbul, Turkey.

ZANETTE, A., SCARTON, C. and ZILIO, L. (2012): Automatic extraction of subcategorization frames from corpora: an approach to Portuguese. In Demonstration Session of the International Conference on Computational Processing of Portuguese Language (PROPOR 2012). Coimbra, Portugal.

SCARTON, C. E. VerbNet.Br: construção semiautomática de um léxico computacional de verbos para o português do Brasil. In: The 8th Brazilian Symposium in Information and Human Language Technology, 2011, Cuiabá-MT. The 8th Brazilian Symposium in Information and Human Language Technology, 2011. v. 1.

SCARTON, C. E. VerbNet-Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil. In: I Congresso Internacional de Estudos do Léxico, 2011, Salvador - BA. Comunicação Coordenada: O verbo no Computador nos anais do I Congresso Internacional de Estudos do Léxico, 2011.

SCARTON, C. E. and ALUISIO, S. M. VerbNet.Br: construção semiautomática de um léxico verbal online e independente de domínio para o português do Brasil. In: X Encontro de Linguística de Corpus, 2011, Belo Horizonte - MG. X Encontro de Linguística de Corpus, 2011.
