Abstract
Psycholinguistic properties of words have been used in various approaches to Natural Language Processing tasks, such as text simplification and readability assessment. Most of these properties are subjective, and gathering them requires costly, time-consuming surveys. Recent approaches take the limited datasets of psycholinguistic properties and extend them automatically to large lexicons. However, some of the resources used by such approaches are not available for most languages. This study presents a method to infer psycholinguistic properties for Brazilian Portuguese (BP) using regressors built with a light set of features usually available for less-resourced languages: word length, frequency lists, lexical databases composed of school dictionaries, and word embedding models. The correlations between the inferred properties are close to those reported in related work. The resulting resource contains 26,874 words in BP annotated with concreteness, age of acquisition, imageability, and subjective frequency.
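To make the idea of a regressor over lightweight features concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plain ridge regression as a stand-in for whatever regressors the paper uses, and it leaves out the lexical database features. The function names (`features`, `fit_ridge`, `predict`), the frequencies, the 2-dimensional embeddings, and the ratings are all invented for illustration.

```python
import math

def features(word, freq, embedding):
    """Feature vector: bias term, word length, log frequency, embedding dims."""
    return [1.0, float(len(word)), math.log(freq + 1.0)] + list(embedding)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ridge(rows, targets, alpha=1.0):
    """Ridge regression: solve (X^T X + alpha I) w = X^T y.

    For simplicity the bias weight is penalized as well.
    """
    d = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) + (alpha if i == j else 0.0)
            for j in range(d)] for i in range(d)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(d)]
    return solve(xtx, xty)

def predict(weights, row):
    """Linear prediction: dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, row))

# Invented toy data, for illustration only:
# (word, corpus frequency, 2-d embedding, concreteness rating).
toy = [
    ("casa", 1200, [0.9, 0.1], 6.5),
    ("cachorro", 300, [0.8, 0.2], 6.8),
    ("liberdade", 450, [0.1, 0.9], 2.1),
    ("ideia", 900, [0.2, 0.8], 2.4),
]
rows = [features(w, f, e) for w, f, e, _ in toy]
targets = [y for _, _, _, y in toy]
weights = fit_ridge(rows, targets, alpha=0.1)

# Estimate a rating for a word that has no human annotation.
estimate = predict(weights, features("mesa", 500, [0.85, 0.15]))
print(round(estimate, 2))
```

In practice the small seed set of human ratings would play the role of `toy`, and the fitted weights would be applied to every word in the larger lexicon that has a frequency count and an embedding.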
Related Publications
dos Santos, L. B., Duran, M. S., Hartmann, N. S., Candido Junior, A., Paetzold, G. H., Aluísio, S. M. (2017). A Lightweight Regression Method to Infer Psycholinguistic Properties for Brazilian Portuguese. In Text, Speech, and Dialogue: 20th International Conference, TSD 2017, Prague, Czech Republic, August 27-31, 2017. Springer.
Contact
Leandro Borges dos Santos, ICMC-NILC, University of São Paulo;
Sandra Aluisio, ICMC-NILC, University of São Paulo, Brazil.
Download