Abstract: We describe a preliminary project to extract information from an existing dictionary of historical biographies, the "Dicionário Histórico-Biográfico Brasileiro" (Brazilian Historical and Biographical Dictionary, abbreviated DHBB), a longstanding project at the "Centro de Pesquisa e Documentação de História Contemporânea do Brasil" (CPDOC) of the "Fundação Getulio Vargas" (FGV). For information extraction, we rely on Natural Language Processing tools such as FreeLing, as well as our own resources: NomLex-PT, a lexicon of nominalizations, and OpenWN-PT, a Portuguese version of Princeton's WordNet database. While the project currently highlights the potential of information extraction in a fun, exploratory manner, we also discuss how to engage historians interested in the affordances of digital tools.
Higuchi, Suemi; Freitas, Maria Cláudia (Advisor). Automatic information extraction: a distant reading of the Brazilian Historical-Biographical Dictionary (DHBB). Rio de Janeiro, 2021. 176 p.
We discuss several challenges of evaluating information extraction patterns, using the DHBB corpus, a public resource for the Dicionário Histórico-Biográfico Brasileiro, to stress both the limitations and the advantages of using a corpus-based approach for the task of identifying political families in Brazilian society.
The humanities, especially literature and history, have always vested much of their reason for being and their way of working in textual records. This article seeks to reflect on and broaden the horizon of the relationship between the humanities and the use of available technologies, focusing mainly on distant reading methods for literary studies, grounded in corpus linguistics. How has research practice in these fields been affected by the use of digital tools? What challenges must be faced, and what opportunities open up in this potentially innovative scenario? These are some of the questions discussed in the text.