Text Analysis

Centro de Finanças e Macroeconomia

Brazilian Politics – Discourse Structure

Last update: 17/8/2022

Author: Guilherme Souza

In this study, we use transcripts from the Brazilian Senate to construct topics of discussion. We analyze the text using Seeded Latent Dirichlet Allocation (Seeded LDA).

LDA is a statistical model that uncovers unobserved groups (topics) in a collection of texts, where each topic is characterized by the words that tend to occur together. In the seeded version, we pre-define the topics by choosing three seed words for each one, which allows us to perform theory-driven analysis. The result reflects the attention politicians give to each topic over time. This work is inspired by http://structureofnews.com/.
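One common way to seed a topic model is to give the seed words extra weight in each topic's prior over words. The sketch below illustrates only that idea and is not the implementation used in this study: the topic names, seed words, vocabulary, and the `base` and `boost` values are all hypothetical, and the actual seeds are the ones in the Seeds file.

```python
# Hypothetical seeds, three words per topic (the real ones are in the Seeds file).
SEEDS = {
    "economy": ["inflation", "tax", "budget"],
    "health": ["hospital", "vaccine", "doctor"],
}


def build_seeded_prior(vocabulary, seeds, base=0.01, boost=1.0):
    """Return {topic: {word: pseudo_count}} with extra prior mass on seed words.

    base and boost are illustrative values, not taken from the study.
    """
    prior = {}
    for topic, seed_words in seeds.items():
        seed_set = set(seed_words)
        prior[topic] = {
            word: base + (boost if word in seed_set else 0.0)
            for word in vocabulary
        }
    return prior


def prior_word_probabilities(prior):
    """Normalize each topic's pseudo-counts into a prior word distribution."""
    result = {}
    for topic, weights in prior.items():
        total = sum(weights.values())
        result[topic] = {word: w / total for word, w in weights.items()}
    return result


# Toy vocabulary: the seed words plus one unseeded word.
vocabulary = [
    "inflation", "tax", "budget",
    "hospital", "vaccine", "doctor",
    "senator",
]
prior = build_seeded_prior(vocabulary, SEEDS)
probs = prior_word_probabilities(prior)
```

Before any text is seen, each topic already leans toward its own seed words, so the fitted topics tend to stay aligned with the pre-defined themes while the remaining words are assigned from the data.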

Below we provide three files:

  • Composition: the words analyzed and their relative importance for each topic over the whole time period
  • Seeds: the words we used to create the topics
  • Importance: monthly data with the relative importance of each topic

For each month, the importances of all topics sum to 1. The analysis starts in January 2000.
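The sum-to-1 property means each monthly value is a share of that month's total attention. A minimal sketch of such a normalization follows, assuming raw per-topic weights for each month; the topic names and numbers are made up for illustration and are not taken from the Importance file.

```python
def normalize_month(raw_weights):
    """Scale one month's topic weights so that they sum to 1."""
    total = sum(raw_weights.values())
    if total <= 0:
        raise ValueError("month has no topic weight to normalize")
    return {topic: weight / total for topic, weight in raw_weights.items()}


# Hypothetical raw topic weights for two months (not real data).
raw = {
    "2000-01": {"economy": 30.0, "health": 10.0, "education": 10.0},
    "2000-02": {"economy": 5.0, "health": 15.0, "education": 5.0},
}

importance = {month: normalize_month(weights) for month, weights in raw.items()}
```

Because every month is rescaled separately, values are comparable across topics within a month and show how each topic's share of attention moves over time.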

If you have any questions or comments, please e-mail us at cefim@insper.edu.br