by Alessandra Maia Terra de Faria, Carlos Trucíos, and Marcelo Castañeda de Araújo

Reviewed by Matheus Lucas Hebling

Find out what presidential candidates tweet and what is tweeted about them.


This research aims to follow the tweets related to the three leading presidential candidates in the opinion polls available for Brazil's 2022 elections.


Tweets were collected daily from July 1st to July 31st for each of the three main candidates in the Brazilian presidential election. They came both from the candidates’ own timelines and from Twitter users mentioning the candidates, totalling more than 14.7 million tweets. Data were extracted through the Twitter API, under access granted exclusively for academic purposes, and analyzed using R software.
The authors thank Twitter for the academic accounts granted to them.
Below are the updated Twitter follower counts for each candidate (July versus June; data as of August 19th, 2022):
Bolsonaro – from 8.4 to 8.6 million (a 2.5% increase over the previous month)
Lula – from 3.8 to 4 million (a 5% increase over the previous month)
Ciro – 1.4 million (no change from the previous month)

Candidates’ tweets

Figure 1 reports the number of tweets posted in July on the timelines of the three candidates in our survey: Lula, Ciro, and Bolsonaro.

Figure 1: Timelines

Figures 2 and 3 present, respectively, the most frequent words in the candidates’ timeline tweets and those word frequencies weighted by inverse document frequency (TF-IDF).

Figure 2: Most frequently used words in the candidates’ timelines.

The analysis of the most frequent words in the candidates’ timeline tweets (Figure 2) gives an overview of the main subjects they address. The term “Brazil” is common to all three profiles, continuing the pattern found in May and June. The new feature is the word “people” [“povo”], which also appears frequently in all three profiles. Although it is prominent for every candidate, it is more frequent in Lula’s profile than in the others. The terms “Lula”, “country” [“país”], “today” [“hoje”] and “president” [“presidente”] are common to the profiles of Lula and Ciro. In Lula’s profile alone, the emphasis on the verbs “to have” [“ter”] and “to do” [“fazer”] shows that the proposal-oriented tone already found in May and June continues. Words such as “want” [“quero”], “today” [“hoje”] and “together” [“juntos”] emphasize the development of proposals. In Ciro’s profile, the tendency to name the other two candidates remains, as observed in April, May, and June. In the July totals, the terms “folks” [“gente”] and “government” [“governo”] also stand out. Finally, Bolsonaro’s profile contains English terms*. These come from a sequence of tweets on July 27th replying to Leonardo DiCaprio** about the Amazon, which had wide repercussions on the network. Other highlights are amounts in reais (“r”, as in “R$”) expressed in “thousand” [“mil”], as well as “2022”, “07”, and “reduction” [“redução”]; the last of these reflects discussions about tax reductions.
* The English terms in Bolsonaro’s profile are very common English words. Had English stop words also been removed, these terms would have disappeared.
**His Twitter account, @LeoDiCaprio, has 19,666,715 followers.

Figure 3: TF-IDF by candidate timeline

In Figure 3, TF-IDF (term frequency-inverse document frequency) highlights words that are frequent in one candidate’s timeline tweets but infrequent across the three candidates overall. Thus:

● In Lula’s profile, the novelties in July are the terms “Alckmin” (his running mate, vice-presidential candidate Geraldo Alckmin), “together” [“juntos”], “rebuild” [“reconstruir”], “talk” [“conversar”], “technology” [“tecnologia”], “culture” [“cultura”], “wanted” [“queria”] and “Talhada”. Serra Talhada is a mountainous locality close to the Garanhuns region in Pernambuco State; the candidate visited it in July, and images of the crowds went viral on the network. The theme carried over from June is “hunger” [“fome”].
● Bolsonaro’s profile again emphasizes the English words “the”, “you”, “that”, and “and”, which come from the tweets replying to actor Leonardo DiCaprio. It also highlights the years 2019 and 2022. The emphasis on 2019 has been present since April and suggests an attempt to stress the government’s achievements in the year before the pandemic. The terms “reduction” [“redução”], “products” [“produtos”] and “drugs” [“drogas”] also stand out. English terms had appeared in the previous month as well, but then they stemmed from the president’s participation in the IX Summit of the Americas (2022).
● In Ciro’s profile, July brings a scattering of new terms: “Ciro”, “pdt” (his party, the “Partido Democrático Trabalhista”), “candidacy” [“candidatura”], “follow” [“acompanhar”], “choose” [“escolher”], “name” [“nome”], “exists” [“existe”], “voice” [“voz”], “suggested” [“sugeriram”], “spotify”, “drop” [“soltar”], “playlist”, “play”, “search” [“pesquisar”], “official” [“oficiais”], “songs” [“músicas”], “jingles”, “hashtag”, “spin” [“giro”], “state” [“estadual”], “sing” [“cantar”] and “press” [“apertar”]. The terms “spotify”, “playlist”, “play”, and “spin” all relate to the candidate’s media campaign. (“O Giro do Ciro” refers to promotional clips featuring footage of campaign visits.)
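The TF-IDF weighting behind Figure 3 can be sketched as follows. This is a minimal Python illustration, not the authors' R code. It assumes that each candidate's July timeline is treated as a single document and that idf uses the natural logarithm, which is the convention of, for example, R's tidytext `bind_tf_idf`. The function name `tf_idf` and the toy tokens are hypothetical. A word used by all three candidates, such as "brasil", gets idf = ln(3/3) = 0 and therefore drops out. This is why Figure 3 surfaces candidate-specific terms.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return {document: {term: tf-idf}} for a dict mapping document -> list of tokens.

    tf  = occurrences of the term in the document / tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    counts = {name: Counter(tokens) for name, tokens in docs.items()}
    # Document frequency: how many documents contain each term at least once
    doc_freq = Counter()
    for c in counts.values():
        doc_freq.update(c.keys())
    n_docs = len(docs)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in c.items()
        }
    return scores


# Hypothetical toy tokens (not the study's data): one "document" per candidate
docs = {
    "Lula": ["brasil", "povo", "fome", "juntos", "fome"],
    "Ciro": ["brasil", "povo", "spotify", "playlist"],
    "Bolsonaro": ["brasil", "povo", "2019", "redução"],
}
scores = tf_idf(docs)
print(scores["Lula"]["brasil"])  # 0.0, because every timeline uses it
print(max(scores["Lula"], key=scores["Lula"].get))  # fome
```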

Tweets about the candidates

Figure 4 presents, in descending order (from the most cited to the least cited), the total number of tweets mentioning each surveyed candidate in July: Bolsonaro, Lula, and Ciro. Figure 5 shows the daily evolution of these tweets.
To collect the tweets mentioning each candidate, the words “Bolsonaro”, “Ciro” and “Lula” were used as search criteria. Tweets mentioning “Ciro Nogueira”, a different politician, were excluded from the analyses of candidate Ciro.
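The selection rule above can be illustrated with a short Python sketch. It assumes case-insensitive whole-word matching, which only approximates the Twitter API's own keyword search. The helper names `mentions` and `candidate_tweets` and the example tweets are hypothetical. Counting the selected tweets per candidate yields totals of the kind shown in Figure 4.

```python
import re
from collections import Counter


def mentions(tweet, name):
    """True if `name` occurs in `tweet` as a whole word or phrase, ignoring case."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.search(pattern, tweet, flags=re.IGNORECASE) is not None


def candidate_tweets(tweets, name):
    """Select tweets mentioning `name`; for Ciro, drop any tweet that mentions Ciro Nogueira."""
    selected = [t for t in tweets if mentions(t, name)]
    if name == "Ciro":
        selected = [t for t in selected if not mentions(t, "Ciro Nogueira")]
    return selected


# Hypothetical example tweets
tweets = [
    "Lula e Bolsonaro no debate",
    "Ciro Nogueira fala sobre o orçamento",
    "Ciro Gomes lança playlist",
    "Bolsonaro anuncia redução da gasolina",
]
totals = Counter({name: len(candidate_tweets(tweets, name))
                  for name in ["Bolsonaro", "Lula", "Ciro"]})
print(totals.most_common())  # [('Bolsonaro', 2), ('Lula', 1), ('Ciro', 1)]
```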

Figure 4: Number of tweets mentioning the candidates.

In July, all the candidates' interactions intensified; that is, they were all more active on Twitter. The daily evolution of tweets (Figure 5) shows that Ciro had the lowest number of daily interactions and Bolsonaro the highest. The exception was July 2nd and 3rd, when tweets mentioning Lula were slightly more numerous. The spikes in tweets mentioning Bolsonaro on July 18th and 19th have two causes. On the 18th, he mentioned the leader of a criminal faction. On the 19th, he discussed the reduction in the price of gasoline and stated that “Brazil will have ‘one of the cheapest gasoline’ (sic) in the world” [“‘uma das gasolina’ (sic) mais barata do mundo”].

Figure 5: Daily evolution of tweets mentioning candidates.

Word clouds

A word cloud is a graphical representation of the most frequent words within a text or set of texts.
Next, we present three word clouds, one per candidate. Each shows the top 100 words, excluding stop words, associated with that candidate in Twitter users’ interactions in July (July 1st to 31st). For better visualization of the associated words, each candidate’s name was removed from their own cloud.
In text analysis, stop words are very common words such as “and”, “from”, “the”, etc. They carry little information for analysis and are usually removed beforehand.
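A minimal Python sketch of the counting behind each word cloud follows, assuming simple lowercase word tokenization. The stop-word list is a deliberately tiny, hypothetical stand-in for a full Portuguese list. `top_words` is an illustrative name, not the authors' code. As described in the text, the candidate's own name is removed and the top 100 words are kept by default.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny Portuguese stop-word list (a real one is much longer)
STOP_WORDS = {"a", "o", "e", "de", "do", "da", "no", "na", "que", "em"}


def top_words(tweets, candidate, n=100):
    """Top-n words across tweets, excluding stop words and the candidate's own name."""
    counts = Counter()
    for tweet in tweets:
        tokens = re.findall(r"\w+", tweet.lower())
        counts.update(t for t in tokens
                      if t not in STOP_WORDS and t != candidate.lower())
    return counts.most_common(n)


# Hypothetical tweets mentioning one candidate
tweets = [
    "Bolsonaro e Lula no Brasil",
    "o governo de Bolsonaro e o Brasil",
    "Bolsonaro fala do Brasil e de Lula",
]
print(top_words(tweets, "Bolsonaro", n=2))  # [('brasil', 3), ('lula', 2)]
```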

Figure 6: Word cloud for Bolsonaro

Figure 7: Word cloud for Lula

Figure 8: Word cloud for Ciro

Our first impressions of each cloud are as follows:
● Bolsonaro: the words “Lula” and “Brasil” are consolidated in the foreground, keeping the same structure found in the previous month. In the background appear “government” [“governo”], “against” [“contra”], “about” [“sobre”], “people” [“povo”] and “PT” [“Partido dos Trabalhadores”, the Workers’ Party].
● Lula: “PT”, “president” and “Brasil” appear in the foreground, keeping the same structure as in the previous month. In the background appear “ex”, “to vote” [“votar”], “against” [“contra”], “PCC” [“Primeiro Comando da Capital (PCC)” – First Capital Command (PCC) is the largest criminal organization in Brazil], “left” [“esquerda”] and “people” [“povo”].
● Ciro: the trend of recent months remains in the foreground (“Lula” and “Bolsonaro”). In the background appear “vote” [“voto”], “round” [“turno”] and “Tebet” (a reference to Simone Tebet, the MDB party’s presidential candidate).

Sentiment analysis

The sentiment of each tweet was constructed by identifying the sentiment of its basic units (the words) using the OpLexicon v3.0 and SentiLex dictionaries from the lexiconPT package. Each word found in the dictionaries receives a score of 1, -1, or 0, depending on whether its sentiment is positive, negative, or neutral, respectively. Words not found in the dictionaries also receive a score of 0. The word scores within each tweet are summed, and the tweet is classified as positive, negative, or neutral according to whether the sum is positive, negative, or zero.

In Figure 9, sentiments (Negative, Neutral, and Positive) are presented as percentages per candidate. The sentiments expressed in the tweets are broadly balanced across the three candidates. These data will be monitored and compared over time; this is a sentimental snapshot of July on Twitter. Proportionally to the number of tweets mentioning each candidate, Ciro had the lowest percentage of tweets with negative sentiment and the highest percentages of tweets with positive and neutral sentiment. Lula had the highest percentage of negative tweets and the lowest percentage of positive tweets. Bolsonaro had the lowest percentage of neutral tweets.
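The scoring rule described above can be sketched in Python as follows. `LEXICON` is a hypothetical five-word stand-in for the merged OpLexicon v3.0 and SentiLex entries. Their real contents, and any handling of words on which the two dictionaries disagree, are not reproduced here. The tokenization is also an assumption. Under this rule, a tweet with one positive and one negative word sums to 0 and is therefore classified as neutral.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for merged OpLexicon v3.0 / SentiLex entries
LEXICON = {"bom": 1, "melhor": 1, "democracia": 1, "corrupção": -1, "culpado": -1}


def tweet_sentiment(tweet, lexicon):
    """Sum word scores (unknown words count as 0) and classify by the sign of the sum."""
    tokens = re.findall(r"\w+", tweet.lower())
    score = sum(lexicon.get(tok, 0) for tok in tokens)
    if score > 0:
        return "Positive"
    if score < 0:
        return "Negative"
    return "Neutral"


def sentiment_shares(tweets, lexicon):
    """Percentage of tweets in each sentiment class."""
    labels = Counter(tweet_sentiment(t, lexicon) for t in tweets)
    return {lab: 100 * labels[lab] / len(tweets)
            for lab in ("Negative", "Neutral", "Positive")}


# Hypothetical example tweets
tweets = ["Um bom governo", "Corrupção, culpado!", "O melhor e o pior", "Sem palavras"]
print(sentiment_shares(tweets, LEXICON))
# {'Negative': 25.0, 'Neutral': 25.0, 'Positive': 50.0}
```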

Figure 9: Sentiments of tweets per candidate

Figures 10, 11, and 12 show each candidate’s word cloud separately, split according to the sentiment assigned to each tweet. Words in pink appear in tweets classified as positive, words in blue in tweets classified as negative, and words in beige in tweets classified as neutral.

Each of these word clouds considers the 200 most frequent words.
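The text does not state how a word that occurs in tweets of several sentiment classes gets its color. One plausible rule is assumed here purely for illustration. Words are counted per class, the 200 most frequent overall are kept, and each word is assigned to the class in which it is most frequent. Ties go to the class encountered first. The function name and the example data are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Colors as described in the text for Figures 10 to 12
COLORS = {"Positive": "pink", "Negative": "blue", "Neutral": "beige"}


def words_by_sentiment(labeled_tweets, n=200):
    """Map each of the n most frequent words to the sentiment class where it is most frequent.

    labeled_tweets: iterable of (sentiment_label, tweet_text) pairs.
    """
    per_class = defaultdict(Counter)
    for label, text in labeled_tweets:
        per_class[label].update(re.findall(r"\w+", text.lower()))
    overall = Counter()
    for counts in per_class.values():
        overall.update(counts)
    top = [word for word, _ in overall.most_common(n)]
    # Hypothetical coloring rule: each word goes to the class where it occurs most often
    return {w: max(per_class, key=lambda lab: per_class[lab][w]) for w in top}


# Hypothetical labeled tweets
labeled = [
    ("Positive", "democracia mundo"),
    ("Negative", "partido corrupção"),
    ("Negative", "partido culpado"),
    ("Neutral", "presidente partido"),
]
result = words_by_sentiment(labeled)
print(COLORS[result["partido"]])  # blue
```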

Figure 10: Sentiment word cloud for Bolsonaro

Figure 11: Sentiment word cloud for Lula

Figure 12: Sentiment word cloud for Ciro

● Bolsonaro: Tweets related to candidate Bolsonaro that were classified as positive are characterized by words such as “democracy” [“democracia”], “world” [“mundo”], and “re-elected” [“reeleito”]. Tweets classified as negative are characterized by words such as “party” [“partido”], “guilty” [“culpado”], and “corruption” [“corrupção”]. Finally, tweets classified as neutral highlight the word “president” [“presidente”].
● Lula: Tweets related to candidate Lula that were classified as positive are characterized by words such as “know” [“sabe”], “to see” [“ver”], “good” [“bom”] and “people” [“povo”]. Tweets classified as negative are characterized by words such as “to vote” [“votar”], “left” [“esquerdo”], and “follow” [“segue”]. Finally, tweets with neutral sentiment are mainly characterized by “PCC” [“Primeiro Comando da Capital (PCC)” – First Capital Command (PCC) is the largest criminal organization in Brazil], “Anitta”*** and “president”.
● Ciro: Tweets related to candidate Ciro that were classified as positive are characterized by words such as “best” [“melhor”], “world” [“mundo”], and “need” [“precisa”]. Tweets classified as negative are characterized by words such as “to vote” [“votar”]. Finally, tweets with neutral sentiment are characterized by words like “Bolsonaro”, “Tebet” (a reference to Simone Tebet, the MDB party’s presidential candidate), and “president”.
*** Anitta is a Brazilian pop star who started out in the favelas of Rio de Janeiro. Her Twitter account, @Anitta, has 18,091,412 followers. On July 11th she announced in a post that she would support Lula in the 2022 elections. In the tweet, the singer said that “she was never PT, but this year she will support the PT candidate”.


The 25 most frequent bigrams in tweets mentioning each candidate are shown in Figures 13 to 15. The direction of each arrow shows the order in which the two words appear, and the more intense the arrow, the more frequent the bigram.
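A bigram is a pair of consecutive words, so an arrow w1 => w2 in Figures 13 to 15 corresponds to the ordered pair (w1, w2). The Python sketch below counts such pairs and keeps the 25 most frequent by default. One assumption is that pairs containing a stop word are discarded after pairing, rather than stop words being removed before pairing. The function name and example tweets are also hypothetical.

```python
import re
from collections import Counter


def top_bigrams(tweets, n=25, stop_words=frozenset()):
    """Count consecutive word pairs; pairs containing a stop word are discarded."""
    counts = Counter()
    for tweet in tweets:
        tokens = re.findall(r"\w+", tweet.lower())
        # zip pairs each token with the next one, preserving word order (w1 => w2)
        for w1, w2 in zip(tokens, tokens[1:]):
            if w1 not in stop_words and w2 not in stop_words:
                counts[(w1, w2)] += 1
    return counts.most_common(n)


# Hypothetical tweets and stop-word list
tweets = [
    "Jair Bolsonaro e as urnas eletrônicas",
    "urnas eletrônicas de novo",
    "fake news sobre urnas eletrônicas",
]
stop = {"e", "as", "de", "sobre"}
for (w1, w2), count in top_bigrams(tweets, n=3, stop_words=stop):
    print(f"{w1} => {w2}: {count}")  # first line printed: urnas => eletrônicas: 3
```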

Figure 13: Bigrams of Bolsonaro

Bolsonaro: among the most frequent bigrams are “government => Bolsonaro”, “Jair => Bolsonaro” and “president => Bolsonaro”. These are followed, in sequence, by “Michelle => Bolsonaro”, “urns => electronic”, “Eduardo => Bolsonaro”, “fake => news”, “forces => armed”, “Marcelo => Arruda”****, and “first => round”. The prominence of “Michelle => Bolsonaro” can be understood in light of the First Lady’s greater exposure, as a campaign strategy to reach female and evangelical audiences.
**** Marcelo Arruda was a member of Brazil’s Workers’ Party (PT). He was shot dead while celebrating his 50th birthday by an officer shouting slogans in favor of the current president and candidate, Jair Bolsonaro. The tragedy occurred in Foz do Iguaçu, in western Paraná, southern Brazil. The shooter, federal penitentiary agent Jorge José da Rocha Guaranho, first went to the venue. Twenty minutes later, he returned armed and began shooting at the PT leader, whose party was themed around the PT and Lula.

Figure 14: Bigrams of Lula

Lula: among the most frequent bigrams are “first => round”, “Marcos => Valerio”***** and “Lula => president”. These are followed by “turn => left”, “left => follow”, “ex => prisoner”, “people => Brazilian”, “vou => votar” [“I will => vote”] and “Celso => Daniel”. With lower intensity appear “rede => globo” (Rede Globo, a TV network), “Jair => Bolsonaro”, “Sergio => Moro”****** and “fake => news”.

*****Marcos Valerio is one of the main defendants in the Mensalão scandal and was found guilty of bribery, embezzlement, money laundering, tax evasion, and conspiracy. In October 2012 he was sentenced to 40 years, 2 months, and 10 days of imprisonment and a fine of R$2.72 million.
******Sérgio Moro is a former judge who was involved in ethical violations, including legally prohibited collaboration with the prosecutors. This occurred in the case in which he convicted and imprisoned former Brazilian President Luiz Inácio Lula da Silva on corruption charges, a conviction that barred Lula from the 2018 presidential election.

Figure 15: Bigrams of Ciro

Ciro: among the most frequent bigrams, “Ciro => Gomes” predominates. Other bigrams, with lower intensity, include “Ciro => blocked”, “Ciro => Tebet”, “Tebet => ignored”, “Tebet => with me”, “find => accepted” and “vote => printed”.

Final comments

The presentation of this dataset aims to contribute to interpretations of the activity on Twitter of possible candidates in the 2022 elections. It also covers what users of the platform said about them throughout July, in comparison with what was found in June, May, and April. This is ongoing research and will be refined over the months leading up to the 2022 election.

Alessandra Maia Terra de Faria, Social Sciences Department at PUC-RIO / PPGCS – UFRRJ. E-mail:

Carlos Trucíos, Department of Statistics, University of Campinas. E-mail:

Marcelo Castañeda de Araujo, Department of Business/UFRJ.