Big data media analysis and the representation of urban violence in Brazil

Lead Research Organisation: Lancaster University
Department Name: Linguistics and English Language


Brazil's current social and political situation gives rise to a particular kind of urban violence that targets individuals and is characterized by its constant presence. The average Brazilian citizen has to contend with this violence on a daily basis. This creates a general state of fear and insecurity among the population but, at the same time, may foster in more socially aware individuals a sense of empathy with the less privileged classes in Brazil. The media contribute to this scenario: daily news reports highlight violent acts carried out by individuals or groups from all social classes, and thereby amplify the impact of violence on people's everyday lives. This fosters beliefs, attitudes and values about violence that may or may not be consistent with its actual incidence, forms and causes.
The partners will investigate the linguistic representation of urban violence in Brazil by applying the techniques of Corpus Linguistics to two datasets, or 'corpora':
1. The existing transcripts of two focus groups on living with urban violence, conducted in Fortaleza, Brazil, in 2010, totalling approximately 20,000 words (Focus Groups Corpus);
2. A 2-million-word corpus of news reports in the Brazilian press, to be constructed as part of the partnership (News Reports Corpus).

Planned Impact

Urban violence is a major problem in Brazil. The UK-funded part of the partnership will enable the Brazilian partners to study the representation of urban violence in a large language dataset. They will benefit from the creation of a dedicated press corpus and from being introduced to computer-aided methods for analysing both this corpus and an existing set of focus group transcripts. The creation of the News Reports Corpus, and the application of corpus linguistic techniques to a comparative analysis of this corpus and the focus group data, will make it possible to investigate the relationships between official statistics on urban violence, media representations and citizens' views.

The work of the partnership will therefore be relevant to governmental and non-governmental organisations dealing with urban violence in Brazil. They will benefit by gaining a better understanding of the relationship between violence statistics, media representations of urban violence, and ordinary people's perceptions of urban violence. A better understanding of these relationships can lead to better strategies for alleviating the effects of urban violence on citizens' lives, and for fostering attitudes conducive to solving the social problems that cause the violence in the first place. These organisations will be reached via the project's website (particularly blog posts aimed at a non-academic audience) and through invitations to the workshop to be held in Brazil and to the Corpus Linguistics Summer Schools organised at Lancaster University by the Centre for Corpus Approaches to Social Science.


Description Corpus linguistic tools were employed to analyse references to different types of urban violence in a corpus of 5,127 news reports (1,778,282 words) published in four Brazilian broadsheet papers - Zero Hora and Pioneiro (from Rio Grande do Sul), and Folha de São Paulo and O Estado de São Paulo (from São Paulo) - between 1 January and 31 December 2014.

The list below shows the crimes most frequently mentioned in 2014 by the four Brazilian newspapers under analysis, ordered here by their frequency in the corpus.
i. Roubo (1,962 instances): Robbery, i.e. taking someone's property by violence or threat of violence
ii. Homicídio (1,827 instances): Homicide
iii. Assalto (1,143 instances): Assault with the intention of stealing someone's property
iv. Assassinato (931 instances): Murder
v. Furto (843 instances): Theft, i.e. taking someone's property without violence or threat
vi. Estupro (432 instances): Rape
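Raw counts like those above are usually normalised to occurrences per million words before being compared with corpora or sub-corpora of different sizes. The sketch below illustrates that arithmetic using only the counts and corpus size reported here. It is not the project's actual tooling, and the names `per_million` and `normalised_table` are hypothetical.

```python
# Raw counts and corpus size as reported above (News Reports Corpus, 2014).
CORPUS_TOKENS = 1_778_282

raw_counts = {
    "roubo": 1962,
    "homicídio": 1827,
    "assalto": 1143,
    "assassinato": 931,
    "furto": 843,
    "estupro": 432,
}


def per_million(count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    return count / corpus_size * 1_000_000


def normalised_table(counts: dict[str, int], corpus_size: int) -> list[tuple[str, int, float]]:
    """Return (term, raw count, per-million rate) rows, most frequent first."""
    rows = [(term, n, per_million(n, corpus_size)) for term, n in counts.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    # Print each term with its raw count and its rate per million words.
    for term, n, pmw in normalised_table(raw_counts, CORPUS_TOKENS):
        print(f"{term:<12}{n:>6}{pmw:>10.1f}")
```

Per-million rates would allow, for instance, newspapers or sub-corpora of unequal size to be compared on a common scale. The ranking itself is unchanged, because all six terms come from the same corpus.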

Most of these crimes, especially (i) and (ii), are closely associated with drug trafficking in the reports, as indicated by the frequent occurrence of 'drug(s)' and 'trafficking' in their immediate context.

Roubo (i) and Furto (v) refer in most cases to car theft in urban areas, with recurrent mentions of official statistics showing that this crime has increased in both states, São Paulo and Rio Grande do Sul. Mentions of burglaries and of muggings in the street or on buses (identified through the analysis of crimes (i), (iii) and (v)) are more frequent in the newspapers from Rio Grande do Sul than in those from the state of São Paulo.

In most cases, 'homicide' refers to intentional killing, frequently co-occurring with words such as 'intentional' (doloso), 'aggravated' (qualificado), 'intention' (intenção) or 'motive' (motivo). The phrases 'double homicide' and 'triple homicide' are also frequent, indicating that multiple killings are regularly reported. Other forms of violence also appear among the frequent collocates of 'homicide': latrocínio (robbery resulting in death), 'kill', 'murder', 'robbery/theft', 'rape', 'death' and 'physical injury'.
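The collocation findings in this section rest on counting the words that occur within a fixed window around a search term and then scoring how strongly each word is associated with it. Examples include 'drug(s)' and 'trafficking' around roubo and homicídio, or doloso and qualificado around 'homicide'. The report does not state which tool, window size or association measure was used, so the sketch below is only an illustration. It assumes a pre-tokenised, lower-cased corpus, a hypothetical default window of five tokens on either side, and Mutual Information (MI) as the score. The expected co-occurrence count is scaled by window size, which is one common formulation; tools differ on this. The function name `collocates` is hypothetical.

```python
import math
from collections import Counter


def collocates(tokens: list[str], node: str, span: int = 5,
               min_freq: int = 2) -> list[tuple[str, int, float]]:
    """Find words within `span` tokens either side of `node` and score them with MI.

    Assumes `tokens` is already tokenised and lower-cased.
    Returns (collocate, observed co-occurrences, MI score) rows, highest MI first.
    """
    n = len(tokens)
    freq = Counter(tokens)
    observed = Counter()

    # Count every token that falls inside the window around each node hit.
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = max(0, i - span)
        right = min(n, i + span + 1)
        for j in range(left, right):
            if j != i:
                observed[tokens[j]] += 1

    rows = []
    for word, o in observed.items():
        if o < min_freq or word == node:
            continue
        # Expected co-occurrences if word and node were independent,
        # scaled by the number of window positions (2 * span).
        expected = freq[node] * freq[word] * (2 * span) / n
        rows.append((word, o, math.log2(o / expected)))
    return sorted(rows, key=lambda row: row[2], reverse=True)
```

MI tends to favour rare words. A minimum co-occurrence threshold (`min_freq`) is therefore applied, and in practice MI is often reported alongside a frequency-sensitive measure such as log-likelihood.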

References to 'corpse(s)' and 'concealment' also frequently co-occur with 'homicide', indicating that many of the homicides reported by the newspapers involved concealment of the body. This was the case in the murder of Bernardo Boldrini, an 11-year-old boy who was killed by his father and stepmother in Três Passos, a small town in the state of Rio Grande do Sul, in April 2014. The Boldrini case also explains the high frequency of 'murder' (assassinato) in the corpus, as most instances refer to that specific crime, which is a case of domestic rather than urban violence.

As for 'rape', most instances refer to incidents in gardens surrounding some universities in the city of São Paulo.
Another notable finding is that 'public security' is also a salient item in the discourse, with 944 occurrences in the corpus. Most instances appear in the newspapers from São Paulo (Folha de São Paulo and O Estado de São Paulo) and refer to the São Paulo State Public Security Department (Secretaria de Segurança Pública do Estado de São Paulo). Mentions of public security as a citizens' right, however, are almost non-existent.
Exploitation Route Further analyses of the corpus of news reports can be carried out. The results of the analyses can be used to gain a greater understanding of how the media may both reflect and shape public perceptions of urban violence in Brazil.
Sectors Communities and Social Services/Policy,Government, Democracy and Justice,Security and Diplomacy

Description Two 2-day workshops on the application of corpus linguistic methods to discourse analysis were held, at the University of Caxias do Sul (May 2015) and at the University of Fortaleza (November 2016). Together, these workshops attracted over 80 participants from across Brazil, more than half of whom were postgraduate students. Participants were able to practise the methods in hands-on sessions. In the second workshop, the main findings from the project were presented, and participants carried out their own investigations of the representation of urban violence in the Brazilian press. The project also inspired the first International Conference on 'Violence, Politeness, Conflict Mediation and Access to Justice', held at the University of Fortaleza in September 2016. During the conference, academics from Linguistics, Law, Sociology, Psychiatry and Healthcare delivered papers on topics including figurative language in violence talk, drugs and violence, violence against women, restorative justice, and conflict mediation. The conference was open to different branches of the Brazilian police and to the general public.
First Year Of Impact 2015
Sector Communities and Social Services/Policy,Education
Impact Types Cultural,Policy & public services

Description Corpus Linguistics training workshop at University of Caxias do Sul 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact This workshop brought together an audience of about 40 people, including lecturers, researchers, MA and PhD students from various Brazilian universities. The project team introduced a variety of Corpus Linguistics techniques, and gave participants a chance to try them out on different data sets.
Year(s) Of Engagement Activity 2015
Description Corpus Linguistics training workshop at University of Fortaleza 
Form Of Engagement Activity Participation in an activity, workshop or similar
Part Of Official Scheme? No
Geographic Reach International
Primary Audience Postgraduate students
Results and Impact The project team introduced the tools of Corpus Linguistics to an audience of academics and postgraduate students from different universities in the North of Brazil. The project data and main findings were also presented.
Year(s) Of Engagement Activity 2015
Description Talk on 'The Representation of Urban Violence in Brazilian Newspapers' at Warwick University 
Form Of Engagement Activity A talk or presentation
Part Of Official Scheme? No
Geographic Reach National
Primary Audience Postgraduate students
Results and Impact Event for postgraduates and researchers in the Centre for Applied Linguistics at Warwick University.
Year(s) Of Engagement Activity 2015