Three members of our research group (Renáta Németh, Eszter Rita Katona, and Zoltán Kmetty) recently published an article that presents the characteristics and possibilities of automated text analysis. Their goal is to inspire Hungarian social scientists by offering insight into a less institutionalized area, since they believe that, internationally, text mining will become a standard method of empirical social science research within a few years.
The conference “Sociology at the Dawn of a Successful Century” will be held on October 8-9 at the Hungarian Academy of Sciences. The sociological applications of NLP will be presented in their own section, led by two co-leaders of our research group, Ildikó Barna and Renáta Németh, together with Bence Ságvári, leader of the CSS-Recens research group of the Hungarian Academy of Sciences. Júlia Koltai, a member of RC2S2, is on the organizing committee of the conference. From RC2S2, Ildikó Barna, Eszter Katona, Zoltán Kmetty, Árpád Knap, Júlia Koltai, Renáta Németh, Márton Rakovics, and Domonkos Sik will take part as speakers or co-authors.
Eszter Katona and Árpád Knap have each won a New National Excellence Program (ÚNKP) scholarship for the coming one-year period. The title of Eszter’s research is “Corruption risk and prediction – Analysis of the texts of public procurement tenders using the tools of natural language processing”. Her supervisor is Mihály Fazekas. The hypothesis of Eszter’s research is that examining the wording of public procurement tenders can help uncover suspected cases of corruption. The title of Árpád’s research is “Analysis of emotions related to twentieth-century traumas using Natural Language Processing methods”. He will analyse emotions expressed in social media and the press in relation to the two most influential events of twentieth-century Hungarian history: the Treaty of Trianon and the Holocaust. Árpád’s supervisor is Ildikó Barna, co-leader of our research group.
We are pleased to announce that our student, Bernadett Csala-Ferencz (MSc in Survey Statistics and Data Analytics), has won a scholarship from the New National Excellence Program for the new semester. The title of her research is “Exploratory clustering of online posts on depression”. With the support of the program, Bernadett is analysing posts from English-language online depression forums using natural language processing (NLP) methods within the Research Center for Computational Social Science research group. Her supervisor is Renáta Németh.
Our research group has received an outstanding opportunity – we have been awarded the 2020 “OTKA” (Hungarian Scientific Research Fund) research grant for 2020-2023 for our project titled “The layers of the political public sphere in Hungary (2001-2020) – a sociological analysis of the official, media-based and lay online public sphere using automated text analytics and critical discourse analysis”. The Principal Investigator is Renáta Németh.
Our research focuses on revealing language change in the political public sphere by applying NLP (Natural Language Processing) methods to a large digital text corpus within a critical discourse analysis framework, which treats language as a tool of ideology and power.
The research has two main stakes, important for both society and sociology: (1) understanding the inner workings of the political public sphere at its different levels, the dynamics and interaction between these levels, the exercise of power through language, the ideological polarization expressed in language, and the identification of discourses free of these tendencies; (2) the organic integration of NLP methods into empirical sociology.
Both aspects of the research have international relevance, since the phenomena studied, language polarization and the diffusion of usage patterns, are not specific to Hungary. The integration of NLP methods into empirical sociology is an emerging topic of great interest because it allows a new kind of analysis of the large digital corpora now available, one that draws on sociological knowledge in the process.
All innovative methodological solutions for data collection and analysis will be made publicly available through digital repositories, scientific articles, and conference talks to support international and domestic users of NLP. Senior members of the research group will provide opportunities to join the project by supervising Scientific Students’ Association (TDK) research projects, master’s theses, and Ph.D. topics for young researchers, and by offering research internships or thesis supervision for graduate students.
Significance of the research
The digital revolution is also a revolution of self-expression. Before the internet, textual documents mainly carried the narratives of the elite, but now almost everybody has the opportunity to express themselves online. The primary relevance of our research is that, by automatically processing this continuously growing flow of texts, we can examine and understand even those characteristics of the political public sphere that were previously available to the observer only in local fragments. Thus both the social and the scientific stakes of our research are high.
From a scientific standpoint, our research opens new perspectives by bringing observational data into quantitative research alongside the self-reported survey data used previously, and by combining qualitative discourse analysis with quantitative methods, employing new and innovative solutions from computational linguistics. This methodological blend is an international novelty. To our knowledge, NLP has never been applied in Hungary to a large-scale text corpus covering different levels of digital communication. While there is evidence of strong ideological polarization (e.g. Vegetti 2018), and polarization in the network structure of the political public sphere has been examined (Bene and Szabó 2019), language polarization has not been researched domestically, and there are only a few international examples, which are much smaller in scale than our research (see e.g. Demszky et al., 2019, on Twitter data).
Our research has an important stake from a social standpoint as well: the public sphere is one of the cornerstones of modern democracy and serves an important role in preventing potential distortions and crises.
One of the strengths of the proposal is that it is backed by a young but highly successful research group that already has several international publications, doctoral research topics, and a deliberately built network of domestic and international collaborations. Besides compiling and applying innovative methods for sociology, the research group aims to foster the institutionalization of new and promising automated text analysis methods in the social sciences.
A new paper entitled “Sociologists using machine learning: Hermeneutic limitations of ‘big data’ text analytics if non-trivial concepts are taught” has been published in the International Journal of Qualitative Methods (Q1, impact factor 3.6) by Renáta Németh, Domonkos Sik and Fanni Máté. We were pleased to read the reviews, e.g. “The article is the one of the fundamental researches in this area” and “I look forward to seeing future efforts as you proceed with this research”. We hope to live up to the latter, since we already have three international publications under review.
The paper can be accessed at the following URL: https://journals.sagepub.com/doi/full/10.1177/1609406920949338
Further results in this field: https://rc2s2.eu/en/project/discursive-framing-of-depression-in-online-health-communities/
Throughout history, Jews have repeatedly been accused of deliberately spreading disease among non-Jews. With the outbreak of the coronavirus epidemic, conspiracy theories linking Jews to the virus appeared, and the internet plays a paramount role in spreading them. In our research, we examine a large text corpus of Hungarian online articles, comments, and posts to find out whether coronavirus-related antisemitic discourses appear in the Hungarian online space and, if so, what their content is. Our corpus contains articles, comments, and posts written in Hungarian between December 1, 2019, and July 10, 2020, in which forms of the words Jew, Zionism, and Israel co-occur with forms of the word coronavirus. Fifteen students from the Sociology BA program at ELTE University, Faculty of Social Sciences, are participating in the research as interns.
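The corpus selection rule described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the English word stems, the pattern names, and the `in_corpus` helper are hypothetical, the real corpus is Hungarian and would need Hungarian word forms, and the sketch assumes that any one of the three topic words together with the word coronavirus is enough for inclusion.

```python
import re
from datetime import date

# Hypothetical English stems, for illustration only. The real corpus is
# Hungarian, and the project's actual keyword list is not given in the text.
TOPIC_PATTERN = re.compile(r"\b(jew\w*|zionis\w*|israel\w*)", re.IGNORECASE)
VIRUS_PATTERN = re.compile(r"\bcoronavirus\w*", re.IGNORECASE)

# Date window stated in the text.
START = date(2019, 12, 1)
END = date(2020, 7, 10)


def in_corpus(text: str, published: date) -> bool:
    """Return True if a document falls inside the date window and contains
    both a topic word form and a coronavirus word form.

    Assumption: one topic word (any of the three) is sufficient.
    """
    if not (START <= published <= END):
        return False
    has_topic = TOPIC_PATTERN.search(text) is not None
    has_virus = VIRUS_PATTERN.search(text) is not None
    return has_topic and has_virus


if __name__ == "__main__":
    sample = "A post mentioning Israel and the coronavirus outbreak."
    print(in_corpus(sample, date(2020, 3, 15)))
```

Stem matching with `\w*` is a crude stand-in for handling inflected forms; for an agglutinative language such as Hungarian, a lemmatizer would probably be a better fit.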
We are pleased to announce that two theses supervised by members of our research group have been awarded the title of “Thesis of the Year”. Anna Farkas wrote the best thesis in the Sociology BA program (supervisor: Renáta Németh), and Jakab Buda in the Survey Statistics and Data Analysis MSc program (supervisor: Márton Rakovics). Congratulations! The theses can be found here, along with other theses in Computational Social Science supervised by members of our research team.
In line with its “data for social good” principle, the research company Inspira Group made it possible for Anna Farkas, a BA student in sociology, to add questions to its online omnibus survey free of charge. Her thesis, supervised by Renáta Németh, is a case study of gender bias in Google Translate’s translations of occupations from Hungarian (a gender-neutral language) to English (a language with gendered pronouns) (the thesis can be accessed via this link). Using quantitative methods, the study aims to measure the extent of gender bias in machine translation. It examines the pronouns used in the English translations of sentences such as “ő egy orvos” (“he/she is a doctor”). To measure the bias in the algorithm, the study compares Google Translate’s translations with the proportion of men and women in each occupation and with society’s perception of those occupations. A survey was used to assess whether people find those occupations feminine or masculine. Inspira assisted in this part of the research: as part of its online omnibus survey, it asked a representative sample about their perceptions of occupations, using questions provided by Anna Farkas. The study found that Google Translate mirrors people’s perception of occupations to a greater extent than the proportion of men and women in those occupations.
Anna Brecsok’s thesis (Survey Statistics and Data Analytics MSc), in which she conducted a survey experiment to investigate a problem she encountered at her workplace, was published in the Hungarian Statistical Review. Anna’s supervisor and co-author of the paper was Renáta Németh, co-leader of our research group.
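One way to picture the comparison described above is a small sketch that correlates the translated pronoun with each of the two benchmarks. This is an illustration under stated assumptions, not the method used in the thesis: the occupation labels and all numbers are invented toy values, the 0/1 pronoun coding is an assumption, and Pearson correlation is swapped in as a simple stand-in for whatever measure the study actually used.

```python
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation between two equally long numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sd_x * sd_y)


# Invented toy rows, NOT data from the thesis:
# (occupation label,
#  1 if the translation used "he", 0 if it used "she",
#  share of men working in the occupation,
#  mean perceived masculinity of the occupation on a 0-1 scale)
ROWS = [
    ("occupation_a", 1, 0.55, 0.80),
    ("occupation_b", 0, 0.45, 0.20),
    ("occupation_c", 1, 0.70, 0.90),
    ("occupation_d", 0, 0.40, 0.30),
    ("occupation_e", 1, 0.35, 0.60),
]


def compare(rows):
    """Correlate the translated pronoun with employment shares and with
    survey-based perceptions, so the two benchmarks can be set side by side."""
    used_he = [row[1] for row in rows]
    employment = [row[2] for row in rows]
    perception = [row[3] for row in rows]
    return {
        "employment": pearson(used_he, employment),
        "perception": pearson(used_he, perception),
    }


if __name__ == "__main__":
    for benchmark, r in compare(ROWS).items():
        print(f"{benchmark}: r = {r:.2f}")
```

A larger correlation with one benchmark would suggest that the translations track that benchmark more closely. With real data, a measure suited to a binary outcome (for example logistic regression) might be preferable.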