Eszter Katona, Árpád Knap, Fanni Máté, Mihály Csótó – Topic modelling of the Információs Társadalom

2021.07.13. Publication

Three members of our research group, Eszter Katona, Árpád Knap and Fanni Máté, with the contribution of Mihály Csótó, wrote an article for the special anniversary issue of the journal Információs Társadalom (Information Society). The primary aim of the study is to review the topics that the journal has brought into the Hungarian discourse of “information society studies” over the past 15 years and to explore the thematic structure of the journal with NLP methods. In addition to the content analysis, the article also provides insight into the co-author network of the journal and the relationships between authors and topics.

The paper is available at the following URL: https://doi.org/10.22503/inftars.XXI.2021.1.1
The visualizations are available at this link: https://inftars.infonia.hu/inftars20?lang=hu
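
A co-author network of the kind mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the bibliographic data is available as one list of author names per article, and the names below are hypothetical placeholders. Two authors are linked if they wrote at least one article together, and the edge weight counts their joint articles.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_network(papers):
    """Build a weighted, undirected co-authorship graph.

    papers: iterable of author-name lists, one list per article.
    Returns a dict mapping frozenset({a, b}) -> number of joint articles.
    """
    edges = defaultdict(int)
    for authors in papers:
        # Deduplicate and sort so each pair is counted once per article.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[frozenset((a, b))] += 1
    return dict(edges)


def degree(edges):
    """Number of distinct co-authors per author (isolated authors are absent)."""
    deg = defaultdict(int)
    for pair in edges:
        for author in pair:
            deg[author] += 1
    return dict(deg)


# Hypothetical author lists standing in for the journal's bibliographic data.
papers = [
    ["Author A", "Author B"],
    ["Author A", "Author C", "Author B"],
    ["Author D"],
]
edges = coauthor_network(papers)
print(edges[frozenset(("Author A", "Author B"))])  # prints 2
print(degree(edges))
```

Single-author articles add no edges, so "Author D" does not appear in the degree table; a fuller analysis would add such authors as isolated nodes.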

Németh, Sik, Katona (2021) – The asymmetries of the biopsychosocial model of depression in lay discourses – Topic modelling online depression forums

2021.04.26. Publication Discursive framing of depression in online health communities

New results of our project ‘NLP analysis of online depression forums’ were published in SSM Population Health (D1), written by Renáta Németh, Domonkos Sik and Eszter Katona: The asymmetries of the biopsychosocial model of depression in lay discourses – Topic modeling of online depression forums.

Our former publications on related topics

2021.04.14. Publication Discursive framing of depression in online health communities

Sik, Domonkos: From mental disorders to social suffering: Making sense of depression for critical theories. EUROPEAN JOURNAL OF SOCIAL THEORY (2018)

Sik, Domonkos: Válaszok a szenvedésre: A hálózati szolidaritás elmélete [Responses to suffering: A theory of network solidarity]. Budapest, Hungary: ELTE Eötvös Kiadó (2018), 228 p. (in Hung.)

Sik, Domonkos: A szenvedés határállapotai: Egy kritikai hálózatelmélet vázlata [Borderline states of suffering: Outline of a critical network theory]. Budapest, Hungary: ELTE Eötvös Kiadó (2018), 246 p. (in Hung.)

Deckovic-Dukres, V., Hrkal, J., Németh, R., Vitrai, J., Zach, H.: Inequalities in health system responsiveness. Joint World Health Survey Report Based on Data from Selected Central European Countries, 2007. Report commissioned by the WHO.

Remák, E., Gál, R.I., Németh, R.: Health and morbidity in the accession countries. Country report – Hungary. ENEPRI Research Reports 28, Brussels: ENEPRI, 2006.

Albert, F., Dávid, B., Németh, R.: Social support, social cohesion. In: National Health Interview Survey 2003, Research Report, 2005. (in Hung.)

(In Hungarian: Albert Fruzsina, Dávid Beáta, Németh Renáta: Társas támogatottság, társadalmi kohézió. In: Országos Lakossági Egészségfelmérés OLEF2003, Kutatási Jelentés, 2005.)

Sik, Domonkos (2020): From Lay Depression Narratives to Secular Ritual Healing: An Online Ethnography of Mental Health Forums

2020.12.29. Publication Discursive framing of depression in online health communities

The article analyses online depression forums, which enable lay reinterpretation and criticism of expert biomedical discourses. First, two contrasting interpretations of depression are reconstructed: expert psy-discourses are confronted with phenomenological descriptions of lay experiences, with special emphasis on online forums as empirical platforms hosting such debates. After clarifying the general theoretical stakes of contested ‘depression narratives’, the results of an online ethnography are introduced: the main topics appearing in online discussions are summarised (analysing how the abstract tensions between lay and expert discourses appear in actual discussions), along with the ideal-typical discursive logics (analysing pragmatic advice, attempts at reframing self-narratives and expressions of unconditional recognition). Finally, based on these analyses, the article explores the latent functionality of online depression forums with reference to a secular ‘ritual healing’ that exists as an unreflected, contingent potential.

Renáta Németh, Domonkos Sik, Fanni Máté. 2020. “Machine learning of concepts hard even for humans: the case of online depression forums”. International Journal of Qualitative Methods

2020.08.25. Publication Discursive framing of depression in online health communities

Social scientists doing mixed-methods research have traditionally used human annotators to classify texts according to some predefined knowledge. The ‘big data’ revolution, with the rapid growth of digitized texts in recent years, brings new opportunities but also new challenges. In our research project, we aim to examine the potential of natural language processing (NLP) techniques to understand the individual framing of depression in online forums. In this paper, we introduce one part of this project: an experiment with an NLP classification (supervised machine learning) method capable of classifying large digital corpora according to various discourses on depression. Our question was whether an automated method can be applied to sociological problems outside the scope of hermeneutically more trivial business applications.

The present article describes our learning path from the difficulties of human annotation to the hermeneutic limitations of algorithmic NLP methods. We faced our first failure when we experienced significant inter-annotator disagreement. In response, we moved to a strategy of intersubjective hermeneutics (interpretation through consensus). The second failure arose because we expected the machine to learn effectively from the human-annotated sample despite its hermeneutic limitations. Machine learning seemed to work appropriately in predicting biomedical and psychological framing, but it failed in the case of sociological framing. These results show that the sociological discourse about depression is not as well founded as the biomedical and psychological discourses, a conclusion that requires further empirical study. An increasing share of machine learning solutions is based on human annotation of semantic interpretation tasks, and such human-machine interactions will probably define many more applications in the future. Our paper shows the hermeneutic limitations of ‘big data’ text analytics in the social sciences and highlights the need for a better understanding of the use of annotated textual data and of the annotation process itself.

The supplementary material of this article can be found here.
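
The abstract above does not say how inter-annotator disagreement was measured. Cohen's kappa is one common choice for two annotators, and the sketch below assumes it; the bio/psycho/social labels assigned to six posts are hypothetical. Kappa compares the observed agreement with the agreement expected by chance from each annotator's label frequencies, so values near 0 mean chance-level agreement and 1 means perfect agreement.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement implied by each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1:
        return 1.0  # convention: both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical framing labels given by two annotators to the same six posts.
annotator_1 = ["bio", "bio", "psycho", "social", "psycho", "social"]
annotator_2 = ["bio", "psycho", "psycho", "social", "bio", "psycho"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # prints 0.25
```

Here the annotators agree on half of the posts, while chance alone would produce agreement on a third of them, so kappa is only about 0.25.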

Barna, Ildikó, and Árpád Knap. 2020. “A Case Study of Using LDA Topic Modeling in Sociological Research – Antisemitism in Contemporary Hungary”. Presentation, Institute of Formal and Applied Linguistics, Charles University, Prague, Czech Republic.

2020.01.20. Presentation Online Antisemitism

Ildikó Barna, co-leader of our research group, gave a presentation on contemporary Hungarian antisemitism at the Institute of Formal and Applied Linguistics of Charles University, Prague. The presentation was based on the Online Antisemitism project conducted with Árpád Knap. In addition to presenting the results of the research so far, she also discussed in her lecture why sociological and domain knowledge is indispensable for interpreting the output of natural language processing.

Further information about the lecture is available on the university’s website. The video recording of the presentation can be accessed via this link.

The post about the presentation can be found here on our website.

Koltai, Júlia – Kmetty, Zoltán – Bozsonyi, Károly (2019) From Durkheim to machine learning – finding the relevant sociological content in a social media discourse. In: Rudas, Tamás – Péli, Gábor (eds.) Pathways Between Social Science and Computational Social Science – Theories, Methods and Interpretations. New York, NY, Springer. (forthcoming)

2019.12.15. Publication Data Science in Social Research

The phenomenon of suicide has been a focus of social scientists since Durkheim. The internet and social media sites provide new ways for people to express their positive feelings, but they are also platforms for expressing suicidal ideation or depressed thoughts. Most of this content does not consist of notes about actual suicides, but some of it is a cry for help. Moreover, suicide- and depression-related content varies across platforms, and it is not evident how a researcher can find such content in the mass data of social media. Our paper uses a corpus of more than 4 million Instagram posts related to mental health problems. After defining the initial corpus, we present two different strategies for finding the relevant sociological content in the noisy environment of social media. The first approach starts with topic modelling (Latent Dirichlet Allocation), whose output serves as the basis of a supervised classification method built on advanced machine learning techniques. The other strategy is built on a word embedding language model based on an artificial neural network.

Németh, Renáta; Koltai, Júlia (2019): Sociological knowledge discovery through text analytics. In: Rudas, Tamás – Péli, Gábor (eds.) Pathways Between Social Science and Computational Social Science – Theories, Methods and Interpretations. New York, NY, Springer. (forthcoming)

2019.12.01. Publication Data Science in Social Research

In our work, based on recent research reports, we discuss the advances, challenges and opportunities of Big Data text analytics in sociology. The advances include the use of developments in information technology, data science, AI and NLP that were originally and primarily business- and technology-oriented, as well as the rapid growth of computing capacity. These advances provide opportunities. Social behavior can be observed directly, not only on a self-reported basis. Observation and analysis can happen in real time, and, thanks to the development of NLP methods, the understanding of the content is getting deeper.

As our paper shows, there are new possibilities for sociological research which are, in some sense, just a byproduct of information science. We introduce recently developed methods which can be applied to specific sociological problems outside the scope of business applications. We present sociological topics not yet studied in this area and show new insights the approach can offer on classical sociological questions. As our aim is to encourage sociologists to enter this field, we discuss the new methods on the basis of the classic quantitative approach, using its concepts and terminology, and also address the question of the new skills required of traditionally trained sociologists.

Barna, Ildikó, and Árpád Knap. 2019. “New Ways of Scrutinising Overt and Subtle Antisemitism in Hungary”. In 14th ESA Conference – Abstract book: Europe and Beyond: Boundaries, Barriers and Belonging, 880. Manchester, United Kingdom: European Sociological Association

2019.08.21. Presentation Online Antisemitism

The level of antisemitism in Hungary has always been among the highest in Europe. Representative surveys show that approximately 33 to 40 per cent of the Hungarian population is antisemitic. Although there has been some fluctuation, the level of antisemitism has remained quite stable. Moreover, based on representative surveys among Hungarian Jews, we found that although the proportion of those who had experienced or witnessed antisemitic acts in the year prior to the survey decreased sharply from 79 to 58 per cent between 1999 and 2017, the perception of antisemitism deteriorated severely. While in 1999, 37 per cent of Jews thought that antisemitism was strong or very strong in Hungary, in 2017 65 per cent said the same. This large discrepancy between experience and perception is due to several factors, one of them being the spread of online hatred, which makes the analysis of online sources necessary. Given the vast amount of unstructured online textual data, examining it demands new tools, one of them being Natural Language Processing (NLP). NLP is an interdisciplinary field of research at the intersection of computer science, artificial intelligence and linguistics. In our research, we apply NLP to a massive corpus of recent Hungarian news articles, social media content and online forum comments. NLP makes possible not only the examination of the structure, main topics and actors of overt antisemitism but also the identification of the underlying subjects and specificities of latent antisemitism. In our paper, we present the first results of our research.

The post about the conference can be found here on our website.

Bio, psycho or social – Discursive framing of depression in online health communities – IC2S2, 5th International Conference on Computational Social Science, Amsterdam, 2019

2019.07.17. Presentation Discursive framing of depression in online health communities

In our research we aimed at gathering online forum posts and automatically classifying them into three framing types (bio, psycho and social) by applying different supervised learning algorithms. As our dataset, we decided to use depression-related posts from the most popular English-language health forums from the period 2016-2018. We collected only publicly available posts, shared willingly by their authors. We used Python to implement our analyses. After pre-processing and feature extraction, the scikit-learn library was used with different algorithms (SVM, Naive Bayes, Logistic Regression and Decision Trees). Our poster can be downloaded here.
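
The classification step above used scikit-learn with four algorithms. As a dependency-free illustration of just one of them, the sketch below implements multinomial Naive Bayes over bag-of-words counts with add-one (Laplace) smoothing; scikit-learn's `MultinomialNB` covers the same model. The regex tokenizer and the six labelled training posts are hypothetical stand-ins, not the project's data or pre-processing.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphabetic runs: a stand-in for real pre-processing.
    return re.findall(r"[a-z]+", text.lower())


class MultinomialNaiveBayes:
    """Bag-of-words multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # label -> token frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # log P(label) + sum of log P(token | label) over known tokens
            score = math.log(n / n_docs)
            denom = self.total_words[label] + self.alpha * v
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens unseen in training are skipped
                    count = self.word_counts[label][tok]
                    score += math.log((count + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labelled posts, two per framing type.
train = [
    ("my doctor changed my medication and the serotonin levels", "bio"),
    ("antidepressant dose and brain chemistry", "bio"),
    ("therapy helped me change negative thoughts", "psycho"),
    ("cognitive therapy and my childhood trauma", "psycho"),
    ("lost my job and feel isolated from society", "social"),
    ("poverty and loneliness in my community", "social"),
]
model = MultinomialNaiveBayes().fit([t for t, _ in train], [y for _, y in train])
print(model.predict("the medication affects brain chemistry"))  # prints bio
print(model.predict("therapy for negative thoughts"))  # prints psycho
print(model.predict("isolated and lonely in my community"))  # prints social
```

On real forum data one would also hold out a test set and compare the algorithms on it; with two posts per class, these predictions only show the mechanics.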

Related publications

2019.07.01. Publication Corruption in Online Editorial Media

Kostadinova, Tatiana; Kmetty, Zoltán: Corruption and Political Participation in Hungary: Testing Models of Civic Engagement. EAST EUROPEAN POLITICS AND SOCIETIES Online first p. 1 (2019)

Kmetty, Zoltán: Incumbent party support and perceptions of corruption – an experimental study. SZOCIOLÓGIAI SZEMLE 28 : 4 pp. 152-165., 14 p. (2018)

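Kmetty, Zoltán: Korrupció percepciója, pártosság, választási részvétel: Hogyan változott a szavazók véleménye a hazai politikai korrupcióról a 2014-2018-as parlamenti ciklus alatt? [Perception of corruption, partisanship and electoral participation: How did voters' views on domestic political corruption change during the 2014-2018 parliamentary term?] In: Böcskei, Balázs; Szabó, Andrea (eds.) Várakozások és valóságok. Parlamenti választás 2018. [Expectations and realities: The 2018 parliamentary election] Budapest, Hungary: Napvilág Kiadó, MTA TK PTI (2018), pp. 292-316., 25 p.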

Previous publications

2019.07.01. Publication Corruption in Online Editorial Media

Eszter Katona’s presentation

Eszter Katona gave a presentation entitled ‘Natural Language Processing in Social Sciences’ on 10 May 2019 in Basel, at the Joint Annual Conference of the GPSA Methods of Political Science Section and the SPSA Empirical Methodology Working Group (https://www.methodology-dvpw-svpw.com/). She presented the results of the paper she wrote together with Renáta Németh and Zoltán Kmetty. The study is currently under review in a Hungarian sociology journal.

Katona, Eszter Rita; Németh, Renáta; Kmetty, Zoltán (2019): Natural Language Processing in Social Sciences. Joint Annual Conference of the GPSA Methods of Political Science Section and the SPSA Empirical Methodology Working Group, May 10-11, 2019.

Katona, Eszter (2019): MSc thesis, 2018, ELTE, Survey Statistics MSc

Previous publications in epistemology/sociology of science

2019.06.26. Publication Data Science in Social Research
  • Katona, Eszter, Németh, Renáta, Kmetty, Zoltán: Text analytics in social sciences – An example for NLP’s application (in Hung., submitted)
  • Bárdits, Anna, Németh, Renáta (2017): The rite of statistical significance testing – contemporary critics; the rite in sociology. Szociológiai Szemle, 27:(1) pp. 119-125. (in Hung.)
  • Bárdits, Anna, Németh, Renáta, Terplán, Győző (2016): An old problem in the spotlight again. The mistaken practice of the null-hypothesis significance test. (in Hung.) Statistical Review, 94:(1) pp. 52-75.
  • Németh, Renáta (2015): Causal inference in empirical sociological research. Szociológiai Szemle, 25(2), pp. 2-30. (in Hung.)
  • Németh, Renáta (2015): Do numbers really speak for themselves? Replika, Special issue on Big data and Sociology, 92-92, pp. 203-208. (in Hung.)
  • Németh, Renáta (2014): Methods of quantitative social research paradigms. socio.hu, 2014/3, pp. 1-16. (in Hung.)

Barna, Ildikó, and Árpád Knap. 2019. “Antisemitism in Contemporary Hungary: Exploring Topics of Antisemitism in the Far-Right Media Using Natural Language Processing”. Theo-Web 18 (1): 75–92.

2019.06.15. Publication Online Antisemitism

In this paper, we explore antisemitism in contemporary Hungary. After briefly introducing the different types of antisemitism, we show the results of a quantitative survey carried out in 2017 on a nationally representative sample. Next, we present the research we conducted on the articles related to Jews from the far-right site Kuruc.info. Our corpus contained 2,289 articles from the period between February 28, 2016, and March 20, 2019. To identify latent topics in the text, we employed one of the methods of Natural Language Processing (NLP), namely topic modeling using the LDA method. We extracted fifteen topics. We found that racial antisemitism, unmeasurable by survey research, is overtly present in the discourse of Kuruc.info. Moreover, we identified topics that were connected to other types of antisemitism.

Keywords: antisemitism, Hungary, Natural Language Processing, topic modeling, LDA
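
The study above names LDA but not the inference procedure or software used. The sketch below shows one standard way to fit LDA, collapsed Gibbs sampling, in plain Python. The hyperparameters `alpha` and `beta`, the iteration count and the toy English documents are illustrative assumptions; on the Kuruc.info corpus the corresponding setting would be `n_topics=15` on pre-processed Hungarian tokens.

```python
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling.

    docs: list of token lists. Returns (doc_topic, topic_word), where
    doc_topic[d][k] estimates the share of topic k in document d and
    topic_word[k] maps each word to its estimated probability in topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                       # token totals per topic
    z = []                                    # topic assignment of every token

    # Random initial assignment of each token to a topic.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w_index[w]] += 1
            nk[k] += 1
        z.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi, k = w_index[w], z[d][i]
                # Take the current token out of all counts.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalised.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1

    # Smoothed point estimates of theta and phi from the final sample.
    doc_topic = [
        [(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha) for k in range(n_topics)]
        for d, doc in enumerate(docs)
    ]
    topic_word = [
        {w: (nkw[k][w_index[w]] + beta) / (nk[k] + V * beta) for w in vocab}
        for k in range(n_topics)
    ]
    return doc_topic, topic_word


# Toy corpus with two obvious themes (finance vs. river).
docs = [
    "bank loan interest bank money".split(),
    "money interest loan credit bank".split(),
    "river water fish river boat".split(),
    "boat fish water river shore".split(),
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
for k, dist in enumerate(topic_word):
    top = sorted(dist, key=dist.get, reverse=True)[:3]
    print(f"topic {k}: {top}")
```

As in the study, the sampler only returns word distributions; naming a topic (for example as a type of antisemitism) remains an interpretive step for the researcher.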

Kmetty, Zoltán – Koltai, Júlia: Understanding Cultural Choices with NLP (2019). Presentation at the Data Science Meetup Budapest, May 9, 2019.

2019.05.09. Presentation Data Science in Social Research

In parallel with the rise of digital textual data, natural language processing methods have developed rapidly in the last decade. In our presentation, we will focus on word embedding methods based on artificial neural networks, which have become widespread in recent years. Different fields apply these methods: linguists for dictionary building, developers for music video recommendation systems, companies for the analysis of product reviews, etc. However, their application to understanding human behaviour and culture has so far been limited, even though the huge amount of available digital (textual) data provides a lot of information about our preferences, choices and the way we think. We will show several examples of the use of word embedding methods in this field. The presentation also provides details about the methodology, the problems to be solved and the directions of further development.
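
As a rough illustration of the dictionary-building use mentioned above, the sketch below expands a seed word list with its nearest neighbours in an embedding space, ranked by cosine similarity. The three-dimensional vectors are made-up toy values; in practice they would come from a trained model such as word2vec, and both `k` and the similarity `threshold` of 0.8 are arbitrary assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(word, embeddings, k=3):
    """Return the k (word, similarity) pairs closest to `word`."""
    target = embeddings[word]
    scored = [(other, cosine(target, vec))
              for other, vec in embeddings.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def expand_dictionary(seeds, embeddings, k=3, threshold=0.8):
    """Grow a seed lexicon with neighbours whose similarity reaches the threshold."""
    expanded = set(seeds)
    for seed in seeds:
        for other, sim in nearest(seed, embeddings, k):
            if sim >= threshold:
                expanded.add(other)
    return expanded


# Hypothetical toy vectors for words describing cultural choices.
embeddings = {
    "jazz": [0.9, 0.1, 0.0],
    "blues": [0.85, 0.15, 0.05],
    "swing": [0.8, 0.2, 0.1],
    "opera": [0.1, 0.9, 0.1],
    "ballet": [0.05, 0.85, 0.2],
    "football": [0.0, 0.1, 0.95],
}
print(sorted(expand_dictionary(["jazz"], embeddings)))  # prints ['blues', 'jazz', 'swing']
```

"opera" is among the three nearest neighbours of "jazz" here but falls far below the threshold, which is why a similarity cut-off is applied on top of the top-k list.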

Bartus, Tamás – Kisfalusi, Dorottya – Koltai, Júlia (2019) Logisztikus regressziós együtthatók összehasonlítása (The Comparison of Coefficients in Logistic Regression) In: Statisztikai Szemle (Hungarian Statistical Review) 97(3): 221-240.

2019.03.09. Publication Data Science in Social Research

Recently, increasing attention has been devoted to the problem that estimated coefficients of logistic (and other non-linear) regression models cannot be compared across groups, samples, or nested model specifications due to the possible differences in the magnitude of unobserved heterogeneity. This study reviews methods which aim to solve this problem and investigates their effectiveness through simulation. Parameter estimates of nested model specifications can be made comparable using y-standardization or by comparing the estimates of the multivariate model to the estimates of a special, quasi-univariate model. Methods which aim to make coefficients comparable across groups and samples (such as testing the proportionality of interaction effects and heterogeneous choice models), however, do not provide adequate solutions for the problem. Causes behind this failure are discussed. 
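
For readers unfamiliar with y-standardization, a common formulation rests on the latent-variable view of the logit model, in which the error variance is fixed at $\pi^2/3$. This is a general sketch, not necessarily the exact variant evaluated in the study:

```latex
y_i^{*} = \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i, \qquad
y_i = \begin{cases} 1 & \text{if } y_i^{*} > 0 \\ 0 & \text{otherwise} \end{cases}, \qquad
\operatorname{Var}(\varepsilon_i) = \frac{\pi^2}{3}

\widehat{\operatorname{Var}}(y^{*}) = \widehat{\operatorname{Var}}\!\left(\mathbf{x}_i'\hat{\boldsymbol{\beta}}\right) + \frac{\pi^2}{3},
\qquad
\hat{\beta}_k^{\,SY} = \frac{\hat{\beta}_k}{\sqrt{\widehat{\operatorname{Var}}(y^{*})}}
```

Because the error variance is fixed rather than estimated, adding a predictor to a nested model can rescale the other coefficients even when it is uncorrelated with them; dividing by the estimated standard deviation of $y^{*}$ puts the coefficients of nested specifications on a common scale.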

Ildikó Barna’s lecture “Overt and Subtle Antisemitism in Hungary” at the “Antisemitism, Anti-Zionism, Israel, and the Holocaust” workshop in Salzburg

2019.02.24. Presentation Online Antisemitism

Ildikó Barna gave a presentation entitled “Overt and Subtle Antisemitism in Hungary” at the “Antisemitism, Anti-Zionism, Israel, and the Holocaust” workshop in Salzburg on 23 February 2019, discussing the methods and challenges of measuring antisemitism. Besides presenting the results of surveys on antisemitism in Hungary, she talked about new research opportunities offered by Natural Language Processing (NLP) methods and about the work of the research group.

Kmetty, Zoltán – Koltai, Júlia: Big data based decision making mechanisms from the viewpoint of social sciences. Presentation at ‘The Power of Big Data’, an event of HUB Design House, January 9, 2019.

2019.01.09. Presentation Data Science in Social Research

In our presentation, we discussed the possibilities of large-scale, data-based decision making, focusing on the dangers that arise when this type of decision making does not work properly. In this context, we emphasised the importance of interpretation and causality.