Krisztián Boros – Meta-analysis of missing data handling methods with text-mining

Krisztián Boros (LinkedIn; GitHub)

Survey Statistics and Data Analytics MSc, 2020

supervisor: Zoltán Kmetty

Missing data is ubiquitous in quantitative research. We may encounter missing data due to, for example, non-response, incorrect sampling, or data processing errors. During the past 50 years, researchers have developed a wide variety of missing data handling methods; the spectrum of available techniques extends from basic deletion methods (e.g. listwise and pairwise deletion) to more involved techniques (e.g. Multiple Imputation, the EM algorithm).

The aim of my thesis is twofold. On the one hand, I introduce a text-mining approach to collecting and analyzing papers, and I point out the advantages and disadvantages of this approach using the Total Survey Error Framework. On the other hand, I examine trends in the use of missing data handling methods across years and scientific fields.

The results show that the popularity of advanced techniques (e.g. Multiple Imputation, the EM algorithm) has been growing over the past 20 years, but basic techniques (e.g. deletion methods, mean imputation) are still in widespread use. As for the methodology, I point out several limitations of the text-mining approach, such as the questionable generalizability and reliability of the results.

The thesis can be accessed at this link.