Data Mining

Data mining aims to uncover structure and patterns in datasets.

Personally, I am most interested in mining academic literature to gain insights into the state of a particular field.

Mining the Bundestag, 22 Jan. 2023 (posts)
Did you know the German parliament publishes protocols for all of its proceedings in PDF format? It is relatively straightforward to download and parse them, so we can easily collect a dataset of transcripts of what seems to be every speech in the Bundestag since the Second World War. My original idea was to mine the speeches for word associations. Some words will be associated with other words based on the intended connotation, and this association might change over time as the connotations …
Categories: Data Mining
1024 Words, Tagged with: Bundestag · Data Mining · Generative Models
Mining tagesschau.de, 26 Nov. 2022 (posts)
I like to read tagesschau.de, so I wrote a script to scrape it at regular intervals. My original goal was to determine which articles stay on the front page the longest, which ones allow commenting (a feature that seems to have been disabled almost entirely since March 2020), and whether articles are modified after the initial release (without mentioning this), because I sometimes feel that headlines change. Dataset Creation: Tagesschau provides a JSON API, so fetching all of the articles is …
Categories: Data Mining
1040 Words, Tagged with: Tagesschau · Generative Models · Data Mining
Social Work Research Map, 11 Nov. 2022 (papers)
Over the past few weeks, I worked with some colleagues on a website that aims to improve access to social work literature. We described the results in our paper Social Work Research Map – ein niederschwelliger Zugang zu internationalen Publikationen der Sozialen Arbeit, which has been published in the journal Soziale Passagen. While the paper is written in German, there is also a technical report in English. Abstract: Internationalization is a central topic in higher education policy in Germany. An …
Categories: Data Mining
280 Words, Tagged with: Soziale Passagen · SWORM · Data Mining
Data-Mining als Werkzeug empirischer Sozialforschung, 13 Jul. 2020 (papers)
Inspired by the “Spiegel-Mining” talk by David Kriesel, a friend of mine and a professor from the Hochschule Magdeburg scraped a German website that regularly publishes reviews of social work literature, and mined the resulting 18,000 articles, hoping for interesting insights. In an attempt to visualize the discourse, we created several topic maps, like the one below, which you can find on the accompanying (German) website. The colours represent the gender of the authors of the review. …
Categories: Data Mining
150 Words, Tagged with: Sozial Extra · Data Mining