A complete understanding of current health, demographic and social change requires incorporating the slow changes that occur across generations and the intergenerational transmission of biological and social traits. As a consequence, there is a growing demand for large-scale data containing individual-level information on life courses and family relationships going back several generations. Given the expense of building such databases, researchers exploit existing historical population registries, repurpose existing collections or, in a few cases, build brand new databases that harness the experience of the last 30 years in the field.
In this talk, the lecturer will present the Link-Lives project, which develops methods for a brand new nationwide historical population registry in Denmark covering the period 1787-1968 by integrating machine-readable historical datasets. As the current civil registration system started in 1968, connecting the two will make the reconstruction of up to 10 generations possible. She will present the unique challenges of combining datasets from different origins, some already created and some still in the making: ongoing crowdsourcing projects, genealogical collections from private companies, ongoing experiments with automatic handwriting recognition, and existing research databases, together rapidly amounting to a hundred million records. She will discuss the approaches developed to manage and link these historical records in a replicable, transparent, historically accurate and robust way, building on a combination of domain expertise and machine learning. Finally, Revuelta-Eugercios will discuss the research this new database currently makes possible within the constraints of the project, as well as the new avenues of research, from history to the social and health sciences, that will open up in the near future.