Getting Started in DS

Authors

Eleonora Gandolfi

Nora McGregor

Published

June 4, 2024

Modified

February 10, 2025

Abstract

This guide aims to define Digital Scholarship & Data Science in a research library context and to provide an overview of the key competency frameworks, reports, networks and communities of practice for library professionals who want to explore and learn more.


Introduction

In this Topic Guide we look at broad definitions of digital scholarship & data science in a research library context. We also recommend resources for library professionals who want to learn more about how new technologies and approaches are changing our traditional work of curation, creation, collecting and sharing at cultural heritage institutions, and about the benefits of gaining new skills in this area!

What is “digital scholarship”?

There are many definitions of what constitutes “digital scholarship”, but we tend to favour the broadest interpretation: roughly, any type of innovative research that combines methodologies from traditional humanities and social science disciplines with the computational tools and digital methods of computing disciplines such as data science.

Though closely aligned to, and generously informed by, the academic discipline of Digital Humanities, “digital scholarship” allows us to consider more broadly the full range of innovative scholarly activities our users seek to undertake with our digital collections and data, across a diverse range of disciplines.

“Digital scholarship” also defines a space for us as heritage professionals: research that we undertake in our own right during our daily work, using computational methods to curate, create, collect and share our digital collections and data, without being confined to formal academic pursuits or a particular discipline.

Is this all just “data science”?

Not quite. Data science, also an academic field, can be defined as a set of computational methods for identifying novel and actionable insights from data.

More fully, data science is an interdisciplinary field that uses statistics, scientific computing, scientific methods, processes, algorithms and systems to extract or extrapolate knowledge and insights from noisy, structured and unstructured data.

So when we talk about data science in libraries, we mean the specific skills and computational methods used to carry out certain types of digital scholarship activity.

Relevance to the Library Sector (Case Studies/Use Cases)

So what might “digital scholarship and data science in libraries” actually look like?

  • a subject librarian using a digital tool to clean up a set of catalogue records in order to understand gaps in the metadata and collection scope
  • a collaborative project to automatically transcribe handwritten texts from old manuscripts
  • a library director reading up on the latest in AI and seeking expert perspectives in order to write a strategy document
  • a metadata specialist packaging up digital collections into datasets that can be used by researchers
  • a digitisation project looking to improve the searchability of printed books in under-resourced languages
  • a reference librarian pointing a researcher to datasets that might help them with their research enquiry
  • an imaging technician creating a 3D model of a collection item
  • a licensing manager keeping up to date on the latest uses of Text and Data Mining (TDM) and AI in research so that digital collections meet the needs of library users and staff who want to work with them at scale
  • a major interdisciplinary research project using the latest technologies to ask research questions of digital heritage collections (e.g. Living with Machines, https://livingwithmachines.ac.uk/)
  • a curator creating an online exhibit with annotations (e.g. https://www.exhibit.so/)
  • a research software engineer contributing to international knowledge exchange networks (e.g. IIIF, AI4LAM)
  • an assistant librarian attending a summer school (e.g. https://www.cdh.cam.ac.uk/dataschools/)
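To make the first of these examples a little more concrete, here is a minimal sketch in Python of the kind of check a subject librarian might run to spot gaps in catalogue metadata. The records, the field names and the `missing_field_counts` helper are all hypothetical; a real workflow would load records from a catalogue export (for example CSV or MARC) rather than define them inline.

```python
from collections import Counter

# Hypothetical sample catalogue records (illustration only); in practice these
# would be loaded from a catalogue export such as a CSV or MARC file.
records = [
    {"title": "A History of Printing", "creator": "Smith, J.", "date": "1852", "language": "eng"},
    {"title": "Poems", "creator": "", "date": "1790", "language": "eng"},
    {"title": "Untitled manuscript", "creator": None, "date": "", "language": "lat"},
    {"title": "Field notes", "creator": "Jones, M.", "date": "1910"},
]

# Hypothetical set of fields we expect every record to have.
FIELDS = ["title", "creator", "date", "language"]


def missing_field_counts(records, fields):
    """Count how many records leave each field absent, None or blank."""
    counts = Counter({field: 0 for field in fields})
    for record in records:
        for field in fields:
            value = record.get(field)  # absent keys come back as None
            if value is None or str(value).strip() == "":
                counts[field] += 1
    return counts


counts = missing_field_counts(records, FIELDS)

# Report each field's gap as a count and a share of all records.
for field in FIELDS:
    share = counts[field] / len(records)
    print(f"{field}: {counts[field]} of {len(records)} records missing ({share:.0%})")
```

With the sample records above, the report includes the line `creator: 2 of 4 records missing (50%)`, pointing to where metadata enrichment might be most useful. Treating blank strings the same as absent fields is a deliberate choice here, since both leave a gap for users searching the catalogue.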

Why is it important for library staff to learn these skills?

In the recommended reading section of this Topic Guide we link to a number of key competency and skills reports and frameworks that set out the need for such skills in library work. At the highest level, though, we know that:

  • Libraries need to keep pace with this digital turn in order to understand how service requirements are changing, and to support colleagues (and each other) who are keen to make the most of it.
  • Digital scholarship work is collaborative and requires input from across disciplines and areas of domain expertise; our curatorial experts have an essential role to play in that.
  • We have much to gain from understanding digital methods and collaborating more closely with digital scholars: there is a synergy in solving shared issues (e.g. correcting OCR, enriching collections metadata, tackling back-cataloguing).
  • Digital scholars are already using technology in innovative ways, and their expectations have changed: they are seeking access to our collections at scale for computational analysis, and using Generative AI to ask their research questions. We need to understand these technologies to understand how the nature of archival enquiry is changing.
  • Cultural heritage digital collections are only going to grow, and we need the digital skills to work confidently with them at scale.

Hands-on activity and other self-guided tutorial(s)

Each Topic Guide on this site includes a range of specific hands-on activities and other self-guided tutorials that colleagues across Europe personally recommend. When you’re ready to go further and have a better idea of the specific skills you need for a particular task, we recommend searching these excellent platforms, which host or link to a great many in-depth training materials:

AI4Culture https://ai4culture.eu/

AI4Culture is a capacity-building platform for the application of AI in the cultural heritage sector, co-funded by the European Union under the Digital Europe Programme. Its aim is to equip professionals, researchers and enthusiasts within the sector with the resources they need to integrate AI into their daily workflows, find creative ways to use it and solve their current problems. The platform hosts a pool of readily deployed AI software tools, along with training and testing datasets that have been curated for use within the sector.

CLARIN Learning Resources https://www.clarin.eu/content/learning-and-training-resources

The CLARIN Learning Hub gives access to open educational resources on various topics of relevance to digital scholarship and data science, including full online training modules to learn these new skills.

DARIAH-Campus https://campus.dariah.eu/

DARIAH is a pan-European infrastructure for arts and humanities scholars working with computational methods, supporting both digital research and the teaching of digital research methods. Though not specific to library work, the tutorials here are useful for applying these techniques to digital collections.

The GLAM Workbench https://glam-workbench.net/

The GLAM Workbench is the brainchild of Tim Sherratt, a historian, and is a collection of Jupyter notebooks to help you explore and use data from GLAM institutions (galleries, libraries, archives, and museums). It includes tools, tutorials, examples, hacks, and even some pre-harvested datasets. It’s aimed at researchers in the humanities but has useful tutorials for anyone interested in working with GLAM data.

Ineo https://www.ineo.tools/

Ineo is a project developed and maintained by CLARIAH that lets you search, browse, find and select digital resources for research in the humanities and social sciences. It offers access to thousands of tools, datasets, workflows, standards and learning materials. It is a work in progress, so do keep that in mind when browsing.

Library Carpentry https://librarycarpentry.org/

Library Carpentry is an international volunteer community, under the Carpentries, focused on building software and data skills within library and information-related communities. The lessons are designed to be taught as workshops led by a Carpentries-certified instructor (for a fee), but you may find it useful to read through the content, which is open and available to all.

The Programming Historian https://programminghistorian.org/en/

The Programming Historian has been publishing peer-reviewed tutorials on digital tools and techniques for humanists since 2008, and though they’re generally aimed at academic researchers, staff at the British Library have found them highly useful in their own work over the years!

Social Sciences & Humanities Open Marketplace https://marketplace.sshopencloud.eu/search?order=score&categories=training-material

Built as part of the Social Sciences and Humanities Open Cloud project (SSHOC), the Social Sciences and Humanities Open Marketplace is a discovery portal which pools and contextualises resources for Social Sciences and Humanities research communities: tools, services, training materials, datasets, publications and workflows. The Marketplace highlights and showcases solutions and research practices for every step of the SSH research data life cycle.

Finding Communities of Practice

As you embark on learning more about digital scholarship and data science in a library context, you might want to explore and join existing communities of practice.

LIBER Working Groups

Working groups are open to staff at participating LIBER Member institutions:

International Networks

National Networks (European)

Ireland/UK - RLUK Digital Scholarship Network

Have a suggestion for improving this Topic Guide? We’d love to hear it! Suggest edits by opening a new Issue or adding to the discussion on existing Issues on the project GitHub. If you’re new to GitHub, don’t worry, we have a Topic Guide for that: GitHub: How to navigate and contribute to Git-based projects! Or just drop us a line!