AI & ML in Libraries Literacies

Contributed by: Nora McGregor, ORCID iD
Original published date: 04/06/2024
Last modified: See Github page history

Suggested Citation: Nora McGregor, “AI & ML in Libraries Literacies,” Digital Scholarship & Data Science Essentials for Library Professionals (2024), [DOI link tbd]

Introduction: AI & ML terms demystified

AI is mentioned absolutely everywhere these days: first it was just in movies and the news, but now it’s cropping up in our library meetings, strategies and funding calls. But what does it really mean, particularly in a library context? Let’s try to get to the bottom of this!

To do that I always like to start off with a bit of basic jargon busting.

Artificial Intelligence (AI) is actually a really broad field of computer science (and an umbrella term) referring to the research and development of systems and machines capable of performing tasks that typically require human intelligence, such as:

  • Reasoning
  • Problem-solving
  • Learning
  • Perception

Sometimes folks speak of AI as if these systems and machines actually have true intelligence, and though today’s AI systems are shockingly convincing in how well they perform, what we’re seeing today are just very advanced machine learning algorithms and models performing specific and discrete functions extremely well! We’re a long way off (if ever) from machines having sentience (or Artificial General Intelligence (AGI)/Strong AI), so don’t worry!

You might also sometimes hear people talk about Traditional AI vs Generative AI. Traditional AI refers to using machine learning based systems for tasks like classifying data (e.g., assigning labels to images, automatically transcribing handwritten texts, or identifying the genre of digitised texts). This is the type of AI we make a whole lot of use of in the library world. Generative AI, on the other hand, refers broadly to systems whose primary function is to generate new content (e.g., conversation, books, art). This is where conversation-generating AI systems like ChatGPT (Generative Pre-trained Transformer) fall, for example, and we’re only just now exploring the potential applications of these powerful new Generative AI systems in library work.

Whenever AI is being discussed you may often hear the term Machine Learning (ML) mentioned, and sometimes the two terms are used interchangeably, which can be confusing!

Machine Learning (ML) is, more specifically, a major subfield of AI, and the core technology underpinning the other subfields. It focuses on the development of algorithms and models that allow computers to learn patterns and relationships from data and make predictions on new data. Instead of being explicitly programmed for specific tasks, ML algorithms use data to learn and improve their performance over time.

The primary task of Machine Learning is prediction.

An algorithm is a plan, a set of step-by-step instructions, in order of operation, for solving a problem or performing a task. Making a sandwich is a classic example. “Get two pieces of bread…” If we want to tell a computer to do something, we have to write a computer program that will tell it, step-by-step, exactly what we want it to do and how we want it to do it.

A machine learning algorithm is a special kind of algorithm designed to help computers learn from data. Instead of giving the computer explicit instructions for every task, we give it data and let it find patterns, relationships, and trends in that data in order to make decisions or predictions on new data on its own.

“Training a model” is the process of teaching a machine learning algorithm to make predictions or decisions based on data. It’s important to remember that data is the lifeblood of ML and the model is only as good as the type and quality of data you give it.

A machine learning model represents what was learned by a machine learning algorithm, and this is what can be used to make predictions on new data without needing explicit instructions. The model contains the rules, numbers, and any other algorithm-specific data structures required to make predictions on new data. If it doesn’t work very well, you can go back and give the algorithm more data or tweak its parameters to create a new, better model.
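To make the relationship between algorithm, data, training and model a little more concrete, here is a minimal sketch in Python using scikit-learn (my own illustration, not taken from any particular library project): the algorithm learns from a handful of labelled titles, and the trained model then predicts a label for a title it has never seen.

```python
# A hedged, illustrative sketch: training a tiny "genre" classifier with
# scikit-learn. The algorithm (Naive Bayes over word counts) learns patterns
# from labelled examples; the result of training is the model, which can
# then make predictions on new, unseen data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# The training data: examples (titles) and the correct labels we provide.
titles = [
    "A history of the printing press",
    "The dragon's lost kingdom",
    "Statistical methods for archivists",
    "Murder at the midnight library",
]
labels = ["non-fiction", "fiction", "non-fiction", "fiction"]

# "Training a model": the algorithm learns patterns from the data.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(titles, labels)

# The trained model makes a prediction on data it has never seen before.
print(model.predict(["The wizard's secret journey"]))
```

With only four training examples the predictions will of course be unreliable; as noted above, the model is only as good as the type, quality (and quantity!) of data you give it.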

Here’s a super simplistic example of this type of process in action:

Let’s say I have thousands of images of handwritten manuscript pages digitised from our library collection and ready to go online. The problem is, I want to make them all text searchable as well, but to transcribe these all by hand would take me well into my retirement and I have other things to be getting on with. I would like to train a machine learning model to help me automatically recognise the handwritten text in this collection. I would first need to show my machine learning algorithm examples of a correct result so that it could start to recognise the pattern (in this case, text on a page). To do this I can show it images with associated text annotations which have been transcribed perfectly by me: the more, the better! Once the algorithm has learned a bit about what a correct page of handwritten text looks like, a model is created which contains all the rules, parameters and calculations ready for the particular task I have set it: predicting what handwritten text is on pages of this particular collection that it’s never seen before!

We actually do quite a lot of the above at the British Library; you can read more about the actual process and results here, and I’ll cover a bit more of this in the next section!

Snapshot of an interface from Transkribus software used to create data to train a model to recognise handwritten Arabic manuscript pages from a British Library collection.

Ok, quick quiz time!

Remembering that Machine Learning is about prediction, which of the following do you think would require ML?

  1. Counting the number of people in a museum using information from entry and exit barriers.
  2. A search system that looks for images similar to a user submitted sketch.
  3. A system that recommends library books based on what other users have ordered.
  4. A queueing system that spreads people evenly between 5 ticket booths.
  5. A program which extracts names from documents by finding all capitalised words and checking them against a list of known names.
  6. A system which turns digitised handwritten documents into searchable text.
  7. A robot that cleans the vases in a museum without bumping into them or breaking them.

If you answered 2, 3, 6 & 7 you are correct! The others could all be most easily executed with a straightforward algorithm, programmed using a simple set of easily defined rules, rather than requiring prediction.
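For contrast, here is a minimal sketch of quiz item 5 as a plain, rule-based algorithm in Python: no machine learning, no prediction, just explicit step-by-step instructions (the sample names and sentence are my own invented illustration).

```python
# Quiz item 5 as a simple rule-based algorithm: find capitalised words and
# check them against a list of known names. Every step is explicitly
# programmed, so no machine learning is required.
known_names = {"Ada", "Charles", "Mary"}

def extract_names(document):
    # Strip basic punctuation, split into words, keep capitalised known names.
    words = document.replace(",", " ").replace(".", " ").split()
    return [word for word in words if word[0].isupper() and word in known_names]

print(extract_names("Ada wrote to Charles about the Analytical Engine."))
# -> ['Ada', 'Charles']
```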

Relevance to the Library Sector (Case Studies/Use Cases)

As you now know, machine learning algorithms and models underpin all the other subfields of AI, and there are a LOT of those subfields depending on who you talk to! In this guide, though, we’ll focus our attention on just two particular AI research areas to give you a general sense of how machine learning is practically applied in the library context today:

  • Natural Language Processing (NLP): which is concerned with making AI systems more capable of natural and effective interaction with humans (text and speech)
  • Computer Vision (CV): which is concerned with enabling machines to interpret and make decisions based on visual data from the world.

For a really great overview of a wider range of AI use cases in libraries I recommend having a read of Section 3: Library Applications in Cox & Mazumdar’s Defining artificial intelligence for librarians (2022), which covers back-end operations and library services for users.

Let’s have a look at some NLP and Computer Vision use cases in a library setting:

Natural Language Processing (NLP)

NLP involves the development of a wide range of algorithms, models, and systems for analysing, understanding and extracting meaningful information from textual and speech data representing human language. We can use NLP for things like:

Automatic Language Detection & Genre Classification

The British Library has used different machine learning techniques and experiments derived from the field of NLP to assign language codes and genre classifications to catalogue records, in order to enhance the resources described. In the first phase of the Automated Language Identification of Bibliographic Resources project, language codes were assigned to 1.15 million records with 99.7% confidence. The automated language identification tools developed will be used to contribute to future enhancement of over 4 million legacy records. The genre classification case study includes a nice description of machine learning as well as references to other use cases for metadata clean-up.
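As a hedged illustration of the general idea (not the actual tools or workflow the British Library project used), a library such as langdetect can suggest a language code for the text in a catalogue record, which could then be reviewed and written back into the metadata:

```python
# A hedged sketch of automatic language detection using the langdetect
# library; an illustration of the concept only, not the British Library's
# own pipeline. Install with: pip install langdetect
from langdetect import detect

title = "Histoire de la Bibliothèque nationale"
print(detect(title))  # likely 'fr' (very short strings can be misdetected)
```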

Text & Conversation Generation

Language models are a type of machine learning model designed to predict the likelihood of a sequence of text, which means that they can be set up to predict the most likely way to continue a conversation. Large language models (LLMs), such as the models behind ChatGPT, are highly complex neural networks that have been exposed to an enormous amount of text from books, articles, websites, and more. They perform natural language processing tasks such as generating and classifying text, answering questions, and translating text, and are the backbone of NLP today. On the other hand there are small language models (SLMs) too which, well, you guessed it, are much smaller, as in, they don’t need quite as much data and time to be trained. Which to use depends on your use case and motivations!
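As a hedged illustration of what “predicting the likelihood of a sequence of text” looks like in practice, here is a sketch using GPT-2, a small, openly available language model on Hugging Face (not one of the models behind ChatGPT):

```python
# A hedged sketch: a small open language model (GPT-2) predicting a likely
# continuation of a prompt. ChatGPT's models are far larger and further
# trained, but the underlying idea of predicting what text comes next is
# the same. Install with: pip install transformers torch
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("The library's special collections include", max_new_tokens=20)
print(result[0]["generated_text"])
```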

Large language models have actually been around for many years, but ChatGPT has caught the popular imagination because of its publicly available interface and remarkable performance, so it’s worth spending a little time here unpacking just what ChatGPT is and its potential impact on library work.

The large language models behind ChatGPT have learned something about patterns in grammar and word meaning, including the way that meaning arises contextually across multiple sentences and multiple turns in a conversation. When you ask ChatGPT a question, you are presenting the model with new information on which it tries to make a prediction; in this case, it tries to generate a response that matches the pattern of conversation.

You ask questions or give prompts to the model, and it provides responses in natural language, or rather, estimates what should come next in a conversation. When ChatGPT gives a response, it isn’t actually looking up information and then composing that information into a response; it’s just making an estimation of a response based on patterns it has seen. So when you ask it factual questions, especially ones with common answers or phrases, it might give you an answer that sounds right, but remember that this is because it’s mimicking what it has seen in its training data.
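If you want to experiment with this programmatically rather than through the web interface used in the hands-on activity later in this guide, here is a hedged sketch using the openai Python package; the model name is an example only (check the current documentation for what is available) and an API key is required:

```python
# A hedged sketch of asking a ChatGPT-style model a question from code.
# Remember: the reply is a prediction of a plausible response, not a
# looked-up fact. Install with: pip install openai
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; substitute a current one
    messages=[
        {"role": "user", "content": "Suggest a title for an exhibition on medieval maps."}
    ],
)
print(response.choices[0].message.content)
```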

Librarians are still investigating use cases for these new Generative AI applications, but for now at least, ChatGPT is certainly useful as a personal writing assistant or a tool to help give you ideas for

  • creating a title for a new exhibition
  • creating exhibition labels
  • outlining a basic structure for an information literacy workshop
  • creating a blog post on a topic with which you are very familiar
  • helping you reword something for different audiences
  • writing a funding proposal!

It’s also good to get in the habit of trying out and being aware of how these particular models work, as more and more library users will be using this technology too, and may not have a clear understanding of what’s behind the responses they generate. We’ve seen librarians having to answer queries about citations that have been made up by ChatGPT: article references which sound very much like they exist, but have just been hallucinated by the model!

Computer Vision (CV) use cases

We can use computer vision to train models to automatically analyse and understand useful information from images and videos.

In the library world we can use this to label millions of images with descriptive metadata (“this is a picture of a cat”), or a model can be trained to classify an image as a newspaper based on objects identified in the layout (for example, a nameplate for the newspaper, a headline, photographs, illustrations and so on). The model learns how to identify that a page is from the New York Times by learning from other newspaper images it’s seen (for example, if given NY Tribune, NY Times, and NY Post images, it can distinguish between the various titles).
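As a hedged sketch of the simplest version of this (descriptive labelling, rather than the newspaper layout example above), a general pre-trained image classification model can suggest labels for a digitised image; the file name below is just a placeholder:

```python
# A hedged sketch: using a general pre-trained image classification model to
# suggest descriptive labels for an image. An illustration of the concept,
# not the exact newspaper-classification workflow described above.
# Install with: pip install transformers torch pillow
from transformers import pipeline

classifier = pipeline("image-classification")  # downloads a default pre-trained model
for prediction in classifier("digitised_page.jpg")[:3]:  # placeholder file path
    print(prediction["label"], round(prediction["score"], 3))
```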

Putting it all together: ML + CV + NLP

One of the state-of-the-art applications of machine learning seen in cultural heritage at the moment is Handwritten Text Recognition (HTR), which we looked at briefly earlier in our simple example of training a model. The idea with HTR is to convert digitised handwritten documents into searchable and machine-readable text. To achieve this, HTR actually uses a combination of Computer Vision (CV) and Natural Language Processing (NLP).

Since handwriting can be tricky and ambiguous, you might have one Computer Vision model try to identify possible letters from the shapes, and another to work out what the most likely word is from those shapes. But let’s imagine that there’s a smudge on the page, and the letters and maybe even whole words are completely illegible. In that case you might turn to your NLP language models, which make sentence-level predictions: taking into account the words in the whole line of text, the model uses that context to work out which words are most likely missing in those smudged spots!
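One openly available example of this CV + language-model combination is TrOCR on Hugging Face, which pairs a vision encoder (reading the shapes on the page) with a text decoder (predicting the most likely characters and words). Here is a hedged sketch; the image path is a placeholder and this is not the Transkribus workflow mentioned below:

```python
# A hedged sketch of handwritten text recognition with TrOCR, which combines
# a Computer Vision encoder with a language-model decoder.
# Install with: pip install transformers torch pillow
from PIL import Image
from transformers import TrOCRProcessor, VisionEncoderDecoderModel

processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")
model = VisionEncoderDecoderModel.from_pretrained("microsoft/trocr-base-handwritten")

# A digitised image of a single handwritten line works best (placeholder path).
image = Image.open("handwritten_line.jpg").convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values

generated_ids = model.generate(pixel_values)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```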

Sometimes a model trained for a particular task (in the case of this HTR example, identifying a particular handwriting style) can be applied to similar content (other handwriting styles) with very good results. Transkribus has Public AI Models (transkribus.org) that have been created by users of the system and then shared so they can be reused by anyone.

Hands-on activity and other self-guided tutorial(s)

The following quick little hands-on activities below were developed by the Digital Research Team for British Library staff as part of the Digital Scholarship Training Programme.

Activity: Explore Natural Language Processing

Copy and paste a paragraph of text from somewhere around the web, or from your own collections, and see how each of these cloud services handles it:

Activity: Explore Conversation Generation (ChatGPT)

Login to use the freely available ChatGPT (openai.com) interface.

To get a useful response from ChatGPT, “prompting” is key. If you only ask a simple question, you may not be happy with the results and decide to dismiss the technology too quickly, but today’s purpose is to have a deeper play in order to develop our critical thinking and information evaluation skills, allowing us to make informed decisions about utilising tools like ChatGPT in our endeavours. Basics of Prompting | Prompt Engineering Guide (promptingguide.ai) gives a nice quick walk-through of how to start writing good prompts, or you can take a free course!

Have a play trying to get ChatGPT to generate responses to some of the questions here (or come up with your own questions!). Critically evaluate the responses you receive from ChatGPT: what are its strengths and weaknesses, and what are the ethical considerations and challenges of using AI tools such as this?

  • Is the information/response credible?
  • Are there any biases in the responses?
  • Does the information align with what you know from other sources?

Activity: Explore Computer Vision & Handwritten Text Recognition

Find an image from somewhere on the web, or from your own collection, and see how each of these cloud services handles it! Try with some images of basic objects to see results (cars, fruit, bicycles…) and images with text within them.

Activities: Exploring Hands-on AI (workshop materials)

The following workshop, Exploring Hands-on AI, was delivered to LIBER colleagues at the LIBER 2024 Annual Conference, but you can walk through many of the exercises at your own pace independently! Through a series of hands-on exercises, presented via Google Colab notebooks, the workshop aimed to clarify how data science, and more particularly generative AI systems based on Large Language Models (LLMs), can be applied within a library context. It covers:

  • Google Colab Basics
  • Introduction to Machine Learning
  • Object detection with YOLO (see the sketch after this list)
  • Large Language Models
  • Retrieval Augmented Generation
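As a small taster of the object detection topic, here is a hedged sketch using the ultralytics package and a small pre-trained YOLO model; the image path is a placeholder and the workshop’s own Colab notebooks may approach this differently:

```python
# A hedged sketch of object detection with a pre-trained YOLO model.
# Install with: pip install ultralytics
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small, general-purpose pre-trained model
results = model("reading_room.jpg")  # placeholder image path

# Print each detected object's label and the model's confidence in it.
for box in results[0].boxes:
    print(results[0].names[int(box.cls)], float(box.conf))
```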

Taking the next step

The AI4LAM group is an excellent, engaged and welcoming international organisation dedicated to all things AI in Libraries, Archives and Museums. It’s free for anyone to join and is a great first step for anyone interested in learning more about this topic!