Summarization#

In this part of the course, we will use a language model to write summaries of some papers. Producing summaries of documents is known as summarization. Specialized summarization software exists, but general-purpose large language models have also become quite good at this task.

Again, we will use LangChain, an open-source library for building applications with LLMs.

Exercise: Create new notebook

Create a new Jupyter Notebook by opening the File menu in JupyterLab and selecting New, then Notebook. If you are asked to select a kernel, choose “Python 3”. Then give the notebook a name: open the File menu again, select Rename Notebook, and use the name summarizing.

Exercise: Stop old kernels

JupyterLab uses a separate Python kernel to execute the code in each notebook. To free up the GPU memory used in the previous chapter, you should stop that notebook’s kernel. In the menu on the left side of JupyterLab, click the dark circle with a white square in it. Then click KERNELS and Shut Down All.

Document Location#

We have collected some papers licensed under a Creative Commons license. We will try to load all the documents in the folder defined below. If you prefer, you can change this to a different folder.

document_folder = '/fp/projects01/ec443/documents/terrorism'

The Language Model#

We’ll use models from HuggingFace, a website that hosts tools and models for machine learning. We’ll use the open-weights LLM meta-llama/Llama-3.2-3B-Instruct. This model has a large context window of 128k tokens, which means that we can use it to process quite large documents. At the same time, it is small enough to run on the smallest GPUs on Fox. For better results, you might want to use one of the somewhat larger models with around 7B or 8B parameters, for example mistralai/Ministral-8B-Instruct-2410.

Tokens versus Words

Short words can be a single token, but longer words usually consist of multiple tokens. Therefore, the maximum document size with this model is less than 128k words. Exactly how words are converted to tokens depends on the tokenizer; LLMs usually come with a matching tokenizer. We will use the default tokenizer that ships with the LLM we use.
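
To see the difference between words and tokens in practice, you can load the model’s tokenizer and count tokens yourself. This is a minimal sketch: it assumes you have access to the model on HuggingFace, and the example sentence is made up. If you try it, run it after setting HF_HOME below, so the downloaded files end up in the project cache.

from transformers import AutoTokenizer

# Load the tokenizer that ships with the model we use below
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-3B-Instruct')

sentence = 'Summarization condenses long documents into short texts.'
tokens = tokenizer.tokenize(sentence)
print(len(sentence.split()), 'words became', len(tokens), 'tokens:', tokens)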

import os

# Store the HuggingFace cache in the shared project area instead of your home directory
os.environ['HF_HOME'] = '/fp/projects01/ec443/huggingface/cache/'

To use the model, we create a pipeline. A pipeline can consist of several processing steps, but in this case, we only need one step. We can use the method HuggingFacePipeline.from_model_id(), which automatically downloads the specified model from HuggingFace.

from langchain_community.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id='meta-llama/Llama-3.2-3B-Instruct',
    task='text-generation',
    device=0,
    pipeline_kwargs={
        'max_new_tokens': 1000,
        #'do_sample': True,
        #'temperature': 0.3,
        #'num_beams': 4,
    }
)

We can give some arguments to the pipeline:

  • model_id: the name of the model on HuggingFace

  • task: the task you want to use the model for

  • device: the hardware device to use; device 0 is the first GPU. If we don’t specify a device, the model runs on the CPU and no GPU is used.

  • pipeline_kwargs: additional parameters that are passed to the model.

    • max_new_tokens: maximum length of the generated text

    • do_sample: by default, the most likely next word is chosen. This makes the output deterministic. We can introduce some randomness by sampling among the most likely words instead.

    • temperature: the temperature controls the statistical distribution of the next word and is usually between 0 and 1. A low temperature makes the model prefer common, high-probability words; a high temperature makes rarer words more likely. Model makers often recommend a temperature setting, which we can use as a starting point (see the sketch after this list).

    • num_beams: by default, the model builds a single sequence of tokens/words. With beam search, it builds several candidate sequences in parallel and then selects the best one at the end.
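
For example, to let the model sample and produce more varied summaries, you could create the pipeline like this. This is just a sketch that mirrors the commented-out lines in the code above; the temperature value is only a starting point:

llm_sampling = HuggingFacePipeline.from_model_id(
    model_id='meta-llama/Llama-3.2-3B-Instruct',
    task='text-generation',
    device=0,
    pipeline_kwargs={
        'max_new_tokens': 1000,
        'do_sample': True,    # sample among the most likely words instead of always taking the top one
        'temperature': 0.3,   # low value: stay close to the most likely words
    }
)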

Making a Prompt#

We can use a prompt to tell the language model how to answer. The prompt should contain a few short, helpful instructions. In addition, we provide a placeholder for the input, called context. LangChain replaces this placeholder with the input document when we execute a query.

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate
separator = '\nYour Summary:\n'
prompt_template = '''Write a summary of the following:

{context}
''' + separator
prompt = PromptTemplate(template=prompt_template,
                        input_variables=['context'])
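
To check what the model will actually receive, you can fill in the placeholder yourself. A quick illustration with a made-up input:

# Illustration: LangChain replaces {context} with the input text
print(prompt.format(context='An example document about summarization.'))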

Separating the Summary from the Input#

LangChain returns both the input prompt and the generated response in one long text. To get only the summary, we must separate the generated summary from the document we sent as input. We can use the LangChain output parser RegexParser for this.

from langchain.output_parsers import RegexParser
import re

output_parser = RegexParser(
    regex=rf'{separator}(.*)',
    output_keys=['summary'],
    flags=re.DOTALL)
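
The parser captures everything after the separator. A quick illustration with made-up model output:

# Illustration: only the text after the separator ends up in 'summary'
example_output = 'Write a summary of the following: ...' + separator + 'This paper studies X.'
print(output_parser.parse(example_output))
# prints: {'summary': 'This paper studies X.'}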

Creating the Chain#

The document loader loads each PDF page as a separate ‘document’. This is partly for technical reasons: it is how PDFs are structured internally. Therefore, we use the chain create_stuff_documents_chain, which ‘stuffs’ all the input documents into one large prompt.

chain = create_stuff_documents_chain(
        llm, prompt, output_parser=output_parser)

Loading the Documents#

We use LangChain’s DirectoryLoader to load all the files in document_folder, which we defined at the start of this notebook.

from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader(document_folder)
documents = loader.load()
print('number of documents:', len(documents))
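
Before summarizing, it can be useful to peek at what was loaded. A small sketch (the output depends on your documents):

# Inspect the first loaded document: its source file and the start of its text
first = documents[0]
print(first.metadata['source'])
print(first.page_content[:200])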

Creating the Summaries#

Now, we can iterate over these documents with a for-loop.

summaries = {}

for document in documents:
    filename = document.metadata['source']
    print('Summarizing document:', filename)
    result = chain.invoke({'context': [document]})
    summary = result['summary']
    summaries[filename] = summary
    print('Summary of file', filename)
    print(summary)

Saving the Summaries to Text Files#

Finally, we save the summaries for later use. We save all the summaries in a single file, summaries.txt. If you like, you can instead store each summary in a separate file; a sketch follows after the code below.

with open('summaries.txt', 'w') as outfile:
    for filename in summaries:
        print('Summary of', filename, file=outfile)
        print(summaries[filename], file=outfile)
        print(file=outfile)
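
If you would rather have one file per summary, something like this sketch could work. The output filenames are our own choice, derived from the source paths:

import os

for filename, summary in summaries.items():
    # for example, '/path/to/paper.pdf' becomes 'paper-summary.txt'
    base = os.path.splitext(os.path.basename(filename))[0]
    with open(base + '-summary.txt', 'w') as outfile:
        print(summary, file=outfile)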

Bonus Material#

Exercises#

Exercise: Summarize your own document

Make a summary of a document that you upload to your own documents folder. Read the summary carefully, and evaluate it with these questions in mind:

  • Is the summary useful?

  • Is there anything missing from the summary?

  • Is the length of the summary suitable?

Exercise: Adjust the summary

Try to make some adjustments to the prompt to modify the summary you got in exercise 1. For example, you can ask for a longer or more concise summary. Or you can tell the model to emphasize certain aspects of the text.

Exercise: Make a summary in a different language

We can use the model to get a summary in a different language from the original document. For example, if the prompt is in Norwegian, the response will usually also be in Norwegian. You can also specify in the prompt which language you want the summary to be in. Use the model to make a summary of your document from exercise 1 in a different language.
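
One way to start is sketched below: state the target language in the prompt, and reuse the separator from earlier so the RegexParser still works. The wording is just an example; adjust it to your language of choice:

# A sketch (hypothetical prompt): ask explicitly for a summary in Norwegian
prompt_norwegian = PromptTemplate(
    template='Write a summary in Norwegian of the following:\n\n{context}\n' + separator,
    input_variables=['context'])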