Summarization#
In this part of the course, we will use a language model to make summaries of some papers. Making summaries of documents is also known as summarizing or summarization. Specialized software exists for making summaries, but general large language models are also becoming quite good at this task.
Again, we will use LangChain, an open-source library for making applications with LLMs.
Exercise: Create new notebook
Create a new Jupyter Notebook called summarizing by clicking the File-menu in JupyterLab, and then New and Notebook. If you are asked to select a kernel, choose “Python 3”. Give the new notebook a name by clicking the File-menu in JupyterLab and then Rename Notebook. Use the name summarizing.
Exercise: Stop old kernels
JupyterLab uses a Python kernel to execute the code in each notebook. To free up GPU memory used in the previous chapter, you should stop the kernel for that notebook. In the menu on the left side of JupyterLab, click the dark circle with a white square in it. Then click KERNELS and Shut Down All.
Document location#
We have collected some papers licensed with a Creative Commons license. We will try to load all the documents in the folder defined below. If you prefer, you can change this to a different folder name.
document_folder = '/fp/projects01/ec443/documents/terrorism'
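To check that your session can actually read this folder, you can list its contents. This is an optional sanity check:

import os
print(os.listdir(document_folder))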
The Language Model#
We’ll use models from HuggingFace, a website that has tools and models for machine learning. We’ll use the open-weights LLM meta-llama/Llama-3.2-3B-Instruct. This model has a large context window of 128k tokens, which means that we can use it to process quite large documents. Yet it is small enough that we can use it with the smallest GPUs on Fox. However, for better results you might want to use one of the somewhat larger models with around 7B or 8B parameters, for example mistralai/Ministral-8B-Instruct-2410.
Tokens versus Words
Short words can be a single token, but longer words usually consist of multiple tokens. Therefore, the maximum document size with this model is less than 128k words. Exactly how words are converted to tokens depends on the tokenizer. LLMs usually come with tokenizers, and we will use the default tokenizer that ships with the LLM we use; see the sketch after the cache setup below.
import os

# Store downloaded models in the shared project cache instead of your home directory
os.environ['HF_HOME'] = '/fp/projects01/ec443/huggingface/cache/'
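If you are curious how a text splits into tokens, you can inspect the tokenizer directly. This is an optional sketch: it assumes your HuggingFace account has access to the gated Llama model (any other tokenizer would illustrate the same point).

from transformers import AutoTokenizer

# Load the tokenizer that ships with the model (downloaded on first use)
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-3.2-3B-Instruct')

text = 'Summarization with large language models'
tokens = tokenizer.tokenize(text)
print(len(text.split()), 'words became', len(tokens), 'tokens')
print(tokens)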
To use the model, we create a pipeline. A pipeline can consist of several processing steps, but in this case, we only need one step. We can use the method HuggingFacePipeline.from_model_id(), which automatically downloads the specified model from HuggingFace.
from langchain_community.llms import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id='meta-llama/Llama-3.2-3B-Instruct',
    task='text-generation',
    device=0,
    pipeline_kwargs={
        'max_new_tokens': 1000,
        #'do_sample': True,
        #'temperature': 0.3,
        #'num_beams': 4,
    }
)
We can give some arguments to the pipeline:

- model_id: the name of the model on HuggingFace
- task: the task you want to use the model for
- device: the GPU hardware device to use. If we don’t specify a device, no GPU will be used.
- pipeline_kwargs: additional parameters that are passed to the model:
  - max_new_tokens: maximum length of the generated text
  - do_sample: by default, the most likely next word is chosen. This makes the output deterministic. We can introduce some randomness by sampling among the most likely words instead.
  - temperature: the temperature controls the statistical distribution of the next word and is usually between 0 and 1. A low temperature increases the probability of common words. A high temperature increases the probability of outputting a rare word. Model makers often recommend a temperature setting, which we can use as a starting point.
  - num_beams: by default the model works with a single sequence of tokens/words. With beam search, the program builds multiple sequences at the same time, and then selects the best one in the end.
Making a Prompt#
We can use a prompt to tell the language model how to answer. The prompt should contain a few short, helpful instructions. In addition, we provide a placeholder for the input, called context. LangChain replaces the placeholder with the input document when we execute a query.
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.prompts import PromptTemplate

separator = '\nYour Summary:\n'
prompt_template = '''Write a summary of the following:
{context}
''' + separator

prompt = PromptTemplate(template=prompt_template,
                        input_variables=['context'])
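To see exactly what the model will receive, you can render the template with a stand-in text. The text here is only a placeholder:

# Fill the {context} placeholder to inspect the final prompt
print(prompt.format(context='An example document.'))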
Separating the Summary from the Input#
LangChain returns both the input prompt and the generated response in one long text. To get only the summary, we must split the summary from the document that we sent as input. We can use the LangChain output parser RegexParser for this.
from langchain.output_parsers import RegexParser
import re

output_parser = RegexParser(
    regex=rf'{separator}(.*)',
    output_keys=['summary'],
    flags=re.DOTALL)
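We can check the parser on a small made-up example of model output, where the input text is followed by the separator and a summary:

# The parser returns a dictionary with the text that follows the separator
example = 'Write a summary of the following: ...' + separator + 'A short summary.'
print(output_parser.parse(example))
# Expected output: {'summary': 'A short summary.'}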
Create chain#
The document loader loads each PDF page as a separate ‘document’. This is partly for technical reasons, because that is the way PDFs are structured. Therefore, we use the chain called create_stuff_documents_chain, which joins multiple documents into a single large document.
chain = create_stuff_documents_chain(
    llm, prompt, output_parser=output_parser)
Loading the Documents#
We use LangChain’s DirectoryLoader to load all the files in document_folder. document_folder is defined at the start of this notebook.
from langchain_community.document_loaders import DirectoryLoader
loader = DirectoryLoader(document_folder)
documents = loader.load()
print('number of documents:', len(documents))
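Because each PDF page becomes its own document, it can be instructive to inspect the first one. The output depends on your documents:

# Show the source file and the start of the text of the first document
print(documents[0].metadata)
print(documents[0].page_content[:200])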
Creating the Summaries#
Now, we can iterate over these documents with a for-loop.
summaries = {}

for document in documents:
    filename = document.metadata['source']
    print('Summarizing document:', filename)
    result = chain.invoke({"context": [document]})
    summary = result['summary']
    summaries[filename] = summary
    print('Summary of file', filename)
    print(summary)
Saving the Summaries to Text Files#
Finally, we save the summaries for later use. We save all the summaries in the file summaries.txt. If you like, you can store each summary in a separate file instead; a sketch follows after the code below.
with open('summaries.txt', 'w') as outfile:
    for filename in summaries:
        print('Summary of', filename, file=outfile)
        print(summaries[filename], file=outfile)
        print(file=outfile)
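If you prefer one file per document, a minimal sketch could look like this. The output file names are derived from the input paths; adjust them as you like:

from pathlib import Path

# Write each summary to its own file, named after the input document
for filename, summary in summaries.items():
    outname = Path(filename).stem + '_summary.txt'
    with open(outname, 'w') as outfile:
        print(summary, file=outfile)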
Bonus Material#
Make an Overall Summary
We can also try to generate an overall summary of all the documents. This doesn’t make much sense for documents on different topics, but if all the documents are related or on the same topic, it can make sense to make an overall summary of all the summaries.
First, we need to import some more functions:
from langchain.schema.document import Document
from langchain.prompts import ChatPromptTemplate
We make a new prompt, with more specific instructions than for the regular summaries.
total_prompt = ChatPromptTemplate.from_messages(
    [("system", "Below is a list of summaries of some papers. Make a total summary of all the information in all the papers:\n\n{context}\n\nTotal Summary:")]
)
Then, we can make a new chain based on the LLM and the prompt:
total_chain = create_stuff_documents_chain(llm, total_prompt)
This chain needs a list of Document objects as input.
list_of_summaries = [Document(summary) for summary in summaries.values()]
Now, we can invoke the chain with this list as input, and print the result:
total_summary = total_chain.invoke({"context": list_of_summaries})
print('Summary of all the summaries:')
print(total_summary)
Finally, we save the overall summary to a text file:
with open('total_summary.txt', 'w') as outfile:
    print(total_summary, file=outfile)
Exercises#
Exercise: Summarize your own document
Make a summary of a document that you upload to your own documents folder. Read the summary carefully, and evaluate it with these questions in mind:
Is the summary useful?
Is there anything missing from the summary?
Is the length of the summary suitable?
Exercise: Adjust the summary
Try to make some adjustments to the prompt to modify the summary you got in exercise 1. For example, you can ask for a longer or more concise summary. Or you can tell the model to emphasize certain aspects of the text.
Exercise: Make a summary in a different language
We can use the model to get a summary in a different language from the original document. For example, if the prompt is in Norwegian, the response will usually also be in Norwegian. You can also specify in the prompt which language you want the summary to be in. Use the model to make a summary of your document from exercise 1 in a different language.
Bonus Exercise: Slurm Jobs
When you have made a program that works, it’s more efficient to run the program as a batch job than in JupyterLab. This is because a JupyterLab session reserves a GPU all the time, even when you’re not running computations. Therefore, you should save your finished program as a regular Python program that you can schedule as a job.
You can save your code by clicking the “File”-menu in JupyterLab, then “Save and Export Notebook As…”, and then “Executable Script”. The result is the Python file summarizing.py, which is downloaded to your local computer. You will also need to download the slurm script LLM.slurm. Upload both the Python file summarizing.py and the slurm script LLM.slurm to Fox.
Then, start the job with this command:
sbatch LLM.slurm summarizing.py
Slurm creates a log file for each job, which is stored with a name like slurm-1358473.out. By default, these log files are stored in the current working directory where you run the sbatch command.
If you want to store the log files somewhere else, you can add a line like the one below to your slurm script. Remember to change the username.
#SBATCH --output=/fp/projects01/ec443/<username>/logs/slurm-%j.out