Lesson 3: HuggingFace NLP Models

fractal
python
LLM
Author

Chris Kroenke

Published

October 7, 2023

Running powerful NLP models with the HuggingFace transformers library.

Intro

Welcome to the third lesson of the course. Let’s recap our progress so far:

  • Lesson 1: We made a python environment for LLMs.
  • Lesson 2: We set up a personal blog to track our progress.

Next we will use our first LLM. We’ll start with a Natural Language Processing (NLP) model provided by the HuggingFace team.

Notebook best practices

First, let’s set up our notebook to be fully interactive and easy to use. We can do this with a couple of “magic functions” built into Jupyter.

Specifically, we use the autoreload and matplotlib magic functions. The cell below shows them in action:

# best practice notebook magic
%load_ext autoreload
%autoreload 2
%matplotlib inline

Let’s take a look at what these magic functions do.

autoreload dynamically reloads code libraries, even as they’re changing under the hood. That means we do not have to restart the notebook after every change. We can instead code and experiment on the fly.

matplotlib inline automatically displays any plots below the code cell that created them. The plots are also saved in the notebook itself, which is perfect for our blog posts.
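
As a quick, optional demo of the second magic (assuming matplotlib is installed in the llm-env environment), a plot drawn after %matplotlib inline shows up right below the cell that created it:

# quick demo: with %matplotlib inline, this plot renders directly below the cell
import matplotlib.pyplot as plt

xs = [0, 1, 2, 3, 4]
plt.plot(xs, [x ** 2 for x in xs])
plt.title("inline plotting demo")
plt.show()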

All of our notebooks going forward will start with these magic functions.

Let’s start with the "hello, world!" of NLP: sentiment analysis.

Sentiment Analysis with HuggingFace

Note

The code and examples below are based on the official HuggingFace tutorial, reworked to better suit the course.

Imagine that we’re selling a product, and we’ve gathered a bunch of reviews from users to find out what people are saying about it. The negative reviews will point out where our product needs improving, and the positive reviews will show what we’re doing right.

Figuring out the tone of a statement (positive vs. negative) is an area of NLP known as sentiment analysis.

Going through each review by hand would give us a ton of insight about our product, but it would take an enormous amount of manual effort. Enter Machine Learning to the rescue: an NLP model can automatically analyze and classify the reviews in bulk.

First, a Pipeline

Let’s take a look at the HuggingFace NLP model that we’ll run. At a high level, the model is built around three key pieces:

  1. A Config file.
  2. A Preprocessor file.
  3. Model file(s).

The HuggingFace API has a handy, high-level pipeline that wraps up all three objects for us.

Important

Before going forward, make sure that the llm-env environment from the first lesson is active. This environment has the HuggingFace libraries used below.

The code below uses the transformers library to build a Sentiment Analysis pipeline.

# load in the pipeline object from HuggingFace
from transformers import pipeline

# create a sentiment analysis pipeline
classifier = pipeline("sentiment-analysis")
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.

Since we didn’t specify a model, you can see in the output above that HuggingFace picked a distilbert model for us by default.
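
As an aside, the warning above can be avoided by naming the model (and, ideally, a revision) explicitly. Here is a minimal sketch of what that looks like, using the same model the pipeline picked for us (the explicit_classifier variable is just for illustration):

# pin the pipeline to an explicit model instead of relying on the default
explicit_classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)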

We will learn more about what exactly distilbert is and how it works later on. For now, think of it as a useful NLP genie who can look at a sentence and tell us whether it has a positive or negative tone.

Next, let’s find out what the model thinks about the sentence: "HuggingFace pipelines are awesome!"

# sentiment analysis on a simple, example sentence
example_sentence = "HuggingFace pipelines are awesome!"
classifier(example_sentence)
[{'label': 'POSITIVE', 'score': 0.9998503923416138}]

Not bad. We see a strong, confident score for the POSITIVE label, as expected.

We can also pass many sentences at once, which starts to show the bulk processing power of these models. Let’s process four sentences at once: three positive ones, and a clearly negative one.

# many sentences at once, in a python list
many_sentences = [
    "HuggingFace pipelines are awesome!",
    "I hope you're enjoying this course so far",
    "Hopefully the material is clear and useful",
    "I don't like this course so far",
]

# process many sentences at once
results = classifier(many_sentences)

# check the tone of each sentence
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9999
label: POSITIVE, with score: 0.9998
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.8758

Congrats! You’ve now run a HuggingFace pipeline and used it to analyze the tone of a few sentences. Next, let’s take a closer look at the pipeline object.

Going inside the pipeline

Under the hood, a pipeline handles three key HuggingFace NLP pieces: Config, Preprocessor, and Model.

To better understand each piece, let’s take one small step down the ladder of abstraction and build our own simple pipeline.

We will use the same distilbert model from before. First we need the three key pieces mentioned above. Thankfully, we can import each of these pieces from the transformers library.

Config class

The config class is a simple map with the options and configurations of a model. It has the key-value pairs that define a model’s architecture and hyperparameters.

# config for the model
from transformers import DistilBertConfig

Preprocessor class

The preprocessor object in this case is a Tokenizer. Tokenizers convert strings and characters into special tensor inputs for the LLM.

Note

Correctly pre-processing inputs is one of the most important and error-prone steps in using ML models. In other words, it’s good to offload this work to a class that’s already been tested and debugged.

# input preprocessor to tokenize strings
from transformers import DistilBertTokenizer

Model class

The model class holds the weights and parameters for the actual LLM. It’s the “meat and bones” of the setup, so to speak.

# the text classifier model
from transformers import DistilBertForSequenceClassification

Naming the model

We need to know a model’s full, proper name in order to load it from HuggingFace. Its name is how we find the model on the HuggingFace Model Hub.

Once we know its full name, there is a handy from_pretrained() function that will automatically find and download the pieces for us.

In this case, the distilbert model’s full name is:
> distilbert-base-uncased-finetuned-sst-2-english.

# sentiment analysis model name
model_name = 'distilbert-base-uncased-finetuned-sst-2-english'

In the code below we can now load each of the three NLP pieces for this model.

# create the config
config = DistilBertConfig.from_pretrained(model_name)

# create the input tokenizer 
tokenizer = DistilBertTokenizer.from_pretrained(model_name)

# create the model
model = DistilBertForSequenceClassification.from_pretrained(model_name)
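
Before composing them, it’s worth a quick peek at what we just downloaded. For example, the config holds the label map we’ll rely on later to turn a predicted index into a label (this inspection cell is just for illustration):

# peek at the label map stored in the config;
# we'll use it later to turn a predicted index into a class label
config.id2label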

Next we will compose these three pieces together to mimic the original pipeline example.

Putting together a simple_pipeline

Preprocessing the inputs

First, we create a preprocess function to turn a given text string into the proper, tokenized inputs that an LLM expects.

def preprocess(text: str):
    """
    Sends `text` through the model's tokenizer.  
    The tokenizer turns words and characters into proper inputs for an NLP model.
    """
    tokenized_inputs = tokenizer(text, return_tensors='pt')
    return tokenized_inputs

Let’s test this preprocessing function on the example sentence from earlier.

# manually preprocessing the example sentence: "HuggingFace pipelines are awesome!"
preprocess(example_sentence)
{'input_ids': tensor([[  101, 17662, 12172, 13117,  2015,  2024, 12476,   999,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}

It turned an input string into numerical token IDs (plus an attention mask) for the LLM. We’ll break down what exactly this output means later on in the course. For now, think of it as sanitizing and formatting the text into a format that the LLM has been trained to work with.
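
If you’re curious what those numbers stand for, the tokenizer can map them back to readable tokens. The quick sanity check below (not part of the pipeline itself) uses the tokenizer’s convert_ids_to_tokens helper:

# sanity check: map the token ids back to the tokens they represent
token_ids = preprocess(example_sentence)['input_ids'][0].tolist()
tokenizer.convert_ids_to_tokens(token_ids)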

Running the model

Next up, let’s make our own forward function that runs the LLM on preprocessed inputs.

def forward(text: str):
    """
    First we preprocess the `text` into tokens.
    Then we send the `tokenized_inputs` to the model.
    """
    tokenized_inputs = preprocess(text)
    outputs = model(**tokenized_inputs)
    return outputs

Let’s check what this outputs for our running example sentence.

outputs = forward(example_sentence); outputs
SequenceClassifierOutput(loss=None, logits=tensor([[-4.2326,  4.5748]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

You’ll see a lot going on in the SequenceClassifierOutput above. To be honest, this is where the original pipeline does most of the heavy-lifting for us. It takes the raw, detailed output from an LLM and converts it into a more human-readable format.

We’ll mimic this heavy-lifting by using the Config class and model outputs to find out whether the sentence is positive or negative.

def process_outputs(outs):
    """
    Converting the raw model outputs into a human-readable result.

    Steps:
        1. Grab the raw "scores" from the model for Positive and Negative labels.  
        2. Find out which score is the highest (aka the model's decision).  
        3. Use the `config` object to find the class label for the highest score.  
        4. Turn the raw score into a human-readable probability value.  
        5. Return the predicted label with its probability.
    """
    # 1. Grab the raw "scores" from the model for the Positive and Negative labels
    logits = outs.logits

    # 2. Find the strongest label score, aka the model's decision
    pred_idx = logits.argmax(1).item()

    # 3. Use the `config` object to find the class label
    pred_label = config.id2label[pred_idx]  

    # 4. Calculate the human-readable number for the score
    pred_score = logits.softmax(-1)[:, pred_idx].item()

    # 5. return the label and score in a dictionary
    return {
        'label': pred_label,
        'score': pred_score, 
    }

We can now put together a simple_pipeline, and check how it compares to the original pipeline.

def simple_pipeline(text):
    """
    Putting the NLP pieces and functions together into a pipeline.
    """
    # get the model's raw output
    model_outs = forward(text)
    # convert the raw outputs into a human readable result
    predictions = process_outputs(model_outs)
    return predictions

Calling the simple_pipeline on the example sentence, drumroll please…

# running our simple pipeline on the example text
simple_pipeline(example_sentence)
{'label': 'POSITIVE', 'score': 0.9998503923416138}

And just like that, we took a small peek under the pipeline hood and built our own simple, working version.

One pain point: we had to know the full, proper name of the different Distilbert* pieces to import the Config, Preprocessor, and Model. This gets overwhelming fast given the flood of new LLMs released almost daily. Thankfully, HuggingFace has come up with a great solution to this problem: the Auto class.

True HuggingFace magic: Auto classes

With Auto classes, we don’t have to know the exact class names of the LLM’s pieces to import them. We only need the model’s proper name on the hub:

# viewing our distilbert model's name
model_name
'distilbert-base-uncased-finetuned-sst-2-english'

Run the cell below to import the Auto classes. Then we’ll use them with the model name to create an even cleaner simple_pipeline.

# importing the Auto classes
from transformers import AutoConfig
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification

Next we create the three key NLP pieces with the Auto classes.

# building the pieces with `Auto` classes
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

We can now use these pieces to build a SentimentPipeline class that’s cleaner than before and can handle any model_name:

class SentimentPipeline:
    def __init__(self, model_name: str):
        """
        Simple Sentiment Analysis pipeline.
        """
        self.model_name = model_name
        self.config = AutoConfig.from_pretrained(self.model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(self.model_name)

    def preprocess(self, text: str):
        """
        Sends `text` through the LLM's tokenizer.  
        The tokenizer turns words and characters into special inputs for the LLM.
        """
        tokenized_inputs = self.tokenizer(text, return_tensors='pt')
        return tokenized_inputs

    def forward(self, text: str):
        """
        First we preprocess the `text` into tokens.
        Then we send the `token_inputs` to the model.
        """
        token_inputs = self.preprocess(text)
        outputs = self.model(**token_inputs)
        return outputs

    def process_outputs(self, outs):
        """
        Here we mimic the post-processing that HuggingFace automatically does in its `pipeline`.  
        """
        # grab the raw scores from the model for Positive and Negative labels
        logits = outs.logits

        # find the strongest label score, aka the model's decision
        pred_idx = logits.argmax(1).item()

        # use the `config` object to find the actual class label
        pred_label = self.config.id2label[pred_idx]  

        # calculate the human-readable probability score for this class
        pred_score = logits.softmax(-1)[:, pred_idx].item()

        # return the predicted label and its score
        return {
            'label': pred_label,
            'score': pred_score, 
        }
    
    def __call__(self, text: str):
        """
        Overriding the call method to easily and intuitively call the pipeline.
        """
        model_outs = self.forward(text)
        preds = self.process_outputs(model_outs)
        return preds

Using the custom SentimentPipeline

Let’s leverage both the new class and a different model to show the power of Auto classes.

For fun, let’s use a BERT model that was trained specifically on tweets. The model’s full name is finiteautomata/bertweet-base-sentiment-analysis.

# using a different model
new_model_name = 'finiteautomata/bertweet-base-sentiment-analysis'
# creating a new sentiment pipeline
simple_pipeline = SentimentPipeline(new_model_name)
emoji is not installed, thus not converting emoticons or emojis into text. Install emoji: pip3 install emoji==0.6.0

Now let’s run it on our handy example sentence.

# calling our new, flexible pipeline
simple_pipeline(example_sentence)
{'label': 'POS', 'score': 0.9908382296562195}

Congrats! You’ve now built a flexible pipeline for Sentiment Analysis that can leverage most NLP models on the HuggingFace hub.

Conclusion

This notebook went through the basics of using a HuggingFace pipeline to run sentiment analysis on a few sentences. We then looked under the hood at the pipeline’s three key pieces: Config, Preprocessor, and Model.

Lastly, we built our own simple_pipeline from scratch to see how the pieces fit together.

The goal of this notebook was twofold. First, we wanted to gain hands-on experience with the transformers API from HuggingFace. It’s an incredibly powerful library that lets us do what used to be difficult, research-level NLP tasks in a few lines of code.

Second, we wanted to get some familiarity with downloading models. The model weights that we downloaded from HuggingFace are the same ones that we will be fine-tuning, quantizing, and deploying on our devices throughout the course.

There are two appendices below. The first gives a handy way of counting the number of weights in a model. The second goes into more detail about how to interactively debug and analyze code in a Jupyter notebook.

Appendix 1: Counting the number of parameters in a model

The following code snippet counts the number of trainable parameters in a model. It’s a question that comes up often when working with LLMs, and having a quick way to find a model’s rough size comes in handy.

def count_parameters(model):
    """
    Counts the number of trainable parameters in a `model`.
    """
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

Here we use it to count the number of parameters in the distilbert model from above.

# view the number of parameters in the distilbert model from above
f"Number of trainable params: {count_parameters(model):,}"
'Number of trainable params: 66,955,010'
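
As a rough rule of thumb (an estimate, not an exact measurement), each fp32 parameter takes 4 bytes, so the parameter count translates directly into a ballpark memory footprint:

# ballpark memory footprint, assuming 4 bytes per fp32 parameter
n_params = count_parameters(model)
f"Approximate fp32 size: {n_params * 4 / 1e6:.0f} MB"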

Appendix 2: Inspecting the classifier, notebook style.

What is the classifier object, exactly? Jupyter has many powerful ways of inspecting and analyzing its code.

One of the simplest ways of checking an object is to call it by itself in a code cell, as shown below.

# show the contents of the `classifier` object
classifier
<transformers.pipelines.text_classification.TextClassificationPipeline at 0x176de7850>

We can see the classifier is a TextClassificationPipeline. This makes sense: we fed it an input sentence and asked it to classify the statement as positive vs. negative.

There is also a tab-autocomplete feature to find the members and methods of an object. For example, to look up everything in classifier, hit the tab key after typing a period (.).

Uncomment the cells below and hit the tab key to test the auto-complete feature.

## tab after the `.` to auto-complete all variables/methods
# classifier.

Let’s say you vaguely remember the name of a variable or function, say for example the forward() method. In that case you can type the first few letters and hit tab to auto-complete the full set of options:

## tab after the `.for` to auto-complete the rest of the options
# classifier.for

Asking questions: ? and ??

Lastly, we can literally interrogate an object in Jupyter for more information.

If we tag a single ? after an object, we’ll get its basic documentation (docstring). Note that we omit the output here to keep the notebook from getting too busy.

## the power of asking questions
classifier?

If we tag on two question marks: ??, then we get the full source code of the object:

## really curious about classifier
classifier??

Both ? and ?? are excellent and quick ways to look under the hood of any object in Jupyter.

Inspecting a specific classifier function

Let’s take a look at the function that does the heavy lifting for our sentiment analysis task: forward().

# looking at what actually runs the inputs
classifier.forward
<bound method Pipeline.forward of <transformers.pipelines.text_classification.TextClassificationPipeline object at 0x176de7850>>

What does this function actually do? Let’s find out.

# source code of the forward function
classifier.forward??
Signature: classifier.forward(model_inputs, **forward_params)
Docstring: <no docstring>
Source:   
    def forward(self, model_inputs, **forward_params):
        with self.device_placement():
            if self.framework == "tf":
                model_inputs["training"] = False
                model_outputs = self._forward(model_inputs, **forward_params)
            elif self.framework == "pt":
                inference_context = self.get_inference_context()
                with inference_context():
                    model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
                    model_outputs = self._forward(model_inputs, **forward_params)
                    model_outputs = self._ensure_tensor_on_device(model_outputs, device=torch.device("cpu"))
            else:
                raise ValueError(f"Framework {self.framework} is not supported")
        return model_outputs
File:      ~/mambaforge/envs/llm_base/lib/python3.11/site-packages/transformers/pipelines/base.py
Type:      method

We can see that it automatically handles whether we’re running a TensorFlow (tf) or PyTorch (pt) model. Then, it makes sure the tensors are on the correct device. Lastly, it calls another function, _forward(), on the prepared inputs.

We can follow the rabbit hole as far down as needed. Let’s take a look at the source of _forward.

# going deeper
classifier._forward??
Signature: classifier._forward(model_inputs)
Docstring:
_forward will receive the prepared dictionary from `preprocess` and run it on the model. This method might
involve the GPU or the CPU and should be agnostic to it. Isolating this function is the reason for `preprocess`
and `postprocess` to exist, so that the hot path, this method generally can run as fast as possible.

It is not meant to be called directly, `forward` is preferred. It is basically the same but contains additional
code surrounding `_forward` making sure tensors and models are on the same device, disabling the training part
of the code (leading to faster inference).
Source:   
    def _forward(self, model_inputs):
        # `XXXForSequenceClassification` models should not use `use_cache=True` even if it's supported
        model_forward = self.model.forward if self.framework == "pt" else self.model.call
        if "use_cache" in inspect.signature(model_forward).parameters.keys():
            model_inputs["use_cache"] = False
        return self.model(**model_inputs)
File:      ~/mambaforge/envs/llm_base/lib/python3.11/site-packages/transformers/pipelines/text_classification.py
Type:      method

Ah, we can see that it calls the classifier’s model. This is the distilbert model we saw earlier! Now we can peek under the hood at the actual Transformer LLM.

# the distilbert sentiment analysis model
classifier.model
DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
            (lin1): Linear(in_features=768, out_features=3072, bias=True)
            (lin2): Linear(in_features=3072, out_features=768, bias=True)
            (activation): GELUActivation()
          )
          (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        )
      )
    )
  )
  (pre_classifier): Linear(in_features=768, out_features=768, bias=True)
  (classifier): Linear(in_features=768, out_features=2, bias=True)
  (dropout): Dropout(p=0.2, inplace=False)
)

We will break down the different pieces of this model later on in the course.

The important takeaway for now is that this shows the main structure of most Transformer LLMs. Newer models are mostly incremental changes on this foundation.