Lesson 3: HuggingFace NLP Models

Running powerful NLP models with the HuggingFace transformers library.

Intro

Welcome to the third lesson of the course. Let's recap our progress so far:

  • Lesson 1: We set up a Python environment for LLMs.
  • Lesson 2: Set up a personal blog to track our progress.

Next we will use our first LLM. We'll start with a Natural Language Processing (NLP) model provided by the HuggingFace team.

Notebook best practices

First, let's set up our notebook to be fully interactive and easy to use. We can do this with a couple of "magic functions" built into Jupyter.

Specifically, we use the magic autoreload and matplotlib functions. The cell below shows them in action:

#| classes: code-alone
# best practice notebook magic
%load_ext autoreload
%autoreload 2
%matplotlib inline

Let's take a look at what these magic functions do.

autoreload dynamically reloads imported code, even as it changes under the hood. That means we do not have to restart the notebook kernel after every change to our source files. We can instead code and experiment on the fly.

matplotlib inline automatically displays any plots below the code cell that created them. The plots are also saved in the notebook itself, which is perfect for our blog posts.
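For example, a small (hypothetical) plot like the one below would render right under its cell and be saved along with the notebook:

# a tiny example plot; with %matplotlib inline it renders below this cell
import matplotlib.pyplot as plt

plt.plot([0, 1, 2, 3], [0, 1, 4, 9])
plt.title("Rendered inline, saved with the notebook")
plt.show()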

All of our notebooks going forward will start with these magic functions.

Let's start with the "hello, world!" of NLP: sentiment analysis.

Sentiment Analysis with HuggingFace

:::: callout-note The code and examples below are based on the official HuggingFace tutorial, reworked to better suit the course. ::::

Imagine that we're selling a product, and we've gathered a large batch of user reviews to find out both the good and bad things that people are saying. The negative reviews will point out where our product needs improving, while the positive reviews will show what we're doing right.

Figuring out the tone of a statement (positive vs. negative) is an area of NLP known as sentiment analysis.

Going through each review by hand would give us a ton of insight about our product, but it would also take an enormous amount of manual effort. Enter machine learning to the rescue! An NLP model can automatically analyze and classify the reviews in bulk.

First, a Pipeline

Let's take a look at the HuggingFace NLP model that we'll run. At a high level, the model is built around three key pieces:

  1. A Config file.
  2. A Preprocessor file.
  3. Model file(s).

The HuggingFace API has a handy, high-level pipeline that wraps up all three objects for us.

:::: callout-important Before going forward, make sure that the llm-env environment from the first lesson is active. This environment has the HuggingFace libraries used below. ::::

The code below uses the transformers library to build a Sentiment Analysis pipeline.

# load in the pipeline object from HuggingFace
from transformers import pipeline #<1>

# create a sentiment analysis pipeline
classifier = pipeline("sentiment-analysis") # <2>
No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
  1. Import the pipeline.
  2. Instantiate the sentiment classifier.


Since we didn't specify a model, you can see in the output above that HuggingFace picked a distilbert model for us by default.

We will learn more about what exactly distilbert is and how it works later on. For now, think of it as a useful NLP genie who can look at a sentence and tell us whether it has a positive or negative tone.

Next, let's find out what the model thinks about the sentence: "HuggingFace pipelines are awesome!"

# sentiment analysis on a simple, example sentence
example_sentence = "HuggingFace pipelines are awesome!"
classifier(example_sentence)
[{'label': 'POSITIVE', 'score': 0.9998503923416138}]

Not bad. We see a highly confident score for the POSITIVE label, as expected.

We can also pass many sentences at once, which starts to show the bulk processing power of these models. Let's process four sentences at once: three positive ones, and a clearly negative one.

# many sentences at once, in a python list
many_sentences = [
    "HuggingFace pipelines are awesome!",
    "I hope you're enjoying this course so far",
    "Hopefully the material is clear and useful",
    "I don't like this course so far",
]

# process many sentences at once
results = classifier(many_sentences)

# check the tone of each sentence
for result in results:
    print(f"label: {result['label']}, score: {round(result['score'], 4)}")
label: POSITIVE, with score: 0.9999
label: POSITIVE, with score: 0.9998
label: POSITIVE, with score: 0.9998
label: NEGATIVE, with score: 0.8758

Congrats! You've now run a HuggingFace pipeline and used it to analyze the tone of a few sentences. Next, let's take a closer look at the pipeline object.

Going inside the pipeline

Under the hood, a pipeline handles three key HuggingFace NLP pieces: Config, Preprocessor, and Model.

To better understand each piece, let's take one small step down the ladder of abstraction and build our own simple pipeline.

We will use the same distilbert model from before. First we need the three key pieces mentioned above. Thankfully, we can import each of these pieces from the transformers library.

Config class

The config class is a simple map with the options and configurations of a model. It has the key-value pairs that define a model's architecture and hyperparameters.

# config for the model
from transformers import DistilBertConfig
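
To get a feel for what lives inside a config, the small sketch below builds a default DistilBertConfig and prints a few of its fields. The values in the comments are the library defaults, not the settings of our fine-tuned sentiment model:

# peek at the key-value pairs stored in a default config
default_config = DistilBertConfig()
print(default_config.dim)       # hidden size of the model (768 by default)
print(default_config.n_layers)  # number of transformer blocks (6 by default)
print(default_config.id2label)  # placeholder label map: {0: 'LABEL_0', 1: 'LABEL_1'}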

Preprocessor class

The preprocessor object in this case is a Tokenizer. Tokenizers convert strings and characters into special tensor inputs for the LLM.

:::: callout-note Correctly pre-processing inputs is one of the most important and error-prone steps in using ML models. In other words, it's good to offload to a class that's already been tested and debugged. ::::

# input preprocessor to tokenize strings
from transformers import DistilBertTokenizer

Model class

The model class holds the weights and parameters for the actual LLM. It's the "meat and bones" of the setup, so to speak.

# the text classifier model
from transformers import DistilBertForSequenceClassification

Naming the model

We need to know a model's full, proper name to load it from HuggingFace. Its name is how we find the model on the HuggingFace Model Hub.

Once we know its full name, there is a handy from_pretrained() function that will automatically find and download the pieces for us.

In this case, the distilbert model's full name is:

distilbert-base-uncased-finetuned-sst-2-english.

#| classes: code-alone
# sentiment analysis model name
model_name = 'distilbert-base-uncased-finetuned-sst-2-english'

In the code below we can now load each of the three NLP pieces for this model.

#| classes: code-alone
# create the config
config = DistilBertConfig.from_pretrained(model_name)

# create the input tokenizer 
tokenizer = DistilBertTokenizer.from_pretrained(model_name)

# create the model
model = DistilBertForSequenceClassification.from_pretrained(model_name)

Next we will compose these three pieces together to mimic the original pipeline example.

Putting together a simple_pipeline

Preprocessing the inputs

First, we create a preprocess function to turn a given text string into the proper, tokenized inputs that an LLM expects.

#| classes: code-alone
def preprocess(text: str):
    """
    Sends `text` through the model's tokenizer.  
    The tokenizer turns words and characters into proper inputs for an NLP model.
    """
    tokenized_inputs = tokenizer(text, return_tensors='pt')
    return tokenized_inputs

Let's test this preprocessing function on the example sentence from earlier.

# manually preprocessing the example sentence: "HuggingFace pipelines are awesome!"
preprocess(example_sentence)
{'input_ids': tensor([[  101, 17662, 12172, 13117,  2015,  2024, 12476,   999,   102]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]])}

It turned the input string into numerical token IDs (plus an attention mask) for the LLM. We'll break down what exactly this output means later on in the course. For now, think of it as sanitizing and formatting the text into a form that the LLM has been trained to work with.
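If you're curious which sub-word each ID stands for, the tokenizer can map the numbers back to text. A quick sketch (the exact pieces depend on the tokenizer's vocabulary):

# map the numerical input_ids back to their sub-word tokens
ids = preprocess(example_sentence)['input_ids'][0].tolist()
print(tokenizer.convert_ids_to_tokens(ids))
# the first and last entries are the special [CLS] and [SEP] markers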

Running the model

Next up, let's make our own forward function that runs the LLM on the preprocessed inputs.

#| classes: code-alone
def forward(text: str):
    """
    First we preprocess the `text` into tokens.
    Then we send the `tokenized_inputs` to the model.
    """
    tokenized_inputs = preprocess(text)
    outputs = model(**tokenized_inputs)
    return outputs

Let's check what this outputs for our running example sentence.

outputs = forward(example_sentence); outputs
SequenceClassifierOutput(loss=None, logits=tensor([[-4.2326,  4.5748]], grad_fn=<AddmmBackward0>), hidden_states=None, attentions=None)

You'll see a lot going on in the SequenceClassifierOutput above. To be honest, this is where the original pipeline does most of the heavy-lifting for us. It takes the raw, detailed output from an LLM and converts it into a more human-readable format.

We'll mimic this heavy-lifting by using the Config class and model outputs to find out whether the sentence is positive or negative.
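As a quick sanity check before writing the full function, we can softmax the raw logits shown above by hand. Assuming nothing but PyTorch, this should reproduce the ~0.9998 POSITIVE score the original pipeline gave us:

import torch

# the raw logits from the SequenceClassifierOutput above
logits = torch.tensor([[-4.2326, 4.5748]])

# softmax turns raw scores into probabilities that sum to 1
print(logits.softmax(-1))  # roughly tensor([[1.5e-04, 9.9985e-01]])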

#| classes: code-alone
def process_outputs(outs):
    """
    Converting the raw model outputs into a human-readable result.

    Steps:
        1. Grab the raw "scores" from the model for Positive and Negative labels.  
        2. Find out which score is the highest (aka the model's decision).  
        3. Use the `config` object to find the class label for the highest score.  
        4. Turn the raw score into a human-readable probability value.  
        5. Return the predicted label with its probability.  
    """
    # 1. Grab the raw "scores" from the model for Positive and Negative labels
    logits = outs.logits

    # 2. Find the strongest label score, aka the model's decision
    pred_idx = logits.argmax(1).item()

    # 3. Use the `config` object to find the class label
    pred_label = config.id2label[pred_idx]  

    # 4. Calculate the human-readable number for the score
    pred_score = logits.softmax(-1)[:, pred_idx].item()

    # 5. return the label and score in a dictionary
    return {
        'label': pred_label,
        'score': pred_score, 
    }

We can now put together a simple_pipeline, and check how it compares to the original pipeline.

def simple_pipeline(text):
    """
    Putting the NLP pieces and functions together into a pipeline.
    """
    # get the model's raw output
    model_outs = forward(text)
    # convert the raw outputs into a human readable result
    predictions = process_outputs(model_outs)
    return predictions

Calling the simple_pipeline on the example sentence, drumroll please...

# running our simple pipeline on the example text
simple_pipeline(example_sentence)
{'label': 'POSITIVE', 'score': 0.9998503923416138}

And just like that, we took a small peek under the pipeline hood and built our own simple, working version.

One pain point: we had to know the full, proper name of the different DistilBert* pieces to import the Config, Preprocessor, and Model. This gets overwhelming fast given the flood of new LLMs released almost daily. Thankfully, HuggingFace has come up with a great solution to this problem: the Auto class.

True HuggingFace magic: Auto classes

With Auto classes, we don't have to know the exact or proper name of the LLM's objects to import them. We only need the proper name of the model on the hub:

# viewing our distilbert model's name
model_name
'distilbert-base-uncased-finetuned-sst-2-english'

Run the cell below to import the Auto classes. Then we'll use them with the model name to create an even cleaner, more flexible pipeline.

#| classes: code-alone
# importing the Auto classes
from transformers import AutoConfig
from transformers import AutoTokenizer
from transformers import AutoModelForSequenceClassification

Next we create the three key NLP pieces with the Auto classes.

#| classes: code-alone
# building the pieces with `Auto` classes
config = AutoConfig.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

We can now use these pieces to build a SentimentPipeline class that's cleaner than before and can handle any model_name:

class SentimentPipeline:
    def __init__(self, model_name: str):
        """
        Simple Sentiment Analysis pipeline.
        """
        self.model_name = model_name
        self.config = AutoConfig.from_pretrained(self.model_name)
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(self.model_name)

    def preprocess(self, text: str):
        """
        Sends `text` through the LLM's tokenizer.  
        The tokenizer turns words and characters into special inputs for the LLM.
        """
        tokenized_inputs = self.tokenizer(text, return_tensors='pt')
        return tokenized_inputs

    def forward(self, text: str):
        """
        First we preprocess the `text` into tokens.
        Then we send the `token_inputs` to the model.
        """
        token_inputs = self.preprocess(text)
        outputs = self.model(**token_inputs)
        return outputs

    def process_outputs(self, outs):
        """
        Here we mimic the post-processing that HuggingFace automatically does in its `pipeline`.  
        """
        # grab the raw scores from the model for Positive and Negative labels
        logits = outs.logits

        # find the strongest label score, aka the model's decision
        pred_idx = logits.argmax(1).item()

        # use the `config` object to find the actual class label
        pred_label = self.config.id2label[pred_idx]  

        # calculate the human-readable probability score for this class
        pred_score = logits.softmax(-1)[:, pred_idx].item()

        # return the predicted label and its score
        return {
            'label': pred_label,
            'score': pred_score, 
        }
    
    def __call__(self, text: str):
        """
        Overriding the call method to easily and intuitively call the pipeline.
        """
        model_outs = self.forward(text)
        preds = self.process_outputs(model_outs)
        return preds

Using the custom SentimentPipeline

Let's leverage both the new class and a different model to show the power of Auto classes.

For fun, let's use a BERT model that was trained specifically on tweets. The model's full name is finiteautomata/bertweet-base-sentiment-analysis.

#| classes: code-alone
# using a different model
new_model_name = 'finiteautomata/bertweet-base-sentiment-analysis'
# creating a new sentiment pipeline
simple_pipeline = SentimentPipeline(new_model_name)
emoji is not installed, thus not converting emoticons or emojis into text. Install emoji: pip3 install emoji==0.6.0

Now let's run it on our handy example sentence.

# calling our new, flexible pipeline
simple_pipeline(example_sentence)
{'label': 'POS', 'score': 0.9908382296562195}
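
Note that the label reads POS instead of POSITIVE: each model ships its own label map in its config. If you're curious, you can inspect it directly. This bertweet model uses three classes, so I'd expect something like the map in the comment below:

# each model defines its own output labels in its config
simple_pipeline.config.id2label
# expected: something like {0: 'NEG', 1: 'NEU', 2: 'POS'}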

Congrats! You've now built a flexible pipeline for Sentiment Analysis that can leverage most NLP models on the HuggingFace hub.

Conclusion

This notebook went through the basics of using a HuggingFace pipeline to run sentiment analysis on a few sentences. We then looked under the hood at the pipeline's three key pieces: Config, Preprocessor, and Model.

Lastly, we built our own simple_pipeline from scratch to see how the pieces fit together.

The goal of this notebook was twofold. First, we wanted to gain hands-on experience with the transformers API from HuggingFace. It's an incredibly powerful library that lets us do what used to be difficult, research-level NLP tasks in a few lines of code.

Second, we wanted to get some familiarity with downloading models. The model weights that we downloaded from HuggingFace are the same ones that we will be fine-tuning, quantizing, and deploying on our devices throughout the course.

There are two appendices below. The first gives a handy way of counting the number of weights in a model. The second goes into more detail about how to interactively debug and analyze code in a Jupyter notebook.

Appendix 1: Counting the number of parameters in a model

The following code snippet counts the number of trainable parameters in a model. It's a question that comes up often when working with LLMs, and having a quick way to check a model's rough size comes in handy.

#| classes: code-alone
def count_parameters(model):
    """
    Counts the number of trainable parameters in a `model`.
    """
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

Here we use it to count the number of parameters in the distilbert model from above.

# view the number of parameters in the last model used
f"Number of trainable params: {count_parameters(model):,}"
'Number of trainable params: 66,955,010'
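
As a side note, transformers models also expose a built-in num_parameters() method that should agree with our helper; pass only_trainable=True to count only the trainable weights:

# built-in parameter count from the transformers library
f"Built-in count: {model.num_parameters(only_trainable=True):,}"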

Appendix 2: Inspecting the classifier, notebook style.

What is the classifier object, exactly? Jupyter has many powerful ways of inspecting and analyzing its code.

One of the simplest ways of checking an object is to call it by itself in a code cell, as shown below.

# show the contents of the `classifier` object
classifier
<transformers.pipelines.text_classification.TextClassificationPipeline object at 0x176de7850>

We can see that the classifier is a TextClassificationPipeline. This makes sense: we fed it an input sentence and asked it to classify the statement as positive vs. negative.

There is also a tab-autocomplete feature to find the members and methods of an object. For example, to look up everything in classifier, hit the tab key right after typing a period (.).

Uncomment the cells below and hit the tab key to test the auto-complete feature.

## tab after the `.` to auto-complete all variables/methods
# classifier.

Let's say you vaguely remember the name of a variable or function, for example the forward() method. In that case you can type the first few letters and hit tab to auto-complete the full set of matching options:

## tab after the `.for` to auto-complete the rest of the options
# classifier.for

Asking questions: ? and ??

Lastly, we can literally interrogate an object in Jupyter for more information.

If we tag a single ? after an object, we'll get its basic documentation (docstring). Note that we omit it here to keep the notebook from getting too busy.

#| output: false
## the power of asking questions
classifier?
Signature:      classifier(*args, **kwargs)
Type:           TextClassificationPipeline
String form:    <transformers.pipelines.text_classification.TextClassificationPipeline object at 0x176de7850>
File:           ~/mambaforge/envs/llm_base/lib/python3.11/site-packages/transformers/pipelines/text_classification.py
Docstring:     
Text classification pipeline using any `ModelForSequenceClassification`. See the [sequence classification
examples](../task_summary#sequence-classification) for more information.

Example:

```python
>>> from transformers import pipeline

>>> classifier = pipeline(model="distilbert-base-uncased-finetuned-sst-2-english")
>>> classifier("This movie is disgustingly good !")
[{'label': 'POSITIVE', 'score': 1.0}]

>>> classifier("Director tried too much.")
[{'label': 'NEGATIVE', 'score': 0.996}]
```

Learn more about the basics of using a pipeline in the [pipeline tutorial](../pipeline_tutorial)

This text classification pipeline can currently be loaded from [`pipeline`] using the following task identifier:
`"sentiment-analysis"` (for classifying sequences according to positive or negative sentiments).

If multiple classification labels are available (`model.config.num_labels >= 2`), the pipeline will run a softmax
over the results. If there is a single label, the pipeline will run a sigmoid over the result.

The models that this pipeline can use are models that have been fine-tuned on a sequence classification task. See
the up-to-date list of available models on
[huggingface.co/models](https://huggingface.co/models?filter=text-classification).

Arguments:
    model ([`PreTrainedModel`] or [`TFPreTrainedModel`]):
        The model that will be used by the pipeline to make predictions. This needs to be a model inheriting from
        [`PreTrainedModel`] for PyTorch and [`TFPreTrainedModel`] for TensorFlow.
    tokenizer ([`PreTrainedTokenizer`]):
        The tokenizer that will be used by the pipeline to encode data for the model. This object inherits from
        [`PreTrainedTokenizer`].
    modelcard (`str` or [`ModelCard`], *optional*):
        Model card attributed to the model for this pipeline.
    framework (`str`, *optional*):
        The framework to use, either `"pt"` for PyTorch or `"tf"` for TensorFlow. The specified framework must be
        installed.

        If no framework is specified, will default to the one currently installed. If no framework is specified and
        both frameworks are installed, will default to the framework of the `model`, or to PyTorch if no model is
        provided.
    task (`str`, defaults to `""`):
        A task-identifier for the pipeline.
    num_workers (`int`, *optional*, defaults to 8):
        When the pipeline will use *DataLoader* (when passing a dataset, on GPU for a Pytorch model), the number of
        workers to be used.
    batch_size (`int`, *optional*, defaults to 1):
        When the pipeline will use *DataLoader* (when passing a dataset, on GPU for a Pytorch model), the size of
        the batch to use, for inference this is not always beneficial, please read [Batching with
        pipelines](https://huggingface.co/transformers/main_classes/pipelines.html#pipeline-batching) .
    args_parser ([`~pipelines.ArgumentHandler`], *optional*):
        Reference to the object in charge of parsing supplied pipeline parameters.
    device (`int`, *optional*, defaults to -1):
        Device ordinal for CPU/GPU supports. Setting this to -1 will leverage CPU, a positive will run the model on
        the associated CUDA device id. You can pass native `torch.device` or a `str` too.
    binary_output (`bool`, *optional*, defaults to `False`):
        Flag indicating if the output the pipeline should happen in a binary format (i.e., pickle) or as raw text.

    return_all_scores (`bool`, *optional*, defaults to `False`):
        Whether to return all prediction scores or just the one of the predicted class.
    function_to_apply (`str`, *optional*, defaults to `"default"`):
        The function to apply to the model outputs in order to retrieve the scores. Accepts four different values:

        - `"default"`: if the model has a single label, will apply the sigmoid function on the output. If the model
          has several labels, will apply the softmax function on the output.
        - `"sigmoid"`: Applies the sigmoid function on the output.
        - `"softmax"`: Applies the softmax function on the output.
        - `"none"`: Does not apply any function on the output.
Call docstring:
Classify the text(s) given as inputs.

Args:
    args (`str` or `List[str]` or `Dict[str]`, or `List[Dict[str]]`):
        One or several texts to classify. In order to use text pairs for your classification, you can send a
        dictionary containing `{"text", "text_pair"}` keys, or a list of those.
    top_k (`int`, *optional*, defaults to `1`):
        How many results to return.
    function_to_apply (`str`, *optional*, defaults to `"default"`):
        The function to apply to the model outputs in order to retrieve the scores. Accepts four different
        values:

        If this argument is not specified, then it will apply the following functions according to the number
        of labels:

        - If the model has a single label, will apply the sigmoid function on the output.
        - If the model has several labels, will apply the softmax function on the output.

        Possible values are:

        - `"sigmoid"`: Applies the sigmoid function on the output.
        - `"softmax"`: Applies the softmax function on the output.
        - `"none"`: Does not apply any function on the output.

Return:
    A list or a list of list of `dict`: Each result comes as list of dictionaries with the following keys:

    - **label** (`str`) -- The label predicted.
    - **score** (`float`) -- The corresponding probability.

    If `top_k` is used, one such dictionary is returned per label.

If we tag on two question marks: ??, then we get the full source code of the object:

#| output: false
## really curious about classifier
classifier??
Signature:      classifier(*args, **kwargs)
Type:           TextClassificationPipeline
String form:    <transformers.pipelines.text_classification.TextClassificationPipeline object at 0x176de7850>
File:           ~/mambaforge/envs/llm_base/lib/python3.11/site-packages/transformers/pipelines/text_classification.py
Source:        
@add_end_docstrings(
    PIPELINE_INIT_ARGS,
    r"""
        return_all_scores (`bool`, *optional*, defaults to `False`):
            Whether to return all prediction scores or just the one of the predicted class.
        function_to_apply (`str`, *optional*, defaults to `"default"`):
            The function to apply to the model outputs in order to retrieve the scores. Accepts four different values:

            - `"default"`: if the model has a single label, will apply the sigmoid function on the output. If the model
              has several labels, will apply the softmax function on the output.
            - `"sigmoid"`: Applies the sigmoid function on the output.
            - `"softmax"`: Applies the softmax function on the output.
            - `"none"`: Does not apply any function on the output.
    """,
)
class TextClassificationPipeline(Pipeline):
    """
    Text classification pipeline using any `ModelForSequenceClassification`. See the [sequence classification
    examples](../task_summary#sequence-classification) for more information.

    Example:

    ```python
    >>> from transformers import pipeline

    >>> classifier = pipeline(model="distilbert-base-uncased-finetuned-sst-2-english")
    >>> classifier("This movie is disgustingly good !")
    [{'label': 'POSITIVE', 'score': 1.0}]

    >>> classifier("Director tried too much.")
    [{'label': 'NEGATIVE', 'score': 0.996}]
    ```

    Learn more about the basics of using a pipeline in the [pipeline tutorial](../pipeline_tutorial)

    This text classification pipeline can currently be loaded from [`pipeline`] using the following task identifier:
    `"sentiment-analysis"` (for classifying sequences according to positive or negative sentiments).

    If multiple classification labels are available (`model.config.num_labels >= 2`), the pipeline will run a softmax
    over the results. If there is a single label, the pipeline will run a sigmoid over the result.

    The models that this pipeline can use are models that have been fine-tuned on a sequence classification task. See
    the up-to-date list of available models on
    [huggingface.co/models](https://huggingface.co/models?filter=text-classification).
    """

    return_all_scores = False
    function_to_apply = ClassificationFunction.NONE

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

        self.check_model_type(
            TF_MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES
            if self.framework == "tf"
            else MODEL_FOR_SEQUENCE_CLASSIFICATION_MAPPING_NAMES
        )

    def _sanitize_parameters(self, return_all_scores=None, function_to_apply=None, top_k="", **tokenizer_kwargs):
        # Using "" as default argument because we're going to use `top_k=None` in user code to declare
        # "No top_k"
        preprocess_params = tokenizer_kwargs

        postprocess_params = {}
        if hasattr(self.model.config, "return_all_scores") and return_all_scores is None:
            return_all_scores = self.model.config.return_all_scores

        if isinstance(top_k, int) or top_k is None:
            postprocess_params["top_k"] = top_k
            postprocess_params["_legacy"] = False
        elif return_all_scores is not None:
            warnings.warn(
                "`return_all_scores` is now deprecated,  if want a similar functionality use `top_k=None` instead of"
                " `return_all_scores=True` or `top_k=1` instead of `return_all_scores=False`.",
                UserWarning,
            )
            if return_all_scores:
                postprocess_params["top_k"] = None
            else:
                postprocess_params["top_k"] = 1

        if isinstance(function_to_apply, str):
            function_to_apply = ClassificationFunction[function_to_apply.upper()]

        if function_to_apply is not None:
            postprocess_params["function_to_apply"] = function_to_apply
        return preprocess_params, {}, postprocess_params

    def __call__(self, *args, **kwargs):
        """
        Classify the text(s) given as inputs.

        Args:
            args (`str` or `List[str]` or `Dict[str]`, or `List[Dict[str]]`):
                One or several texts to classify. In order to use text pairs for your classification, you can send a
                dictionary containing `{"text", "text_pair"}` keys, or a list of those.
            top_k (`int`, *optional*, defaults to `1`):
                How many results to return.
            function_to_apply (`str`, *optional*, defaults to `"default"`):
                The function to apply to the model outputs in order to retrieve the scores. Accepts four different
                values:

                If this argument is not specified, then it will apply the following functions according to the number
                of labels:

                - If the model has a single label, will apply the sigmoid function on the output.
                - If the model has several labels, will apply the softmax function on the output.

                Possible values are:

                - `"sigmoid"`: Applies the sigmoid function on the output.
                - `"softmax"`: Applies the softmax function on the output.
                - `"none"`: Does not apply any function on the output.

        Return:
            A list or a list of list of `dict`: Each result comes as list of dictionaries with the following keys:

            - **label** (`str`) -- The label predicted.
            - **score** (`float`) -- The corresponding probability.

            If `top_k` is used, one such dictionary is returned per label.
        """
        result = super().__call__(*args, **kwargs)
        # TODO try and retrieve it in a nicer way from _sanitize_parameters.
        _legacy = "top_k" not in kwargs
        if isinstance(args[0], str) and _legacy:
            # This pipeline is odd, and return a list when single item is run
            return [result]
        else:
            return result

    def preprocess(self, inputs, **tokenizer_kwargs) -> Dict[str, GenericTensor]:
        return_tensors = self.framework
        if isinstance(inputs, dict):
            return self.tokenizer(**inputs, return_tensors=return_tensors, **tokenizer_kwargs)
        elif isinstance(inputs, list) and len(inputs) == 1 and isinstance(inputs[0], list) and len(inputs[0]) == 2:
            # It used to be valid to use a list of list of list for text pairs, keeping this path for BC
            return self.tokenizer(
                text=inputs[0][0], text_pair=inputs[0][1], return_tensors=return_tensors, **tokenizer_kwargs
            )
        elif isinstance(inputs, list):
            # This is likely an invalid usage of the pipeline attempting to pass text pairs.
            raise ValueError(
                "The pipeline received invalid inputs, if you are trying to send text pairs, you can try to send a"
                ' dictionary `{"text": "My text", "text_pair": "My pair"}` in order to send a text pair.'
            )
        return self.tokenizer(inputs, return_tensors=return_tensors, **tokenizer_kwargs)

    def _forward(self, model_inputs):
        # `XXXForSequenceClassification` models should not use `use_cache=True` even if it's supported
        model_forward = self.model.forward if self.framework == "pt" else self.model.call
        if "use_cache" in inspect.signature(model_forward).parameters.keys():
            model_inputs["use_cache"] = False
        return self.model(**model_inputs)

    def postprocess(self, model_outputs, function_to_apply=None, top_k=1, _legacy=True):
        # `_legacy` is used to determine if we're running the naked pipeline and in backward
        # compatibility mode, or if running the pipeline with `pipeline(..., top_k=1)` we're running
        # the more natural result containing the list.
        # Default value before `set_parameters`
        if function_to_apply is None:
            if self.model.config.problem_type == "multi_label_classification" or self.model.config.num_labels == 1:
                function_to_apply = ClassificationFunction.SIGMOID
            elif self.model.config.problem_type == "single_label_classification" or self.model.config.num_labels > 1:
                function_to_apply = ClassificationFunction.SOFTMAX
            elif hasattr(self.model.config, "function_to_apply") and function_to_apply is None:
                function_to_apply = self.model.config.function_to_apply
            else:
                function_to_apply = ClassificationFunction.NONE

        outputs = model_outputs["logits"][0]
        outputs = outputs.numpy()

        if function_to_apply == ClassificationFunction.SIGMOID:
            scores = sigmoid(outputs)
        elif function_to_apply == ClassificationFunction.SOFTMAX:
            scores = softmax(outputs)
        elif function_to_apply == ClassificationFunction.NONE:
            scores = outputs
        else:
            raise ValueError(f"Unrecognized `function_to_apply` argument: {function_to_apply}")

        if top_k == 1 and _legacy:
            return {"label": self.model.config.id2label[scores.argmax().item()], "score": scores.max().item()}

        dict_scores = [
            {"label": self.model.config.id2label[i], "score": score.item()} for i, score in enumerate(scores)
        ]
        if not _legacy:
            dict_scores.sort(key=lambda x: x["score"], reverse=True)
            if top_k is not None:
                dict_scores = dict_scores[:top_k]
        return dict_scores
Call docstring:
Classify the text(s) given as inputs.

Args:
    args (`str` or `List[str]` or `Dict[str]`, or `List[Dict[str]]`):
        One or several texts to classify. In order to use text pairs for your classification, you can send a
        dictionary containing `{"text", "text_pair"}` keys, or a list of those.
    top_k (`int`, *optional*, defaults to `1`):
        How many results to return.
    function_to_apply (`str`, *optional*, defaults to `"default"`):
        The function to apply to the model outputs in order to retrieve the scores. Accepts four different
        values:

        If this argument is not specified, then it will apply the following functions according to the number
        of labels:

        - If the model has a single label, will apply the sigmoid function on the output.
        - If the model has several labels, will apply the softmax function on the output.

        Possible values are:

        - `"sigmoid"`: Applies the sigmoid function on the output.
        - `"softmax"`: Applies the softmax function on the output.
        - `"none"`: Does not apply any function on the output.

Return:
    A list or a list of list of `dict`: Each result comes as list of dictionaries with the following keys:

    - **label** (`str`) -- The label predicted.
    - **score** (`float`) -- The corresponding probability.

    If `top_k` is used, one such dictionary is returned per label.

Both ? and ?? are excellent and quick ways to look under the hood of any object in Jupyter.

Inspecting a specific classifier function

Let's take a look at the function that does the heavy lifting for our sentiment analysis task: forward().

# looking at what actually runs the inputs
classifier.forward
<bound method Pipeline.forward of <transformers.pipelines.text_classification.TextClassificationPipeline object at 0x176de7850>>

What does this function actually do? Let's find out.

# source code of the forward function
classifier.forward??
Signature: classifier.forward(model_inputs, **forward_params)
Docstring: <no docstring>
Source:   
    def forward(self, model_inputs, **forward_params):
        with self.device_placement():
            if self.framework == "tf":
                model_inputs["training"] = False
                model_outputs = self._forward(model_inputs, **forward_params)
            elif self.framework == "pt":
                inference_context = self.get_inference_context()
                with inference_context():
                    model_inputs = self._ensure_tensor_on_device(model_inputs, device=self.device)
                    model_outputs = self._forward(model_inputs, **forward_params)
                    model_outputs = self._ensure_tensor_on_device(model_outputs, device=torch.device("cpu"))
            else:
                raise ValueError(f"Framework {self.framework} is not supported")
        return model_outputs
File:      ~/mambaforge/envs/llm_base/lib/python3.11/site-packages/transformers/pipelines/base.py
Type:      method

We can see that it automatically handles whether we're running a TensorFlow (tf) or PyTorch (pt) model. Then, it makes sure the tensors are on the correct device. Lastly, it calls another function, _forward(), on the prepared inputs.

We can follow the rabbit hole as far down as needed. Let's take a look at the source of _forward.

# going deeper
classifier._forward??
Signature: classifier._forward(model_inputs)
Docstring:
_forward will receive the prepared dictionary from `preprocess` and run it on the model. This method might
involve the GPU or the CPU and should be agnostic to it. Isolating this function is the reason for `preprocess`
and `postprocess` to exist, so that the hot path, this method generally can run as fast as possible.

It is not meant to be called directly, `forward` is preferred. It is basically the same but contains additional
code surrounding `_forward` making sure tensors and models are on the same device, disabling the training part
of the code (leading to faster inference).
Source:   
    def _forward(self, model_inputs):
        # `XXXForSequenceClassification` models should not use `use_cache=True` even if it's supported
        model_forward = self.model.forward if self.framework == "pt" else self.model.call
        if "use_cache" in inspect.signature(model_forward).parameters.keys():
            model_inputs["use_cache"] = False
        return self.model(**model_inputs)
File:      ~/mambaforge/envs/llm_base/lib/python3.11/site-packages/transformers/pipelines/text_classification.py
Type:      method

Ah, we can see it calls the model of the classifier. This is the distilbert model we saw earlier! Now we can peek under the hood at the actual Transformer LLM.

# the distilbert sentiment analysis model
classifier.model
DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): MultiHeadSelfAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)
            (lin1): Linear(in_features=768, out_features=3072, bias=True)
            (lin2): Linear(in_features=3072, out_features=768, bias=True)
            (activation): GELUActivation()
          )
          (output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
        )
      )
    )
  )
  (pre_classifier): Linear(in_features=768, out_features=768, bias=True)
  (classifier): Linear(in_features=768, out_features=2, bias=True)
  (dropout): Dropout(p=0.2, inplace=False)
)

We will break down the different pieces of this model later on in the course.

The important takeaway for now is that this shows the main structure of most Transformer LLMs. Other architectures are mostly incremental changes on this foundation.