Lesson 1: A Python Environment for LLMs


Chris Kroenke


September 27, 2023

Creating a Large Language Model (LLM) python environment with mamba and pip


The first thing we need to run an open-source LLM is a programming environment for the model. An environment is a programming ecosystem with all of the software libraries and packages that the LLM needs.


Setting up an environment is one of the most time-consuming and challenging tasks in Machine Learning. There is no silver bullet, as we can see by the multiple tools and approaches available to tackle the problem.

It is ok to struggle or feel lost when setting up the environment! That is very normal and even expected. There is a good reason for all of the memes in the ML community about what a pain it is to deal with CUDA drivers…

Take some comfort in the fact that once we build the environment, most of the other tasks will feel easy by comparison.

In this notebook we create a useful starter environment: one we can use to prototype ML models, write blog posts, explore python code, make plots, etc.

The goal is to have a powerful launch pad for learning and experimenting. Down the road, we can make leaner environments focused on more specific apps.

Now, let’s start building our python LLM environment.

The Base Environment: mamba

There are many tools to create python environments. In this course, we will use mamba to make and manage our environments.

mamba is a highly optimized C++ wrapper built around the very popular Conda package manager.

If you are familiar with Anaconda, then you already know mamba by proxy. Any call to conda can be drop-in replaced with a call to mamba instead.


A conda horror story: Once I ran a simple conda command on a GPU cluster that took more than a day to complete. The same mamba command finished in less than 10 minutes.

Conda’s stability can change a lot between versions, whereas mamba tends to remain fast and reliable.

Installing mamba

There is a mamba installation script that handles all of the setup for us. But in case you run into any issues, here is a link to the official installation instructions.

In the cells below we install mamba on a Mac computer.


The installation steps are identical for Linux, but they change a bit for Windows.

mamba on Mac

There are many mamba installation scripts for different computers and OSes. How do we know which script is right for us? Here is where the handy uname shell command comes to the rescue.

uname tells us about the computer and system it is running on. This information lets us automatically grab the correct installation script for our specific Mac.
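To make the two `uname` calls concrete, here is a minimal sketch that builds the installer name from an OS name and a machine architecture. `installer_name` is a hypothetical helper, not part of mamba. On a Mac, `uname` prints `Darwin`, and `uname -m` typically prints `arm64` on Apple Silicon or `x86_64` on Intel.

```shell
# hypothetical helper: assemble the Mambaforge installer file name
# from an OS name (uname) and a machine architecture (uname -m)
installer_name() {
  local os="$1"
  local arch="$2"
  echo "Mambaforge-${os}-${arch}.sh"
}

# the name this machine would download
echo "this machine: $(installer_name "$(uname)" "$(uname -m)")"

# fixed examples
installer_name Darwin arm64   # prints Mambaforge-Darwin-arm64.sh
installer_name Linux x86_64   # prints Mambaforge-Linux-x86_64.sh
```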

Run the bash commands below to do the following:
- Find the appropriate mamba Mac installation script.
- Download the script from the official mamba repo.

# find the name of the appropriate installation script
script_name="Mambaforge-$(uname)-$(uname -m).sh"

# mamba repo url with all the installation scripts
script_repo="https://github.com/conda-forge/miniforge/releases/latest/download"

# download the appropriate script
curl -L -O "${script_repo}/${script_name}"

Note that this command downloads the script into the directory that you’re running it from.
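One gotcha: by default curl saves whatever the server sends back, so a wrong URL can leave a small error page on disk under the installer's name (`curl --fail` changes that behavior). Here is a rough sanity check to run before the installer, assuming the installer is a shell script that starts with a `#!` line. `looks_like_script` is a hypothetical helper name.

```shell
# hypothetical helper: the file must exist, be non-empty,
# and start with a "#!" shebang like a real shell script
looks_like_script() {
  [ -s "$1" ] && [ "$(head -c 2 "$1")" = "#!" ]
}

# check the file we just downloaded before running it
script_name="Mambaforge-$(uname)-$(uname -m).sh"
if looks_like_script "$script_name"; then
  echo "looks ok: $script_name"
else
  echo "missing or not a script: $script_name (check the download)"
fi
```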

Next, we can run the downloaded shell script to install mamba.

The script will step through all of the installation steps. It will prompt you for some info along the way, but we can accept all of the defaults for now (i.e. don’t type anything in, just hit enter).

# run the Mambaforge installer
bash Mambaforge-$(uname)-$(uname -m).sh

If you prefer to download the script directly, grab it from here: https://github.com/conda-forge/miniforge/releases/

Once mamba is installed, we are ready to create a base python environment.

Creating a mamba python environment

We can use mamba to install a specific version of python. As of writing, python versions 3.10 and 3.11 are popular with the ML community.

Our LLM environment will be called, quite creatively, llm-env.

Let’s now use mamba to create the environment with python 3.11.

# create the `llm-env` python 3.11 environment
mamba create -n llm-env python=3.11

Now that we have a base environment, we can activate it and start installing the python packages we need to run LLMs.

Bringing in pip

We could install all of the needed python libraries with mamba. However, we will use python’s built-in pip package manager instead.

Some folks might understandably gripe about this choice. After hyping up mamba so much, why would we bring in another package manager?

The reason is that we’re relying on some new and state-of-the-art libraries. Some of these libraries are not always available via mamba. And, more than that, sometimes the repos need extra installation steps which are better handled through pip.

To recap: we are bringing in pip because it offers more flexibility than mamba when installing bleeding-edge LLM libraries.

First, make sure that the new llm-env environment is activated. Then we’ll install a few basic libraries that just about all LLM applications need.
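Activating is a single command: `mamba activate llm-env` (or `conda activate llm-env`, depending on how your shell was initialized). Since it is easy to forget, here is a small guard sketch, assuming conda-style activation, which exports the active environment's name as `CONDA_DEFAULT_ENV`. `require_env` is a hypothetical helper.

```shell
# first, in your terminal:
#   mamba activate llm-env

# hypothetical guard: succeed only if the named environment is active
require_env() {
  if [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]; then
    echo "active environment: $1"
  else
    echo "expected '$1' but the active environment is '${CONDA_DEFAULT_ENV:-none}'" >&2
    return 1
  fi
}

# only run pip once we know it belongs to llm-env
if require_env llm-env; then
  python -m pip --version
fi
```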

Installing pytorch

The main library we need is pytorch. PyTorch does most of the heavy lifting for Neural Networks in python.

# install the pytorch libraries
pip install torch

Note that this command also installs all of the additional libraries that pytorch depends on.
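Once the install finishes, a quick smoke test confirms that PyTorch imports and shows which accelerators it can see. `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are standard PyTorch checks. On a Mac the CUDA line prints `False`, which is expected, since Apple GPUs go through the separate MPS backend. `torch_report` is a hypothetical helper name.

```shell
# hypothetical helper: import torch and report what hardware it can use
torch_report() {
  python -c '
import torch
print("torch version: ", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("MPS available: ", torch.backends.mps.is_available())
' || echo "could not import torch: is llm-env active?"
}

torch_report
```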

Installing helper libraries

We will also install some helper libraries for the rest of the course. We also install the popular scientific package scipy, which many ML libraries rely on.

# install the jupyter notebook library
pip install jupyterlab

# install matplotlib for drawing plots
pip install matplotlib

# library for writing blogs
pip install nbdev 

# helpful python utilities
pip install fastcore

# a powerful scientific library
pip install scipy 

Aside: Installing Rust

Many LLMs rely on the Rust programming language for fast and optimized tokenizers. Run the command below to install Rust on your system and leverage these optimized tokenizers:

# install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
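The rustup script adds Rust's tools to your shell profile, so a new terminal will find them. In the current terminal we can load them by hand and check the versions. `~/.cargo/env` is where rustup normally writes that setup; `report_tools` is a hypothetical helper.

```shell
# rustup installs its tools under ~/.cargo/bin; load them into this shell session
if [ -f "$HOME/.cargo/env" ]; then
  . "$HOME/.cargo/env"
fi

# hypothetical helper: print each tool's version, or note that it is missing
report_tools() {
  local tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: $("$tool" --version 2>/dev/null)"
    else
      echo "$tool: not found"
    fi
  done
}

report_tools rustc cargo
```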

Installing HuggingFace libraries

Next up, we install a suite of HuggingFace libraries. These libraries let us fully leverage the powerful tools offered by the HuggingFace team.

We won’t use all of them initially, but they will be available should you ever need them in other, personal projects.

# install the main LLM library
pip install transformers

# library for optimized LLM training 
pip install accelerate

# library for optimized LLM inference
pip install optimum

# quick access to great data utilities
pip install datasets

# install an optimized tokenizer library
pip install setuptools-rust
pip install tiktoken
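A quick way to confirm that everything from this lesson landed in llm-env is to try importing each library. The sketch below does that; `check_imports` is a hypothetical helper. Note that python import names do not always match pip names: the `setuptools-rust` package, for example, is imported as `setuptools_rust`.

```shell
# hypothetical helper: try to import each module in the active environment
check_imports() {
  local mod
  for mod in "$@"; do
    if python -c "import ${mod}" >/dev/null 2>&1; then
      echo "ok       ${mod}"
    else
      echo "MISSING  ${mod}"
    fi
  done
}

# import names of the libraries installed in this lesson
check_imports torch jupyterlab matplotlib nbdev fastcore scipy \
  transformers accelerate optimum datasets setuptools_rust tiktoken
```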

Congrats! We have now created a powerful python environment for LLMs. We will use this llm-env going forward in the rest of the course.


This notebook covered the basics of setting up a python environment for LLMs. We used the mamba package manager to install a basic python environment. Then, we used pip to install a set of libraries for running and learning about LLMs.

There are two appendixes below. The first appendix goes into more details about why creating environments is challenging, and why we even need them in the first place.

The second appendix covers how to install NVIDIA’s GPU libraries on a fresh Ubuntu 22.04 machine. If you are running this on a Linux machine, the second appendix also installs some powerful CUDA-only libraries that speed up LLMs even more. We will come back to this section later in the course when fine-tuning and augmenting our LLMs.

Appendix 1: Silent Failures in ML Models

LLMs, and Machine Learning models in general, often fail in different ways than other software. For instance, classic bugs in regular software are things like: type mismatches, syntax errors, compilation errors, etc. In other words, failures that stem from a clearly wrong operation (aka a bug) that snuck into the code. We wanted the computer to do X, but we told it by accident to do Y instead.

In contrast, ML models often have “silent” failures. There is no syntax or compilation error - the program still runs and completes fine. But, there is something wrong in the code: adding where we should have subtracted, grabbing the wrong element from a list, or using the wrong mathematical function. There is no type checker or compiler that would (or even could, for now) catch these errors.

The fixes for these silent failures are clear:
- Carefully inspect the code.
- Monitor and validate the model’s inputs and outputs.
- Be clear about both the algorithms and the models we are running.

There is another, unfortunate kind of silent failure: version mismatches.

Version mismatches happen when we use a programming library with a different version than the one the model was originally trained with.

Since the software libraries we rely on are updated frequently, both subtle and major changes in their internals can affect a model’s output. These failures are unfortunately immune to our careful, logical checks.

Avoiding these silent failures is the main reason for being consistent and disciplined with our model’s programming environment. A good environment setup keeps us focused on the important, conceptual parts of our pipeline instead of getting bogged down managing software versions.
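One concrete habit that helps: once an environment works, snapshot its exact versions so they can be restored later. Here is a minimal sketch using `pip freeze` and `mamba env export`; `snapshot_env` and the `env-snapshot` folder name are hypothetical. The files can later be restored with `pip install -r requirements.txt` or `mamba env create -f environment.yml`.

```shell
# hypothetical helper: write the active environment's exact versions to a folder
snapshot_env() {
  local outdir="${1:-env-snapshot}"
  mkdir -p "$outdir"
  # exact versions of every package pip can see
  python -m pip freeze > "$outdir/requirements.txt" \
    || echo "warning: pip freeze failed" >&2
  # the full mamba view of the environment, if mamba is on the PATH
  if command -v mamba >/dev/null 2>&1; then
    mamba env export > "$outdir/environment.yml" \
      || echo "warning: mamba env export failed" >&2
  fi
}

snapshot_env env-snapshot
```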

An environment for the future

There is a nice benefit to spending this much time up front on our environment.

Not only do we now have a specialized environment to run and fine-tune an LLM, but we also have a springboard to keep up with the state of the art in the field: a way to bring in other groundbreaking improvements as they are released, and a way to weave in the latest and greatest models. The LLM world is now our oyster, and llm-env is the small grain of sand, a pearl-in-waiting.

Appendix 2: Installing NVIDIA Drivers and CUDA Libraries on a fresh Ubuntu 22.04 machine

There are three things we need to install to run ML models on NVIDIA GPUs:
- NVIDIA Drivers.
- CUDA Libraries.
- cuDNN Libraries.

Folks often talk about “CUDA” as a loose mix of all three.

However, it’s important to keep things clear and separate. Let’s use a music analogy to help us along: imagine the GPU is an instrument, and running an LLM is like playing a song.

The NVIDIA Drivers let us pick up the instrument (GPU) with our hands and get ready to play. It’s the first step in making any music at all.

The CUDA libraries are the basic music theory (scales, chords, etc.) and sheet reading that we need to play songs well.

The cuDNN library is like a set of advanced skills and muscle memory, built on top of lots of practice, that let us really shred on the instrument.
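To tell the three layers apart on a live machine, here is a rough sketch. It assumes the usual markers: `nvidia-smi` ships with the driver, `nvcc` is the CUDA compiler, and cuDNN shows up as a `libcudnn` shared library in the dynamic linker cache. `check_nvidia_stack` is a hypothetical helper, and `nvcc` may report as missing until the PATH changes later in this appendix are in place.

```shell
# hypothetical helper: report which of the three NVIDIA layers this machine has
check_nvidia_stack() {
  # 1. driver: nvidia-smi ships with the driver and talks to the GPU
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo "driver: found (nvidia-smi)"
  else
    echo "driver: not found"
  fi
  # 2. CUDA toolkit: nvcc is the CUDA compiler
  if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA:   found (nvcc)"
  else
    echo "CUDA:   not found"
  fi
  # 3. cuDNN: look for the shared library in the linker cache
  if ldconfig -p 2>/dev/null | grep -q libcudnn; then
    echo "cuDNN:  found (libcudnn)"
  else
    echo "cuDNN:  not found"
  fi
}

check_nvidia_stack
```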

With this in mind, let’s install the NVIDIA Drivers and CUDA libraries on a fresh Ubuntu 22.04 machine.

Helper libraries

First, a best practice for system administration: make sure to update the Ubuntu package list:

# update the ubuntu package list
sudo apt update

There are two Ubuntu packages that are worth installing:
- software-properties-common
- build-essential

software-properties-common includes a set of tools for adding and managing software repositories. It makes our life a bit easier.

build-essential pulls in a set of packages that are critical for building software on Ubuntu. It has tools like the GNU Compiler Collection (GCC) and GNU Make that are key for development.

# install useful linux packages
sudo apt install software-properties-common
sudo apt install build-essential
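To double-check that both packages are installed, we can ask dpkg directly. `dpkg-query -W -f='${Status}'` prints `install ok installed` for an installed package; `pkg_installed` is a hypothetical helper.

```shell
# hypothetical helper: true if dpkg reports the package as installed
pkg_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q "install ok installed"
}

for pkg in software-properties-common build-essential; do
  if pkg_installed "$pkg"; then
    echo "installed: $pkg"
  else
    echo "missing:   $pkg"
  fi
done
```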

Installing NVIDIA Drivers

We’ll use one of the most reliable and straightforward ways of installing NVIDIA drivers: the graphics drivers PPA.

# add the graphics drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa

# update the package list again
sudo apt update

Now we can run the command below to install the actual nvidia drivers. As of writing, the 535 version of the driver is stable and supports a good number of GPU cards.

# install the nvidia drivers
sudo apt install nvidia-driver-535

After installing the drivers, make sure to reboot your system before going forward!

# restart the system after installing the drivers
sudo reboot

Once the machine is back up, run the following command to check if the drivers were installed correctly.

# this should show us any available gpus
nvidia-smi
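Besides its default table, `nvidia-smi` can print selected fields as CSV, which is handy for scripts and logs. The `--query-gpu` and `--format` flags below are standard nvidia-smi options; `gpu_summary` is a hypothetical helper.

```shell
# hypothetical helper: one CSV line per GPU with its name, driver version, and memory
gpu_summary() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi --query-gpu=name,driver_version,memory.total --format=csv,noheader \
      || echo "nvidia-smi failed: try rebooting, or check the driver install"
  else
    echo "nvidia-smi not found: is the driver installed, and did you reboot?"
  fi
}

gpu_summary
```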

Installing CUDA

With the drivers working, we can now install the CUDA libraries. CUDA is NVIDIA's toolkit of compilers and libraries for general-purpose computing on its GPUs, and ML frameworks like PyTorch build on top of it.

The example below uses a local .deb installer for CUDA version 12.1. The steps come straight from the official CUDA website.

There is a lot going on in the steps below. But it has been, in my experience, one of the most straightforward and reliable ways to install specific CUDA versions.

# full steps to install CUDA 12.1 libraries 

# setting up the repo
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/

# installing the libraries
sudo apt-get update
sudo apt-get -y install cuda

Here is a full breakdown of what the bash commands above did. Feel free to skim this list on a first pass. The key takeaway: these commands install CUDA version 12.1 on our system. As of writing, 12.1 is the most recent CUDA version supported by many of the bleeding-edge LLM libraries:

  1. Download the Pinning File: Utilize wget to download the cuda-ubuntu2204.pin file from NVIDIA’s developer website. This file aids in managing APT preferences regarding the CUDA repository.

  2. Relocate and Rename the Pinning File: Move the downloaded cuda-ubuntu2204.pin file to the /etc/apt/preferences.d/ directory, and rename it to cuda-repository-pin-600. This step ensures that APT recognizes the preferences for the CUDA repository.

  3. Fetch the CUDA Repository Package: Download the Debian package for setting up the CUDA repository on your system. Make sure to get the package corresponding to CUDA version 12.1 for Ubuntu 22.04.

  4. Deploy the CUDA Repository Package: Utilize dpkg to install the downloaded Debian package, which in turn sets up the CUDA repository on your system.

  5. Transfer the GPG Keyring File: Copy the GPG keyring file from the CUDA repository directory to your system’s keyrings directory. This file is crucial for verifying the authenticity of packages from the CUDA repository.

  6. Refresh the APT Package List: Instruct APT to update its list of available packages. This step incorporates the information from the newly added CUDA repository.

  7. Initiate CUDA Installation: Command APT to install the cuda package along with all its necessary dependencies from the CUDA repository. The -y flag is used to automate the process by affirming “yes” to any prompts encountered.
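The long file name in step 3 packs in two version numbers: the CUDA toolkit release and the NVIDIA driver bundled with it (which is not the 535 driver we installed from the PPA). Here is a small sketch that pulls them apart with plain shell parameter expansion, assuming the naming pattern of the 12.1 file above. Other releases may differ, so always copy the exact URL from NVIDIA's download page. `describe_cuda_deb` is a hypothetical helper.

```shell
# hypothetical helper: split a CUDA local-repo .deb name into its version parts
# (assumes the pattern of the 12.1 file name used above)
describe_cuda_deb() {
  local name="$1"
  local ver_and_driver="${name#*_}"            # text after the first "_"
  local cuda_version="${ver_and_driver%%-*}"   # up to the first "-"
  local rest="${ver_and_driver#*-}"            # after the first "-"
  local driver_version="${rest%%-*}"           # up to the next "-"
  echo "CUDA toolkit: ${cuda_version}, bundled driver: ${driver_version}"
}

describe_cuda_deb cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
# prints CUDA toolkit: 12.1.0, bundled driver: 530.30.02
```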


After installing CUDA, we need to run the following lines to modify our ~/.bashrc file. These changes make sure that we can actually see and find the newly installed CUDA libraries:

# modify paths so we can find CUDA binaries
echo 'export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc
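Running the `echo ... >> ~/.bashrc` lines more than once (easy to do when re-running notebook cells) adds duplicate lines each time. Here is a minimal idempotent sketch that only appends a line if that exact line is not already present; `append_once` is a hypothetical helper.

```shell
# hypothetical helper: append a line to a file only if that exact line is missing
append_once() {
  local line="$1"
  local file="$2"
  touch "$file"
  # -x: match whole lines, -F: treat the line as a fixed string, not a regex
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# usage, in place of the echo lines above:
#   append_once 'export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}' ~/.bashrc
#   append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' ~/.bashrc
```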

Then reboot the system one more time, with feeling:

sudo reboot

Once the machine is back online, run the following command to check if CUDA was installed correctly:

# this command shows us the CUDA version
nvcc --version

# it should output something like this:
#   nvcc: NVIDIA (R) Cuda compiler driver
#   Copyright (c) 2005-2023 NVIDIA Corporation
#   Built on Tue_Feb__7_19:32:13_PST_2023
#   Cuda compilation tools, release 12.1, V12.1.66
#   Build cuda_12.1.r12.1/compiler.32415258_0
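To compare CUDA versions in a script, it helps to pull just the release number out of `nvcc --version`. This is a rough sketch assuming the `release X.Y,` wording shown above; `nvcc_release` is a hypothetical helper. PyTorch reports the CUDA version it was built with via `torch.version.cuda`, which is a useful number to compare against.

```shell
# hypothetical helper: extract "X.Y" from the "release X.Y," part of nvcc --version
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
  echo "nvcc reports CUDA $(nvcc --version | nvcc_release)"
fi
# PyTorch's view: python -c "import torch; print(torch.version.cuda)"
```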

Installing accelerated CUDA libraries on Linux

If you are running on a Linux machine, you’ll have access to many powerful libraries to speed up LLM training and inference. Not all of these are available on Mac or Windows, but hopefully that changes with time.


Here we also see the first instance of needing a pip install with extra steps - something we could not have done with mamba alone.

# install optimized CUDA LLM libraries

# library to massively speed up Transformer LLMs
pip install flash-attn --no-build-isolation

# library crucial for quantized LLMs
pip install bitsandbytes 

# xformers library from Meta
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
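The `cu121` at the end of that index URL encodes the CUDA version (12.1) the wheels were built for, so it should match the CUDA version on the machine. Here is a tiny sketch of that mapping; `torch_index_url` is a hypothetical helper, and PyTorch only hosts indexes for the specific CUDA versions it supports, so check that the resulting URL exists.

```shell
# hypothetical helper: PyTorch wheel index URL for a CUDA version like "12.1"
torch_index_url() {
  # drop the dots: "12.1" -> "121"
  echo "https://download.pytorch.org/whl/cu$(echo "$1" | tr -d '.')"
}

torch_index_url 12.1   # prints https://download.pytorch.org/whl/cu121
# e.g.: pip install -U xformers --index-url "$(torch_index_url 12.1)"
```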

And that does it! Phew, we made it. After following the above, your Ubuntu 22.04 machine stands ready at the bleeding edge of LLMs.