Lesson 1: A Python Environment for LLMs


Chris Kroenke


September 27, 2023

Creating an LLM python environment with mamba and pip


To use an open-source LLM, the first thing we need is a programming environment for the model. The environment is a computing ecosystem with all of the software libraries and packages the LLM needs.


Setting up an environment can be one of the most time-consuming and challenging tasks in Machine Learning. There is no silver bullet, as you can see by the many approaches that folks have come up with for this problem.

It’s ok to feel lost or struggle with setting up the environment! That is totally normal. There is good reason for all of the memes in the ML community about the pain of dealing with CUDA drivers…

Please take some comfort in the fact that once we build the environment, most of the other tasks will seem easy by comparison.

Here we will build a useful base environment. Something we can use for prototyping ML models, writing blog posts, making plots, etc. The goal is to give you a powerful starting point for learning and experimenting. Down the road, we can make leaner environments focused on more specific apps.

Now, let’s start building our python environment for LLMs.

The Base Environment: mamba

mamba is a highly optimized C++ wrapper built around the very popular Conda package manager. It is faster and more pleasant to use than pure conda.

If you are familiar with conda, then you already know mamba by proxy. In almost any conda command, you can simply replace conda with mamba as a drop-in substitute.


A conda horror story: I once ran a simple conda command on a GPU cluster that took more than one day to complete. The same mamba command finished in less than 10 minutes.

Conda’s speed and stability vary a lot between versions, whereas mamba tends to stay fast and reliable.

Installing mamba

mamba offers an installation script that handles all of the setup for us. But in case you run into any issues, please refer to the official mamba installation instructions.

Next we install mamba on a Mac computer.


The installation steps are identical for Linux, but they change a bit for Windows.

mamba on Mac

How do we know which mamba installation script to use? The uname shell command comes to the rescue. It returns information about the computer and system it is called from. We can use uname to automatically grab the right installation script for our specific Mac.

The bash commands below will do the following:
- Find the appropriate mamba Mac installation script.
- Download the script from the official mamba repo.

# find the name of the appropriate installation script
script_name="Mambaforge-$(uname)-$(uname -m).sh"

# mamba repo url with all the installation scripts
script_repo="https://github.com/conda-forge/miniforge/releases/latest/download"

# download the appropriate script
curl -L -O "${script_repo}/${script_name}"

Note that this command downloads the script into the directory that you’re running it from.
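To see exactly what uname contributes to the file name, this small sketch prints each piece before assembling the same installer name. The variable names os_name and machine are illustrative only. On macOS, uname typically reports Darwin, and uname -m reports arm64 on Apple Silicon or x86_64 on Intel.

```shell
#!/bin/sh
# print the two pieces that uname contributes to the installer name
os_name="$(uname)"      # kernel name, e.g. Darwin on macOS or Linux on Linux
machine="$(uname -m)"   # hardware name, e.g. arm64 on Apple Silicon or x86_64 on Intel
script_name="Mambaforge-${os_name}-${machine}.sh"

echo "OS:        ${os_name}"
echo "Machine:   ${machine}"
echo "Installer: ${script_name}"
```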

Once the shell script is downloaded, run it to install mamba:

# run the Mambaforge installer
bash Mambaforge-$(uname)-$(uname -m).sh

If you prefer to download the script directly, grab it from here: https://github.com/conda-forge/miniforge/releases/

The script now steps through the installation process. It will prompt you for some info along the way, but you can accept all of the defaults for now (i.e. don’t type anything in, just hit enter).

Once mamba is installed, we are ready to create a base python environment.

Creating a mamba python environment

We use mamba to install a specific version of python. For example, python versions 3.10 and 3.11 are popular with current open-source LLMs.

Our LLM environment will be called, quite creatively, llm-env. Let’s now use mamba to create the environment with python 3.11.

# create the base python environment
mamba create -n llm-env python=3.11

Now that we have a base environment, we can activate it and start installing the python packages we will need to run LLMs.

Bringing in pip

We could install all of the needed python libraries with mamba. However, we will use python’s built-in pip package manager instead.

This is because we’ll rely on some new and state-of-the-art code repos. Repos that are not always available via mamba. And, more than that, sometimes the repos need extra installation steps which are better handled through pip. To recap: pip offers us more flexibility and power than mamba when installing bleeding edge LLM libraries.

First, make sure that the new llm-env environment is activated. Then we’ll install a few basic libraries that just about all LLM applications need.
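Because pip installs into whichever python is currently active, it can help to guard the install commands. The sketch below defines a hypothetical require_env helper. It assumes that activating an environment sets the CONDA_DEFAULT_ENV variable to the environment's name, which conda-style activation normally does.

```shell
#!/bin/sh
# hypothetical helper: stop early unless the expected mamba environment is active.
# Assumes activation sets CONDA_DEFAULT_ENV to the environment name.
require_env() {
    expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "Expected environment '$expected' but found '${CONDA_DEFAULT_ENV:-none}'." >&2
        echo "Run: mamba activate $expected" >&2
        return 1
    fi
    echo "Environment '$expected' is active."
}

# usage: only run the installs when llm-env is active
# require_env llm-env && pip install torch torchvision torchaudio
```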

Installing pytorch

The most important library we will need is pytorch. This is the main library that handles most of the heavy lifting for neural networks in python.

# install the pytorch libraries
pip install torch torchvision torchaudio

Note that this will also grab and install pytorch’s many dependencies.

Installing helper libraries

Next we install a few helper libraries for the rest of the course. These are libraries for editing Jupyter notebooks, making plots, and writing a blog. We also install the popular scientific package scipy, which many ML libraries rely on.

# install the jupyter notebook library
pip install jupyterlab

# install matplotlib for drawing plots
pip install matplotlib

# library for writing blogs
pip install nbdev 

# helpful python utilities
pip install fastcore

# a powerful scientific library
pip install scipy 

Aside: Installing Rust

Many LLM tokenizers, such as the ones in HuggingFace’s tokenizers library, are written in the Rust programming language for speed. Run the short command below to install Rust, so that pip can build these optimized tokenizers from source when it needs to:

# install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
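The rustup installer typically only updates PATH for new shells, so the current terminal may not see rustc or cargo right away. A minimal sketch, assuming rustup's default environment file at $HOME/.cargo/env, that loads that file and reports the compiler version (rust_status is a hypothetical helper name):

```shell
#!/bin/sh
# hypothetical helper: load rustup's environment file (if present) and report the Rust compiler
rust_status() {
    # sourcing this file puts ~/.cargo/bin on PATH for the current shell
    if [ -f "$HOME/.cargo/env" ]; then
        . "$HOME/.cargo/env"
    fi
    if command -v rustc >/dev/null 2>&1; then
        rustc --version
    else
        echo "rustc not found; open a new terminal or re-check the rustup install" >&2
        return 1
    fi
}

# report the status without aborting a larger script if Rust is missing
rust_status || true
```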

Installing HuggingFace libraries

Next up, we install a suite of HuggingFace libraries for dealing with LLMs. With these libraries you’ll be able to fully leverage the powerful tools offered by the HuggingFace team.

We won’t use all of them initially, but they will be available should you ever need them in other personal projects.

# install the main LLM library
pip install transformers

# library for optimized LLM training 
pip install accelerate

# library for optimized LLM inference
pip install optimum

# quick access to great data utilities
pip install datasets

# install an optimized tokenizer library
pip install setuptools-rust
pip install tiktoken
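Before moving on, a quick sanity check can confirm that python can find each of these packages. This is a sketch using a hypothetical check_imports helper built on importlib.util.find_spec, which locates a module without importing it. The import names match the pip names here, except setuptools-rust, which is imported as setuptools_rust.

```shell
#!/bin/sh
# hypothetical helper: report whether python can find each module, without importing it
check_imports() {
    status=0
    for module in "$@"; do
        if python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' "$module" 2>/dev/null; then
            echo "ok       $module"
        else
            echo "MISSING  $module"
            status=1
        fi
    done
    return "$status"
}

# usage inside the activated llm-env (commented out so the sketch runs anywhere):
# check_imports torch torchvision torchaudio jupyterlab matplotlib nbdev fastcore scipy \
#               transformers accelerate optimum datasets setuptools_rust tiktoken
```

Because find_spec does not run a package's code, this check is fast but will not catch a package that is installed yet fails on import; actually importing a library such as torch would be the stricter test.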

Congrats! We have now created a powerful python environment for LLMs. Going forward, we will use this llm-env in the rest of the course.


This notebook covered the basics of setting up a python environment for LLMs. We used the mamba package manager to install a base python environment. Then, we used pip to install a set of libraries for running and learning about LLMs.

There are two appendixes below. The first appendix goes into more details about why environments can be challenging, and why we need them in the first place.

The second appendix covers how to install NVIDIA’s GPU libraries on a fresh Ubuntu 22.04 machine. If you are running this on a Linux machine, the second appendix also installs some powerful CUDA-only libraries that speed up LLMs even more. We will come back to this section later in the course when fine-tuning and augmenting our LLMs.

Appendix 1: Silent Failures in ML Models

LLMs, and Machine Learning models more generally, often fail in different ways than other software. For instance, classic bugs in regular software are things like: type mismatches, syntax errors, compilation errors, etc. In other words, failures that stem from a clearly wrong operation (aka a bug) that snuck into the code. We wanted the computer to do X, but we told it by accident to do Y instead.

In contrast, ML models often have “silent” failures. There is no syntax or compilation error - the program still runs and completes fine. But, there is something wrong in the code: adding where we should have subtracted, grabbing the wrong element from a list, or using the wrong mathematical function. There is no type checker or compiler that would (or even could, for now) catch these errors.

The fixes for these silent failures are clear:
- Carefully inspecting the code.
- Monitoring and validating the model outputs.
- Clarity in both the algorithms and models we are running.

There is another, unfortunate kind of silent failure: version mismatches. Version mismatches happen when we use a different version of a specific programming library from the version that the model was originally created with.

Since the software libraries we rely on are updated frequently, both subtle and major changes in their internals can affect a model’s output. These failures are unfortunately immune to our careful, logical checks.

Avoiding these silent failures is the main reason for being consistent and disciplined with our model’s programming environment. A good environment setup keeps us focused on the important, conceptual parts of our pipeline instead of getting bogged down managing software versions.
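One low-effort habit against version mismatches is to write down the exact library versions a model ran with. A minimal sketch, using a hypothetical record_versions helper built on python's importlib.metadata, that prints one line per package in pip's name==version format:

```shell
#!/bin/sh
# hypothetical helper: print "name==version" for each installed package name given,
# or a commented line if it is not installed
record_versions() {
    python3 - "$@" <<'EOF'
import sys
from importlib.metadata import PackageNotFoundError, version

for name in sys.argv[1:]:
    try:
        print(f"{name}=={version(name)}")
    except PackageNotFoundError:
        print(f"# {name} not installed")
EOF
}

# usage inside llm-env: save the versions next to your experiment
# record_versions torch transformers accelerate datasets > llm-env-versions.txt
```

Since the output follows pip's requirements format, the saved file can later be passed to pip install -r to recreate the same versions; lines for missing packages start with #, so pip treats them as comments.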

An environment for the future

There is a nice benefit to spending this much time up front on our environment.

We now have not only a specialized environment to run and fine-tune an LLM, but also a springboard to keep up with the state of the art in the field: a setup in which to bring in other groundbreaking improvements, and to weave in the latest and greatest models as they are released. The LLM world is now our oyster, and llm-env the small grain of sand that would be a pearl.

Appendix 2: Installing NVIDIA Drivers and CUDA Libraries on a fresh Ubuntu 22.04 machine

There are three things we need to install to run ML models on NVIDIA GPUs:
- NVIDIA Drivers.
- CUDA Libraries.
- cuDNN Libraries.

Folks often talk about “CUDA” as referring to all three of the above.

But it’s important to keep things clear and separate. Let’s use a music analogy to help us along: imagine the GPU is an instrument, and running an LLM is like playing a song.

The NVIDIA Drivers let us pick up the instrument (GPU) with our hands and get ready to play. It’s the first step in making any music at all.

The CUDA libraries are the basic music theory (scales, chords, etc.) and sheet reading that we need to play songs well.

The cuDNN library is like a set of advanced skills and muscle memory, built on top of lots of practice, that let us really shred.

With this in mind, let’s install the NVIDIA Drivers and CUDA libraries on a fresh Ubuntu 22.04 machine.

Helper libraries

First, some best practice. Make sure to update the Ubuntu package list:

# update the ubuntu package list
sudo apt update

There are two Ubuntu packages that are worth installing:
- software-properties-common
- build-essential

software-properties-common is a set of tools for adding and managing software repositories. It makes our life a bit easier.

build-essential contains a list of packages that are essential for building Ubuntu packages. It includes key development software like the GNU Compiler Collection (GCC) and GNU Make. It also has the tools to build and install projects from source (aka straight from the repo’s folder).

# install useful linux packages
sudo apt install software-properties-common
sudo apt install build-essential
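To confirm that build-essential delivered the compilers, a small sketch with a hypothetical missing_tools helper prints any of the named commands that are not on PATH:

```shell
#!/bin/sh
# hypothetical helper: print the name of every command in the argument list that is not on PATH
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
        fi
    done
}

# after installing build-essential, gcc, g++ and make should all be found
missing="$(missing_tools gcc g++ make)"
if [ -z "$missing" ]; then
    echo "Build tools are ready"
else
    echo "Still missing: $missing" >&2
fi
```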

Installing NVIDIA Drivers

We’ll use one of the most reliable and straightforward methods to install the NVIDIA drivers: the graphics drivers PPA.

# add the graphics drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa

# update the package list again
sudo apt update

Now we can install the nvidia drivers themselves. As of writing, the 535 version of the driver is stable and supports a good number of GPU cards.

# install the nvidia drivers
sudo apt install nvidia-driver-535

After installing the drivers, make sure to reboot your system before going forward!

# restart the system after installing the drivers
sudo reboot

Once the machine is back up, run the following command to check if the drivers were installed correctly.

# should show us any available gpus
nvidia-smi
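nvidia-smi also has a query mode that is easier to script than its full table. The sketch below uses the --query-gpu=driver_version and --format=csv,noheader options together with a hypothetical driver_at_least helper that compares the driver's major version with the 535 series installed above:

```shell
#!/bin/sh
# hypothetical helper: succeed if a driver version string such as "535.x.y"
# has a major version of at least the given number
driver_at_least() {
    version="$1"
    required_major="$2"
    major="${version%%.*}"   # keep everything before the first dot
    [ "$major" -ge "$required_major" ] 2>/dev/null
}

# on the GPU machine: ask nvidia-smi for the driver version and compare it to 535
if command -v nvidia-smi >/dev/null 2>&1; then
    installed="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
    if driver_at_least "$installed" 535; then
        echo "NVIDIA driver $installed is installed"
    else
        echo "Driver version '$installed' is older than 535 (or could not be read)" >&2
    fi
else
    echo "nvidia-smi not found; the driver may not be installed yet" >&2
fi
```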

Installing CUDA

With the drivers working, we can now install the CUDA libraries. The CUDA Toolkit provides the compiler (nvcc), the runtime, and the GPU-accelerated math libraries that ML frameworks build on.

The example below uses a local .deb installer for CUDA version 12.1. The steps come straight from the official CUDA website.

There is a lot going on in the steps below. But it has been, in my experience, one of the most straightforward and reliable ways to install specific CUDA versions. Other methods may be easier, but they make it harder to pin down specific versions, which leads to many headaches down the road.

# full steps to install CUDA 12.1 libraries 

# setting up the repo
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/

# installing the libraries
sudo apt-get update
sudo apt-get -y install cuda

Here is a full breakdown of what the bash commands above did. Feel free to skim this list on a first pass. The key takeaway: these commands install CUDA version 12.1 on our system. As of writing, 12.1 is the most recent CUDA version used by many of the bleeding-edge LLM libraries:

  1. Download the Pinning File: Use wget to download the cuda-ubuntu2204.pin file from NVIDIA’s developer website. This file helps manage APT preferences for the CUDA repository.

  2. Relocate and Rename the Pinning File: Move the downloaded cuda-ubuntu2204.pin file to the /etc/apt/preferences.d/ directory, and rename it to cuda-repository-pin-600. This step ensures that APT recognizes the preferences for the CUDA repository.

  3. Fetch the CUDA Repository Package: Download the Debian package for setting up the CUDA repository on your system. Make sure to get the package corresponding to CUDA version 12.1 for Ubuntu 22.04.

  4. Deploy the CUDA Repository Package: Use dpkg to install the downloaded Debian package, which in turn sets up the CUDA repository on your system.

  5. Transfer the GPG Keyring File: Copy the GPG keyring file from the CUDA repository directory to your system’s keyrings directory. This file is crucial for verifying the authenticity of packages from the CUDA repository.

  6. Refresh the APT Package List: Instruct APT to update its list of available packages. This step incorporates the information from the newly added CUDA repository.

  7. Initiate CUDA Installation: Command APT to install the cuda package along with all its necessary dependencies from the CUDA repository. The -y flag is used to automate the process by affirming “yes” to any prompts encountered.


After installing CUDA, we need to run the following lines to modify our ~/.bashrc file. These changes make sure that we can actually see and find the newly installed CUDA libraries:

# modify paths so we can find CUDA binaries
echo 'export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc
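The ${PATH:+:${PATH}} part of these lines adds a colon and the old value only when the variable is already set and non-empty, so we never end up with a trailing colon (an empty entry that PATH lookup treats as the current directory). Also note that rerunning the echo lines appends duplicate entries to ~/.bashrc. The sketch below demonstrates the expansion and defines a hypothetical append_once helper that skips lines already present:

```shell
#!/bin/sh
# 1) ${VAR:+:${VAR}} expands to ":$VAR" only when VAR is set and non-empty
demo_var=""
echo "/usr/local/cuda-12.1/lib64${demo_var:+:${demo_var}}"   # prints /usr/local/cuda-12.1/lib64
demo_var="/opt/lib"
echo "/usr/local/cuda-12.1/lib64${demo_var:+:${demo_var}}"   # prints /usr/local/cuda-12.1/lib64:/opt/lib

# 2) hypothetical helper: append a line to a file only if that exact line is not already there
append_once() {
    line="$1"
    file="$2"
    touch "$file"
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# usage (commented out so the sketch does not touch your real ~/.bashrc):
# append_once 'export PATH=/usr/local/cuda-12.1/bin${PATH:+:${PATH}}' ~/.bashrc
# append_once 'export LD_LIBRARY_PATH=/usr/local/cuda-12.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' ~/.bashrc
```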

Then reboot the system one more time, with feeling:

sudo reboot

Once the machine is back online, run the following command to check if CUDA was installed correctly:

# this command shows us the CUDA version
nvcc --version

# it should output something like this:
#   nvcc: NVIDIA (R) Cuda compiler driver
#   Copyright (c) 2005-2023 NVIDIA Corporation
#   Built on Tue_Feb__7_19:32:13_PST_2023
#   Cuda compilation tools, release 12.1, V12.1.66
#   Build cuda_12.1.r12.1/compiler.32415258_0
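To check the version from a script instead of by eye, a sketch like the following extracts the release number from the "Cuda compilation tools, release 12.1, V12.1.66" line. The helper name cuda_release is hypothetical, and the sed pattern assumes the output keeps that "release X.Y," wording.

```shell
#!/bin/sh
# hypothetical helper: read `nvcc --version` output on stdin and print the "release X.Y" number
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p'
}

# on the CUDA machine: warn if nvcc is missing or reports a different release
if command -v nvcc >/dev/null 2>&1; then
    release="$(nvcc --version | cuda_release)"
    if [ "$release" = "12.1" ]; then
        echo "nvcc reports CUDA $release as expected"
    else
        echo "nvcc reports CUDA '$release', expected 12.1" >&2
    fi
else
    echo "nvcc not found; check the PATH changes in ~/.bashrc" >&2
fi
```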

Installing accelerated CUDA libraries on Linux

If you are running on a Linux machine, you’ll have access to many powerful libraries to speed up LLM training and inference even more. Not all of these are available on Mac or Windows, but hopefully that changes with time.


Here we also see the first instance of needing a pip install with extra steps - something we could not have done with mamba alone.

# install the optimized CUDA LLM libraries:

# library to massively speed up Transformer LLMs
pip install flash-attn --no-build-isolation

# library crucial for quantized LLMs
pip install bitsandbytes 

# xformers library from Meta
pip install -U xformers --index-url https://download.pytorch.org/whl/cu121
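The cu121 at the end of the xformers index URL encodes the CUDA version (12.1) that the wheels target. If you pin a different CUDA release, a sketch like this hypothetical cuda_tag helper builds the matching tag. It only constructs the string, so whether PyTorch publishes an index for a given tag still needs to be checked.

```shell
#!/bin/sh
# hypothetical helper: turn a CUDA release such as "12.1" into the "cu121" tag used in the index URL
cuda_tag() {
    printf 'cu%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# rebuild the xformers index URL from the CUDA release installed earlier in this appendix
cuda_version="12.1"
index_url="https://download.pytorch.org/whl/$(cuda_tag "$cuda_version")"
echo "$index_url"   # prints https://download.pytorch.org/whl/cu121

# usage on the GPU machine (commented out so the sketch installs nothing):
# pip install -U xformers --index-url "$index_url"
```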

And that does it! Phew, we made it. After following the above, your Ubuntu 22.04 machine stands ready at the bleeding edge of LLMs.