@jacksmith15
Last active June 14, 2023 18:07

Python Dependency Management

Below is a very rough-and-ready summary of Python dependency management tools I have used.

An (incomplete) list of important factors of dependency management tooling:

  • Reproducibility - will committed files allow us to exactly reproduce an environment across dev machines, CI, and production?
  • Speed of resolution - how fast can a complete environment be resolved, given a set of primary dependencies?
  • Ease of management - how easy is it to upgrade one/all packages? To add a new package? To inspect the dependency graph?
  • Availability of packages - can I get the packages I need?
  • Environment size - how big is the resulting environment? If it's smaller, it's faster to distribute as container layers.

Main options

TODO: Add examples of common workflow operations for each tool.

Plain pip

pip is fairly limited, and generally dependencies are maintained in a requirements.txt file. Development and production dependencies are separated via multiple requirements-*.txt files.

Dependency resolution is neither particularly fast nor particularly slow. Since v20.3, resolution is guaranteed to be correct across multiple dependencies (the backtracking resolver).

There is no built-in workflow for managing virtual environments or locking dependencies. Reproducible environments can still be achieved by generating a lock file (e.g. with pip freeze), but this has to be approximated with simple scripts rather than a first-class pip feature.

Only supports Python packages, but wheels can ship arbitrary compiled binaries, with per-platform negotiation or a fallback to building from source (see psycopg2-binary).

It is not possible to manage multiple Python versions with pip.
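The lock-file approximation described above can be sketched as follows (the requirements.lock filename is illustrative, not a pip convention):

```shell
# Create an isolated environment using only the standard library.
python3 -m venv .venv

# Install primary dependencies (needs network access, so shown commented out):
# .venv/bin/pip install -r requirements.txt

# Pin the exact resolved versions: a "poor man's lockfile".
.venv/bin/pip freeze > requirements.lock

# Reproduce the environment elsewhere from the pinned versions:
# .venv/bin/pip install -r requirements.lock
```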

pipenv

Wrapper layer around pip which adds a pre-defined workflow for:

  • managing virtual environments
  • generating and restoring from lockfiles
  • upgrading transitive dependencies
  • running project-specific commands (a la make lint)
  • separating dev and main dependency groups

Dependency resolution is basically the same as pip in terms of speed, environment size and package availability, but it supports reproducible environments and dependency management is more straightforward.

Integrates with pyenv to create environments with the correct Python version.
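A sketch of the common operations (exact flags may vary between pipenv versions):

```shell
$ pipenv install requests          # add a main dependency (updates Pipfile and Pipfile.lock)
$ pipenv install --dev pytest      # add a dev-only dependency
$ pipenv sync                      # reproduce the environment exactly from Pipfile.lock
$ pipenv update                    # re-resolve and upgrade all dependencies
$ pipenv run pytest                # run a command inside the managed virtualenv
```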

poetry

Its primary goal is to replace setuptools: it is generally used not just for developing, but also for building and publishing Python packages.

  • Uses PubGrub dependency resolution, which is particularly fast.
  • Maintains a lockfile automatically, which allows completely reproducible environments.
  • Manages virtual environments and provides tools for upgrades etc. (just like pipenv).
  • Supports arbitrary dependency groups, e.g. separating test and production dependencies.
  • Supports only Python packages (but wheels can include arbitrary compiled binaries with platform negotiation).

Integrates with pyenv to create environments with the correct Python version.
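A sketch of the common operations (the --group flag assumes poetry >= 1.2; older versions used --dev):

```shell
$ poetry add requests              # add a main dependency (updates pyproject.toml and poetry.lock)
$ poetry add --group dev pytest    # add a dependency to the "dev" group
$ poetry install                   # create/sync the virtualenv from poetry.lock
$ poetry update                    # re-resolve and upgrade all dependencies
$ poetry show --tree               # inspect the dependency graph
$ poetry build                     # build sdist and wheel, ready for publishing
```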

conda

Originally built to solve the following two problems not solved by pip:

  • Ensures all previous constraints are met (i.e. dependency 2 can't override dependency 1)
  • Supports installing platform-specific non-Python dependencies

The former is now the default behaviour of pip (since v20.3), and the latter has been partially, but not completely, mitigated by support for packaging binaries in wheels or custom build systems in Python packages.

It does not create lockfiles allowing for environment reproduction (but this can be achieved by also using conda-lock), and does not support grouping dependencies for tests vs production.

Dependency resolution is quite slow (although perhaps faster with mamba?). The conflict reporting is not great (perhaps also better with mamba?), and conda doesn't provide any tools for inspecting the dependency graph.

It manages Python itself natively, and can install non-Python system packages (e.g. make).
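A sketch of the common operations (environment name and package choices are illustrative):

```shell
$ conda create --name myproject python=3.11   # environment with a specific Python version
$ conda activate myproject
$ conda install numpy make                    # Python and non-Python packages alike
$ conda env export > environment.yml          # snapshot of the environment (not a true lockfile)
$ conda env create --file environment.yml     # recreate the environment from the snapshot
```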

bazel

Bazel, and its many variations (Please, Pants, Buck etc.), are full monorepo build systems. They work quite differently from any of the above, but are extremely powerful and fast. I can't really explain the full workings, but I think the key takeaway is that a Bazel monorepo probably requires dedicated resources on an ongoing basis, so it is likely not appropriate for smaller organisations.

See https://github.com/jacksmith15/bazel-python-demo for an example Python mono-repo set-up with Bazel.

Key advantages are:

  • Lightning-fast builds based on full caching of all inputs
  • Selectively test and build based on changes (propagating through any affected transitive dependencies)
  • Share dependencies between build targets (whilst still sandboxing each target) - reduces maintenance overhead, as you only need to manage one dependency set
  • ...much more
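As a flavour of the day-to-day commands (target labels here are hypothetical; see the demo repo above for real ones):

```shell
$ bazel build //...                           # build everything, reusing cached results
$ bazel test //services/api/...               # test only one part of the repo
$ bazel query "rdeps(//..., //libs/common)"   # which targets depend on //libs/common?
```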