Stop putting this into your Python Dockerfiles

ENV PYTHONDONTWRITEBYTECODE 1 is very common in python dockerfiles. However most of the time it is useless at best and harmful at worst.

What does PYTHONDONTWRITEBYTECODE do

CPython doesn't execute the code directly. Instead it first compiles the code into bytecode which is cached as .pyc files in __pycache__ directory you know and hate. Having cached the bytecode, the interpreter can skip the compilation of modules that haven't changed. If for some reason you want to prevent python from caching the bytecode, you can set PYTHONDONTWRITEBYTECODE environment variable to a non-empty string.

Why include it in dockerfiles

The first thing that comes to mind when thinking about potential problems with caching things is the possibility of getting a stale version of the resource we are caching. Since code doesn't change during container runtime (unless you are doing some weird stuff) this is unlikely to cause any problems and can be ruled out as a reason to disable bytecode caching in containers.

The other obvious thing that comes to mind is the size of the cache files. Since we want to make our images as small as possible it may make sense not to store the cache files. However in most cases PYTHONDONTWRITEBYTECODE saves us nothing at all!

pip doesn't care about this option

Since you usually aren't running your app during the build process, no pyc files are generated for your code. Now you need to peek into a freshly created vritualenv to see if they are generated for your dependencies during installation. If you do that you'll see that they indeed are and maybe your project has a lot of them, so eliminating them could surely shave off some MBs from your image. You'll be disappointed to find out that this doesn't happen with PYTHONDONTWRITEBYTECODE=1:

export PYTHONDONTWRITEBYTECODE=1
virtualenv venv
. venv/bin/activate
pip install django
find . **/django/**/*.pyc

It should come as no surprise though, as pip doesn't generate the bytecode by importing the packages but by explicitly compiling them. To prevent this you can pass --no-compile flag to pip install. Poetry doesn't have a no-compile option so you'd need to export the lock file to requirements.txt and use pip with no-compile to avoid having pyc files for your deps. While using no-compile might make sense for some workloads, I don't think it's worth it in most cases, especially for web apps and soon you'll see why.

How much does it save you

This varies widely but if you are prudent with your usage of dependencies it should be less than 1MB. I tested this on the largest project from work and the difference was 2Mb (from 500Mb to 498Mb).

How it hurts you

Disabling bytecode caching means the modules need to be compiled during import time. Since most imports occur at startup this means that disabling bytecode caching mostly impacts the startup of the application. This is especially bad if you are using a pre-fork WSGI server like Gunicorn without preloading the app because each worker will need to go through the full compilation process, wasting CPU cycles and slowing down the worker startup. It will also impact worker restarts if you used something like gunicorn max-requests in your config. Since timeout for gunicorn workers applies to their startup as well, you may even have workers timing out on startup!

TL;DR

You probably want pip install --no-compile instead of PYTHONDONTWRITEBYTECODE=1, even though you probably shouldn't use either one of them.

Share:

If you've come this far with the article you may want to know a thing or two about me if you don't already. You can also read other blog posts or about stuff I've learned recently.

This website is open source. If you've come across a mistake please let me know there. For other types of feedback you can reach out to me through email or social-media.

🚧The website is still under construction!🚧