
Migrating your infrastructure from Python 2 to Python 3

Python 2 end of life

As you may have heard, Python 2 is reaching the end of its lifespan. On January 1st, 2020, it will officially stop being supported by the Python Software Foundation (the non-profit that develops Python). This has been in the works since 2015, and is, needless to say, a Really Big Deal.

In fact, the Python community is making such a big deal out of it that many major open-source projects and package maintainers have publicly committed to dropping Python 2 support. That’s a lot of people and a lot of packages. The intent is to phase Python 2 out as quickly and painlessly as possible.

So many people are on board with this plan that free support on Stack Overflow and the like is going to virtually disappear, and the Foundation suggests that you may only be able to continue getting support through paid channels going forward. Here’s an actual quote from their site:

If people find catastrophic security problems in Python 2, or in software written in Python 2, then most volunteers will not help fix them. If you need help with Python 2 software, then many volunteers will not help you, and over time fewer and fewer volunteers will be able to help you. You will lose chances to use good tools because they will only run on Python 3, and you will slow down people who depend on you and work with you.

This quote can be read as saying “if there are security or stability issues found with Python 2, you’re on your own and help will not be forthcoming.”

This might sound like bullying at first, but the logic behind it is that they needed the entire community to push for Python 3 in order to “help Python users by improving Python faster.” And that’s certainly happened: check out this list of top packages that run on Python 3. Hint: it’s all of them.

Python 2 vs. Python 3 - What's changing?

I won’t get too deep into this, since you can read up to your heart’s desire over on this very, very long FAQ about the upgrade. The gist of it is that there are a handful of features which had to be implemented in a different way at a low level, which meant that backwards compatibility could not be maintained.

(image credit learntocodewith.me)


The Python community is extremely good about avoiding breaking changes in general, so this is unlikely to happen again in the foreseeable future: the team addressed all of the issues that would break backwards compatibility in one fell swoop, and gave everybody five years to deal with it on top of that.

Future updates and releases should not have this issue. For example, rather than a Python 4 release at some point in the future, there will likely be a Python 3.10 release that largely maintains backwards compatibility.

However, it’s critical that this time you do make sure to upgrade everything to Python 3, to get access to continued community support and ensure that your applications remain stable and secure.

Migration guide for infrastructure configs

Unlike the numerous Python 2 to Python 3 upgrade guides that cover porting your code, this guide will focus on infrastructure. From your Dockerfiles to your CI configuration, it’ll cover all of the places you might need to upgrade Python to keep your deployment pipeline stable and secure.

And, it’ll show you how you can use Datree to ensure that all of your infrastructure across all of your repositories is using Python 3 going forward.

So, without further ado, let’s jump in: the changes you’ll need to make to your infrastructure by January 1st, 2020.

1. Ensure your Dockerfile is pulling the correct image

Ensure that your Dockerfile is pulling a correct, specified version of Python 3. You can do this manually, or you can set up a Datree rule to check everything for you. If you do it with Datree you get an extra win of visibility into all of the repositories you have set up, rather than having to do it manually everywhere and risk missing something.

Extra Datree bonus: once you’ve configured a Python rule, it will make sure that developers who are working on Datree-enabled repos are aware of the fact that they have to update the Python version. Here’s an example of a pull request integration via the GitHub checks functionality:

Prevent use of Docker images containing Python 2, using Datree
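If you want a quick local check alongside a Datree rule, a short script can flag `FROM` lines that pull a Python 2 image or leave the Python version unpinned. This is a heuristic sketch under stated assumptions: it only looks at images named `python` (such as the official Docker Hub images, whose tags begin with the Python version, like `python:2.7-slim` or `python:3.8`), and `check_from_lines` is a hypothetical helper, not part of Datree or Docker.

```python
import re

# Matches "FROM [--flag=value ...] image[:tag][@digest] [AS name]" (case-insensitive,
# since Dockerfile instructions are case-insensitive).
FROM_RE = re.compile(r"^\s*FROM\s+(?:--\S+\s+)*(\S+)", re.IGNORECASE)


def check_from_lines(dockerfile_text):
    """Return (line_number, image, problem) tuples for risky python base images.

    Assumption: only images named "python" are inspected, and their tags are
    expected to start with the Python version (e.g. "2.7-slim", "3.8").
    """
    problems = []
    for lineno, line in enumerate(dockerfile_text.splitlines(), start=1):
        match = FROM_RE.match(line)
        if not match:
            continue
        image = match.group(1)
        # Drop any @digest, then split "python:2.7-slim" into ("python", "2.7-slim").
        name, _, tag = image.split("@")[0].partition(":")
        if name.split("/")[-1] != "python":
            continue  # Other base images are out of scope for this sketch.
        if not tag or tag == "latest":
            problems.append((lineno, image, "unpinned Python version"))
        elif tag.startswith("2"):
            problems.append((lineno, image, "Python 2 image"))
    return problems
```

Running it over every Dockerfile in a repository gives you a rough inventory of base images to fix; flagging unpinned tags as well as Python 2 tags follows the advice above to pull a specified version of Python 3.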

2. Docker RUN package manager commands should explicitly specify Python 3

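If you find yourself installing Python in a Dockerfile `RUN` instruction using `apt-get` or `yum`, make sure you explicitly install Python 3, or you could get unexpected results. On Debian, Ubuntu, and current Red Hat-family images, that means installing the package named `python3` rather than the unversioned `python` package.

For `apt-get`, instead of this:

RUN apt-get install python

You want this Docker instruction:

RUN apt-get update && apt-get install -y python3

And for `yum`, instead of this:

RUN yum install python

You want this:

RUN yum install -y python3

If you don’t ask for Python 3 explicitly, what you get depends entirely on your base image: on many current Debian, Ubuntu, and CentOS 7 images the unversioned `python` package still means Python 2, while newer distribution releases may rename or drop it. An unpinned install can therefore silently change which interpreter your containers run, or break the build outright, the next time the base image is updated, which could seriously mess up your deployments.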
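To find unversioned installs across many Dockerfiles, you can scan `RUN` instructions for `apt-get install` and `yum install` arguments. This is a minimal sketch, not a Dockerfile parser: the `RISKY_PACKAGES` set reflects common Debian, Ubuntu, and CentOS 7 package names and is an assumption you should adjust for your own base images, and `find_risky_installs` is a hypothetical helper name.

```python
import re
import shlex

# Package names that, on Debian/Ubuntu and CentOS 7-era images, usually mean
# Python 2 or an unversioned "python" whose meaning depends on the base image.
# Assumed list for illustration; extend it for your distributions.
RISKY_PACKAGES = {"python", "python2", "python2.7", "python-dev", "python-pip"}

RUN_RE = re.compile(r"\s*RUN\s", re.IGNORECASE)
# "apt-get [flags] install <args>" or "yum [flags] install <args>", up to the
# next shell separator (&&, ||, ;) or the end of the instruction.
INSTALL_RE = re.compile(r"\b(?:apt-get|yum)\s+(?:-\S+\s+)*install\b(.*?)(?=&&|\|\||;|$)")


def find_risky_installs(dockerfile_text):
    """Return risky package names installed by apt-get/yum in RUN instructions."""
    # Join backslash-continued lines so each RUN instruction is one logical line.
    logical_text = dockerfile_text.replace("\\\n", " ")
    findings = []
    for line in logical_text.splitlines():
        if not RUN_RE.match(line):
            continue
        for match in INSTALL_RE.finditer(line):
            for token in shlex.split(match.group(1)):
                if token.startswith("-"):
                    continue  # Skip options such as -y.
                package = token.split("=")[0]  # Drop apt-style "=version" pins.
                if package in RISKY_PACKAGES:
                    findings.append(package)
    return findings
```

Stripping the `=...` suffix means an apt pin such as `python=2.7.16-1` is still reported, while `python3` passes.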

3. Identify repositories that rely on Python 2

Even if you don’t have a Python version specified anywhere obvious (maybe you’re relying on an automatically provisioned Python 2, for example from an unversioned Python install in your Dockerfile), there’s a trick you can use to identify code that might depend on Python 2: check whether the code imports standard library modules that exist in Python 2 but were renamed or removed in Python 3, such as `urllib2`, `ConfigParser`, or `cPickle`. There’s a good list here.

This technique isn’t very granular, meaning that you won’t get much information on a file-by-file basis: not every Python 2 file is guaranteed to import one of those outdated standard library modules.

Rather, it’s helpful for detecting whether or not a repository contains Python 2 code, which is often more than enough information to clue you in on which code needs upgrading. 
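The standard library trick above is easy to script. This sketch assumes a short, deliberately incomplete set of Python 2 module names, and it matches `import` and `from ... import` lines with regular expressions rather than a parser, since Python 2 files often won’t parse under a Python 3 interpreter. The helper names are hypothetical.

```python
import re
from pathlib import Path

# A few standard library modules that exist in Python 2 but were renamed or
# removed in Python 3 (illustrative, not exhaustive).
PY2_ONLY_MODULES = {
    "urllib2", "urlparse", "httplib", "cookielib", "Cookie", "ConfigParser",
    "StringIO", "cStringIO", "cPickle", "Queue", "SocketServer", "xmlrpclib",
    "HTMLParser", "Tkinter", "__builtin__", "commands", "copy_reg",
}

FROM_RE = re.compile(r"^\s*from\s+([\w.]+)\s+import\b", re.MULTILINE)
IMPORT_RE = re.compile(r"^\s*import\s+([^#\n]+)", re.MULTILINE)


def py2_imports_in_source(source):
    """Return the Python 2-only top-level modules imported by one source string."""
    names = FROM_RE.findall(source)
    for clause in IMPORT_RE.findall(source):
        # "import a.b as c, d" -> ["a.b", "d"]
        names += [part.split()[0] for part in clause.split(",") if part.strip()]
    return {name.split(".")[0] for name in names} & PY2_ONLY_MODULES


def scan_repository(root="."):
    """Map each .py file under root to the Python 2-only modules it imports."""
    results = {}
    for path in Path(root).rglob("*.py"):
        # Read leniently: older Python 2 files are not always valid UTF-8.
        source = path.read_text(encoding="utf-8", errors="replace")
        modules = py2_imports_in_source(source)
        if modules:
            results[str(path)] = modules
    return results
```

Any non-empty result from `scan_repository` is a strong hint that the repository still contains Python 2 code, which matches the repository-level granularity described above.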

If you need more insight into specific files, there’s one more thing you can do. One of the changes made in Python 3 is the way imports work. You can read up on this change here, but the gist of it is that Python 3 removed implicit relative imports: inside a package, `import foo` always refers to a top-level module, so importing a sibling module requires an explicit relative import (`from . import foo`) or the full package path. Python 2 did not require this, so if you had a local module, you could import it without marking it as a relative import.

This led to a common pattern in Python 2 code where developers would write modules sitting next to each other and import them without specifying whether they were installed packages or local code.

By comparing the files and their imports, either by eye or with a regex and some simple logic, it’s possible to sniff out whether code in a repository is likely Python 2 or 3.
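The import comparison described above can be approximated in a few lines. This sketch assumes the code lives in packages (directories with an `__init__.py`), because that is where Python 3’s rule changes behavior, and it flags any import that names a sibling module or subpackage without a leading dot. It only inspects the first module on each import line, the helper names are hypothetical, and a project that deliberately shadows a top-level module name will produce false positives.

```python
import re
from pathlib import Path

# Plain imports: "import x" / "from x import y" (first module name only).
# Explicit relative imports such as "from . import x" start with a dot and never match.
IMPORT_RE = re.compile(r"^\s*(?:from\s+([A-Za-z_]\w*)|import\s+([A-Za-z_]\w*))", re.MULTILINE)


def implicit_relative_in_source(source, siblings):
    """Return sibling names that `source` imports without a leading dot."""
    found = []
    for match in IMPORT_RE.finditer(source):
        name = match.group(1) or match.group(2)
        if name in siblings:
            found.append(name)
    return found


def scan_package(package_dir):
    """Map each module in one package directory to its suspicious sibling imports."""
    package_dir = Path(package_dir)
    if not (package_dir / "__init__.py").is_file():
        return {}  # Not a package: a script's own directory is on sys.path in Python 3 too.
    modules = {p.stem for p in package_dir.glob("*.py")} - {"__init__"}
    subpackages = {p.name for p in package_dir.iterdir()
                   if p.is_dir() and (p / "__init__.py").is_file()}
    results = {}
    for path in sorted(package_dir.glob("*.py")):
        source = path.read_text(encoding="utf-8", errors="replace")
        hits = implicit_relative_in_source(source, (modules | subpackages) - {path.stem})
        if hits:
            results[path.name] = hits
    return results
```

Files that show up in `scan_package` results are good candidates for manual review, since under Python 3 those imports resolve to top-level modules instead of the neighbouring files.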

Unfortunately, this is not a method you can automate with Datree. Hopefully, you should only have to do it once!

4. Ensure your serverless functions use a Python 3 runtime

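If your serverless configuration pins a Python 2.7 runtime (for example, `runtime: python2.7` in a serverless.yml file for the Serverless Framework, or `runtime = "python2.7"` on an `aws_lambda_function` resource in Terraform), make sure to manually upgrade it to a Python 3 runtime. Here’s a list of available runtimes for AWS Lambda functions you can choose from.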
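If you have many functions spread across repositories, a text search for Python 2 runtimes can help you find them all. This sketch assumes the runtime is written on a single line in one of the two forms shown in the comment; `python2_runtimes` is a hypothetical helper name.

```python
import re

# "runtime: python2.7" (serverless.yml) or 'runtime = "python2.7"' (Terraform),
# with optional quotes around the value.
RUNTIME_RE = re.compile(r"""\bruntime\s*[:=]\s*["']?(python[\d.]+)""")


def python2_runtimes(config_text):
    """Return (line_number, runtime) pairs for Python 2 Lambda runtimes."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        for runtime in RUNTIME_RE.findall(line):
            if runtime.startswith("python2"):
                hits.append((lineno, runtime))
    return hits
```

You could point it at every `serverless.yml` and `*.tf` file in a repository; any hit is a function that still needs a Python 3 runtime.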

5. Ensure your CI configurations reference Python 3

If you’re using Travis CI, check the `python` key in your `.travis.yml` file.

If you’re using CircleCI, check the `image` property of the `docker` configuration in your `.circleci/config.yml` file.

If you’re using Jenkins, check that the `image` in the `docker` agent configuration of your Jenkinsfile points to a Python 3 image.

If you’re using GitHub Actions, check the `python-version` setting in the YAML workflow files in your `.github/workflows` directory.
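Each of these CI settings boils down to a version string near a Python key, so a rough text scan can catch most of them in one pass. This sketch is a heuristic, not a YAML or Groovy parser: it assumes versions are written as literal numbers like `2.7` on a line that mentions Python, or as list items under a bare `python:` or `python-version:` key (the usual Travis CI and GitHub Actions list layouts). The function name is hypothetical, and comments mentioning an old version will also be reported.

```python
import re

# A 2.x version such as 2.7 or 2.7.16 that is not part of a larger number.
PY2_VERSION = re.compile(r"(?<![\w.])2\.\d+(?:\.\d+)?(?![\w.])")
# A key with no inline value, which in YAML introduces a list on the next lines.
LIST_KEY = re.compile(r"-?\s*(python|python-version)\s*:")


def find_python2_in_ci(ci_text):
    """Return (line_number, line) pairs that appear to select a Python 2 version."""
    findings = []
    in_version_list = False
    for lineno, line in enumerate(ci_text.splitlines(), start=1):
        stripped = line.strip()
        if LIST_KEY.fullmatch(stripped):
            in_version_list = True
            continue
        is_list_item = stripped.startswith("-")
        if not is_list_item:
            in_version_list = False  # The version list (if any) has ended.
        mentions_python = "python" in stripped.lower()
        if (mentions_python or in_version_list) and PY2_VERSION.search(stripped):
            findings.append((lineno, stripped))
    return findings
```

The same function works on `.travis.yml`, `.circleci/config.yml`, a Jenkinsfile, or a GitHub Actions workflow, because it only looks at text patterns rather than each tool’s schema.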

Conclusion

Hopefully we established the significance of the upgrade from Python 2 to Python 3 at the beginning of this article; otherwise, this guide might seem to offer very little benefit. If it’s still not clear to you why you should upgrade, consider these points:

You’ll be cut off from new tools being authored in Python. Since new tools are increasingly written for Python 3 only, you won’t be able to integrate them into your workflow.

If that doesn’t do it for you (maybe your organization is deeply reliant on Python 2?), consider this: you’ll be exposing yourself to the security vulnerabilities that will inevitably crop up as Python 2 continues to age, and nobody will be patching them unless you pay lots of money to consultants.

Finally, think of the developers: Python 3 provides a much better authoring experience than Python 2 does. It’s simpler to understand and more consistent, which makes its behavior easier to reason about and tends to lead to fewer bugs.

As you can see, there are a lot of different points of failure throughout your pipeline, and the risk of missing one only grows with the size of your codebase.

While it’s absolutely true that you can check all of these points manually, you may still want to use an automated tool like Datree to not only ensure with a couple of mouse clicks that you’ve found all of the places that Python 2 is being used, but also ensure that it will never sneak back into your codebase again.

Goodbye, Python 2 -- we hardly knew ye! 🙌

Shimon Tolts
Co-founder and CEO
Datree