We’ve all seen the recent news about the educational group that injected insecure code into the Linux kernel codebase, and we’ve heard about malware being found in various Linux distros’ repositories.
How many of us have thought of Python as a threat vector? Let’s consider this for a minute. Many people who learn Python and stick with it learn the basic data types, then learn to leverage the standard library, and many have worked on projects that require a third-party module. These developers are almost certainly familiar with pip. Pip pulls modules from PyPI (the Python Package Index), which is a third-party repository.
Many of the modules on PyPI have hash signatures, but they are not a requirement. This threat vector involves two different types of threat actors: 1) a malicious developer who includes undesirable code in their own module, or 2) an attacker who uploads a modified version of a package owned by a legitimate developer, with undesirable code added.
Some steps have been, or can be, taken to limit #2: for example, using TLS to encrypt the transfer of the code, and publishing hash signatures (although if an attacker can replace a package, how do we know they didn’t replace the hash signature as well?). #1 is much harder to detect and defend against, since anyone can open an account and upload their own module(s). While some of us can read the source code and spot undesirable code, not all of us can, and given the size of some packages, it would take a long time to do this.
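To make the hash idea concrete, here is a minimal sketch of what a hash check amounts to: compute the SHA-256 digest of a downloaded file and compare it with a digest recorded earlier. The function names are my own, for illustration only; pip performs the equivalent check itself when hashes are listed in requirements.txt. It also shows why the question in parentheses matters: the check only helps if `expected_sha256` comes from somewhere an attacker cannot also change (for example, a hash you recorded when you first vetted that release), not from the same server that served the file.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large wheels/sdists don't need to fit in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a file's digest with a hash recorded earlier from a trusted source.

    Illustrative helper (hypothetical name): the result is only meaningful
    if expected_sha256 was NOT fetched from the same place as the file.
    """
    actual = sha256_of_file(path)
    # Normalize whitespace and hex case before comparing
    return actual == expected_sha256.strip().lower()
```

If you want to produce such a hash for a file you have already downloaded and reviewed, `pip hash <file>` prints it in the `--hash=sha256:...` form that requirements.txt uses.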
There is a project in development that is looking at this from a security standpoint. There are also security scanners that identify outdated and known-bad packages, but so far all of the ones I am familiar with either have a price tag or only work in specific environments like GitHub or Bitbucket.
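The commercial scanners are far more thorough, but the core idea behind a "known bad package" check is simple enough to sketch: list what is installed and compare it against advisory data. This is a toy under stated assumptions: `KNOWN_BAD` and the package names in it are invented placeholders, and a real tool would pull this data from a maintained vulnerability database rather than a hard-coded dictionary.

```python
import re
from importlib import metadata

# HYPOTHETICAL advisory data: normalized package name -> versions considered bad.
# These names are invented placeholders, not real packages or real advisories.
KNOWN_BAD = {
    "example-typosquat": {"0.1.0", "0.1.1"},
    "example-compromised": {"2.3.4"},
}


def normalize(name: str) -> str:
    """Normalize a project name as described in PEP 503 ('Foo_Bar' -> 'foo-bar')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def installed_packages():
    """Yield (name, version) for each distribution visible to this interpreter."""
    for dist in metadata.distributions():
        try:
            name, version = dist.metadata["Name"], dist.version
        except KeyError:
            # Some Python versions raise KeyError for a missing metadata field
            continue
        if name and version:
            yield name, version


def find_flagged(packages, known_bad=KNOWN_BAD):
    """Return sorted (name, version) pairs that match the advisory data."""
    flagged = []
    for name, version in packages:
        if version in known_bad.get(normalize(name), set()):
            flagged.append((name, version))
    return sorted(flagged)
```

Calling `find_flagged(installed_packages())` from inside a venv checks the packages visible to that environment's interpreter, which keeps the check scoped to one project.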
For my part, I develop using venv and will start using ‘pinned’ entries in my requirements.txt file, meaning every module is pinned to a specific version. Doing this has its drawbacks, but it does allow the use of pip’s --require-hashes option when building a new virtual environment for your application or sharing your application with others. In this mode, the hash of each downloaded package must match the hash listed in requirements.txt, or the install will fail. Also, running pip inside a venv does not require sudo, whereas installing into the system-wide site-packages does. This helps protect the system from a privilege escalation attack, since any code a package runs during installation does not run as root.
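As a rough illustration of the pinned-and-hashed workflow, the sketch below scans requirements.txt-style text for entries that pip’s --require-hashes mode would reject: anything not pinned with `==` (or `===`), or carrying no `--hash` option. It also includes a one-line check for whether the current interpreter is running inside a venv. The function names are hypothetical, and the parser is deliberately simplified; it does not implement all of pip’s requirements-file syntax (environment markers, editable installs, and so on).

```python
import re
import sys

# Simplified comment rule: '#' at line start or after whitespace begins a comment
COMMENT = re.compile(r"(^|\s)#.*$")


def logical_lines(text: str):
    """Join backslash-continued lines and drop comments and blank lines."""
    pending = ""
    for raw in text.splitlines():
        line = COMMENT.sub("", raw).rstrip()
        if line.endswith("\\"):
            # Continuation: keep accumulating until a line without a trailing '\'
            pending += line[:-1] + " "
            continue
        full = (pending + line).strip()
        pending = ""
        if full:
            yield full
    if pending.strip():
        yield pending.strip()


def unpinned_or_unhashed(text: str) -> list:
    """Return requirement specs lacking an exact '==' pin or any --hash option."""
    problems = []
    for line in logical_lines(text):
        if line.startswith("-"):
            continue  # global options such as --index-url are not checked here
        spec = line.split()[0]
        if "==" not in spec or "--hash=" not in line:
            problems.append(spec)
    return problems


def in_virtualenv() -> bool:
    """True inside a venv, where sys.prefix differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix
```

In practice, pip still does the real enforcement via `pip install --require-hashes -r requirements.txt`, run inside an environment created with `python -m venv`; a check like this is only a cheap pre-flight before sharing the file with others.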
There are other security concerns at play, but I wanted to get some of you thinking about how you install Python modules in a production environment.
What are your thoughts around this?