Python Security - Food for thought

Mr_McBride · July 1, 2021, 4:09pm

We’ve all seen the recent news about the educational group that injected insecure code into the kernel codebase, and we’ve heard about malware being found in various Linux distros’ repositories.

How many have thought of Python as a threat vector? Let’s consider this for a minute. Many who learn Python and stick with it learn the basic data types, then learn to leverage the standard library, and many have worked on projects that require a 3rd party module. These developers are most certainly familiar with pip. Pip pulls modules from PyPI, which is a 3rd party repository.

Many of the modules on PyPI have hash signatures, but it is not a requirement. This threat vector involves two different types of threat actors: 1) a malicious developer who includes undesirable code in their own module, or 2) an attacker who uploads a modified copy, containing undesirable code, of a package owned by a legitimate developer.

There are steps that can be, or have been, taken to limit #2. For example, the use of TLS to encrypt the transfer of the code and including hash signatures (although if an attacker can replace a package, how do we know that they didn’t replace the hash signature as well?). #1 is much harder to detect and defend against. Anyone can open an account and upload their own module(s). While some of us can read the source code and find undesirable code, not all of us can do this. Considering the size of some packages, it would take a long time to do this.
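To make the hash idea concrete, here is a minimal sketch of what a hash check amounts to: compute the SHA-256 digest of a downloaded file and compare it with a digest you recorded earlier from a source you trust. The function names `sha256_of_file` and `matches_expected` are my own, made up for illustration; this is not how pip implements it internally. It also shows the caveat above: the check is only as trustworthy as the place the expected digest came from.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large archives do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """True if the file's digest equals a digest recorded earlier.

    `expected_hex` is assumed to come from somewhere the attacker
    cannot modify, e.g. a requirements file kept in your own repo.
    """
    return sha256_of_file(path) == expected_hex.strip().lower()
```

If the expected digest is fetched from the same place as the package at install time, a mismatch only catches corruption, not a deliberate replacement.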

There is a project in development that is looking at this from a security standpoint. Also, there are security scanners that will scan and identify outdated and known bad packages, but so far all of the ones that I am familiar with have a price tag on them or only work in specific environments like GitHub or Bitbucket.

For me, I develop using venv and will start using ‘pinned’ entries in the requirements.txt file. This means that every module will be pinned to a specific version. Doing this has its drawbacks, but it does allow use of the --require-hashes option with pip when building a new virtual instance of your application or sharing your application with others. In this case, the hash of each downloaded module must match a hash listed in requirements.txt or the install will fail. Also, sudo is not needed to run pip in a venv or virtual instance like it is when using pip to install to the system site-packages. This helps protect the system from a privilege escalation attack.
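As a rough sketch of the pinning idea, the helper below scans the text of a requirements file and reports entries that are not pinned with `==` or that carry no `--hash=` option. The function name `unpinned_or_unhashed` is hypothetical, and the parsing is deliberately simplified (it only handles comments, option lines, and backslash continuations), so treat it as an illustration rather than a replacement for pip's own checks.

```python
def unpinned_or_unhashed(requirements_text):
    """Return requirement entries lacking an exact pin or a --hash option."""
    # Join backslash-continued lines so each entry is one string.
    joined = requirements_text.replace("\\\n", " ")
    problems = []
    for line in joined.splitlines():
        entry = line.split(" #")[0].strip()  # drop trailing comments
        if not entry or entry.startswith("#") or entry.startswith("-"):
            # Skip blanks, comment lines, and option lines such as -r.
            continue
        if "==" not in entry or "--hash=" not in entry:
            problems.append(entry.split()[0])
    return problems
```

An empty list means every entry is pinned and hashed, which is the shape of file that `pip install --require-hashes -r requirements.txt` expects.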

There are other security concerns at play, but I wanted to get some of you thinking about how you install Python modules in a production environment.

What are your thoughts around this?

ak2020 · July 2, 2021, 4:47am

I think if we look back to the Unix philosophy, trust was part of its openness, so by default most user files were readable by other users. I’m pretty sure this is still the case on Linux unless it’s manually overridden, though I haven’t really looked much into it, as I am usually the only user on my Linux machines.

This good-will and trust surely extends to cases in which code / libraries are shared, in my opinion. I personally think more rigour and independent auditing could really benefit everyone, though there might be questions about how to fund this.

Given how much deliberate invasion of privacy we could be exposed to in today’s world if we go with many big-tech solutions, carrying on as if we’re not vulnerable, to some extent, to malicious code injection just seems wishful to me.

I really didn’t like the way in which some researchers went about trying to demonstrate a point, though their demonstration should definitely have rung some alarm bells, in my opinion.

It is for this reason that I think, ideally, when code is brought to the community table, it probably benefits most from auditing if there could be a conflict of interest in its submission.

Just my thoughts

Mr_McBride · July 2, 2021, 12:36pm

I’m still researching, but I’d love to find out if PyPI has scheduled independent audits. Although, this may only be for their internal security processes.

For me, the best case scenario would be to learn that PyPI is doing some sort of code-scanning.