Closed
40 commits
e058486
Fix some docstring typos.
erikrose Sep 3, 2015
62ac258
Delete dead _copy_dist_from_dir().
erikrose Sep 11, 2015
9211d6e
Style tweaks
erikrose Sep 11, 2015
3303be0
Teach requirements parser how to parser hash options, like --sha256.
erikrose Sep 3, 2015
1e41f01
Add checks against requirements-file-dwelling hashes for most kinds o…
erikrose Sep 9, 2015
11dbb92
Switch from --sha256 etc. to a single option: --hash.
erikrose Sep 24, 2015
0c17248
Pass PEP 8 checks.
erikrose Sep 24, 2015
b0ef6ab
Fix unicode errors in unit tests of Hashes under Python 3.
erikrose Sep 25, 2015
f3f73f1
Remove the -H spelling for --hashes.
erikrose Sep 25, 2015
910b82c
--require-hashes no longer implies --no-deps.
erikrose Sep 25, 2015
4f67374
Correct the level of the Wheel Cache heading.
erikrose Oct 7, 2015
14506f8
Document hash-checking mode.
erikrose Oct 7, 2015
bf0ff80
pep8 fixes
erikrose Oct 7, 2015
c62cd71
Add --require-hashes option to pip download and pip wheel.
erikrose Oct 7, 2015
09008bf
Add `pip hash` command.
erikrose Oct 8, 2015
d477ae6
Add warning about `python setup.py install`.
erikrose Oct 8, 2015
7a0a97c
Merge 'develop' into 'hashing' to bring the latter up to date.
erikrose Oct 8, 2015
0e6058b
Change head() method to an attr in hashing exceptions. Tweak English.
erikrose Oct 8, 2015
6f828c3
Correct and clarify docs and comments.
erikrose Oct 9, 2015
52111c1
Demote package-is-already-installed log message to debug-level.
erikrose Oct 9, 2015
b95599a
Change _good_hashes() to a whitelist.
erikrose Oct 9, 2015
3824d73
Revise what hashes protect you against.
erikrose Oct 9, 2015
be4e315
Rewrap args of unpack_http_url() to match the style in send(), above.
erikrose Oct 9, 2015
304c90a
Break after initial """ in multi-paragraph docstrings in exceptions m…
erikrose Oct 9, 2015
05b7ef9
Rename "goods" to "allowed" for clarity.
erikrose Oct 11, 2015
f35ce75
Make "installation bundles" less of an official term.
erikrose Oct 11, 2015
d541304
Allow === as a pinning operator.
erikrose Oct 11, 2015
76983f3
Restore documentation about alternate hash algorithms in URLs.
erikrose Oct 12, 2015
be6dccb
Factor up the idiom of reading chunks from a file until EOF.
erikrose Oct 12, 2015
9e5e34e
Add --algorithm flag to `pip hash`.
erikrose Oct 12, 2015
4c405a0
Restore deleted _copy_dist_from_dir().
erikrose Oct 12, 2015
dcf39bf
Add imports to make the pep8 checker happy about the dead _copy_dist_…
erikrose Oct 12, 2015
7c5e503
Remove unneeded triple quotes.
erikrose Oct 12, 2015
e23f596
Consolidate hash constants in pip.utils.hashing.
erikrose Oct 12, 2015
925e4b4
Fix false hash mismatches when installing a package that has a cached…
erikrose Oct 16, 2015
622b430
Typos and docstrings
erikrose Oct 20, 2015
ee9d6fb
Modernize recommendations to not call setuptools-level things directly.
erikrose Oct 20, 2015
3af5ffa
Improve flow of --require-hashes help message.
erikrose Oct 20, 2015
f38fc90
Obey --require-hashes option in requirements files.
erikrose Oct 21, 2015
4488047
Update the wheel-cache-disabling docs with our latest understanding o…
erikrose Oct 21, 2015
3 changes: 1 addition & 2 deletions docs/reference/index.rst
@@ -14,5 +14,4 @@ Reference Guide
pip_show
pip_search
pip_wheel


pip_hash
49 changes: 49 additions & 0 deletions docs/reference/pip_hash.rst
@@ -0,0 +1,49 @@
.. _`pip hash`:

pip hash
------------

.. contents::

Usage
*****

.. pip-command-usage:: hash


Description
***********

.. pip-command-description:: hash


Overview
++++++++
``pip hash`` is a convenient way to get a hash digest for use with
:ref:`hash-checking mode`, especially for packages with multiple archives. The
error message from ``pip install --require-hashes ...`` will give you one
hash, but, if there are multiple archives (like source and binary ones), you
will need to manually download and compute a hash for the others. Otherwise, a
spurious hash mismatch could occur when :ref:`pip install` is passed a
different set of options, like :ref:`--no-binary <install_--no-binary>`.


Options
*******

.. pip-command-options:: hash


Example
********

Compute the hash of a downloaded archive::

$ pip download SomePackage
Collecting SomePackage
Downloading SomePackage-2.2.tar.gz
Saved ./pip_downloads/SomePackage-2.2.tar.gz
Successfully downloaded SomePackage
$ pip hash ./pip_downloads/SomePackage-2.2.tar.gz
./pip_downloads/SomePackage-2.2.tar.gz:
--hash=sha256:93e62e05c7ad3da1a233def6731e8285156701e3419a5fe279017c429ec67ce0
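
For readers who want to see what such a digest amounts to, here is a minimal Python sketch that produces a string in the same ``--hash=<algorithm>:<hexdigest>`` shape as the output above. The function name ``hash_option`` and the chunk size are illustrative assumptions, not part of pip; the sketch only relies on the standard ``hashlib`` module.

```python
import hashlib
import io


def hash_option(fileobj, algorithm="sha256", chunk_size=8192):
    """Return a ``--hash=<algorithm>:<hexdigest>`` string for a binary file object.

    Hypothetical helper for illustration; this is not pip's implementation.
    """
    digest = hashlib.new(algorithm)
    # Read fixed-size chunks until EOF so large archives need not fit in memory.
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return "--hash={}:{}".format(algorithm, digest.hexdigest())


if __name__ == "__main__":
    # Stand-in for an opened archive; a real call would pass open(path, "rb").
    print(hash_option(io.BytesIO(b"hello")))
```

In practice ``pip hash`` is the supported way to get this value; the sketch is only meant to show that nothing more than a plain file digest is involved.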
149 changes: 127 additions & 22 deletions docs/reference/pip_install.rst
@@ -101,14 +101,15 @@ and the newline following it is effectively ignored.

Comments are stripped *before* line continuations are processed.

Additionally, the following Package Index Options are supported:
The following options are supported:

* :ref:`-i, --index-url <--index-url>`
* :ref:`--extra-index-url <--extra-index-url>`
* :ref:`--no-index <--no-index>`
* :ref:`-f, --find-links <--find-links>`
* :ref:`--no-binary <install_--no-binary>`
* :ref:`--only-binary <install_--only-binary>`
* :ref:`--require-hashes <--require-hashes>`

For example, to specify :ref:`--no-index <--no-index>` and 2 :ref:`--find-links <--find-links>` locations:

@@ -380,16 +381,16 @@ See the :ref:`pip install Examples<pip install Examples>`.
SSL Certificate Verification
++++++++++++++++++++++++++++

Starting with v1.3, pip provides SSL certificate verification over https, for the purpose
of providing secure, certified downloads from PyPI.
Starting with v1.3, pip provides SSL certificate verification over https, to
prevent man-in-the-middle attacks against PyPI downloads.


.. _`Caching`:

Caching
+++++++

Starting with v6.0, pip provides an on by default cache which functions
Starting with v6.0, pip provides an on-by-default cache which functions
similarly to that of a web browser. While the cache is on by default and is
designed to do the right thing by default, you can disable the cache and always
access PyPI by utilizing the ``--no-cache-dir`` option.
@@ -425,14 +426,14 @@ Windows

.. _`Wheel cache`:

Wheel cache
***********
Wheel Cache
~~~~~~~~~~~

Pip will read from the subdirectory ``wheels`` within the pip cache dir and use
any packages found there. This is disabled via the same ``no-cache-dir`` option
that disables the HTTP cache. The internal structure of that cache is not part
of the pip API. As of 7.0 pip uses a subdirectory per sdist that wheels were
built from, and wheels within that subdirectory.
Pip will read from the subdirectory ``wheels`` within the pip cache directory
and use any packages found there. This is disabled via the same
``--no-cache-dir`` option that disables the HTTP cache. The internal structure
of that cache is not part of the pip API. As of 7.0, pip makes a subdirectory for
each sdist that wheels are built from and places the resulting wheels inside.

Pip attempts to choose the best wheels from those built in preference to
building a new wheel. Note that this means when a package has both optional
@@ -445,19 +446,123 @@ When no wheels are found for an sdist, pip will attempt to build a wheel
automatically and insert it into the wheel cache.


Hash Verification
+++++++++++++++++

PyPI provides md5 hashes in the hash fragment of package download urls.
.. _`hash-checking mode`:

pip supports checking this, as well as any of the
guaranteed hashlib algorithms (sha1, sha224, sha384, sha256, sha512, md5).

The hash fragment is case sensitive (i.e. sha1 not SHA1).
Hash-Checking Mode
++++++++++++++++++

This check is only intended to provide basic download corruption protection.
It is not intended to provide security against tampering. For that,
see :ref:`SSL Certificate Verification`
Since version 8.0, pip can check downloaded package archives against local
hashes to protect against remote tampering. To verify a package against one or
more hashes, add them to the end of the line::

FooProject == 1.2 --hash=sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824 \
--hash=sha256:486ea46224d1bb4fb680f34f7c9ad96a8f24ec88be73ea8e5a6c65260e9cb8a7

(The ability to use multiple hashes is important when a package has both
binary and source distributions or when it offers binary distributions for a
variety of platforms.)

The recommended hash algorithm at the moment is sha256, but stronger ones are
allowed, including all those supported by ``hashlib``. However, weaker ones
such as md5, sha1, and sha224 are excluded to avoid giving a false sense of
security.

Hash verification is an all-or-nothing proposition. Specifying a ``--hash``
against any requirement not only checks that hash but also activates a global
*hash-checking mode*, which imposes several other security restrictions:

* Hashes are required for all requirements. This is because a partially-hashed
requirements file is of little use and thus likely an error: a malicious
actor could slip bad code into the installation via one of the unhashed
requirements. Note that hashes embedded in URL-style requirements via the
``#md5=...`` syntax suffice to satisfy this rule (regardless of hash

Review comment (Member): Did we actually support #md5= in requirements files? I thought we didn't.

Reply (Contributor Author): Yep! If somebody bothered to provide a hash in the reqs file, it would get checked. I tried not to change existing behavior.

strength, for legacy reasons), though you should use a stronger
hash like sha256 whenever possible.
* Hashes are required for all dependencies. An error results if there is a
dependency that is not spelled out and hashed in the requirements file.
* Requirements that take the form of project names (rather than URLs or local
filesystem paths) must be pinned to a specific version using ``==``. This
prevents a surprising hash mismatch upon the release of a new version
that matches the requirement specifier.
* ``--egg`` is disallowed, because it delegates installation of dependencies
to setuptools, giving up pip's ability to enforce any of the above.
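
The rules in this list can be approximated with a small checker. The following is a rough sketch under stated assumptions: the regular expressions, the ``problems`` function, and the ``STRONG_ALGORITHMS`` set are hypothetical and far simpler than pip's real requirements parser. It covers only requirements given as project names, so URL-style requirements with ``#md5=...`` fragments and local paths are out of scope. The set of strong algorithms is inferred from the text above (sha256 and stronger ``hashlib`` algorithms), and ``===`` is accepted because the commit list mentions it as a pinning operator.

```python
import re

# Hypothetical, simplified patterns; pip's real requirements parser handles far more.
PINNED = re.compile(r"^\s*[A-Za-z0-9][A-Za-z0-9._-]*\s*={2,3}\s*[^\s=]\S*")
HASH_OPTION = re.compile(r"--hash=([a-z0-9]+):([0-9a-f]+)")

# Assumption drawn from the text: md5, sha1, and sha224 are excluded.
STRONG_ALGORITHMS = {"sha256", "sha384", "sha512"}


def problems(requirement_line):
    """Return the reasons a project-name requirement would fail the rules above."""
    found = []
    if not PINNED.match(requirement_line):
        found.append("not pinned with ==")
    algorithms = [name for name, _ in HASH_OPTION.findall(requirement_line)]
    if not algorithms:
        found.append("no --hash option")
    elif not set(algorithms) <= STRONG_ALGORITHMS:
        found.append("weak hash algorithm")
    return found
```

A line such as ``FooProject >= 1.2`` followed by a sha256 hash would be reported as unpinned, while a pinned line with no ``--hash`` option would be reported as missing a hash.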

.. _`--require-hashes`:

Hash-checking mode can be forced on with the ``--require-hashes`` command-line

Review comment (Member): Have you considered --verify-hashes instead as "require" is a too often used term?

Reply (Member): Hmm, I think that doesn't read quite right. We always verify hashes if they are given (either via PyPI, or in the requirements file in this PR), so you're not telling pip to verify hashes, you're mandating that there must be a hash to verify.

Reply (Contributor Author): Well…hashes are always verified if they occur in the requirements file. All --require-hashes does is insist on their presence.

Reply (Member): Ah, fair enough, so maybe --enforce-hashes? I just dread the term "requirement" (setup_require, install_requires, requirements.txt, RequirementSet). All kind of used inconsistently, don't you think?

Reply (Contributor Author): I don't feel like it's really an overload; we're just using the verb "require", not redefining "requirement", which is really the technical term at issue, I believe. (Also, I can't find a better word. "Demand" is the only decent synonym I can find, and it feels impolite to me. "Enforce", to me, would suggest that pip will somehow make hashes exist for me if they aren't there.)

option::

$ pip install --require-hashes -r requirements.txt
...
Hashes are required in --require-hashes mode (implicitly on when a hash is
specified for any package). These requirements were missing hashes,
leaving them open to tampering. These are the hashes the downloaded
archives actually had. You can add lines like these to your requirements
files to prevent tampering.
pyelasticsearch==1.0 --hash=sha256:44ddfb1225054d7d6b1d02e9338e7d4809be94edbe9929a2ec0807d38df993fa
more-itertools==2.2 --hash=sha256:93e62e05c7ad3da1a233def6731e8285156701e3419a5fe279017c429ec67ce0

This can be useful in deploy scripts, to ensure that the author of the
requirements file provided hashes. It is also a convenient way to bootstrap
your list of hashes, since it shows the hashes of the packages it fetched. It
fetches only the preferred archive for each package, so you may still need to
add hashes for alternative archives using :ref:`pip hash`, for instance if
there is both a binary and a source distribution.

The :ref:`wheel cache <Wheel cache>` is disabled in hash-checking mode to
prevent spurious hash mismatch errors. These would otherwise occur while
installing sdists that had already been automatically built into cached wheels:
those wheels would be selected for installation, but their hashes would not
match the sdist ones from the requirements file. A further complication is that
locally built wheels are nondeterministic: contemporary modification times make
their way into the archive, making hashes unpredictable across machines and
cache flushes. Compilation of C code adds further nondeterminism, as many
compilers include random-seeded values in their output. However, wheels fetched
from index servers are the same every time. They land in pip's HTTP cache, not
its wheel cache, and are used normally in hash-checking mode. The only downside
of having the wheel cache disabled is thus extra build time for sdists, and
this can be solved by making sure pre-built wheels are available from the index
server.
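
The effect of modification times on archive hashes can be reproduced in isolation. Wheels are zip archives, so the sketch below builds two in-memory zip files with identical contents and different timestamps and compares their sha256 digests. The file name, contents, and timestamps are arbitrary illustrative values, and the ``archive_digest`` helper is hypothetical; no part of pip's wheel-building code is used here.

```python
import hashlib
import io
import zipfile


def archive_digest(date_time):
    """Build a one-file zip in memory and return its sha256 hex digest."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # ZipInfo records the modification time in the archive's headers.
        info = zipfile.ZipInfo("pkg/__init__.py", date_time=date_time)
        archive.writestr(info, "VERSION = '1.0'\n")
    return hashlib.sha256(buffer.getvalue()).hexdigest()


# Same file contents, timestamps two seconds apart.
first = archive_digest((2015, 10, 21, 12, 0, 0))
second = archive_digest((2015, 10, 21, 12, 0, 2))
print(first == second)
```

Because the timestamp is part of the archive bytes, the two digests differ even though the packaged file is the same, which is why a hash recorded for one locally built wheel cannot be expected to match another build.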

Hash-checking mode also works with :ref:`pip download` and :ref:`pip wheel`. A
:ref:`comparison of hash-checking mode with other repeatability strategies
<Repeatability>` is available in the User Guide.

.. warning::
Beware of the ``setup_requires`` keyword arg in :file:`setup.py`. The
(rare) packages that use it will cause those dependencies to be downloaded
by setuptools directly, skipping pip's hash-checking. If you need to use
such a package, see :ref:`Controlling
setup_requires<controlling-setup-requires>`.

.. warning::
Be careful not to nullify all your security work when you install your
actual project by using setuptools directly: for example, by calling
``python setup.py install``, ``python setup.py develop``, or
``easy_install``. Setuptools will happily go out and download, unchecked,
anything you missed in your requirements file—and it’s easy to miss things
as your project evolves. To be safe, install your project using pip and
:ref:`--no-deps <install_--no-deps>`.

Instead of ``python setup.py develop``, use... ::

pip install --no-deps -e .

Instead of ``python setup.py install``, use... ::

pip install --no-deps .


Hashes from PyPI
~~~~~~~~~~~~~~~~

PyPI provides an MD5 hash in the fragment portion of each package download URL,
like ``#md5=123...``, which pip checks as a protection against download
corruption. Other hash algorithms that have guaranteed support from ``hashlib``
are also supported here: sha1, sha224, sha384, sha256, and sha512. Since this
hash originates remotely, it is not a useful guard against tampering and thus
does not satisfy the ``--require-hashes`` demand that every package have a
local hash.
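
As an illustration of the fragment convention described above, the following sketch extracts an ``algorithm=digest`` pair from a URL fragment and compares it with the digest of some downloaded bytes. The helper names and the ``example.invalid`` URL are assumptions made for this example, and the parsing is deliberately simplified (a fragment carrying several ``&``-separated values is not handled); pip's own link handling is more involved.

```python
import hashlib
from urllib.parse import urlsplit

# Algorithms the text above names as accepted in URL fragments.
FRAGMENT_ALGORITHMS = {"md5", "sha1", "sha224", "sha256", "sha384", "sha512"}


def fragment_hash(url):
    """Return (algorithm, hexdigest) from a URL fragment, or None if absent."""
    name, sep, value = urlsplit(url).fragment.partition("=")
    # Membership is case sensitive, so "MD5" is not recognized.
    if sep and name in FRAGMENT_ALGORITHMS and value:
        return name, value
    return None


def matches(url, data):
    """Compare downloaded bytes with the hash embedded in the URL, if any."""
    found = fragment_hash(url)
    if found is None:
        return None  # No fragment hash, so there is nothing to check.
    algorithm, expected = found
    return hashlib.new(algorithm, data).hexdigest() == expected
```

Note that a match here only shows the download was not corrupted in transit: the expected value comes from the same remote source as the archive, so it offers no protection against tampering.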


.. _`editable-installs`:
80 changes: 59 additions & 21 deletions docs/user_guide.rst
@@ -23,6 +23,8 @@ Specifiers`

For more information and examples, see the :ref:`pip install` reference.

.. _PyPI: http://pypi.python.org/pypi


.. _`Requirements Files`:

@@ -71,7 +73,6 @@ In practice, there are 4 common uses of Requirements files:
pkg2
pkg3>=1.0,<=2.0


3. Requirements files are used to force pip to install an alternate version of a
sub-dependency. For example, suppose `ProjectA` in your requirements file
requires `ProjectB`, but the latest version (v1.3) has a bug, you can force
@@ -591,44 +592,81 @@ From within a real python, where ``SomePackage`` *is* installed globally, and is
Ensuring Repeatability
**********************

Four things are required to fully guarantee a repeatable installation using requirements files.
pip can achieve various levels of repeatability:

Pinned Version Numbers
----------------------

Pinning the versions of your dependencies in the requirements file
protects you from bugs or incompatibilities in newly released versions::

SomePackage == 1.2.3
DependencyOfSomePackage == 4.5.6

1. The requirements file was generated by ``pip freeze`` or you're sure it only
contains requirements that specify a specific version.
Using :ref:`pip freeze` to generate the requirements file will ensure that not
only the top-level dependencies are included but their sub-dependencies as
well, and so on. Perform the installation using :ref:`--no-deps
<install_--no-deps>` for an extra dose of insurance against installing
anything not explicitly listed.

2. The installation is performed using :ref:`--no-deps <install_--no-deps>`.
This guarantees that only what is explicitly listed in the requirements file is
installed.
This strategy is easy to implement and works across OSes and architectures.
However, it trusts PyPI and the certificate authority chain. It
also relies on indices and find-links locations not allowing
packages to change without a version increase. (PyPI does protect
against this.)

3. None of the packages to be installed utilize the setup_requires keyword. See
:ref:`Controlling setup_requires<controlling-setup-requires>`.
Hash-checking Mode
------------------

Beyond pinning version numbers, you can add hashes against which to verify
downloaded packages::

4. The installation is performed against an index or find-links location that is
guaranteed to *not* allow archives to be changed and updated without a
version increase. While this is safe on PyPI, it may not be safe for other
indices. If you are working with an unsafe index, consider the `peep project
<https://pypi.python.org/pypi/peep>`_ which offers this feature on top of pip
using requirements file comments.
FooProject == 1.2 --hash=sha256:2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824

This protects against a compromise of PyPI or the HTTPS
certificate chain. It also guards against a package changing
without its version number changing (on indexes that allow this).
This approach is a good fit for automated server deployments.

.. _PyPI: http://pypi.python.org/pypi/
Hash-checking mode is a labor-saving alternative to running a private index
server containing approved packages: it removes the need to upload packages,
maintain ACLs, and keep an audit trail (which a VCS gives you on the
requirements file for free). It can also substitute for a vendor library,
providing easier upgrades and less VCS noise. It does not, of course,
provide the availability benefits of a private index or a vendor library.

For more, see :ref:`pip install\'s discussion of hash-checking mode <hash-checking mode>`.

.. _`Installation Bundle`:

Create an Installation Bundle with Compiled Dependencies
********************************************************
Installation Bundles
--------------------

You can create a simple bundle that contains all of the dependencies you wish
to install using::
Using :ref:`pip wheel`, you can bundle up all of a project's dependencies, with
any compilation done, into a single archive. This allows installation when
index servers are unavailable and avoids time-consuming recompilation. Create
an archive like this::

$ tempdir=$(mktemp -d /tmp/wheelhouse-XXXXX)
$ pip wheel -r requirements.txt --wheel-dir=$tempdir
$ cwd=`pwd`
$ (cd "$tempdir"; tar -cjvf "$cwd/bundled.tar.bz2" *)

Once you have a bundle, you can then install it using::
You can then install from the archive like this::

$ tempdir=$(mktemp -d /tmp/wheelhouse-XXXXX)
$ (cd $tempdir; tar -xvf /path/to/bundled.tar.bz2)
$ pip install --force-reinstall --ignore-installed --upgrade --no-index --no-deps $tempdir/*

Note that compiled packages are typically OS- and architecture-specific, so
these archives are not necessarily portable across machines.

Hash-checking mode can be used along with this method to ensure that future
archives are built with identical packages.

.. warning::
Finally, beware of the ``setup_requires`` keyword arg in :file:`setup.py`.
The (rare) packages that use it will cause those dependencies to be
downloaded by setuptools directly, skipping pip's protections. If you need
to use such a package, see :ref:`Controlling
setup_requires<controlling-setup-requires>`.
3 changes: 3 additions & 0 deletions pip/basecommand.py
@@ -280,6 +280,9 @@ def populate_requirement_set(requirement_set, args, options, finder,
wheel_cache=wheel_cache):
found_req_in_file = True
requirement_set.add_requirement(req)
# If --require-hashes was a line in a requirements file, tell
# RequirementSet about it:
requirement_set.require_hashes = options.require_hashes

if not (args or options.editables or found_req_in_file):
opts = {'name': name}