42 changes: 21 additions & 21 deletions CONTRIBUTING.md
@@ -26,10 +26,10 @@ We are looking into contributing back the changes to the upstream project. Try n

We are looking for all sorts of changes. The changes will be broken down into 2 pieces:

-1. Changes to the operations/flow of the tool. These are changes don't affect what secrets are found, but affect how the tool is used.
+1. Changes to the operations/flow of the tool. These changes don't affect what secrets are found, but affect how the tool is used.
1. Changes to the secrets detection logic. This changes which secrets are going to be detected.

-Which type to change is up to you to decide. If you are passionate about detecting secrets, then work on the logic. If you passionate about the UX or how to tool is used, then make a change to the operation/flow aspect of the tool.
+Which type to change is up to you to decide. If you are passionate about detecting secrets, then work on the logic. If you are passionate about the UX or how to tool is used, then make a change to the operation/flow aspect of the tool.

## Issues and PRs

@@ -45,15 +45,15 @@ We'd also love PRs. If you're thinking of a large PR, we advise opening up an is
1. Create a new branch: `git checkout -b my-branch-name`.
1. Make your change, add tests, and make sure the tests still pass.
1. Push to your fork and [submit a pull request][pr].
-1. Pat your self on the back and wait for your pull request to be reviewed and merged.
+1. Pat yourself on the back and wait for your pull request to be reviewed and merged.

Here are a few things you can do that will increase the likelihood of your pull request being accepted:

- Write and update tests.
- Keep your changes as focused as possible. If there are multiple changes you would like to make that are not dependent upon each other, consider submitting them as separate pull requests.
- Write a [good commit message](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).

-Work in Progress pull requests are also welcome to get feedback early on, or if there is something blocked you.
+Work in Progress pull requests are also welcome to get feedback early on, or if something is blocking you.

## Resources

@@ -64,26 +64,26 @@ Work in Progress pull requests are also welcome to get feedback early on, or if

## Process for Adding a New Secret Detector to detect-secrets

-There are two key steps for developing a new secret detector: secret identification and secret verification.
+Two key steps for developing a new secret detector are secret identification and secret verification.
It is often easier to review contributions if these two steps are submitted as separate PRs, although this is not mandatory.
The processes for each of these two steps are outlined below.

### Secret Identification

-- Develop an understanding of all the secret types for a given service. A service may have combinations of basic-auth, IAM auth, tokens, keys, passwords, and / or other proprietary authentication methods.
+- Develop an understanding of all the secret types for a given service. A service may have combinations of basic-auth, IAM auth, tokens, keys, passwords, and/or other proprietary authentication methods.
- Identify any specification documents from the service provider for the use & format of the secret types to be captured.
- Develop an understanding of the API / service call uses.
-- Is it purely RESTful, or are there prevalent SDKs which should be accounted for and detected?
-- Search / identify examples of the signature and use cases in github.ibm.com or create your own.
+- Is it purely RESTful, or are there prevalent SDKs that should be accounted for and detected?
+- Search/identify examples of the signature and use cases in github.ibm.com or create your own.
- Iterate until you have sufficient representation of the different ways in which the secret may be used.
-- Using the other detectors under `detect-secrets/plugins` as examples, create a new Python file under that path. The file should contain a new detector class which inherits from `RegexBasedDetector`.
+- Using the other detectors under `detect-secrets/plugins` as examples, create a new Python file under that path. The file should contain a new detector class that inherits from `RegexBasedDetector`.
- Write one or more regexes to match and capture secrets when found within the use cases identified above. Assign a list of regexes to the `denylist` variable. We have created helper functions to make this easier, which may be seen in the existing detectors.
- If multiple factors exist, identify a primary factor to capture with the `denylist` regexes. Secondary factors will be captured as part of the verification process below.
- Create test cases to ensure that example secrets matching the (primary factor's) secret signature will be caught. Use the test files under `tests/plugins` as examples.
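
The identification steps above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the `RegexBasedDetector` class here is a simplified stand-in written so the example runs on its own, and `ExampleServiceDetector`, the `examplesvc` key name, and the 32-hex-character format are all hypothetical. Only the `denylist` attribute and the base class name come from the text above.

```python
import re


class RegexBasedDetector:
    """Simplified stand-in for the real base class (assumption, for illustration)."""

    secret_type = ''
    denylist = ()

    def analyze_line(self, line):
        """Return every value captured by any denylist regex on this line."""
        found = []
        for regex in self.denylist:
            for match in regex.finditer(line):
                found.append(match.group(1))
        return found


class ExampleServiceDetector(RegexBasedDetector):
    """Hypothetical detector for a made-up 'examplesvc' API key."""

    secret_type = 'Example Service API Key'

    denylist = [
        # Primary factor: a 32-character hex key assigned to a variable
        # whose name mentions the (hypothetical) service.
        re.compile(
            r'examplesvc[_-]?api[_-]?key\s*[:=]\s*["\']?([0-9a-f]{32})\b',
            re.IGNORECASE,
        ),
    ]
```

A matching test case would feed the detector one line that contains a key in the expected format and one that does not, and assert that only the first is captured.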

### Secret Verification

-- Identify a service endpoint (API call or SDK) which can be used to check the validity of a secret.
+- Identify a service endpoint (API call or SDK) that can be used to check the validity of a secret.
- In complex cases (where the service is hosted internally), it's often helpful to identify an IBM SME who can help navigate the API / SDK spec of the service for verification purposes. [w3 ProductPages](https://productpages.w3ibm.mybluemix.net/ProductPages/index.html) is a good resource to help identify an SME.
- Note: if there are _many_ signature hits, it may create a stressful load on the verification endpoint, so a key design point is to minimize false positive cases.
- Using the existing plugins in `detect_secrets/plugins` as examples, add the `verify()` function to your detector. The `verify` function should validate a found secret with the service endpoint and determine whether it is active or not, returning either `VerifiedResult.VERIFIED_TRUE` or `VerifiedResult.VERIFIED_FALSE`. `verify()` may also return `VerifiedResult.UNVERIFIED` if verification cannot be completed due to issues like endpoint availability, lack of expected data elements, etc.
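
The three possible outcomes of `verify()` can be sketched as below. This is a hedged illustration under stated assumptions: the `VerifiedResult` enum is a local stand-in (only its member names come from the text above; the numeric values are arbitrary), and `call_endpoint` is a hypothetical callable that sends the token to the service and returns an HTTP status code. The mapping of status codes to results is an assumption; a real service may signal validity differently.

```python
from enum import Enum


class VerifiedResult(Enum):
    """Local stand-in; member names from the text, values arbitrary."""
    VERIFIED_FALSE = 1
    UNVERIFIED = 2
    VERIFIED_TRUE = 3


def verify(token, call_endpoint):
    """Classify a found secret using a caller-supplied endpoint check.

    `call_endpoint` (hypothetical) takes the token, returns an HTTP status
    code, and raises OSError when the service cannot be reached.
    """
    try:
        status = call_endpoint(token)
    except OSError:
        # Endpoint unavailable: no conclusion can be drawn.
        return VerifiedResult.UNVERIFIED

    if status == 200:
        return VerifiedResult.VERIFIED_TRUE
    if status in (401, 403):
        return VerifiedResult.VERIFIED_FALSE
    # Anything else (rate limiting, server errors) is inconclusive.
    return VerifiedResult.UNVERIFIED
```

Passing the endpoint call in as a parameter keeps the sketch testable without network access, which also makes it easy to check that a failing endpoint yields `UNVERIFIED` rather than a false negative.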
@@ -95,9 +95,9 @@ The processes for each of these two steps are outlined below.
First, set up `pyenv`:

1. `brew install pyenv`
-1. install the latest version of python with `pyenv install <version number>`
-1. set the global version of python with `pyenv global <version number>`
-1. To ensure the python installation controlled by `pyenv` is being used, you may need to add the following to your `.bashrc` (or equivalent):
+1. install the latest version of Python with `pyenv install <version number>`
+1. set the global version of Python with `pyenv global <version number>`
+1. To ensure the Python installation controlled by `pyenv` is being used, you may need to add the following to your `.bashrc` (or equivalent):
```sh
export PYENV_ROOT="$HOME/.pyenv"
export PATH="$PYENV_ROOT/shims:$PATH"
@@ -244,13 +244,13 @@ levels. Here are a couple of examples:
pytest tests/core/baseline_test.py::TestInitializeBaseline
```

-- Running a single test function, inside test class
+- Running a single test function, inside the test class

```bash
pytest tests/core/baseline_test.py::TestInitializeBaseline::test_basic_usage
```

-- Running a single root level test function
+- Running a single root-level test function

```bash
pytest tests/plugins/base_test.py::test_fails_if_no_secret_type_defined
@@ -260,7 +260,7 @@ levels. Here are a couple of examples:

### PotentialSecret

-This lives at the very heart of the engine, and represents a line being flagged
+This lives at the very heart of the engine and represents a line being flagged
for its potential to be a secret.

Since the detect-secrets engine is heuristics-based, it requires a human to read
@@ -287,11 +287,11 @@ We can see that the JSON dump clearly shows this.
```

However, since it is designed for easy reading, we didn't want the baseline to
-be the single file that contained all the secrets in a given repository.
+be the single file that contains all the secrets in a given repository.
Therefore, we mask the secret by hashing it with three core attributes:

1. The actual secret
-2. The filepath where it was found
+2. The file path where it was found
3. How the engine determined it was a secret

Any potential secret that has **all three values the same is equal**.
@@ -302,7 +302,7 @@ to investigate:
- Same secret value, but present in different files
- Same secret value, caught by multiple plugins

-Furthermore, this will **not** flag on every single usage of a given secret in a
+Furthermore, this will **not** flag every single usage of a given secret in a
given file, to minimize noise.
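
The identity rule described above can be illustrated with a small model. This is a sketch only, not the library's class: `PotentialSecretSketch` and its attribute names are hypothetical, and SHA-256 is an assumed choice of hash used purely for the example. It shows equality being derived from the three attributes (secret, file path, detection type), with the line number stored but not used for identity.

```python
import hashlib


class PotentialSecretSketch:
    """Illustrative model only; names and hash choice are assumptions."""

    def __init__(self, secret_type, filename, secret, line_number=0):
        self.secret_type = secret_type
        self.filename = filename
        # Keep only a hash so the plaintext never appears in the baseline.
        self.secret_hash = hashlib.sha256(secret.encode('utf-8')).hexdigest()
        # Stored for the reader's convenience; not part of identity.
        self.line_number = line_number

    def _identity(self):
        # The three core attributes that define equality.
        return (self.secret_hash, self.filename, self.secret_type)

    def __eq__(self, other):
        if not isinstance(other, PotentialSecretSketch):
            return NotImplemented
        return self._identity() == other._identity()

    def __hash__(self):
        return hash(self._identity())
```

With this definition, the same secret found twice in one file by the same plugin collapses to a single entry in a set, while the same secret in a different file, or caught by a different plugin, remains a separate entry.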

**Important Note:** The line number does not play a part in the identification
@@ -312,7 +312,7 @@ to quickly identify the secret that was identified by a given plugin.

### SecretsCollection

-A collection of `PotentialSecrets` are stored in a `SecretsCollection`. This
+A collection of `PotentialSecrets` is stored in a `SecretsCollection`. This
contains a list of all the secrets in a given repository, as well as any other
details needed to recreate it.

@@ -347,7 +347,7 @@ adding results found to the collection.

##### 3. Outputting

-We need to be able to create a baseline from a SecretsCollection, so that it
+We need to be able to create a baseline from a SecretsCollection so that it
can be used for future comparisons. In the same spirit as the `PotentialSecret`
object, it is designed for **high readability**, and may contain other metadata
that aids human analysis of the generated output (e.g. `generated_at` time).