diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 2007fdc4f..722bc730f 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -26,10 +26,10 @@ We are looking into contributing back the changes to the upstream project. Try n

 We are looking for all sorts of changes. The changes will be broken down into 2 pieces:

-1. Changes to the operations/flow of the tool. These are changes don't affect what secrets are found, but affect how the tool is used.
+1. Changes to the operations/flow of the tool. These changes don't affect what secrets are found, but affect how the tool is used.
 1. Changes to the secrets detection logic. This changes which secrets are going to be detected.

-Which type to change is up to you to decide. If you are passionate about detecting secrets, then work on the logic. If you passionate about the UX or how to tool is used, then make a change to the operation/flow aspect of the tool.
+Which type to change is up to you to decide. If you are passionate about detecting secrets, then work on the logic. If you are passionate about the UX or how to tool is used, then make a change to the operation/flow aspect of the tool.

 ## Issues and PRs

@@ -45,7 +45,7 @@ We'd also love PRs. If you're thinking of a large PR, we advise opening up an is
 1. Create a new branch: `git checkout -b my-branch-name`.
 1. Make your change, add tests, and make sure the tests still pass.
 1. Push to your fork and [submit a pull request][pr].
-1. Pat your self on the back and wait for your pull request to be reviewed and merged.
+1. Pat yourself on the back and wait for your pull request to be reviewed and merged.

 Here are a few things you can do that will increase the likelihood of your pull request being accepted:

@@ -53,7 +53,7 @@ Here are a few things you can do that will increase the likelihood of your pull
 - Keep your changes as focused as possible. If there are multiple changes you would like to make that are not dependent upon each other, consider submitting them as separate pull requests.
 - Write a [good commit message](http://tbaggery.com/2008/04/19/a-note-about-git-commit-messages.html).

-Work in Progress pull requests are also welcome to get feedback early on, or if there is something blocked you.
+Work in Progress pull requests are also welcome to get feedback early on, or if something is blocking you.

 ## Resources

@@ -64,26 +64,26 @@ Work in Progress pull requests are also welcome to get feedback early on, or if

 ## Process for Adding a New Secret Detector to detect-secrets

-There are two key steps for developing a new secret detector: secret identification and secret verification.
+Two key steps for developing a new secret detector are secret identification and secret verification.
 It is often easier to review contributions if these two steps are submitted as separate PRs, although this is not mandatory.
 The processes for each of these two steps are outlined below.

 ### Secret Identification

-- Develop an understanding of all the secret types for a given service. A service may have combinations of basic-auth, IAM auth, tokens, keys, passwords, and / or other proprietary authentication methods.
+- Develop an understanding of all the secret types for a given service. A service may have combinations of basic-auth, IAM auth, tokens, keys, passwords, and/or other proprietary authentication methods.
 - Identify any specification documents from the service provider for the use & format of the secret types to be captured.
 - Develop an understanding of the API / service call uses.
-  - Is it purely RESTful, or are there prevalent SDKs which should be accounted for and detected?
-- Search / identify examples of the signature and use cases in github.ibm.com or create your own.
+  - Is it purely RESTful, or are there prevalent SDKs that should be accounted for and detected?
+- Search/identify examples of the signature and use cases in github.ibm.com or create your own.
   - Iterate until you have sufficient representation of the different ways in which the secret may be used.
-- Using the other detectors under `detect-secrets/plugins` as examples, create a new Python file under that path. The file should contain a new detector class which inherits from `RegexBasedDetector`.
+- Using the other detectors under `detect-secrets/plugins` as examples, create a new Python file under that path. The file should contain a new detector class that inherits from `RegexBasedDetector`.
 - Write one or more regexes to match and capture secrets when found within the use cases identified above. Assign a list of regexes to the `denylist` variable. We have created helper functions to make this easier, which may be seen in the existing detectors.
   - If multiple factors exist, identify a primary factor to capture with the `denylist` regexes. Secondary factors will be captured as part of the verification process below.
 - Create test cases to ensure that example secrets matching the (primary factor's) secret signature will be caught. Use the test files under `tests/plugins` as examples.

 ### Secret Verification

-- Identify a service endpoint (API call or SDK) which can be used to check the validity of a secret.
+- Identify a service endpoint (API call or SDK) that can be used to check the validity of a secret.
   - In complex cases (where the service is hosted internally), it's often helpful to identify an IBM SME who can help navigate the API / SDK spec of the service for verification purposes. [w3 ProductPages](https://productpages.w3ibm.mybluemix.net/ProductPages/index.html) is a good resource to help identify an SME.
   - Note: if there are _many_ signature hits, it may create a stressful load on the verification endpoint, so a key design point is to minimize false positive cases.
 - Using the existing plugins in `detect_secrets/plugins` as examples, add the `verify()` function to your detector. The `verify` function should validate a found secret with the service endpoint and determine whether it is active or not, returning either `VerifiedResult.VERIFIED_TRUE` or `VerifiedResult.VERIFIED_FALSE`. `verify()` may also return `VerifiedResult.UNVERIFIED` if verification cannot be completed due to issues like endpoint availability, lack of expected data elements, etc.
@@ -95,9 +95,9 @@ The processes for each of these two steps are outlined below.
 First, set up `pyenv`:

 1. `brew install pyenv`
-1. install the latest version of python with `pyenv install `
-1. set the global version of python with `pyenv global `
-1. To ensure the python installation controlled by `pyenv` is being used, you may need to add the following to your `.bashrc` (or equivalent):
+1. install the latest version of Python with `pyenv install `
+1. set the global version of Python with `pyenv global `
+1. To ensure the Python installation controlled by `pyenv` is being used, you may need to add the following to your `.bashrc` (or equivalent):
 ```sh
 export PYENV_ROOT="$HOME/.pyenv"
 export PATH="$PYENV_ROOT/shims:$PATH"
@@ -244,13 +244,13 @@ levels. Here are a couple of examples:
 pytest tests/core/baseline_test.py::TestInitializeBaseline
 ```

-- Running a single test function, inside test class
+- Running a single test function, inside the test class

 ```bash
 pytest tests/core/baseline_test.py::TestInitializeBaseline::test_basic_usage
 ```

-- Running a single root level test function
+- Running a single root-level test function

 ```bash
 pytest tests/plugins/base_test.py::test_fails_if_no_secret_type_defined
@@ -260,7 +260,7 @@ levels. Here are a couple of examples:

 ### PotentialSecret

-This lives at the very heart of the engine, and represents a line being flagged
+This lives at the very heart of the engine and represents a line being flagged
 for its potential to be a secret.
 Since the detect-secrets engine is heuristics-based, it requires a human to read
@@ -287,11 +287,11 @@ We can see that the JSON dump clearly shows this.
 ```

 However, since it is designed for easy reading, we didn't want the baseline to
-be the single file that contained all the secrets in a given repository.
+be the single file that contains all the secrets in a given repository.
 Therefore, we mask the secret by hashing it with three core attributes:

 1. The actual secret
-2. The filepath where it was found
+2. The file path where it was found
 3. How the engine determined it was a secret

 Any potential secret that has **all three values the same is equal**.
@@ -302,7 +302,7 @@ to investigate:
 - Same secret value, but present in different files
 - Same secret value, caught by multiple plugins

-Furthermore, this will **not** flag on every single usage of a given secret in a
+Furthermore, this will **not** flag every single usage of a given secret in a
 given file, to minimize noise.

 **Important Note:** The line number does not play a part in the identification
@@ -312,7 +312,7 @@ to quickly identify the secret that was identified by a given plugin.

 ### SecretsCollection

-A collection of `PotentialSecrets` are stored in a `SecretsCollection`. This
+A collection of `PotentialSecrets` is stored in a `SecretsCollection`. This
 contains a list of all the secrets in a given repository, as well as any other
 details needed to recreate it.

@@ -347,7 +347,7 @@ adding results found to the collection.

 ##### 3. Outputting

-We need to be able to create a baseline from a SecretsCollection, so that it
+We need to be able to create a baseline from a SecretsCollection so that it
 can be used for future comparisons. In the same spirit as the `PotentialSecret`
 object, it is designed for **high readability**, and may contain other metadata
 that aids human analysis of the generated output (e.g. `generated_at` time).
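
Reviewer's note on the `PotentialSecret` hunk: the identity rule that section describes (two potential secrets are equal when the secret, the file path, and the detecting plugin all match, and the line number plays no part) can be sketched in Python as follows. This is a minimal illustration under assumptions, not the project's implementation: the class name `SketchSecret`, its field names, and the choice of SHA-256 for masking are all hypothetical.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SketchSecret:
    """Hypothetical stand-in for PotentialSecret (not the project's class)."""

    secret_type: str   # how the engine determined it was a secret (plugin name)
    filename: str      # file path where the secret was found
    secret_hash: str   # masked secret; the plaintext is never stored
    # compare=False keeps the line number out of both equality and hashing.
    line_number: int = field(default=0, compare=False)

    @classmethod
    def from_plaintext(cls, secret_type, filename, secret, line_number=0):
        # Assumption: SHA-256 is used here purely for illustration.
        digest = hashlib.sha256(secret.encode("utf-8")).hexdigest()
        return cls(secret_type, filename, digest, line_number)


first = SketchSecret.from_plaintext("ExamplePlugin", "config.py", "hunter2", 3)
moved = SketchSecret.from_plaintext("ExamplePlugin", "config.py", "hunter2", 41)
other_file = SketchSecret.from_plaintext("ExamplePlugin", "deploy.py", "hunter2", 3)
other_plugin = SketchSecret.from_plaintext("OtherPlugin", "config.py", "hunter2", 3)

# Same secret, file, and plugin: one entry, even though the line moved.
unique = {first, moved, other_file, other_plugin}
```

Here `first` and `moved` collapse into a single entry, while the same value in a different file or caught by a different plugin stays distinct, which matches the cases the section lists as worth investigating separately.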