Commit 58ae2c9

docs: add a tutorial for malware detection check (#816)
This PR adds a new tutorial that showcases the mcn_detect_malicious_metadata_1 check and adjusts a corresponding integration test so that the examples in the tutorial are continuously tested. It also improves the Using Macaron page to encourage users to analyze an artifact using -purl rather than a repository, and enhances the configuration instructions. It adds two new helper relations for policies that let users add constraints on the confidence score of a check result:

- check_passed_with_confidence
- check_failed_with_confidence

Finally, it improves the rendering of the justification column in the HTML report when the presented data is a dictionary.

Signed-off-by: behnazh-w <[email protected]>
1 parent f70654b commit 58ae2c9

20 files changed

+1098
-570
lines changed

docs/source/assets/er-diagram.svg

Lines changed: 579 additions & 462 deletions

docs/source/index.rst

Lines changed: 16 additions & 3 deletions
@@ -55,9 +55,6 @@ the requirements that are currently supported by Macaron.
5555
* - SLSA level
5656
- SLSA spec v0.1
5757
- Concrete check
58-
* - 0+
59-
- **Provenance verified** - Provenance is available and verified.
60-
- See :doc:`SLSA Build Levels </pages/checks/slsa_builds>`
6158
* - 1
6259
- **Scripted build** - All build steps were fully defined in a “build script”.
6360
- Identify and validate build script(s).
@@ -70,6 +67,9 @@ the requirements that are currently supported by Macaron.
7067
* - 2
7168
- **Build service** - All build steps are run using some build service (e.g. GitHub Actions)
7269
- Identify and validate the CI service(s) used for the build process.
70+
* - 2+
71+
- **Provenance verified** - Provenance is available and verified.
72+
- See :doc:`SLSA Build Levels </pages/checks/slsa_builds>`
7373
* - 3
7474
- **Trusted builders** - Guarantees the identification of the top-level build configuration used to initiate the build. The build is verified to be hermetic, isolated, parameterless, and executed in an ephemeral environment.
7575
- Identify and validate that the builder used in the CI pipeline is a trusted one.
@@ -92,6 +92,19 @@ the requirements that are currently supported by Macaron.
9292
- **Provenance derived commit** - Check if the analysis target's commit matches the commit in the provenance.
9393
- If there is no commit, this check will fail.
9494

95+
****************************************************************************************
96+
Macaron checks that report integrity issues but do not map to SLSA requirements directly
97+
****************************************************************************************
98+
99+
.. list-table::
100+
:widths: 20 40
101+
:header-rows: 1
102+
103+
* - Check name
104+
- Description
105+
* - Detect malicious metadata
106+
- This check analyzes the metadata of a package and reports any malicious behavior it detects. It currently supports PyPI packages.
107+
95108
----------------------
96109
How does Macaron work?
97110
----------------------

docs/source/pages/supported_technologies/index.rst

Lines changed: 4 additions & 0 deletions
@@ -79,6 +79,9 @@ Package Registries
7979
* - `npm Registry <https://registry.npmjs.org>`_
8080
- Projects built with npm or Yarn and published on the npm registry.
8181
- :doc:`page </pages/supported_technologies/npm_registry>`
82+
* - `Python Package Index (PyPI) <https://pypi.org/>`_
83+
- Projects built with pip or Poetry and published on the PyPI registry.
84+
- :doc:`page </pages/supported_technologies/pypi_registry>`
8285

8386
-----------
8487
Provenances
@@ -115,3 +118,4 @@ See also
115118
witness
116119
maven_central
117120
npm_registry
121+
pypi_registry
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
.. Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved.
2+
.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
3+
4+
===========================
5+
Python Package Index (PyPI)
6+
===========================
Lines changed: 243 additions & 0 deletions
@@ -0,0 +1,243 @@
1+
.. Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved.
2+
.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
3+
4+
.. _detect-malicious-package:
5+
6+
----------------------------
7+
Detecting malicious packages
8+
----------------------------
9+
10+
In this tutorial, we show how to use Macaron to find malicious packages. Imagine you’ve discovered a Python package you want to add as a dependency to your project, but you’re unsure whether you can trust its maintainers. In this case, you can run Macaron to see if it can detect any malicious behavior. Note that Macaron is an analysis tool: it can miss malicious behavior or report false positives.
11+
12+
.. list-table::
13+
:widths: 25
14+
:header-rows: 1
15+
16+
* - Supported packages
17+
* - Python packages (PyPI)
18+
19+
.. contents:: :local:
20+
21+
22+
**********
23+
Background
24+
**********
25+
26+
Detecting malicious behavior in open-source software has been a focus for the `Open Source Security Foundation <https://github.com/ossf>`_ (OpenSSF) community in recent years. One significant initiative is :term:`SLSA`, which offers practical recommendations to enhance the integrity of software packages and infrastructure. Macaron is designed to detect poorly maintained or malicious packages by implementing checks inspired by the SLSA specification. However, some forms of attack currently fall outside the scope of SLSA version 1; notably, SLSA doesn't address the issue of malicious maintainers. Our primary goal is to make it more difficult for malicious actors to compromise critical supply chains and infrastructure. To achieve this, we're developing new methods to detect when maintainers of open-source projects are untrustworthy and deliberately spreading malware.
27+
28+
******************************
29+
Installation and Prerequisites
30+
******************************
31+
32+
Skip this section if you already know how to install Macaron.
33+
34+
.. toggle::
35+
36+
Please follow the instructions :ref:`here <installation-guide>`. In summary, you need:
37+
38+
* Docker
39+
* the ``run_macaron.sh`` script to run the Macaron image.
40+
41+
.. note:: At the moment, Docker alternatives (e.g. podman) are not supported.
42+
43+
44+
You also need to provide Macaron with a GitHub token through the ``GITHUB_TOKEN`` environment variable.
45+
46+
To obtain a GitHub Token:
47+
48+
* Go to ``GitHub settings`` → ``Developer Settings`` (at the bottom of the left side pane) → ``Personal Access Tokens`` → ``Fine-grained personal access tokens`` → ``Generate new token``. Give your token a name and an expiry period.
49+
* Under ``"Repository access"``, choosing ``"Public Repositories (read-only)"`` should be good enough in most cases.
50+
51+
Now you should be good to run Macaron. For more details, see the documentation :ref:`here <prepare-github-token>`.
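If you script your Macaron runs, it can help to fail fast when the token is missing. The following is a minimal sketch; ``require_github_token`` is a hypothetical helper name and not part of Macaron.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Macaron): fail early if GITHUB_TOKEN is unset or empty.
require_github_token() {
  if [ -z "${GITHUB_TOKEN:-}" ]; then
    echo "GITHUB_TOKEN is not set; create a token as described above and export it." >&2
    return 1
  fi
}

# Example usage (replace the placeholder with your own token):
# export GITHUB_TOKEN="<your-token>"
# require_github_token && ./run_macaron.sh analyze -purl pkg:pypi/django@5.0.6 --skip-deps
```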
52+
53+
***********
54+
Run Macaron
55+
***********
56+
57+
In this tutorial, we run Macaron on the ``django`` Python package, first without and then with its dependencies, to check for malicious behavior, and we apply a policy that fails if the corresponding check fails.
58+
59+
60+
'''''''''''''''''''''''''''''''''''''
61+
Analyzing django without dependencies
62+
'''''''''''''''''''''''''''''''''''''
63+
64+
First, run Macaron's ``analyze`` command to execute a number of :ref:`checks <checks>` on the ``django`` package. In this tutorial, we are interested in the results of the ``mcn_detect_malicious_metadata_1`` check. See :ref:`this tutorial <include_exclude_checks>` if you would like to exclude the other checks.
65+
66+
.. code-block:: shell
67+
68+
./run_macaron.sh analyze -purl pkg:pypi/django@5.0.6 --skip-deps
69+
70+
.. note:: By default, Macaron clones the repositories and creates output files under the ``output`` directory. To understand the structure of this directory please see :ref:`Output Files Guide <output_files_guide>`.
71+
72+
.. code-block:: shell
73+
74+
open output/reports/pypi/django/django.html
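The ``open`` command is available on macOS; on most Linux desktops, ``xdg-open`` plays the same role. A minimal sketch of a portable helper, where ``open_report`` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: open a Macaron HTML report with whichever opener the OS provides.
open_report() {
  local report="$1"
  if [ ! -f "$report" ]; then
    echo "Report not found: $report" >&2
    return 1
  fi
  if [ "$(uname -s)" = "Darwin" ]; then
    open "$report"                      # macOS
  elif command -v xdg-open >/dev/null 2>&1; then
    xdg-open "$report"                  # most Linux desktops
  else
    echo "No opener found; open this file in a browser: $report"
  fi
}

# Example usage:
# open_report output/reports/pypi/django/django.html
```

The macOS check uses ``uname`` rather than ``command -v open`` because some Linux distributions ship an unrelated ``open`` command.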
75+
76+
.. _fig_django-malware-check:
77+
78+
.. figure:: ../../_static/images/tutorial_django_5.0.6_detect_malicious_metadata_check.png
79+
:alt: Check ``mcn_detect_malicious_metadata_1`` result for ``django@5.0.6``
80+
:align: center
81+
82+
The image above shows the result of the ``mcn_detect_malicious_metadata_1`` check for ``django@5.0.6``. The check has passed, which means Macaron did not detect malicious behavior in this package. If Macaron detects malicious behavior, the check fails; if the ecosystem is not supported, the check returns ``UNKNOWN``. The results of the individual heuristics applied in this check are shown under the ``Justification`` column.
83+
84+
Now we can write a policy to ensure that all versions of ``django`` pass the ``mcn_detect_malicious_metadata_1`` check. The policy will be enforced against the output of the ``analyze`` command that is cached in the local database at ``output/macaron.db``.
85+
86+
.. code-block:: shell
87+
88+
./run_macaron.sh verify-policy --database output/macaron.db --file policy.dl
89+
90+
where ``policy.dl`` contains the following policy:
91+
92+
.. code-block:: prolog
93+
94+
#include "prelude.dl"
95+
96+
Policy("check-django", component_id, "Check django artifacts.") :-
97+
check_passed(component_id, "mcn_detect_malicious_metadata_1").
98+
99+
100+
apply_policy_to("check-django", component_id) :-
101+
is_component(component_id, purl),
102+
match("pkg:pypi/django@.*", purl).
103+
104+
The ``match`` constraint in this policy allows us to apply the policy to all versions of ``django``. The result of this command should show that the policy succeeds with a zero exit code (if a policy fails, Macaron returns a non-zero exit code):
105+
106+
.. code-block:: javascript
107+
108+
passed_policies
109+
['check-django']
110+
component_satisfies_policy
111+
['1', 'pkg:pypi/django@5.0.6', 'check-django']
112+
failed_policies
113+
component_violates_policy
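Because the result is reported through the exit code, the policy check can gate a CI job. A minimal sketch that writes the policy shown above to a file and runs ``verify-policy`` with the same flags; the function names are hypothetical, and the sketch assumes ``run_macaron.sh`` is in the current directory and ``output/macaron.db`` was produced by the ``analyze`` step.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write the check-django policy from this tutorial to the given file.
write_django_policy() {
  cat > "$1" <<'EOF'
#include "prelude.dl"

Policy("check-django", component_id, "Check django artifacts.") :-
    check_passed(component_id, "mcn_detect_malicious_metadata_1").

apply_policy_to("check-django", component_id) :-
    is_component(component_id, purl),
    match("pkg:pypi/django@.*", purl).
EOF
}

# Hypothetical helper: run verify-policy and turn its exit code into a pass/fail message.
verify_django_policy() {
  local policy_file="$1"
  if ./run_macaron.sh verify-policy --database output/macaron.db --file "$policy_file"; then
    echo "Policy passed"
  else
    echo "Policy failed (non-zero exit code)" >&2
    return 1
  fi
}

# Example usage:
# write_django_policy policy.dl && verify_django_policy policy.dl
```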
114+
115+
Note that the ``match`` constraint applies a regex pattern and can be expanded to ensure the ``mcn_detect_malicious_metadata_1`` check passes on all Python packages analyzed so far by Macaron:
116+
117+
.. code-block:: prolog
118+
119+
apply_policy_to("check-django", component_id) :-
120+
is_component(component_id, purl),
121+
match("pkg:pypi.*", purl).
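To preview which package URLs a ``match`` pattern selects before writing a policy, you can approximate it in the shell. This sketch assumes that Soufflé's ``match`` constraint succeeds only when the whole string matches the regex, and emulates that by anchoring the pattern with ``^`` and ``$``. ``grep -E`` syntax may differ from Soufflé's regex dialect in corner cases, ``purl_matches`` is a hypothetical helper name, and the purls other than ``django@5.0.6`` are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: approximate a policy `match` constraint by testing whether
# the whole purl matches the regex (anchored with ^ and $).
purl_matches() {
  local pattern="$1" purl="$2"
  printf '%s\n' "$purl" | grep -Eq "^(${pattern})$"
}

for purl in pkg:pypi/django@5.0.6 pkg:pypi/djangorestframework@3.15.2 pkg:npm/express@4.19.2; do
  for pattern in 'pkg:pypi/django@.*' 'pkg:pypi.*'; do
    if purl_matches "$pattern" "$purl"; then
      echo "$pattern matches $purl"
    fi
  done
done
```

Only the ``django`` purl matches the first pattern, because ``django@`` must be followed directly by the version, while both PyPI purls match the broader second pattern.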
122+
123+
+++++++++++++++++++++++++++++++++++++++
124+
Verification Summary Attestation report
125+
+++++++++++++++++++++++++++++++++++++++
126+
127+
Additionally, Macaron generates a Verification Summary Attestation (:term:`VSA`) report that contains the policy and information about the analyzed artifact. See :ref:`this page <vsa>` for more details. For instance, the VSA report for the ``check-django`` policy shown above can be viewed by running this command:
128+
129+
.. toggle::
130+
131+
.. code-block:: shell
132+
133+
cat output/vsa.intoto.jsonl | jq -r '.payload' | base64 -d | jq
134+
135+
.. code-block:: json
136+
137+
{
138+
"_type": "https://in-toto.io/Statement/v1",
139+
"subject": [
140+
{
141+
"uri": "pkg:pypi/django@5.0.6"
142+
}
143+
],
144+
"predicateType": "https://slsa.dev/verification_summary/v1",
145+
"predicate": {
146+
"verifier": {
147+
"id": "https://github.com/oracle/macaron",
148+
"version": {
149+
"macaron": "0.11.0"
150+
}
151+
},
152+
"timeVerified": "2024-08-09T02:28:41.968492+00:00",
153+
"resourceUri": "pkg:pypi/django@5.0.6",
154+
"policy": {
155+
"content": " #include \"prelude.dl\"\n\n Policy(\"check-django\", component_id, \"Check django artifacts.\") :-\n check_passed(component_id, \"mcn_detect_malicious_metadata_1\").\n\n\n apply_policy_to(\"check-django\", component_id) :-\n is_component(component_id, purl),\n match(\"pkg:pypi/django@.*\", purl)."
156+
},
157+
"verificationResult": "PASSED",
158+
"verifiedLevels": []
159+
}
160+
}
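To use the VSA in a script, for example to gate a release, you can extract just the ``verificationResult`` field. A minimal sketch that reuses the ``jq`` and ``base64`` pipeline above; ``vsa_result`` is a hypothetical helper name, and the sketch assumes the file holds a single envelope, as in this tutorial.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the verificationResult of the statement in a VSA file.
# The envelope's "payload" field holds the base64-encoded in-toto statement.
vsa_result() {
  local vsa_file="${1:-output/vsa.intoto.jsonl}"
  jq -r '.payload' "$vsa_file" | base64 -d | jq -r '.predicate.verificationResult'
}

# Example usage: stop if the VSA does not report PASSED.
# [ "$(vsa_result)" = "PASSED" ] || { echo "VSA did not pass" >&2; exit 1; }
```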
161+
162+
.. _django_with_deps:
163+
164+
''''''''''''''''''''''''''''''''''
165+
Analyzing django with dependencies
166+
''''''''''''''''''''''''''''''''''
167+
168+
Macaron supports analyzing a package's dependencies and performs the same set of checks on them as it does on the main target package. To analyze the dependencies of the ``django@5.0.6`` Python package, you can either :ref:`generate an SBOM <python-sbom>` yourself or :ref:`point Macaron to a virtual environment <python-venv-deps>` where ``django`` is installed.
169+
170+
171+
Let's assume ``/tmp/.django_venv`` is the virtual environment where ``django@5.0.6`` is installed. Run Macaron as follows to analyze ``django`` and its dependencies.
172+
173+
.. code-block:: shell
174+
175+
./run_macaron.sh analyze -purl pkg:pypi/django@5.0.6 --python-venv "/tmp/.django_venv"
176+
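If ``/tmp/.django_venv`` does not exist yet, one way to create it is with Python's built-in ``venv`` module. A minimal sketch, assuming ``python3`` with the ``venv`` module and network access to PyPI are available; ``make_django_venv`` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a virtual environment and install a pinned django version into it.
make_django_venv() {
  local venv_dir="${1:-/tmp/.django_venv}"
  local version="${2:-5.0.6}"
  python3 -m venv "$venv_dir" || return 1
  # Use the venv's own interpreter so pip installs into the environment, not system-wide.
  "$venv_dir/bin/python" -m pip install "django==${version}"
}

# Example usage:
# make_django_venv /tmp/.django_venv 5.0.6
```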
177+
178+
By default, Macaron only analyzes direct dependencies. To turn on recursive dependency analysis, add the following to the ``configurations.ini`` file:
179+
180+
.. code-block:: ini
181+
182+
[dependency.resolver]
183+
recursive = True
184+
185+
Then pass this file to Macaron using the ``--defaults-path`` option:
186+
187+
.. code-block:: shell
188+
189+
./run_macaron.sh --defaults-path configurations.ini analyze -purl pkg:pypi/django@5.0.6 --python-venv "/tmp/.django_venv"
190+
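The configuration file and the command above can be combined in a small script. A minimal sketch, assuming ``run_macaron.sh`` is in the current directory; ``analyze_django_recursive`` is a hypothetical helper name, and the flags are the ones shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: enable recursive dependency resolution and analyze django
# together with the dependencies installed in the given virtual environment.
# Note: this overwrites any existing configurations.ini in the current directory.
analyze_django_recursive() {
  local venv_dir="${1:-/tmp/.django_venv}"
  cat > configurations.ini <<'EOF'
[dependency.resolver]
recursive = True
EOF
  ./run_macaron.sh --defaults-path configurations.ini \
    analyze -purl pkg:pypi/django@5.0.6 --python-venv "$venv_dir"
}

# Example usage:
# analyze_django_recursive /tmp/.django_venv
```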
191+
To learn more about changing configurations, see :ref:`here <change-config>`.
192+
193+
Now we can enforce the policy below to ensure that the ``mcn_detect_malicious_metadata_1`` check passes on ``django`` and all of its dependencies, i.e., that Macaron detects no malicious behavior in any of them.
194+
195+
.. code-block:: prolog
196+
197+
#include "prelude.dl"
198+
199+
Policy("check-dependencies", component_id, "Check the dependencies of django.") :-
200+
transitive_dependency(component_id, dependency),
201+
check_passed(component_id, "mcn_detect_malicious_metadata_1"),
202+
check_passed(dependency, "mcn_detect_malicious_metadata_1").
203+
204+
apply_policy_to("check-dependencies", component_id) :-
205+
is_component(component_id, purl),
206+
match("pkg:pypi/django@.*", purl).
207+
208+
As you can see below, the policy passes because Macaron doesn't detect malicious behavior for ``django`` or any of its transitive dependencies.
209+
210+
.. code-block:: javascript
211+
212+
passed_policies
213+
['check-dependencies']
214+
component_satisfies_policy
215+
['1', 'pkg:pypi/django@5.0.6', 'check-dependencies']
216+
failed_policies
217+
component_violates_policy
218+
219+
''''''''''''''''''''''''''''''''''''''''
220+
Require a confidence level in the policy
221+
''''''''''''''''''''''''''''''''''''''''
222+
223+
Macaron also provides a confidence score for each check result, represented as a value ranging from ``0`` to ``1`` (inclusive). You can incorporate this score into your policy to ensure checks meet a required level of confidence. Currently, Macaron :class:`has these confidence levels <macaron.slsa_analyzer.checks.check_result.Confidence>`. For instance, you might adjust the :ref:`check-dependencies policy shown earlier <django_with_deps>` to require that the ``mcn_detect_malicious_metadata_1`` check passes with a high confidence, i.e., ``1``:
224+
225+
.. code-block:: prolog
226+
227+
#include "prelude.dl"
228+
229+
Policy("check-dependencies", component_id, "Check the dependencies of django with high confidence.") :-
230+
transitive_dependency(component_id, dependency),
231+
check_passed_with_confidence(component_id, "mcn_detect_malicious_metadata_1", confidence),
232+
check_passed_with_confidence(dependency, "mcn_detect_malicious_metadata_1", confidence),
233+
confidence = 1.
234+
235+
apply_policy_to("check-dependencies", component_id) :-
236+
is_component(component_id, purl),
237+
match("pkg:pypi/django@.*", purl).
238+
239+
***********
240+
Future Work
241+
***********
242+
243+
We are actively working on the malware detection check in Macaron to improve its precision, support more ecosystems, and, in particular, perform more advanced source code analysis. Stay tuned, and feel free to contribute to improving this check.

docs/source/pages/tutorials/exclude_include_checks.rst

Lines changed: 2 additions & 0 deletions
@@ -1,6 +1,8 @@
11
.. Copyright (c) 2024 - 2024, Oracle and/or its affiliates. All rights reserved.
22
.. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/.
33
4+
.. _include_exclude_checks:
5+
46
=====================================
57
Exclude and include checks in Macaron
68
=====================================

docs/source/pages/tutorials/index.rst

Lines changed: 4 additions & 3 deletions
@@ -17,8 +17,9 @@ For the full list of supported technologies, such as CI services, registries, an
1717
.. toctree::
1818
:maxdepth: 1
1919

20-
detect_malicious_java_dep
2120
commit_finder
22-
exclude_include_checks
23-
generate_verification_summary_attestation
21+
detect_malicious_package
2422
npm_provenance
23+
detect_malicious_java_dep
24+
generate_verification_summary_attestation
25+
exclude_include_checks
