feat: add in new metadata-based heuristic to pypi malware analyzer #944
5537 existing PyPI packages with a single release were analyzed for their epoch, major, minor, and micro version numbers. All of these packages used an epoch of 0 (the implied default, since none supplied one explicitly). The data for the major, minor, and micro version numbers included too many anomalies to plot nicely. From the data, the following changes will be made:
There were some anomalies to these rules (e.g. |
I am going to request review on this, as I don't believe an integration test is appropriate for this feature. There is no guarantee that any given package will always have exactly one release, so if the package used in the test publishes a new version, the test no longer makes sense.
LGTM. Thanks!
This PR adds a new heuristic to the PyPI malware analyzer that focuses on identifying anomalistic version numbers. A version number is classified as anomalistic when it is unrealistically high for a single release. This heuristic depends on the single-release heuristic, and currently implements thresholds for determining suspicious version numbers on the epoch, major, and minor version components. Version numbers must adhere to the PyPI standard (PEP 440, as per the `packaging` module); otherwise they will not be analysed.
This heuristic attempts to identify the versioning pattern to reduce false positives. Calendar versioning is defined as a versioning pattern where the major is in the form YYYY or YY, the minor is in the form MM or M, and the micro is in the form DD or D, with no other release components (i.e. in the form YYYY.MM.DD only). To be classified as calendar versioning, these values must correspond to the upload time, within a 48-hour window for time differences.
Calendar-semantic versioning is defined as a versioning pattern where the major is in the form YYYY or YY, but the remaining components are not detected as calendar versioning. All other versioning patterns are classed as semantic versioning.
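The three patterns described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: `VersioningPattern`, `classify_release`, and `WINDOW` are hypothetical names. It also makes two assumptions the description does not spell out: a YY or YYYY major must match a year that falls inside the 48-hour window around the upload time, and the encoded date is compared at midnight in the upload time's timezone. The release tuple is taken as input; in practice it would likely come from `packaging.version.Version(...).release`.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum


class VersioningPattern(Enum):
    """Hypothetical labels for the three patterns named in the description."""

    CALENDAR = "calendar"
    CALENDAR_SEMANTIC = "calendar-semantic"
    SEMANTIC = "semantic"


# 48-hour tolerance between the date encoded in the version and the upload time.
WINDOW = timedelta(hours=48)


def classify_release(
    release: tuple[int, ...],
    upload_time: datetime,
    window: timedelta = WINDOW,
) -> VersioningPattern:
    """Classify a PEP 440 release tuple, e.g. (2024, 5, 17), against its upload time."""
    if not release:
        return VersioningPattern.SEMANTIC

    major = release[0]
    # Assumption: the major counts as YYYY/YY only if it matches a year that
    # lies within the window around the upload time (covers New Year uploads).
    years = {
        (upload_time - window).year,
        upload_time.year,
        (upload_time + window).year,
    }
    matching_years = [y for y in sorted(years) if major in (y, y % 100)]
    if not matching_years:
        return VersioningPattern.SEMANTIC

    # Calendar versioning requires exactly YYYY.MM.DD (or YY.M.D) and nothing else.
    if len(release) == 3:
        for year in matching_years:
            try:
                encoded = datetime(
                    year, release[1], release[2], tzinfo=upload_time.tzinfo
                )
            except ValueError:
                # Minor/micro do not form a valid month/day.
                continue
            if abs(upload_time - encoded) <= window:
                return VersioningPattern.CALENDAR

    # The major looks like a year, but the rest is not a matching date.
    return VersioningPattern.CALENDAR_SEMANTIC


if __name__ == "__main__":
    upload = datetime(2024, 5, 17, 12, 0, tzinfo=timezone.utc)
    for rel in [(2024, 5, 17), (24, 5, 17), (2024, 3), (1, 0, 0)]:
        print(rel, classify_release(rel, upload).value)
```

Classifying the pattern first means that thresholds on the major component can be applied per pattern, so that a year-like major such as 2024 is not flagged as unrealistically high on its own.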
Outstanding tasks for this PR:
defaults.ini