Draft
82 commits
d9fd5f9
Differentiate between gitlab and github instances
cdboer Nov 19, 2022
3ecf13c
Restructure getting started section
cdboer Nov 19, 2022
2865e5e
Add PyGithub to dependencies
cdboer Dec 3, 2022
b49eae1
Add github specific dataclasses
cdboer Dec 4, 2022
beb5f9d
Add fetcher for github resources
cdboer Dec 4, 2022
e4ca20c
Change login behaviour for gitlab fetcher
cdboer Dec 4, 2022
af35224
Choose the correct fetcher when handling urls
cdboer Dec 4, 2022
5c58bb2
Add constants for github specific content
cdboer Dec 4, 2022
1f4a8ca
Allow models to handle gitlab resources
cdboer Dec 4, 2022
8cea2ce
Add GithubFetcher to toplevel package objects
cdboer Dec 4, 2022
851564e
Rename tokens.md to gitlab-token.md
cdboer Dec 4, 2022
18b6c02
Add token guide for github
cdboer Dec 4, 2022
cb3f0e0
Fix typo
cdboer Dec 4, 2022
525b9f2
Update fetcher implementation
cdboer Jan 9, 2023
fd938c2
Add fetcher factory to choose the correct fetcher based on a project url
cdboer Jan 9, 2023
634784a
Update domain object implementation
cdboer Jan 9, 2023
2be0f6c
Update constants
cdboer Jan 9, 2023
ed20caa
Annotation parsing should use the new domain objects
cdboer Jan 9, 2023
300c8bf
Add validation exception handling
cdboer Jan 9, 2023
60fb4d9
Add 'additionalProperties: false' to command properties to allow only…
cdboer Jan 9, 2023
a9b1b9e
Update handlers
cdboer Jan 9, 2023
277b7eb
Add new model implementation
cdboer Jan 9, 2023
ae97b03
operations.combine should use a sequence of arguments instead of an i…
cdboer Jan 9, 2023
bce781f
Inject the fetcher_factory instead of github/gitlab fetchers into han…
cdboer Jan 9, 2023
ccfcf68
Format with black & adjust validation to the new parser validation me…
cdboer Jan 9, 2023
61df234
Add subpackage for git fetching
cdboer Jan 29, 2023
7650d0c
Add subpackage for gitlab fetching
cdboer Jan 29, 2023
ab259d6
Add subpackage for github fetching
cdboer Jan 29, 2023
c0d23ad
Add class that handles everything related to project/clone urls
cdboer Jan 29, 2023
4651d73
Remove fetch subpackage as it has been replaced by 'git', 'hub', 'lab…
cdboer Jan 29, 2023
a2d28da
Remove 'Abstract' from 'AbstractRepository'
cdboer Jan 29, 2023
49b04ad
Replace config/parser.py by config/config.py
cdboer Jan 29, 2023
1ae70a7
Update package exports
cdboer Jan 29, 2023
14cf614
Fix schema by using 'oneOf' for array items (fix #78)
cdboer Jan 29, 2023
b064357
Add new commands
cdboer Jan 29, 2023
2ef9f8e
Add timestamps to AnnotatedVersions
cdboer Jan 29, 2023
f9f18bb
Not every tag has to have a matching commit
cdboer Jan 29, 2023
ff61032
Update provenance operations
cdboer Jan 29, 2023
d1202d7
Remove 'Abstract' from 'AbstractUnitOfWork'
cdboer Jan 29, 2023
348107a
Update 'UnitOfWork' import
cdboer Jan 29, 2023
8c3ad8b
Add new handlers, update existing ones
cdboer Jan 29, 2023
19996b1
Update bootstrapping to include the platform from which to fetch the …
cdboer Jan 29, 2023
39d06d4
Add seperate cli for github2prov (#79)
cdboer Jan 29, 2023
a90560e
Add github2prov as seperate script (#79)
cdboer Jan 29, 2023
243e9b8
Remap status names to the ones used in the internal data structures
cdboer Feb 26, 2023
b4074e7
Spelling fix
cdboer Feb 26, 2023
76596e1
Update commands
cdboer Feb 26, 2023
1046d44
Update file reading/writing operations
cdboer Feb 26, 2023
59f3d87
Update handlers to use correct commands
cdboer Feb 26, 2023
dbd614e
Regrouped commands in 'transform' command (#83)
cdboer Feb 26, 2023
796c5e0
Fix read/write option spelling
cdboer Feb 26, 2023
b4c744d
Update example configuration file
cdboer Feb 26, 2023
55de7a9
Fix 'stats' spelling
cdboer Feb 26, 2023
0baac8e
Add transform command to schema
cdboer Feb 26, 2023
b243965
Update README.md
cdboer Feb 26, 2023
f4c5439
Rewrite model explanations (#81)
cdboer Apr 17, 2023
e2ed0ea
Remove old test suite
cdboer Apr 17, 2023
52f420d
Update attribute tables
cdboer May 22, 2023
c6c8498
Add test cases for repository implementation
cdboer May 22, 2023
6678135
Add test cases for file objects
cdboer May 22, 2023
9c21dfe
Add test cases for user objects
cdboer May 22, 2023
dbf2a98
Add script to generate randomized objects
cdboer May 22, 2023
12d7d87
Rename random_generation to conftest.py
cdboer May 22, 2023
b37fba7
Use pytest fixtures to generate random objects
cdboer May 22, 2023
35f1c41
Add fixtures to conftest.py
cdboer May 22, 2023
d69bba2
Rename 'formatter' key to 'format' in schema definition
cdboer Aug 27, 2023
564d0d5
Add 'click' dependency, rename CLI entrypoints, and include 'schema.j…
cdboer Aug 27, 2023
0db55bb
Refactor ProvenanceContext and models to improve attribute usage and …
cdboer Aug 27, 2023
0638a9e
Refactor domain object attributes for clarity
cdboer Aug 27, 2023
cfd9b48
Enhance CLI commands and streamline configuration handling
cdboer Aug 27, 2023
93e5cfa
Refactor ProjectUrl and improve URL parsing
cdboer Aug 27, 2023
768ff9d
Update Git fetcher with additional commit and revision attributes
cdboer Aug 27, 2023
bfc8500
Refactor GithubAnnotationParser and Enhance Annotation Validation
cdboer Aug 27, 2023
f5c9218
Update Annotation Property Name in GitlabAnnotationParser
cdboer Aug 27, 2023
35dd383
Refactor and Enhance Provenance Operations
cdboer Aug 27, 2023
6a86b1b
Merge master into 77-github2prov-prototype-implementation
cdboer Aug 27, 2023
01c28b2
Update config file example
cdboer Aug 28, 2023
f93d403
Update README.md
cdboer Aug 28, 2023
66c591f
Update README title
cdboer Aug 28, 2023
f87a18f
Update README title
cdboer Aug 28, 2023
d28ee4d
Update README title
cdboer Aug 28, 2023
56818ca
Update node attribute tables
cdboer Aug 28, 2023
112 changes: 72 additions & 40 deletions README.md
@@ -1,4 +1,4 @@
<h1 align="center">Welcome to <code>gitlab2prov</code>! 👋</h1>
<h1 align="center"> <code>gitlab2prov</code>, <code>github2prov</code>: (🦊|🐈‍⬛) → 📄 </h1>
<p align="center">
<a href="https://github.com/dlr-sc/gitlab2prov/blob/master/LICENSE">
<img alt="License: MIT" src="https://img.shields.io/badge/license-MIT-yellow.svg" target="_blank" />
@@ -30,12 +30,12 @@
</p>


> `gitlab2prov` is a Python library and command line tool that extracts provenance information from GitLab projects.
> `gitlab2prov` is a Python library and command line tool that extracts provenance information from GitLab projects. GitHub support is provided by the `github2prov` command line tool contained in this package.

---

The `gitlab2prov` data model has been designed according to [W3C PROV](https://www.w3.org/TR/prov-overview/) specification.
The model documentation can be found [here](https://github.com/DLR-SC/gitlab2prov/tree/master/docs).
The data model underlying `gitlab2prov` & `github2prov` has been designed according to the [W3C PROV](https://www.w3.org/TR/prov-overview/) specification.
The model documentation can be found [here](/docs/README.md).

## ️🏗️ ️Installation

@@ -57,14 +57,27 @@ pip install .[dev] # clone repo, install with extras
pip install gitlab2prov[dev] # PyPi, install with extras
```

That's it! You can now use `gitlab2prov` and `github2prov` from the command line.

```bash
gitlab2prov --version # show version
github2prov --version # show version
```


## ⚡ Getting started

`gitlab2prov` needs a [personal access token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html) to clone git repositories and to authenticate with the GitLab API.
Follow [this guide](./docs/guides/tokens.md) to create an access token with the required [scopes](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#personal-access-token-scopes).
`gitlab2prov` & `github2prov` require a [personal access token](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html) to clone git repositories and to authenticate with the GitLab/GitHub API.

Use the following guides to obtain a token with the required [scopes](https://docs.gitlab.com/ee/user/profile/personal_access_tokens.html#personal-access-token-scopes) for yourself:
- [Create a personal access token (GitLab)](./docs/guides/gitlab-token.md)
- [Create a personal access token (GitHub)](./docs/guides/github-token.md)


## 🚀‍ Usage

The usage of `gitlab2prov` and `github2prov` is identical. The only difference is that `github2prov` supports only GitHub projects, whereas `gitlab2prov` supports only GitLab projects. The following examples use `gitlab2prov`.

`gitlab2prov` can be configured using the command line interface or by providing a configuration file in `.yaml` format.

### Command Line Usage
@@ -83,66 +96,85 @@ Options:
--help Show this message and exit.

Commands:
combine Combine multiple graphs into one.
extract Extract provenance information for one or more...
load Load provenance files.
merge-duplicated-agents Merge duplicated agents based on a name to...
pseudonymize Pseudonymize a provenance graph.
save Save provenance information to a file.
stats Print statistics such as node counts and...
combine Combine one or more provenance documents.
extract Extract provenance information for one or more gitlab projects.
read Read provenance information from file[s].
stats Print statistics for one or more provenance documents.
transform Apply a set of transformations to provenance documents.
write Write provenance information to file[s].
```

### Configuration Files
`gitlab2prov` supports configuration files in `.yaml` format that are functionally equivalent to command line invocations.

To read configuration details from a file instead of specifying on the command line, use the `--config` option:
To invoke a run using a config file, use the `--config` option:
```ini
# initiate a run using a config file
# run gitlab2prov using the config file 'config/example.yaml'
gitlab2prov --config config/example.yaml
```
You can validate your config file using the provided JSON-Schema `gitlab2prov/config/schema.json` that comes packaged with every installation:
You can validate your config file using the provided [JSON Schema file](gitlab2prov/config/schema.json) that comes packaged with every installation:
```ini
# check config file for syntactical errors
# validate config file 'config/example.yaml' against the JSON Schema
gitlab2prov --validate config/example.yaml
```
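
To illustrate the kind of structural mistakes a validation step can catch, here is a minimal hand-written check in Python. It is a simplified stand-in and not the packaged JSON Schema: the helper name `check_config_steps` is hypothetical, the set of allowed command names is copied from the CLI help output shown above, and the config is assumed to have already been parsed into a list of single-key mappings.

```python
from __future__ import annotations

from typing import Any

# Command names as listed in the CLI help output; assumed to match the config keys.
KNOWN_COMMANDS = {"combine", "extract", "read", "stats", "transform", "write"}


def check_config_steps(steps: Any) -> list[str]:
    """Return human-readable problems; an empty list means none were found.

    Hypothetical helper for illustration only, not part of gitlab2prov.
    """
    if not isinstance(steps, list):
        return ["top level must be a list of steps"]

    problems: list[str] = []
    for index, step in enumerate(steps):
        # Every step is expected to look like {"<command>": {<options>}}.
        if not isinstance(step, dict) or len(step) != 1:
            problems.append(f"step {index}: expected a mapping with exactly one command")
            continue
        command, options = next(iter(step.items()))
        if command not in KNOWN_COMMANDS:
            problems.append(f"step {index}: unknown command {command!r}")
        elif options is not None and not isinstance(options, dict):
            # A bare command such as "combine:" parses to None and is accepted.
            problems.append(f"step {index}: options of {command!r} must be a mapping")
    return problems
```

A step whose key does not appear in the command list would be reported as unknown, while a bare step without options is accepted. Option names and value types are not checked here; that level of detail is what the JSON Schema file is for.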

Config file example:
Here is an example config file that extracts provenance information from three GitLab projects, reads a serialized provenance document from a file, combines the resulting provenance documents, transforms the combined document and writes it to files in different formats. Finally, statistics about the generated output are printed to the console:

```yaml
- extract:
url: ["https://gitlab.com/example/foo"]
token: tokenA
url:
- "https://gitlab.com/aristotle/nicomachean-ethics"
- "https://gitlab.com/aristotle/poetics"
token: golden_mean_and_drama_token
- extract:
url: ["https://gitlab.com/example/bar"]
token: tokenB
- load:
input: [example.rdf]
- pseudonymize:
url:
- "https://gitlab.com/plato/the-republic"
- "https://gitlab.com/plato/phaedrus"
token: ideal_forms_and_speech_token
- extract:
url: ["https://gitlab.com/socrates/apology"]
token: know_thyself_token
- read:
input: [aristotelian_logic.rdf]
- combine:
- save:
output: combined
format: [json, rdf, xml, dot]
- transform:
use_pseudonyms: true
remove_duplicates: true
- write:
output: philosopher_outputs
format: [json, rdf, xml, dot]
- stats:
fine: true
explain: true
formatter: table
fine: true
explain: true
format: table
```

The config file example is functionally equivalent to this command line invocation:

```
gitlab2prov extract -u https://gitlab.com/example/foo -t tokenFoo \
extract -u https://gitlab.com/example/bar -t tokenBar \
load -i example.rdf \
pseudonymize \
combine \
save -o combined -f json -f rdf -f xml -f dot \
stats --fine --explain --formatter table
gitlab2prov \
extract \
--url https://gitlab.com/aristotle/nicomachean-ethics \
--url https://gitlab.com/aristotle/poetics \
--token golden_mean_and_drama_token \
extract \
--url https://gitlab.com/plato/the-republic \
--url https://gitlab.com/plato/phaedrus \
--token ideal_forms_and_speech_token \
extract \
--url https://gitlab.com/socrates/apology --token know_thyself_token \
read --input aristotelian_logic.rdf \
combine \
transform --use_pseudonyms --remove_duplicates \
write --output philosopher_outputs \
--format json --format rdf --format xml --format dot \
stats --fine --explain --format table

```
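
The correspondence between the two forms can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how `gitlab2prov` processes its config: the function name `config_to_argv` is hypothetical, and the rule that every config key `key` maps to a long option `--key` is inferred from the example above rather than taken from the CLI definition.

```python
from __future__ import annotations

import shlex
from typing import Any


def config_to_argv(steps: list[dict[str, Any]]) -> list[str]:
    """Flatten a parsed config (a list of single-key mappings) into CLI arguments.

    Hypothetical helper; assumes each config key corresponds to a '--key' option.
    """
    argv: list[str] = []
    for step in steps:
        for command, options in step.items():
            argv.append(command)
            # A bare command such as "combine:" parses to None and takes no options.
            for key, value in (options or {}).items():
                flag = f"--{key}"
                if value is True:
                    argv.append(flag)  # boolean switch, e.g. --fine
                elif value is False or value is None:
                    continue  # switch left unset
                elif isinstance(value, (list, tuple)):
                    for item in value:  # repeated option, e.g. --url a --url b
                        argv.extend([flag, str(item)])
                else:
                    argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    steps = [
        {"extract": {"url": ["https://gitlab.com/socrates/apology"], "token": "know_thyself_token"}},
        {"combine": None},
        {"stats": {"fine": True, "explain": True, "format": "table"}},
    ]
    print("gitlab2prov " + shlex.join(config_to_argv(steps)))
```

List values become repeated options and `true` values become bare switches, which mirrors how the `url` and `format` lists and the `fine`/`explain` switches appear in the invocation above.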

### 🎨 Provenance Output Formats

`gitlab2prov` supports output formats that the [`prov`](https://github.com/trungdong/prov) library provides:
`gitlab2prov` & `github2prov` support all output formats that the [`prov`](https://github.com/trungdong/prov) library provides:
* [PROV-N](http://www.w3.org/TR/prov-n/)
* [PROV-O](http://www.w3.org/TR/prov-o/) (RDF)
* [PROV-XML](http://www.w3.org/TR/prov-xml/)
@@ -201,7 +233,7 @@ You can also cite specific releases published on Zenodo: [![DOI](https://zenodo.
`gitlab2prov` depends on several open source packages that are made freely available under their respective licenses.

| Package | License |
| --------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------- |
|-----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------|
| [GitPython](https://github.com/gitpython-developers/GitPython) | [![License](https://img.shields.io/badge/License-BSD_3--Clause-orange.svg)](https://opensource.org/licenses/BSD-3-Clause) |
| [click](https://github.com/pallets/click) | [![License](https://img.shields.io/badge/License-BSD_3--Clause-orange.svg)](https://opensource.org/licenses/BSD-3-Clause) |
| [python-gitlab](https://github.com/python-gitlab/python-gitlab) | [![License: LGPL v3](https://img.shields.io/badge/License-LGPL_v3-blue.svg)](https://www.gnu.org/licenses/lgpl-3.0) |
35 changes: 22 additions & 13 deletions config/example.yaml
@@ -1,18 +1,27 @@
# yaml-language-server: $schema=../gitlab2prov/config/schema.json
- extract:
url: ["https://gitlab.com/example/foo"]
token: tokenFoo
url:
- "https://gitlab.com/aristotle/nicomachean-ethics"
- "https://gitlab.com/aristotle/poetics"
token: golden_mean_and_drama_token
- extract:
url: ["https://gitlab.com/example/bar"]
token: tokenBar
- load:
input: [example.rdf]
- pseudonymize:
url:
- "https://gitlab.com/plato/the-republic"
- "https://gitlab.com/plato/phaedrus"
token: ideal_forms_and_speech_token
- extract:
url: ["https://gitlab.com/socrates/apology"]
token: know_thyself_token
- read:
input: [aristotelian_logic.rdf]
- combine:
- save:
output: combined
format: [json, rdf, xml, dot]
- transform:
use_pseudonyms: true
remove_duplicates: true
- write:
output: philosopher_outputs
format: [json, rdf, xml, dot]
- stats:
fine: true
explain: true
formatter: table
fine: true
explain: true
format: table