Commit f8d397e

refactor: centralize PAT validation, streamline repo checks & misc cleanup (#349)
* refactor: centralize PAT validation, streamline repo checks & housekeeping

* `.venv*` to `.gitignore`
* `# type: ignore[attr-defined]` hints in `compat_typing.py` for IDE-agnostic imports
* Helpful PAT string in `InvalidGitHubTokenError` for easier debugging
* Bump **ruff-pre-commit** hook → `v0.12.1`
* CONTRIBUTING:
  * Require **Python 3.9+**
  * Recommend signed (`-S`) commits
* PAT validation now happens **only** in entry points (`utils.auth.resolve_token` for CLI/lib, `server.process_query` for Web UI)
* Unified `_check_github_repo_exists` into `check_repo_exists`, replacing `curl -I` with `curl --silent --location --write-out %{http_code} -o /dev/null`
* Broaden `_GITHUB_PAT_PATTERN`
* `create_git_auth_header` raises `ValueError` when hostname is missing
* Tests updated to expect raw HTTP-code output
* Superfluous “token can be set via `GITHUB_TOKEN`” notes in docstrings
* `.gitingestignore` & `.terraform` from `DEFAULT_IGNORE_PATTERNS`
* Token validation inside `create_git_command`
* Obsolete `test_create_git_command_invalid_token`
* Adjust `test_clone.py` and `test_git_utils.py` for new status-code handling
* Consolidate mocks after token-validation relocation

BREAKING CHANGE: `create_git_command` no longer validates GitHub tokens; callers must ensure tokens are valid (via `validate_github_token`) before invoking lower-level git helpers.

---------

Co-authored-by: Copilot <[email protected]>
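The switch from `curl -I` to `--write-out %{http_code}` means the caller now reads a bare status code from stdout instead of parsing response headers. A minimal sketch of that idea follows. Only the curl flags come from the commit message; the helper names and the status-code mapping are assumptions for illustration, not the project's `check_repo_exists` implementation.

```python
from typing import List


def build_curl_status_command(url: str) -> List[str]:
    """Return a curl invocation that prints only the final HTTP status code.

    Hypothetical helper; the flags are the ones quoted in the commit message.
    """
    return [
        "curl",
        "--silent",
        "--location",  # follow redirects so the code reflects the final response
        "--write-out",
        "%{http_code}",
        "-o",
        "/dev/null",  # discard the response body (Unix-like systems)
        url,
    ]


def repo_exists_from_status(raw_stdout: bytes) -> bool:
    """Interpret curl's raw status-code output.

    The mapping below is an assumption, not taken from the diff.
    """
    code = int(raw_stdout.decode().strip())
    if code == 200:
        return True
    if code in (401, 403, 404):
        return False
    raise RuntimeError(f"Unexpected HTTP status: {code}")
```

Keeping the interpretation step separate from the subprocess call makes it easy to test against raw output such as `b"200"`, which matches the commit's note that tests now expect raw HTTP-code output.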
1 parent 2592303 commit f8d397e

14 files changed: +109 -143 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -126,6 +126,7 @@ celerybeat.pid
 # Environments
 .env
 .venv
+.venv*
 env/
 venv/
 ENV/

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ repos:
         args: ["--disable=line-length"]
 
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.12.0
+    rev: v0.12.1
     hooks:
       - id: ruff-check
       - id: ruff-format

CONTRIBUTING.md

Lines changed: 13 additions & 3 deletions
@@ -19,6 +19,8 @@ Thanks for your interest in contributing to Gitingest! 🚀 Gitingest aims to be
    cd gitingest
    ```
 
+   **Note**: To contribute, ensure you have **Python 3.9 or newer** installed, as some of the `pre-commit` hooks (e.g. `pyupgrade`) require Python 3.9+.
+
 3. Set up the development environment and install dependencies:
 
    ```bash
@@ -31,7 +33,7 @@ Thanks for your interest in contributing to Gitingest! 🚀 Gitingest aims to be
 4. Create a new branch for your changes:
 
    ```bash
-   git checkout -b your-branch
+   git checkout -S -b your-branch
    ```
 
 5. Make your changes. Make sure to add corresponding tests for your changes.
@@ -66,10 +68,18 @@ Thanks for your interest in contributing to Gitingest! 🚀 Gitingest aims to be
 
 9. Confirm that everything is working as expected. If you encounter any issues, fix them and repeat steps 6 to 8.
 
-10. Commit your changes:
+10. Commit your changes (signed):
+
+    All commits to Gitingest must be [GPG-signed](https://docs.github.com/en/authentication/managing-commit-signature-verification) so that the project can verify the authorship of every contribution. You can either configure Git globally with:
+
+    ```bash
+    git config --global commit.gpgSign true
+    ```
+
+    or pass the `-S` flag as shown below.
 
     ```bash
-    git commit -m "Your commit message"
+    git commit -S -m "Your commit message"
     ```
 
     If `pre-commit` raises any issues, fix them and repeat steps 6 to 9.

src/gitingest/clone.py

Lines changed: 1 addition & 7 deletions
@@ -13,7 +13,6 @@
     ensure_git_installed,
     is_github_host,
     run_command,
-    validate_github_token,
 )
 from gitingest.utils.os_utils import ensure_directory
 from gitingest.utils.timeout_wrapper import async_timeout
@@ -23,7 +22,7 @@
 
 
 @async_timeout(DEFAULT_TIMEOUT)
-async def clone_repo(config: CloneConfig, token: str | None = None) -> None:
+async def clone_repo(config: CloneConfig, *, token: str | None = None) -> None:
     """Clone a repository to a local path based on the provided configuration.
 
     This function handles the process of cloning a Git repository to the local file system.
@@ -36,7 +35,6 @@ async def clone_repo(config: CloneConfig, token: str | None = None) -> None:
         The configuration for cloning the repository.
     token : str | None
         GitHub personal access token (PAT) for accessing private repositories.
-        Can also be set via the ``GITHUB_TOKEN`` environment variable.
 
     Raises
     ------
@@ -51,10 +49,6 @@ async def clone_repo(config: CloneConfig, token: str | None = None) -> None:
     branch: str | None = config.branch
     partial_clone: bool = config.subpath != "/"
 
-    # Validate token if provided
-    if token and is_github_host(url):
-        validate_github_token(token)
-
     # Create parent directory if it doesn't exist
     await ensure_directory(Path(local_path).parent)
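The bare `*` in the new `clone_repo` signature makes `token` keyword-only, so existing callers that passed it positionally will fail. The sketch below shows the effect with a hypothetical stand-in coroutine (`clone_repo_stub`, taking a plain `dict` instead of `CloneConfig`); it does not reproduce the real function body.

```python
import asyncio
from typing import Optional


async def clone_repo_stub(config: dict, *, token: Optional[str] = None) -> str:
    """Stand-in with the same parameter shape as the new ``clone_repo`` signature."""
    return "authenticated" if token else "anonymous"


# Keyword form is accepted.
result = asyncio.run(
    clone_repo_stub({"url": "https://github.com/user/repo"}, token="example-token")
)

# Positional form is rejected when the call is made, before any coroutine runs.
try:
    clone_repo_stub({"url": "https://github.com/user/repo"}, "example-token")
    positional_ok = True
except TypeError:
    positional_ok = False
```

Because validation was also removed from `clone_repo` in this commit, the token passed here is assumed to have been validated already by an entry point.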

src/gitingest/query_parser.py

Lines changed: 1 addition & 4 deletions
@@ -49,7 +49,6 @@ async def parse_query(
         Patterns to ignore. Can be a set of strings or a single string.
     token : str | None
         GitHub personal access token (PAT) for accessing private repositories.
-        Can also be set via the ``GITHUB_TOKEN`` environment variable.
 
     Returns
     -------
@@ -109,7 +108,6 @@ async def _parse_remote_repo(source: str, token: str | None = None) -> Ingestion
         The URL or domain-less slug to parse.
     token : str | None
         GitHub personal access token (PAT) for accessing private repositories.
-        Can also be set via the ``GITHUB_TOKEN`` environment variable.
 
     Returns
     -------
@@ -301,7 +299,6 @@ async def try_domains_for_user_and_repo(user_name: str, repo_name: str, token: s
         The name of the repository.
     token : str | None
         GitHub personal access token (PAT) for accessing private repositories.
-        Can also be set via the ``GITHUB_TOKEN`` environment variable.
 
     Returns
     -------
@@ -316,7 +313,7 @@ async def try_domains_for_user_and_repo(user_name: str, repo_name: str, token: s
     """
     for domain in KNOWN_GIT_HOSTS:
         candidate = f"https://{domain}/{user_name}/{repo_name}"
-        if await check_repo_exists(candidate, token=token if domain == "github.com" else None):
+        if await check_repo_exists(candidate, token=token if domain.startswith("github.") else None):
             return domain
 
     msg = f"Could not find a valid repository host for '{user_name}/{repo_name}'."
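The changed condition forwards the token to any host whose name starts with `github.`, rather than only to `github.com`. A small sketch of which hosts qualify is shown here. The host list and the helper name `token_for_domain` are hypothetical; the real `KNOWN_GIT_HOSTS` is not part of this diff.

```python
from typing import List, Optional

# Hypothetical host list for illustration only.
EXAMPLE_HOSTS: List[str] = [
    "github.com",
    "github.example.org",
    "gitlab.com",
    "mygithub.com",
]


def token_for_domain(domain: str, token: Optional[str]) -> Optional[str]:
    """Mirror the new condition: forward the token only to hosts starting with 'github.'."""
    return token if domain.startswith("github.") else None


# Hosts that would receive the token under the prefix check.
forwarded = [d for d in EXAMPLE_HOSTS if token_for_domain(d, "example-token")]
```

Note that a prefix check matches any hostname beginning with `github.`, so it is only as trustworthy as the list of hosts being iterated.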

src/gitingest/utils/auth.py

Lines changed: 6 additions & 1 deletion
@@ -4,6 +4,8 @@
 
 import os
 
+from gitingest.utils.git_utils import validate_github_token
+
 
 def resolve_token(token: str | None) -> str | None:
     """Resolve the token to use for the query.
@@ -19,4 +21,7 @@ def resolve_token(token: str | None) -> str | None:
         The resolved token.
 
     """
-    return token or os.getenv("GITHUB_TOKEN")
+    token = token or os.getenv("GITHUB_TOKEN")
+    if token:
+        validate_github_token(token)
+    return token
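With this change, `resolve_token` becomes the single place where CLI and library callers get their token validated. The following self-contained sketch mirrors that control flow. The regular expression and the names `validate_token` and `InvalidTokenError` are stand-ins chosen for illustration; the project's broadened `_GITHUB_PAT_PATTERN` is not shown in this diff and may differ.

```python
import os
import re
from typing import Optional

# Assumed pattern for illustration only, not the project's _GITHUB_PAT_PATTERN.
_EXAMPLE_PAT_PATTERN = re.compile(r"^(ghp_|github_pat_)[A-Za-z0-9_]{36,}$")


class InvalidTokenError(ValueError):
    """Stand-in for InvalidGitHubTokenError."""


def validate_token(token: str) -> None:
    """Raise if the token does not match the assumed format."""
    if not _EXAMPLE_PAT_PATTERN.match(token):
        raise InvalidTokenError("Invalid GitHub token format.")


def resolve_token(token: Optional[str]) -> Optional[str]:
    """Same flow as the diff: explicit token, else GITHUB_TOKEN, validated once."""
    token = token or os.getenv("GITHUB_TOKEN")
    if token:
        validate_token(token)
    return token
```

An explicit argument takes precedence over the environment variable, and a missing token is returned as `None` without validation, so anonymous access to public repositories keeps working.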

src/gitingest/utils/compat_typing.py

Lines changed: 4 additions & 4 deletions
@@ -1,13 +1,13 @@
 """Compatibility layer for typing."""
 
 try:
-    from typing import ParamSpec, TypeAlias  # Py ≥ 3.10
+    from typing import ParamSpec, TypeAlias  # type: ignore[attr-defined]  # Py ≥ 3.10
 except ImportError:
-    from typing_extensions import ParamSpec, TypeAlias  # Py 3.8 / 3.9
+    from typing_extensions import ParamSpec, TypeAlias  # type: ignore[attr-defined]  # Py 3.8 / 3.9
 
 try:
-    from typing import Annotated  # Py ≥ 3.9
+    from typing import Annotated  # type: ignore[attr-defined]  # Py ≥ 3.9
 except ImportError:
-    from typing_extensions import Annotated  # Py 3.8
+    from typing_extensions import Annotated  # type: ignore[attr-defined]  # Py 3.8
 
 __all__ = ["Annotated", "ParamSpec", "TypeAlias"]
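For readers unfamiliar with why a compatibility layer like this exists, the sketch below shows one typical use of `ParamSpec` imported through the same try/except fallback: a decorator that preserves the wrapped function's signature for type checkers. The decorator `counted` is a hypothetical example, not code from Gitingest, and on Python 3.8 or 3.9 it assumes `typing_extensions` is installed.

```python
import functools
from typing import Callable, TypeVar

try:
    from typing import ParamSpec  # Py >= 3.10
except ImportError:
    from typing_extensions import ParamSpec  # Py 3.8 / 3.9, requires typing_extensions

P = ParamSpec("P")
R = TypeVar("R")


def counted(func: Callable[P, R]) -> Callable[P, R]:
    """Count calls while keeping the wrapped function's parameter types visible."""

    @functools.wraps(func)
    def wrapper(*args: P.args, **kwargs: P.kwargs) -> R:
        wrapper.calls += 1  # type: ignore[attr-defined]
        return func(*args, **kwargs)

    wrapper.calls = 0  # type: ignore[attr-defined]
    return wrapper


@counted
def add(a: int, b: int) -> int:
    return a + b
```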

src/gitingest/utils/exceptions.py

Lines changed: 4 additions & 3 deletions
@@ -42,7 +42,8 @@ class InvalidGitHubTokenError(ValueError):
     """Exception raised when a GitHub Personal Access Token is malformed."""
 
     def __init__(self) -> None:
-        super().__init__(
-            "Invalid GitHub token format. Token should start with 'github_pat_' or 'ghp_' "
-            "followed by at least 36 characters of letters, numbers, and underscores.",
+        msg = (
+            "Invalid GitHub token format. To generate a token, go to "
+            "https://github.com/settings/tokens/new?description=gitingest&scopes=repo."
         )
+        super().__init__(msg)
