@jajanet commented Oct 13, 2025

As part of #47, this PR helps ensure that P0 CUJ-1 (log data leak ID and removal) and P0 CUJ-2 (ID sensitive flow to 3P) are addressed in the security:analyze command.

This also helps cover more privacy-specific features by outputting a simple datamap with sources and sinks at the end of the analysis.

Pending more test cases, this is an example of what a run would look like with a small set of tests: https://screenshot.googleplex.com/8nuFzxWcS5V2X6b (computer settings won't let me paste or upload an image to GH for some reason)

In short, this mainly:

  • adds a privacy taint analysis skill to make sure those issues are flagged (similar to security ones)
  • renames the following analysis field:
    • Location --> Source Location, to make the privacy datamap clearer
  • adds the following fields to the analysis:
    • vulnerability type (to differentiate between privacy and security issues)
    • sink (only for privacy issues, to complete the datamap)
    • data type (only for privacy issues, to flag the specific PII)
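As a rough illustration of the field changes listed above, a finding record could be modeled as below. This is only a sketch: the `Finding` class, its attribute names, and the `"security"` / `"privacy"` values are assumptions made for the example and do not come from the extension's code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """Hypothetical finding record; all names here are illustrative."""
    name: str
    vulnerability_type: str               # assumed values: "security" or "privacy"
    severity: str
    source_location: str                  # formerly just "Location"
    sink_location: Optional[str] = None   # privacy findings only
    data_type: Optional[str] = None       # privacy findings only, e.g. "email"

    def __post_init__(self):
        if self.vulnerability_type not in ("security", "privacy"):
            raise ValueError(f"unknown vulnerability type: {self.vulnerability_type}")
        # Sink and data type are described as privacy-only fields.
        if self.vulnerability_type == "security" and (self.sink_location or self.data_type):
            raise ValueError("sink_location and data_type apply to privacy findings only")


pii_in_logs = Finding(
    name="PII in Logs",
    vulnerability_type="privacy",
    severity="High",
    source_location="app/handlers.py:12",   # made-up path for the example
    sink_location="app/handlers.py:30",
    data_type="email",
)
```

Rejecting privacy-only fields on security findings is one possible reading of "only for privacy issues"; leaving them empty and ignoring them at report time would work as well.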

google-cla bot commented Oct 13, 2025

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@capachino changed the title from "Add privacy specific taxonomy" to "feat: Add privacy specific taxonomy" on Oct 13, 2025
* **Action:** Read the entire `DRAFT_SECURITY_REPORT.md` file.
* **Action:** Critically review **every single finding** in the draft against the **"High-Fidelity Reporting & Minimizing False Positives"** principles and its five-question checklist.
* **Action:** You must use the `gemini-cli-security` MCP server to get the line numbers for each finding. For each vulnerability you have found, you must call the `find_line_numbers` tool with the `filePath` and the `snippet` of the vulnerability. You will then add the `startLine` and `endLine` to the final report.
* **Action:** After reviewing the detailed findings, you will synthesize all identified privacy violations into a summary table. This table must be included at the top of the final report under a `## Privacy Data Map` heading.
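For readers who want to picture the summary table that the last action asks for, here is a minimal sketch of how privacy findings might be rendered under a `## Privacy Data Map` heading. The function name, the dictionary keys, and the column titles are assumptions for illustration; the prompt itself does not specify the table's columns.

```python
def render_privacy_data_map(findings):
    """Build a Markdown summary table from the privacy findings only.

    `findings` is a list of dicts with hypothetical keys:
    vulnerability_type, data_type, source_location, sink_location.
    """
    lines = [
        "## Privacy Data Map",
        "",
        "| Data Type | Source Location | Sink Location |",
        "| --- | --- | --- |",
    ]
    for f in findings:
        if f["vulnerability_type"] != "privacy":
            continue  # security findings do not appear in the data map
        lines.append(
            f"| {f['data_type']} | {f['source_location']} | {f['sink_location']} |"
        )
    return "\n".join(lines)


example_findings = [
    {"vulnerability_type": "security", "data_type": None,
     "source_location": "web/view.py:8", "sink_location": None},
    {"vulnerability_type": "privacy", "data_type": "email",
     "source_location": "app/handlers.py:12", "sink_location": "app/handlers.py:30"},
]
table = render_privacy_data_map(example_findings)
```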
Collaborator

I think this pollutes the output too much without bringing extra value compared to the "vulnerability" it already surfaces. One idea is just to add source and sink to the summary of the privacy violation when generating the report.

Author

Got it! I wasn't sure of the best way to rectify this. Currently, I've added fields to the Skillset: Reporting section in GEMINI.md that are conditional on a vulnerability being privacy-related, along with a vulnerability type field.

I guess the main question I have is: should the privacy and security issues commingle in the final report?

As of the recent changes, they commingle. For example, a single report could list a security issue, then a couple of privacy issues, then another security issue: XSS, PII in Logs, PII to 3P, SSRF.

Alternatively, we could have separate security and privacy sections. For the same example, the Security section would have XSS and SSRF, and the Privacy section would have PII in Logs and PII to 3P.

Thoughts?
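To make the two layouts concrete, the sketch below orders the same four example findings both ways. The function names and the `(name, vulnerability_type)` tuple shape are assumptions for illustration only, not anything defined in the PR.

```python
def commingled(findings):
    """Keep findings in discovery order, regardless of type."""
    return [name for name, _ in findings]


def sectioned(findings):
    """Split findings into a Security section and a Privacy section."""
    sections = {"Security": [], "Privacy": []}
    for name, vulnerability_type in findings:
        key = "Privacy" if vulnerability_type == "privacy" else "Security"
        sections[key].append(name)
    return sections


# The example from the comment above, as (name, vulnerability type) pairs.
findings = [
    ("XSS", "security"),
    ("PII in Logs", "privacy"),
    ("PII to 3P", "privacy"),
    ("SSRF", "security"),
]

print(commingled(findings))
print(sectioned(findings))
```

With the commingled layout, the vulnerability type field is the only way to tell the two kinds apart; with the sectioned layout, the section heading carries that information.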

- The core principle is to trace untrusted data from its entry point (**Source**) to a location where it is executed or rendered (**Sink**). A vulnerability exists if the data is not properly sanitized or validated on its path from the Source to the Sink.
+ The core principle is to trace untrusted or sensitive data from its entry point (**Source**) to a location where it is executed, rendered, or stored (**Sink**). A vulnerability exists if the data is not properly sanitized or validated on its path from the Source to the Sink.
+ ### Extended Skillset: Privacy Taint Analysis
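The source-to-sink principle in the hunk above can be pictured with a toy data-flow graph. The sketch below is not how the extension performs analysis (that is driven by the prompt in GEMINI.md); it is a small illustration under assumed names, where `flows` maps each value to the places it is passed on to and `sanitizers` marks nodes that redact or validate the data.

```python
def find_tainted_paths(flows, sources, sinks, sanitizers):
    """Return every source-to-sink path that passes through no sanitizer."""
    paths = []

    def walk(node, path):
        if node in sanitizers:
            return  # data is cleaned here, so nothing downstream is tainted
        if node in sinks:
            paths.append(path)
            return
        for nxt in flows.get(node, []):
            if nxt not in path:  # avoid looping on cyclic flows
                walk(nxt, path + [nxt])

    for source in sorted(sources):
        walk(source, [source])
    return paths


# Hypothetical flow: an email address read from a request reaches a logger
# directly, and reaches a third-party analytics call only after redaction.
flows = {
    "request.email": ["user.email"],
    "user.email": ["logger.info", "redact"],
    "redact": ["analytics.send"],
}
tainted = find_tainted_paths(
    flows,
    sources={"request.email"},
    sinks={"logger.info", "analytics.send"},
    sanitizers={"redact"},
)
# tainted == [["request.email", "user.email", "logger.info"]]
```

Only the logging path is reported, because the analytics path goes through the assumed `redact` step first.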
Collaborator

Have you considered merging this "Privacy Taint Analysis" into the current taxonomy of "Logging of Sensitive Information" and "PII Handling Violations" in GEMINI.md?

Author

Ah yes, that looks like a better spot to put it! Let me move it there!

GEMINI.md Outdated
  * **Severity:** Critical, High, Medium, or Low.
- * **Location:** The file path where the vulnerability was introduced and the line numbers if that is available.
+ * **Source Location:** The file path where the vulnerability was introduced and the line numbers if that is available.
+ * **Sink Location:** If this is a privacy issue, include this location where sensitive data is exposed or leaves the application's trust boundary
Collaborator

Nit: add a final period here.

---
## Skillset: Privacy Taint Analysis
Contributor

Since we are effectively expanding the taxonomy, would it be better to have this included as 1.7 in the section above? This is essentially the insecure data handling category, I think? cc: @heltonduarte @capachino

Author

Looking at this, I agree. Putting it under a new 1.7 section would be better for that reason, and it would keep the tool as a single unified workflow!
