-
Notifications
You must be signed in to change notification settings - Fork 28
DA-1203 identify slow running queries tools 2 #84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @AayushTyagi1, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a powerful new query analysis tool designed to help users identify and optimize slow or inefficient queries within their system. By leveraging the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a powerful new tool, analyze_queries, for diagnosing query performance issues by analyzing the system:completed_requests catalog. The overall structure is clean and extensible.
My review focuses on improving the robustness and maintainability of the new tool. I've identified some inconsistencies in how system-level queries are filtered across different analysis types, which could lead to incorrect results. I've also suggested performance improvements by avoiding SELECT * and a refactoring to reduce code duplication in the query filter logic. Addressing these points will make the tool more reliable and easier to maintain.
| SELECT * | ||
| FROM system:completed_requests | ||
| WHERE phaseCounts.`primaryScan` IS NOT MISSING | ||
| AND UPPER(statement) NOT LIKE '% SYSTEM:%' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filtering for this analysis is inconsistent with other analyses and the function's documentation. It's missing filters for INFER and CREATE INDEX queries. To ensure the analysis provides relevant, application-focused results, please add the missing filters.
| AND UPPER(statement) NOT LIKE '% SYSTEM:%' | |
| AND UPPER(statement) NOT LIKE 'INFER %' | |
| AND UPPER(statement) NOT LIKE 'CREATE INDEX%' | |
| AND UPPER(statement) NOT LIKE '% SYSTEM:%' |
| FROM system:completed_requests | ||
| WHERE phaseCounts.`indexScan` IS NOT MISSING | ||
| AND phaseCounts.`fetch` IS NOT MISSING | ||
| AND UPPER(statement) NOT LIKE '% SYSTEM:%' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the primary_index analysis, this query is missing filters for INFER and CREATE INDEX queries, which is inconsistent with the documentation and other analyses. Please add these filters to focus on application-level queries.
| AND UPPER(statement) NOT LIKE '% SYSTEM:%' | |
| AND UPPER(statement) NOT LIKE 'INFER %' | |
| AND UPPER(statement) NOT LIKE 'CREATE INDEX%' | |
| AND UPPER(statement) NOT LIKE '% SYSTEM:%' |
| SELECT statement, | ||
| AVG(phaseCounts.`indexScan` - resultCount) AS diff | ||
| FROM system:completed_requests | ||
| WHERE phaseCounts.`indexScan` > resultCount |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This analysis is missing all filters for system queries (INFER, CREATE INDEX, SYSTEM:*). This is inconsistent with the docstring and other analyses, and may include irrelevant system queries in the results. Please add the standard filters.
| WHERE phaseCounts.`indexScan` > resultCount | |
| WHERE phaseCounts.`indexScan` > resultCount | |
| AND UPPER(statement) NOT LIKE 'INFER %' | |
| AND UPPER(statement) NOT LIKE 'CREATE INDEX%' | |
| AND UPPER(statement) NOT LIKE '% SYSTEM:%' |
| "primary_index": { | ||
| "description": "Queries using primary indexes (typically inefficient, should use secondary indexes)", | ||
| "query": """ | ||
| SELECT * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using SELECT * can be inefficient as it retrieves all columns from system:completed_requests. It's a better practice to explicitly list only the columns needed for the analysis. This reduces data transfer and makes the query's intent clearer. Consider selecting key fields like statement, serviceTime, resultCount, resultSize, and phaseCounts.
| SELECT * | |
| SELECT statement, serviceTime, resultCount, resultSize, phaseCounts |
| "no_covering_index": { | ||
| "description": "Queries not using covering indexes (require document fetches, can be optimized)", | ||
| "query": """ | ||
| SELECT * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using SELECT * can be inefficient as it retrieves all columns from system:completed_requests. It's a better practice to explicitly list only the columns needed for the analysis. This reduces data transfer and makes the query's intent clearer. Consider selecting key fields like statement, serviceTime, resultCount, resultSize, and phaseCounts.
| SELECT * | |
| SELECT statement, serviceTime, resultCount, resultSize, phaseCounts |
| "query": """ | ||
| SELECT statement, | ||
| AVG(phaseCounts.`indexScan` - resultCount) AS diff | ||
| FROM system:completed_requests | ||
| WHERE phaseCounts.`indexScan` > resultCount | ||
| GROUP BY statement | ||
| ORDER BY diff DESC | ||
| LIMIT $limit | ||
| """, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WHERE clause to filter out system queries is duplicated across multiple analysis definitions. To improve maintainability and ensure consistency, you could define this filter as a constant string and use an f-string to inject it into the queries. This would make it easier to update the filter in one place in the future.
Example:
SYSTEM_QUERY_FILTER = "UPPER(statement) NOT LIKE 'INFER %' AND UPPER(statement) NOT LIKE 'CREATE INDEX%' AND UPPER(statement) NOT LIKE '% SYSTEM:%'"
analyses = {
"longest_running": {
"description": "...",
"query": f"""
SELECT ...
FROM system:completed_requests
WHERE {SYSTEM_QUERY_FILTER}
...
""",
},
# ... other analyses
}
Screen.Recording.2025-11-21.at.3.52.25.AM.mov