Assistant: Initial pass at implementing a data summary tool for Python #8208
This looks like a great start. My main suggestion is to rename the API that routes requests to the variables comm to something more generic (it can just query a single session variable at a time), so that we can use it to add more data querying tools without having to modify the Positron API each time.
The other change we will want to make is to handle these tool calls asynchronously so that they do not block the variables comm. This basically means copying the pattern from the data explorer comm for the get_column_profiles request (and its corresponding return_column_profiles front-end API, see https://github.com/posit-dev/positron/blob/main/extensions/positron-python/python_files/posit/positron/data_explorer.py#L492-L519).
extensions/positron-python/python_files/posit/positron/variables_comm.py
"type_display": column.type_display,
"summary_stats": summary_stats,
}
)
It's a good starting point to have this tool surfaced in the variables comm. Since computing summary stats or other computed profiles can be expensive (and thus block other message handling in the variables comm), we'll probably want to separate "expensive" requests (e.g. summary stats, frequency tables, histograms) from "cheap" requests (like asking for the schema), and make sure that the expensive requests are performed in an asynchronous-response pattern like the get_column_profiles request in the data explorer. This doesn't all have to get done in this PR, so it can be follow-up work.
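As a rough illustration of the cheap/expensive split described above, here is a minimal sketch, not Positron's actual implementation. Only get_column_profiles and return_column_profiles are names from the discussion; SketchComm, get_schema, callback_id, the in-memory table, and the single-worker thread pool are all assumptions made for the example.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from statistics import mean
from typing import Any, Callable

# Hypothetical split: which requests are answered inline vs. in the background.
CHEAP = {"get_schema"}
EXPENSIVE = {"get_column_profiles"}


class SketchComm:
    """Toy comm: cheap requests reply inline, expensive ones reply via an event."""

    def __init__(self, table: dict[str, list], send_event: Callable[[dict], None]) -> None:
        self._table = table
        self._send_event = send_event
        # A single worker keeps expensive jobs off the message-handling thread.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def handle(self, method: str, params: dict[str, Any]) -> dict[str, Any]:
        if method in CHEAP:
            # Cheap: compute and return the answer immediately.
            schema = {name: type(col[0]).__name__ for name, col in self._table.items()}
            return {"result": schema}
        if method in EXPENSIVE:
            callback_id = params["callback_id"]
            future = self._pool.submit(self._profile)
            future.add_done_callback(lambda f: self._reply(callback_id, f))
            # Acknowledge right away; the real result arrives later as an event.
            return {"result": "accepted"}
        raise ValueError(f"unknown method: {method}")

    def _profile(self) -> dict[str, dict[str, float]]:
        # Stand-in for an expensive computation (summary stats, histograms, ...).
        return {name: {"mean": mean(col)} for name, col in self._table.items()}

    def _reply(self, callback_id: str, future: Future) -> None:
        # Send either the profiles or the error back, tagged with the callback id.
        error = future.exception()
        payload = {"error": str(error)} if error else {"profiles": future.result()}
        self._send_event(
            {"method": "return_column_profiles", "callback_id": callback_id, **payload}
        )

    def close(self) -> None:
        # Wait for outstanding background work before tearing down.
        self._pool.shutdown(wait=True)
```

The caller of an expensive request gets an immediate acknowledgement and matches the later event to its request through callback_id, so the handler never blocks on the computation.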
extensions/positron-python/python_files/posit/positron/variables.py
src/vs/workbench/api/common/positron/extHost.positron.protocol.ts
This is close to a good stopping point for the initial pass. I think the main thing that needs to be fixed is the return type for the query_variable_data RPC: since it isn't easy to access all of the data explorer comm types in all the layers where this function is called, we can just return serialized JSON from the function for now (effectively schema: string, column_profiles: string[]).
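To make the "serialized JSON" suggestion concrete, here is a small sketch of what such a return value could look like. ColumnSchema, ColumnProfile, and build_query_result are hypothetical stand-ins, not the real data explorer comm types or the actual query_variable_data implementation; only the shape (schema: string, column_profiles: string[]) comes from the comment above.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ColumnSchema:
    """Hypothetical stand-in for a data explorer schema entry."""
    name: str
    type_display: str


@dataclass
class ColumnProfile:
    """Hypothetical stand-in for a data explorer column profile."""
    name: str
    summary_stats: dict


def build_query_result(schema: list[ColumnSchema], profiles: list[ColumnProfile]) -> dict:
    # Serialize typed objects to JSON strings so intermediate layers only
    # need to pass plain strings through (schema: string, column_profiles: string[]).
    return {
        "schema": json.dumps([asdict(column) for column in schema]),
        "column_profiles": [json.dumps(asdict(profile)) for profile in profiles],
    }
```

The consumer at the far end can call json.loads on each string; the trade-off is that nothing in between checks the structure, which is the typing gap the comment acknowledges.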
# Create a temporary table view with a temporary comm
temp_state = DataExplorerState("temp_summary")
temp_comm = PositronComm.create(target_name="temp_summary", comm_id="temp_summary_comm")
table_view = _get_table_view(value, temp_comm, temp_state, self.kernel.job_queue)
Maybe later we can set up a persistent data explorer comm to use for Assistant tool calls. I realized just now, after my earlier comment about the async column profiles (not needed for now), that these depend on there being a live comm available to send the frontend event with the asynchronous result. We can look more closely at this later.
"description": "Result of the summarize operation",
"type": "object",
"properties": {
    "children": {
This is returning a different return type right now (with the schema and column profiles, so a lot more complex). To avoid having to drag along the schema and profile result types (and mainly having to expose these in the Positron runtime / extHost API), I think we can just return the schema and profiles as a serialized JSON string to sidestep this issue for now. It would be good to make these results well-typed everywhere, but there's a bunch of plumbing needed.
I rebased this today and will work on some unit tests on the Python backend portion before it can be merged.
@sharon-wang @jmcphers I think I've got this to a good stopping point on the Python side. I can go ahead and merge, but it will be broken for R until #8343 is tackled (shouldn't be too difficult, I don't think!). Let me know how you would like to proceed.
I'm not familiar with the Python data explorer code, so I mostly looked at the Assistant changes!
For #8343, is the leftover work to implement the equivalent of the extensions/positron-python changes in this PR for extensions/positron-r?
// temporarily only enable for Python sessions
let session: positron.LanguageRuntimeSession | undefined;
const sessions = await positron.runtime.getActiveSessions();
if (sessions && sessions.length > 0) {
    session = sessions.find(
        (session) => session.metadata.sessionId === options.input.sessionIdentifier,
    );
}
if (!session) {
    return new vscode.LanguageModelToolResult([
        new vscode.LanguageModelTextPart('[[]]')
    ]);
}

if (session.runtimeMetadata.languageId !== 'python') {
    return new vscode.LanguageModelToolResult([
        new vscode.LanguageModelTextPart('[[]]')
    ]);
}
Thoughts on filtering out this tool when there's no session and (temporarily) if it's not a python session? Maybe we can also temporarily add a note to the tool description that the tool is only available for Python. This way, the tool shouldn't be available or run at all.
We have some tool filtering logic here:
positron/extensions/positron-assistant/src/participants.ts
Lines 201 to 258 in 90b4e5c
// List of tools for use by the language model.
const tools: vscode.LanguageModelChatTool[] = vscode.lm.tools.filter(
    tool => {
        // Don't allow any tools in the terminal.
        if (this.id === ParticipantID.Terminal) {
            return false;
        }
        // Define more readable variables for filtering.
        const inChatPane = request.location2 === undefined;
        const inEditor = request.location2 instanceof vscode.ChatRequestEditorData;
        const hasSelection = inEditor && request.location2.selection?.isEmpty === false;
        const isAgentMode = this.id === ParticipantID.Agent;
        // If streaming edits are enabled, don't allow any tools in inline editor chats.
        if (isStreamingEditsEnabled() && this.id === ParticipantID.Editor) {
            return false;
        }
        // If the tool requires a workspace, but no workspace is open, don't allow the tool.
        if (tool.tags.includes(TOOL_TAG_REQUIRES_WORKSPACE) && !isWorkspaceOpen()) {
            return false;
        }
        switch (tool.name) {
            // Only include the execute code tool in the Chat pane; the other
            // panes do not have an affordance for confirming executions.
            //
            // CONSIDER: It would be better for us to introspect the tool itself
            // to see if it requires confirmation, but that information isn't
            // currently exposed in `vscode.LanguageModelChatTool`.
            case PositronAssistantToolName.ExecuteCode:
                return inChatPane &&
                    // The execute code tool does not yet support notebook sessions.
                    positronContext.activeSession?.mode !== positron.LanguageRuntimeSessionMode.Notebook &&
                    isAgentMode;
            // Only include the documentEdit tool in an editor and if there is
            // no selection.
            case PositronAssistantToolName.DocumentEdit:
                return inEditor && !hasSelection;
            // Only include the selectionEdit tool in an editor and if there is
            // a selection.
            case PositronAssistantToolName.SelectionEdit:
                return inEditor && hasSelection;
            // Only include the edit file tool in edit or agent mode i.e. for the edit participant.
            case PositronAssistantToolName.EditFile:
                return this.id === ParticipantID.Edit || isAgentMode;
            // Only include the documentCreate tool in the chat pane and if the user is an agent.
            case PositronAssistantToolName.DocumentCreate:
                return inChatPane && isAgentMode;
            // Otherwise, include the tool if it is tagged for use with Positron Assistant.
            // Allow all tools in Agent mode.
            default:
                return isAgentMode ||
                    tool.tags.includes('positron-assistant');
        }
    }
);
Otherwise, we could throw an Error noting that this is only available for Python or return a string instead of returning an empty text part, just so it's clear to the user and the model why we were unable to grab the table summary info?
Agreed, let me see if I can figure out how to do that
return new vscode.LanguageModelToolResult([
    new vscode.LanguageModelTextPart('[[]]')
]);
Should we throw an error here?
This code is copy-pasted from the inspectVariablesTool, so I'm a bit out of my depth on what this should be changed to. I'll push my changes now; let me know if you'd like to change this to something else and I'll let you edit the branch directly.
Just rebased this and addressed everything but the question about whether to raise an error at https://github.com/posit-dev/positron/pull/8208/files#diff-e480e08db3fbdac969a0529ab74c8ff701d647882e9610bc2eec7b5e2a9f45f2. I think this is mergeable in its current state and we can make improvements in follow-up PRs. The tool is only available in Python sessions for now.
LGTM! I'm seeing the tool only available for Python sessions. Thank you for adding this 🙌
@sharon-wang this failed with
I'll try rebasing again to see if it is fixed on top of main.
improve logging performance to satisfy linter
clean up code
provide temp comm to satisfy pyright
modify openRPC specs to autogen comms code and fix bug with passing 'path' parameter, also rename summarizeData function to make it more generic
create data explorer helper functions
revert formatting change
… the ext host API
…ving get schema requests
Co-authored-by: sharon <[email protected]> Signed-off-by: Wes McKinney <[email protected]>
Yay checks are green!
First pass at #7114
Provides Assistant with a getDataSummary tool, currently only implemented for Python, that provides a JSON-structured summary of a data object by using the Positron API to communicate with the Variables Comm. I updated the Variables Python backend to reuse existing functionality from the data explorer. I used the inspectVariables tool as a guide for retrieving info from the variables comm.
Release Notes
New Features
Bug Fixes
QA Notes
@:data-explorer
@:assistant
@:variables
@:plots
@:viewer