
[Backend] Refactor ipynb kernel messages serialization #436

Merged
merged 7 commits into from
Aug 7, 2023

senwang86
Collaborator

Summary

Currently, the kernel messages are concatenated and stored in a single object, i.e., pod.result. This concatenation creates a few discrepancies with Colab in the output results: e.g., a single pod can't produce multiple plots, and the order in which outputs appear may not match the order of line execution, which can confuse users (see the screenshots in the Test section).
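To make the difference concrete, here is a minimal TypeScript sketch contrasting the two storage strategies. The type names (`OutputEntry`, `ConcatenatedResult`) and the helpers `concatenate` and `appendEntry` are hypothetical and only approximate what the codebase does; the point is that folding messages into one object can hold at most one image, whereas an array keeps every plot.

```typescript
// Hypothetical output shapes for illustration; the PR's actual types may differ.
type OutputEntry =
  | { type: "stream"; name: "stdout" | "stderr"; text: string }
  | { type: "display_data"; image: string }
  | { type: "execute_result"; text: string; count: number };

// Single-object result, roughly what "concatenate into pod.result" implies.
interface ConcatenatedResult {
  text: string;
  image?: string;
  count?: number;
}

// Folding every message into one object: stream and result text are glued
// together, and a second image overwrites the first.
function concatenate(entries: OutputEntry[]): ConcatenatedResult {
  const result: ConcatenatedResult = { text: "" };
  for (const e of entries) {
    if (e.type === "display_data") {
      result.image = e.image; // only the last image survives
    } else if (e.type === "stream") {
      result.text += e.text;
    } else {
      result.text += e.text;
      result.count = e.count;
    }
  }
  return result;
}

// Array-based result: each message is kept as its own entry, in arrival order.
function appendEntry(entries: OutputEntry[], next: OutputEntry): OutputEntry[] {
  return [...entries, next];
}
```

With two `display_data` messages, `concatenate` keeps only the second image, while repeated `appendEntry` calls keep both.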

Test

Before

  • Screenshot 2023-08-04 at 10 41 48 PM
  • Screenshot 2023-08-04 at 10 40 19 PM

After

  • Screenshot 2023-08-04 at 10 41 10 PM
  • Export/Import also verified

Follow-up

  • The ResultBlock in Code.tsx needs more tuning
  • We might need to support more message types (see Messaging in Jupyter)
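As a rough illustration of what supporting more message types could involve, the sketch below maps the content of a few iopub message types described in the Jupyter messaging spec (`stream`, `display_data`, `execute_result`, `error`) to output entries and ignores everything else. The `KernelMessage` and `Output` types and the `toOutput` function are hypothetical, and only a subset of each message's fields is modeled.

```typescript
// Minimal subset of a Jupyter kernel message; real messages also carry
// parent_header, metadata, buffers, etc., which are omitted here.
interface KernelMessage {
  header: { msg_type: string };
  content: Record<string, any>;
}

type Output =
  | { type: "stream"; name: string; text: string }
  | { type: "display_data"; data: Record<string, unknown> }
  | { type: "execute_result"; data: Record<string, unknown>; count: number }
  | { type: "error"; ename: string; evalue: string; traceback: string[] };

// Hypothetical converter: known output message types become entries;
// anything else (status, execute_input, ...) returns null and is skipped.
function toOutput(msg: KernelMessage): Output | null {
  const c = msg.content;
  switch (msg.header.msg_type) {
    case "stream":
      return { type: "stream", name: c.name, text: c.text };
    case "display_data":
      return { type: "display_data", data: c.data };
    case "execute_result":
      return { type: "execute_result", data: c.data, count: c.execution_count };
    case "error":
      return { type: "error", ename: c.ename, evalue: c.evalue, traceback: c.traceback };
    default:
      return null;
  }
}
```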

@senwang86 senwang86 requested a review from lihebi August 5, 2023 07:33
<Box
component="pre"
whiteSpace="pre-wrap"
key={i + 1}
Collaborator

The key must be unique. There's a console warning.

Screenshot 2023-08-05 at 1 07 50 PM

@lihebi
Collaborator

lihebi commented Aug 5, 2023

  1. The stderr is not printed.
  2. Also, I'd like to keep the <hr/> that visually separates the stdout/stderr stream from the return value.

Reference:

Screenshot 2023-08-05 at 1 12 34 PM

Old behavior:

Screenshot 2023-08-05 at 1 16 23 PM
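One way to keep the `<hr/>` while storing outputs as an array is to compute a render list that inserts a single separator before the return value whenever stream output precedes it. This is a hypothetical helper (`withSeparator`), not code from the PR; the component would render the `separator` item as `<hr/>`.

```typescript
// Hypothetical entry shapes (only the two kinds relevant here).
type Entry =
  | { type: "stream"; name: "stdout" | "stderr"; text: string }
  | { type: "execute_result"; text: string };

// A render item is either an output entry or a separator (rendered as <hr/>).
type RenderItem = Entry | { type: "separator" };

// Insert one separator before the first execute_result, but only if some
// stream output came before it.
function withSeparator(entries: Entry[]): RenderItem[] {
  const items: RenderItem[] = [];
  let sawStream = false;
  let inserted = false;
  for (const e of entries) {
    if (e.type === "stream") {
      sawStream = true;
    } else if (sawStream && !inserted) {
      items.push({ type: "separator" });
      inserted = true;
    }
    items.push(e);
  }
  return items;
}
```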

Comment on lines 297 to 298
// There's no exec_count in display_data, thus we pass in the session exec_count
count: exec_count,
Collaborator

Could we pass in 0? This value doesn't seem useful, so I don't think maintaining a session exec_count is necessary; it increases the logic complexity.

Collaborator

I think the count in execute_result is enough.

Collaborator Author

The tricky part is that execute_result is not always sent; instead, IIUC, execute_reply is the last message of each cell run. The key question is how to update the count properly; that logic could live either in the frontend or in the backend.

Collaborator

@lihebi lihebi Aug 7, 2023

I think we can wait until the final execution results before setting the count. This should be consistent with the behavior of Jupyter?

Collaborator Author

@senwang86 senwang86 Aug 7, 2023

Yes, I think we can do that in the frontend as well.

Collaborator

I'm not sure if maintaining session_exec_count ourselves is accurate. I'd vote for using msgs.content.execution_count because it is always accurate and seems enough for the purpose of showing a count.
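Following the suggestion to rely on the kernel's own `execution_count`, here is a minimal sketch. Per the Jupyter messaging spec, `execute_result` carries `execution_count` and `execute_reply` typically reports it as well, while `stream` and `display_data` do not, so taking the last reported value avoids a separate session counter. The message type models only the fields used, and `countFromMessages` is a hypothetical name, not the PR's code.

```typescript
// Only the fields needed for this sketch; real messages carry much more.
interface KernelMessage {
  header: { msg_type: string };
  content: { execution_count?: number };
}

// Take the count reported by the kernel instead of keeping our own counter.
// execute_reply usually arrives last, so it serves as the fallback when the
// cell produced no execute_result.
function countFromMessages(msgs: KernelMessage[]): number | undefined {
  let count: number | undefined;
  for (const m of msgs) {
    const t = m.header.msg_type;
    if ((t === "execute_result" || t === "execute_reply") &&
        typeof m.content.execution_count === "number") {
      count = m.content.execution_count;
    }
  }
  return count;
}
```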

text?: string;
count: number;
image?: string;
}[];
Collaborator

I am not confident about changing this. It is a breaking change: the old values in the DB need a migration to work with the new code.

If we really want to change this, we need to supply a DB migration script or procedure/function.

I think the issue you are trying to fix is:

  1. the order of stderr
  2. being able to display multiple images (at the end, not to be mixed with stdout/stderr streams)

I think you can fix (1) without introducing this schema change. For 2, I'd suggest skipping it for now, it's not crucial.
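If a full DB migration is too heavy, one lighter option is a read-time shim that accepts both the old and the new shape. The sketch assumes, without confirmation from the PR, that the legacy `result` was a single object with the same fields as one array element; `ResultEntry` and `normalizeResult` are hypothetical names.

```typescript
// Entry shape taken from the new type under review.
interface ResultEntry {
  text?: string;
  count: number;
  image?: string;
}

// Accept either the assumed legacy single object or the new array, so old DB
// rows still render without a one-off migration.
function normalizeResult(
  stored: ResultEntry | ResultEntry[] | null | undefined,
): ResultEntry[] {
  if (stored == null) return [];
  return Array.isArray(stored) ? stored : [stored];
}
```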

Collaborator

Also, using an array to store streams doesn't sound like a good idea. Stream output can arrive at any granularity: commonly line by line, because \n flushes the stream in most languages, or whenever users call flush() manually.

I believe "stream" is supposed to be concatenated together upon receiving.
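One way to reconcile an ordered array with concatenated streams is to append outputs in arrival order but merge a stream chunk into the previous entry when it comes from the same stream. This is a hypothetical helper, `appendOutput`, not the PR's implementation; it keeps interleaved stdout and stderr in order while collapsing line-by-line flushes into one entry. The `name` field follows the `stream` message content in the Jupyter messaging spec.

```typescript
type Output =
  | { type: "stream"; name: "stdout" | "stderr"; text: string }
  | { type: "display_data"; image: string }
  | { type: "execute_result"; text: string; count: number };

// Append in arrival order; a stream chunk that directly follows a chunk from
// the same stream is concatenated onto it instead of becoming a new entry.
function appendOutput(outputs: Output[], next: Output): Output[] {
  const last = outputs.length > 0 ? outputs[outputs.length - 1] : undefined;
  if (
    last !== undefined &&
    last.type === "stream" &&
    next.type === "stream" &&
    last.name === next.name
  ) {
    return [...outputs.slice(0, -1), { ...last, text: last.text + next.text }];
  }
  return [...outputs, next];
}
```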

Collaborator

In summary, I suggest fixing only issue 1 here, with minimal change so that we don't have to worry about migration. Forget about 2, it's not important.

Collaborator

It's better to open another PR to fix issue 1, and leave this PR for future reference.

Collaborator Author

So the change here is to put all the messages returned from the kernel into an array, rather than manually separating each message and concatenating the text field. The ipynb kernel would then decide how to concatenate the output of each line's execution.

Collaborator Author

IIUC, with the JSON format change the result field will not render correctly in existing repos. In that case, will re-executing each pod overwrite the result field?

Collaborator

@lihebi lihebi Aug 7, 2023

Yes, you are right, re-execution will fix the result field. The only thing that breaks is the existing results, which may not be worth a migration, especially at this early release stage.

@lihebi
Collaborator

lihebi commented Aug 7, 2023

  1. The stderr is not printed.
  2. Also, I'd like to keep the <hr/> that visually separates the stdout/stderr stream from the return value.

What do you say about these two issues? @senwang86 The rest of the code looks good to me.

@senwang86
Collaborator Author

These two issues are addressed in ab4818c; can you give it a test?

@lihebi
Collaborator

lihebi commented Aug 7, 2023

Just tried, the order is not fixed. Two issues:

  1. the order should be 1,2,3
  2. there shouldn't be spaces in between (there are spaces after 3, and after 2)
Screenshot 2023-08-07 at 3 54 49 PM

Comment on lines 354 to 357
return <></>;
}
default:
return <></>;
Collaborator

The console warning is still there, caused by these two lines. Adding key={combineKey} will fix it.
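For reference, React needs a unique `key` on each element in a rendered list, and the `<></>` shorthand fragment cannot take a `key` attribute, so a keyed fragment has to be written as `<Fragment key={...}>` (imported from `react`). The review does not show how `combineKey` is computed, so the sketch below uses a hypothetical `outputKeys` helper that derives one key per output from its type and index; index-based keys are acceptable here only under the assumption that outputs are appended and never reordered.

```typescript
// Hypothetical: any output entry with a discriminating `type` field.
interface OutputLike {
  type: string;
}

// One key per output. The index makes each key unique among siblings; the
// type prefix only makes keys easier to read when debugging.
function outputKeys(outputs: OutputLike[]): string[] {
  return outputs.map((o, i) => `${o.type}-${i}`);
}

// In the component (TSX), every branch, including the default fallback,
// would then return a keyed element, e.g.
//   <Fragment key={keys[i]} />
// rather than a bare <></>.
```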

@senwang86
Collaborator Author

Forgot to mention this: the order depends on how the ipynb kernel handles the execution results, and the result is consistent with Colab.

Screenshot 2023-08-07 at 4 03 02 PM

@lihebi
Collaborator

lihebi commented Aug 7, 2023

I see, SG.

@senwang86
Collaborator Author

Spaces removed.

Screenshot 2023-08-07 at 4 29 36 PM

@lihebi
Collaborator

lihebi commented Aug 7, 2023

Cool, thanks!

@lihebi lihebi merged commit 30b3b0f into codepod-io:main Aug 7, 2023
@senwang86 senwang86 deleted the refactor_ipynb_kernel_messages branch September 7, 2023 22:02