Skip to content

Conversation

mgrover1
Copy link
Contributor

Updated pandas notebook to include suggested templating formats.

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@github-actions
Copy link

🚀 📚 Preview for git commit SHA: e4b5c0d at: https://60b00738729f6c2f9a806f5e--pythia-foundations.netlify.app

@github-actions
Copy link

🚀 📚 Preview for git commit SHA: e4b5c0d at: https://60b0078c48b35b21e4d31a3a--pythia-foundations.netlify.app

@dcamron
Copy link
Contributor

dcamron commented May 27, 2021

Replaces (and is based off of) #34

@dcamron dcamron added the content Content related issue label May 27, 2021
@github-actions
Copy link

🚀 📚 Preview for git commit SHA: 6dd844e at: https://60b00a47a9a0e7368b02f2ba--pythia-foundations.netlify.app

@github-actions
Copy link

🚀 📚 Preview for git commit SHA: 6dd844e at: https://60b00a5f33317129bb15f36e--pythia-foundations.netlify.app

@brian-rose
Copy link
Member

@mgrover1 is this ready for review?

@mgrover1
Copy link
Contributor Author

@mgrover1 is this ready for review?

I need to update the Pandas markdown file, but after that it will be.

@brian-rose
Copy link
Member

@mgrover1 is this ready for review?

I need to update the Pandas markdown file, but after that it will be.

Ok! When you're ready, feel free to tag one or two individuals for a thorough review. Or if you prefer, let me know and I'll do the tagging.

@github-actions
Copy link

github-actions bot commented Jun 1, 2021

🚀 📚 Preview for git commit SHA: 70caf0f at: https://60b637fabe5f6ca307683daf--pythia-foundations.netlify.app

@github-actions
Copy link

github-actions bot commented Jun 1, 2021

🚀 📚 Preview for git commit SHA: 70caf0f at: https://60b63811f776afb4094454cb--pythia-foundations.netlify.app

@mgrover1 mgrover1 requested review from brian-rose and ktyle June 1, 2021 13:41
@mgrover1
Copy link
Contributor Author

mgrover1 commented Jun 1, 2021

All ready to go @brian-rose ! I requested your and Kevin's feedback!

@brian-rose
Copy link
Member

All ready to go @brian-rose ! I requested your and Kevin's feedback!

Great, I should get to this within the next 24 hours!

@@ -0,0 +1,931 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add one sentence of explanatory text here, like:

you'll often see the nickname pd used as an abbreviation for pandas in the import statement, just like numpy is often imported as np

Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Might we consider adding this and similar explanatory content as a "tip", using the "alert alert-info" class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the alert! Can we fence the code pd, np so they render as code rather than text? I think that helps readability.

@@ -0,0 +1,931 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! I suggest telling the reader more explicitly about the dataset, just one sentence to say what it is and where it comes from.


Reply via ReviewNB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding the description. There's an issue with the rendering of the special character in El Nino that's causing an odd line break.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added in the tilde - it looks okay in the rendered notebook (El is on a different line, but it still shows up). Not really sure how to fix this one - I'd like to keep the tilde in there

@@ -0,0 +1,931 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest adding a sentence of text here to point out that pandas provides really nicely formatted table output in the notebook, which is already an improvement over inspecting a plain list or numpy array!


Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes; perhaps first add a line that reads "print (df)", then add a markdown cell that mentions how the output can be nicely-formatted (aka "pretty-printed").

@@ -0,0 +1,931 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, and in the next few cells, we are violating the style suggestion in our template which discourages using code comments as part of the narrative.

I suggest using the text

  • numpy-like interval slices
  • label-based slicing
  • another way of slicing

as sub-sub-headings so the narrative stays organized.


Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Member

@brian-rose brian-rose Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First sentence: explicitly point out that the integer indexing that we just showed above is not inclusive of the final value.

Then break this section up with a new subheading.


Reply via ReviewNB

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that we break out the first sentence as a Tip.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good now!

@@ -0,0 +1,931 @@
{
Copy link
Member

@brian-rose brian-rose Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are interested in where the...


Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some text here to explain what you're trying to do.


Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Member

@brian-rose brian-rose Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be helpful to remind the reader here that we don't usually recommend converting to numpy array unless you need to, because then all the label information is lost.


Reply via ReviewNB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest using the "Warning" alert rather than "Info" for this one -- goes along with the word "beware"!

@@ -0,0 +1,931 @@
{
Copy link
Member

@brian-rose brian-rose Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First sentence... I'm not sure what you mean. The name of the variable is the name we assigned when we created the new column. Clarify?


Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Member

@brian-rose brian-rose Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First bullet point, I suggest strengthening:

  • Pandas is a very powerful tool for working with tabular (i.e. spreadsheet-style) data

Add a third bullet point, after "multiple ways of subsetting your dataframe"...

  • Pandas allows you to refer to subsets of data by label, which generally makes code more readable and more robust


Reply via ReviewNB

core/pandas.md Outdated

From the [Pandas documentation](https://pandas.pydata.org/) "is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language."

Pandas can be a useful library when working with tabular data, which is a common data type in the geosciences. You should have basic familiarity with NumPy and Matplotlib prior to working through the Pandas notebooks presented here.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest stronger language here, something like:

"Pandas is a very powerful library for working with tabular data (i.e. anything you might put in a spreadsheet -- a common data type in the geosciences). It allows us to use labels for our data so that we can write expressive and robust code to manipulate the data."

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is still unresolved.

@brian-rose
Copy link
Member

Great stuff @mgrover1. I left a bunch of small suggestions here and in ReviewNB.

@@ -0,0 +1,931 @@
{
Copy link
Contributor

@ktyle ktyle Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should briefly define what is meant by a Pandas index (essentially, a list containing the row ID's, (by default, a sequential list of integers beginning with 0)


Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a tip that the "dot notation" is a "convenience feature", that is mostly interchangeable with the dict notation, except in cases where the column name is not a valid Python object ... such as names beginning with a number, or names that contain spaces.


Reply via ReviewNB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@@ -0,0 +1,931 @@
{
Copy link
Contributor

@ktyle ktyle Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reword as:

and these can be extended in ways that you might expect, but perhaps also in unexpected ways:

Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eliminate the two markdown cells that follow, and reword this one as follows:

These capabilities extend back to our original DataFrame, as well! Note there are limitations of the dict label notation: for example, neither of the two following code cells will work:

Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Break up these two sentences into their own Markdown cells. The first (which might best be a Tip) should read:

For a more comprehensive explanation, which includes additional examples, limitations, and compares indexing methods between DataFrame and Series see pandas' rules for indexing.


Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Contributor

@ktyle ktyle Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataframes --> DataFrames


Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

works --> accepts and returns


Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Contributor

@ktyle ktyle Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

series --> Series


Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Contributor

@ktyle ktyle Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reword as:

Let's apply this calculation to this Series; this returns another Series object.

Reply via ReviewNB

@@ -0,0 +1,931 @@
{
Copy link
Contributor

@ktyle ktyle Jun 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas series --> pandas Series 


Reply via ReviewNB

@ktyle
Copy link
Contributor

ktyle commented Jun 2, 2021

@mgrover1 and @brian-rose , I'm all set with my initial round of comments. Great work! This a very comprehensive, yet still concise, intro notebook. Do you have a sense as to how many additional notebooks will fill out this section?

@mgrover1
Copy link
Contributor Author

mgrover1 commented Jun 3, 2021

Thanks @brian-rose and @ktyle for the comments - just pushed the recent changes.

Do you have a sense as to how many additional notebooks will fill out this section?

I think at least one more, if not two similar to these two notebooks from the Unidata Python Workshops
Pythonic Data Analysis
Advanced Pythonic Data Analysis

@github-actions
Copy link

github-actions bot commented Jun 3, 2021

🚀 📚 Preview for git commit SHA: 64d5293 at: https://60b8e76de8c6620ef92440a3--pythia-foundations.netlify.app

@github-actions
Copy link

github-actions bot commented Jun 3, 2021

🚀 📚 Preview for git commit SHA: 64d5293 at: https://60b8e79010c0de0d5cc29c2b--pythia-foundations.netlify.app

@brian-rose
Copy link
Member

brian-rose commented Jun 3, 2021

Awesome, I put a few follow-up comments over on ReviewNB. I'm not sure how to make those appear here in this discussion, so I'll just repeat:

  • Render np and pd as code in the first alert box for the Imports section
  • Issue with the rendering of the word "El Nino" with the special character
  • Use "warning" alert rather than "info" for the last alert box about converting to numpy array with .values

My comment above on the Pandas section title page is still unresolved.

I went ahead and marked most of @ktyle 's comments as "resolved" where it was obvious that you took his suggestion and it looked all good. I left a few open in case Kevin has any more comments on them.

Nice work!

@mgrover1
Copy link
Contributor Author

mgrover1 commented Jun 4, 2021

@brian-rose just pushed the most recent changes

  • Render np and pd as code in the first alert box for the Imports section
  • Included the lines on both sides of these in the html - the warning blocks require using raw html for this
  • Issue with the rendering of the word "El Nino" with the special character
  • It looks like it still renders on the same line - ideally it would be good keep the tilde in there? In the rendered notebook "El" is just on the line above it. In the rendered notebook locally, it does not create a separate line for the tilde.
  • Use "warning" alert rather than "info" for the last alert box about converting to numpy array with .values
  • Changed this to a warning

My comment above on the Pandas section title page is still unresolved.

  • I made sure to make the changes this time

The only thing still missing is using the data directory - going to wait to talk to @andersy005 about this.

@github-actions
Copy link

github-actions bot commented Jun 4, 2021

🚀 📚 Preview for git commit SHA: 602d48e at: https://60ba302ce0e9bc3369c7d3ae--pythia-foundations.netlify.app

@github-actions
Copy link

github-actions bot commented Jun 4, 2021

🚀 📚 Preview for git commit SHA: 602d48e at: https://60ba3077493c41298cc21b63--pythia-foundations.netlify.app

@brian-rose
Copy link
Member

Just fixed my two remaining nits:

  1. We can just put the character ñ directly into the markdown and the weird formatting problem goes away
  2. Changed from "Danger" alert to "Warning" alert for that last alert back about .values

Also merged in all the latest from main just to make sure there were no conflicts.

One thing to be aware of: when we merge #53, the style of the alerts will change a bit. It's possible we'll need to tweak this notebook once more.

@github-actions
Copy link

github-actions bot commented Jun 4, 2021

🚀 📚 Preview for git commit SHA: 0fb945a at: https://60ba6194af38b99cc8f04c5f--pythia-foundations.netlify.app

@github-actions
Copy link

github-actions bot commented Jun 4, 2021

🚀 📚 Preview for git commit SHA: 0fb945a at: https://60ba61a8cb95967e02074508--pythia-foundations.netlify.app

@brian-rose
Copy link
Member

My suggestion is that we merge now with the local data file just to get the content out there.

We can open an issue about using pythia-datasets, and come back and modify this notebook in a future PR once that is up and running.

In that spirit, I'm approving this now. If you prefer to wait until the datasets are ready @mgrover1, that's fine too.

Copy link
Contributor

@ktyle ktyle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I agree with @brian-rose ... good to merge!

@mgrover1 mgrover1 merged commit 4f8fa7a into main Jun 4, 2021
@andersy005 andersy005 deleted the pandas_example branch June 9, 2021 15:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

content Content related issue ready for review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants