-
Notifications
You must be signed in to change notification settings - Fork 57
Add pandas example based on XDev workshop #58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
🚀 📚 Preview for git commit SHA: e4b5c0d at: https://60b00738729f6c2f9a806f5e--pythia-foundations.netlify.app |
🚀 📚 Preview for git commit SHA: e4b5c0d at: https://60b0078c48b35b21e4d31a3a--pythia-foundations.netlify.app |
Replaces (and is based off of) #34 |
🚀 📚 Preview for git commit SHA: 6dd844e at: https://60b00a47a9a0e7368b02f2ba--pythia-foundations.netlify.app |
🚀 📚 Preview for git commit SHA: 6dd844e at: https://60b00a5f33317129bb15f36e--pythia-foundations.netlify.app |
@mgrover1 is this ready for review? |
I need to update the Pandas markdown file, but after that it will be. |
Ok! When you're ready, feel free to tag one or two individuals for a thorough review. Or if you prefer, let me know and I'll do the tagging. |
🚀 📚 Preview for git commit SHA: 70caf0f at: https://60b637fabe5f6ca307683daf--pythia-foundations.netlify.app |
🚀 📚 Preview for git commit SHA: 70caf0f at: https://60b63811f776afb4094454cb--pythia-foundations.netlify.app |
All ready to go @brian-rose ! I requested your and Kevin's feedback! |
Great, I should get to this within the next 24 hours! |
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add one sentence of explanatory text here, like:
you'll often see the nicknamepd
used as an abbreviation forpandas
in the import statement, just likenumpy
is often imported asnp
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Might we consider adding this and similar explanatory content as a "tip", using the "alert alert-info" class?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the alert! Can we fence the code pd
, np
so they render as code rather than text? I think that helps readability.
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great! I suggest telling the reader more explicitly about the dataset, just one sentence to say what it is and where it comes from.
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the description. There's an issue with the rendering of the special character in El Nino that's causing an odd line break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added in the tilde - it looks okay in the rendered notebook (El is on a different line, but it still shows up). Not really sure how to fix this one - I'd like to keep the tilde in there
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest adding a sentence of text here to point out that pandas provides really nicely formatted table output in the notebook, which is already an improvement over inspecting a plain list or numpy array!
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; perhaps first add a line that reads "print (df)", then add a markdown cell that mentions how the output can be nicely-formatted (aka "pretty-printed").
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, and in the next few cells, we are violating the style suggestion in our template which discourages using code comments as part of the narrative.
I suggest using the text
- numpy-like interval slices
- label-based slicing
- another way of slicing
as sub-sub-headings so the narrative stays organized.
Reply via ReviewNB
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First sentence: explicitly point out that the integer indexing that we just showed above is not inclusive of the final value.
Then break this section up with a new subheading.
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest that we break out the first sentence as a Tip.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good now!
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be helpful to remind the reader here that we don't usually recommend converting to numpy array unless you need to, because then all the label information is lost.
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest using the "Warning" alert rather than "Info" for this one -- goes along with the word "beware"!
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First sentence... I'm not sure what you mean. The name of the variable is the name we assigned when we created the new column. Clarify?
Reply via ReviewNB
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First bullet point, I suggest strengthening:
- Pandas is a very powerful tool for working with tabular (i.e. spreadsheet-style) data
Add a third bullet point, after "multiple ways of subsetting your dataframe"...
- Pandas allows you to refer to subsets of data by label, which generally makes code more readable and more robust
Reply via ReviewNB
core/pandas.md
Outdated
|
||
From the [Pandas documentation](https://pandas.pydata.org/) "is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language." | ||
|
||
Pandas can be a useful library when working with tabular data, which is a common data type in the geosciences. You should have basic familiarity with NumPy and Matplotlib prior to working through the Pandas notebooks presented here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest stronger language here, something like:
"Pandas is a very powerful library for working with tabular data (i.e. anything you might put in a spreadsheet -- a common data type in the geosciences). It allows us to use labels for our data so that we can write expressive and robust code to manipulate the data."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is still unresolved.
Great stuff @mgrover1. I left a bunch of small suggestions here and in ReviewNB. |
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should briefly define what is meant by a Pandas index (essentially, a list containing the row ID's, (by default, a sequential list of integers beginning with 0)
Reply via ReviewNB
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a tip that the "dot notation" is a "convenience feature", that is mostly interchangeable with the dict
notation, except in cases where the column name is not a valid Python object ... such as names beginning with a number, or names that contain spaces.
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reword as:
and these can be extended in ways that you might expect, but perhaps also in unexpected ways:
Reply via ReviewNB
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eliminate the two markdown cells that follow, and reword this one as follows:
These capabilities extend back to our originalDataFrame
, as well! Note there are limitations of thedict
label notation: for example, neither of the two following code cells will work:
Reply via ReviewNB
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Break up these two sentences into their own Markdown cells. The first (which might best be a Tip) should read:
For a more comprehensive explanation, which includes additional examples, limitations, and compares indexing methods betweenDataFrame
andSeries
see pandas' rules for indexing.
Reply via ReviewNB
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reword as:
Let's apply this calculation to thisSeries
; this returns anotherSeries
object.
Reply via ReviewNB
@@ -0,0 +1,931 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mgrover1 and @brian-rose , I'm all set with my initial round of comments. Great work! This a very comprehensive, yet still concise, intro notebook. Do you have a sense as to how many additional notebooks will fill out this section? |
Thanks @brian-rose and @ktyle for the comments - just pushed the recent changes.
I think at least one more, if not two similar to these two notebooks from the Unidata Python Workshops |
🚀 📚 Preview for git commit SHA: 64d5293 at: https://60b8e76de8c6620ef92440a3--pythia-foundations.netlify.app |
🚀 📚 Preview for git commit SHA: 64d5293 at: https://60b8e79010c0de0d5cc29c2b--pythia-foundations.netlify.app |
Awesome, I put a few follow-up comments over on ReviewNB. I'm not sure how to make those appear here in this discussion, so I'll just repeat:
My comment above on the Pandas section title page is still unresolved. I went ahead and marked most of @ktyle 's comments as "resolved" where it was obvious that you took his suggestion and it looked all good. I left a few open in case Kevin has any more comments on them. Nice work! |
@brian-rose just pushed the most recent changes
|
🚀 📚 Preview for git commit SHA: 602d48e at: https://60ba302ce0e9bc3369c7d3ae--pythia-foundations.netlify.app |
🚀 📚 Preview for git commit SHA: 602d48e at: https://60ba3077493c41298cc21b63--pythia-foundations.netlify.app |
Just fixed my two remaining nits:
Also merged in all the latest from main just to make sure there were no conflicts. One thing to be aware of: when we merge #53, the style of the alerts will change a bit. It's possible we'll need to tweak this notebook once more. |
🚀 📚 Preview for git commit SHA: 0fb945a at: https://60ba6194af38b99cc8f04c5f--pythia-foundations.netlify.app |
🚀 📚 Preview for git commit SHA: 0fb945a at: https://60ba61a8cb95967e02074508--pythia-foundations.netlify.app |
My suggestion is that we merge now with the local data file just to get the content out there. We can open an issue about using pythia-datasets, and come back and modify this notebook in a future PR once that is up and running. In that spirit, I'm approving this now. If you prefer to wait until the datasets are ready @mgrover1, that's fine too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, I agree with @brian-rose ... good to merge!
Updated pandas notebook to include suggested templating formats.