For transfer of ``DataFrame`` objects from ``pandas`` to R, one option is to
use HDF5 files; see :ref:`io.external_compatibility` for an
example.

Quick Reference
---------------

We'll start off with a quick reference guide pairing some common R
operations using `dplyr
<http://cran.r-project.org/web/packages/dplyr/index.html>`__ with
pandas equivalents.

Querying, Filtering, Sampling
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

=========================================== ===========================================
R                                           pandas
=========================================== ===========================================
``dim(df)``                                 ``df.shape``
``head(df)``                                ``df.head()``
``slice(df, 1:10)``                         ``df.iloc[:10]``
``filter(df, col1 == 1, col2 == 1)``        ``df.query('col1 == 1 & col2 == 1')``
``df[df$col1 == 1 & df$col2 == 1,]``        ``df[(df.col1 == 1) & (df.col2 == 1)]``
``select(df, col1, col2)``                  ``df[['col1', 'col2']]``
``select(df, col1:col3)``                   ``df.loc[:, 'col1':'col3']``
``select(df, -(col1:col3))``                ``df.drop(cols_to_drop, axis=1)`` but see [#select_range]_
``distinct(select(df, col1))``              ``df[['col1']].drop_duplicates()``
``distinct(select(df, col1, col2))``        ``df[['col1', 'col2']].drop_duplicates()``
``sample_n(df, 10)``                        ``df.sample(n=10)``
``sample_frac(df, 0.01)``                   ``df.sample(frac=0.01)``
=========================================== ===========================================

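The sketch below runs several of the row-oriented pandas equivalents on a small, made-up frame. The column values are hypothetical. R's ``1:10`` is 1-based and inclusive, so the matching positional slice is ``df.iloc[:10]``. The ``random_state`` argument is passed only to make the sampling repeatable.

```python
import pandas as pd

# Hypothetical 20-row frame; the values are made up for illustration only.
df = pd.DataFrame({
    "col1": [i % 2 for i in range(20)],
    "col2": [i % 3 for i in range(20)],
    "col3": list(range(20)),
})

print(df.shape)  # dim(df) -> (20, 3)

# slice(df, 1:10): R's range is 1-based and inclusive (ten rows), so the
# 0-based, end-exclusive pandas slice stops at 10.
first_ten = df.iloc[:10]

# filter(df, col1 == 1, col2 == 1), once via query() and once via a boolean mask.
via_query = df.query("col1 == 1 & col2 == 1")
via_mask = df[(df.col1 == 1) & (df.col2 == 1)]
print(via_query.index.tolist())  # [1, 7, 13, 19]

# sample_n / sample_frac; random_state only makes the draw repeatable.
ten_rows = df.sample(n=10, random_state=0)
five_percent = df.sample(frac=0.05, random_state=0)  # 5% of 20 rows -> 1 row
```

Both filtering forms select the same rows. ``query`` is often the more readable of the two when the conditions are long.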

.. [#select_range] R's shorthand for a subrange of columns
   (``select(df, col1:col3)``) maps cleanly onto label slicing in pandas
   (``df.loc[:, 'col1':'col3']``). If you have the list of columns, you can
   also slice it by position, for example ``df[cols[1:3]]`` or
   ``df.drop(cols[1:3], axis=1)``. Excluding a range of columns by name, as
   ``select(df, -(col1:col3))`` does, is a bit messier.

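The footnote's point about column ranges can be sketched on a hypothetical five-column frame. For excluding a range by name, one option (not the only one) is to reuse the ``.loc`` label slice to collect the column names and then pass them to ``drop``. The ``cols`` list is an illustrative name, as in the footnote.

```python
import pandas as pd

# Hypothetical frame whose columns are in a known order.
df = pd.DataFrame([[1, 2, 3, 4, 5]],
                  columns=["col0", "col1", "col2", "col3", "col4"])

# select(df, col1:col3): .loc label slicing includes both endpoints.
kept = df.loc[:, "col1":"col3"]

# select(df, -(col1:col3)): collect the names via the same label slice,
# then drop those columns.
dropped = df.drop(df.loc[:, "col1":"col3"].columns, axis=1)

# With a plain list of column names, ordinary Python slicing applies
# (0-based and end-exclusive, so cols[1:3] is the 2nd and 3rd names).
cols = list(df.columns)
by_position = df[cols[1:3]]
without_position = df.drop(cols[1:3], axis=1)

print(list(kept.columns))     # ['col1', 'col2', 'col3']
print(list(dropped.columns))  # ['col0', 'col4']
```

Unlike Python's positional slices, label slices with ``.loc`` include the end label. This is why ``'col1':'col3'`` matches R's ``col1:col3``.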

Sorting
~~~~~~~

=========================================== ===========================================
R                                           pandas
=========================================== ===========================================
``arrange(df, col1, col2)``                 ``df.sort_values(['col1', 'col2'])``
``arrange(df, desc(col1))``                 ``df.sort_values('col1', ascending=False)``
=========================================== ===========================================

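Here is a short sketch of the sorting rows on a hypothetical frame. It also shows a mixed-direction sort such as ``arrange(df, col1, desc(col2))``, which the table does not list. In that case ``ascending`` takes one flag per sort key.

```python
import pandas as pd

# Hypothetical frame with a repeated key in col1.
df = pd.DataFrame({"col1": [2, 1, 2, 1], "col2": [10, 30, 20, 40]})

# arrange(df, col1, col2): ascending on both keys, col1 first.
asc = df.sort_values(["col1", "col2"])

# arrange(df, desc(col1)): descending on a single key.
desc = df.sort_values("col1", ascending=False)

# Mixed directions, e.g. arrange(df, col1, desc(col2)): one flag per key.
mixed = df.sort_values(["col1", "col2"], ascending=[True, False])

print(asc["col2"].tolist())    # [30, 40, 10, 20]
print(mixed["col2"].tolist())  # [40, 30, 20, 10]
```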

Transforming
~~~~~~~~~~~~

=========================================== ===========================================
R                                           pandas
=========================================== ===========================================
``select(df, col_one = col1)``              ``df.rename(columns={'col1': 'col_one'})[['col_one']]``
``rename(df, col_one = col1)``              ``df.rename(columns={'col1': 'col_one'})``
``mutate(df, c=a-b)``                       ``df.assign(c=df.a-df.b)``
=========================================== ===========================================

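The transforming rows can be tried on a hypothetical frame with columns ``col1``, ``a`` and ``b``. Both ``rename`` and ``assign`` return new frames rather than modifying ``df``. ``assign`` also accepts a callable that receives the frame, which is convenient inside method chains.

```python
import pandas as pd

# Hypothetical frame; "a" and "b" are the operands for mutate().
df = pd.DataFrame({"col1": [1, 2], "a": [5, 7], "b": [2, 3]})

# rename(df, col_one = col1): returns a new frame and leaves df unchanged.
renamed = df.rename(columns={"col1": "col_one"})

# select(df, col_one = col1): double brackets keep a one-column DataFrame;
# single brackets would return a Series instead.
selected = df.rename(columns={"col1": "col_one"})[["col_one"]]

# mutate(df, c=a-b): assign() also returns a new frame.
mutated = df.assign(c=df.a - df.b)

# The same column computed from a callable that receives the frame.
chained = df.assign(c=lambda d: d.a - d.b)

print(mutated["c"].tolist())  # [3, 4]
```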

Grouping and Summarizing
~~~~~~~~~~~~~~~~~~~~~~~~

============================================== ===========================================
R                                              pandas
============================================== ===========================================
``summary(df)``                                ``df.describe()``
``gdf <- group_by(df, col1)``                  ``gdf = df.groupby('col1')``
``summarise(gdf, avg=mean(col2, na.rm=TRUE))`` ``df.groupby('col1').agg({'col2': 'mean'})``
``summarise(gdf, total=sum(col2))``            ``df.groupby('col1').agg({'col2': 'sum'})``
============================================== ===========================================

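The sketch below applies the grouping rows to a hypothetical frame with one missing value. In pandas, grouped ``mean`` and ``sum`` skip ``NaN`` by default, so the table needs no counterpart to ``na.rm=TRUE``. The named-aggregation form at the end is an alternative that the table does not list. It uses keyword arguments of ``(column, function)`` tuples, available since pandas 0.25, and lets you name the output columns as R's ``avg=`` and ``total=`` do.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: two groups, one missing value in group "x".
df = pd.DataFrame({
    "col1": ["x", "x", "y", "y"],
    "col2": [1.0, np.nan, 3.0, 5.0],
})

gdf = df.groupby("col1")

# summarise(gdf, avg=mean(col2, na.rm=TRUE)): NaN is skipped by default.
avg = gdf.agg({"col2": "mean"})

# summarise(gdf, total=sum(col2))
total = gdf.agg({"col2": "sum"})

# Named aggregation: output column names are chosen by the keywords.
named = gdf.agg(avg=("col2", "mean"), total=("col2", "sum"))
print(named)
```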

Base R
------
