@@ -46,9 +46,14 @@ The categorical data type is useful in the following cases:
4646
4747See also the :ref:`API docs on categoricals<api.categorical>`.
4848
49+ .. _categorical.objectcreation:
50+
4951Object Creation
5052---------------
5153
54+ Series Creation
55+ ~~~~~~~~~~~~~~~
56+
5257Categorical ``Series`` or columns in a ``DataFrame`` can be created in several ways:
5358
5459By specifying ``dtype="category"`` when constructing a ``Series``:
@@ -77,7 +82,7 @@ discrete bins. See the :ref:`example on tiling <reshaping.tile.cut>` in the docs
7782 df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, labels=labels)
7883 df.head(10)
7984
80- By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to a `DataFrame`.
85+ By passing a :class:`pandas.Categorical` object to a ``Series`` or assigning it to a ``DataFrame``.
8186
8287.. ipython:: python
8388
@@ -89,6 +94,55 @@ By passing a :class:`pandas.Categorical` object to a `Series` or assigning it to
8994 df["B"] = raw_cat
9095 df
9196
97+ Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
98+
99+ .. ipython:: python
100+
101+ df.dtypes
102+
103+ DataFrame Creation
104+ ~~~~~~~~~~~~~~~~~~
105+
106+ Similar to the previous section where a single column was converted to categorical, all columns in a
107+ ``DataFrame`` can be batch converted to categorical either during or after construction.
108+
109+ This can be done during construction by specifying ``dtype="category"`` in the ``DataFrame`` constructor:
110+
111+ .. ipython:: python
112+
113+ df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')}, dtype="category")
114+ df.dtypes
115+
116+ Note that the categories present in each column differ; the conversion is done column by column, so
117+ only labels present in a given column are categories:
118+
119+ .. ipython:: python
120+
121+ df['A']
122+ df['B']
123+
124+
125+ .. versionadded:: 0.23.0
126+
127+ Analogously, all columns in an existing ``DataFrame`` can be batch converted using :meth:`DataFrame.astype`:
128+
129+ .. ipython:: python
130+
131+ df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})
132+ df_cat = df.astype('category')
133+ df_cat.dtypes
134+
135+ This conversion is likewise done column by column:
136+
137+ .. ipython:: python
138+
139+ df_cat['A']
140+ df_cat['B']
141+
142+
143+ Controlling Behavior
144+ ~~~~~~~~~~~~~~~~~~~~
145+
92146In the examples above where we passed ``dtype='category'``, we used the default
93147behavior:
94148
@@ -108,21 +162,36 @@ of :class:`~pandas.api.types.CategoricalDtype`.
108162 s_cat = s.astype(cat_type)
109163 s_cat
110164
111- Categorical data has a specific ``category`` :ref:`dtype <basics.dtypes>`:
165+ Similarly, a ``CategoricalDtype`` can be used with a ``DataFrame`` to ensure that categories
166+ are consistent among all columns.
112167
113168.. ipython:: python
114169
115- df.dtypes
170+ df = pd.DataFrame({'A': list('abca'), 'B': list('bccd')})
171+ cat_type = CategoricalDtype(categories=list('abcd'),
172+                             ordered=True)
173+ df_cat = df.astype(cat_type)
174+ df_cat['A']
175+ df_cat['B']
116176
117177 .. note::
118178
119- In contrast to R's `factor` function, categorical data is not converting input values to
120- strings and categories will end up the same data type as the original values.
179+ To perform table-wise conversion, where all labels in the entire ``DataFrame`` are used as
180+ categories for each column, the ``categories`` parameter can be determined programmatically by
181+ ``categories = pd.unique(df.values.ravel())``.
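The table-wise conversion described in the note above can be sketched as follows. This is a minimal illustration, assuming the same two-column sample frame used earlier; the names ``all_labels`` and ``table_dtype`` are hypothetical and chosen only for this example.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

df = pd.DataFrame({"A": list("abca"), "B": list("bccd")})

# Collect every label that occurs anywhere in the frame.
all_labels = pd.unique(df.values.ravel())

# Use the combined labels as the categories for every column.
table_dtype = CategoricalDtype(categories=all_labels)
df_cat = df.astype(table_dtype)

# Both columns now share one set of categories, even though
# neither column contains every label on its own.
print(df_cat["A"].cat.categories)
print(df_cat["B"].cat.categories)
```

With a shared dtype, labels missing from a column are still listed as categories of that column; they simply have no occurrences.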
121182
122- .. note::
183+ If you already have ``codes`` and ``categories``, you can use the
184+ :func:`~pandas.Categorical.from_codes` constructor to save the factorize step
185+ during normal constructor mode:
123186
124- In contrast to R's `factor` function, there is currently no way to assign/change labels at
125- creation time. Use `categories` to change the categories after creation time.
187+ .. ipython:: python
188+
189+ splitter = np.random.choice([0, 1], 5, p=[0.5, 0.5])
190+ s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
191+
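Because the example above draws its codes at random, a deterministic sketch may make the mapping easier to follow. The ``codes`` list here is made up for illustration; each integer is a position in ``categories``.

```python
import pandas as pd

# Each code is an index into the categories list: 0 -> "train", 1 -> "test".
codes = [0, 1, 1, 0, 1]
cat = pd.Categorical.from_codes(codes, categories=["train", "test"])
s = pd.Series(cat)

print(s.tolist())
print(s.cat.codes.tolist())
```

The stored codes are the ones passed in, so no factorization of the labels is needed.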
192+
193+ Regaining Original Data
194+ ~~~~~~~~~~~~~~~~~~~~~~~
126195
127196To get back to the original ``Series`` or NumPy array, use
128197``Series.astype(original_dtype)`` or ``np.asarray(categorical)``:
@@ -136,14 +205,15 @@ To get back to the original ``Series`` or NumPy array, use
136205 s2.astype(str)
137206 np.asarray(s2)
138207
139- If you already have `codes` and `categories`, you can use the
140- :func:`~pandas.Categorical.from_codes` constructor to save the factorize step
141- during normal constructor mode:
208+ .. note::
142209
143- .. ipython:: python
210+ In contrast to R's `factor` function, categorical data does not convert input values to
211+ strings; categories will end up the same data type as the original values.
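A small sketch of the point in this note, assuming an integer-valued ``Series`` (the values and the name ``s_int`` are made up for illustration): the categories stay integers and are not turned into strings.

```python
import pandas as pd

s_int = pd.Series([3, 1, 2, 3], dtype="category")

# The categories keep the integer type of the input values.
print(s_int.cat.categories)
print(s_int.cat.categories.dtype)

# Converting back therefore needs no string parsing.
restored = s_int.astype("int64")
print(restored.tolist())
```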
144212
145- splitter = np.random.choice([0, 1], 5, p=[0.5, 0.5])
146- s = pd.Series(pd.Categorical.from_codes(splitter, categories=["train", "test"]))
213+ .. note::
214+
215+ In contrast to R's `factor` function, there is currently no way to assign/change labels at
216+ creation time. Use `categories` to change the categories after creation time.
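A minimal sketch of changing labels after creation. It uses ``Series.cat.rename_categories`` rather than assigning to ``categories`` directly, which is an assumption about the preferred spelling; the sample values and the new label names are made up for illustration.

```python
import pandas as pd

s = pd.Series(["a", "b", "a"], dtype="category")

# Map old labels to new ones; the underlying codes are unchanged.
s_renamed = s.cat.rename_categories({"a": "low", "b": "high"})

print(s_renamed.tolist())
print(list(s_renamed.cat.categories))
```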
147217
148218.. _categorical.categoricaldtype:
149219