@@ -41,15 +41,37 @@ So that a ``pandas.DataFrame`` can be faithfully reconstructed, we store a
4141 'pandas_version': $VERSION}
4242
4343 Here, ``<c0>``/``<ci0>`` and so forth are dictionaries containing the metadata
44- for each column. This has JSON form:
44+ for each column, *including the index columns*. This has JSON form:
4545
4646.. code-block:: text
4747
4848 {'name': column_name,
49+ 'field_name': parquet_column_name,
4950 'pandas_type': pandas_type,
5051 'numpy_type': numpy_type,
5152 'metadata': metadata}
5253
54+ .. note::
55+
56+ Every index column is stored with a name matching the pattern
57+ ``__index_level_\d+__``, and its corresponding column information can be
58+ found with the following code snippet.
59+
60+ Following this naming convention isn't strictly necessary, but it is
61+ strongly suggested for compatibility with Arrow.
62+
63+ Here's an example of how the index metadata is structured in pyarrow:
64+
65+ .. code-block:: python
66+
67+ # assuming there are at least 3 levels in the index
68+ index_columns = metadata['index_columns']
69+ columns = metadata['columns']
70+ ith_index = 2
71+ assert index_columns[ith_index] == '__index_level_2__'
72+ ith_index_info = columns[-len(index_columns):][ith_index]
73+ ith_index_level_name = ith_index_info['name']
74+
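As a complement to the positional lookup above, the following is a minimal, self-contained sketch that finds the index entries by matching ``field_name`` against ``index_columns`` instead of relying on their position at the end of ``columns``. The ``metadata`` dictionary is a hypothetical, hand-written value shaped like the examples in this document, and ``index_column_info`` and ``INDEX_FIELD_PATTERN`` are illustrative names, not part of pandas or pyarrow.

```python
import re

# Hypothetical metadata, hand-written to mirror the structure described in
# this document; it is not the output of a real Parquet writer.
metadata = {
    'index_columns': ['__index_level_0__'],
    'columns': [
        {'name': 'c0', 'field_name': 'c0', 'pandas_type': 'int8',
         'numpy_type': 'int8', 'metadata': None},
        {'name': None, 'field_name': '__index_level_0__',
         'pandas_type': 'int64', 'numpy_type': 'int64', 'metadata': None},
    ],
}

# The suggested naming convention for index columns.
INDEX_FIELD_PATTERN = re.compile(r'__index_level_\d+__')


def index_column_info(metadata):
    """Return the column entries for the index levels, in index order."""
    by_field_name = {col['field_name']: col for col in metadata['columns']}
    return [by_field_name[field] for field in metadata['index_columns']]


index_info = index_column_info(metadata)

# The pandas-level name of each index level (None for an unnamed index).
index_level_names = [col['name'] for col in index_info]

# Check whether every index column follows the suggested convention.
follows_convention = all(
    INDEX_FIELD_PATTERN.fullmatch(col['field_name']) for col in index_info
)
```

Looking entries up by ``field_name`` assumes that every name listed in ``index_columns`` also appears as a ``field_name`` in ``columns``, which holds for the examples shown here but is an assumption of this sketch.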
5375 ``pandas_type`` is the logical type of the column, and is one of:
5476
5577* Boolean: ``'bool'``
@@ -100,32 +122,39 @@ As an example of fully-formed metadata:
100122 {'index_columns': ['__index_level_0__'],
101123 'column_indexes': [
102124 {'name': None,
103- 'pandas_type': 'string',
125+ 'field_name': 'None',
126+ 'pandas_type': 'unicode',
104127 'numpy_type': 'object',
105- 'metadata': None}
128+ 'metadata': {'encoding': 'UTF-8'}}
106129 ],
107130 'columns': [
108131 {'name': 'c0',
132+ 'field_name': 'c0',
109133 'pandas_type': 'int8',
110134 'numpy_type': 'int8',
111135 'metadata': None},
112136 {'name': 'c1',
137+ 'field_name': 'c1',
113138 'pandas_type': 'bytes',
114139 'numpy_type': 'object',
115140 'metadata': None},
116141 {'name': 'c2',
142+ 'field_name': 'c2',
117143 'pandas_type': 'categorical',
118144 'numpy_type': 'int16',
119145 'metadata': {'num_categories': 1000, 'ordered': False}},
120146 {'name': 'c3',
147+ 'field_name': 'c3',
121148 'pandas_type': 'datetimetz',
122149 'numpy_type': 'datetime64[ns]',
123150 'metadata': {'timezone': 'America/Los_Angeles'}},
124151 {'name': 'c4',
152+ 'field_name': 'c4',
125153 'pandas_type': 'object',
126154 'numpy_type': 'object',
127155 'metadata': {'encoding': 'pickle'}},
128- {'name': '__index_level_0__',
156+ {'name': None,
157+ 'field_name': '__index_level_0__',
129158 'pandas_type': 'int64',
130159 'numpy_type': 'int64',
131160 'metadata': None}