Apache Parquet Data Type Mappings
MATLAB® represents column-oriented data with tables and timetables. Each variable in a table or timetable can have a different data type and any number of columns. Column vectors are the most common shape of table and timetable variables.
The Apache® Parquet file format is used for column-oriented heterogeneous data. Similar to MATLAB tables and timetables, each of the columns in a Parquet file can have different data types. The MATLAB Parquet functions use Apache Arrow functionality to read and write Parquet files. MATLAB stores the original Arrow table schema in the Parquet file as custom metadata. Arrow uses the original table schema to roundtrip certain data types.
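As a minimal sketch of how to check what MATLAB wrote to a file, the following example saves a small table with parquetwrite and inspects it with parquetinfo. The file name, variable names, and data are placeholders chosen for illustration, and the type names reported can vary by release.

```matlab
% Build a small table with heterogeneous column types (illustrative data).
T = table([1.5; 2.5; 3.5], int32([10; 20; 30]), ["a"; "b"; "c"], ...
    'VariableNames', {'Price', 'Count', 'Label'});

% Write to a temporary Parquet file (hypothetical file name).
filename = fullfile(tempdir, 'example_types.parquet');
parquetwrite(filename, T);

% Inspect the column names and the MATLAB types they are read as.
info = parquetinfo(filename);
disp(info.VariableNames)
disp(info.VariableTypes)

% Read the data back and list the class of each variable.
T2 = parquetread(filename);
disp(varfun(@class, T2, 'OutputFormat', 'cell'))
```

Comparing the classes of T2 with those of T is a quick way to see which columns round-trip unchanged and which are converted.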
Despite their similarity, the permitted data types in MATLAB tables and timetables do not always map exactly to the permitted data types in Parquet files. In some cases, MATLAB must convert data types to retain information in the data (such as missing values). This conversion can sometimes result in a loss of precision in the data.
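To make the idea of precision loss concrete, here is a generic sketch of what can happen when a 64-bit integer is converted to double. It does not use Parquet files and is not a statement about which Parquet columns MATLAB converts; it only illustrates the kind of loss that a type conversion can introduce.

```matlab
% An int64 value just above 2^53. Beyond this point, double can no
% longer represent every integer exactly.
x = bitshift(int64(1), 53) + 1;

% Converting to double rounds to the nearest representable value.
y = double(x);

% Converting back does not recover the original integer.
roundTripped = int64(y);
disp(isequal(roundTripped, x))
```

When exact integer values matter, comparing the data before and after a round trip, as in the last line, is a simple way to detect this.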
In general, MATLAB tables and timetables have these behaviors when they are converted to Parquet files:
Table properties set on the original table are not saved.
Table row names or timetable row times are converted into a new table variable before being written.
When reading a variable name from a Parquet file, invalid table variable names are converted to valid table variable names.
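The behaviors listed above can be observed with a short round trip. This is a sketch under stated assumptions: the table contents, the Description text, and the file name are made up for illustration, and the name given to the variable that holds the row names is not assumed here, so the example only displays the variable lists for comparison.

```matlab
% Table with row names and a table-level property (illustrative data).
T = table([36; 42], [71; 69], 'VariableNames', {'Age', 'Height'}, ...
    'RowNames', {'Smith', 'Jones'});
T.Properties.Description = 'Patient data';

filename = fullfile(tempdir, 'example_rownames.parquet');  % hypothetical name
parquetwrite(filename, T);
T2 = parquetread(filename);

% Compare the variable lists. Per the behavior described above, the row
% names are expected to appear as an additional variable after reading.
disp(T.Properties.VariableNames)
disp(T2.Properties.VariableNames)

% Table properties such as Description are not expected to be preserved.
disp(isempty(T2.Properties.Description))
```

If row names or table properties need to be preserved, one option is to store them separately and reapply them after reading.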
Parquet files use a small number of primitive (or physical) data types. Logical types extend the physical types by specifying how they should be interpreted. Parquet data types not covered here (JSON, BSON, binary, and so on) are not supported for reading from or writing to Parquet files.
The following tables summarize the representable data types in MATLAB tables and timetables, as well as how they map to corresponding types in Apache Arrow and Parquet files.
Numeric Data Types
Reading Numeric Data Types from Apache Parquet to MATLAB
Apache Parquet Logical Type | Apache Parquet Physical Type | Apache Arrow Data Type | MATLAB Table or Timetable Variable Type | Notes
---|---|---|---|---
None |
|
|
|
|
None |
|
|
|
|
|
|
|
|
|
|
|
|
| |
|
|
|
| |
|
|
|
| |
None |
|
|
| |
|
|
|
| |
|
|
|
| |
|
|
|
| |
None |
|
|
|
|
Writing Numeric Data Types from MATLAB to Apache Parquet
MATLAB Table or Timetable Variable Type | Apache Arrow Data Type | Apache Parquet Logical Type | Apache Parquet Physical Type | Notes
---|---|---|---|---
double |
| None |
|
|
|
| None |
| |
|
|
|
|
|
|
|
|
| |
|
|
|
| |
|
|
|
| |
|
| None |
| |
|
|
|
| |
|
|
|
| |
|
|
|
| |
|
| None |
|
|
Binary Data Types
Reading Binary Data Types from Apache Parquet to MATLAB
Apache Parquet Logical Type | Apache Parquet Physical Type | Apache Arrow Data Type | MATLAB Table or Timetable Variable Type | Notes
---|---|---|---|---
|
|
|
| — |
|
|
|
|
|
None |
| FixedSizeBinary(byte_width) | cell of | — |
None |
| Binary | cell of | — |
Writing Binary Data Types from MATLAB to Apache Parquet
MATLAB Table or Timetable Variable Type | Apache Arrow Data Type | Apache Parquet Logical Type | Apache Parquet Physical Type | Notes
---|---|---|---|---
|
|
|
|
|
|
|
|
| |
|
|
|
| |
|
|
|
| — |
Date and Time Data Types
Reading Date and Time Data Types from Apache Parquet to MATLAB
Apache Parquet Logical Type | Apache Parquet Physical Type | Apache Arrow Data Type | MATLAB Table or Timetable Variable Type | Notes
---|---|---|---|---
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| INT64 |
|
|
|
Writing Date and Time Data Types from MATLAB to Apache Parquet
MATLAB Table or Timetable Variable Type | Apache Arrow Data Type | Apache Parquet Logical Type | Apache Parquet Physical Type | Notes
---|---|---|---|---
|
|
|
|
|
|
|
|
|
|
Nested Data
To write nested tables and nested timetables to Parquet files, use parquetwrite. To import nested structured Parquet file data, use parquetread.
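As a minimal sketch of a nested round trip, the following example writes a table that contains a nested table variable and reads it back. It assumes a MATLAB release in which parquetwrite and parquetread support nested data. The variable names, data, and file name are illustrative only.

```matlab
% Inner table that becomes a nested variable (illustrative data).
details = table([1.1; 2.2], ["x"; "y"], 'VariableNames', {'Value', 'Tag'});

% Outer table with one ordinary variable and one nested table variable.
T = table([100; 200], details, 'VariableNames', {'Id', 'Details'});

filename = fullfile(tempdir, 'example_nested.parquet');  % hypothetical name
parquetwrite(filename, T);

% Read the file back and inspect the class of the nested variable.
T2 = parquetread(filename);
disp(class(T2.Details))
```

Displaying the class of the nested variable after reading shows how the nested structure was mapped back into MATLAB.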
Reading Nested Types from Apache Parquet to MATLAB
Apache Parquet Logical Type | Apache Parquet Physical Type | Apache Arrow Data Type | MATLAB Table or Timetable Variable Type | Notes
---|---|---|---|---
 | Any | | cell |
LIST with n-tuple organization | Any | | nested table |
 | Any | | cell array of tables |
Writing Nested Types from MATLAB to Apache Parquet
MATLAB Table or Timetable Variable Type | Apache Arrow Data Type | Apache Parquet Logical Type | Apache Parquet Physical Type | Notes
---|---|---|---|---
cell | | | Any |
nested table | | | Any |
nested timetable | | | Any |