Does MATLAB support Parquet partitions?

Jerry Duggan on 30 Dec 2022
Answered: Sudarshan on 2 Jan 2023
I have a large data set written using parquet partitioning. The partition variable is called 'mdRun', and I have 10 parquet files created in 10 directories as follows:
.../events/mdRun=0/events-0.parquet
.../events/mdRun=1/events-0.parquet
and so on. I created these files using pyarrow Hive partitioning.
Using pyarrow, I can read the data for a single partition with the filter argument, which reads only the Parquet file stored in the matching directory. As a nice side effect, the mdRun column is not stored in the Parquet file itself, but it is automatically included when I read one or more partition files.
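To make that behaviour concrete, here is a minimal sketch of the path logic behind Hive partitioning, using only the Python standard library. It is not pyarrow's implementation: the helper names `parse_partition_keys` and `select_partition_files` are hypothetical, partition values are kept as strings, and actually reading the Parquet data is left out.

```python
from pathlib import Path


def parse_partition_keys(path, root):
    """Return {key: value} parsed from `key=value` directory names below root.

    Hypothetical helper: the file name itself is skipped, and directory
    names without '=' are ignored.
    """
    keys = {}
    for part in Path(path).relative_to(root).parts[:-1]:
        key, sep, value = part.partition("=")
        if sep:
            keys[key] = value
    return keys


def select_partition_files(root, **wanted):
    """List (file, partition_keys) pairs whose partition values match `wanted`.

    Hypothetical helper: files are selected from their directory names alone,
    so non-matching Parquet files are never opened.
    """
    selected = []
    for file in sorted(Path(root).rglob("*.parquet")):
        keys = parse_partition_keys(file, root)
        if all(keys.get(k) == str(v) for k, v in wanted.items()):
            selected.append((file, keys))
    return selected
```

Called on the events directory with `mdRun=1`, `select_partition_files` would return only the file under `mdRun=1` together with its parsed partition key, which a reader could then attach to the loaded rows as the mdRun column.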
Is it possible to read a partitioned Parquet dataset in MATLAB in the same way?
Thank you!

Answers (1)

Sudarshan on 2 Jan 2023
Hi Jerry,
As far as I know, this feature is not supported in MATLAB R2022b. The request has already been forwarded to the relevant team.
However, MATLAB R2022b does support reading and writing Parquet files. I have attached a few documentation links that may help you work with the Parquet functions.
You can refer to the link below for various functions that could be useful in your case:
You can refer to the link below for detailed documentation of the data type mappings:
To help you read Parquet files, you can refer to the link below:
I hope that this helps!

Version

R2022b
