Reading Parquet Key/Value Metadata
Is there a recommended method for reading the key/value metadata pairs in the footer of a Parquet file?
I'm using MATLAB to extract JSON metadata that was written with Python's pyarrow. The footer layout is documented at https://github.com/apache/parquet-format#file-format. The following code works, but I would prefer a more robust solution.
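function metadata = parquetmeta(filename)
    % Extract the JSON object embedded in a Parquet file footer.
    fid = fopen(filename, 'r');
    % The file ends with: <footer> <4-byte little-endian footer length> 'PAR1'
    fseek(fid, -8, 'eof');
    footer_bytes = fread(fid, 1, 'uint32', 'ieee-le');
    % Step back over the footer plus the 8-byte trailer to reach the footer start
    fseek(fid, -(footer_bytes + 8), 'eof');
    % Read raw bytes as characters so no text encoding is applied to binary data
    footer = fread(fid, [1, footer_bytes], 'uint8=>char');
    fclose(fid);
    % Heuristic: treat everything from the first '{' to the last '}' as JSON
    start_idx = find(footer == '{', 1, 'first');
    end_idx = find(footer == '}', 1, 'last');
    metadata = struct;
    if ~isempty(start_idx) && ~isempty(end_idx)
        try %#ok<TRYNC>
            % Colon indexing selects the span between the two braces
            metadata = jsondecode(footer(start_idx:end_idx));
        end
    end
end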
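For what it's worth, one way to make the footer read a little more defensive is to check the trailer before trusting it. Per the layout linked above, the file ends with the footer, then a 4-byte little-endian footer length, then the 4-byte magic `PAR1`. The sketch below is only that, a sketch: `readParquetFooter` and its error identifiers are hypothetical names of my own, not a MathWorks API, and it returns the raw footer bytes without decoding the Thrift structure.

```matlab
function footer = readParquetFooter(filename)
% readParquetFooter  Return the raw footer bytes of a Parquet file (sketch).
% Hypothetical helper, not part of MATLAB.
fid = fopen(filename, 'r');
if fid < 0
    error('readParquetFooter:open', 'Cannot open %s', filename);
end
closer = onCleanup(@() fclose(fid)); % close the file even if an error occurs

% Trailer: 4-byte little-endian footer length, then the magic 'PAR1'
if fseek(fid, -8, 'eof') ~= 0
    error('readParquetFooter:tooShort', 'File is too short to be Parquet.');
end
footer_bytes = fread(fid, 1, 'uint32', 0, 'ieee-le');
magic = fread(fid, [1, 4], 'uint8=>char');
if ~strcmp(magic, 'PAR1')
    error('readParquetFooter:magic', 'Missing PAR1 magic at end of file.');
end

% The footer sits immediately before the 8-byte trailer
if fseek(fid, -(footer_bytes + 8), 'eof') ~= 0
    error('readParquetFooter:length', 'Footer length exceeds file size.');
end
footer = fread(fid, [1, footer_bytes], 'uint8=>uint8');
end
```

The result is a `uint8` row vector, so `char(readParquetFooter(filename))` can be searched for the JSON braces the same way `parquetmeta` does. It still relies on scanning for `{` and `}` rather than parsing the footer properly, so it is no substitute for real metadata support.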
I'm hoping MathWorks will add support for reading Parquet file metadata in a future release. Interestingly, it appears this used to be possible before MATLAB 2019 with the calls below, but I cannot verify that on my version of MATLAB.
jobj = com.mathworks.bigdata.parquet.Reader;
md = jobj.getParquetFileReader.getFileMetaData.getKeyValueMetaData;
Thanks!