parquetinfo

Get information about Parquet file

Description

A ParquetInfo object contains information about a Parquet file, such as the file size, variable names, variable types, and compression schemes. To get information about a Parquet file, create a ParquetInfo object using the parquetinfo function.

Creation

Syntax

info = parquetinfo(filename)

Description


info = parquetinfo(filename) returns a ParquetInfo object for the Parquet file specified by filename.

Input Arguments

filename

Name of Parquet file, specified as a character vector or string scalar. ParquetInfo works with Parquet 1.0 or Parquet 2.0 files.

Depending on the location of the file, filename can take on one of these forms.

Location

Form

Current folder or folder on the MATLAB® path

Specify the name of the file in filename.

Example: 'data.parquet'

File in a folder

If the file is not in the current folder or in a folder on the MATLAB path, then specify the full or relative path name.

Example: 'C:\myFolder\data.parquet'

Example: 'myDir\myFile.ext'

Remote Location

If the file is stored at a remote location, then filename must contain the full path of the file specified as an internationalized resource identifier (IRI) of the form:

scheme_name://path_to_file/my_file.ext

Based on your remote location, scheme_name can be one of the values in this table.

Remote Location                 scheme_name
Amazon S3™                      s3
Windows Azure® Blob Storage     wasb, wasbs
HDFS™                           hdfs

For more information, see Work with Remote Data.

Example: 's3://bucketname/path_to_file/data.parquet'

Data Types: char | string
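
As a rough sketch of the remote form described above, the following reads metadata from an Amazon S3 location. It assumes that credentials are supplied through the standard AWS environment variables. The key values, the region, and the bucket path are all placeholders rather than working values, so adapt them to your own account before running.

```matlab
% Hypothetical credentials and region: replace with your own values.
setenv('AWS_ACCESS_KEY_ID', 'YOUR_ACCESS_KEY_ID');
setenv('AWS_SECRET_ACCESS_KEY', 'YOUR_SECRET_ACCESS_KEY');
setenv('AWS_DEFAULT_REGION', 'us-east-1');

% Placeholder IRI in the scheme_name://path_to_file/my_file.ext form;
% it does not point to a real file.
remoteFile = 's3://bucketname/path_to_file/data.parquet';

% The call itself is the same as for a local file.
info = parquetinfo(remoteFile);
```

See Work with Remote Data for the setup that each remote location requires.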

Properties


Filename

This property is read-only.

Absolute path to Parquet file, specified as a string scalar.

Data Types: string

FileSize

This property is read-only.

File size in bytes, specified as a double.

Data Types: double

NumRowGroups

This property is read-only.

Number of row groups, specified as a double.

Data Types: double

RowGroupHeights

This property is read-only.

Number of rows in each row group, specified as a double.

Data Types: double

VariableNames

This property is read-only.

Variable names, specified as a string array. If the Parquet file contains N variables, then VariableNames is an array of size 1-by-N containing the names of the variables.

Data Types: string

VariableTypes

This property is read-only.

Variable data types, specified as a string array. If the Parquet file contains N variables, then VariableTypes is an array of size 1-by-N containing the data type name for each variable.

Each element in the array is the name of the MATLAB data type to which the corresponding variable in the Parquet file maps.

Data Types: string

VariableCompression

This property is read-only.

Variable compression algorithm, specified as a string array. If the Parquet file contains N variables, then VariableCompression is an array of size 1-by-N containing compression algorithm names.

Each element in the array corresponds to the compression algorithm used to compress that variable in the Parquet file.
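
Because VariableNames, VariableTypes, and VariableCompression are all 1-by-N string arrays in the same variable order, they can be combined into a single per-variable summary. The following is a minimal sketch that assumes the outages.parquet example file that ships with MATLAB. The names summary and totalRows, and the column labels Name, Type, and Compression, are illustrative choices and not part of the ParquetInfo object.

```matlab
% Gather file metadata from the example file.
info = parquetinfo('outages.parquet');

% One row per variable: name, MATLAB data type, and compression algorithm.
% Each 1-by-N string array is transposed into an N-by-1 column.
summary = table(info.VariableNames.', info.VariableTypes.', ...
    info.VariableCompression.', ...
    'VariableNames', {'Name', 'Type', 'Compression'});
disp(summary)

% The total number of rows in the file is the sum over all row groups.
totalRows = sum(info.RowGroupHeights);
fprintf('%d rows in %d row group(s), %d bytes on disk\n', ...
    totalRows, info.NumRowGroups, info.FileSize);
```

Summing RowGroupHeights gives the row count without reading any of the data itself.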

Examples


Use the parquetinfo function to create a ParquetInfo object containing information about the file outages.parquet.

info = parquetinfo('outages.parquet')
info = 
  ParquetInfo with properties:

               Filename: "/mathworks/devel/bat/Bdoc19a/build/matlab/toolbox/matlab/demos/outages.parquet"
               FileSize: 44202
           NumRowGroups: 1
        RowGroupHeights: 1468
          VariableNames: [1x6 string]
          VariableTypes: [1x6 string]
    VariableCompression: [1x6 string]

Display the name, type, and compression scheme for the third variable in the file.

disp([info.VariableNames(3) info.VariableTypes(3) info.VariableCompression(3)])
    "Loss"    "double"    "snappy"
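
The metadata can also decide what to read before any data is loaded. As a sketch, the following uses VariableTypes to pick out the variables that map to double, then passes their names to parquetread through its SelectedVariableNames option. The names isDouble, doubleNames, and T are illustrative.

```matlab
% Inspect the file without reading its data.
info = parquetinfo('outages.parquet');

% Logical mask over the variables whose MATLAB type is double.
isDouble = info.VariableTypes == "double";
doubleNames = info.VariableNames(isDouble);

% Read only the selected variables into a table.
T = parquetread('outages.parquet', 'SelectedVariableNames', doubleNames);
head(T)
```

Selecting variables this way avoids importing columns that are not needed.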

Introduced in R2019a