cudf::io::parquet::ColumnChunkMetaData Struct Reference

Thrift-derived struct describing a column chunk.

#include <parquet_schema.hpp>

Public Attributes

Type type = Type::BOOLEAN
 Type of this column.
 
std::vector< Encoding > encodings
 
std::vector< std::string > path_in_schema
 Path in schema.
 
Compression codec = Compression::UNCOMPRESSED
 Compression codec.
 
int64_t num_values = 0
 Number of values in this column.
 
int64_t total_uncompressed_size = 0
 Total byte size of all uncompressed pages in this column chunk (including the headers).
 
int64_t total_compressed_size = 0
 Total byte size of all compressed pages in this column chunk (including the headers).
 
int64_t data_page_offset = 0
 Byte offset from beginning of file to first data page.
 
int64_t index_page_offset = 0
 Byte offset from beginning of file to root index page.
 
int64_t dictionary_page_offset = 0
 Byte offset from the beginning of file to first (only) dictionary page.
 
Statistics statistics
 Optional statistics for this column chunk.
 
std::optional< std::vector< PageEncodingStats > > encoding_stats
 
std::optional< int64_t > bloom_filter_offset
 Byte offset from beginning of file to Bloom filter data.
 
std::optional< int32_t > bloom_filter_length
 
std::optional< SizeStatistics > size_statistics
 

Detailed Description

Thrift-derived struct describing a column chunk.

Definition at line 668 of file parquet_schema.hpp.
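The pages of a column chunk are written back to back, so the offset and size fields above are enough to fetch an entire chunk with a single read. The sketch below shows one way to compute that byte range. It assumes that a dictionary page, when present, precedes the first data page, that a `dictionary_page_offset` of 0 means "no dictionary page", and that `total_compressed_size` spans every page from that starting point. `ChunkOffsets`, `ByteRange`, and `column_chunk_range` are hypothetical names used only for illustration; they are not part of the cudf API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the offset/size fields of ColumnChunkMetaData used here.
// The real struct in parquet_schema.hpp has many more members.
struct ChunkOffsets {
  int64_t data_page_offset       = 0;
  int64_t dictionary_page_offset = 0;  // assumption: 0 means "no dictionary page"
  int64_t total_compressed_size  = 0;  // all pages, headers included
};

struct ByteRange {
  int64_t offset;
  int64_t size;
};

// Returns the file byte range that holds every page of the column chunk.
// Assumption: a dictionary page, if present, sits before the first data page.
inline ByteRange column_chunk_range(ChunkOffsets const& md)
{
  int64_t start = md.data_page_offset;
  if (md.dictionary_page_offset > 0) {
    start = std::min(start, md.dictionary_page_offset);
  }
  return {start, md.total_compressed_size};
}
```

Taking the minimum of the two offsets, rather than trusting the dictionary offset unconditionally, keeps the sketch tolerant of writers that record the offsets in an unexpected order.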

Member Data Documentation

◆ bloom_filter_length

std::optional<int32_t> cudf::io::parquet::ColumnChunkMetaData::bloom_filter_length

Size of the Bloom filter data in bytes, including the serialized header. This field was added in format version 2.10, so files from older writers may not contain it; in that case, readers can obtain the size after deserializing the BloomFilterHeader. Writers should set this field so that readers can fetch the Bloom filter in a single I/O.

Definition at line 701 of file parquet_schema.hpp.
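The description above implies two read strategies: one I/O when `bloom_filter_length` is present, and a header-first read when it is absent. The sketch below decides how many bytes to request in the first read. `BloomFilterLocation`, `first_read_size`, and the `speculative_prefix` parameter are hypothetical; choosing a prefix large enough to contain the serialized `BloomFilterHeader` is left to the caller and is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical mirror of the two Bloom filter fields of ColumnChunkMetaData.
struct BloomFilterLocation {
  std::optional<int64_t> bloom_filter_offset;
  std::optional<int32_t> bloom_filter_length;
};

// Number of bytes to read from bloom_filter_offset in the first I/O:
// - std::nullopt: the column chunk has no Bloom filter.
// - length present: a single read fetches the header and the bitset.
// - length absent (older writers): read a speculative prefix, deserialize the
//   BloomFilterHeader from it, then issue a second read for the remaining bytes.
inline std::optional<int64_t> first_read_size(BloomFilterLocation const& loc,
                                              int64_t speculative_prefix)
{
  if (!loc.bloom_filter_offset) { return std::nullopt; }
  if (loc.bloom_filter_length) { return static_cast<int64_t>(*loc.bloom_filter_length); }
  return speculative_prefix;
}
```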

◆ encoding_stats

std::optional<std::vector<PageEncodingStats> > cudf::io::parquet::ColumnChunkMetaData::encoding_stats

Set of all encodings used for pages in this column chunk. This information can be used, for example, to determine whether all data pages are dictionary encoded.

Definition at line 694 of file parquet_schema.hpp.
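As a sketch of the dictionary-encoding check mentioned above, the function below scans the per-page-type statistics and rejects the chunk if any data page (v1 or v2) uses a non-dictionary encoding. The enums and `PageEncodingStats` here are hypothetical, trimmed mirrors of the Parquet Thrift definitions (only a subset of values, with no numeric values implied), and `all_data_pages_dictionary_encoded` is an illustrative name, not cudf API. Because `encoding_stats` is optional, the sketch returns `false` when it is absent, treating the answer as unknown.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, trimmed mirrors of the Parquet Thrift types (subset of values).
enum class PageType { DATA_PAGE, INDEX_PAGE, DICTIONARY_PAGE, DATA_PAGE_V2 };
enum class Encoding { PLAIN, PLAIN_DICTIONARY, RLE, BIT_PACKED, RLE_DICTIONARY };

struct PageEncodingStats {
  PageType page_type;
  Encoding encoding;
  int32_t count;  // number of pages with this (page_type, encoding) pair
};

// True if stats are present and every data page uses a dictionary encoding.
// The dictionary page itself is skipped because it is not a data page.
inline bool all_data_pages_dictionary_encoded(
  std::optional<std::vector<PageEncodingStats>> const& encoding_stats)
{
  if (!encoding_stats) { return false; }  // unknown without the stats
  for (auto const& s : *encoding_stats) {
    bool const is_data =
      s.page_type == PageType::DATA_PAGE || s.page_type == PageType::DATA_PAGE_V2;
    bool const is_dict =
      s.encoding == Encoding::PLAIN_DICTIONARY || s.encoding == Encoding::RLE_DICTIONARY;
    if (is_data && s.count > 0 && !is_dict) { return false; }
  }
  return true;
}
```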

◆ encodings

std::vector<Encoding> cudf::io::parquet::ColumnChunkMetaData::encodings

Set of all encodings used for this column. It lets a reader check up front whether it can decode the column's pages.

Definition at line 673 of file parquet_schema.hpp.
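The validation described above amounts to a subset check: every entry of `encodings` must appear in the set of encodings the reader supports. A minimal sketch follows. The `Encoding` enum is a hypothetical subset mirror of the Parquet enum, `can_decode` is an illustrative name, and the supported set passed in the usage is an example only; it does not describe which encodings cudf actually supports.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical subset mirror of the Parquet Encoding enum.
enum class Encoding {
  PLAIN, PLAIN_DICTIONARY, RLE, BIT_PACKED,
  DELTA_BINARY_PACKED, RLE_DICTIONARY, BYTE_STREAM_SPLIT
};

// True if every encoding listed for the column is in the reader's supported set.
inline bool can_decode(std::vector<Encoding> const& encodings,
                       std::vector<Encoding> const& supported)
{
  return std::all_of(encodings.begin(), encodings.end(), [&](Encoding e) {
    return std::find(supported.begin(), supported.end(), e) != supported.end();
  });
}
```

A linear search is adequate here because both lists are tiny; a bitmask over enum values would be an equally valid choice.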

◆ size_statistics

std::optional<SizeStatistics> cudf::io::parquet::ColumnChunkMetaData::size_statistics

Optional statistics that help estimate the total memory needed once the data is converted to an in-memory representation. The histograms they contain can also enable finer-grained filter pushdown on nullability and list length in some cases.

Definition at line 705 of file parquet_schema.hpp.
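To illustrate both uses, the sketch below reads a null count from the definition-level histogram and produces a rough memory estimate for a string column. It assumes the caller already knows the column is flat (non-nested) and nullable, so its maximum definition level is 1: bucket 0 counts nulls and bucket 1 counts non-null values. The memory estimate assumes an Arrow-style layout with 32-bit offsets and a 1-bit-per-row validity mask with no padding; real allocators (including cudf's) may pad differently. `SizeStatistics` here is a hypothetical mirror of the Parquet Thrift struct, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>
#include <optional>
#include <vector>

// Hypothetical mirror of the Parquet SizeStatistics fields.
struct SizeStatistics {
  std::optional<int64_t> unencoded_byte_array_data_bytes;
  std::optional<std::vector<int64_t>> repetition_level_histogram;
  std::optional<std::vector<int64_t>> definition_level_histogram;
};

// Flat nullable column (max definition level 1): histogram[0] counts nulls.
inline std::optional<int64_t> flat_null_count(SizeStatistics const& s)
{
  auto const& h = s.definition_level_histogram;
  if (!h || h->size() != 2) { return std::nullopt; }
  return (*h)[0];
}

// Rough bytes for a flat nullable string column in an assumed Arrow-style layout:
// character data + (rows + 1) 32-bit offsets + 1-bit validity per row.
inline std::optional<int64_t> estimate_flat_string_bytes(SizeStatistics const& s)
{
  auto const& h = s.definition_level_histogram;
  if (!s.unencoded_byte_array_data_bytes || !h) { return std::nullopt; }
  int64_t const rows    = std::accumulate(h->begin(), h->end(), int64_t{0});
  int64_t const offsets = (rows + 1) * static_cast<int64_t>(sizeof(int32_t));
  int64_t const mask    = (rows + 7) / 8;
  return *s.unencoded_byte_array_data_bytes + offsets + mask;
}
```

For example, a histogram of `{25, 75}` with 1000 bytes of unencoded string data gives 25 nulls and an estimate of 1000 + 101 * 4 + 13 = 1417 bytes.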

