Thrift-derived struct describing a column chunk.

#include <parquet_schema.hpp>

Public Attributes
Type	type = Type::BOOLEAN
	Type of this column.

std::vector< Encoding >	encodings

std::vector< std::string >	path_in_schema
	Path in schema.

Compression	codec = Compression::UNCOMPRESSED
	Compression codec.

int64_t	num_values = 0
	Number of values in this column.

int64_t	total_uncompressed_size = 0
	Total byte size of all uncompressed pages in this column chunk (including the headers).

int64_t	total_compressed_size = 0
	Total byte size of all compressed pages in this column chunk (including the headers).

int64_t	data_page_offset = 0
	Byte offset from beginning of file to first data page.

int64_t	index_page_offset = 0
	Byte offset from beginning of file to root index page.

int64_t	dictionary_page_offset = 0
	Byte offset from the beginning of file to first (only) dictionary page.

Statistics	statistics
	Optional statistics for this column chunk.

std::optional< std::vector< PageEncodingStats > >	encoding_stats

std::optional< int64_t >	bloom_filter_offset
	Byte offset from beginning of file to Bloom filter data.

std::optional< int32_t >	bloom_filter_length

std::optional< SizeStatistics >	size_statistics
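The offset and size fields above are typically combined to find where a column chunk sits in the file. The sketch below uses a hypothetical mirror struct `ChunkOffsets` (not the cudf type) and a hypothetical helper `chunk_byte_range`. It assumes that a `dictionary_page_offset` of 0 means "no dictionary page", that the chunk starts at the dictionary page when one precedes the first data page, and that `total_compressed_size` spans every page and header of the chunk.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical mirror of the offset/size members of ColumnChunkMetaData;
// the real struct in parquet_schema.hpp has many more fields.
struct ChunkOffsets {
  int64_t data_page_offset       = 0;
  int64_t dictionary_page_offset = 0;  // assumption: 0 means "no dictionary page"
  int64_t total_compressed_size  = 0;  // all pages of the chunk, headers included
};

// Hypothetical helper: returns the [begin, end) byte range of the chunk.
std::pair<int64_t, int64_t> chunk_byte_range(ChunkOffsets const& m)
{
  int64_t begin = m.data_page_offset;
  // If a dictionary page exists, it is written before the data pages.
  if (m.dictionary_page_offset > 0) { begin = std::min(begin, m.dictionary_page_offset); }
  return {begin, begin + m.total_compressed_size};
}
```

Taking the minimum of the two offsets, rather than trusting either one alone, keeps the range correct whether or not the chunk carries a dictionary page.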

Detailed Description

Thrift-derived struct describing a column chunk.

Definition at line 668 of file parquet_schema.hpp.

Member Data Documentation

◆ bloom_filter_length

std::optional<int32_t> cudf::io::parquet::ColumnChunkMetaData::bloom_filter_length

Size of the Bloom filter data in bytes, including the serialized header. This field was added in format version 2.10, so it may be absent from files written by older writers. In that case readers can still obtain the size after the BloomFilterHeader has been deserialized. Writers should set this field so that readers can fetch the Bloom filter in a single I/O.
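The single-I/O read described above can be sketched as a range computation over the two optional fields. `bloom_filter_read_range` is a hypothetical helper, not part of cudf. It returns no range when either field is missing: no offset means the chunk has no Bloom filter, and no length means the reader would first have to deserialize the BloomFilterHeader.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <utility>

// Hypothetical helper: byte range [begin, end) covering header + bitset.
// Returns std::nullopt if there is no filter (no offset) or its length is
// unknown (e.g. files written before 2.10); then a two-step read is needed.
std::optional<std::pair<int64_t, int64_t>> bloom_filter_read_range(
  std::optional<int64_t> bloom_filter_offset, std::optional<int32_t> bloom_filter_length)
{
  if (!bloom_filter_offset.has_value() || !bloom_filter_length.has_value()) {
    return std::nullopt;
  }
  int64_t const begin = *bloom_filter_offset;
  return std::make_pair(begin, begin + int64_t{*bloom_filter_length});
}
```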

Definition at line 701 of file parquet_schema.hpp.

◆ encoding_stats

std::optional<std::vector<PageEncodingStats> > cudf::io::parquet::ColumnChunkMetaData::encoding_stats

Set of all encodings used for pages in this column chunk. This information can be used, for example, to determine whether all data pages are dictionary encoded.
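The dictionary-encoding check mentioned above might look like the following sketch. The enums and the `PageEncodingStats` struct are hypothetical mirrors whose numeric values follow `parquet.thrift`; they are not the cudf definitions. The helper answers "no" when `encoding_stats` is absent, since the question cannot be settled from this field alone.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical mirrors; enum values follow parquet.thrift.
enum class PageType : int32_t { DATA_PAGE = 0, INDEX_PAGE = 1, DICTIONARY_PAGE = 2, DATA_PAGE_V2 = 3 };
enum class Encoding : int32_t { PLAIN = 0, PLAIN_DICTIONARY = 2, RLE = 3, RLE_DICTIONARY = 8 };

struct PageEncodingStats {
  PageType page_type;
  Encoding encoding;
  int32_t count;  // number of pages with this (page_type, encoding) pair
};

// True only if stats are present and every data page (v1 or v2)
// uses a dictionary encoding. Non-data pages are ignored.
bool all_data_pages_dictionary_encoded(std::optional<std::vector<PageEncodingStats>> const& stats)
{
  if (!stats) { return false; }
  return std::all_of(stats->begin(), stats->end(), [](PageEncodingStats const& s) {
    bool const is_data_page =
      s.page_type == PageType::DATA_PAGE || s.page_type == PageType::DATA_PAGE_V2;
    bool const is_dict =
      s.encoding == Encoding::PLAIN_DICTIONARY || s.encoding == Encoding::RLE_DICTIONARY;
    return !is_data_page || s.count == 0 || is_dict;
  });
}
```

A reader could use such a check to decide whether a column can be kept in dictionary form end to end instead of being fully materialized.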

Definition at line 694 of file parquet_schema.hpp.

◆ encodings

std::vector<Encoding> cudf::io::parquet::ColumnChunkMetaData::encodings

Set of all encodings used for this column. Its purpose is to let a reader check whether it can decode the column's pages before reading them.
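A minimal sketch of that validation follows, assuming the reader keeps a list of encodings it supports. `can_decode` and the `Encoding` enum are hypothetical stand-ins (values follow `parquet.thrift`), not cudf's actual capability check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical mirror of the Thrift Encoding enum (subset).
enum class Encoding : int32_t {
  PLAIN = 0, PLAIN_DICTIONARY = 2, RLE = 3, BIT_PACKED = 4,
  DELTA_BINARY_PACKED = 5, RLE_DICTIONARY = 8, BYTE_STREAM_SPLIT = 9
};

// Hypothetical check: every encoding listed for the column must appear
// in the reader's supported set.
bool can_decode(std::vector<Encoding> const& encodings, std::vector<Encoding> const& supported)
{
  return std::all_of(encodings.begin(), encodings.end(), [&](Encoding e) {
    return std::find(supported.begin(), supported.end(), e) != supported.end();
  });
}
```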

Definition at line 673 of file parquet_schema.hpp.

◆ size_statistics

std::optional<SizeStatistics> cudf::io::parquet::ColumnChunkMetaData::size_statistics

Optional statistics to help estimate total memory when converted to in-memory representations. The histograms contained in these statistics can also be useful in some cases for more fine-grained nullability/list length filter pushdown.
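As one example of nullability pushdown, the definition-level histogram of a flat (non-nested) nullable column has two buckets: level 0 counts nulls and level 1 counts non-null values. The `SizeStatistics` mirror below follows the field names in `parquet.thrift` but is hypothetical, as is `flat_column_null_count`. It assumes the caller already knows the column is flat with a maximum definition level of 1.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical mirror of the Thrift SizeStatistics struct.
struct SizeStatistics {
  std::optional<int64_t> unencoded_byte_array_data_bytes;
  std::optional<std::vector<int64_t>> repetition_level_histogram;
  std::optional<std::vector<int64_t>> definition_level_histogram;
};

// Assumes a flat nullable column (max definition level 1):
// histogram[0] = null count, histogram[1] = non-null count.
// Returns std::nullopt if the histogram is absent or has another shape.
std::optional<int64_t> flat_column_null_count(std::optional<SizeStatistics> const& stats)
{
  if (!stats || !stats->definition_level_histogram) { return std::nullopt; }
  auto const& hist = *stats->definition_level_histogram;
  if (hist.size() != 2) { return std::nullopt; }
  return hist[0];
}
```

If the returned count equals `num_values`, the chunk holds only nulls and could, for instance, be skipped under an `IS NOT NULL` predicate.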

Definition at line 705 of file parquet_schema.hpp.
