Thrift-derived struct describing a column chunk. More...
#include <parquet_schema.hpp>
Public Attributes

Type type = Type::BOOLEAN
    Type of this column.
std::vector<Encoding> encodings
    Set of all encodings used for this column.
std::vector<std::string> path_in_schema
    Path in schema.
Compression codec = Compression::UNCOMPRESSED
    Compression codec.
int64_t num_values = 0
    Number of values in this column.
int64_t total_uncompressed_size = 0
    Total byte size of all uncompressed pages in this column chunk (including the headers).
int64_t total_compressed_size = 0
    Total byte size of all compressed pages in this column chunk (including the headers).
int64_t data_page_offset = 0
    Byte offset from beginning of file to first data page.
int64_t index_page_offset = 0
    Byte offset from beginning of file to root index page.
int64_t dictionary_page_offset = 0
    Byte offset from the beginning of file to first (only) dictionary page.
Statistics statistics
    Optional statistics for this column chunk.
std::optional<std::vector<PageEncodingStats>> encoding_stats
    Set of all encodings used for pages in this column chunk.
std::optional<int64_t> bloom_filter_offset
    Byte offset from beginning of file to Bloom filter data.
std::optional<int32_t> bloom_filter_length
    Size of Bloom filter data, including the serialized header, in bytes.
std::optional<SizeStatistics> size_statistics
    Optional statistics to help estimate total memory when converted to in-memory representations.
Thrift-derived struct describing a column chunk.
Definition at line 668 of file parquet_schema.hpp.
std::optional<int32_t> cudf::io::parquet::ColumnChunkMetaData::bloom_filter_length |
Size of the Bloom filter data, including the serialized header, in bytes. Added in format version 2.10, so readers of older files may not find this field; in that case the length can be obtained after the BloomFilterHeader has been deserialized. Writers should populate this field so that readers can read the Bloom filter in a single I/O.
Definition at line 701 of file parquet_schema.hpp.
std::optional<std::vector<PageEncodingStats> > cudf::io::parquet::ColumnChunkMetaData::encoding_stats |
Set of all encodings used for pages in this column chunk. This information can be used, for example, to determine whether all data pages are dictionary encoded.
Definition at line 694 of file parquet_schema.hpp.
std::vector<Encoding> cudf::io::parquet::ColumnChunkMetaData::encodings |
Set of all encodings used for this column. The purpose is to validate whether we can decode those pages.
Definition at line 673 of file parquet_schema.hpp.
std::optional<SizeStatistics> cudf::io::parquet::ColumnChunkMetaData::size_statistics |
Optional statistics to help estimate total memory when converted to in-memory representations. The histograms contained in these statistics can also be useful in some cases for finer-grained filter pushdown on nullability and list lengths.
Definition at line 705 of file parquet_schema.hpp.