DATA MODEL
DATA MODEL
Documentation on the USE Core Data Model.
Tables, Views and Schemas would be detailed in the following sections. If you find any information missing, please feel free to speak in the Qbeast Users Slack Channel.
TABLES
These are the Descriptions and Schemas for each of the Tables.
CatalogsSummary
Table Name
<use_catalog>.system_lake_tables.catalogs_summary
Description
Schema
root
|-- catalog_name: string
|-- default_namespace: array
| |-- element: string
|-- namespaces: array
| |-- element: array
| | |-- element: string
|-- tables: array
| |-- element: struct
| | |-- catalog_name: string
| | |-- namespace_name: array
| | | |-- element: string
| | |-- name: string
|-- views: array
| |-- element: struct
| | |-- catalog_name: string
| | |-- namespace_name: array
| | | |-- element: string
| | |-- name: string
|-- functions: array
| |-- element: struct
| | |-- catalog_name: string
| | |-- namespace_name: array
| | | |-- element: string
| | |-- name: string
|-- table_catalog_capabilities: array
| |-- element: string
|-- description: string
|-- catalog_type: stringNamespacesSummary
Table Name
<use_catalog>.system_lake_tables.namespaces_summary
Description
Schema
root
|-- namespace_name: array
| |-- element: string
|-- catalog_name: string
|-- description: string
|-- location: string
|-- owner: string
|-- is_default: boolean
|-- tables: array
| |-- element: struct
| | |-- catalog_name: string
| | |-- namespace_name: array
| | | |-- element: string
| | |-- name: stringTablesSummary
Table Name
<use_catalog>.system_lake_tables.tables_summary
Description
Schema
root
|-- catalog_name: string
|-- namespace_name: array
| |-- element: string
|-- name: string
|-- created_time: string
|-- last_access: string
|-- provider: string
|-- table_type: string
|-- created_by: string
|-- location: string MetastoreSummary
Table Name
<use_catalog>.system_lake_tables.metastore_summary
Description
Schema
root
|-- catalog_name: string
|-- catalog_default_namespace: array
| |-- element: string
|-- catalog_namespaces: array
| |-- element: array
| | |-- element: string
|-- catalog_tables: array
| |-- element: struct
| | |-- catalog_name: string
| | |-- namespace_name: array
| | | |-- element: string
| | |-- name: string
|-- catalog_views: array
| |-- element: struct
| | |-- catalog_name: string
| | |-- namespace_name: array
| | | |-- element: string
| | |-- name: string
|-- catalog_functions: array
| |-- element: struct
| | |-- catalog_name: string
| | |-- namespace_name: array
| | | |-- element: string
| | |-- name: string
|-- table_catalog_capabilities: array
| |-- element: string
|-- catalog_description: string
|-- catalog_type: string
|-- namespace_name: array
| |-- element: string
|-- namespace_description: string
|-- namespace_location: string
|-- namespace_owner: string
|-- namespace_tables: array
| |-- element: struct
| | |-- catalog_name: string
| | |-- namespace_name: array
| | | |-- element: string
| | |-- name: string
|-- namespace_is_default: boolean
|-- table_name: string
|-- table_owner: string
|-- table_created_time: string
|-- table_last_access: string
|-- table_provider: string
|-- table_type: string
|-- table_created_by: string
|-- table_location: string
|-- table_columns: array
| |-- element: struct
| | |-- name: string
| | |-- description: string
| | |-- data_type: string TableHistory
Table Name
<use_catalog>.system_lake_tables.history
Description
Schema
root
|-- catalog_name: string
|-- namespace_name: array
| |-- element: string
|-- name: string
|-- version: long
|-- starting_time: timestamp
|-- ending_time: timestamp
|-- operation_type: string
|-- format_specific_operation_type: string
|-- added_rows: long
|-- deleted_rows: long
|-- updated_rows: long
|-- estimated_net_byte_change: long
|-- created_files: long
|-- created_bytes: long
|-- purged_files: long
|-- purged_bytes: long TableFileStats
Table Name
<use_catalog>.system_lake_tables.files AND <use_catalog>.system_lake_tables.all_files
Description
Schema
root
|-- catalog_name: string
|-- namespace_name: array
| |-- element: string (containsNull = true)
|-- name: string
|-- path: string
|-- schema: string
|-- addition_timestamp: long
|-- modification_timestamp: long
|-- deletion_timestamp: long
|-- creation_timestamp: long
|-- purge_timestamp: long
|-- deleted: boolean
|-- purged: boolean
|-- file_bytes: long
|-- file_rows: long
|-- metadata: map
| |-- key: string
| |-- value: string
|-- column_metrics: string TableIngestion
The path of the Ingestion should be configured when using CREATE TABLE in the TBLPROPERTIES section or by the SET TBLPROPERTIES command.
// CREATE TABLE
CREATE TABLE t TBLRPROPERTIES(use.ingestion.source.file.path = '/ingestion/path')
// ALTER TABLE
ALTER TABLE t SET TBLPROPERTIES(use.ingestion.source.file.path = '/ingestion/path');
Table Name
<use_catalog>.<catalog_name>.system_lake_tables.ingestions
Description
This table contains the information about the ingestion process of a table.
Schema
root
|-- write_id: string (nullable = true)
|-- started_at: long (nullable = false)
|-- finished_at: long (nullable = false)
|-- destination_table: string (nullable = true)
|-- number_of_input_files: long (nullable = false)
|-- byte_size_input_files: long (nullable = false)
|-- input_files: list(string) (nullable = true)
|-- process_time_ms: long (nullable = true)
|-- batch_max_timestamp: long (nullable = false)
|-- batch_min_timestamp: long (nullable = false)
|-- txn_app_id: string (nullable = true)
|-- txn_version: string (nullable = true)TableConsumption
The Consumption set up should be configured when using CREATE TABLE in the TBLPROPERTIES section or by the SET TBLPROPERTIES command.
// CREATE TABLE
CREATE TABLE t TBLRPROPERTIES(use.consumption.enabled='true')
// ALTER TABLE
ALTER TABLE t SET TBLPROPERTIES(use.consumption.enabled = 'true');
Table Name
<use_catalog>.system_lake_tables.consumptions
Description
This table contains the information about the consumption process of a table.
Schema
root
|-- event_name: string (nullable = true)
|-- event_source: string (nullable = true)
|-- event_id: string (nullable = true)
|-- region: string (nullable = true)
|-- user_id: string (nullable = true)
|-- event_time: timestamp (nullable = true)
|-- bytes_in: long (nullable = false)
|-- bytes_out: long (nullable = false)
|-- object_path: string (nullable = true)TableOptimization
DISCLAIMER: Right now, the optimization has to be triggered MANUALLY from CLI or spark-shell.
Please, read the Optimization Guide to know more about this feature.
Table Name
<use_catalog>.system_lake_tables.optimizations
Description
This table contains the information about the optimization process of a table.
Schema
root
|-- optimization_name: string (nullable = true)
|-- write_id: string (nullable = true)
|-- table_id: string (nullable = true)
|-- revision_id: long (nullable = true)
|-- started_at: long (nullable = true)
|-- finished_at: long (nullable = true)
|-- input_files_stats: struct (nullable = true)
| |-- bytes: long (nullable = true)
| |-- cubes: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- elementCount: long (nullable = true)
| |-- files: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- numCubes: long (nullable = true)
| |-- numFiles: long (nullable = true)
|-- output_files_stats: struct (nullable = true)
| |-- bytes: long (nullable = true)
| |-- cubes: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- elementCount: long (nullable = true)
| |-- files: array (nullable = true)
| | |-- element: string (containsNull = true)
| |-- numCubes: long (nullable = true)
| |-- numFiles: long (nullable = true)
|-- process_time_ms: long (nullable = true)
|-- cube_defragmentation_stats: struct (nullable = true)
| |-- average_cube_count_per_file_after: double (nullable = true)
| |-- average_cube_count_per_file_before: double (nullable = true)
| |-- average_cube_count_per_file_reduction: double (nullable = true)
| |-- average_block_element_count_per_file_after: double (nullable = true)
| |-- average_block_element_count_per_file_before: double (nullable = true)
| |-- average_block_element_count_per_file_increase: double (nullable = true)
|-- sampling_overhead_reduction_stats: struct (nullable = true)
| |-- fraction: double (nullable = true)
| |-- table_bytes_before: long (nullable = true)
| |-- table_bytes_after: long (nullable = true)
| |-- sampled_bytes_before: long (nullable = true)
| |-- sampled_bytes_after: long (nullable = true)
| |-- sampled_fraction_before: double (nullable = true)
| |-- sampled_fraction_after: double (nullable = true)
| |-- bytes_to_optimize: long (nullable = true)
| |-- optimized_bytes: long (nullable = true)SINGLE TABLE QUERIES
If you want only to load the file or history information for a single table, our ReadOnly Catalog allows to query based on Table Identifier.
For example, if you have a table named: "students", you can retrieve the metadata by executing:
SELECT * FROM <use_catalog>.spark_catalog.default.students.files // CURRENT added and deleted files
SELECT * FROM <use_catalog>.spark_catalog.default.students.all_files //ALL FILES (including deleted and purged)
SELECT * FROM <use_catalog>.spark_catalog.default.students.history
SELECT * FROM <use_catalog>.spark_catalog.default.students.ingestions
SELECT * FROM <use_catalog>.spark_catalog.default.students.consumptionsThere's also a set of Metadata Tables which contains the information from single tables aggregated by hour:
SELECT * FROM <use_catalog>.spark_catalog.default.students.files_hourly // with CURRENT added and deleted files
SELECT * FROM <use_catalog>.spark_catalog.default.students.ingestions_hourly
SELECT * FROM <use_catalog>.spark_catalog.default.students.consumptions_hourlySyntax can be reduced as:
SELECT * FROM <use_catalog>.<table_catalog_name>.<table_namespace_name>.<table_name>.<metadata_table_name>All of them would be listed on the SHOW TABLES method.
SHOW TABLES IN <use_catalog>