Qbeast – What’s New

v0.4.0 – 2025-07-04

New release of Qbeast Spark, featuring two key enhancements: DML support and Iceberg support.

DML Support

Qbeast now supports:

  • Delete, Update, and Merge operations (see the sketch after this list) via:
    • Resilient Index Builder
    • Merge On Read (MoR) strategies
    • Optimization for unindexed files
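
As a minimal sketch of these operations (assuming a qbeast table named qbeast_table registered in the catalog and an existing source view named updates; the statements are standard Spark SQL DML):

// Delete rows matching a predicate
spark.sql("DELETE FROM qbeast_table WHERE id < 10")

// Update rows in place
spark.sql("UPDATE qbeast_table SET id = id + 1 WHERE id = 42")

// Merge a source of changes into the table
spark.sql("""
  MERGE INTO qbeast_table AS t
  USING updates AS u
  ON t.id = u.id
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
""")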

Iceberg Support

Initial version of the Iceberg-Qbeast protocol:

  • Index metadata stored as Puffin files (Iceberg's Puffin file format spec).
  • Compatible with Spark Datasource V2 APIs.
  • Integration with Iceberg requires the specific configuration and catalog setup shown below.

Example Setup:

export QBEAST_SPARK_VERSION=0.9.0-rc1
$SPARK_HOME/bin/spark-shell \
  --repositories https://maven.pkg.github.com/qbeast-io/qbeast-spark-private \
  --conf spark.jars.ivySettings=$HOME/.ivy2/ivysettings.xml \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.9.1,io.qbeast:qbeast-iceberg_2.12:$QBEAST_SPARK_VERSION \
  --conf spark.sql.extensions=io.qbeast.sql.IcebergQbeastSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=io.qbeast.catalog.IcebergQbeastSessionCatalog \
  --conf spark.sql.catalog.spark_catalog.type=hadoop \
  --conf spark.sql.catalog.spark_catalog.warehouse=/tmp/iceberg-qbeast-warehouse

Supported APIs:

// Create
df.writeTo("qbeast_table").using("qbeast").option("columnsToIndex", "id").createOrReplace()
 
// Append
dfAppend.writeTo("qbeast_table").append()
 
// Save as Table
df.write.format("qbeast").option("columnsToIndex", "id").saveAsTable("qbeast_table")
 
// SQL
spark.sql("CREATE TABLE qbeast_table(id INT) USING qbeast TBLPROPERTIES('columnsToIndex' 'id')")

Experimental Performance Optimizations

Feature flags were added for:

  • Sampling and shuffling during OTree analysis
  • Rollup strategies for cube generation

Together, these reduce analysis time and improve the resulting data layout.

Modular Packaging

New split JARs (see the dependency sketch after this list):

  • qbeast-delta: Delta-Qbeast interfaces
  • qbeast-hudi: Hudi-Qbeast interfaces
  • qbeast-iceberg: Iceberg-Qbeast interfaces (with independent IcebergQbeastCatalog)
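
A build that needs only one integration can depend on that JAR alone. A build.sbt sketch, assuming the io.qbeast group id and Scala 2.12 artifacts used in the setup above (versions illustrative):

// Pull in only the Iceberg-Qbeast interfaces and the matching Iceberg runtime
libraryDependencies ++= Seq(
  "io.qbeast" %% "qbeast-iceberg" % "0.9.0-rc1",
  "org.apache.iceberg" %% "iceberg-spark-runtime-3.5" % "1.9.1"
)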

Bug Fixes & Improvements

  • Fix error loading dimension count for unindexed revision
  • Remove Delta deps from quantile computation
  • Fix README + IvySettings
  • Denormalize cubeId as string in blocks
  • Apply scalafmt and scalafix
  • Retry write failures only a few times
  • Close Hudi writer timeline server

v0.3.1 – 2025-04-17

Bug Fixes & Improvements

  • Respect hoodie.table.timeline.timezone from hudi-defaults.conf

v0.3.0 – 2025-04-17

A major release introducing Hudi support, unindexed file optimization, and skewed column indexing.

Hudi Support

New module for Hudi integration.

Example Setup:

export QBEAST_SPARK_VERSION=0.8.0
$SPARK_HOME/bin/spark-shell \
  --repositories https://maven.pkg.github.com/qbeast-io/qbeast-spark-private \
  --packages org.apache.hudi:hudi-spark3.5-bundle_2.12:1.0.0,io.qbeast:qbeast-spark_2.12:$QBEAST_SPARK_VERSION \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  --conf spark.kryo.registrator=org.apache.spark.HoodieSparkKryoRegistrar \
  --conf spark.sql.extensions=io.qbeast.sql.HudiQbeastSparkSessionExtension \
  --conf spark.sql.catalog.spark_catalog=io.qbeast.catalog.HudiQbeastCatalog
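
With the session configured, writes go through the same qbeast format shown elsewhere in these notes; a minimal sketch (the path is illustrative):

// Write a DataFrame as a Hudi-backed Qbeast table
df.write
  .format("qbeast")
  .option("columnsToIndex", "id")
  .save("/tmp/qbeast_hudi_table")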

Optimization of Unindexed Files

API support for optimizing legacy and externally written files:

// Optimize specific unindexed files (revision 0 holds files not yet indexed)
qbeastTable.optimize(0L, Seq("file1", "file2"))

// Optimize a 50% fraction of the unindexed data
qbeastTable.optimize(0L, fraction = 0.5)

Skewed Columns (Quantile Indexing)

Index highly skewed columns with a quantile-based layout:

import io.qbeast.spark.utils.QbeastUtils

val quantiles = QbeastUtils.computeQuantilesForColumn(df, "brand")
val stats = s"""{"brand_quantiles":$quantiles}"""
 
df.write
  .mode("overwrite")
  .format("qbeast")
  .option("columnsToIndex", "brand:quantiles")
  .option("columnStats", stats)
  .save("/tmp/qbeast_table_quantiles")

Other Fixes and Enhancements

  • CI workflow setup, snapshot publishing, cron vulnerability checks
  • Improved determinism checks and error handling
  • Dependency updates: jinja2, werkzeug
  • Fix computed metrics from Delta table history
  • Fix Hudi commit timezone issue
  • Ensure deletion event timestamp follows creation
  • Fix issue loading table properties
  • Use Hadoop 3.3.6 by default