FAQ: Frequently Asked Questions
Q - I get an error like this when first indexing with qbeast following the steps from Quickstart:
java.io.IOException: (null) entry in command string: null chmod 0644
A - You can find the solution here
Q - I run into an "out of memory" error when indexing with the qbeast format:
java.lang.OutOfMemoryError
at sun.misc.Unsafe.allocateMemory(Native Method)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
A - Since we process the data per partition, large partitions can cause the JVM to run out of memory.
Try to repartition the DataFrame before writing it in your Spark application:
df.repartition(200).write.format("qbeast").option("columnsToIndex", "x,y").save("/tmp/qbeast")
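If you are unsure how many partitions to use, one option is to scale up the current partition count so that each partition holds less data. The sketch below assumes the same df and columns as above; the multiplier of 4 and the floor of 200 are arbitrary values chosen for illustration, not Qbeast recommendations:
// Increase the partition count so that each partition becomes smaller
// (the factor of 4 and the minimum of 200 are arbitrary assumptions).
val currentPartitions = df.rdd.getNumPartitions
val targetPartitions = math.max(currentPartitions * 4, 200)
df.repartition(targetPartitions).write.format("qbeast").option("columnsToIndex", "x,y").save("/tmp/qbeast")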
Q - I get an error like this when writing from a Source or a Query:
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:268)
at io.qbeast.core.model.CubeId$.containers(CubeId.scala:88)
A - In this case, the error is due to the CubeId not being generated correctly: the boundaries of the data's domain have changed while indexing.
This is likely to happen in two cases:
- The Data Source is constantly changing and does not have any type of Snapshot Isolation when loading the DataFrame (e.g., Parquet, Qbeast).
- The Query executed over the source is non-deterministic (e.g., using Random, CurrentTimestamp, etc. in the columns to index), as illustrated in the sketch below.
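For illustration, a hypothetical pattern like the following can trigger the error, because the indexed column x is non-deterministic and its domain can change between evaluations of the DataFrame while indexing:
import org.apache.spark.sql.functions.rand
// "x" is derived from rand(), so its values (and therefore its boundaries)
// may differ each time the DataFrame is re-evaluated during indexing.
val unstableDF = df.withColumn("x", rand() * 100)
unstableDF.write.format("qbeast").option("columnsToIndex", "x").save("/tmp/qbeast")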
To solve this problem, we suggest one of the following:
- Use unbounded transformations such as quantiles, whose limits don't depend on the changing nature of the indexed data:
df.write.format("qbeast").option("columnsToIndex", "x:quantiles").option("columnStats", s"""\{"x_quantiles":[10.0, 90.0, 120.0, 140.0]\}""").save("/tmp/qbeast")
- Introduce the column boundaries before writing the data, using the columnStats option. Like the previous solution, the boundaries are not computed from the data while building the index, and the error will not occur:
df.write.format("qbeast").option("columnsToIndex", "x").option("columnStats", s"""\{"x_min":0.0, "x_max":100.0\}""").save("/tmp/qbeast")
- Materialize the DataFrame before writing it, using the persist method or an intermediate write:
// Persist the DataFrame before writing it
df.persist().write.format("qbeast").option("columnsToIndex", "x").save("/tmp/qbeast")
// Or write the DataFrame to a temporary location and read it back before writing it with qbeast
df.write.format("parquet").save("/tmp/parquet")
val persistedDF = spark.read.format("parquet").load("/tmp/parquet")
persistedDF.write.format("qbeast").option("columnsToIndex", "x").save("/tmp/qbeast")