FAQ: Frequently Asked Questions
Q - I get an error like this when first indexing with qbeast following the steps from Quickstart:
java.io.IOException: (null) entry in command string: null chmod 0644
A - You can find the solution here
Q - I run into an "out of memory" error when indexing with the qbeast format:
java.lang.OutOfMemoryError
at sun.misc.Unsafe.allocateMemory(Native Method)
at java.nio.DirectByteBuffer.<init>(DirectByteBuffer.java:127)
at java.nio.ByteBuffer.allocateDirect(ByteBuffer.java:311)
A - Since we process the data per partition, large partitions can cause the JVM to run out of memory.
Try to repartition the DataFrame before writing it in your Spark application:
df.repartition(200).write.format("qbeast").option("columnsToIndex", "x,y").save("/tmp/qbeast")
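If you are unsure how many partitions to use, one option is to scale up the current partition count so that each partition holds less data. The sketch below assumes the same df and columns as above; the multiplier of 4 and the floor of 200 are arbitrary values chosen for illustration, not Qbeast recommendations:
// Increase the partition count so that each partition becomes smaller
// (the factor of 4 and the minimum of 200 are arbitrary assumptions).
val currentPartitions = df.rdd.getNumPartitions
val targetPartitions = math.max(currentPartitions * 4, 200)
df.repartition(targetPartitions).write.format("qbeast").option("columnsToIndex", "x,y").save("/tmp/qbeast")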
Q - I get an error like this when writing from a Source or a Query:
java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:268)
at io.qbeast.core.model.CubeId$.containers(CubeId.scala:88)
A - In this case, the error is due to the CubeId not being generated correctly: the boundaries of the data's domain have changed while indexing.
This is likely to happen in two cases:
- The Data Source is constantly changing and does not have any type of Snapshot Isolation when loading the DataFrame (e.g., Parquet, Qbeast).
- The Query executed over the source is non-deterministic (e.g., using Random, CurrentTimestamp, etc. in the columns to index), as illustrated in the sketch below.
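For illustration, a hypothetical pattern like the following can trigger the error, because the indexed column x is non-deterministic and its domain can change between evaluations of the DataFrame while indexing:
import org.apache.spark.sql.functions.rand
// "x" is derived from rand(), so its values (and therefore its boundaries)
// may differ each time the DataFrame is re-evaluated during indexing.
val unstableDF = df.withColumn("x", rand() * 100)
unstableDF.write.format("qbeast").option("columnsToIndex", "x").save("/tmp/qbeast")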
To solve this problem, we suggest one of the following:
- Use unbounded transformations such as quantiles, whose limits don't depend on the changing nature of the indexed data:
df.write.format("qbeast").option("columnsToIndex", "x:quantiles").option("columnStats", s"""\{"x_quantiles":[10.0, 90.0, 120.0, 140.0]\}""").save("/tmp/qbeast")
- Introduce the column boundaries before writing the data, using the columnStats option. Like the previous solution, the boundaries are not computed from the data while building the index, and the error will not occur:
df.write.format("qbeast").option("columnsToIndex", "x").option("columnStats", s"""\{"x_min":0.0, "x_max":100.0\}""").save("/tmp/qbeast")
- Materialize the DataFrame before writing it, using the persist method or an intermediate write:
// Persist the DataFrame before writing it
df.persist().write.format("qbeast").option("columnsToIndex", "x").save("/tmp/qbeast")
// Or write the DataFrame to a temporary location and read it back before writing it with qbeast
df.write.format("parquet").save("/tmp/parquet")
val persistedDF = spark.read.format("parquet").load("/tmp/parquet")
persistedDF.write.format("qbeast").option("columnsToIndex", "x").save("/tmp/qbeast")