
CREATE EXTERNAL TABLE does not save schema [Spark] #165

Closed
osopardo1 opened this issue Feb 16, 2023 · 0 comments · Fixed by #168
Labels
type: bug Something isn't working

Comments

@osopardo1 (Member)

What went wrong?

When creating a table with CREATE EXTERNAL TABLE over an already existing qbeast-formatted location, the schema is not saved properly in the Glue Catalog.


How to reproduce?

1. Code that triggered the bug, or steps to reproduce:

spark.sql("""CREATE EXTERNAL TABLE tpc_ds_1gb_qbeast_store_sales
USING qbeast
LOCATION "/tmp/store_sales"
OPTIONS ('columnsToIndex'='ss_sold_date_sk,ss_item_sk')""")

And then execute:

spark.sql("""SELECT * FROM tpc_ds_1gb_qbeast_store_sales""").show()

Throws the following error:

org.apache.spark.sql.AnalysisException: Unable to resolve ss_sold_date_sk given []
  at org.apache.spark.sql.errors.QueryCompilationErrors$.cannotResolveAttributeError(QueryCompilationErrors.scala:1020)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.$anonfun$resolve$3(LogicalPlan.scala:91)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.$anonfun$resolve$1(LogicalPlan.scala:90)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.Iterator.foreach(Iterator.scala:943)

And when describing the table:

spark.sql("DESCRIBE EXTENDED tpc_ds_1gb_qbeast_store_sales").show()

Only the properties appear:

+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
|                    |                    |       |
|      # Partitioning|                    |       |
|     Not partitioned|                    |       |
|                    |                    |       |
|# Detailed Table ...|                    |       |
|                Name|tpc_ds_1gb_qbeast...|       |
|            Location|s3://qbeast-priva...|       |
|            Provider|              qbeast|       |
|               Owner|              hadoop|       |
|    Table Properties|[columnsToIndex=s...|       |
+--------------------+--------------------+-------+
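Until the schema persistence is fixed, one possible workaround (untested here; the column list and types below are assumed from the standard TPC-DS store_sales definition, not taken from this issue) is to declare the schema explicitly in the DDL, so the catalog entry carries the columns regardless of what the connector persists:

```sql
-- Hypothetical workaround: declare the columns explicitly in the DDL.
-- Column types are assumed from the TPC-DS store_sales schema.
CREATE EXTERNAL TABLE tpc_ds_1gb_qbeast_store_sales (
  ss_sold_date_sk INT,
  ss_item_sk      INT
  -- ... remaining store_sales columns ...
)
USING qbeast
LOCATION "/tmp/store_sales"
OPTIONS ('columnsToIndex'='ss_sold_date_sk,ss_item_sk')
```

This sidesteps the bug only if the connector honors a user-supplied schema for external tables; whether it does is an assumption here.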

2. Branch and commit id:

Main at 49163e9

3. Spark version:

In the Spark shell, run spark.version.

3.2.2

4. Hadoop version:

In the Spark shell, run org.apache.hadoop.util.VersionInfo.getVersion().

3.3.1

5. How are you running Spark?

Are you running Spark inside a container? Are you launching the app on a remote K8s cluster? Or are you just running the tests on a local machine?

EMR cluster

6. Stack trace:

See the AnalysisException trace in the reproduction steps above.

@osopardo1 osopardo1 added the type: bug Something isn't working label Feb 16, 2023
@osopardo1 osopardo1 changed the title CREATE EXTERNAL TABLE does not save schema CREATE EXTERNAL TABLE does not save schema [Spark] Feb 23, 2023