Incompatible Schema in Some Files


The Spark job fails with an exception like the following while reading Parquet files:

Error in SQL statement: SparkException: Job aborted due to stage failure:
Task 20 in stage 11227.0 failed 4 times, most recent failure: Lost task 20.3 in stage 11227.0
(TID 868031, executor 31):
java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary$PlainDoubleDictionary
    at org.apache.parquet.column.Dictionary.decodeToLong(


The java.lang.UnsupportedOperationException in this instance is caused by one or more Parquet files in the folder that were written with an incompatible schema. In the stack trace above, for example, the reader expects a long column but finds a dictionary of double values, which suggests the same column was written as double in some files and long in others.


Find the Parquet files with the incompatible schema and rewrite them with the correct schema. To locate them, try to read the Parquet dataset with schema merging enabled:

spark.read.option("mergeSchema", "true").parquet(path)

Alternatively, enable schema merging for the whole session:


spark.conf.set("spark.sql.parquet.mergeSchema", "true")

If the folder does contain Parquet files with incompatible schemas, the snippets above fail with an error that names the file whose schema could not be merged.