Versioning

This section describes the versions of Apache Spark supported by Databricks ML Model Export and the Model Export versioning policy.

Supported Apache Spark versions

Databricks ML Model Export is generally available as of Databricks Runtime 4.0, which includes Apache Spark 2.3. Some earlier Databricks Runtime versions include Model Export, but only as a private beta feature.

To use Model Export with models trained in an earlier Spark version, save the models with MLlib persistence in that version and load them in Spark 2.3 or later. You can then use Model Export to export the loaded models for scoring.
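
As a rough sketch of that save-then-load workflow, the two helpers below wrap the standard MLlib persistence calls in PySpark. The helper names and the example path are hypothetical, and the sketch assumes the model is a fitted PipelineModel; other model types are loaded through their own class.

```python
def save_with_mllib(model, path):
    """Run in the earlier Spark version: persist a fitted model or pipeline."""
    # write().overwrite().save() is the MLlib persistence call chain.
    model.write().overwrite().save(path)
    return path


def load_with_mllib(path):
    """Run in Spark 2.3 or later: reload the persisted pipeline."""
    # Imported inside the function so the earlier-version cluster does not
    # need this import just to call save_with_mllib.
    from pyspark.ml import PipelineModel

    return PipelineModel.load(path)


# Hypothetical usage; the path is a placeholder, not a required location.
# On the earlier Spark version:  save_with_mllib(fitted_pipeline, "/tmp/models/my_pipeline")
# On Spark 2.3 or later:         pipeline = load_with_mllib("/tmp/models/my_pipeline")
```

The reloaded pipeline is then an ordinary Spark 2.3 model and can be passed to Model Export.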

Databricks ML Model Export versioning

This section explains the versioning policy and backwards compatibility guarantees for Databricks ML Model Export and its companion library dbml-local.

Tip

To find the best dbml-local version for an exported model, look up the Apache Spark version of your Databricks Runtime in the Databricks Runtime release notes, then choose the latest dbml-local version whose suffix matches that Spark version.
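
A minimal sketch of that lookup, assuming version strings of the form x.y.z_sparkA.B (for example, 0.1.3_spark2.2). The function name and the list of available versions are made up for illustration; the real list comes from wherever dbml-local releases are published.

```python
import re

# Assumed format: <major>.<minor>.<patch>_spark<A>.<B>, e.g. 0.1.3_spark2.2
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)_spark(\d+\.\d+)$")


def latest_for_spark(available, spark_version):
    """Return the newest version string whose suffix matches spark_version, or None."""
    # Only the major.minor part of the Spark version appears in the suffix.
    spark_minor = ".".join(spark_version.split(".")[:2])
    candidates = []
    for version in available:
        match = _VERSION_RE.match(version)
        if match and match.group(4) == spark_minor:
            library_version = tuple(int(part) for part in match.groups()[:3])
            candidates.append((library_version, version))
    # Compare numerically so that 0.10.0 sorts after 0.9.0.
    return max(candidates)[1] if candidates else None


# Hypothetical list of published versions, for illustration only.
available = ["0.1.2_spark2.2", "0.1.3_spark2.2", "0.2.0_spark2.3", "0.1.9_spark2.3"]
print(latest_for_spark(available, "2.3.0"))  # prints 0.2.0_spark2.3
```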

Apache Spark MLlib version policy

Since Databricks ML Model Export and dbml-local are designed to replicate MLlib behavior, the Model Export versioning policy is based on the MLlib policy for ML persistence. In general, MLlib maintains backwards compatibility. However, there are rare exceptions.

Model persistence

Is a model or pipeline saved using Apache Spark ML persistence in Spark version a.b.c loadable by Spark version x.y.z?

  • Major versions: No guarantees, but best-effort.
  • Minor and patch versions: Yes; these are backwards compatible.
  • Note about the format: There are no guarantees for a stable persistence format, but model loading itself is designed to be backwards compatible.

Model behavior

Does a model or pipeline in Spark version a.b.c behave identically in Spark version x.y.z?

  • Major versions: No guarantees, but best-effort.
  • Minor and patch versions: Identical behavior, except for bug fixes.

For both model persistence and model behavior, any breaking changes across a minor version or patch version are reported in the Spark version release notes. If a breakage is not reported in release notes, then it should be treated as a bug to be fixed.
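
The policy above can be summarized as a small decision function. This is only an illustrative sketch: the function names and return labels are invented here, and treating a model saved by a newer Spark version than the one loading it as "not covered" is an assumption, since the policy describes backwards compatibility only.

```python
def parse_spark_version(version):
    """Split 'a.b.c' into a tuple of integers (major, minor, patch)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def mllib_guarantee(saved_with, loaded_by):
    """Classify using a model from Spark `saved_with` in Spark `loaded_by`."""
    saved = parse_spark_version(saved_with)
    loader = parse_spark_version(loaded_by)
    if loader < saved:
        # Assumption: the policy only covers the same or a later version.
        return "not covered"
    if loader[0] != saved[0]:
        # Different major versions: no guarantees, but best-effort.
        return "best-effort"
    # Same major version: minor and patch versions are backwards compatible.
    return "backwards compatible"


print(mllib_guarantee("2.2.1", "2.3.0"))  # prints backwards compatible
print(mllib_guarantee("1.6.3", "2.3.0"))  # prints best-effort
```

Even in the "backwards compatible" case, check the Spark release notes for reported exceptions.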

Databricks ML Model Export guarantees

In general, Databricks ML Model Export and dbml-local aim to provide the same guarantees as MLlib:

  • If MLlib maintains consistent behavior across Spark versions, then dbml-local will as well.
  • If MLlib changes behavior across Spark versions, then dbml-local will have an updated version for the new Spark version.

The scoring library dbml-local is versioned as x.y.z_sparkA.B (for example, 0.1.3_spark2.2). This version string has two parts:

  • dbml-local version number (e.g., 0.1.3)
    • This is the library version number, which follows semantic versioning (major.minor.patch).
    • Patch and minor versions maintain identical behavior for models.
  • Apache Spark version suffix (e.g., _spark2.2)
    • This suffix is added to make it easy to pick dbml-local versions that match the version of Apache Spark used to create the model.
    • For example, for a library with version 0.1.3_spark2.2, we guarantee:
      • You can load models exported from the same or earlier Spark minor and patch versions within the same major version, that is, Spark 2.0, 2.1, and 2.2.
      • Models behave identically to the suffix’s Spark minor version, in this case Spark 2.2.
      • If a Spark patch version includes a behavioral change for a bug fix, we will note this in the dbml-local release docs.
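
To make the two-part version string concrete, the sketch below splits it into the library version and the Spark suffix, then checks whether a model's Spark version falls in the range the suffix covers. The class and function names are hypothetical and are not part of dbml-local; the range check encodes the Spark 2.0 to 2.2 example above as "same major version, minor version no later than the suffix".

```python
import re
from typing import NamedTuple, Tuple


class DbmlLocalVersion(NamedTuple):
    library: Tuple[int, int, int]  # dbml-local version (major, minor, patch)
    spark: Tuple[int, int]  # Spark version suffix (major, minor)


def parse_version(version: str) -> DbmlLocalVersion:
    """Parse a string such as '0.1.3_spark2.2' into its two parts."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)_spark(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"not a dbml-local version string: {version!r}")
    numbers = [int(group) for group in match.groups()]
    return DbmlLocalVersion(tuple(numbers[:3]), tuple(numbers[3:]))


def can_load(library_version: str, model_spark_version: str) -> bool:
    """True if the model's Spark version is in the range the suffix covers."""
    suffix_major, suffix_minor = parse_version(library_version).spark
    model_major, model_minor = (int(p) for p in model_spark_version.split(".")[:2])
    return model_major == suffix_major and model_minor <= suffix_minor


print(parse_version("0.1.3_spark2.2").spark)   # prints (2, 2)
print(can_load("0.1.3_spark2.2", "2.1.0"))     # prints True
print(can_load("0.1.3_spark2.2", "2.3.0"))     # prints False
```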