Streaming on Databricks
You can use Databricks for near real-time data ingestion, processing, machine learning, and AI on streaming data.
Databricks offers numerous optimizations for streaming and incremental processing, including the following:
- Delta Live Tables provides declarative syntax for incremental processing. See What is Delta Live Tables?.
- Auto Loader simplifies incremental ingestion from cloud object storage, as sketched in the example after this list. See What is Auto Loader?.
- Unity Catalog adds data governance to streaming workloads. See Using Unity Catalog with Structured Streaming.
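For example, the following sketch uses Auto Loader to incrementally ingest JSON files from cloud object storage into a Delta table. The storage paths and table name are placeholders for illustration, and `spark` is the SparkSession that Databricks notebooks provide automatically.

```python
# Minimal Auto Loader sketch: incrementally ingest newly arriving JSON files
# from a (hypothetical) cloud storage path into a (hypothetical) Delta table.
(spark.readStream
    .format("cloudFiles")                                              # Auto Loader source
    .option("cloudFiles.format", "json")                               # format of the landing files
    .option("cloudFiles.schemaLocation", "/Volumes/demo/raw/_schema")  # where the inferred schema is tracked
    .load("/Volumes/demo/raw/events")                                  # landing path (placeholder)
    .writeStream
    .option("checkpointLocation", "/Volumes/demo/raw/_checkpoint")     # tracks ingestion progress
    .trigger(availableNow=True)                                        # process all available files, then stop
    .toTable("demo.bronze.events"))                                    # target table (placeholder)
```

Because the checkpoint records which files have already been processed, rerunning the query picks up only new files.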
Delta Lake provides the storage layer for these integrations. See Delta table streaming reads and writes.
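As a sketch of how Delta tables act as both streaming sources and sinks, the following reads new rows from one table as they are committed and appends a lightly transformed result to another. The table names, the filtered column, and the checkpoint path are placeholders.

```python
from pyspark.sql import functions as F

# Stream rows from a Delta table as they are committed (source table is a placeholder).
bronze = spark.readStream.table("demo.bronze.events")

silver = (bronze
    .withColumn("ingested_at", F.current_timestamp())   # example transformation
    .filter(F.col("event_type").isNotNull()))           # hypothetical column

# Append the results incrementally to another Delta table (sink is a placeholder).
(silver.writeStream
    .option("checkpointLocation", "/Volumes/demo/silver/_checkpoint")
    .outputMode("append")
    .toTable("demo.silver.events"))
```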
For real-time model serving, see Deploy models using Mosaic AI Model Serving.
The following articles cover streaming on Databricks in more detail:
- Tutorial
- Concepts
- Stateful streaming
- Custom stateful applications
- Production considerations
- Monitor streams
- Unity Catalog integration
- Streaming with Delta
- Examples
Databricks has specific features for working with semi-structured data fields contained in Avro, protocol buffers, and JSON data payloads.
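For instance, a common pattern is extracting typed fields from a JSON string column in a streaming DataFrame with `from_json`; similar helpers exist for Avro and protocol buffer payloads. The source table, column name, and schema below are hypothetical.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, LongType

# Expected shape of the JSON payload (assumed for this example).
payload_schema = StructType([
    StructField("user_id", StringType()),
    StructField("action", StringType()),
    StructField("ts", LongType()),
])

raw = spark.readStream.table("demo.bronze.raw_payloads")    # hypothetical source table

parsed = (raw
    .withColumn("payload", F.from_json(F.col("body"), payload_schema))  # parse the JSON string column
    .select("payload.*"))                                               # flatten into top-level columns
```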
Additional resources
Apache Spark provides a Structured Streaming Programming Guide with more information about the underlying streaming engine.
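To see the core source, transformation, and sink structure that the guide describes, the following self-contained sketch uses the built-in `rate` source, which generates one row per second, and writes micro-batches to the console sink (the driver log on Databricks).

```python
from pyspark.sql import functions as F

# Generate a synthetic stream and add a derived column.
stream = (spark.readStream
    .format("rate")
    .option("rowsPerSecond", 1)
    .load()
    .withColumn("is_even", (F.col("value") % 2) == 0))

# Write each micro-batch to the console, triggering every 5 seconds.
query = (stream.writeStream
    .format("console")
    .outputMode("append")
    .trigger(processingTime="5 seconds")
    .start())

# query.stop()  # stop the streaming query when finished
```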
For reference information about Structured Streaming, Databricks recommends the Apache Spark Structured Streaming API references.