Introduction to the Lakeflow Framework

The Lakeflow Framework is a metadata-driven data engineering framework designed to:

  • accelerate and simplify the deployment of Spark Declarative Pipelines (SDP), and support them throughout your SDLC.

  • support a wide variety of patterns across the medallion architecture for both batch and streaming workloads.

  • provide a structured, configuration-driven approach to building reliable and maintainable data pipelines (a hypothetical spec entry is sketched below).

The Framework is designed for simplicity, performance, ease of maintenance, and extensibility as the SDP product evolves.
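To make the configuration-driven idea concrete, a single dataflow spec entry might look something like the sketch below. The key names, paths, and values are purely illustrative assumptions for this introduction, not the framework's actual spec schema.

```python
# Hypothetical dataflow spec entry -- key names and paths are illustrative only.
bronze_orders_spec = {
    "name": "bronze_orders",                    # target table to create
    "source_path": "/Volumes/main/raw/orders",  # raw files to ingest
    "format": "json",                           # source file format
    "comment": "Raw orders ingested as-is into the bronze layer.",
}
```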

Core Concepts

  • Lego-block, pattern-based development

  • Two Parts

    • SDP wrapper components: close to the metal, exposing SDP APIs directly to minimize the need for changes.

    • Dataflow Spec abstraction layer: allows users to put the SDP components together as needed, like Lego blocks (see the sketch after this list).

  • Key Design

    • Databricks Asset Bundles (DABs) native

    • No artifacts or wheel files

    • Minimized third-party dependencies

    • No control tables

    • Extensible

    • Flexible deployment bundles

  • Object-Oriented (OO) Design & Best Practices

    • Encapsulation

    • Abstraction & Inheritance

    • Loose Coupling

    • Separation of Concerns & Single Responsibility
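
As a rough sketch of how the two parts fit together, the example below shows a hypothetical SDP wrapper component that turns one dataflow spec entry into a declarative streaming table. The `create_bronze_table` helper, the spec keys, and the use of the `dlt` decorator API (standing in for the Declarative Pipelines Python interface) are assumptions for illustration, not the framework's actual interface.

```python
# Illustrative sketch only: names, spec keys, and API choices are assumptions.
import dlt  # Declarative Pipelines Python module, available inside a pipeline runtime
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()  # SparkSession supplied by the pipeline runtime


def create_bronze_table(spec: dict) -> None:
    """Wrapper component: register one bronze streaming table from a spec entry."""

    @dlt.table(name=spec["name"], comment=spec["comment"])
    def _bronze():
        # Auto Loader ingest of the configured path and format.
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", spec["format"])
            .load(spec["source_path"])
        )


# Composing the Lego blocks: one call per spec entry registers one table.
create_bronze_table({
    "name": "bronze_orders",
    "source_path": "/Volumes/main/raw/orders",
    "format": "json",
    "comment": "Raw orders ingested as-is into the bronze layer.",
})
```

Keeping the wrapper this thin is what lets the framework stay close to the metal: SDP options can be passed through directly, with little or no framework change as the product evolves.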

Please refer to the Framework Concepts section for an overview of the different components of the framework.