Spring Cloud Data Flow

Spring Cloud Data Flow is a toolkit for building data integration and real-time data processing pipelines.


Pipelines consist of Spring Boot apps, built using the Spring Cloud Stream or Spring Cloud Task microservice frameworks. This makes Spring Cloud Data Flow suitable for a range of data processing use cases, from import/export to event streaming and predictive analytics.

Overview

The Spring Cloud Data Flow server uses Spring Cloud Deployer to deploy pipelines onto modern runtimes such as Cloud Foundry, Kubernetes, Apache Mesos, or Apache YARN.

A selection of pre-built stream and task/batch starter apps for various data integration and processing scenarios facilitates learning and experimentation.

Custom stream and task applications, targeting different middleware or data services, can be built using the familiar Spring Boot style programming model.

A simple stream pipeline DSL makes it easy to specify which apps to deploy and how to connect outputs and inputs. A new composed task DSL was added in v1.2.
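
As a rough illustration of the stream DSL, the sketch below assembles a definition by joining app names with the pipe symbol, which is the same shape as the 'time | log' definition used in the Quick Start. The helper name join_stream_apps is hypothetical and not part of Data Flow; the second usage line assumes the http, transform, and log starter apps are registered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: joins app names into a stream DSL definition.
# Each app's output is connected to the next app's input by " | ".
join_stream_apps() {
  local definition="$1"
  shift
  local app
  for app in "$@"; do
    definition="${definition} | ${app}"
  done
  printf '%s\n' "$definition"
}

join_stream_apps time log             # prints: time | log
join_stream_apps http transform log   # prints: http | transform | log
```

The resulting string is what would be pasted into the dashboard's stream editor or passed to the shell as a stream definition.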

The dashboard offers a graphical editor for building new pipelines interactively, as well as views of deployable apps and running apps with metrics.

The Spring Cloud Data Flow server exposes a REST API for composing and deploying data pipelines. A separate shell makes it easy to work with the API from the command line.
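
To give a feel for the REST API, here is a minimal sketch that creates and deploys a stream with curl. It assumes a server at http://localhost:9393 and the /streams/definitions and /streams/deployments/{name} endpoints as documented for the 1.x line; check the REST API guide for your version before relying on them. The run, create_stream, and deploy_stream helpers are hypothetical, and by default the script only prints the commands (DRY_RUN=1) instead of sending them.

```shell
#!/usr/bin/env bash
# Assumption: a Data Flow server listening on the default local port.
DATAFLOW_URI="${DATAFLOW_URI:-http://localhost:9393}"
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical wrapper: print the command in dry-run mode, execute it otherwise.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Register a stream definition: $1 is the stream name, $2 the DSL definition.
create_stream() {
  run curl -s -X POST "${DATAFLOW_URI}/streams/definitions" \
    --data-urlencode "name=$1" \
    --data-urlencode "definition=$2"
}

# Deploy a previously created stream by name.
deploy_stream() {
  run curl -s -X POST "${DATAFLOW_URI}/streams/deployments/$1"
}

create_stream ticktock "time | log"
deploy_stream ticktock
```

Running the script with DRY_RUN=0 sends the requests to the server. The dry-run output drops shell quoting, so it is meant for inspection rather than copy and paste.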

Platform Implementations

An easy way to get started with Spring Cloud Data Flow is to follow the platform-specific implementation links in the table below. Each implementation evolves in isolation, with its own release cadence. Reviewing the platform-specific reference docs is highly recommended to learn more about each platform's feature capabilities.

Server Type             Stable Release          Milestone/Snapshot Release
Local Server            1.2.3.RELEASE [docs]    1.3.0.M2 [docs]
Cloud Foundry Server    1.2.4.RELEASE [docs]    1.3.0.M2 [docs]
Kubernetes Server       1.2.2.RELEASE [docs]    1.3.0.M2 [docs]
Apache YARN Server      1.2.2.RELEASE [docs]    1.2.3.BUILD-SNAPSHOT [docs]
Apache Mesos Server     1.0.0.RELEASE [docs]    1.1.0.BUILD-SNAPSHOT [docs]

Community Implementations

Quick Start

Step 1 - Download the Spring Cloud Data Flow Local-Server.
Mac users can use 'curl -O' instead of wget.

wget http://repo.spring.io/release/org/springframework/cloud/spring-cloud-dataflow-server-local/1.2.3.RELEASE/spring-cloud-dataflow-server-local-1.2.3.RELEASE.jar
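
The download step can also be scripted so that the version is set in one place. The sketch below rebuilds the URL shown above from a version string and prints a wget or 'curl -O' command depending on which tool is installed. The function names and the SCDF_VERSION variable are hypothetical, and the URL layout is assumed to be the same for other releases.

```shell
#!/usr/bin/env bash
# Version taken from the Quick Start; change it to target another release.
SCDF_VERSION="${SCDF_VERSION:-1.2.3.RELEASE}"

# Builds the repository URL for a given Local Server version.
local_server_url() {
  local version="$1"
  local artifact="spring-cloud-dataflow-server-local"
  printf 'http://repo.spring.io/release/org/springframework/cloud/%s/%s/%s-%s.jar\n' \
    "$artifact" "$version" "$artifact" "$version"
}

# Prints the download command for whichever tool is available.
download_command() {
  local url
  url="$(local_server_url "$1")"
  if command -v wget >/dev/null 2>&1; then
    printf 'wget %s\n' "$url"
  else
    printf 'curl -O %s\n' "$url"
  fi
}

download_command "$SCDF_VERSION"
```

The script only prints the command; piping its output to sh would perform the actual download.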


Step 2 - Launch the Local-Server.

java -jar spring-cloud-dataflow-server-local-1.2.3.RELEASE.jar


Step 3 - Download and Start Apache Kafka 0.10 as messaging middleware.
Mac Homebrew users can run 'brew install kafka'.
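
For a manual install, Kafka 0.10 needs ZooKeeper running before the broker starts. The sketch below prints the two start commands in that order, using the scripts and sample configs that ship with the Apache Kafka distribution. KAFKA_HOME and the kafka_start_commands helper are assumptions for illustration; adjust the path to wherever the archive was unpacked.

```shell
#!/usr/bin/env bash
# Assumption: KAFKA_HOME points at an unpacked Apache Kafka 0.10 distribution.
KAFKA_HOME="${KAFKA_HOME:-$HOME/kafka}"

# Prints the start commands in order: ZooKeeper first, then the Kafka broker.
kafka_start_commands() {
  printf '%s\n' \
    "$1/bin/zookeeper-server-start.sh $1/config/zookeeper.properties" \
    "$1/bin/kafka-server-start.sh $1/config/server.properties"
}

kafka_start_commands "$KAFKA_HOME"
```

Each printed command runs in the foreground, so they are usually started in separate terminals.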

Step 4 - Open the dashboard at http://localhost:9393/dashboard.

Step 5 - Bulk Import out-of-the-box stream applications for Apache Kafka using the URI 'http://bit.ly/Bacon-RELEASE-stream-applications-kafka-10-maven'.
(latest bit.ly links here)

Bulk Import and Register Apps

Step 6 - Use 'Create Stream' under STREAMS to define and deploy a stream called 'ticktock' with the definition 'time | log'.
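
Steps 5 and 6 can also be carried out from the Data Flow shell instead of the dashboard. The sketch below writes the two shell commands to a file and, if the shell jar is present, runs them through the shell's command-file option. The file name ticktock.commands is hypothetical, and the jar name assumes the 1.2.3.RELEASE shell has been downloaded separately into the current directory.

```shell
#!/usr/bin/env bash
# Hypothetical file name for the Data Flow shell commands.
COMMAND_FILE="${COMMAND_FILE:-ticktock.commands}"
# Assumption: the Data Flow shell jar was downloaded separately.
SHELL_JAR="${SHELL_JAR:-spring-cloud-dataflow-shell-1.2.3.RELEASE.jar}"

# Write the commands for Step 5 (bulk import) and Step 6 (create and deploy).
cat > "$COMMAND_FILE" <<'EOF'
app import --uri http://bit.ly/Bacon-RELEASE-stream-applications-kafka-10-maven
stream create --name ticktock --definition "time | log" --deploy
EOF

# Run the commands only when the shell jar is available.
if [ -f "$SHELL_JAR" ]; then
  java -jar "$SHELL_JAR" --spring.shell.commandFile="$COMMAND_FILE"
else
  echo "Shell jar not found; commands were written to $COMMAND_FILE"
fi
```

The same two commands can be typed interactively at the shell prompt; the command file is just a convenient way to repeat the setup.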

Create TickTock Stream

Deploy TickTock Stream

Once the 'ticktock' stream is deployed, two running stream apps will appear under RUNTIME. Click on 'ticktock.log' to determine the location of the stdout log file.

Step 7 - Verify that events are being written to the ticktock log every second.

tail -f /var/folders/ ... /ticktock.log/stdout_0.log


Building Blocks of Spring Cloud Data Flow

Spring Cloud Data Flow builds upon several projects. The top-level building blocks of the ecosystem are listed in the following visual representation. Each project represents a core capability and evolves in isolation, with its own release cadence. Follow the links to find more details about each project.

Spring Cloud Data Flow Local Server
Spring Cloud Data Flow Cloud Foundry Server
Spring Cloud Data Flow Kubernetes Server
Spring Cloud Data Flow Apache YARN Server
Spring Cloud Data Flow Apache Mesos Server

REST-APIs / Shell / DSL
Dashboard
Spring Flo
Spring Cloud Data Flow Metrics Collector
Spring Cloud Data Flow - Core

↓     Uses     ↓

Spring Cloud Deployer - Service Provider Interface (SPI)

↑     Implements     ↑

Spring Cloud Deployer Local
Spring Cloud Deployer Cloud Foundry
Spring Cloud Deployer Kubernetes
Spring Cloud Deployer Yarn
Spring Cloud Deployer Mesos

↓     Deploys     ↓

Spring Cloud Stream App Starters
Spring Cloud Task App Starters
Spring Cloud Stream
Spring Cloud Task

↓     Uses     ↓

Spring Integration
Spring Boot
Spring Batch