Spring Cloud Data Flow for Apache YARN

Spring Cloud Data Flow for Apache YARN is a cloud-native orchestration service for composable data microservices on Apache YARN. With Spring Cloud Data Flow, developers can create and orchestrate data pipelines for common use cases such as data ingest, real-time analytics, and data import/export.

Spring Cloud Data Flow for Apache YARN offers a collection of patterns and best practices for data microservices running as streaming and batch data pipelines in Apache YARN.

Features

  • Consume streaming and batch data microservices as Maven artifacts
  • Create, unit-test, troubleshoot, and manage data microservices in isolation
  • Develop using the DSL, Shell, REST APIs, Dashboard, and Flo
  • Take advantage of Apache YARN value-adds such as metrics, security, logging, health checks, and remote management at each data microservice level
  • Scale stream and batch pipelines through Spring Cloud Data Flow's yarn-cli

Quick Start

Step 1 - Download Spring Cloud Data Flow's Apache YARN Server and Shell Applications

Step 2 - Start the necessary peripherals

Step 3 - Start Spring Cloud Data Flow's YARN Server Application

Step 4 - Run and Connect the Shell Application

Step 5 - Register Stream and Task applications

Step 6 - Create the ‘ticktock’ Stream

    dataflow:>stream create ticktock --definition "time | hdfs --rollover=1000" --deploy
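
The definition string in Step 6 chains applications with a pipe, and each application can carry `--key=value` options. The Python sketch below only illustrates how that string breaks down into its parts. It is not Spring Cloud Data Flow's own parser, the function name `parse_stream_definition` is hypothetical, and it assumes no pipe characters appear inside quoted option values.

```python
import shlex


def parse_stream_definition(definition):
    """Split a pipe-delimited stream definition into (app_name, options) pairs.

    Illustrative only: a naive split on "|" that does not handle pipes
    inside quoted option values.
    """
    apps = []
    for segment in definition.split("|"):
        tokens = shlex.split(segment)
        if not tokens:
            raise ValueError("empty application segment in definition")
        name, options = tokens[0], {}
        for token in tokens[1:]:
            # Only the --key=value form used in Step 6 is recognised here.
            if not token.startswith("--") or "=" not in token:
                raise ValueError(f"unsupported option syntax: {token!r}")
            key, value = token[2:].split("=", 1)
            options[key] = value
        apps.append((name, options))
    return apps


if __name__ == "__main__":
    for name, options in parse_stream_definition("time | hdfs --rollover=1000"):
        print(name, options)
```

Read this way, the ‘ticktock’ definition has two applications: `time` with no options, feeding `hdfs` with a single `rollover` option.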

Step 7 - Verify the ‘ticktock’ Results under the /tmp/hdfs-sink/ folder
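
One way to check the folder from Step 7 is to list it with the standard `hdfs dfs -ls` command. The sketch below wraps that call in Python. It assumes the `hdfs` client is installed and on the PATH of the machine running the script, and that /tmp/hdfs-sink/ refers to a path on HDFS. The helper names `build_ls_command` and `list_sink_files` are hypothetical.

```python
import subprocess

SINK_DIR = "/tmp/hdfs-sink/"  # folder named in Step 7


def build_ls_command(path=SINK_DIR):
    """Return the argument list for listing a directory on HDFS."""
    return ["hdfs", "dfs", "-ls", path]


def list_sink_files(path=SINK_DIR):
    """Run the listing and return its output lines.

    Assumes the `hdfs` client is available; raises CalledProcessError
    if the command exits with a non-zero status.
    """
    result = subprocess.run(
        build_ls_command(path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()
```

Calling `list_sink_files()` on a host with a configured Hadoop client returns the listing as lines of text. What those lines contain depends on the cluster and on how long the stream has been running.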

Step 8 - Launch Dashboard at: http://<SCDF_SERVER_HOST>:<PORT>/dashboard