How to Create a Cloud Dataflow Pipeline Using Java and Apache Maven

Jose Miguel Arrieta
Published in Data Science
4 min read · Jan 7, 2019

Cloud Dataflow is a managed service for executing a wide variety of data processing patterns.

This post explains how to create a simple Maven project with the Apache Beam SDK and run a pipeline on the Google Cloud Dataflow service. One advantage of using Maven is that it manages the Java project's external dependencies, which makes it well suited to automation.
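
As a rough sketch of what that dependency management looks like, a `pom.xml` for a Beam project targeting Dataflow would typically declare the Beam Java SDK and the Dataflow runner. The group and artifact IDs below are the ones Apache Beam publishes. The version number is an assumption (a release from around the time of writing), so substitute the one you target.

```xml
<!-- Fragment of pom.xml; these elements go inside <project>. -->
<properties>
  <!-- Assumed version, not taken from the original project. -->
  <beam.version>2.9.0</beam.version>
</properties>

<dependencies>
  <!-- Core Apache Beam Java SDK: Pipeline, PCollection, transforms. -->
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-sdks-java-core</artifactId>
    <version>${beam.version}</version>
  </dependency>
  <!-- Runner that submits the pipeline to the Cloud Dataflow service. -->
  <dependency>
    <groupId>org.apache.beam</groupId>
    <artifactId>beam-runners-google-cloud-dataflow-java</artifactId>
    <version>${beam.version}</version>
  </dependency>
</dependencies>
```

Declaring the version once as a property keeps the SDK and the runner on the same release.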

This project executes a very simple example where two strings, “Hello” and “World”, are the inputs and…
