Unified Batch and Stream Processing with Apache Beam Training Course

Apache Beam is an open source, unified programming model for defining and executing parallel data processing pipelines. Its power lies in its ability to run both batch and streaming pipelines, with execution being carried out by one of Beam's supported distributed processing back-ends: Apache Apex, Apache Flink, Apache Spark, and Google Cloud Dataflow. Apache Beam is useful for ETL (Extract, Transform, and Load) tasks such as moving data between different storage media and data sources, transforming data into a more desirable format, and loading data onto a new system.

In this instructor-led, live training (onsite or remote), participants will learn how to implement the Apache Beam SDKs in a Java or Python application that defines a data processing pipeline for decomposing a big data set into smaller chunks for independent, parallel processing.

By the end of this training, participants will be able to:

Install and configure Apache Beam.
Use a single programming model to carry out both batch and stream processing from within their Java or Python application.
Execute pipelines across multiple environments.

Format of the Course

Part lecture, part discussion, exercises and heavy hands-on practice

Note

This course will be available in Scala in the future. Please contact us to arrange.

This course is available as onsite live training in the United Kingdom or online live training.

Course Outline

Introduction

Apache Beam vs MapReduce, Spark Streaming, Kafka Streams, Storm and Flink

Installing and Configuring Apache Beam

Overview of Apache Beam Features and Architecture

Beam Model, SDKs, Beam Pipeline Runners
Distributed processing back-ends

Understanding the Apache Beam Programming Model

How a pipeline is executed

Running a sample pipeline

Preparing a WordCount pipeline
Executing the Pipeline locally
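
As a rough illustration of what a WordCount exercise might look like in the Python SDK, the sketch below counts words from an in-memory list and runs on Beam's local DirectRunner, which is used when no runner is specified. The names `split_words` and `run_wordcount` and the sample input are hypothetical and are not taken from the course material; the sketch assumes the `apache-beam` package is installed.

```python
import re


def split_words(line):
    """Return the lowercase words found in one line of text."""
    return re.findall(r"[a-z']+", line.lower())


def run_wordcount(lines):
    """Count words in `lines` with a Beam pipeline on the local DirectRunner."""
    # Imported here so split_words can be tested without the Beam SDK installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:  # no runner given: DirectRunner is used
        (
            pipeline
            | "Create" >> beam.Create(lines)
            | "Split" >> beam.FlatMap(split_words)
            | "PairWithOne" >> beam.Map(lambda word: (word, 1))
            | "Count" >> beam.CombinePerKey(sum)
            | "Print" >> beam.Map(print)
        )
```

Calling `run_wordcount(["to be or not to be"])` prints one `(word, count)` tuple per distinct word; the order of the output is not guaranteed.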

Designing a Pipeline

Planning the structure, choosing the transforms, and determining the input and output methods

Creating the Pipeline

Writing the driver program and defining the pipeline
Using Apache Beam classes
Data sets, transforms, I/O, data encoding, etc.

Executing the Pipeline

Executing the pipeline locally, on remote machines, and on a public cloud
Choosing a runner
Runner-specific configurations
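
One way to picture runner selection in the Python SDK: the pipeline code stays the same and the runner is chosen through pipeline options, typically passed as command-line style flags. The sketch below is a minimal example under that assumption; `build_pipeline_args` and `run_on` are hypothetical helper names, and the squared-numbers pipeline is only a placeholder workload.

```python
def build_pipeline_args(runner="DirectRunner", extra_args=None):
    """Build the command-line style flags that Beam's PipelineOptions accepts."""
    args = [f"--runner={runner}"]
    args.extend(extra_args or [])
    return args


def run_on(runner="DirectRunner", extra_args=None):
    """Run a small placeholder pipeline on the chosen runner."""
    # Imported here so build_pipeline_args can be tested without the Beam SDK.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(build_pipeline_args(runner, extra_args))
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Create" >> beam.Create([1, 2, 3])
            | "Square" >> beam.Map(lambda x: x * x)
            | "Print" >> beam.Map(print)
        )
```

`run_on()` executes locally. A remote runner usually needs additional runner-specific flags in `extra_args`, for example `--flink_master=localhost:8081` for a Flink cluster at that (assumed) address; the exact set of flags depends on the runner and the Beam version.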

Testing and Debugging Apache Beam

Using type hints to emulate static typing
Managing Python Pipeline Dependencies
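
To make the type hints topic concrete, here is a minimal sketch of declaring input and output types on a transform in the Python SDK, so that Beam can check at pipeline construction time that adjacent transforms agree. The function names `word_length` and `run_typed` are hypothetical; `with_input_types` and `with_output_types` are the Beam methods being illustrated.

```python
def word_length(word: str) -> int:
    """Return the number of characters in a word."""
    return len(word)


def run_typed(words):
    """Map each word to its length, declaring the transform's types explicitly."""
    # Imported here so word_length can be tested without the Beam SDK installed.
    import apache_beam as beam

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(words)
            | "Length" >> beam.Map(word_length)
                .with_input_types(str)
                .with_output_types(int)
            | "Print" >> beam.Map(print)
        )
```

Feeding this transform a collection whose declared element type is not `str` is the kind of mismatch the hints are meant to surface before any data is processed.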

Processing Bounded and Unbounded Datasets

Windowing and Triggers
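
As a small illustration of windowing, the sketch below assigns event timestamps to elements and groups them into fixed windows before counting per key. It relies on the default trigger, which emits a window's result once the watermark passes the end of the window. The helper `window_start` only shows the arithmetic behind fixed windows with no offset; it and `run_windowed` are hypothetical names, and the `(key, event_time_seconds)` input shape is an assumption made for the example.

```python
def window_start(timestamp, size):
    """Start of the fixed window of `size` seconds that contains `timestamp`."""
    return timestamp - (timestamp % size)


def run_windowed(events, size=60):
    """Count events per key per fixed window.

    `events` is an iterable of (key, event_time_seconds) pairs.
    """
    # Imported here so window_start can be tested without the Beam SDK installed.
    import apache_beam as beam
    from apache_beam.transforms import window

    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Create" >> beam.Create(events)
            # Attach the event time so windowing uses it instead of a default.
            | "Stamp" >> beam.Map(
                lambda e: window.TimestampedValue((e[0], 1), e[1]))
            | "Window" >> beam.WindowInto(window.FixedWindows(size))
            | "CountPerKey" >> beam.CombinePerKey(sum)
            | "Print" >> beam.Map(print)
        )
```

With `size=60`, events for the same key at 5 and 59 seconds fall into one window and are counted together, while an event at 61 seconds lands in the next window.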

Making Your Pipelines Reusable and Maintainable

Create New Data Sources and Sinks

Apache Beam Source and Sink API

Integrating Apache Beam with other Big Data Systems

Apache Hadoop, Apache Spark, Apache Kafka

Troubleshooting

Summary and Conclusion

Requirements

Experience with Python Programming.
Experience with the Linux command line.

Audience

Developers

14 Hours

Delivery Options

Private Group Training

Our identity is rooted in delivering exactly what our clients need.

Pre-course call with your trainer
Customisation of the learning experience to achieve your goals:

Bespoke outlines
Practical hands-on exercises containing data / scenarios recognisable to the learners

Training scheduled on a date of your choice
Delivered online, onsite/classroom or hybrid by experts sharing real world experience

Private Group Prices RRP from £3800 online delivery, based on a group of 2 delegates, £1200 per additional delegate (excludes any certification / exam costs). We recommend a maximum group size of 12 for most learning events.

Public Training

Provisional Upcoming Courses (Contact Us For More Information)

Unified Batch and Stream Processing with Apache Beam
