Spark for Developers Training Course

OBJECTIVE:

This course introduces Apache Spark. Students learn how Spark fits into the Big Data ecosystem and how to use it for data analysis. The course covers the Spark shell for interactive data analysis, Spark internals, the Spark APIs, Spark SQL, Spark Streaming, machine learning with MLlib, and GraphX.

AUDIENCE:

Developers / Data Analysts

This course is available as onsite live training in the United Kingdom or as online live training.

Course Outline

Scala primer
- A quick introduction to Scala
- Labs : Getting to know Scala
Spark Basics
- Background and history
- Spark and Hadoop
- Spark concepts and architecture
- Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
- Labs : Installing and running Spark
First Look at Spark
- Running Spark in local mode
- Spark web UI
- Spark shell
- Analyzing dataset – part 1
- Inspecting RDDs
- Labs: Spark shell exploration
RDDs
- RDD concepts
- Partitions
- RDD Operations / transformations
- RDD types
- Key-Value pair RDDs
- MapReduce on RDD
- Caching and persistence
- Labs : creating & inspecting RDDs; Caching RDDs
Spark API programming
- Introduction to Spark API / RDD API
- Submitting the first program to Spark
- Debugging / logging
- Configuration properties
- Labs : Programming in Spark API, Submitting jobs
Spark SQL
- SQL support in Spark
- Dataframes
- Defining tables and importing datasets
- Querying data frames using SQL
- Storage formats : JSON / Parquet
- Labs : Creating and querying data frames; evaluating data formats
MLlib
- MLlib intro
- MLlib algorithms
- Labs : Writing MLlib applications
GraphX
- GraphX library overview
- GraphX APIs
- Labs : Processing graph data using Spark
Spark Streaming
- Streaming overview
- Evaluating Streaming platforms
- Streaming operations
- Sliding window operations
- Labs : Writing Spark Streaming applications
Spark and Hadoop
- Hadoop Intro (HDFS / YARN)
- Hadoop + Spark architecture
- Running Spark on Hadoop YARN
- Processing HDFS files using Spark
Spark Performance and Tuning
- Broadcast variables
- Accumulators
- Memory management & caching
Spark Operations
- Deploying Spark in production
- Sample deployment templates
- Configurations
- Monitoring
- Troubleshooting

Requirements

PRE-REQUISITES

Familiarity with Java, Scala, or Python (the labs are in Scala and Python)
Basic understanding of the Linux development environment (command-line navigation, editing files with vi or nano)

21 Hours

Custom Corporate Training

Training solutions designed exclusively for businesses.

Customised Content: We adapt the syllabus and practical exercises to the real goals and needs of your project.
Flexible Schedule: Dates and times adapted to your team's agenda.
Format: Online (live), In-company (at your offices), or Hybrid.

Investment

Price per private group, online live training, starting from £4800 + VAT*

(*The final price may vary depending on the technical specialisation of the course, the level of customisation, the method of delivery and the number of learners)

Need help picking the right course?
england@nobleprog.co.uk or +44 (0)208 089 0990

Testimonials (6)

Doing similar exercises in different ways really helps in understanding what each component (Hadoop/Spark, standalone/cluster) can do on its own and together. It gave me ideas on how I should test my application on my local machine when I develop vs when it is deployed on a cluster.

Thomas Carcaud - IT Frankfurt GmbH

Course - Spark for Developers

Ajay was very friendly, helpful and also knowledgeable about the topic he was discussing.

Biniam Guulay - ICE International Copyright Enterprise Germany GmbH

Course - Spark for Developers

Ernesto did a great job explaining the high level concepts of using Spark and its various modules.

Michael Nemerouf

Course - Spark for Developers

The trainer made the class interesting and entertaining which helps quite a bit with all day training.

Ryan Speelman

Course - Spark for Developers

We know a lot more about the whole environment.

John Kidd

Course - Spark for Developers

Richard is very calm and methodical, with an analytic insight - exactly the qualities needed to present this sort of course.

Kieran Mac Kenna

Course - Spark for Developers

Related Courses

Big Data Analytics with Google Colab and Apache Spark

Big Data Analytics in Health

Hadoop and Spark for Administrators

A Practical Introduction to Stream Processing

PySpark and Machine Learning

SMACK Stack for Data Science

Apache Spark Fundamentals

Administration of Apache Spark

Apache Spark in the Cloud

Scaling Data Pipelines with Spark NLP

Python and Spark for Big Data (PySpark)

Python, Spark, and Hadoop for Big Data

Apache Spark SQL

Stratio: Rocket and Intelligence Modules with PySpark
