Apache Spark Training in Newcastle

Apache Spark - an engine for big data processing training

Newcastle

Rotterdam House
116 Quayside
Newcastle upon Tyne NE1 3DY
United Kingdom
The Newcastle Quayside Centre is in a prestigious riverside location close to the River Tyne, occupying three floors of a five-storey building with a glass...

Client Testimonials

Spark for Developers

I think the trainer had an excellent style of combining humor and real life stories to make the subjects at hand very approachable. I would highly recommend this professor in the future.

Spark for Developers

Ernesto did a great job explaining the high level concepts of using Spark and its various modules.

Michael Nemerouf

Spark for Developers

Richard is very calm and methodical, with an analytical insight - exactly the qualities needed to present this sort of course

Kieran Mac Kenna - BAE Systems Applied Intelligence

Spark for Developers

The trainer made the class interesting and entertaining which helps quite a bit with all day trainings

Ryan Speelman

Spark for Developers

We now know a lot more about the whole environment

John Kidd - Cardano Risk Management

Apache Spark Course Events - Newcastle

Code            Name                                                Venue      Duration  Course Date            Course Price [Remote / Classroom]
magellan        Magellan: Geospatial Analytics on Spark             Newcastle  14 hours  Thu, 2018-02-08 09:30  £2200 / £2600
hdp             Hortonworks Data Platform (HDP) for administrators  Newcastle  21 hours  Tue, 2018-03-13 09:30  £3300 / £3900
68780           Apache Spark                                        Newcastle  14 hours  Mon, 2018-03-19 09:30  £2200 / £2600
sparkdev        Spark for Developers                                Newcastle  21 hours  Tue, 2018-03-20 09:30  £3300 / £3900
alluxio         Alluxio: Unifying disparate storage systems         Newcastle  7 hours   Fri, 2018-03-30 09:30  £1100 / £1300
magellan        Magellan: Geospatial Analytics on Spark             Newcastle  14 hours  Thu, 2018-04-05 09:30  £2200 / £2600
graphcomputing  Introduction to Graph Computing                     Newcastle  28 hours  Mon, 2018-04-30 09:30  £4400 / £5200
hdp             Hortonworks Data Platform (HDP) for administrators  Newcastle  21 hours  Wed, 2018-05-09 09:30  £3300 / £3900
68780           Apache Spark                                        Newcastle  14 hours  Thu, 2018-05-10 09:30  £2200 / £2600
sparkdev        Spark for Developers                                Newcastle  21 hours  Mon, 2018-05-14 09:30  £3300 / £3900
alluxio         Alluxio: Unifying disparate storage systems         Newcastle  7 hours   Mon, 2018-05-21 09:30  £1100 / £1300
magellan        Magellan: Geospatial Analytics on Spark             Newcastle  14 hours  Wed, 2018-05-30 09:30  £2200 / £2600
68780           Apache Spark                                        Newcastle  14 hours  Mon, 2018-07-02 09:30  £2200 / £2600
hdp             Hortonworks Data Platform (HDP) for administrators  Newcastle  21 hours  Wed, 2018-07-04 09:30  £3300 / £3900
graphcomputing  Introduction to Graph Computing                     Newcastle  28 hours  Mon, 2018-07-09 09:30  £4400 / £5200
alluxio         Alluxio: Unifying disparate storage systems         Newcastle  7 hours   Tue, 2018-07-10 09:30  £1100 / £1300
sparkdev        Spark for Developers                                Newcastle  21 hours  Tue, 2018-07-10 09:30  £3300 / £3900
magellan        Magellan: Geospatial Analytics on Spark             Newcastle  14 hours  Thu, 2018-07-19 09:30  £2200 / £2600

Course Outlines

Code Name Duration Outline
68780 Apache Spark 14 hours

Why Spark?

  • Problems with Traditional Large-Scale Systems
  • Introducing Spark

Spark Basics

  • What is Apache Spark?
  • Using the Spark Shell
  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark

Working with RDDs

  • RDD Operations
  • Key-Value Pair RDDs
  • MapReduce and Pair RDD Operations
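
To give a flavour of the key-value and MapReduce topics listed above, here is a minimal pure-Python sketch of a word count built from a flatten-and-map step followed by a reduce-by-key step. It does not use Spark; the helper names `flat_map`, `reduce_by_key` and `word_count` are hypothetical stand-ins chosen for illustration, and the sample sentences are made up.

```python
from collections import defaultdict
from functools import reduce


def flat_map(func, records):
    """Apply func to each record and flatten the resulting lists into one list."""
    return [item for record in records for item in func(record)]


def reduce_by_key(func, pairs):
    """Group (key, value) pairs by key, then fold each group's values with func."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return {key: reduce(func, values) for key, values in groups.items()}


def word_count(lines):
    """Count word occurrences: split lines, emit (word, 1) pairs, sum per word."""
    words = flat_map(str.split, lines)
    pairs = [(word.lower(), 1) for word in words]
    return reduce_by_key(lambda a, b: a + b, pairs)


# Illustrative input only
counts = word_count(["spark makes big data simple", "big data needs spark"])
```

In a distributed engine the same two steps would run per partition and the grouped values would be combined across machines; this sketch only shows the shape of the computation on a single list.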

The Hadoop Distributed File System

  • Why HDFS?
  • HDFS Architecture
  • Using HDFS

Running Spark on a Cluster

  • Overview
  • A Spark Standalone Cluster
  • The Spark Standalone Web UI

Parallel Programming with Spark

  • RDD Partitions and HDFS Data Locality
  • Working With Partitions
  • Executing Parallel Operations

Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Writing Spark Applications

  • Spark Applications vs. Spark Shell
  • Creating the SparkContext
  • Configuring Spark Properties
  • Building and Running a Spark Application
  • Logging

Spark, Hadoop, and the Enterprise Data Center

  • Overview
  • Spark and the Hadoop Ecosystem
  • Spark and MapReduce

Spark Streaming

  • Spark Streaming Overview
  • Example: Streaming Word Count
  • Other Streaming Operations
  • Sliding Window Operations
  • Developing Spark Streaming Applications
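
As a rough illustration of the sliding window topic above, the following pure-Python sketch counts words over the most recent batches of a stream, sliding forward by one batch at a time. It is not Spark Streaming code; the function name `windowed_counts`, the `window_length` parameter and the sample batches are assumptions made for this example.

```python
from collections import Counter, deque


def windowed_counts(batches, window_length):
    """Yield word counts over the last `window_length` batches, sliding by one batch."""
    window = deque(maxlen=window_length)  # the oldest batch drops out automatically
    for batch in batches:
        window.append(Counter(batch))
        total = Counter()
        for counts in window:
            total.update(counts)  # add up the per-batch counts inside the window
        yield dict(total)


# Illustrative input only: each inner list is one micro-batch of words
batches = [["a", "b"], ["a"], ["c", "a"], ["b"]]
results = list(windowed_counts(batches, window_length=2))
```

Each yielded dictionary covers at most two batches, so a word stops being counted once its batch leaves the window.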

Common Spark Algorithms

  • Iterative Algorithms
  • Graph Analysis
  • Machine Learning

Improving Spark Performance

  • Shared Variables: Broadcast Variables
  • Shared Variables: Accumulators
  • Common Performance Issues
sparkdev Spark for Developers 21 hours

OBJECTIVE:

This course will introduce Apache Spark. The students will learn how Spark fits into the Big Data ecosystem, and how to use Spark for data analysis. The course covers the Spark shell for interactive data analysis, Spark internals, Spark APIs, Spark SQL, Spark Streaming, machine learning and GraphX.

AUDIENCE:

Developers / Data Analysts

  1. Scala primer
    • A quick introduction to Scala
    • Labs : Getting to know Scala
  2. Spark Basics
    • Background and history
    • Spark and Hadoop
    • Spark concepts and architecture
    • Spark ecosystem (Core, Spark SQL, MLlib, Streaming)
    • Labs : Installing and running Spark
  3. First Look at Spark
    • Running Spark in local mode
    • Spark web UI
    • Spark shell
    • Analyzing dataset – part 1
    • Inspecting RDDs
    • Labs: Spark shell exploration
  4. RDDs
    • RDD concepts
    • Partitions
    • RDD Operations / transformations
    • RDD types
    • Key-Value pair RDDs
    • MapReduce on RDD
    • Caching and persistence
    • Labs : Creating & inspecting RDDs; caching RDDs
  5. Spark API programming
    • Introduction to Spark API / RDD API
    • Submitting the first program to Spark
    • Debugging / logging
    • Configuration properties
    • Labs : Programming in Spark API, Submitting jobs
  6. Spark SQL
    • SQL support in Spark
    • Dataframes
    • Defining tables and importing datasets
    • Querying data frames using SQL
    • Storage formats : JSON / Parquet
    • Labs : Creating and querying data frames; evaluating data formats
  7. MLlib
    • MLlib intro
    • MLlib algorithms
    • Labs : Writing MLlib applications
  8. GraphX
    • GraphX library overview
    • GraphX APIs
    • Labs : Processing graph data using Spark
  9. Spark Streaming
    • Streaming overview
    • Evaluating Streaming platforms
    • Streaming operations
    • Sliding window operations
    • Labs : Writing spark streaming applications
  10. Spark and Hadoop
    • Hadoop Intro (HDFS / YARN)
    • Hadoop + Spark architecture
    • Running Spark on Hadoop YARN
    • Processing HDFS files using Spark
  11. Spark Performance and Tuning
    • Broadcast variables
    • Accumulators
    • Memory management & caching
  12. Spark Operations
    • Deploying Spark in production
    • Sample deployment templates
    • Configurations
    • Monitoring
    • Troubleshooting
hdp Hortonworks Data Platform (HDP) for administrators 21 hours

Hortonworks Data Platform is an open-source Apache Hadoop support platform that provides a stable foundation for developing big data solutions on the Apache Hadoop ecosystem.

This instructor-led, live training introduces Hortonworks and walks participants through the deployment of a Spark + Hadoop solution.

By the end of this training, participants will be able to:

  • Use Hortonworks to reliably run Hadoop at a large scale
  • Unify Hadoop's security, governance, and operations capabilities with Spark's agile analytic workflows.
  • Use Hortonworks to investigate, validate, certify and support each of the components in a Spark project
  • Process different types of data, including structured, unstructured, in-motion, and at-rest.

Audience

  • Hadoop administrators

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

To request a customized course outline for this training, please contact us.

 

magellan Magellan: Geospatial Analytics on Spark 14 hours

Magellan is an open-source distributed execution engine for geospatial analytics on big data. Implemented on top of Apache Spark, it extends Spark SQL and provides a relational abstraction for geospatial analytics.

This instructor-led, live training introduces the concepts and approaches for implementing geospatial analytics and walks participants through the creation of a predictive analysis application using Magellan on Spark.

By the end of this training, participants will be able to:

  • Efficiently query, parse and join geospatial datasets at scale
  • Implement geospatial data in business intelligence and predictive analytics applications
  • Use spatial context to extend the capabilities of mobile devices, sensors, logs, and wearables

Audience

  • Application developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

To request a customized course outline for this training, please contact us.

 

alluxio Alluxio: Unifying disparate storage systems 7 hours

Alluxio is an open-source virtual distributed storage system that unifies disparate storage systems and enables applications to interact with data at memory speed. It is used by companies such as Intel, Baidu and Alibaba.

In this instructor-led, live training, participants will learn how to use Alluxio to bridge different computation frameworks with storage systems and efficiently manage multi-petabyte scale data as they step through the creation of an application with Alluxio.

By the end of this training, participants will be able to:

  • Develop an application with Alluxio
  • Connect big data systems and applications while preserving one namespace
  • Efficiently extract value from big data in any storage format
  • Improve workload performance
  • Deploy and manage Alluxio standalone or clustered

Audience

  • Data scientist
  • Developer
  • System administrator

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

To request a customized course outline for this training, please contact us.

 

graphcomputing Introduction to Graph Computing 28 hours

A large number of real-world problems can be described in terms of graphs. Examples include the Web graph, the social network graph, the train network graph and the language graph. These graphs tend to be extremely large; processing them requires a specialized set of tools and a mindset referred to as graph computing.

In this instructor-led, live training, participants will learn about the various technology offerings and implementations for processing graph data. The aim is to identify real-world objects, their characteristics and relationships, then model these relationships and process them as data using graph computing approaches. We start with a broad overview and narrow in on specific tools as we step through a series of case studies, hands-on exercises and live deployments.

By the end of this training, participants will be able to:

  • Understand how graph data is persisted and traversed
  • Select the best framework for a given task (from graph databases to batch processing frameworks)
  • Implement Hadoop, Spark, GraphX and Pregel to carry out graph computing across many machines in parallel
  • View real-world big data problems in terms of graphs, processes and traversals

Audience

  • Developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Introduction
    Graph databases and libraries

Understanding graph data
    The graph as a data structure
    Using vertices (dots) and edges (lines) to model real-world scenarios

Using Graph databases to model, persist and process graph data
    Local graph algorithms/traversals
    neo4j, OrientDB and Titan

Exercise: Modeling Graph Data with neo4j
    Whiteboard data modeling

Beyond Graph databases: Graph computing
    Understanding the property graph
    Graph modeling different scenarios (software graph, discussion graph, concept graph)

Solving Real-World Problems with Traversals
    Algorithmic/directed walk over the graph
    Determining circular dependencies
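
One common way to detect a circular dependency with a directed walk is a depth-first traversal that tracks which vertices are still on the current path. The sketch below is a plain Python illustration under that assumption, not material taken from the course; `find_cycle` and the sample dependency graphs are hypothetical.

```python
def find_cycle(graph):
    """Return one cycle as a list of vertices, or None if the graph is acyclic.

    `graph` maps each vertex to the list of vertices it depends on.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {v: WHITE for v in graph}
    path = []

    def visit(v):
        colour[v] = GREY
        path.append(v)
        for w in graph.get(v, ()):
            state = colour.get(w, WHITE)
            if state == GREY:
                # w is already on the current path, so the walk has looped back
                return path[path.index(w):] + [w]
            if state == WHITE:
                found = visit(w)
                if found:
                    return found
        path.pop()
        colour[v] = BLACK
        return None

    for v in graph:
        if colour[v] == WHITE:
            found = visit(v)
            if found:
                return found
    return None


# Illustrative input only
cyclic = find_cycle({"app": ["lib"], "lib": ["core"], "core": ["app"]})
acyclic = find_cycle({"app": ["lib"], "lib": []})
```

The recursion keeps the example short; for very deep graphs an explicit stack would avoid Python's recursion limit.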

Case Study: Ranking Discussion Contributors
    Ranking by number and depth of contributed discussions
    A note on sentiment and concept analysis

Graph Computing: Local, in-memory graph toolkits
    Graph analysis and visualization
    JUNG, NetworkX, and iGraph

Exercise: Modeling Graph Data with NetworkX
    Using NetworkX to model a complex s

Graph Computing: Batch Processing Graph Frameworks
    Leveraging Hadoop for storage (HDFS) and processing (MapReduce)
    Overview of iterative algorithms
    Hama, Giraph, and GraphLab

Graph Computing: Graph-parallel Computation
    Unifying ETL, exploratory analysis, and iterative graph computation within a single system
    GraphX

Setup and Installation
    Hadoop and Spark

GraphX Operators
    Property, structural, join, neighborhood aggregation, caching and uncaching

Iterating with Pregel API
    Passing arguments for sending, receiving and computing
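
The send, receive and compute roles mentioned above can be pictured with a small single-machine sketch of Pregel-style supersteps, here applied to single-source shortest paths. This is plain Python and only loosely mirrors the idea behind the GraphX Pregel operator; `pregel_shortest_paths`, the edge format and the sample graph are assumptions for illustration, and edge weights are assumed to be non-negative.

```python
import math


def pregel_shortest_paths(edges, source):
    """Compute shortest distances from `source` using message-passing supersteps.

    `edges` maps each vertex to a list of (neighbour, weight) pairs.
    """
    vertices = set(edges) | {n for outs in edges.values() for n, _ in outs}
    distance = {v: math.inf for v in vertices}
    inbox = {source: [0.0]}  # initial message delivered to the source vertex

    while inbox:  # one loop iteration corresponds to one superstep
        outbox = {}
        for vertex, messages in inbox.items():
            best = min(messages)         # receive: merge incoming messages
            if best < distance[vertex]:  # compute: update the vertex state
                distance[vertex] = best
                for neighbour, weight in edges.get(vertex, []):
                    # send: propose a new distance to each neighbour
                    outbox.setdefault(neighbour, []).append(best + weight)
        inbox = outbox  # vertices that received nothing stay inactive

    return distance


# Illustrative input only
dist = pregel_shortest_paths(
    {"a": [("b", 1.0), ("c", 4.0)], "b": [("c", 2.0)], "c": []}, "a"
)
```

The loop stops when no vertex sends a message, which is the usual halting condition in this style of computation.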

Building a Graph
    Using vertices and edges in an RDD or on disk

Designing Scalable Algorithms
    GraphX Optimization

Accessing Additional Algorithms
    PageRank, Connected Components, Triangle Counting
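
For readers new to PageRank, the following is a minimal power-iteration sketch in plain Python. It is not the GraphX implementation and its normalization may differ from what a given library reports; the function name `pagerank`, the damping factor of 0.85, the iteration count and the sample link graph are assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate PageRank scores by repeated redistribution of rank along links.

    `links` maps each vertex to the list of vertices it links to.
    """
    vertices = set(links) | {t for targets in links.values() for t in targets}
    n = len(vertices)
    rank = {v: 1.0 / n for v in vertices}

    for _ in range(iterations):
        # every vertex starts each round with the "random jump" share
        new_rank = {v: (1.0 - damping) / n for v in vertices}
        for v in vertices:
            targets = links.get(v, [])
            if targets:
                share = damping * rank[v] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # a vertex with no outgoing links spreads its rank evenly
                share = damping * rank[v] / n
                for t in vertices:
                    new_rank[t] += share
        rank = new_rank

    return rank


# Illustrative input only: "a" and "b" both link to "c", and "c" links back to "a"
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
top = max(ranks, key=ranks.get)
```

In this sample graph the vertex with the most incoming links ends up ranked highest, and the scores sum to one because rank is only moved around, never created or lost.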

Exercise: PageRank and Top Users
    Building and processing graph data using text files as input

Deploying to Production

Closing Remarks
