Hadoop for Developers (4 days)

Course Code

hadoopdev

Duration

28 hours (usually 4 days including breaks)

Requirements

  • comfortable with the Java programming language (most programming exercises are in Java)
  • comfortable in a Linux environment (able to navigate the Linux command line and edit files using vi / nano)

Lab environment

Zero Install : There is no need to install Hadoop software on students' machines. A working Hadoop cluster will be provided for students.

Students will need the following

  • an SSH client (Linux and Mac already have ssh clients; for Windows, PuTTY is recommended)
  • a browser to access the cluster (we recommend Firefox)

Overview

Apache Hadoop is a widely used framework for processing Big Data on clusters of servers. This course introduces developers to the main components of the Hadoop ecosystem (HDFS, MapReduce, Pig, Hive and HBase).

 

Course Outline

Section 1: Introduction to Hadoop

  • Hadoop history, concepts
  • ecosystem
  • distributions
  • high level architecture
  • Hadoop myths
  • Hadoop challenges
  • hardware / software
  • lab : first look at Hadoop

Section 2: HDFS

  • Design and architecture
  • concepts (horizontal scaling, replication, data locality, rack awareness)
  • Daemons : NameNode, Secondary NameNode, DataNode
  • communications / heartbeats
  • data integrity
  • read / write path
  • NameNode High Availability (HA), Federation
  • labs : Interacting with HDFS
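
The horizontal scaling and replication concepts listed above can be made concrete with a small worked calculation. The sketch below is plain Java, not the HDFS client API. It assumes a 128 MB block size and a replication factor of 3, which are common defaults in recent Hadoop releases but are configurable per cluster; the class and method names are hypothetical and chosen only for illustration.

```java
public class HdfsBlockMath {

    static final long MB = 1024L * 1024L;

    // Number of blocks a file is split into (ceiling division).
    // An empty file occupies no blocks.
    public static long blockCount(long fileBytes, long blockBytes) {
        if (fileBytes == 0) {
            return 0;
        }
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw storage consumed across the cluster. The last block is not
    // padded to the full block size, so this is file size times replication.
    public static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long file = 1000 * MB;   // assumed example file size
        long block = 128 * MB;   // assumed block size
        int replication = 3;     // assumed replication factor

        System.out.println(blockCount(file, block));           // 8
        System.out.println(rawBytes(file, replication) / MB);  // 3000
    }
}
```

Under these assumptions a 1000 MB file is stored as 8 blocks, each held on 3 DataNodes, for roughly 3000 MB of raw disk usage.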

Section 3: MapReduce

  • concepts and architecture
  • daemons (MRv1) : JobTracker / TaskTracker
  • phases : driver, mapper, shuffle/sort, reducer
  • MapReduce Version 1 and Version 2 (YARN)
  • Internals of MapReduce
  • Introduction to Java MapReduce program
  • labs : Running a sample MapReduce program
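
The phases named above (driver, mapper, shuffle/sort, reducer) can be sketched in a single Java process, without a cluster. This is only an illustration of the data flow and does not use Hadoop's MapReduce API; the class `WordCountSketch` and its method names are assumptions made for this example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle/sort phase: group all values by key. TreeMap keeps keys sorted.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    public static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver: wires the phases together.
    public static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {data=2, hadoop=2, processes=1, stores=1}
        System.out.println(run(Arrays.asList("hadoop stores data", "hadoop processes data")));
    }
}
```

In a real job the map and reduce calls run on different machines and the shuffle moves data over the network; here all three are ordinary method calls so the flow is easy to follow.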

Section 4: Pig

  • Pig vs Java MapReduce
  • Pig job flow
  • Pig Latin language
  • ETL with Pig
  • Transformations & Joins
  • User defined functions (UDF)
  • labs : writing Pig scripts to analyze data

Section 5: Hive

  • architecture and design
  • data types
  • SQL support in Hive
  • Creating Hive tables and querying
  • partitions
  • joins
  • text processing
  • labs : various labs on processing data with Hive

Section 6: HBase

  • concepts and architecture
  • HBase vs RDBMS vs Cassandra
  • HBase Java API
  • Time series data on HBase
  • schema design
  • labs : Interacting with HBase using the shell; programming with the HBase Java API; schema design exercise
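
As one illustration of the time series and schema design topics above, a commonly used row key pattern is a source identifier followed by a reversed timestamp, so that the newest reading for a source sorts first. The sketch below is plain Java and does not use the HBase client API; it is not necessarily the design taught in the course. The class `TimeSeriesRowKey`, its methods, and the `sensor01` identifier are hypothetical, and the sketch assumes all identifiers have the same length.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TimeSeriesRowKey {

    // Build a row key: source id bytes followed by (Long.MAX_VALUE - timestamp)
    // as 8 big-endian bytes. Assumes ids of equal length and non-negative timestamps.
    public static byte[] rowKey(String sourceId, long epochMillis) {
        byte[] id = sourceId.getBytes(StandardCharsets.UTF_8);
        long reversed = Long.MAX_VALUE - epochMillis;
        return ByteBuffer.allocate(id.length + Long.BYTES)
                .put(id)
                .putLong(reversed)
                .array();
    }

    // Unsigned lexicographic byte comparison, the ordering HBase applies to row keys.
    public static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xff) - (b[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.length - b.length;
    }

    public static void main(String[] args) {
        byte[] older = rowKey("sensor01", 1000L);
        byte[] newer = rowKey("sensor01", 2000L);
        // The newer reading sorts before the older one.
        System.out.println(compare(newer, older) < 0); // true
    }
}
```

With keys built this way, a scan that starts at the source identifier prefix returns the most recent rows first, which suits "latest N readings" queries.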

Bookings, Prices and Enquiries

Guaranteed to run even with a single delegate!

Private Classroom

From £5000

Private Remote

From £4400

