Apache Spark MLlib Training Course

MLlib is Spark’s machine learning (ML) library. Its goal is to make practical machine learning scalable and easy. It consists of common learning algorithms and utilities, including classification, regression, clustering, collaborative filtering, and dimensionality reduction, as well as lower-level optimization primitives and higher-level pipeline APIs.

It is divided into two packages:

spark.mllib contains the original API, built on top of RDDs.
spark.ml provides a higher-level API, built on top of DataFrames, for constructing ML pipelines.

Audience

This course is aimed at engineers and developers who want to use MLlib, the built-in machine learning library for Apache Spark.

This course is available as onsite live training in the United Kingdom or as online live training.

Course Outline

spark.mllib: data types, algorithms, and utilities

Data types
Basic statistics
- summary statistics
- correlations
- stratified sampling
- hypothesis testing
- streaming significance testing
- random data generation
Classification and regression
- linear models (SVMs, logistic regression, linear regression)
- naive Bayes
- decision trees
- ensembles of trees (Random Forests and Gradient-Boosted Trees)
- isotonic regression
Collaborative filtering
- alternating least squares (ALS)
Clustering
- k-means
- Gaussian mixture
- power iteration clustering (PIC)
- latent Dirichlet allocation (LDA)
- bisecting k-means
- streaming k-means
Dimensionality reduction
- singular value decomposition (SVD)
- principal component analysis (PCA)
Feature extraction and transformation
Frequent pattern mining
- FP-growth
- association rules
- PrefixSpan
Evaluation metrics
PMML model export
Optimization (developer)
- stochastic gradient descent
- limited-memory BFGS (L-BFGS)

spark.ml: high-level APIs for ML pipelines

Overview: estimators, transformers and pipelines
Extracting, transforming and selecting features
Classification and regression
Clustering
Advanced topics
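
The estimator, transformer and pipeline concepts listed above can be sketched without Spark. The following is a minimal, library-free illustration in plain Python, not Spark code: every class name here (Transformer, Estimator, SquareColumn, StandardScaler, ScalerModel, Pipeline, PipelineModel) is a hypothetical stand-in, and rows are plain dictionaries rather than DataFrames. It assumes the usual pattern in which a transformer maps data to data, an estimator is fitted on data to produce a transformer, and a pipeline fits its stages in order.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence

Row = Dict[str, float]  # stand-in for a DataFrame row (assumption of this sketch)


class Transformer:
    """Maps a dataset to a new dataset, typically by adding a column."""

    def transform(self, rows: Sequence[Row]) -> List[Row]:
        raise NotImplementedError


class Estimator:
    """Learns from a dataset and returns a fitted Transformer."""

    def fit(self, rows: Sequence[Row]) -> Transformer:
        raise NotImplementedError


@dataclass
class SquareColumn(Transformer):
    """Stateless transformer: adds '<column>_sq'."""
    column: str

    def transform(self, rows):
        return [{**r, self.column + "_sq": r[self.column] ** 2} for r in rows]


@dataclass
class ScalerModel(Transformer):
    """Fitted model produced by StandardScaler: adds '<column>_scaled'."""
    column: str
    mean: float
    std: float

    def transform(self, rows):
        out = self.column + "_scaled"
        return [{**r, out: (r[self.column] - self.mean) / self.std} for r in rows]


@dataclass
class StandardScaler(Estimator):
    """Estimator: learns the mean and standard deviation of one column."""
    column: str

    def fit(self, rows):
        values = [r[self.column] for r in rows]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = var ** 0.5 or 1.0  # fall back to 1.0 for a constant column
        return ScalerModel(self.column, mean, std)


class PipelineModel(Transformer):
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages: Sequence[Transformer]):
        self.stages = list(stages)

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Chains stages; fitting it yields a PipelineModel."""

    def __init__(self, stages):
        self.stages = list(stages)

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            # Estimators are fitted on the data seen so far; transformers pass through.
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            fitted.append(model)
            rows = model.transform(rows)  # output feeds the next stage
        return PipelineModel(fitted)


training = [{"x": 1.0}, {"x": 2.0}, {"x": 3.0}]
pipeline = Pipeline([SquareColumn("x"), StandardScaler("x_sq")])
model = pipeline.fit(training)      # learns mean/std of x_sq
scored = model.transform(training)  # adds x_sq and x_sq_scaled
```

The point of the design is that the fitted model is itself a transformer, so the same preprocessing learned during training can be reapplied unchanged to new data with model.transform(new_rows).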

Requirements

Knowledge of one of the following:

Java
Scala
Python
R (SparkR)

Duration: 35 hours

Delivery Options

Private Group Training

Our identity is rooted in delivering exactly what our clients need.

Pre-course call with your trainer
Customisation of the learning experience to achieve your goals
- bespoke outlines
- practical hands-on exercises containing data / scenarios recognisable to the learners
Training scheduled on a date of your choice
Delivered online, onsite/classroom or hybrid by experts sharing real-world experience

Private group prices: RRP from £9500 for online delivery, based on a group of 2 delegates, plus £3000 per additional delegate (excludes any certification / exam costs). We recommend a maximum group size of 12 for most learning events.

Public Training

Please see our public courses

Testimonials (1)

A lot of practical examples, different ways to approach the same problem, and sometimes not so obvious tricks how to improve the current solution

Rafal - Nordea

Course - Apache Spark MLlib
