Data Mining Training in Plymouth

Data Mining Courses

Plymouth Drake Circus

Regus Drake Circus
Unit MSU9A, Level 1, 1 Charles Street
Plymouth PL1 1EA
United Kingdom
Business travellers now have somewhere to work and meet while on the go – walk-in workspace at Drake Circus shopping centre in Plymouth. Our business lounge...

Client Testimonials

Data and Analytics - from the ground up

real life practical examples

Wioleta (Vicky) Celinska-Drozd - Digital Jersey

A practical introduction to Data Analysis and Big Data

Overall, the content was good.

Sameer Rohadia - Continental AG / Abteilung: CF IT Finance

Data and Analytics - from the ground up

The patience of Kamil.

Laszlo Maros - Digital Jersey

Data Visualization

Learning about all the chart types and what they are used for. Learning the value of decluttering. Learning about the methods to show time data.

Susan Williams - Virginia Department of Education

Beyond the relational database: neo4j

Flexibility to blend in with Autodata-related details to get more of a real-world scenario as we went on.

Autodata Ltd

Data Protection

the discussion and exchange of ideas

Raymond Jackson Pajarillo - V.Ships Services Oceana, Inc.

Data Mining & Machine Learning with R

The trainer was so knowledgeable and included areas I was interested in

Mohamed Salama - Edmonton Police Service

Data Visualization

I am a hands-on learner and this was something that he did a lot of.

Lisa Comfort - Virginia Department of Education

Data Visualization

The examples.

Peter Coleman - Virginia Department of Education

Beyond the relational database: neo4j

The trainer did bring some good insight and ways to approach developing a graph database. He used examples from the slides presented but also drew on his own experience which was good.

Autodata Ltd

A practical introduction to Data Analysis and Big Data

It covered a broad range of information.

Continental AG / Abteilung: CF IT Finance

Data Protection

The interaction and facts gained / learnt.

Monna Liza Mengullo - V.Ships Services Oceana, Inc.

A practical introduction to Data Analysis and Big Data

presentation of technologies

Continental AG / Abteilung: CF IT Finance

Data Visualization

I thought that the information was interesting.

Allison May - Virginia Department of Education

Data and Analytics - from the ground up

Kamil is a very knowledgeable and nice person; I learned a lot from him.

Aleksandra Szubert - Digital Jersey

Data Mining and Analysis

The hands-on exercises and the trainer's capacity to explain complex topics in simple terms

Youssef Chamoun - Murex Services S.A.L (Offshore)

Data and Analytics - from the ground up

Detailed and comprehensive instruction given by experienced and clearly knowledgeable expert on the subject.

Justin Roche - Digital Jersey

Beyond the relational database: neo4j

The trainer did bring some good insight and ways to approach developing a graph database. He used examples from the slides presented but also drew on his own experience which was good.

Autodata Ltd

Data Mining and Analysis

The information given was interesting, and the best part was towards the end, when we were provided with data from Murex and worked on data we are familiar with, performing operations to get results.

Jessica Chaar - Murex Services S.A.L (Offshore)

A practical introduction to Data Analysis and Big Data

Willingness to share more

Balaram Chandra Paul - MOL Information Technology Asia Limited

Data Mining with R

very tailored to needs

Yashan Wang - MoneyGram International

Data Mining and Analysis

I liked the exercises we did.

Nour Assaf - Murex Services S.A.L (Offshore)

Data Visualization

Good real world examples, reviews of existing reports

Ronald Parrish - Virginia Department of Education

Data Visualization

The examples.

Peter Coleman - Virginia Department of Education

Data Visualization

I really appreciated that Jeff utilized data and examples that were applicable to education data. He made it interesting and interactive.

Carol Wells Bazzichi - Virginia Department of Education

Data and Analytics - from the ground up

I enjoyed the Excel sheets provided having the exercises with examples. This meant that if Kamil was held up helping other people, I could crack on with the next parts.

Luke Pontin - Digital Jersey

Data Visualization

Content / Instructor

Craig Roberson - Virginia Department of Education

Data and Analytics - from the ground up

The way the trainer made complex subjects easy to understand.

Adam Drewry - Digital Jersey

Data and Analytics - from the ground up

learning how to use Excel properly

Torin Mitchell - Digital Jersey

Beyond the relational database: neo4j

Flexibility to blend in with Autodata-related details to get more of a real-world scenario as we went on.

Autodata Ltd

Data Visualization

Trainer was enthusiastic.

Diane Lucas - Virginia Department of Education

Data Protection

All

Marjorie Pepito - V.Ships Services Oceana, Inc.

Data and Analytics - from the ground up

First session. Very intensive and quick.

Digital Jersey

Data Mining Course Events - Plymouth

Code Name Venue Duration Course Date Course Price [Remote / Classroom]
TalendDI Talend Open Studio for Data Integration Plymouth Drake Circus 28 hours Tue, 2018-01-30 09:30 £4400 / £5000
druid Druid: Build a fast, real-time data analysis system Plymouth Drake Circus 21 hours Wed, 2018-01-31 09:30 £3300 / £3750
sspsspas Statistics with SPSS Predictive Analytics Software Plymouth Drake Circus 14 hours Thu, 2018-02-01 09:30 £2200 / £2500
mdlmrah Model MapReduce and Apache Hadoop Plymouth Drake Circus 14 hours Thu, 2018-02-08 09:30 £2200 / £2500
datama Data Mining and Analysis Plymouth Drake Circus 28 hours Mon, 2018-02-12 09:30 £5200 / £5800
dmmlr Data Mining & Machine Learning with R Plymouth Drake Circus 14 hours Thu, 2018-02-15 09:30 £2600 / £2900
bigddbsysfun Big Data & Database Systems Fundamentals Plymouth Drake Circus 14 hours Mon, 2018-02-19 09:30 £2200 / £2500
68780 Apache Spark Plymouth Drake Circus 14 hours Mon, 2018-02-19 09:30 £2200 / £2500
rprogda R Programming for Data Analysis Plymouth Drake Circus 14 hours Thu, 2018-02-22 09:30 £2200 / £2500
osqlide Oracle SQL Intermediate - Data Extraction Plymouth Drake Circus 14 hours Thu, 2018-02-22 09:30 £2200 / £2500
BigData_ A practical introduction to Data Analysis and Big Data Plymouth Drake Circus 35 hours Mon, 2018-02-26 09:30 £5500 / £6100
d2dbdpa From Data to Decision with Big Data and Predictive Analytics Plymouth Drake Circus 21 hours Tue, 2018-02-27 09:30 £3900 / £4350
matlab2 MATLAB Fundamentals Plymouth Drake Circus 21 hours Wed, 2018-02-28 09:30 £3300 / £3750
pmml Predictive Models with PMML Plymouth Drake Circus 7 hours Thu, 2018-03-01 09:30 £1100 / £1250
PentahoDI Pentaho Data Integration Fundamentals Plymouth Drake Circus 21 hours Mon, 2018-03-05 09:30 £3300 / £3750
datavault Data Vault: Building a Scalable Data Warehouse Plymouth Drake Circus 28 hours Mon, 2018-03-05 09:30 £4400 / £5000
dataminr Data Mining with R Plymouth Drake Circus 14 hours Tue, 2018-03-06 09:30 £2200 / £2500
ApHadm1 Apache Hadoop: Manipulation and Transformation of Data Performance Plymouth Drake Circus 21 hours Tue, 2018-03-06 09:30 £3300 / £3750
psr Introduction to Recommendation Systems Plymouth Drake Circus 7 hours Tue, 2018-03-06 09:30 £1100 / £1250
neo4j Beyond the relational database: neo4j Plymouth Drake Circus 21 hours Wed, 2018-03-07 09:30 £3300 / £3750
datashrinkgov Data Shrinkage for Government Plymouth Drake Circus 14 hours Thu, 2018-03-08 09:30 £2200 / £2500
matlabfundamentalsfinance MATLAB Fundamentals + MATLAB for Finance Plymouth Drake Circus 35 hours Mon, 2018-03-12 09:30 £5500 / £6250
kdd Knowledge Discovery in Databases (KDD) Plymouth Drake Circus 21 hours Tue, 2018-03-13 09:30 £3300 / £3750
datamin Data Mining Plymouth Drake Circus 21 hours Tue, 2018-03-13 09:30 £3900 / £4350
rintrob Introductory R for Biologists Plymouth Drake Circus 28 hours Mon, 2018-03-19 09:30 £4400 / £5000
processmining Process Mining Plymouth Drake Circus 21 hours Wed, 2018-03-21 09:30 £3900 / £4350
sspsspas Statistics with SPSS Predictive Analytics Software Plymouth Drake Circus 14 hours Mon, 2018-03-26 09:30 £2200 / £2500
bdbiga Big Data Business Intelligence for Govt. Agencies Plymouth Drake Circus 35 hours Mon, 2018-03-26 09:30 £5500 / £6250
druid Druid: Build a fast, real-time data analysis system Plymouth Drake Circus 21 hours Wed, 2018-03-28 09:30 £3300 / £3750
matfin MATLAB for Financial Applications Plymouth Drake Circus 21 hours Mon, 2018-04-02 09:30 £3900 / £4350
TalendDI Talend Open Studio for Data Integration Plymouth Drake Circus 28 hours Mon, 2018-04-09 09:30 £4400 / £5000
bigddbsysfun Big Data & Database Systems Fundamentals Plymouth Drake Circus 14 hours Tue, 2018-04-10 09:30 £2200 / £2500
mdlmrah Model MapReduce and Apache Hadoop Plymouth Drake Circus 14 hours Tue, 2018-04-10 09:30 £2200 / £2500
dmmlr Data Mining & Machine Learning with R Plymouth Drake Circus 14 hours Tue, 2018-04-10 09:30 £2600 / £2900
68780 Apache Spark Plymouth Drake Circus 14 hours Tue, 2018-04-10 09:30 £2200 / £2500
datama Data Mining and Analysis Plymouth Drake Circus 28 hours Tue, 2018-04-10 09:30 £5200 / £5800
rprogda R Programming for Data Analysis Plymouth Drake Circus 14 hours Tue, 2018-04-17 09:30 £2200 / £2500
osqlide Oracle SQL Intermediate - Data Extraction Plymouth Drake Circus 14 hours Tue, 2018-04-17 09:30 £2200 / £2500
pmml Predictive Models with PMML Plymouth Drake Circus 7 hours Thu, 2018-04-19 09:30 £1100 / £1250
d2dbdpa From Data to Decision with Big Data and Predictive Analytics Plymouth Drake Circus 21 hours Mon, 2018-04-23 09:30 £3900 / £4350
BigData_ A practical introduction to Data Analysis and Big Data Plymouth Drake Circus 35 hours Mon, 2018-04-23 09:30 £5500 / £6100
psr Introduction to Recommendation Systems Plymouth Drake Circus 7 hours Tue, 2018-04-24 09:30 £1100 / £1250
matlab2 MATLAB Fundamentals Plymouth Drake Circus 21 hours Tue, 2018-04-24 09:30 £3300 / £3750
dataminr Data Mining with R Plymouth Drake Circus 14 hours Wed, 2018-04-25 09:30 £2200 / £2500
PentahoDI Pentaho Data Integration Fundamentals Plymouth Drake Circus 21 hours Mon, 2018-04-30 09:30 £3300 / £3750
ApHadm1 Apache Hadoop: Manipulation and Transformation of Data Performance Plymouth Drake Circus 21 hours Tue, 2018-05-01 09:30 £3300 / £3750
neo4j Beyond the relational database: neo4j Plymouth Drake Circus 21 hours Wed, 2018-05-02 09:30 £3300 / £3750
datashrinkgov Data Shrinkage for Government Plymouth Drake Circus 14 hours Thu, 2018-05-03 09:30 £2200 / £2500
kdd Knowledge Discovery in Databases (KDD) Plymouth Drake Circus 21 hours Mon, 2018-05-07 09:30 £3300 / £3750
matlabfundamentalsfinance MATLAB Fundamentals + MATLAB for Finance Plymouth Drake Circus 35 hours Mon, 2018-05-07 09:30 £5500 / £6250
datavault Data Vault: Building a Scalable Data Warehouse Plymouth Drake Circus 28 hours Tue, 2018-05-08 09:30 £4400 / £5000
datamin Data Mining Plymouth Drake Circus 21 hours Tue, 2018-05-08 09:30 £3900 / £4350
processmining Process Mining Plymouth Drake Circus 21 hours Mon, 2018-05-14 09:30 £3900 / £4350
sspsspas Statistics with SPSS Predictive Analytics Software Plymouth Drake Circus 14 hours Tue, 2018-05-15 09:30 £2200 / £2500
rintrob Introductory R for Biologists Plymouth Drake Circus 28 hours Tue, 2018-05-15 09:30 £4400 / £5000
druid Druid: Build a fast, real-time data analysis system Plymouth Drake Circus 21 hours Mon, 2018-05-21 09:30 £3300 / £3750
bdbiga Big Data Business Intelligence for Govt. Agencies Plymouth Drake Circus 35 hours Mon, 2018-05-21 09:30 £5500 / £6250
matfin MATLAB for Financial Applications Plymouth Drake Circus 21 hours Wed, 2018-05-23 09:30 £3900 / £4350
bigddbsysfun Big Data & Database Systems Fundamentals Plymouth Drake Circus 14 hours Wed, 2018-05-30 09:30 £2200 / £2500
mdlmrah Model MapReduce and Apache Hadoop Plymouth Drake Circus 14 hours Wed, 2018-05-30 09:30 £2200 / £2500
dmmlr Data Mining & Machine Learning with R Plymouth Drake Circus 14 hours Wed, 2018-05-30 09:30 £2600 / £2900
68780 Apache Spark Plymouth Drake Circus 14 hours Wed, 2018-05-30 09:30 £2200 / £2500
datama Data Mining and Analysis Plymouth Drake Circus 28 hours Mon, 2018-06-04 09:30 £5200 / £5800
TalendDI Talend Open Studio for Data Integration Plymouth Drake Circus 28 hours Tue, 2018-06-05 09:30 £4400 / £5000
rprogda R Programming for Data Analysis Plymouth Drake Circus 14 hours Wed, 2018-06-06 09:30 £2200 / £2500
osqlide Oracle SQL Intermediate - Data Extraction Plymouth Drake Circus 14 hours Thu, 2018-06-07 09:30 £2200 / £2500
pmml Predictive Models with PMML Plymouth Drake Circus 7 hours Fri, 2018-06-08 09:30 £1100 / £1250
d2dbdpa From Data to Decision with Big Data and Predictive Analytics Plymouth Drake Circus 21 hours Wed, 2018-06-13 09:30 £3900 / £4350
dataminr Data Mining with R Plymouth Drake Circus 14 hours Thu, 2018-06-14 09:30 £2200 / £2500
psr Introduction to Recommendation Systems Plymouth Drake Circus 7 hours Fri, 2018-06-15 09:30 £1100 / £1250
matlab2 MATLAB Fundamentals Plymouth Drake Circus 21 hours Mon, 2018-06-18 09:30 £3300 / £3750
datashrinkgov Data Shrinkage for Government Plymouth Drake Circus 14 hours Mon, 2018-06-25 09:30 £2200 / £2500
BigData_ A practical introduction to Data Analysis and Big Data Plymouth Drake Circus 35 hours Mon, 2018-06-25 09:30 £5500 / £6250
ApHadm1 Apache Hadoop: Manipulation and Transformation of Data Performance Plymouth Drake Circus 21 hours Mon, 2018-06-25 09:30 £3300 / £3750
neo4j Beyond the relational database: neo4j Plymouth Drake Circus 21 hours Tue, 2018-06-26 09:30 £3300 / £3750
kdd Knowledge Discovery in Databases (KDD) Plymouth Drake Circus 21 hours Wed, 2018-06-27 09:30 £3300 / £3750
PentahoDI Pentaho Data Integration Fundamentals Plymouth Drake Circus 21 hours Wed, 2018-06-27 09:30 £3300 / £3750
datamin Data Mining Plymouth Drake Circus 21 hours Mon, 2018-07-02 09:30 £3900 / £4350
matlabfundamentalsfinance MATLAB Fundamentals + MATLAB for Finance Plymouth Drake Circus 35 hours Mon, 2018-07-02 09:30 £5500 / £6250
datavault Data Vault: Building a Scalable Data Warehouse Plymouth Drake Circus 28 hours Mon, 2018-07-02 09:30 £4400 / £5000
processmining Process Mining Plymouth Drake Circus 21 hours Wed, 2018-07-04 09:30 £3900 / £4350
sspsspas Statistics with SPSS Predictive Analytics Software Plymouth Drake Circus 14 hours Thu, 2018-07-05 09:30 £2200 / £2500
rintrob Introductory R for Biologists Plymouth Drake Circus 28 hours Mon, 2018-07-09 09:30 £4400 / £5000
druid Druid: Build a fast, real-time data analysis system Plymouth Drake Circus 21 hours Wed, 2018-07-11 09:30 £3300 / £3750
matfin MATLAB for Financial Applications Plymouth Drake Circus 21 hours Mon, 2018-07-16 09:30 £3900 / £4350
bdbiga Big Data Business Intelligence for Govt. Agencies Plymouth Drake Circus 35 hours Mon, 2018-07-16 09:30 £5500 / £6250
bigddbsysfun Big Data & Database Systems Fundamentals Plymouth Drake Circus 14 hours Thu, 2018-07-19 09:30 £2200 / £2500
68780 Apache Spark Plymouth Drake Circus 14 hours Thu, 2018-07-19 09:30 £2200 / £2500
TalendDI Talend Open Studio for Data Integration Plymouth Drake Circus 28 hours Mon, 2018-07-30 09:30 £4400 / £5000
dmmlr Data Mining & Machine Learning with R Plymouth Drake Circus 14 hours Tue, 2018-07-31 09:30 £2600 / £2900
mdlmrah Model MapReduce and Apache Hadoop Plymouth Drake Circus 14 hours Thu, 2018-08-02 09:30 £2200 / £2500
pmml Predictive Models with PMML Plymouth Drake Circus 7 hours Thu, 2018-08-02 09:30 £1100 / £1250

Course Outlines

Code Name Duration Outline
datama Data Mining and Analysis 28 hours

Objective:

Delegates will be able to analyse big data sets, extract patterns, and choose the right variables impacting the results, so that a new model with predictive power can be built. (An illustrative R sketch follows the outline below.)

  1. Data preprocessing

    1. Data Cleaning
    2. Data integration and transformation
    3. Data reduction
    4. Discretization and concept hierarchy generation
  2. Statistical inference

    1. Probability distributions, Random variables, Central limit theorem
    2. Sampling
    3. Confidence intervals
    4. Statistical Inference
    5. Hypothesis testing
  3. Multivariate linear regression

    1. Specification
    2. Subset selection
    3. Estimation
    4. Validation
    5. Prediction
  4. Classification methods

    1. Logistic regression
    2. Linear discriminant analysis
    3. K-nearest neighbours
    4. Naive Bayes
    5. Comparison of Classification methods
  5. Neural Networks

    1. Fitting neural networks
    2. Training neural networks issues
  6. Decision trees

    1. Regression trees
    2. Classification trees
    3. Trees Versus Linear Models
  7. Bagging, Random Forests, Boosting

    1. Bagging
    2. Random Forests
    3. Boosting
  8. Support Vector Machines and Flexible Discriminants

    1. Maximal Margin classifier
    2. Support vector classifiers
    3. Support vector machines
    4. SVMs with two or more classes
    5. Relationship to logistic regression
  9. Principal Components Analysis

  10. Clustering

    1. K-means clustering
    2. K-medoids clustering
    3. Hierarchical clustering
    4. Density based clustering
  11. Model Assessment and Selection

    1. Bias, Variance and Model complexity
    2. In-sample prediction error
    3. The Bayesian approach
    4. Cross-validation
    5. Bootstrap methods
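
For a flavour of the techniques listed above, here is a minimal, illustrative R sketch (not course material; it assumes only base R and the built-in iris data) touching on classification and clustering:

    # Logistic regression on two of the three iris species (classification)
    binary <- droplevels(subset(iris, Species != "setosa"))
    fit <- glm(Species ~ Sepal.Length + Sepal.Width,
               data = binary, family = binomial)
    summary(fit)

    # K-means clustering on the numeric measurements (unsupervised)
    km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)
    table(km$cluster, iris$Species)   # compare clusters to the true species
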
matlabfundamentalsfinance MATLAB Fundamentals + MATLAB for Finance 35 hours

This course provides a comprehensive introduction to the MATLAB technical computing environment, plus an introduction to using MATLAB for financial applications. The course is intended for beginning users and those looking for a review. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course. Topics include:

  • Working with the MATLAB user interface
  • Entering commands and creating variables
  • Analyzing vectors and matrices
  • Visualizing vector and matrix data
  • Working with data files
  • Working with data types
  • Automating commands with scripts
  • Writing programs with logic and flow control
  • Writing functions
  • Using the Financial Toolbox for quantitative analysis

Part 1

A Brief Introduction to MATLAB

Objectives: Offer an overview of what MATLAB is, what it consists of, and what it can do for you

  • An Example: C vs. MATLAB
  • MATLAB Product Overview
  • MATLAB Application Fields
  • What can MATLAB do for you?
  • The Course Outline

Working with the MATLAB User Interface

Objective: Get an introduction to the main features of the MATLAB integrated design environment and its user interfaces. Get an overview of course themes.

  • MATLAB Interface
  • Reading data from file
  • Saving and loading variables
  • Plotting data
  • Customizing plots
  • Calculating statistics and best-fit line
  • Exporting graphics for use in other applications

Variables and Expressions

Objective: Enter MATLAB commands, with an emphasis on creating and accessing data in variables.

  • Entering commands
  • Creating variables
  • Getting help
  • Accessing and modifying values in variables
  • Creating character variables

Analysis and Visualization with Vectors

Objective: Perform mathematical and statistical calculations with vectors, and create basic visualizations. See how MATLAB syntax enables calculations on whole data sets with a single command.

  • Calculations with vectors
  • Plotting vectors
  • Basic plot options
  • Annotating plots

Analysis and Visualization with Matrices

Objective: Use matrices as mathematical objects or as collections of (vector) data. Understand the appropriate use of MATLAB syntax to distinguish between these applications.

  • Size and dimensionality
  • Calculations with matrices
  • Statistics with matrix data
  • Plotting multiple columns
  • Reshaping and linear indexing
  • Multidimensional arrays

Part 2

Automating Commands with Scripts

Objective: Collect MATLAB commands into scripts for ease of reproduction and experimentation. As the complexity of your tasks increases, entering long sequences of commands in the Command Window becomes impractical.

  • A Modelling Example
  • The Command History
  • Creating script files
  • Running scripts
  • Comments and Code Cells
  • Publishing scripts

Working with Data Files

Objective: Bring data into MATLAB from formatted files. Because imported data can be of a wide variety of types and formats, emphasis is given to working with cell arrays and date formats.

  • Importing data
  • Mixed data types
  • Cell arrays
  • Conversions amongst numerals, strings, and cells
  • Exporting data

Multiple Vector Plots

Objective: Make more complex vector plots, such as multiple plots, and use color and string manipulation techniques to produce eye-catching visual representations of data.

  • Graphics structure
  • Multiple figures, axes, and plots
  • Plotting equations
  • Using color
  • Customizing plots

Logic and Flow Control

Objective: Use logical operations, variables, and indexing techniques to create flexible code that can make decisions and adapt to different situations. Explore other programming constructs for repeating sections of code, and constructs that allow interaction with the user.

  • Logical operations and variables
  • Logical indexing
  • Programming constructs
  • Flow control
  • Loops

Matrix and Image Visualization

Objective: Visualize images and matrix data in two or three dimensions. Explore the difference in displaying images and visualizing matrix data using images.

  • Scattered Interpolation using vector and matrix data
  • 3-D matrix visualization
  • 2-D matrix visualization
  • Indexed images and colormaps
  • True color images

Part 3

Data Analysis

Objective: Perform typical data analysis tasks in MATLAB, including developing and fitting theoretical models to real-life data. This leads naturally to one of the most powerful features of MATLAB: solving linear systems of equations with a single command.

  • Dealing with missing data
  • Correlation
  • Smoothing
  • Spectral analysis and FFTs
  • Solving linear systems of equations

Writing Functions

Objective: Increase automation by encapsulating modular tasks as user-defined functions. Understand how MATLAB resolves references to files and variables.

  • Why functions?
  • Creating functions
  • Adding comments
  • Calling subfunctions
  • Workspaces 
  • Subfunctions
  • Path and precedence

Data Types

Objective: Explore data types, focusing on the syntax for creating variables and accessing array elements, and discuss methods for converting among data types. Data types differ in the kind of data they may contain and the way the data is organized.

  • MATLAB data types
  • Integers
  • Structures
  • Converting types

File I/O

Objective: Explore the low-level data import and export functions in MATLAB that allow precise control over text and binary file I/O. These functions include textscan, which provides precise control of reading text files.

  • Opening and closing files
  • Reading and writing text files
  • Reading and writing binary files

Note that the course actually delivered might be subject to minor discrepancies from the outline above, without prior notification.

Part 4

Overview of the MATLAB Financial Toolbox

Objective: Learn to apply the various features included in the MATLAB Financial Toolbox to perform quantitative analysis for the financial industry. Gain the knowledge and practice needed to efficiently develop real-world applications involving financial data.

  • Asset Allocation and Portfolio Optimization
  • Risk Analysis and Investment Performance
  • Fixed-Income Analysis and Option Pricing
  • Financial Time Series Analysis
  • Regression and Estimation with Missing Data
  • Technical Indicators and Financial Charts
  • Monte Carlo Simulation of SDE Models

Asset Allocation and Portfolio Optimization

Objective: Perform capital allocation, asset allocation, and risk assessment.

  • Estimating asset return and total return moments from price or return data
  • Computing portfolio-level statistics, such as mean, variance, value at risk (VaR), and conditional value at risk (CVaR)
  • Performing constrained mean-variance portfolio optimization and analysis
  • Examining the time evolution of efficient portfolio allocations
  • Performing capital allocation
  • Accounting for turnover and transaction costs in portfolio optimization problems

Risk Analysis and Investment Performance

Objective: Define and solve portfolio optimization problems.

  • Specifying a portfolio name, the number of assets in an asset universe, and asset identifiers.
  • Defining an initial portfolio allocation.

Fixed-Income Analysis and Option Pricing

Objective: Perform fixed-income analysis and option pricing.

  • Analyzing cash flow
  • Performing SIA-Compliant fixed-income security analysis
  • Performing basic Black-Scholes, Black, and binomial option-pricing

Part 5

Financial Time Series Analysis

Objective: Analyze time series data in financial markets.

  • Performing data math
  • Transforming and analyzing data
  • Technical analysis
  • Charting and graphics

Regression and Estimation with Missing Data

Objective: Perform multivariate normal regression with or without missing data.

  • Performing common regressions
  • Estimating log-likelihood function and standard errors for hypothesis testing
  • Completing calculations when data is missing

Technical Indicators and Financial Charts

Objective: Practice using performance metrics and specialized plots.

  • Moving averages
  • Oscillators, stochastics, indexes, and indicators
  • Maximum drawdown and expected maximum drawdown
  • Charts, including Bollinger bands, candlestick plots, and moving averages

Monte Carlo Simulation of SDE Models

Objective: Create simulations and apply SDE models

  • Brownian Motion (BM)
  • Geometric Brownian Motion (GBM)
  • Constant Elasticity of Variance (CEV)
  • Cox-Ingersoll-Ross (CIR)
  • Hull-White/Vasicek (HWV)
  • Heston

Conclusion

Objectives: Summarise what we have learned

  • A summary of the course
  • Other upcoming courses on MATLAB

Note: the actual content delivered might differ from the outline as a result of customer requirements and the time spent on each topic.

rintrob Introductory R for Biologists 28 hours

I. Introduction and preliminaries

1. Overview

  • Making R more friendly, R and available GUIs
  • Rstudio
  • Related software and documentation
  • R and statistics
  • Using R interactively
  • An introductory session
  • Getting help with functions and features
  • R commands, case sensitivity, etc.
  • Recall and correction of previous commands
  • Executing commands from or diverting output to a file
  • Data permanency and removing objects
  • Good programming practice: self-contained scripts, good readability (e.g. structured scripts, documentation, markdown)
  • Installing packages; CRAN and Bioconductor

2. Reading data (see the sketch after this list)

  • Txt files (read.delim)
  • CSV files
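
A minimal sketch of both readers (the file names here are hypothetical placeholders):

    measurements <- read.delim("measurements.txt")  # tab-separated text file
    samples <- read.csv("samples.csv")              # comma-separated values
    str(samples)                                    # inspect the imported data frame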

3. Simple manipulations; numbers, vectors and arrays

  • Vectors and assignment
  • Vector arithmetic
  • Generating regular sequences
  • Logical vectors
  • Missing values
  • Character vectors
  • Index vectors; selecting and modifying subsets of a data set
  • Arrays
  • Array indexing. Subsections of an array
  • Index matrices
  • The array() function, plus simple operations on arrays, e.g. multiplication and transposition
  • Other types of objects

4. Lists and data frames

  • Lists
  • Constructing and modifying lists
    • Concatenating lists
  • Data frames
    • Making data frames
    • Working with data frames
    • Attaching arbitrary lists
    • Managing the search path

5. Data manipulation

  • Selecting, subsetting observations and variables         
  • Filtering, grouping
  • Recoding, transformations
  • Aggregation, combining data sets
  • Forming partitioned matrices, cbind() and rbind()
  • The concatenation function, c(), with arrays
  • Character manipulation, stringr package
  • Short intro to grep and regexpr

6. More on reading data

  • XLS, XLSX files
  • readr  and readxl packages
  • SPSS, SAS, Stata, and other data formats
  • Exporting data to txt, csv and other formats

7. Grouping, loops and conditional execution (see the sketch after this list)

  • Grouped expressions
  • Control statements
  • Conditional execution: if statements
  • Repetitive execution: for loops, repeat and while
  • Intro to apply, lapply, sapply, tapply
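
A short sketch of the constructs just listed: conditional execution, a for loop, and the apply family:

    x <- c(2, 5, 8, 11)
    for (v in x) {                          # for loop with conditional execution
      if (v %% 2 == 0) {
        print(paste(v, "is even"))
      } else {
        print(paste(v, "is odd"))
      }
    }
    sapply(x, sqrt)                         # apply a function over a vector
    tapply(iris$Sepal.Length, iris$Species, mean)  # group means by factor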

8. Functions

  • Creating functions
  • Optional arguments and default values
  • Variable number of arguments
  • Scope and its consequences

9. Simple graphics in R (see the sketch after this list)

  • Creating a Graph
  • Density Plots
  • Dot Plots
  • Bar Plots
  • Line Charts
  • Pie Charts
  • Boxplots
  • Scatter Plots
  • Combining Plots
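
One-liners for several of the plot types above, using built-in data sets (illustrative only):

    hist(iris$Sepal.Length)                          # distribution of one variable
    boxplot(Sepal.Length ~ Species, data = iris)     # boxplots by group
    plot(Sepal.Length ~ Petal.Length, data = iris)   # scatter plot
    barplot(table(mtcars$cyl))                       # bar plot of counts
    pie(table(mtcars$cyl))                           # pie chart of the same counts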

II. Statistical analysis in R 

1. Probability distributions

  • R as a set of statistical tables
  • Examining the distribution of a set of data

2. Testing of Hypotheses (see the sketch after this list)

  • Tests about a Population Mean
  • Likelihood Ratio Test
  • One- and two-sample tests
  • Chi-Square Goodness-of-Fit Test
  • Kolmogorov-Smirnov One-Sample Statistic 
  • Wilcoxon Signed-Rank Test
  • Two-Sample Test
  • Wilcoxon Rank Sum Test
  • Mann-Whitney Test
  • Kolmogorov-Smirnov Test
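
A few of the tests above, run on built-in data sets (a sketch, not a complete analysis):

    t.test(extra ~ group, data = sleep)        # two-sample test on a population mean
    wilcox.test(extra ~ group, data = sleep)   # Wilcoxon rank sum / Mann-Whitney
    ks.test(rnorm(100), "pnorm")               # Kolmogorov-Smirnov one-sample test
    chisq.test(table(mtcars$cyl))              # chi-square goodness-of-fit test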

3. Multiple Testing of Hypotheses (see the sketch after this list)

  • Type I Error and FDR
  • ROC curves and AUC
  • Multiple Testing Procedures (BH, Bonferroni etc.)
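
Both adjustment procedures listed above are available through base R's p.adjust; a minimal sketch (the p-values are made up for illustration):

    p <- c(0.001, 0.008, 0.039, 0.041, 0.20)   # raw p-values from multiple tests
    p.adjust(p, method = "BH")                 # Benjamini-Hochberg (controls FDR)
    p.adjust(p, method = "bonferroni")         # Bonferroni (controls FWER)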

4. Linear regression models

  • Generic functions for extracting model information
  • Updating fitted models
  • Generalized linear models
    • Families
    • The glm() function
  • Classification
    • Logistic Regression
    • Linear Discriminant Analysis
  • Unsupervised learning
    • Principal Components Analysis
    • Clustering methods (k-means, hierarchical clustering, k-medoids)

5. Survival analysis (survival package; see the sketch after this list)

  • Survival objects in R
  • Kaplan-Meier estimate, log-rank test, parametric regression
  • Confidence bands
  • Censored (interval censored) data analysis
  • Cox PH models, constant covariates
  • Cox PH models, time-dependent covariates
  • Simulation: Model comparison (Comparing regression models)
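
A sketch of the main survival-package calls above, using the package's bundled lung data (assumes the survival package is installed):

    library(survival)
    km <- survfit(Surv(time, status) ~ sex, data = lung)   # Kaplan-Meier estimate
    plot(km, col = 1:2, xlab = "Days", ylab = "Survival")
    survdiff(Surv(time, status) ~ sex, data = lung)        # log-rank test
    cox <- coxph(Surv(time, status) ~ age + sex, data = lung)  # Cox PH, constant covariates
    summary(cox)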

6. Analysis of Variance

  • One-Way ANOVA
  • Two-Way Classification of ANOVA
  • MANOVA

III. Worked problems in bioinformatics

  • Short introduction to limma package
  • Microarray data analysis workflow
  • Data download from GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1397
  • Data processing (QC, normalisation, differential expression)
  • Volcano plot
  • Clustering examples + heatmaps
matlabdsandreporting MATLAB Fundamentals, Data Science & Report Generation 126 hours

In the first part of this training, we cover the fundamentals of MATLAB and its function as both a language and a platform.  Included in this discussion is an introduction to MATLAB syntax, arrays and matrices, data visualization, script development, and object-oriented principles.

In the second part, we demonstrate how to use MATLAB for data mining, machine learning and predictive analytics. To provide participants with a clear and practical perspective of MATLAB's approach and power, we draw comparisons between using MATLAB and using other tools such as spreadsheets, C, C++, and Visual Basic.

In the third part of the training, participants learn how to streamline their work by automating their data processing and report generation.

Throughout the course, participants will put into practice the ideas learned through hands-on exercises in a lab environment. By the end of the training, participants will have a thorough grasp of MATLAB's capabilities and will be able to employ it for solving real-world data science problems as well as for streamlining their work through automation.

Assessments will be conducted throughout the course to gauge progress.

Format of the course

  • Course includes theoretical and practical exercises, including case discussions, sample code inspection, and hands-on implementation.

Note

  • Practice sessions will be based on pre-arranged sample data report templates. If you have specific requirements, please contact us to arrange.

Introduction
MATLAB for data science and reporting

 

Part 01: MATLAB fundamentals

Overview
    MATLAB for data analysis, visualization, modeling, and programming.

Working with the MATLAB user interface

Overview of MATLAB syntax

Entering commands
    Using the command line interface

Creating variables
    Numeric vs character data

Analyzing vectors and matrices
    Creating and manipulating
    Performing calculations

Visualizing vector and matrix data

Working with data files
    Importing data from Excel spreadsheets

Working with data types
    Working with table data

Automating commands with scripts
    Creating and running scripts
    Organizing and publishing your scripts

Writing programs with branching and loops
    User interaction and flow control

Writing functions
    Creating and calling functions
    Debugging with MATLAB Editor

Applying object-oriented programming principles to your programs

 

Part 02: MATLAB for data science

Overview
    MATLAB for data mining, machine learning and predictive analytics

Accessing data
    Obtaining data from files, spreadsheets, and databases
    Obtaining data from test equipment and hardware
    Obtaining data from software and the Web

Exploring data
    Identifying trends, testing hypotheses, and estimating uncertainty

Creating customized algorithms

Creating visualizations

Creating models

Publishing customized reports

Sharing analysis tools
    As MATLAB code
    As standalone desktop or Web applications

Using the Statistics and Machine Learning Toolbox

Using the Neural Network Toolbox

 

Part 03: Report generation

Overview
    Presenting results from MATLAB programs, applications, and sample data
    Generating Microsoft Word, PowerPoint®, PDF, and HTML reports.
    Templated reports
    Tailor-made reports
        Using organization’s templates and standards

Creating reports interactively vs programmatically
    Using the Report Explorer
    Using the DOM (Document Object Model) API

Creating reports interactively using Report Explorer
    Report Explorer Examples
        Magic Squares Report Explorer Example

    Creating reports
        Using Report Explorer to create a report setup file and define report structure and content

    Formatting reports
        Specifying default report style and format for Report Explorer reports

    Generating reports
        Configuring Report Explorer for processing and running report

    Managing report conversion templates
        Copying and managing Microsoft Word, PDF, and HTML conversion templates for Report Explorer reports

    Customizing Report Conversion templates
        Customizing the style and format of Microsoft Word and HTML conversion templates for Report Explorer reports

    Customizing components and style sheets
        Customizing report components, define layout style sheets

Creating reports programmatically in MATLAB
    Template-Based Report Object (DOM) API Examples
        Functional report
        Object-oriented report
        Programmatic report formatting

    Creating report content
        Using the Document Object Model (DOM) API

    Report format basics
        Specifying format for report content

    Creating form-based reports
        Using the DOM API to fill in the blanks in a report form

    Creating object-oriented reports
        Deriving classes to simplify report creation and maintenance

    Creating and formatting report objects
        Lists, tables, and images

    Creating DOM Reports from HTML
        Appending HTML string or file to a Microsoft® Word, PDF, or HTML report generated by Document Object Model (DOM) API

    Creating report templates
        Creating templates to use with programmatic reports

    Formatting page layouts
        Formatting pages in Microsoft Word and PDF reports


Summary and closing remarks

bigddbsysfun Big Data & Database Systems Fundamentals 14 hours

The course is part of the Data Scientist skill set (Domain: Data and Technology).

Data Warehousing Concepts

  • What is a Data Warehouse?
  • Difference between OLTP and Data Warehousing
  • Data Acquisition
  • Data Extraction
  • Data Transformation
  • Data Loading
  • Data Marts
  • Dependent vs independent Data Marts
  • Database design

ETL Testing Concepts:

  • Introduction
  • Software development life cycle
  • Testing methodologies
  • ETL testing workflow process
  • ETL testing responsibilities in DataStage

Big data Fundamentals

  • Big Data and its role in the corporate world
  • The phases of development of a Big Data strategy within a corporation
  • The rationale underlying a holistic approach to Big Data
  • Components needed in a Big Data Platform
  • Big data storage solution
  • Limits of Traditional Technologies
  • Overview of database types

NoSQL Databases

Hadoop

Map Reduce

Apache Spark

TalendDI Talend Open Studio for Data Integration 28 hours

Talend Open Studio for Data Integration is an open-source data integration product used to combine, convert and update data in various locations across a business.

In this instructor-led, live training, participants will learn how to use the Talend ETL tool to carry out data transformation, data extraction, and connectivity with Hadoop, Hive, and Pig.
 
By the end of this training, participants will be able to:

  • Explain the concepts behind ETL (Extract, Transform, Load) and propagation
  • Define ETL methods and ETL tools to connect with Hadoop
  • Efficiently amass, retrieve, digest, consume, transform and shape big data in accordance with business requirements

Audience

  • Business intelligence professionals
  • Project managers
  • Database professionals
  • SQL Developers
  • ETL Developers
  • Solution architects
  • Data architects
  • Data warehousing professionals
  • System administrators and integrators

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

To request a customized course outline for this training, please contact us.

 

mdlmrah Model MapReduce and Apache Hadoop 14 hours

The course is intended for IT specialists who work with the distributed processing of large data sets across clusters of computers.

Data Mining and Business Intelligence

  • Introduction
  • Area of application
  • Capabilities
  • Basics of data exploration

Big data

  • What does Big data stand for?
  • Big data and Data mining

MapReduce

  • Model basics
  • Example application
  • Stats
  • Cluster model

Hadoop

  • What is Hadoop
  • Installation
  • Configuration
  • Cluster settings
  • Architecture and configuration of Hadoop Distributed File System
  • Console tools
  • DistCp tool
  • MapReduce and Hadoop
  • Streaming
  • Administration and configuration of Hadoop On Demand
  • Alternatives
rprogda R Programming for Data Analysis 14 hours

This course is part of the Data Scientist skill set (Domain: Data and Technology)

Introduction and preliminaries

  • Making R more friendly, R and available GUIs
  • Rstudio
  • Related software and documentation
  • R and statistics
  • Using R interactively
  • An introductory session
  • Getting help with functions and features
  • R commands, case sensitivity, etc.
  • Recall and correction of previous commands
  • Executing commands from or diverting output to a file
  • Data permanency and removing objects

Simple manipulations; numbers and vectors (see the sketch after this list)

  • Vectors and assignment
  • Vector arithmetic
  • Generating regular sequences
  • Logical vectors
  • Missing values
  • Character vectors
  • Index vectors; selecting and modifying subsets of a data set
  • Other types of objects
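
The vector operations above in a few lines (base R only):

    x <- c(3.1, 4.8, 5.9, NA)    # vector creation via assignment
    x * 2                        # vector arithmetic; NA propagates
    seq(0, 1, by = 0.25)         # generating a regular sequence
    x > 4                        # a logical vector
    x[!is.na(x)]                 # index vector: drop missing values
    letters[1:3]                 # a character vector, selected by position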

Objects, their modes and attributes

  • Intrinsic attributes: mode and length
  • Changing the length of an object
  • Getting and setting attributes
  • The class of an object

Arrays and matrices

  • Arrays
  • Array indexing. Subsections of an array
  • Index matrices
  • The array() function
  • The outer product of two arrays
  • Generalized transpose of an array
  • Matrix facilities
    • Matrix multiplication
    • Linear equations and inversion
    • Eigenvalues and eigenvectors
    • Singular value decomposition and determinants
    • Least squares fitting and the QR decomposition
  • Forming partitioned matrices, cbind() and rbind()
  • The concatenation function, c(), with arrays
  • Frequency tables from factors

Lists and data frames

  • Lists
  • Constructing and modifying lists
    • Concatenating lists
  • Data frames
    • Making data frames
    • attach() and detach()
    • Working with data frames
    • Attaching arbitrary lists
    • Managing the search path

Data manipulation

  • Selecting, subsetting observations and variables          
  • Filtering, grouping
  • Recoding, transformations
  • Aggregation, combining data sets
  • Character manipulation, stringr package

Reading data

  • Txt files
  • CSV files
  • XLS, XLSX files
  • SPSS, SAS, Stata, and other data formats
  • Exporting data to txt, csv and other formats
  • Accessing data from databases using SQL language

Probability distributions (see the sketch after this list)

  • R as a set of statistical tables
  • Examining the distribution of a set of data
  • One- and two-sample tests
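
A sketch of R's d/p/q/r functions as "statistical tables", plus a quick look at a simulated sample:

    pnorm(1.96)                  # cumulative probability of the standard normal
    qt(0.975, df = 10)           # quantile of a t distribution
    x <- rnorm(200)              # random draws
    hist(x)                      # examine the distribution of a set of data
    qqnorm(x); qqline(x)         # normal Q-Q plot
    t.test(x)                    # a one-sample test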

Grouping, loops and conditional execution

  • Grouped expressions
  • Control statements
    • Conditional execution: if statements
    • Repetitive execution: for loops, repeat and while

Writing your own functions (see the sketch after this list)

  • Simple examples
  • Defining new binary operators
  • Named arguments and defaults
  • The '...' argument
  • Assignments within functions
  • More advanced examples
    • Efficiency factors in block designs
    • Dropping all names in a printed array
    • Recursive numerical integration
  • Scope
  • Customizing the environment
  • Classes, generic functions and object orientation
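
A small example of the ideas above: a user-defined function with a named default and a '...' argument forwarded to plot(). The name zplot is made up for illustration:

    zplot <- function(x, center = TRUE, ...) {
      z <- (x - if (center) mean(x) else 0) / sd(x)   # standardise the input
      plot(z, ...)                # '...' lets callers pass any plot() argument
      invisible(z)
    }
    zplot(rivers, main = "Standardised river lengths", type = "h")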

Graphical procedures

  • High-level plotting commands
    • The plot() function
    • Displaying multivariate data
    • Display graphics
    • Arguments to high-level plotting functions
  • Basic visualisation graphs
  • Multivariate relations with lattice and ggplot package
  • Using graphics parameters
  • Graphics parameters list

Automated and interactive reporting

  • Combining output from R with text
PentahoDI Pentaho Data Integration Fundamentals 21 hours

Pentaho Data Integration is an open-source data integration tool for defining jobs and data transformations.

In this instructor-led, live training, participants will learn how to use Pentaho Data Integration's powerful ETL capabilities and rich GUI to manage an entire big data lifecycle, maximizing the value of data to the organization.

By the end of this training, participants will be able to:

  • Create, preview, and run basic data transformations containing steps and hops
  • Configure and secure the Pentaho Enterprise Repository
  • Harness disparate sources of data and generate a single, unified version of the truth in an analytics-ready format.
  • Provide results to third-party applications for further processing

Audience

  • Data analysts
  • ETL developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

To request a customized course outline for this training, please contact us.

sspsspas Statistics with SPSS Predictive Analytics Software 14 hours

Goal:

Learn to work with SPSS independently.

Audience:

Analysts, researchers, scientists, students and all those who want to acquire the ability to use the SPSS package and learn popular data mining techniques.

Using the program

  • The dialog boxes
    • input / downloading data
    • the concept of variable and measuring scales
    • preparing a database
    • Generate tables and graphs
    • formatting of the report
  • Command language syntax
    • automated analysis
    • storage and modification procedures
    • creating your own analytical procedures

Data Analysis

  • descriptive statistics
    • Key terms: e.g. variable, hypothesis, statistical significance
    • measures of central tendency
    • measures of dispersion
    • standardization
  • Introduction to research the relationships between variables
    • correlational and experimental methods
  • Summary: This case study and discussion
dmmlr Data Mining & Machine Learning with R 14 hours

Introduction to Data mining and Machine Learning

  • Statistical learning vs. Machine learning
  • Iteration and evaluation
  • Bias-Variance trade-off

Regression

  • Linear regression
  • Generalizations and Nonlinearity
  • Exercises

Classification

  • Bayesian refresher
  • Naive Bayes
  • Discriminant analysis
  • Logistic regression
  • K-Nearest neighbors
  • Support Vector Machines
  • Neural networks
  • Decision trees
  • Exercises

Cross-validation and Resampling (see the sketch after this list)

  • Cross-validation approaches
  • Bootstrap
  • Exercises
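
A minimal resampling sketch using the boot package (assumed installed; rivers is a built-in data set):

    library(boot)
    stat <- function(d, i) mean(d[i])     # statistic recomputed on each resample
    b <- boot(rivers, stat, R = 2000)     # 2000 bootstrap resamples
    boot.ci(b, type = "perc")             # percentile confidence interval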

Unsupervised Learning (see the sketch after this list)

  • K-means clustering
  • Examples
  • Challenges of unsupervised learning and beyond K-means
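
Beyond k-means, a quick base-R sketch of hierarchical clustering on the iris measurements:

    d <- dist(scale(iris[, 1:4]))         # distance matrix on standardised data
    hc <- hclust(d, method = "average")   # agglomerative hierarchical clustering
    plot(hc, labels = FALSE)              # dendrogram
    cutree(hc, k = 3)                     # cut the tree into three clusters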

Advanced topics

  • Ensemble models
  • Mixed models
  • Boosting
  • Examples

Dimensionality reduction

  • Factor Analysis
  • Principal Component Analysis
  • Examples
datavault Data Vault: Building a Scalable Data Warehouse 28 hours

Data vault modeling is a database modeling technique that provides long-term historical storage of data that originates from multiple sources. A data vault stores a single version of the facts, or "all the data, all of the time". Its flexible, scalable, consistent and adaptable design encompasses the best aspects of 3rd normal form (3NF) and star schema.

In this instructor-led, live training, participants will learn how to build a Data Vault.

By the end of this training, participants will be able to:

  • Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI.
  • Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse
  • Develop a consistent and repeatable ETL (Extract, Transform, Load) process
  • Build and deploy highly scalable and repeatable warehouses

Audience

  • Data modelers
  • Data warehousing specialists
  • Business Intelligence specialists
  • Data engineers
  • Database administrators

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Introduction
    The shortcomings of existing data warehouse data modeling architectures
    Benefits of Data Vault modeling

Overview of Data Vault architecture and design principles
    SEI / CMM / Compliance

Data Vault applications
    Dynamic Data Warehousing
    Exploration Warehousing
    In-Database Data Mining
    Rapid Linking of External Information

Data Vault components
    Hubs, Links, Satellites

Building a Data Vault

Modeling Hubs, Links and Satellites

Data Vault reference rules

How components interact with each other

Modeling and populating a Data Vault

Converting 3NF OLTP to a Data Vault Enterprise Data Warehouse (EDW)

Understanding load dates, end-dates, and join operations

Business keys, relationships, link tables and join techniques

Query techniques

Load processing and query processing

Overview of Matrix Methodology

Getting data into data entities

Loading Hub Entities

Loading Link Entities

Loading Satellites

Using SEI/CMM Level 5 templates to obtain repeatable, reliable, and quantifiable results

Developing a consistent and repeatable ETL (Extract, Transform, Load) process

Building and deploying highly scalable and repeatable warehouses

Closing remarks
 

bdbiga Big Data Business Intelligence for Govt. Agencies 35 hours

Advances in technologies and the increasing amount of information are transforming how business is conducted in many industries, including government. Government data generation and digital archiving rates are on the rise due to the rapid growth of mobile devices and applications, smart sensors and devices, cloud computing solutions, and citizen-facing portals. As digital information expands and becomes more complex, information management, processing, storage, security, and disposition become more complex as well. New capture, search, discovery, and analysis tools are helping organizations gain insights from their unstructured data. The government market is at a tipping point, realizing that information is a strategic asset, and government needs to protect, leverage, and analyze both structured and unstructured information to better serve and meet mission requirements. As government leaders strive to evolve data-driven organizations to successfully accomplish mission, they are laying the groundwork to correlate dependencies across events, people, processes, and information.

High-value government solutions will be created from a mashup of the most disruptive technologies:

  • Mobile devices and applications
  • Cloud services
  • Social business technologies and networking
  • Big Data and analytics

IDC predicts that by 2020, the IT industry will reach $5 trillion, approximately $1.7 trillion larger than today, and that 80% of the industry's growth will be driven by these 3rd Platform technologies. In the long term, these technologies will be key tools for dealing with the complexity of increased digital information. Big Data is one of the intelligent industry solutions and allows government to make better decisions by taking action based on patterns revealed by analyzing large volumes of data — related and unrelated, structured and unstructured.

But accomplishing these feats takes far more than simply accumulating massive quantities of data. “Making sense of these volumes of Big Data requires cutting-edge tools and technologies that can analyze and extract useful knowledge from vast and diverse streams of information,” Tom Kalil and Fen Zhao of the White House Office of Science and Technology Policy wrote in a post on the OSTP Blog.

The White House took a step toward helping agencies find these technologies when it established the National Big Data Research and Development Initiative in 2012. The initiative included more than $200 million to make the most of the explosion of Big Data and the tools needed to analyze it.

The challenges that Big Data poses are nearly as daunting as its promise is encouraging. Storing data efficiently is one of these challenges. As always, budgets are tight, so agencies must minimize the per-megabyte price of storage and keep the data within easy access so that users can get it when they want it and how they need it. Backing up massive quantities of data heightens the challenge.

Analyzing the data effectively is another major challenge. Many agencies employ commercial tools that enable them to sift through the mountains of data, spotting trends that can help them operate more efficiently. (A recent study by MeriTalk found that federal IT executives think Big Data could help agencies save more than $500 billion while also fulfilling mission objectives.)

Custom-developed Big Data tools also are allowing agencies to address the need to analyze their data. For example, the Oak Ridge National Laboratory’s Computational Data Analytics Group has made its Piranha data analytics system available to other agencies. The system has helped medical researchers find a link that can alert doctors to aortic aneurysms before they strike. It’s also used for more mundane tasks, such as sifting through résumés to connect job candidates with hiring managers.

Each session is 2 hours

Day-1: Session-1: Business Overview of Why Big Data Business Intelligence in Govt.

  • Case Studies from NIH, DoE
  • Big Data adoption rate in Govt. agencies and how they are aligning their future operations around Big Data predictive analytics
  • Broad Scale Application Area in DoD, NSA, IRS, USDA etc.
  • Interfacing Big Data with Legacy data
  • Basic understanding of enabling technologies in predictive analytics
  • Data Integration & Dashboard visualization
  • Fraud management
  • Business Rule/ Fraud detection generation
  • Threat detection and profiling
  • Cost benefit analysis for Big Data implementation

Day-1: Session-2: Introduction to Big Data-1

  • Main characteristics of Big Data: volume, variety, velocity and veracity. MPP architecture for volume.
  • Data Warehouses – static schema, slowly evolving dataset
  • MPP Databases like Greenplum, Exadata, Teradata, Netezza, Vertica etc.
  • Hadoop Based Solutions – no conditions on structure of dataset.
  • Typical pattern : HDFS, MapReduce (crunch), retrieve from HDFS
  • Batch- suited for analytical/non-interactive
  • Velocity: CEP streaming data
  • Typical choices – CEP products (e.g. Infostreams, Apama, MarkLogic etc)
  • Less production ready – Storm/S4
  • NoSQL Databases – (columnar and key-value): Best suited as analytical adjunct to data warehouse/database

Day-1: Session-3: Introduction to Big Data-2

NoSQL solutions

  • KV Store - Keyspace, Flare, SchemaFree, RAMCloud, Oracle NoSQL Database (OnDB)
  • KV Store - Dynamo, Voldemort, Dynomite, SubRecord, MotionDb, DovetailDB
  • KV Store (Hierarchical) - GT.m, Cache
  • KV Store (Ordered) - TokyoTyrant, Lightcloud, NMDB, Luxio, MemcacheDB, Actord
  • KV Cache - Memcached, Repcached, Coherence, Infinispan, EXtremeScale, JBossCache, Velocity, Terracotta
  • Tuple Store - Gigaspaces, Coord, Apache River
  • Object Database - ZopeDB, db4o, Shoal
  • Document Store - CouchDB, Cloudant, Couchbase, MongoDB, Jackrabbit, XML databases, ThruDB, CloudKit, Persevere, Riak-Basho, Scalaris
  • Wide Columnar Store - BigTable, HBase, Apache Cassandra, Hypertable, KAI, OpenNeptune, Qbase, KDI

Varieties of Data: Introduction to Data Cleaning issue in Big Data

  • RDBMS – static structure/schema, doesn’t promote agile, exploratory environment.
  • NoSQL – semi structured, enough structure to store data without exact schema before storing data
  • Data cleaning issues

Day-1: Session-4: Big Data Introduction-3: Hadoop

  • When to select Hadoop?
  • STRUCTURED - Enterprise data warehouses/databases can store massive data (at a cost) but impose structure (not good for active exploration)
  • SEMI STRUCTURED data – tough to do with traditional solutions (DW/DB)
  • Warehousing data = HUGE effort and static even after implementation
  • For variety & volume of data, crunched on commodity hardware – HADOOP
  • Commodity H/W needed to create a Hadoop Cluster

Introduction to MapReduce / HDFS (see the sketch after this list)

  • MapReduce – distribute computing over multiple servers
  • HDFS – make data available locally for the computing process (with redundancy)
  • Data – can be unstructured/schema-less (unlike RDBMS)
  • Developer responsibility to make sense of data
  • Programming MapReduce = working with Java (pros/cons), manually loading data into HDFS
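
As a toy illustration only, the map-then-reduce programming model sketched in plain base R. This is not Hadoop code; a real job would run against HDFS via Java, streaming, or a framework such as Spark:

    # Map step: split each document into a stream of words
    docs <- c("big data big", "data mining and big data")
    words <- unlist(Map(function(d) strsplit(d, " ")[[1]], docs))
    # Reduce step: fold the word stream into a single table of counts
    counts <- Reduce(function(acc, w) {
      acc[w] <- if (w %in% names(acc)) acc[w] + 1 else 1
      acc
    }, words, integer(0))
    counts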

Day-2: Session-1: Big Data Ecosystem: Building Big Data ETL; the universe of Big Data tools and which one to use when

  • Hadoop vs. Other NoSQL solutions
  • For interactive, random access to data
  • Hbase (column oriented database) on top of Hadoop
  • Random access to data but restrictions imposed (max 1 PB)
  • Not good for ad-hoc analytics, good for logging, counting, time-series
  • Sqoop - Import from databases to Hive or HDFS (JDBC/ODBC access)
  • Flume – Stream data (e.g. log data) into HDFS

Day-2: Session-2: Big Data Management System

  • Moving parts, compute nodes start/fail: ZooKeeper for configuration/coordination/naming services
  • Complex pipelines/workflows: Oozie to manage workflow, dependencies, daisy-chaining
  • Deploying, configuring, cluster management, upgrades etc. (sys admin): Ambari
  • In the cloud: Whirr

Day-2: Session-3: Predictive Analytics in Business Intelligence-1: Fundamental Techniques & Machine-Learning-Based BI:

  • Introduction to Machine learning
  • Learning classification techniques
  • Bayesian Prediction-preparing training file
  • Support Vector Machine
  • KNN p-Tree Algebra & vertical mining
  • Neural Network
  • Big Data large variable problem -Random forest (RF)
  • Big Data Automation problem – Multi-model ensemble RF
  • Automation through Soft10-M
  • Text analytic tool-Treeminer
  • Agile learning
  • Agent based learning
  • Distributed learning
  • Introduction to open-source tools for predictive analytics: R, RapidMiner, Mahout

Day-2: Session-4: Predictive Analytics Ecosystem-2: Common Predictive Analytic Problems in Govt.

  • Insight analytics
  • Visualization analytics
  • Structured predictive analytics
  • Unstructured predictive analytics
  • Threat/fraudster/vendor profiling
  • Recommendation engines
  • Pattern detection
  • Rule/scenario discovery – failure, fraud, optimization
  • Root cause discovery
  • Sentiment analysis
  • CRM analytics
  • Network analytics
  • Text analytics
  • Technology assisted review
  • Fraud analytics
  • Real-time analytics

Day-3: Session-1: Real Time and Scalable Analytics over Hadoop

  • Why common analytic algorithms fail in Hadoop/HDFS
  • Apache Hama – for Bulk Synchronous Parallel distributed computing
  • Apache Spark – cluster computing for real-time analytics
  • GraphLab (from CMU) – a graph-based asynchronous approach to distributed computing
  • KNN p-Algebra based approach from Treeminer for reduced hardware cost of operation

Day-3: Session-2: Tools for eDiscovery and Forensics

  • eDiscovery over Big Data vs. Legacy data – a comparison of cost and performance
  • Predictive coding and technology assisted review (TAR)
  • Live demo of a TAR product (vMiner) to understand how TAR enables faster discovery
  • Faster indexing through HDFS – velocity of data
  • NLP (Natural Language Processing) – various techniques and open source products
  • eDiscovery in foreign languages – technology for foreign language processing

Day-3: Session-3: Big Data BI for Cyber Security – a complete 360-degree view, from speedy data collection to threat identification

  • Understanding the basics of security analytics – attack surface, security misconfiguration, host defenses
  • Network infrastructure / large data pipe / response ETL for real-time analytics
  • Prescriptive vs. predictive – fixed rule-based vs. auto-discovery of threat rules from metadata

Day-3: Session-4: Big Data in USDA: Applications in Agriculture

  • Introduction to IoT (Internet of Things) for agriculture – sensor-based Big Data and control
  • Introduction to satellite imaging and its application in agriculture
  • Integrating sensor and image data for soil fertility, cultivation recommendations and forecasting
  • Agriculture insurance and Big Data
  • Crop Loss forecasting

Day-4: Session-1: Fraud prevention BI from Big Data in Govt – fraud analytics:

  • Basic classification of fraud analytics – rule-based vs. predictive analytics
  • Supervised vs unsupervised Machine learning for Fraud pattern detection
  • Vendor fraud/over charging for projects
  • Medicare and Medicaid fraud – fraud detection techniques for claims processing
  • Travel reimbursement frauds
  • IRS refund frauds
  • Case studies and live demo will be given wherever data is available.

Day-4: Session-2: Social Media Analytics – intelligence gathering and analysis

  • Big Data ETL API for extracting social media data
  • Text, image, metadata and video
  • Sentiment analysis from social media feed
  • Contextual and non-contextual filtering of social media feed
  • Social Media Dashboard to integrate diverse social media
  • Automated profiling of social media profiles
  • Live demo of each analytic will be given through Treeminer Tool.

Day-4: Session-3: Big Data analytics in image processing and video feeds

  • Image storage techniques in Big Data – storage solutions for data exceeding petabytes
  • LTFS and LTO
  • GPFS-LTFS (a layered storage solution for big image data)
  • Fundamentals of image analytics
  • Object recognition
  • Image segmentation
  • Motion tracking
  • 3-D image reconstruction

Day-4: Session-4: Big Data applications in NIH:

  • Emerging areas of Bio-informatics
  • Meta-genomics and Big Data mining issues
  • Big Data predictive analytics for Pharmacogenomics, Metabolomics and Proteomics
  • Big Data in downstream Genomics process
  • Application of Big data predictive analytics in Public health

Big Data Dashboard for quick accessibility of diverse data and display:

  • Integration of existing application platform with Big Data Dashboard
  • Big Data management
  • Case Study of Big Data Dashboard: Tableau and Pentaho
  • Using Big Data apps to push location-based services in Govt.
  • Tracking system and management

Day-5: Session-1: How to justify a Big Data BI implementation within an organization:

  • Defining ROI for Big Data implementation
  • Case studies on saving analyst time in data collection and preparation – gains in productivity
  • Case studies of revenue gains from saving licensed database costs
  • Revenue gain from location based services
  • Saving from fraud prevention
  • An integrated spreadsheet approach to calculating approximate expenses vs. revenue gains/savings from a Big Data implementation

Day-5: Session-2: Step-by-step procedure for replacing a legacy data system with a Big Data system:

  • Understanding practical Big Data Migration Roadmap
  • What important information is needed before architecting a Big Data implementation
  • What are the different ways of calculating volume, velocity, variety and veracity of data
  • How to estimate data growth
  • Case studies

Day-5: Session-4: Review of Big Data vendors and their products. Q&A session:

  • Accenture
  • APTEAN (Formerly CDC Software)
  • Cisco Systems
  • Cloudera
  • Dell
  • EMC
  • GoodData Corporation
  • Guavus
  • Hitachi Data Systems
  • Hortonworks
  • HP
  • IBM
  • Informatica
  • Intel
  • Jaspersoft
  • Microsoft
  • MongoDB (Formerly 10Gen)
  • Mu Sigma
  • NetApp
  • Opera Solutions
  • Oracle
  • Pentaho
  • Platfora
  • QlikTech
  • Quantum
  • Rackspace
  • Revolution Analytics
  • Salesforce
  • SAP
  • SAS Institute
  • Sisense
  • Software AG/Terracotta
  • Soft10 Automation
  • Splunk
  • Sqrrl
  • Supermicro
  • Tableau Software
  • Teradata
  • Think Big Analytics
  • Tidemark Systems
  • Treeminer
  • VMware (Part of EMC)
datavis1 Data Visualization 28 hours

This course is intended for engineers and decision makers working in data mining and knowledge discovery.

You will learn how to create effective plots and how to present and represent your data in a way that appeals to decision makers and helps them understand hidden information.

Day 1:

  • what is data visualization
  • why it is important
  • data visualization vs data mining
  • human cognition
  • HMI
  • common pitfalls

Day 2:

  • different types of curves
  • drill down curves
  • categorical data plotting
  • multi-variable plots (a minimal plotting sketch follows this list)
  • data glyph and icon representation
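
The course is tool-agnostic; as an illustration of curve plotting next to categorical plotting, here is a minimal matplotlib sketch on hypothetical monthly sales data (all names and numbers are made up):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: monthly sales for two product categories.
months = np.arange(1, 13)
sales_a = 100 + 10 * months + np.random.normal(0, 5, 12)
sales_b = 80 + 15 * months + np.random.normal(0, 5, 12)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Line curves for the time dimension.
ax1.plot(months, sales_a, marker="o", label="Category A")
ax1.plot(months, sales_b, marker="s", label="Category B")
ax1.set_xlabel("Month")
ax1.set_ylabel("Sales")
ax1.legend()

# A bar chart for the categorical comparison.
ax2.bar(["A", "B"], [sales_a.sum(), sales_b.sum()])
ax2.set_ylabel("Total sales")

plt.tight_layout()
plt.show()
```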

Day 3:

  • plotting KPIs with data
  • R and X charts examples
  • what-if dashboards
  • parallel axes mixing
  • categorical data with numeric data

Day 4:

  • different hats of data visualization
  • how can data visualization lie
  • disguised and hidden trends
  • a case study of student data
  • visual queries and region selection
datamin Data Mining 21 hours

The course can be delivered using any tools, including free open-source data mining software and applications.

Introduction

  • Data mining as the analysis step of the KDD process ("Knowledge Discovery in Databases")
  • Subfield of computer science
  • Discovering patterns in large data sets

Sources of methods

  • Artificial intelligence
  • Machine learning
  • Statistics
  • Database systems

What is involved?

  • Database and data management aspects
  • Data pre-processing
  • Model and inference considerations
  • Interestingness metrics
  • Complexity considerations
  • Post-processing of discovered structures
  • Visualization
  • Online updating

Data mining main tasks

  • Automatic or semi-automatic analysis of large quantities of data
  • Extracting previously unknown interesting patterns
    • groups of data records (cluster analysis)
    • unusual records (anomaly detection)
    • dependencies (association rule mining)

Data mining

  • Anomaly detection (Outlier/change/deviation detection)
  • Association rule learning (Dependency modeling)
  • Clustering
  • Classification
  • Regression
  • Summarization

Use and applications

  • Able Danger
  • Behavioral analytics
  • Business analytics
  • Cross Industry Standard Process for Data Mining
  • Customer analytics
  • Data mining in agriculture
  • Data mining in meteorology
  • Educational data mining
  • Human genetic clustering
  • Inference attack
  • Java Data Mining
  • Open-source intelligence
  • Path analysis (computing)
  • Reactive business intelligence

Data dredging, data fishing, data snooping

dsbda Data Science for Big Data Analytics 35 hours

Big data refers to data sets that are so voluminous and complex that traditional data processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.

Introduction to Data Science for Big Data Analytics

  • Data Science Overview
  • Big Data Overview
  • Data Structures
  • Drivers and complexities of Big Data
  • Big Data ecosystem and a new approach to analytics
  • Key technologies in Big Data
  • Data Mining process and problems
    • Association Pattern Mining
    • Data Clustering
    • Outlier Detection
    • Data Classification

Introduction to Data Analytics lifecycle

  • Discovery
  • Data preparation
  • Model planning
  • Model building
  • Presentation/Communication of results
  • Operationalization
  • Exercise: Case study

From this point, most of the training time (80%) will be spent on examples and exercises in R and related big data technologies.

Getting started with R

  • Installing R and Rstudio
  • Features of R language
  • Objects in R
  • Data in R
  • Data manipulation
  • Big data issues
  • Exercises

Getting started with Hadoop

  • Installing Hadoop
  • Understanding Hadoop modes
  • HDFS
  • MapReduce architecture
  • Hadoop related projects overview
  • Writing programs in Hadoop MapReduce
  • Exercises

Integrating R and Hadoop with RHadoop

  • Components of RHadoop
  • Installing RHadoop and connecting with Hadoop
  • The architecture of RHadoop
  • Hadoop streaming with R
  • Data analytics problem solving with RHadoop
  • Exercises

Pre-processing and preparing data

  • Data preparation steps
  • Feature extraction
  • Data cleaning
  • Data integration and transformation
  • Data reduction – sampling, feature subset selection
  • Dimensionality reduction
  • Discretization and binning
  • Exercises and Case study

Exploratory data analytic methods in R

  • Descriptive statistics
  • Exploratory data analysis
  • Visualization – preliminary steps
  • Visualizing single variable
  • Examining multiple variables
  • Statistical methods for evaluation
  • Hypothesis testing
  • Exercises and Case study

Data Visualizations

  • Basic visualizations in R
  • Packages for data visualization: ggplot2, lattice, plotly
  • Formatting plots in R
  • Advanced graphs
  • Exercises

Regression (Estimating future values)

  • Linear regression
  • Use cases
  • Model description
  • Diagnostics
  • Problems with linear regression
  • Shrinkage methods, ridge regression, the lasso (compared in the sketch after this list)
  • Generalizations and nonlinearity
  • Regression splines
  • Local polynomial regression
  • Generalized additive models
  • Regression with RHadoop
  • Exercises and Case study
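
The course itself works in R; as a language-neutral illustration of ordinary least squares next to the shrinkage methods named above, here is a minimal scikit-learn sketch on synthetic data (all values are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import train_test_split

# Synthetic data: 20 features, only 3 of which actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3 * X[:, 0] - 2 * X[:, 1] + X[:, 2] + rng.normal(scale=0.5, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for model in (LinearRegression(), Ridge(alpha=1.0), Lasso(alpha=0.1)):
    model.fit(X_tr, y_tr)
    # score() reports R-squared on the held-out data.
    print(type(model).__name__, round(model.score(X_te, y_te), 3))
```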

Classification

  • The classification related problems
  • Bayesian refresher
  • Naïve Bayes
  • Logistic regression
  • K-nearest neighbors
  • Decision trees algorithm
  • Neural networks
  • Support vector machines
  • Diagnostics of classifiers
  • Comparison of classification methods
  • Scalable classification algorithms
  • Exercises and Case study

Assessing model performance and selection

  • Bias, Variance and model complexity
  • Accuracy vs Interpretability
  • Evaluating classifiers
  • Measures of model/algorithm performance
  • Hold-out method of validation
  • Cross-validation
  • Tuning machine learning algorithms with caret package
  • Visualizing model performance with Profit, ROC and Lift curves (a cross-validation/ROC sketch follows this list)
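
The outline names R's caret package; the same validation ideas (k-fold cross-validation, a hold-out set, an ROC-based metric) can be sketched with scikit-learn on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# 5-fold cross-validation estimates out-of-sample accuracy.
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())

# Hold-out validation scored with an ROC-based metric.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf.fit(X_tr, y_tr)
print("Hold-out AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```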

Ensemble Methods

  • Bagging
  • Random Forests
  • Boosting
  • Gradient boosting (the three ensemble families are compared in the sketch after this list)
  • Exercises and Case study
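
A minimal comparison of the three ensemble families above on synthetic data, sketched with scikit-learn (default hyperparameters, hypothetical data):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=15, random_state=1)

models = {
    "Bagging": BaggingClassifier(random_state=1),
    "Random Forest": RandomForestClassifier(random_state=1),
    "Gradient Boosting": GradientBoostingClassifier(random_state=1),
}
for name, model in models.items():
    # Mean 5-fold cross-validated accuracy for each ensemble.
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.3f}")
```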

Support vector machines for classification and regression

  • Maximal Margin classifiers
    • Support vector classifiers
    • Support vector machines
    • SVMs for classification problems
    • SVMs for regression problems
  • Exercises and Case study

Identifying unknown groupings within a data set

  • Feature Selection for Clustering
  • Representative-based algorithms: k-means, k-medoids (see the clustering sketch after this list)
  • Hierarchical algorithms: agglomerative and divisive methods
  • Probabilistic algorithms: EM
  • Density-based algorithms: DBSCAN, DENCLUE
  • Cluster validation
  • Advanced clustering concepts
  • Clustering with RHadoop
  • Exercises and Case study
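
The course demonstrates clustering in R and RHadoop; as a compact illustration of a representative-based method next to a density-based one, here is a scikit-learn sketch on synthetic blobs:

```python
import numpy as np
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs

# Hypothetical data: three Gaussian blobs in the plane.
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# k-means needs the number of clusters fixed in advance.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("k-means cluster sizes:", np.bincount(kmeans.labels_))

# DBSCAN discovers the number of clusters itself; label -1 means noise.
db = DBSCAN(eps=0.8, min_samples=5).fit(X)
n_clusters = len(set(db.labels_)) - (1 if -1 in db.labels_ else 0)
print("DBSCAN clusters found:", n_clusters)
```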

Discovering connections with Link Analysis

  • Link analysis concepts
  • Metrics for analyzing networks
  • The PageRank algorithm (a minimal sketch follows this list)
  • Hyperlink-Induced Topic Search
  • Link Prediction
  • Exercises and Case study
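
As a small illustration of PageRank, here is a sketch using the networkx library on a hypothetical link graph (alpha is the usual damping factor):

```python
import networkx as nx

# Hypothetical link graph: pages A-D with directed hyperlinks.
G = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "C"),
                ("C", "A"), ("D", "C")])

# PageRank: the stationary distribution of a random surfer who follows
# links with probability alpha and jumps to a random page otherwise.
scores = nx.pagerank(G, alpha=0.85)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```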

Association Pattern Mining

  • Frequent Pattern Mining Model
  • Scalability issues in frequent pattern mining
  • Brute Force algorithms
  • The Apriori algorithm (illustrated in the sketch after this list)
  • The FP growth approach
  • Evaluation of Candidate Rules
  • Applications of Association Rules
  • Validation and Testing
  • Diagnostics
  • Association rules with R and Hadoop
  • Exercises and Case study
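
The outline covers Apriori and rule evaluation in R and Hadoop; as an illustrative alternative in Python, here is a minimal sketch using the mlxtend library (assumed installed; the transactions are hypothetical):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Hypothetical market-basket transactions.
transactions = [["bread", "milk"],
                ["bread", "butter", "milk"],
                ["milk", "butter"],
                ["bread", "butter"],
                ["bread", "milk", "butter"]]

# One-hot encode the transactions, then mine frequent itemsets.
te = TransactionEncoder()
df = pd.DataFrame(te.fit(transactions).transform(transactions),
                  columns=te.columns_)
frequent = apriori(df, min_support=0.4, use_colnames=True)

# Derive candidate rules and evaluate them by confidence.
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```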

Constructing recommendation engines

  • Understanding recommender systems
  • Data mining techniques used in recommender systems (an item-similarity sketch follows this list)
  • Recommender systems with recommenderlab package
  • Evaluating the recommender systems
  • Recommendations with RHadoop
  • Exercise: Building recommendation engine
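
The course builds recommenders with R's recommenderlab; to show the underlying idea of item-based collaborative filtering, here is a small numpy sketch on a hypothetical rating matrix:

```python
import numpy as np

# Hypothetical user-item rating matrix (0 = not rated).
R = np.array([[5, 3, 0, 1],
              [4, 0, 0, 1],
              [1, 1, 0, 5],
              [0, 0, 5, 4]], dtype=float)

# Item-item cosine similarity computed on the rating columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

# Predict user 0's rating for item 2 as a similarity-weighted average
# of the items that user has already rated.
user, item = 0, 2
rated = R[user] > 0
pred = R[user, rated] @ sim[item, rated] / sim[item, rated].sum()
print(round(pred, 2))
```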

Text analysis

  • Text analysis steps
  • Collecting raw text
  • Bag of words
  • Term Frequency – Inverse Document Frequency (TF-IDF; a minimal sketch follows this list)
  • Determining Sentiments
  • Exercises and Case study
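
A minimal sketch of a bag-of-words representation with TF-IDF weighting, using scikit-learn on three hypothetical documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical raw documents collected for analysis.
docs = ["big data analytics with hadoop",
        "hadoop and spark for big data",
        "sentiment analysis of social media text"]

# Terms frequent in one document but rare in the corpus get the
# highest weights.
vec = TfidfVectorizer()
X = vec.fit_transform(docs)
print(X.shape)                       # (documents, vocabulary size)
print(vec.get_feature_names_out())   # the learned vocabulary
```
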
d2dbdpa From Data to Decision with Big Data and Predictive Analytics 21 hours

Audience

If you are trying to make sense of the data you have access to, or want to analyse unstructured data available on the net (like Twitter, LinkedIn, etc.), this course is for you.

It is mostly aimed at decision makers and people who need to choose what data is worth collecting and what is worth analyzing.

It is not aimed at people configuring the solution, those people will benefit from the big picture though.

Delivery Mode

During the course delegates will be presented with working examples of mostly open source technologies.

Short lectures will be followed by presentations and simple exercises for the participants.

Content and Software used

All software used is updated each time the course is run, so we use the newest versions available.

It covers the whole process, from obtaining, formatting, processing and analysing data through to automating the decision-making process with machine learning.

Quick Overview

  • Data Sources
  • Mining Data
  • Recommender systems
  • Target Marketing

Datatypes

  • Structured vs unstructured
  • Static vs streamed
  • Attitudinal, behavioural and demographic data
  • Data-driven vs user-driven analytics
  • Data validity
  • Volume, velocity and variety of data

Models

  • Building models
  • Statistical Models
  • Machine learning

Data Classification

  • Clustering
  • kGroups, k-means, nearest neighbours
  • Ant colonies, birds flocking

Predictive Models

  • Decision trees
  • Support vector machine
  • Naive Bayes classification
  • Neural networks
  • Markov Model
  • Regression
  • Ensemble methods

ROI

  • Benefit/Cost ratio
  • Cost of software
  • Cost of development
  • Potential benefits

Building Models

  • Data Preparation (MapReduce)
  • Data cleansing
  • Choosing methods
  • Developing model
  • Testing Model
  • Model evaluation
  • Model deployment and integration

Overview of Open Source and commercial software

  • Selection of R-project package
  • Python libraries
  • Hadoop and Mahout
  • Selected Apache projects related to Big Data and Analytics
  • Selected commercial solutions
  • Integration with existing software and data sources
neo4j Beyond the relational database: neo4j 21 hours

Relational, table-based databases such as Oracle and MySQL have long been the standard for organizing and storing data. However, the growing size and fluidity of data have made it difficult for these traditional systems to efficiently execute highly complex queries on the data. Imagine replacing rows-and-columns-based data storage with object-based data storage, whereby entities (e.g., a person) could be stored as data nodes, then easily queried on the basis of their vast, multi-linear relationships with other nodes. And imagine querying these connections and their associated objects and properties using a compact syntax, up to 20 times lighter than SQL. This is what graph databases, such as neo4j, offer.
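
As a hedged illustration of the node-and-relationship model described above, here is a minimal sketch using the official neo4j Python driver against a hypothetical local instance (the URI and credentials are placeholders):

```python
from neo4j import GraphDatabase  # pip install neo4j

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

with driver.session() as session:
    # Create two person nodes and a relationship between them.
    session.run("MERGE (a:Person {name: $a}) "
                "MERGE (b:Person {name: $b}) "
                "MERGE (a)-[:KNOWS]->(b)", a="Alice", b="Bob")

    # Query by relationship rather than by joining tables.
    result = session.run("MATCH (p:Person {name: $name})-[:KNOWS]->(f) "
                         "RETURN f.name AS friend", name="Alice")
    for record in result:
        print(record["friend"])

driver.close()
```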

In this hands-on course, we will set up a live project and put into practice the skills to model, manage and access your data. We contrast and compare graph databases with SQL-based databases as well as other NoSQL databases and clarify when and where it makes sense to implement each within your infrastructure.

Audience

  • Database administrators (DBAs)
  • Data analysts
  • Developers
  • System Administrators
  • DevOps engineers
  • Business Analysts
  • CTOs
  • CIOs

Format of the course

  • Heavy emphasis on hands-on practice. Most of the concepts are learned through samples, exercises and hands-on development.

Getting started with neo4j

  • neo4j vs relational databases
  • neo4j vs other NoSQL databases
  • Using neo4j to solve real world problems
  • Installing neo4j

Data modeling with neo4j

  • Mapping white-board diagrams and mind maps to neo4j

Working with nodes

  • Creating, changing and deleting nodes
  • Defining node properties

Node relationships

  • Creating and deleting relationships
  • Bi-directional relationships

Querying your data with Cypher

  • Querying your data based on relationships
  • MATCH, RETURN, WHERE, REMOVE, MERGE, etc.
  • Setting indexes and constraints

Working with the REST API

  • REST operations on nodes
  • REST operations on relationships
  • REST operations on indexes and constraints

Accessing the core API for application development

  • Working with .NET, Java, JavaScript, and Python APIs

Closing remarks

pmml Predictive Models with PMML 7 hours

The course is designed for scientists, developers, analysts or anyone else who wants to standardize or exchange their models using the Predictive Model Markup Language (PMML) file format.

Predictive Models

  • Intro to predictive models
  • Predictive models supported by PMML

PMML Elements

  • Header
  • Data Dictionary
  • Data Transformations
  • Model
  • Mining Schema
  • Targets
  • Output

API

  • Overview of API providers for PMML
  • Executing your model in a cloud
processmining Process Mining 21 hours

Process mining, or Automated Business Process Discovery (ABPD), is a technique that applies algorithms to event logs for the purpose of analyzing business processes. Process mining goes beyond data storage and data analysis; it bridges data with processes and provides insights into the trends and patterns that affect process efficiency. 

Format of the course
    The course starts with an overview of the most commonly used techniques for process mining. We discuss the various process discovery algorithms and tools used for discovering and modeling processes based on raw event data. Real-life case studies are examined and data sets are analyzed using the ProM open-source framework.

Audience
    Data science professionals
    Anyone interested in understanding and applying process modeling and data mining

Overview
    Discovering, analyzing and re-thinking your processes

Types of process mining
    Discovery, conformance and enhancement

Process mining workflow
    From log data analysis to response and action

Other tools for process mining
    PMLAB, Apromore
    Commercial offerings

Closing remarks

dataminr Data Mining with R 14 hours

Sources of methods

  • Artificial intelligence
  • Machine learning
  • Statistics
  • Sources of data

Pre processing of data

  • Data Import/Export
  • Data Exploration and Visualization
  • Dimensionality Reduction
  • Dealing with missing values
  • R Packages

Data mining main tasks

  • Automatic or semi-automatic analysis of large quantities of data
  • Extracting previously unknown interesting patterns
    • groups of data records (cluster analysis)
    • unusual records (anomaly detection)
    • dependencies (association rule mining)

Data mining

  • Anomaly detection (Outlier/change/deviation detection)
  • Association rule learning (Dependency modeling)
  • Clustering
  • Classification
  • Regression
  • Summarization
  • Frequent Pattern Mining
  • Text Mining
  • Decision Trees
  • Neural Networks
  • Sequence Mining

Data dredging, data fishing, data snooping

kdd Knowledge Discovery in Databases (KDD) 21 hours

Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunication and manufacturing.

In this course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes.

Audience
    Data analysts or anyone interested in learning how to interpret data to solve problems

Format of the course
    After a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations.

Introduction
    KDD vs data mining

Establishing the application domain

Establishing relevant prior knowledge

Understanding the goal of the investigation

Creating a target data set

Data cleaning and preprocessing

Data reduction and projection

Choosing the data mining task

Choosing the data mining algorithms

Interpreting the mined patterns

68780 Apache Spark 14 hours

Why Spark?

  • Problems with Traditional Large-Scale Systems
  • Introducing Spark

Spark Basics

  • What is Apache Spark?
  • Using the Spark Shell
  • Resilient Distributed Datasets (RDDs)
  • Functional Programming with Spark

Working with RDDs

  • RDD Operations
  • Key-Value Pair RDDs
  • MapReduce and Pair RDD Operations

The Hadoop Distributed File System

  • Why HDFS?
  • HDFS Architecture
  • Using HDFS

Running Spark on a Cluster

  • Overview
  • A Spark Standalone Cluster
  • The Spark Standalone Web UI

Parallel Programming with Spark

  • RDD Partitions and HDFS Data Locality
  • Working With Partitions
  • Executing Parallel Operations

Caching and Persistence

  • RDD Lineage
  • Caching Overview
  • Distributed Persistence

Writing Spark Applications

  • Spark Applications vs. Spark Shell
  • Creating the SparkContext
  • Configuring Spark Properties
  • Building and Running a Spark Application
  • Logging

Spark, Hadoop, and the Enterprise Data Center

  • Overview
  • Spark and the Hadoop Ecosystem
  • Spark and MapReduce

Spark Streaming

  • Spark Streaming Overview
  • Example: Streaming Word Count (sketched after this list)
  • Other Streaming Operations
  • Sliding Window Operations
  • Developing Spark Streaming Applications
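
A minimal sketch of the streaming word count named above, using PySpark's (now legacy) DStream API; it assumes text lines arrive on a local socket, e.g. one opened with `nc -lk 9999`:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "StreamingWordCount")
ssc = StreamingContext(sc, batchDuration=1)  # 1-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)
counts = (lines.flatMap(lambda line: line.split())
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))
counts.pprint()  # print each batch's word counts to the console

ssc.start()
ssc.awaitTermination()
```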

Common Spark Algorithms

  • Iterative Algorithms
  • Graph Analysis
  • Machine Learning

Improving Spark Performance

  • Shared Variables: Broadcast Variables
  • Shared Variables: Accumulators
  • Common Performance Issues
druid Druid: Build a fast, real-time data analysis system 21 hours

Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive, analytic dashboards for end-users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, Paypal, and Yahoo.

In this course we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment.

Audience
    Application developers
    Software engineers
    Technical consultants
    DevOps professionals
    Architecture engineers

Format of the course
    Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding

Introduction

Installing and starting Druid

Druid architecture and design

Real-time ingestion of event data

Sharding and indexing

Loading data

Querying data

Visualizing data

Running a distributed cluster

Druid + Apache Hive

Druid + Apache Kafka

Druid + others

Troubleshooting

Administrative tasks

psr Introduction to Recommendation Systems 7 hours

Audience

Marketing department employees, IT strategists and other people involved in decisions related to the design and implementation of recommender systems.

Format

A short theoretical background is followed by the analysis of working examples and short, simple exercises.

Challenges related to data collection

  • Information overload
  • Data types (video, text, structured data, etc...)
  • Potential of the data now and in the near future
  • Basics of Data Mining

Recommendation and searching

  • Searching and Filtering
  • Sorting
  • Determining weights of the search results
  • Using Synonyms
  • Full-text search

Long Tail

  • Chris Anderson's idea
  • Drawbacks of Long Tail

Determining Similarities

  • Products
  • Users
  • Documents and web sites

Content-Based Recommendation and measurement of similarities (a minimal sketch follows the list below)

  • Cosine distance
  • Euclidean distance between vectors
  • TF-IDF and term frequency
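
A minimal numpy sketch comparing the two measures above on hypothetical TF-IDF-style document vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 = identical direction.
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical TF-IDF vectors for two documents.
doc1 = np.array([0.1, 0.0, 0.5, 0.2])
doc2 = np.array([0.2, 0.1, 0.4, 0.0])

print(round(cosine_similarity(doc1, doc2), 3))
print(round(np.linalg.norm(doc1 - doc2), 3))  # Euclidean distance, for comparison
```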

Collaborative filtering

  • Community rating

Graphs

  • Applications of graphs 
  • Determining similarity of graphs
  • Similarity between users

Neural Networks

  • Basic concepts of Neural Networks
  • Training Data and Validation Data
  • Neural Network examples in recommender systems

How to encourage users to share their data

  • Making systems more comfortable
  • Navigation
  • Functionality and UX

Case Studies

  • Popularity of recommender systems and their problems
  • Examples
BigData_ A practical introduction to Data Analysis and Big Data 35 hours

Participants who complete this training will gain a practical, real-world understanding of Big Data and its related technologies, methodologies and tools.

Participants will have the opportunity to put this knowledge into practice through hands-on exercises. Group interaction and instructor feedback make up an important component of the class.

The course starts with an introduction to elemental concepts of Big Data, then progresses into the programming languages and methodologies used to perform Data Analysis. Finally, we discuss the tools and infrastructure that enable Big Data storage, Distributed Processing, and Scalability.

Audience

  • Developers / programmers
  • IT consultants

Format of the course

  • Part lecture, part discussion, hands-on practice and implementation, occasional quizzing to measure progress.

Introduction to Data Analysis and Big Data

  • What makes Big Data "big"?
    • Velocity, Volume, Variety, Veracity (VVVV)
  • Limits to traditional Data Processing
  • Distributed Processing
  • Statistical Analysis
  • Types of Machine Learning Analysis
  • Data Visualization

Languages used for Data Analysis

  • R language
    • Why R for Data Analysis?
    • Data manipulation, calculation and graphical display
  • Python
    • Why Python for Data Analysis?
    • Manipulating, processing, cleaning, and crunching data

Approaches to Data Analysis

  • Statistical Analysis
    • Time Series analysis
    • Forecasting with Correlation and Regression models
    • Inferential Statistics (estimating)
    • Descriptive Statistics in Big Data sets (e.g. calculating mean)
  • Machine Learning
    • Supervised vs unsupervised learning
    • Classification and clustering
    • Estimating cost of specific methods
    • Filtering
  • Natural Language Processing
    • Processing text
    • Understanding the meaning of the text
    • Automatic text generation
    • Sentiment analysis / Topic analysis
  • Computer Vision
    • Acquiring, processing, analyzing, and understanding images
    • Reconstructing, interpreting and understanding 3D scenes
    • Using image data to make decisions

Big Data infrastructure

  • Data Storage
    • Relational databases (SQL)
      • MySQL
      • Postgres
      • Oracle
    • Non-relational databases (NoSQL)
      • Cassandra
      • MongoDB
      • Neo4j
    • Understanding the nuances
      • Hierarchical databases
      • Object-oriented databases
      • Document-oriented databases
      • Graph-oriented databases
      • Other
  • Distributed Processing
    • Hadoop
      • HDFS as a distributed filesystem
      • MapReduce for distributed processing
    • Spark
      • All-in-one in-memory cluster computing framework for large-scale data processing
      • Structured streaming
      • Spark SQL
      • Machine Learning libraries: MLlib
      • Graph processing with GraphX
  • Scalability
    • Public cloud
      • AWS, Google, Aliyun, etc.
    • Private cloud
      • OpenStack, Cloud Foundry, etc.
    • Auto-scalability
  • Choosing the right solution for the problem
  • The future of Big Data
  • Closing remarks
datashrinkgov Data Shrinkage for Government 14 hours

Why shrink data

Relational databases

  • Introduction
  • Aggregation and disaggregation
  • Normalisation and denormalisation
  • Null values and zeroes
  • Joining data
  • Complex joins

Cluster analysis

  • Applications
  • Strengths and weaknesses
  • Measuring distance
  • Hierarchical clustering
  • K-means and derivatives
  • Applications in Government

Factor analysis

  • Concepts
  • Exploratory factor analysis
  • Confirmatory factor analysis
  • Principal component analysis
  • Correspondence analysis
  • Software
  • Applications in Government

Predictive analytics

  • Timelines and naming conventions
  • Holdout samples
  • Weights of evidence
  • Information value (weights of evidence and information value are illustrated in the sketch after this list)
  • Scorecard building demonstration using a spreadsheet
  • Regression in predictive analytics
  • Logistic regression in predictive analytics
  • Decision Trees in predictive analytics
  • Neural networks
  • Measuring accuracy
  • Applications in Government
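
As a hedged illustration of weights of evidence and information value for a single binned predictor (the counts are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical binned predictor with good/bad outcome counts per bin.
bins = pd.DataFrame({"bin":  ["low", "medium", "high"],
                     "good": [400, 350, 250],
                     "bad":  [50, 100, 150]})

good_pct = bins["good"] / bins["good"].sum()
bad_pct = bins["bad"] / bins["bad"].sum()

# Weight of evidence per bin, and the variable's total information value.
bins["woe"] = np.log(good_pct / bad_pct)
iv = ((good_pct - bad_pct) * bins["woe"]).sum()
print(bins)
print("Information value:", round(iv, 3))
```
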
DatSci7 Data Science Programme 245 hours

The explosion of information and data in today’s world is unparalleled; our ability to innovate and push the boundaries of the possible is growing faster than it ever has. The role of Data Scientist is one of the most in-demand skills across industry today.

We offer much more than learning through theory; we deliver practical, marketable skills that bridge the gap between the world of academia and the demands of industry.

This 7-week curriculum can be tailored to your specific industry requirements; please contact us for further information or visit the NobleProg Institute website www.inobleprog.co.uk

Audience:

This programme is aimed at postgraduates, as well as anyone with the required prerequisite skills, which will be determined by an assessment and interview.

Delivery:

Delivery of the course will be a mixture of instructor-led classroom and instructor-led online; typically the 1st week will be 'classroom led', weeks 2-6 'virtual classroom' and week 7 back to 'classroom led'.

Week 1 Big Data concepts

  • VVVV (Velocity, Volume, Variety, Veracity) definition
  • Limits to traditional data processing capacity
  • Distributed Processing
  • Statistical Analysis
  • Machine Learning Analysis Types
  • Data Visualization
  • Distributed Processing (e.g. map-reduce)
  • Introduction to used languages
  • R language crash-course
  • Python crash course

Weeks 2&3 Performing Data Analysis

  • Statistical Analysis
  • Descriptive Statistics in Big Data sets (e.g. calculating mean)
  • Inferential Statistics (estimating)
  • Forecasting with Correlation and Regression models
  • Time Series analysis
  • Basics of Machine Learning
  • Supervised vs unsupervised learning
  • Classification and clustering
  • Estimating cost of specific methods
  • Filtering

Week 4 Natural Language Processing

  • Processing text
  • Understanding the meaning of the text
  • Automatic text generation
  • Sentiment/Topic Analysis
  • Computer Vision

Week 5&6 Tooling concept

  • Data storage solutions (SQL, NoSQL, hierarchical, object-oriented, document-oriented)
  • (MySQL, Cassandra, MongoDB, Elasticsearch, HDFS, etc.)
  • Choosing right solution to the problem
  • Distributed Processing
  • Spark
  • Machine Learning with Spark (MLLib)
  • Spark SQL
  • Scalability
  • Public cloud (AWS, Google, etc.)
  • Private cloud (OpenStack, Cloud Foundry)
  • Auto-scalability

Week 7 Soft Skills

  • Advisory & Leadership Skills
  • Making an impact: data-driven story telling
  • Understanding your audience
  • Effective data presentation - getting your message across
  • Influence effectiveness and change leadership
  • Handling difficult situations

Exam

  • End of Programme graduation exam
matlab2 MATLAB Fundamentals 21 hours

This three-day course provides a comprehensive introduction to the MATLAB technical computing environment. The course is intended for beginning users and those looking for a review. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course. Topics include:

  •     Working with the MATLAB user interface
  •     Entering commands and creating variables
  •     Analyzing vectors and matrices
  •     Visualizing vector and matrix data
  •     Working with data files
  •     Working with data types
  •     Automating commands with scripts
  •     Writing programs with logic and flow control
  •     Writing functions

Part 1

A Brief Introduction to MATLAB

Objectives: Offer an overview of what MATLAB is, what it consists of, and what it can do for you

  • An Example: C vs. MATLAB
  • MATLAB Product Overview
  • MATLAB Application Fields
  • What can MATLAB do for you?
  • The Course Outline

Working with the MATLAB User Interface

Objective: Get an introduction to the main features of the MATLAB integrated design environment and its user interfaces. Get an overview of course themes.

  • MATLAB Interface
  • Reading data from file
  • Saving and loading variables
  • Plotting data
  • Customizing plots
  • Calculating statistics and best-fit line
  • Exporting graphics for use in other applications

Variables and Expressions

Objective: Enter MATLAB commands, with an emphasis on creating and accessing data in variables.

  • Entering commands
  • Creating variables
  • Getting help
  • Accessing and modifying values in variables
  • Creating character variables

Analysis and Visualization with Vectors

Objective: Perform mathematical and statistical calculations with vectors, and create basic visualizations. See how MATLAB syntax enables calculations on whole data sets with a single command.

  • Calculations with vectors
  • Plotting vectors
  • Basic plot options
  • Annotating plots

Analysis and Visualization with Matrices

Objective: Use matrices as mathematical objects or as collections of (vector) data. Understand the appropriate use of MATLAB syntax to distinguish between these applications.

  • Size and dimensionality
  • Calculations with matrices
  • Statistics with matrix data
  • Plotting multiple columns
  • Reshaping and linear indexing
  • Multidimensional arrays

Part 2

Automating Commands with Scripts

Objective: Collect MATLAB commands into scripts for ease of reproduction and experimentation. As the complexity of your tasks increases, entering long sequences of commands in the Command Window becomes impractical.

  • A Modelling Example
  • The Command History
  • Creating script files
  • Running scripts
  • Comments and Code Cells
  • Publishing scripts

Working with Data Files

Objective: Bring data into MATLAB from formatted files. Because imported data can be of a wide variety of types and formats, emphasis is given to working with cell arrays and date formats.

  • Importing data
  • Mixed data types
  • Cell arrays
  • Conversions amongst numerals, strings, and cells
  • Exporting data

Multiple Vector Plots

Objective: Make more complex vector plots, such as multiple plots, and use color and string manipulation techniques to produce eye-catching visual representations of data.

  • Graphics structure
  • Multiple figures, axes, and plots
  • Plotting equations
  • Using color
  • Customizing plots

Logic and Flow Control

Objective: Use logical operations, variables, and indexing techniques to create flexible code that can make decisions and adapt to different situations. Explore other programming constructs for repeating sections of code, and constructs that allow interaction with the user.

  • Logical operations and variables
  • Logical indexing
  • Programming constructs
  • Flow control
  • Loops

Matrix and Image Visualization

Objective: Visualize images and matrix data in two or three dimensions. Explore the difference in displaying images and visualizing matrix data using images.

  • Scattered Interpolation using vector and matrix data
  • 3-D matrix visualization
  • 2-D matrix visualization
  • Indexed images and colormaps
  • True color images

Part 3

Data Analysis

Objective: Perform typical data analysis tasks in MATLAB, including developing and fitting theoretical models to real-life data. This leads naturally to one of the most powerful features of MATLAB: solving linear systems of equations with a single command.

  • Dealing with missing data
  • Correlation
  • Smoothing
  • Spectral analysis and FFTs
  • Solving linear systems of equations

Writing Functions

Objective: Increase automation by encapsulating modular tasks as user-defined functions. Understand how MATLAB resolves references to files and variables.

  • Why functions?
  • Creating functions
  • Adding comments
  • Calling subfunctions
  • Workspaces 
  • Subfunctions
  • Path and precedence

Data Types

Objective: Explore data types, focusing on the syntax for creating variables and accessing array elements, and discuss methods for converting among data types. Data types differ in the kind of data they may contain and the way the data is organized.

  • MATLAB data types
  • Integers
  • Structures
  • Converting types

File I/O

Objective: Explore the low-level data import and export functions in MATLAB that allow precise control over text and binary file I/O. These functions include textscan, which provides precise control of reading text files.

  • Opening and closing files
  • Reading and writing text files
  • Reading and writing binary files

Conclusion

Objectives: Summarise what we have learnt

  • A summary of the course
  • Other upcoming courses on MATLAB

Note that the actual delivery might be subject to minor discrepancies from the outline above, without prior notification.

ApHadm1 Apache Hadoop: Manipulation and Transformation of Data Performance 21 hours


This course is intended for developers, architects, data scientists or any profile that requires access to data either intensively or on a regular basis.

The major focus of the course is data manipulation and transformation.

Among the tools in the Hadoop ecosystem this course includes the use of Pig and Hive both of which are heavily used for data transformation and manipulation.

This training also addresses performance metrics and performance optimisation.

The course is entirely hands on and is punctuated by presentations of the theoretical aspects.

1.1 Hadoop Concepts

1.1.1 HDFS

  • The Design of HDFS
  • Command line interface
  • Hadoop File System

1.1.2 Clusters

  • Anatomy of a cluster
  • Master Node / Slave Node
  • Name Node / Data Node

1.2 Data Manipulation

1.2.1 MapReduce in Detail

  • Map phase
  • Reduce phase
  • Shuffle

1.2.2 Analytics with MapReduce

  • Group-By with MapReduce
  • Frequency distributions and sorting with MapReduce
  • Plotting results (GNU Plot)
  • Histograms with MapReduce
  • Scatter plots with MapReduce
  • Parsing complex datasets
  • Counting with MapReduce and Combiners
  • Build reports

1.2.3 Data Cleansing

  • Document Cleaning
  • Fuzzy string search
  • Record linkage / data deduplication
  • Transform and sort event dates
  • Validate source reliability
  • Trim Outliers

1.2.4 Extracting and Transforming Data

  • Transforming logs
  • Using Apache Pig to filter
  • Using Apache Pig to sort
  • Using Apache Pig to sessionize

1.2.5 Advanced Joins

  • Joining data in the Mapper using MapReduce
  • Joining data using Apache Pig replicated join
  • Joining sorted data using Apache Pig merge join
  • Joining skewed data using Apache Pig skewed join
  • Using a map-side join in Apache Hive
  • Using optimized full outer joins in Apache Hive
  • Joining data using an external key value store

1.3 Performance Diagnosis and Optimization Techniques

  • Map
    • Investigating spikes in input data
    • Identifying map-side data skew problems
    • Map task throughput
    • Small files
    • Unsplittable files
  • Reduce
    • Too few or too many reducers
    • Reduce-side data skew problems
    • Reduce tasks throughput
    • Slow shuffle and sort
  • Competing jobs and scheduler throttling
  • Stack dumps & unoptimized code
  • Hardware failures
  • CPU contention
  • Tasks
    • Extracting and visualizing task execution times
    • Profiling your map and reduce tasks
  • Avoid the reducer
  • Filter and project
  • Using the combiner
  • Fast sorting with comparators
  • Collecting skewed data
  • Reduce skew mitigation
osqlide Oracle SQL Intermediate - Data Extraction 14 hours

Limiting results

  • The WHERE clause
  • Comparison operators
  • LIKE Condition
  • The BETWEEN ... AND condition
  • The IS NULL condition
  • The IN condition
  • Boolean operators AND, OR and NOT
  • Multiple conditions in the WHERE clause
  • The order of operators
  • DISTINCT clause

SQL functions

  • The differences between single-row and multi-row functions
  • Text, numeric and date functions
  • Explicit and implicit conversion
  • Conversion functions
  • Nesting functions
  • Viewing function results using the dual table
  • Getting the current date with the SYSDATE function
  • Handling of NULL values

Aggregating data using the grouping function

  • Grouping functions
  • How grouping functions treat NULL values
  • Creating groups of data – the GROUP BY clause
  • Grouping by multiple columns
  • Limiting grouped results – the HAVING clause (see the runnable sketch after this list)
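
GROUP BY and HAVING are standard SQL, so they can be illustrated outside Oracle; here is a self-contained sketch using Python's built-in sqlite3 module (the table and values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (dept TEXT, salary REAL)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("IT", 5000), ("IT", 6000), ("HR", 3500),
                  ("HR", 3000), ("SALES", 4000)])

# Average salary per department, keeping only groups above 3500.
rows = conn.execute("""
    SELECT dept, AVG(salary)
    FROM emp
    GROUP BY dept
    HAVING AVG(salary) > 3500
""").fetchall()
print(rows)  # e.g. [('IT', 5500.0), ('SALES', 4000.0)]
```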

Subqueries

  • Placing subqueries in a SELECT statement
  • Single-row and multi-row subqueries
  • Operators for single-row subqueries
  • Grouping functions in subqueries
  • Operators for multi-row subqueries: IN, ALL, ANY
  • How NULL values are treated in subqueries

Set operators

  • UNION operator
  • UNION ALL operator
  • INTERSECT operator
  • MINUS operator

Further Usage Of Joins

  • Revisit Joins
  • Combining Inner and Outer Joins
  • Partitioned Outer Joins
  • Hierarchical Queries

Further Usage Of Sub-Queries

  • Revisit sub-queries
  • Use of sub-queries as virtual tables/inline views and columns
  • Use of the WITH construction
  • Combining sub-queries and joins

Analytics functions

  • OVER clause
  • Partition Clause
  • Windowing Clause
  • Rank, Lead, Lag, First, Last functions

Retrieving data from multiple tables (if time at end)

  • Types of joins
  • Using NATURAL JOIN
  • Table aliases
  • Joins in the WHERE clause
  • INNER JOIN
  • Outer joins: LEFT, RIGHT, FULL OUTER JOIN
  • Cartesian product

Aggregate Functions (if time at end)

  • Revisit Group By function and Having clause
  • Group and Rollup
  • Group and Cube
danagr Data and Analytics - from the ground up 42 hours

Data analytics is a crucial tool in business today. We will focus throughout on developing skills for practical, hands-on data analysis. The aim is to help delegates give evidence-based answers to questions:

What has happened?

  • processing and analyzing data
  • producing informative data visualizations

What will happen?

  • forecasting future performance
  • evaluating forecasts

What should happen?

  • turning data into evidence-based business decisions
  • optimizing processes

The course itself can be delivered either as a 6 day classroom course or remotely over a period of weeks if preferred. We can work with you to deliver the course to best suit your needs.

Basic Excel

  • Navigation
  • Manipulating data
  • Working with formulas and addresses
  • Charts

Advanced Excel 

  • Logical functions
  • Scenario analysis
  • Solver
  • Macros
  • First glimpse of code: VBA

VBA

  • Data types
  • Writing a function
  • Controlling a program: conditional evaluation and loops
  • Debugging techniques

Data Analytics with R

  • Introducing R
  • Variables and types
  • Data manipulation in R
  • Writing functions
  • Data visualization using ggplot
  • Data wrangling with Dplyr

Introduction to Machine Learning with R

  • Linear Regression
  • Classification and regression trees
  • Classification using Support Vector Machines and Random Forests
  • Clustering techniques
datapro Data Protection 35 hours

This is an instructor-led course and is the non-certification version of the "CDP – Certificate in Data Protection" course.

Those experienced in data protection issues, as well as those new to the subject, need to be trained so that their organisations are confident that legal compliance is continually addressed. It is necessary to identify issues requiring expert data protection advice in good time in order that organisational reputation and credibility are enhanced through relevant data protection policies and procedures.

Objectives:

The aim of the syllabus is to promote an understanding of how the data protection principles work rather than simply focusing on the mechanics of regulation. The syllabus places the Act in the context of human rights and promotes good practice within organisations. On completion you will have:

  • an appreciation of the broader context of the Act. 
  • an understanding of the way in which the Act and the Privacy and Electronic Communications (EC Directive) Regulations 2003 work
  • a broad understanding of the way associated legislation relates to the Act
  • an understanding of what has to be done to achieve compliance

Course Synopsis:

The syllabus comprises three main parts, each with sub-sections.

  • Context - this will address the origins of and reasons for the Act together with consideration of privacy in general.
  • Law – Data Protection Act - this will address the main concepts and elements of the Act and subordinate legislation.
  • Application - this will consider how compliance is achieved and how the Act works in practice.

1. Context

The objective is to ensure a basic appreciation of the context of data protection law and in particular that privacy is wider than data protection.

1.1 What is privacy?

1.1.1 The right to private and family life and the relevance of confidentiality.

1.1.2 European Convention on Human Rights and Fundamental Freedoms, UK Human Rights Act

1.2 History of data protection legislation in the UK

1.2.1 OECD Guidelines on the Protection of Privacy and Transborder Flows of Personal Data 1980

1.2.2 Council of Europe Convention 108, 1981

1.2.3 Data Protection Act 1984

1.2.4 Data Protection Directive 95/46/EC

1.2.5 Telecommunications Directive 97/66/EC, Privacy and Electronic Communications

2. The Law

2.1 Data Protection Act

2.1.1 The definitions
The objective is to ensure that candidates know and understand the major definitions in the Act and how to apply them in order to identify what information and processing activities are subject to the Act.

2.1.2 The Role of the Commissioner
The objective is to ensure an understanding of the role and main powers of the Information commissioner. The following are to be covered.

2.1.2.1 Enforcement (including roles of the First-tier Tribunal and the Courts)

  • Information and Enforcement Notices
  • Prosecution
  • Warrants (entry/inspection) (Schedule 9,1(1) & 12 only – that is a basic understanding of grounds for issuing and nature of offences)
  • Assessment Notices (s41A-s41C) including effect of s55 (3) added by the Coroners and Justice Act 2009 which provides that the Information Commissioner may not issue a monetary penalty notice in respect of anything found in pursuance of an assessment notice or an assessment under s51 (7).
  • Monetary penalties (s55A-55E) including the effect of the s55 (3A) provision.
  • Undertakings (NB candidates are required to have a basic understanding of how the ICO uses ‘undertakings’ and that they do not derive from any provision in the DPA98. They are not expected to know the detail of their status and provenance).

2.1.2.2 Carrying out s42 assessments

2.1.2.3 Codes of Practice (including s52A-52E Code of Practice on data sharing) and all current ICO issued Codes but not any codes issued by other bodies. Candidates will be expected to have a broad understanding of s52A-E, to appreciate the distinction between a statutory code and other ICO issued codes and have a broad understanding (but not a detailed knowledge) of ICO issued codes.

2.1.3 Notification

  • The exemptions from notification.
  • A basic understanding of the two tier fee regime.

2.1.4 The Data Protection Principles
The objective is to ensure an understanding of how the principles regulate the processing of personal data and how they are enforced, as well as an understanding of the individual principles in the light of guidance on their interpretation found in Part II of Schedule 1. Candidates will be required to show an understanding of the need to interpret and apply the principles in context.

Introduction: how the principles regulate and how they are enforced including Information and Enforcement Notices.

2.1.5 Individual Rights
The objective is to ensure an understanding of the rights conferred by the Act and how they can be applied and enforced.

2.1.6 Exemptions
The objective is to ensure awareness of the fact that there are exemptions from certain provisions of the Act, and knowledge and understanding of some of these and how to apply them in practice. Candidates are not expected to have a detailed knowledge of all the exemptions. The following are expected to be covered in some detail:

2.1.7 Offences
The objective is to ensure an awareness of the fact that there are a range of offences under the Act and of the role of the Courts as well as an appreciation of how certain specified offences apply in practice. It is not intended that candidates should have a detailed knowledge of all the offences.

The candidates will be expected to cover:

  • Unlawful obtaining and disclosure of personal data
  • Unlawful selling of personal data
  • Processing without notification
  • Failure to notify changes in processing
  • Failure to comply with an Enforcement Notice, an Information Notice or Special Information Notice.
  • Warrant offences (Schedule 9,12)

2.2 Privacy and Electronic Communications (EC Directive) Regulations 2003
The objective is to ensure an awareness of the relationship between the above Regulations and the Act, an awareness of the broad scope of the Regulations and a detailed understanding of the practical application of the main provisions relating to unsolicited marketing.

2.3 Associated legislation
The objective is to ensure a basic awareness of some other legislation which is relevant and an appreciation that data protection legislation must be considered in the context of other law.

3. Application

The objective is to ensure an understanding of the practical application of the Act in a range of circumstances. This will include detailed analysis of sometimes complex scenarios, and deciding how the Act applies in particular circumstances and explaining and justifying a decision taken or advice given.

3.1 How to comply with the Act

3.2 Addressing scenarios in specific areas

3.3 Data processing topics

  • Monitoring – internet, email, telephone calls and CCTV
  • Use of the internet (including Electronic Commerce)
  • Data matching
  • Disclosure and Data sharing
matfin MATLAB for Financial Applications 21 hours

MATLAB is a numerical computing environment and programming language developed by MathWorks.

Part I – Matlab Fundamentals

Matlab Basics

  • Matlab User interface
  • Variables and Assignment Statements
  • Basic data objects: Vector, Matrix, Table
  • Basic data manipulation
  • Character and Strings objects
  • Relational expressions
  • Built-in numerical functions
  • Data Import/Export
  • Visualizing data, Graphics options, Annotations, customizing graphics

Matlab Programming

  • Automating commands with scripts
  • Logic and flow control - if, if-else, switch, nested ifs
  • Loop statements and vectorized code
  • Writing functions

Working with Financial Data

  • Data objects – Cell arrays, Structures, Tables, Time series
  • Working with dates and times
  • Conversion amongst different data types, data operations
  • Modifying tables, table operations
  • Data filtering, Indexing, Logical indexing, Categories
  • Data preparation:
    1. Dealing with Missing data
    2. Cleaning data, Unusual observations
    3. Data Transformations
  • Statistical functions

Part II – Financial Applications

Overview of Matlab toolboxes relevant to Financial Analysis

  • Financial Toolbox
  • Financial Instruments Toolbox
  • Trading Toolbox
  • Risk Management Toolbox
  • Econometrics Toolbox
  • Optimization Toolbox
  • Statistics Toolbox

Financial modelling basics

  • Random variables, probability distributions, random processes
  • Distribution fitting
  • Linear regression
  • Simulation modelling – Monte Carlo Simulation
  • Optimization modelling
  • Optimization under uncertainty

Regression and volatility

  • Linear regression
  • Spurious regression
  • Nonstationarity
  • Cointegration
  • Conditional volatility models ARCH, GARCH

Portfolio theory and asset allocation

  • Dividend discount model
  • Modern portfolio theory

Asset pricing models

  • CAPM

Market risk management

  • VaR by historical simulation (sketched after this list)
  • VaR by Monte Carlo simulation
  • VaR and PCA
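
The course itself uses MATLAB; as a language-neutral sketch of historical-simulation VaR on hypothetical daily returns:

```python
import numpy as np

# Hypothetical daily portfolio returns.
rng = np.random.default_rng(42)
returns = rng.normal(loc=0.0005, scale=0.01, size=1000)

# 1-day 95% VaR: the loss not exceeded on 95% of historical days,
# i.e. the negated 5th percentile of the return distribution.
var_95 = -np.percentile(returns, 5)
print(f"95% one-day VaR: {var_95:.4f}")
```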

Optimization methods

  • Convex optimization
  • Linear Programming
  • Dynamic Programming
  • Non-convex optimization
