# Data Visualization Training Courses

## Data Visualization Course Outlines

Code | Name | Duration | Overview |
---|---|---|---|

datavis1 | Data Visualization | 28 hours | This course is intended for engineers and decision makers working in data mining and knowledge discovery. You will learn how to create effective plots and how to present and represent your data in a way that appeals to decision makers and helps them understand hidden information. Day 1: what data visualization is; why it is important; data visualization vs data mining; human cognition; HMI; common pitfalls. Day 2: different types of curves; drill-down curves; categorical data plotting; multi-variable plots; data glyph and icon representation. Day 3: plotting KPIs with data; R and X chart examples; what-if dashboards; parallel axes; mixing categorical data with numeric data. Day 4: different hats of data visualization; how data visualization can lie; disguised and hidden trends; a case study of student data; visual queries and region selection. |

datavisR1 | Introduction to Data Visualization with R | 28 hours | This course is intended for data engineers, decision makers and data analysts. It will lead you to create effective plots using RStudio that appeal to decision makers and help them uncover hidden information and make the right decisions. Day 1: overview of R programming; introduction to data visualization; scatter plots and clusters; the use of noise and jitter. Day 2: other types of 2D and 3D plots; histograms; heat charts; categorical data plotting. Day 3: plotting KPIs with data; R and X chart examples; dashboards; parallel axes; mixing categorical data with numeric data. Day 4: different hats of data visualization; disguised and hidden trends; case studies; saving plots and loading Excel files. |

neo4j | Beyond the relational database: neo4j | 21 hours | Audience: database administrators (DBAs); data analysts; developers; system administrators; DevOps engineers; business analysts; CTOs; CIOs. Format of the course: 30% lectures, 60% hands-on exercises, 10% tests. Relational, table-based databases such as Oracle and MySQL have long been the standard for organizing and storing data. However, the growing size and fluidity of data have made it difficult for these traditional systems to efficiently execute highly complex queries on the data. Imagine replacing rows-and-columns-based data storage with object-based data storage, whereby entities (e.g., a person) could be stored as data nodes, then easily queried on the basis of their vast, multi-linear relationships with other nodes. And imagine querying these connections and their associated objects and properties using a compact syntax, up to 20 times lighter than SQL. This is what graph databases, such as neo4j, offer. In this hands-on course, we will set up a live project and put into practice the skills to model, manage and access your data. We compare and contrast graph databases with SQL-based databases as well as other NoSQL databases, and clarify when and where it makes sense to implement each within your existing infrastructure. Outline: Getting started with neo4j; neo4j vs relational databases; neo4j vs other NoSQL databases; Using neo4j to solve real world problems; Installing neo4j; Data modeling with neo4j; Mapping white-board diagrams and mind maps to neo4j; Working with nodes; Creating, changing and deleting nodes; Defining node properties; Node relationships; Creating and deleting relationships; Bi-directional relationships; Querying your data with Cypher; Querying your data based on relationships; MATCH, RETURN, WHERE, REMOVE, MERGE, etc.; Setting indexes and constraints; Working with the REST API; REST operations on nodes; REST operations on relationships; REST operations on indexes and constraints; Accessing the core API for application development; Working with .NET, Java, JavaScript, Python APIs; Closing remarks. |

kdd | Knowledge Discovery in Databases (KDD) | 21 hours | Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunication and manufacturing. In this course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes. Audience: data analysts or anyone interested in learning how to interpret data to solve problems. Format of the course: after a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations. Outline: Introduction; KDD vs data mining; Establishing the application domain; Establishing relevant prior knowledge; Understanding the goal of the investigation; Creating a target data set; Data cleaning and preprocessing; Data reduction and projection; Choosing the data mining task; Choosing the data mining algorithms; Interpreting the mined patterns. |

druid | Druid: Build a fast, real-time data analysis system | 21 hours | Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive, analytic dashboards for end-users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo. In this course we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment. Audience: application developers; software engineers; technical consultants; DevOps professionals; architecture engineers. Format of the course: part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding. Outline: Introduction; Installing and starting Druid; Druid architecture and design; Real-time ingestion of event data; Sharding and indexing; Loading data; Querying data; Visualizing data; Running a distributed cluster; Druid + Apache Hive; Druid + Apache Kafka; Druid + others; Troubleshooting; Administrative tasks. |

nlpwithr | Natural Language Processing (NLP) with R | 21 hours | It is estimated that unstructured data accounts for more than 90 percent of all data, much of it in the form of text. Blog posts, tweets, social media, and other digital publications continuously add to this growing body of data. This course centers on extracting insights and meaning from this data. Using the R language and Natural Language Processing (NLP) libraries, we combine concepts and techniques from computer science, artificial intelligence, and computational linguistics to algorithmically understand the meaning behind text data. By the end of the class participants will be able to prepare data sets (large and small) from disparate sources, then apply the right algorithms to analyze and report on their significance. Audience: linguists and programmers. Format of the course: part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding. Outline: Introduction; NLP and R vs Python; Installing and configuring RStudio; Installing R packages related to Natural Language Processing (NLP); An overview of R’s text manipulation capabilities; Getting started with an NLP project in R; Reading and importing data files into R; Text manipulation with R; Document clustering in R; Parts of speech tagging in R; Sentence parsing in R; Working with regular expressions in R; Named-entity recognition in R; Topic modeling in R; Text classification in R; Working with very large data sets; Visualizing your results; Optimization; Integrating R with other languages (Java, Python, etc.); Closing remarks. |

OpenNN | OpenNN: Implementing neural networks | 14 hours | OpenNN is an open-source class library written in C++ that implements neural networks for use in machine learning. In this course we go over the principles of neural networks and use OpenNN to implement a sample application. Audience: software developers and programmers wishing to create Deep Learning applications. Format of the course: lecture and discussion coupled with hands-on exercises. Outline: Introduction to OpenNN, Machine Learning and Deep Learning; Downloading OpenNN; Working with Neural Designer; Using Neural Designer for descriptive, diagnostic, predictive and prescriptive analytics; OpenNN architecture; CPU parallelization; OpenNN classes (data set, neural network, loss index, training strategy, model selection, testing analysis); Vector and matrix templates; Building a neural network application; Choosing a suitable neural network; Formulating the variational problem (loss index); Solving the reduced function optimization problem (training strategy); Working with datasets; The data matrix (columns as variables and rows as instances); Learning tasks; Function regression; Pattern recognition; Compiling with Qt Creator; Integrating, testing and debugging your application; The future of neural networks and OpenNN. |

BigData_ | A practical introduction to Data Analysis and Big Data | 28 hours | Participants who complete this training will gain a practical, real-world understanding of Big Data and its related technologies, methodologies and tools. Participants will have the opportunity to put this knowledge into practice through hands-on exercises. Group interaction and instructor feedback make up an important component of the class. The course starts with an introduction to elemental concepts of Big Data, then progresses into the programming languages and methodologies used to perform Data Analysis. Finally, we discuss the tools that enable Big Data storage, Distributed Processing, and Scalability. Audience: developers / programmers; IT consultants. Format of the course: part lecture, part discussion, heavy hands-on practice and implementation, occasional quizzing to measure progress. Outline: Introduction to Data Analysis and Big Data; What makes Big Data "big"? Velocity, Volume, Variety, Veracity (VVVV); Limits to traditional Data Processing; Distributed Processing; Statistical Analysis; Types of Machine Learning Analysis; Data Visualization; Distributed Processing; MapReduce; Languages used for Data Analysis; R language (crash course); Python (crash course); Approaches to Data Analysis; Statistical Analysis; Time Series analysis; Forecasting with Correlation and Regression models; Inferential Statistics (estimating); Descriptive Statistics in Big Data sets (e.g. calculating mean); Machine Learning; Supervised vs unsupervised learning; Classification and clustering; Estimating cost of specific methods; Filter; Natural Language Processing; Processing text; Understanding meaning of the text; Automatic text generation; Sentiment/Topic Analysis; Computer Vision; Big Data infrastructure; Data Storage; SQL (relational database): MySQL, Postgres, Oracle; NoSQL: Cassandra, MongoDB, Neo4j; Understanding the nuances: hierarchical, object-oriented, document-oriented, graph-oriented, etc.; Distributed File Systems: HDFS; Search Engines: Elasticsearch; Distributed Processing: Spark; Machine Learning libraries: MLlib; Spark SQL; Scalability; Public cloud: AWS, Google, Aliyun, etc.; Private cloud: OpenStack, Cloud Foundry, etc.; Auto-scalability; Choosing the right solution for the problem. |

deepmclrg | Machine Learning & Deep Learning with Python and R | 14 hours | MACHINE LEARNING 1: Introducing Machine Learning The origins of machine learning Uses and abuses of machine learning Ethical considerations How do machines learn? Abstraction and knowledge representation Generalization Assessing the success of learning Steps to apply machine learning to your data Choosing a machine learning algorithm Thinking about the input data Thinking about types of machine learning algorithms Matching your data to an appropriate algorithm Using R for machine learning Installing and loading R packages Installing an R package Installing a package using the point-and-click interface Loading an R package Summary 2: Managing and Understanding Data R data structures Vectors Factors Lists Data frames Matrixes and arrays Managing data with R Saving and loading R data structures Importing and saving data from CSV files Importing data from SQL databases Exploring and understanding data Exploring the structure of data Exploring numeric variables Measuring the central tendency – mean and median Measuring spread – quartiles and the five-number summary Visualizing numeric variables – boxplots Visualizing numeric variables – histograms Understanding numeric data – uniform and normal distributions Measuring spread – variance and standard deviation Exploring categorical variables Measuring the central tendency – the mode Exploring relationships between variables Visualizing relationships – scatterplots Examining relationships – two-way cross-tabulations Summary 3: Lazy Learning – Classification Using Nearest Neighbors Understanding classification using nearest neighbors The kNN algorithm Calculating distance Choosing an appropriate k Preparing data for use with kNN Why is the kNN algorithm lazy? 
Diagnosing breast cancer with the kNN algorithm Step 1 – collecting data Step 2 – exploring and preparing the data Transformation – normalizing numeric data Data preparation – creating training and test datasets Step 3 – training a model on the data Step 4 – evaluating model performance Step 5 – improving model performance Transformation – z-score standardization Testing alternative values of k Summary 4: Probabilistic Learning – Classification Using Naive Bayes Understanding naive Bayes Basic concepts of Bayesian methods Probability Joint probability Conditional probability with Bayes' theorem The naive Bayes algorithm The naive Bayes classification The Laplace estimator Using numeric features with naive Bayes Example – filtering mobile phone spam with the naive Bayes algorithm Step 1 – collecting data Step 2 – exploring and preparing the data Data preparation – processing text data for analysis Data preparation – creating training and test datasets Visualizing text data – word clouds Data preparation – creating indicator features for frequent words Step 3 – training a model on the data Step 4 – evaluating model performance Step 5 – improving model performance Summary 5: Divide and Conquer – Classification Using Decision Trees and Rules Understanding decision trees Divide and conquer The C5.0 decision tree algorithm Choosing the best split Pruning the decision tree Example – identifying risky bank loans using C5.0 decision trees Step 1 – collecting data Step 2 – exploring and preparing the data Data preparation – creating random training and test datasets Step 3 – training a model on the data Step 4 – evaluating model performance Step 5 – improving model performance Boosting the accuracy of decision trees Making some mistakes more costly than others Understanding classification rules Separate and conquer The One Rule algorithm The RIPPER algorithm Rules from decision trees Example – identifying poisonous mushrooms with rule learners Step 1 – collecting data Step 2 
– exploring and preparing the data Step 3 – training a model on the data Step 4 – evaluating model performance Step 5 – improving model performance Summary 6: Forecasting Numeric Data – Regression Methods Understanding regression Simple linear regression Ordinary least squares estimation Correlations Multiple linear regression Example – predicting medical expenses using linear regression Step 1 – collecting data Step 2 – exploring and preparing the data Exploring relationships among features – the correlation matrix Visualizing relationships among features – the scatterplot matrix Step 3 – training a model on the data Step 4 – evaluating model performance Step 5 – improving model performance Model specification – adding non-linear relationships Transformation – converting a numeric variable to a binary indicator Model specification – adding interaction effects Putting it all together – an improved regression model Understanding regression trees and model trees Adding regression to trees Example – estimating the quality of wines with regression trees and model trees Step 1 – collecting data Step 2 – exploring and preparing the data Step 3 – training a model on the data Visualizing decision trees Step 4 – evaluating model performance Measuring performance with mean absolute error Step 5 – improving model performance Summary 7: Black Box Methods – Neural Networks and Support Vector Machines Understanding neural networks From biological to artificial neurons Activation functions Network topology The number of layers The direction of information travel The number of nodes in each layer Training neural networks with backpropagation Modeling the strength of concrete with ANNs Step 1 – collecting data Step 2 – exploring and preparing the data Step 3 – training a model on the data Step 4 – evaluating model performance Step 5 – improving model performance Understanding Support Vector Machines Classification with hyperplanes Finding the maximum margin The case of linearly 
separable data The case of non-linearly separable data Using kernels for non-linear spaces Performing OCR with SVMs Step 1 – collecting data Step 2 – exploring and preparing the data Step 3 – training a model on the data Step 4 – evaluating model performance Step 5 – improving model performance Summary 8: Finding Patterns – Market Basket Analysis Using Association Rules Understanding association rules The Apriori algorithm for association rule learning Measuring rule interest – support and confidence Building a set of rules with the Apriori principle Example – identifying frequently purchased groceries with association rules Step 1 – collecting data Step 2 – exploring and preparing the data Data preparation – creating a sparse matrix for transaction data Visualizing item support – item frequency plots Visualizing transaction data – plotting the sparse matrix Step 3 – training a model on the data Step 4 – evaluating model performance Step 5 – improving model performance Sorting the set of association rules Taking subsets of association rules Saving association rules to a file or data frame Summary 9: Finding Groups of Data – Clustering with k-means Understanding clustering Clustering as a machine learning task The k-means algorithm for clustering Using distance to assign and update clusters Choosing the appropriate number of clusters Finding teen market segments using k-means clustering Step 1 – collecting data Step 2 – exploring and preparing the data Data preparation – dummy coding missing values Data preparation – imputing missing values Step 3 – training a model on the data Step 4 – evaluating model performance Step 5 – improving model performance Summary 10: Evaluating Model Performance Measuring performance for classification Working with classification prediction data in R A closer look at confusion matrices Using confusion matrices to measure performance Beyond accuracy – other measures of performance The kappa statistic Sensitivity and specificity Precision 
and recall The F-measure Visualizing performance tradeoffs ROC curves Estimating future performance The holdout method Cross-validation Bootstrap sampling Summary 11: Improving Model Performance Tuning stock models for better performance Using caret for automated parameter tuning Creating a simple tuned model Customizing the tuning process Improving model performance with meta-learning Understanding ensembles Bagging Boosting Random forests Training random forests Evaluating random forest performance Summary DEEP LEARNING with R 1: Getting Started with Deep Learning What is deep learning? Conceptual overview of neural networks Deep neural networks R packages for deep learning Setting up reproducible results Neural networks The deepnet package The darch package The H2O package Connecting R and H2O Initializing H2O Linking datasets to an H2O cluster Summary 2: Training a Prediction Model Neural networks in R Building a neural network Generating predictions from a neural network The problem of overfitting data – the consequences explained Use case – build and apply a neural network Summary 3: Preventing Overfitting L1 penalty L1 penalty in action L2 penalty L2 penalty in action Weight decay (L2 penalty in neural networks) Ensembles and model averaging Use case – improving out-of-sample model performance using dropout Summary 4: Identifying Anomalous Data Getting started with unsupervised learning How do auto-encoders work? 
Regularized auto-encoders Penalized auto-encoders Denoising auto-encoders Training an auto-encoder in R Use case – building and applying an auto-encoder model Fine-tuning auto-encoder models Summary 5: Training Deep Prediction Models Getting started with deep feedforward neural networks Common activation functions – rectifiers, hyperbolic tangent, and maxout Picking hyperparameters Training and predicting new data from a deep neural network Use case – training a deep neural network for automatic classification Working with model results Summary 6: Tuning and Optimizing Models Dealing with missing data Solutions for models with low accuracy Grid search Random search Summary DEEP LEARNING WITH PYTHON I Introduction 1 Welcome Deep Learning The Wrong Way Deep Learning With Python Summary II Background 2 Introduction to Theano What is Theano? How to Install Theano Simple Theano Example Extensions and Wrappers for Theano More Theano Resources Summary 3 Introduction to TensorFlow What is TensorFlow? How to Install TensorFlow Your First Examples in TensorFlow Simple TensorFlow Example More Deep Learning Models Summary 4 Introduction to Keras What is Keras? 
How to Install Keras Theano and TensorFlow Backends for Keras Build Deep Learning Models with Keras Summary 5 Project: Develop Large Models on GPUs Cheaply In the Cloud Project Overview Setup Your AWS Account Launch Your Server Instance Login, Configure and Run Build and Run Models on AWS Close Your EC2 Instance Tips and Tricks for Using Keras on AWS More Resources For Deep Learning on AWS Summary III Multilayer Perceptrons 6 Crash Course In Multilayer Perceptrons Crash Course Overview Multilayer Perceptrons Neurons Networks of Neurons Training Networks Summary 7 Develop Your First Neural Network With Keras Tutorial Overview Pima Indians Onset of Diabetes Dataset Load Data Define Model Compile Model Fit Model Evaluate Model Tie It All Together Summary 8 Evaluate The Performance of Deep Learning Models Empirically Evaluate Network Configurations Data Splitting Manual k-Fold Cross Validation Summary 9 Use Keras Models With Scikit-Learn For General Machine Learning Overview Evaluate Models with Cross Validation Grid Search Deep Learning Model Parameters Summary 10 Project: Multiclass Classification Of Flower Species Iris Flowers Classification Dataset Import Classes and Functions Initialize Random Number Generator Load The Dataset Encode The Output Variable Define The Neural Network Model Evaluate The Model with k-Fold Cross Validation Summary 11 Project: Binary Classification Of Sonar Returns Sonar Object Classification Dataset Baseline Neural Network Model Performance Improve Performance With Data Preparation Tuning Layers and Neurons in The Model Summary 12 Project: Regression Of Boston House Prices Boston House Price Dataset Develop a Baseline Neural Network Model Lift Performance By Standardizing The Dataset Tune The Neural Network Topology Summary IV Advanced Multilayer Perceptrons and Keras 13 Save Your Models For Later With Serialization Tutorial Overview . 
Save Your Neural Network Model to JSON Save Your Neural Network Model to YAML Summary 14 Keep The Best Models During Training With Checkpointing Checkpointing Neural Network Models Checkpoint Neural Network Model Improvements Checkpoint Best Neural Network Model Only Loading a Saved Neural Network Model Summary 15 Understand Model Behavior During Training By Plotting History Access Model Training History in Keras Visualize Model Training History in Keras Summary 16 Reduce Overfitting With Dropout Regularization Dropout Regularization For Neural Networks Dropout Regularization in Keras Using Dropout on the Visible Layer Using Dropout on Hidden Layers Tips For Using Dropout Summary 17 Lift Performance With Learning Rate Schedules Learning Rate Schedule For Training Models Ionosphere Classification Dataset Time-Based Learning Rate Schedule Drop-Based Learning Rate Schedule Tips for Using Learning Rate Schedules Summary V Convolutional Neural Networks 18 Crash Course In Convolutional Neural Networks The Case for Convolutional Neural Networks Building Blocks of Convolutional Neural Networks Convolutional Layers Pooling Layers Fully Connected Layers Worked Example Convolutional Neural Networks Best Practices Summary 19 Project: Handwritten Digit Recognition Handwritten Digit Recognition Dataset Loading the MNIST dataset in Keras Baseline Model with Multilayer Perceptrons Simple Convolutional Neural Network for MNIST Larger Convolutional Neural Network for MNIST Summary 20 Improve Model Performance With Image Augmentation Keras Image Augmentation API Point of Comparison for Image Augmentation Feature Standardization ZCA Whitening Random Rotations Random Shifts Random Flips Saving Augmented Images to File Tips For Augmenting Image Data with Keras Summary 21 Project Object Recognition in Photographs Photograph Object Recognition Dataset Loading The CIFAR-10 Dataset in Keras Simple CNN for CIFAR-10 Larger CNN for CIFAR-10 Extensions To Improve Model Performance Summary 22 
Project: Predict Sentiment From Movie Reviews Movie Review Sentiment Classification Dataset Load the IMDB Dataset With Keras Word Embeddings Simple Multilayer Perceptron Model One-Dimensional Convolutional Neural Network Summary VI Recurrent Neural Networks 23 Crash Course In Recurrent Neural Networks Support For Sequences in Neural Networks Recurrent Neural Networks Long Short-Term Memory Networks Summary 24 Time Series Prediction with Multilayer Perceptrons Problem Description: Time Series Prediction Multilayer Perceptron Regression Multilayer Perceptron Using the Window Method Summary 25 Time Series Prediction with LSTM Recurrent Neural Networks LSTM Network For Regression LSTM For Regression Using the Window Method LSTM For Regression with Time Steps LSTM With Memory Between Batches Stacked LSTMs With Memory Between Batches Summary 26 Project: Sequence Classification of Movie Reviews Simple LSTM for Sequence Classification LSTM For Sequence Classification With Dropout LSTM and CNN For Sequence Classification Summary 27 Understanding Stateful LSTM Recurrent Neural Networks Problem Description: Learn the Alphabet LSTM for Learning One-Char to One-Char Mapping LSTM for a Feature Window to One-Char Mapping LSTM for a Time Step Window to One-Char Mapping LSTM State Maintained Between Samples Within A Batch Stateful LSTM for a One-Char to One-Char Mapping LSTM with Variable Length Input to One-Char Output Summary 28 Project: Text Generation With Alice in Wonderland Problem Description: Text Generation Develop a Small LSTM Recurrent Neural Network Generating Text with an LSTM Network Larger LSTM Recurrent Neural Network Extension Ideas to Improve the Model Summary |

## Upcoming Courses

Course | Course Date | Course Price [Remote / Classroom] |
---|---|---|

Druid: Build a fast, real-time data analysis system - Leicester - St. Georges House | Mon, 2017-05-08 09:30 | £3000 / £3750 |

Beyond the relational database: neo4j - Cambridge | Mon, 2017-05-08 09:30 | £3000 / £3775 |

Introduction to Data Visualization with R - Edinburgh | Mon, 2017-05-08 09:30 | £4000 / £6100 |

Knowledge Discovery in Databases (KDD) - Swansea - Princess House | Mon, 2017-05-08 09:30 | £3000 / £3450 |

Natural Language Processing (NLP) with R - Aberdeen - Berry Street | Wed, 2017-05-10 09:30 | £3000 / £3990 |