R Training Courses

Courses on the R programming language and software environment for statistical computing and graphics

Client Testimonials

Predictive Modelling with R

He was very informative and helpful.

Pratheep Ravy - UPC Schweiz GmbH

Data Mining & Machine Learning with R

The trainer was so knowledgeable and included areas I was interested in

Mohamed Salama - Edmonton Police Service

Data Mining with R

very tailored to needs

Yashan Wang - MoneyGram International

Introduction to R

Working 1:1 with Gunnar.

Bryant Ives - EY

Neural Network in R

new insights in deep machine learning

Josip Arneric - Faculty of Economics and Business Zagreb

Neural Network in R

We gained some knowledge about NN in general, and what was the most interesting for me were the new types of NN that are popular nowadays.

Tea Poklepovic - Faculty of Economics and Business Zagreb

Neural Network in R

Graphs in R :)))

- Faculty of Economics and Business Zagreb

Advanced R

The flexible and friendly style. Learning exactly what was useful and relevant for me

Jenny Tickner - Nestlé

A practical introduction to Data Analysis and Big Data

Willingness to share more

Balaram Chandra Paul - MOL Information Technology Asia Limited

Data and Analytics - from the ground up

I enjoyed the Excel sheets provided having the exercises with examples. This meant that if Kamil was held up helping other people, I could crack on with the next parts.

Luke Pontin - Digital Jersey

Data and Analytics - from the ground up

learning how to use excel properly

Torin Mitchell - Digital Jersey

Data and Analytics - from the ground up

The way the trainer made complex subjects easy to understand.

Adam Drewry - Digital Jersey

Data and Analytics - from the ground up

Detailed and comprehensive instruction given by experienced and clearly knowledgeable expert on the subject.

Justin Roche - Digital Jersey

Data and Analytics - from the ground up

Kamil is a very knowledgeable and nice person; I have learned a lot from him.

Aleksandra Szubert - Digital Jersey

R Course Outlines

Code | Name | Duration | Overview
dataar | Data Analytics With R | 21 hours
R is a very popular, open source environment for statistical computing, data analytics and graphics. This course introduces the R programming language to students. It covers language fundamentals, libraries and advanced concepts, followed by advanced data analytics and graphing with real-world data.
Audience: Developers / data analysts
Duration: 3 days
Format: Lectures and hands-on
Day One: Language Basics
- Course Introduction
- About Data Science: Data Science Definition; Process of Doing Data Science
- Introducing R Language
- Variables and Types
- Control Structures (Loops / Conditionals)
- R Scalars, Vectors, and Matrices: Defining R Vectors; Matrices
- String and Text Manipulation: Character data type
- File IO
- Lists
- Functions: Introducing Functions; Closures; lapply/sapply functions
- DataFrames
- Labs for all sections
Day Two: Intermediate R Programming
- DataFrames and File I/O: Reading data from files; Data Preparation; Built-in Datasets
- Visualization: Graphics Package; plot() / barplot() / hist() / boxplot() / scatter plot; Heat Map; ggplot2 package (qplot(), ggplot())
- Exploration With Dplyr
- Labs for all sections
Day Three: Advanced Programming With R
- Statistical Modeling With R: Statistical Functions; Dealing With NA; Distributions (Binomial, Poisson, Normal)
- Regression: Introducing Linear Regressions
- Recommendations
- Text Processing (tm package / Wordclouds)
- Clustering: Introduction to Clustering; KMeans
- Classification: Introduction to Classification; Naive Bayes; Decision Trees; Training using caret package; Evaluating Algorithms
- R and Big Data: Connecting R to databases; Big Data Ecosystem
- Labs for all sections

danagr | Data and Analytics - from the ground up | 42 hours
Data analytics is a crucial tool in business today. We will focus throughout on developing skills for practical, hands-on data analysis. The aim is to help delegates give evidence-based answers to three questions:
- What has happened? Processing and analyzing data; producing informative data visualizations
- What will happen? Forecasting future performance; evaluating forecasts
- What should happen? Turning data into evidence-based business decisions; optimizing processes
The course can be delivered either as a 6-day classroom course or remotely over a period of weeks if preferred. We can work with you to deliver the course to best suit your needs.
Basic Excel
- Navigation
- Manipulating data
- Working with formulas and addresses
- Charts
Advanced Excel
- Logical functions
- Scenario analysis
- Solver
- Macros
- First glimpse of code: VBA
VBA
- Data types
- Writing a function
- Controlling a program: conditional evaluation and loops
- Debugging techniques
Data Analytics with R
- Introducing R
- Variables and types
- Data manipulation in R
- Writing functions
- Data visualization using ggplot
- Data wrangling with Dplyr
Introduction to Machine Learning with R
- Linear Regression
- Classification and regression trees
- Classification using Support Vector Machines and Random Forests
- Clustering techniques

bigddbsysfun | Big Data & Database Systems Fundamentals | 14 hours
The course is part of the Data Scientist skill set (Domain: Data and Technology).
Data Warehousing Concepts
- What is a Data Warehouse?
- Difference between OLTP and Data Warehousing
- Data Acquisition
- Data Extraction
- Data Transformation
- Data Loading
- Data Marts
- Dependent vs Independent Data Mart
- Database design
ETL Testing Concepts
- Introduction
- Software development life cycle
- Testing methodologies
- ETL Testing Work Flow Process
- ETL Testing Responsibilities in Data stage
Big Data Fundamentals
- Big Data and its role in the corporate world
- The phases of development of a Big Data strategy within a corporation
- Explain the rationale underlying a holistic approach to Big Data
- Components needed in a Big Data Platform
- Big data storage solution
- Limits of Traditional Technologies
- Overview of database types
- NoSQL Databases
- Hadoop
- Map Reduce
- Apache Spark

rprogda | R Programming for Data Analysis | 14 hours
This course is part of the Data Scientist skill set (Domain: Data and Technology).
Introduction and preliminaries
- Making R more friendly, R and available GUIs
- RStudio
- Related software and documentation
- R and statistics
- Using R interactively
- An introductory session
- Getting help with functions and features
- R commands, case sensitivity, etc.
- Recall and correction of previous commands
- Executing commands from or diverting output to a file
- Data permanency and removing objects
Simple manipulations; numbers and vectors
- Vectors and assignment
- Vector arithmetic
- Generating regular sequences
- Logical vectors
- Missing values
- Character vectors
- Index vectors; selecting and modifying subsets of a data set
- Other types of objects
Objects, their modes and attributes
- Intrinsic attributes: mode and length
- Changing the length of an object
- Getting and setting attributes
- The class of an object
Arrays and matrices
- Arrays
- Array indexing. Subsections of an array
- Index matrices
- The array() function
- The outer product of two arrays
- Generalized transpose of an array
- Matrix facilities: Matrix multiplication; Linear equations and inversion; Eigenvalues and eigenvectors; Singular value decomposition and determinants; Least squares fitting and the QR decomposition
- Forming partitioned matrices, cbind() and rbind()
- The concatenation function, c(), with arrays
- Frequency tables from factors
Lists and data frames
- Lists
- Constructing and modifying lists: Concatenating lists
- Data frames: Making data frames; attach() and detach(); Working with data frames; Attaching arbitrary lists; Managing the search path
Data manipulation
- Selecting, subsetting observations and variables
- Filtering, grouping
- Recoding, transformations
- Aggregation, combining data sets
- Character manipulation, stringr package
Reading data
- Txt files
- CSV files
- XLS, XLSX files
- SPSS, SAS, Stata, … and other data formats
- Exporting data to txt, csv and other formats
- Accessing data from databases using SQL language
Probability distributions
- R as a set of statistical tables
- Examining the distribution of a set of data
- One- and two-sample tests
Grouping, loops and conditional execution
- Grouped expressions
- Control statements: Conditional execution: if statements; Repetitive execution: for loops, repeat and while
Writing your own functions
- Simple examples
- Defining new binary operators
- Named arguments and defaults
- The '...' argument
- Assignments within functions
- More advanced examples: Efficiency factors in block designs; Dropping all names in a printed array; Recursive numerical integration
- Scope
- Customizing the environment
- Classes, generic functions and object orientation
Graphical procedures
- High-level plotting commands: The plot() function; Displaying multivariate data; Display graphics; Arguments to high-level plotting functions
- Basic visualisation graphs
- Multivariate relations with lattice and ggplot package
- Using graphics parameters
- Graphics parameters list
Automated and interactive reporting
- Combining output from R with text

dmmlr | Data Mining & Machine Learning with R | 14 hours
Introduction to Data mining and Machine Learning
- Statistical learning vs. Machine learning
- Iteration and evaluation
- Bias-Variance trade-off
Regression
- Linear regression
- Generalizations and Nonlinearity
- Exercises
Classification
- Bayesian refresher
- Naive Bayes
- Discriminant analysis
- Logistic regression
- K-Nearest neighbors
- Support Vector Machines
- Neural networks
- Decision trees
- Exercises
Cross-validation and Resampling
- Cross-validation approaches
- Bootstrap
- Exercises
Unsupervised Learning
- K-means clustering
- Examples
- Challenges of unsupervised learning and beyond K-means
Advanced topics
- Ensemble models
- Mixed models
- Boosting
- Examples
Multidimensional reduction
- Factor Analysis
- Principal Component Analysis
- Examples

predmodr | Predictive Modelling with R | 14 hours
Problems facing forecasters
- Customer demand planning
- Investor uncertainty
- Economic planning
- Seasonal changes in demand/utilization
- Roles of risk and uncertainty
Time series Forecasting
- Seasonal adjustment
- Moving average
- Exponential smoothing
- Extrapolation
- Linear prediction
- Trend estimation
- Stationarity and ARIMA modelling
Econometric methods (causal methods)
- Regression analysis
- Multiple linear regression
- Multiple non-linear regression
- Regression validation
- Forecasting from regression
Judgemental methods
- Surveys
- Delphi method
- Scenario building
- Technology forecasting
- Forecast by analogy
Simulation and other methods
- Simulation
- Prediction market
- Probabilistic forecasting and Ensemble forecasting

intror | Introduction to R with Time Series Analysis | 21 hours
Introduction and preliminaries
- Making R more friendly, R and available GUIs
- RStudio
- Related software and documentation
- R and statistics
- Using R interactively
- An introductory session
- Getting help with functions and features
- R commands, case sensitivity, etc.
- Recall and correction of previous commands
- Executing commands from or diverting output to a file
- Data permanency and removing objects
Simple manipulations; numbers and vectors
- Vectors and assignment
- Vector arithmetic
- Generating regular sequences
- Logical vectors
- Missing values
- Character vectors
- Index vectors; selecting and modifying subsets of a data set
- Other types of objects
Objects, their modes and attributes
- Intrinsic attributes: mode and length
- Changing the length of an object
- Getting and setting attributes
- The class of an object
Arrays and matrices
- Arrays
- Array indexing. Subsections of an array
- Index matrices
- The array() function
- The outer product of two arrays
- Generalized transpose of an array
- Matrix facilities: Matrix multiplication; Linear equations and inversion; Eigenvalues and eigenvectors; Singular value decomposition and determinants; Least squares fitting and the QR decomposition
- Forming partitioned matrices, cbind() and rbind()
- The concatenation function, c(), with arrays
- Frequency tables from factors
Lists and data frames
- Lists
- Constructing and modifying lists: Concatenating lists
- Data frames: Making data frames; attach() and detach(); Working with data frames; Attaching arbitrary lists; Managing the search path
Data manipulation
- Selecting, subsetting observations and variables
- Filtering, grouping
- Recoding, transformations
- Aggregation, combining data sets
- Character manipulation, stringr package
Reading data
- Txt files
- CSV files
- XLS, XLSX files
- SPSS, SAS, Stata, … and other data formats
- Exporting data to txt, csv and other formats
- Accessing data from databases using SQL language
Probability distributions
- R as a set of statistical tables
- Examining the distribution of a set of data
- One- and two-sample tests
Grouping, loops and conditional execution
- Grouped expressions
- Control statements: Conditional execution: if statements; Repetitive execution: for loops, repeat and while
Writing your own functions
- Simple examples
- Defining new binary operators
- Named arguments and defaults
- The '...' argument
- Assignments within functions
- More advanced examples: Efficiency factors in block designs; Dropping all names in a printed array; Recursive numerical integration
- Scope
- Customizing the environment
- Classes, generic functions and object orientation
Graphical procedures
- High-level plotting commands: The plot() function; Displaying multivariate data; Display graphics; Arguments to high-level plotting functions
- Basic visualisation graphs
- Multivariate relations with lattice and ggplot package
- Using graphics parameters
- Graphics parameters list
Time series Forecasting
- Seasonal adjustment
- Moving average
- Exponential smoothing
- Extrapolation
- Linear prediction
- Trend estimation
- Stationarity and ARIMA modelling
Econometric methods (causal methods)
- Regression analysis
- Multiple linear regression
- Multiple non-linear regression
- Regression validation
- Forecasting from regression

mrkanar | Marketing Analytics using R | 21 hours
Audience: Business owners (marketing managers, product managers, customer base managers) and their teams; customer insights professionals.
Overview: The course follows the customer life cycle from acquiring new customers, through managing existing customers for profitability and retaining good customers, to understanding which customers are leaving us and why. We will be working with real (if anonymized) data from a variety of industries including telecommunications, insurance, media, and high tech.
Format: Instructor-led training over the course of five half-day sessions, with in-class exercises as well as homework. It can be delivered as a classroom or distance (online) course.
Part 1: Inflow - acquiring new customers
Our focus is direct marketing, so we will not look at advertising campaigns but instead focus on understanding marketing campaigns (e.g. direct mail). This is the foundation for almost everything else in the course. We look at measuring and improving campaign effectiveness, including:
- The importance of test and control groups. Universal control group.
- Techniques: Lift curves, AUC
- Return on investment. Optimizing marketing spend.
Part 2: Base Management: managing existing customers
Considering the cost of acquiring new customers, for many businesses there are probably few assets more valuable than their existing customer base, though few think of it in this way. Topics include:
1. Cross-selling and up-selling
   a. Offering the right product or service to the customer at the right time. Techniques: RFM models, multinomial regression.
   b. Value of lifetime purchases.
2. Customer segmentation: Understanding the types of customers that you have. Classification models using first simple decision trees, and then random forests and other, newer techniques.
Part 3: Retention: Keeping your good customers
Understanding which customers are likely to leave and what you can do about it is key to profitability in many industries, especially where there are repeat purchases or subscriptions. We look at propensity-to-churn models, including:
- Logistic regression: glm (package stats) and newer techniques (especially gbm as a general tool)
- Tuning models (caret) and introduction to ensemble models
Part 4: Outflow: Understanding who is leaving and why
Customers will leave you; that is a fact of life. What is important is to understand who is leaving and why. Is it low-value customers who are leaving, or is it your best customers? Are they leaving for competitors, or because they no longer need your products and services? Topics include:
- Customer lifetime value models: Combining value of purchases with propensity to churn and the cost of servicing and retaining the customer.
- Analysing survey data. (Generally useful, but we will do a brief introduction here in the context of exit surveys.)

advr | Advanced R | 7 hours
- RStudio IDE
- Data manipulation with dplyr, tidyr, reshape2
- Object oriented programming in R
- Performance profiling
- Exception handling
- Debugging R code
- Creating R packages
- Reproducible research with knitr and RMarkdown
- C/C++ coding in R
- Writing and compiling C/C++ code from R

rneuralnet | Neural Network in R | 14 hours
This course is an introduction to applying neural networks to real-world problems using R-project software.
Introduction to Neural Networks
- What are Neural Networks
- What is the current status in applying neural networks
- Neural Networks vs regression models
- Supervised and Unsupervised learning
Overview of packages available
- nnet, neuralnet and others
- Differences between packages and their limitations
- Visualizing neural networks
Applying Neural Networks
- Concept of neurons and neural networks
- A simplified model of the brain
- Opportunities neuron
- XOR problem and the nature of the distribution of values
- The polymorphic nature of the sigmoidal
- Other activation functions
Construction of neural networks
- Concept of neurons connect
- Neural network as nodes
- Building a network
- Neurons
- Layers
- Scales
- Input and output data
- Range 0 to 1
- Normalization
Learning Neural Networks
- Backward Propagation
- Steps propagation
- Network training algorithms
- Range of application
- Estimation
- Problems with the possibility of approximation by
Examples
- OCR and image pattern recognition
- Other applications
- Implementing a neural network modeling job predicting stock prices of listed

datascience | Data Science Training | 21 hours
Aim: Obtaining the knowledge required to apply Data Science methods, and receiving consultancy on establishing a Data Science team in an insurance company.
Order: 2-3 days of training and consulting in Data Science. One goal is to obtain consultancy on the introduction and establishment of Data Science, and of the statistical environment R as a Data Science tool, within a company / organization. Another goal is the prediction of typical Key Performance Indicators (KPIs) and their confidence intervals with R. Suitable reporting and communication of these KPIs to the management board should also be trained. On the basis of use cases derived from actual problems in Actuarial Science and Data Science, the respective methods and their implementation in R should be trained and discussed.
Content:
1.) Modelling KPIs
1a.) Based on a use case, the modelling of the respective KPIs in R shall be discussed. In particular, the following topics are to be covered:
- Using R as a tool to analyze the performance of insurance portfolios
- Suitable data organization within R
- Application of Bayesian Theory (preferably using the Stan library in R)
- Validation of statistical models
- Suitable reporting of KPIs, visualization and communication of models and statistical results to the management board
Target group: Data Scientists
2.) Establishing a Data Science team within an organization
Based on practical experience, it should be taught how to establish a Data Science team, and R as a Data Science tool, within a larger company. In particular, the following topics are to be covered:
- Required hardware and software
- Definition of interfaces to other teams (Data Integration / Data Governance / IT)
- Standardization (Projects / Coding Styles / Methods)
- Information Management
- Documentation, reproducibility, allocation of tasks
- Networking
- Compliance
Target group: Data Scientists, management board
3.) Claims reserving with R using state-of-the-art methods
Using the ChainLadder R package, reserving shall be conducted. The focus lies on:
- Application of state-of-the-art claims reserving methods, including
  o Basic Chain-Ladder
  o Mack Chain-Ladder
  o Generalized linear modelling
  o Bayesian Approach
- Estimation of claim severity in case of quickly growing portfolios
- Prediction of future claim severity in case of a fixed portfolio
- Modelling cancellation
Target group: Data Scientists, Actuaries
Extent: 2-3 day training / consulting
Requirements
- In-house training is preferred
- Training is based on real-life insurance data / experience

rprogadv | Advanced R Programming | 7 hours
This course is for data scientists and statisticians who already have basic R and C++ coding skills and need advanced R coding skills. The purpose is to give a practical advanced R programming course to participants interested in applying the methods at work. Sector-specific examples are used to make the training relevant to the audience.
- R's environment
- Object oriented programming in R: S3; S4; Reference classes
- Performance profiling
- Exception handling
- Debugging R code
- Creating R packages
- Unit testing
- C/C++ coding in R: SEXPRs; Calling dynamically loaded libraries from R; Writing and compiling C/C++ code from R; Improving R's performance with C++ linear algebra library

mrkfct | Market Forecasting | 14 hours
Audience: This course has been created for analysts and forecasters wanting to introduce or improve forecasting, which can be related to sales forecasting, economic forecasting, technology forecasting, supply chain management and demand or supply forecasting.
Description: This course guides delegates through a series of methodologies, frameworks and algorithms which are useful when choosing how to predict the future based on historical data. It uses standard tools like Microsoft Excel or some Open Source programs (notably the R project). The principles covered in this course can be implemented in any software (e.g. SAS, SPSS, Statistica, MINITAB ...).
Problems facing forecasters
- Customer demand planning
- Investor uncertainty
- Economic planning
- Seasonal changes in demand/utilization
- Roles of risk and uncertainty
Time series methods
- Moving average
- Exponential smoothing
- Extrapolation
- Linear prediction
- Trend estimation
- Growth curve
Econometric methods (causal methods)
- Regression analysis using linear regression or non-linear regression
- Autoregressive moving average (ARMA)
- Autoregressive integrated moving average (ARIMA)
- Econometrics
Judgemental methods
- Surveys
- Delphi method
- Scenario building
- Technology forecasting
- Forecast by analogy
Simulation and other methods
- Simulation
- Prediction market
- Probabilistic forecasting and Ensemble forecasting
- Reference class forecasting

67795 | Numerical Methods | 14 hours
This course is for data scientists and statisticians who have some familiarity with numerical methods and know at least one programming language among R, Python, Octave, and some C++ options. The emphasis of this course is on the practical aspects of data/model preparation, execution, post hoc analysis and visualization. The purpose of this course is to give a practical introduction to numerical methods to participants interested in applying the methods at work. Sector-specific examples are used to make the training relevant to the audience.
Topics Covered:
- curve fitting
- regression
- robust regression
- linear algebra: matrix operations
- eigenvalues/eigenvectors
- matrix decompositions
- ordinary & partial differential equations
- Fourier analysis
- interpolation & splines

dsbda | Data Science for Big Data Analytics | 35 hours
Introduction to Data Science for Big Data Analytics
- Data Science Overview
- Big Data Overview
- Data Structures
- Drivers and complexities of Big Data
- Big Data ecosystem and a new approach to analytics
- Key technologies in Big Data
- Data Mining process and problems: Association Pattern Mining; Data Clustering; Outlier Detection; Data Classification
Introduction to Data Analytics lifecycle
- Discovery
- Data preparation
- Model planning
- Model building
- Presentation/Communication of results
- Operationalization
- Exercise: Case study
From this point most of the training time (80%) will be spent on examples and exercises in R and related big data technology.
Getting started with R
- Installing R and RStudio
- Features of R language
- Objects in R
- Data in R
- Data manipulation
- Big data issues
- Exercises
Getting started with Hadoop
- Installing Hadoop
- Understanding Hadoop modes
- HDFS
- MapReduce architecture
- Hadoop related projects overview
- Writing programs in Hadoop MapReduce
- Exercises
Integrating R and Hadoop with RHadoop
- Components of RHadoop
- Installing RHadoop and connecting with Hadoop
- The architecture of RHadoop
- Hadoop streaming with R
- Data analytics problem solving with RHadoop
- Exercises
Pre-processing and preparing data
- Data preparation steps
- Feature extraction
- Data cleaning
- Data integration and transformation
- Data reduction – sampling, feature subset selection, dimensionality reduction
- Discretization and binning
- Exercises and Case study
Exploratory data analytic methods in R
- Descriptive statistics
- Exploratory data analysis
- Visualization – preliminary steps
- Visualizing single variable
- Examining multiple variables
- Statistical methods for evaluation
- Hypothesis testing
- Exercises and Case study
Data Visualizations
- Basic visualizations in R
- Packages for data visualization: ggplot2, lattice, plotly
- Formatting plots in R
- Advanced graphs
- Exercises
Regression (Estimating future values)
- Linear regression
- Use cases
- Model description
- Diagnostics
- Problems with linear regression
- Shrinkage methods, ridge regression, the lasso
- Generalizations and nonlinearity
- Regression splines
- Local polynomial regression
- Generalized additive models
- Regression with RHadoop
- Exercises and Case study
Classification
- The classification related problems
- Bayesian refresher
- Naïve Bayes
- Logistic regression
- K-nearest neighbors
- Decision trees algorithm
- Neural networks
- Support vector machines
- Diagnostics of classifiers
- Comparison of classification methods
- Scalable classification algorithms
- Exercises and Case study
Assessing model performance and selection
- Bias, Variance and model complexity
- Accuracy vs Interpretability
- Evaluating classifiers
- Measures of model/algorithm performance
- Hold-out method of validation
- Cross-validation
- Tuning machine learning algorithms with caret package
- Visualizing model performance with Profit, ROC and Lift curves
Ensemble Methods
- Bagging
- Random Forests
- Boosting
- Gradient boosting
- Exercises and Case study
Support vector machines for classification and regression
- Maximal Margin classifiers
- Support vector classifiers
- Support vector machines
- SVMs for classification problems
- SVMs for regression problems
- Exercises and Case study
Identifying unknown groupings within a data set
- Feature Selection for Clustering
- Representative based algorithms: k-means, k-medoids
- Hierarchical algorithms: agglomerative and divisive methods
- Probabilistic base algorithms: EM
- Density based algorithms: DBSCAN, DENCLUE
- Cluster validation
- Advanced clustering concepts
- Clustering with RHadoop
- Exercises and Case study
Discovering connections with Link Analysis
- Link analysis concepts
- Metrics for analyzing networks
- The PageRank algorithm
- Hyperlink-Induced Topic Search
- Link Prediction
- Exercises and Case study
Association Pattern Mining
- Frequent Pattern Mining Model
- Scalability issues in frequent pattern mining
- Brute Force algorithms
- Apriori algorithm
- The FP growth approach
- Evaluation of Candidate Rules
- Applications of Association Rules
- Validation and Testing
- Diagnostics
- Association rules with R and Hadoop
- Exercises and Case study
Constructing recommendation engines
- Understanding recommender systems
- Data mining techniques used in recommender systems
- Recommender systems with recommenderlab package
- Evaluating the recommender systems
- Recommendations with RHadoop
- Exercise: Building recommendation engine
Text analysis
- Text analysis steps
- Collecting raw text
- Bag of words
- Term Frequency – Inverse Document Frequency
- Determining Sentiments
- Exercises and Case study

dataminr | Data Mining with R | 14 hours
Sources of methods
- Artificial intelligence
- Machine learning
- Statistics
- Sources of data
Pre processing of data
- Data Import/Export
- Data Exploration and Visualization
- Dimensionality Reduction
- Dealing with missing values
- R Packages
Data mining main tasks
- Automatic or semi-automatic analysis of large quantities of data
- Extracting previously unknown interesting patterns: groups of data records (cluster analysis); unusual records (anomaly detection); dependencies (association rule mining)
Data mining
- Anomaly detection (Outlier/change/deviation detection)
- Association rule learning (Dependency modeling)
- Clustering
- Classification
- Regression
- Summarization
- Frequent Pattern Mining
- Text Mining
- Decision Trees
- Regression
- Neural Networks
- Sequence Mining
- Frequent Pattern Mining
- Data dredging, data fishing, data snooping

deepmclrg | Machine Learning & Deep Learning with Python and R | 14 hours
MACHINE LEARNING
1: Introducing Machine Learning
- The origins of machine learning
- Uses and abuses of machine learning: Ethical considerations
- How do machines learn?: Abstraction and knowledge representation; Generalization; Assessing the success of learning
- Steps to apply machine learning to your data
- Choosing a machine learning algorithm: Thinking about the input data; Thinking about types of machine learning algorithms; Matching your data to an appropriate algorithm
- Using R for machine learning: Installing and loading R packages; Installing an R package; Installing a package using the point-and-click interface; Loading an R package
- Summary
2: Managing and Understanding Data
- R data structures: Vectors; Factors; Lists; Data frames; Matrixes and arrays
- Managing data with R: Saving and loading R data structures; Importing and saving data from CSV files; Importing data from SQL databases
- Exploring and understanding data: Exploring the structure of data
- Exploring numeric variables: Measuring the central tendency – mean and median; Measuring spread – quartiles and the five-number summary; Visualizing numeric variables – boxplots; Visualizing numeric variables – histograms; Understanding numeric data – uniform and normal distributions; Measuring spread – variance and standard deviation
- Exploring categorical variables: Measuring the central tendency – the mode
- Exploring relationships between variables: Visualizing relationships – scatterplots; Examining relationships – two-way cross-tabulations
- Summary
3: Lazy Learning – Classification Using Nearest Neighbors
- Understanding classification using nearest neighbors: The kNN algorithm; Calculating distance; Choosing an appropriate k; Preparing data for use with kNN; Why is the kNN algorithm lazy?
- Diagnosing breast cancer with the kNN algorithm
- Step 1 – collecting data
- Step 2 – exploring and preparing the data: Transformation – normalizing numeric data; Data preparation – creating training and test datasets
- Step 3 – training a model on the data
- Step 4 – evaluating model performance
- Step 5 – improving model performance: Transformation – z-score standardization; Testing alternative values of k
- Summary
4: Probabilistic Learning – Classification Using Naive Bayes
- Understanding naive Bayes
- Basic concepts of Bayesian methods: Probability; Joint probability; Conditional probability with Bayes' theorem
- The naive Bayes algorithm: The naive Bayes classification; The Laplace estimator; Using numeric features with naive Bayes
- Example – filtering mobile phone spam with the naive Bayes algorithm
- Step 1 – collecting data
- Step 2 – exploring and preparing the data: Data preparation – processing text data for analysis; Data preparation – creating training and test datasets; Visualizing text data – word clouds; Data preparation – creating indicator features for frequent words
- Step 3 – training a model on the data
- Step 4 – evaluating model performance
- Step 5 – improving model performance
- Summary
5: Divide and Conquer – Classification Using Decision Trees and Rules
- Understanding decision trees: Divide and conquer
- The C5.0 decision tree algorithm: Choosing the best split; Pruning the decision tree
- Example – identifying risky bank loans using C5.0 decision trees
- Step 1 – collecting data
- Step 2 – exploring and preparing the data: Data preparation – creating random training and test datasets
- Step 3 – training a model on the data
- Step 4 – evaluating model performance
- Step 5 – improving model performance: Boosting the accuracy of decision trees; Making some mistakes more costly than others
- Understanding classification rules: Separate and conquer; The One Rule algorithm; The RIPPER algorithm; Rules from decision trees
- Example – identifying poisonous mushrooms with rule learners
- Step 1 – collecting data
- Step 2 – exploring and preparing the data
- Step 3 – training a model on the data
- Step 4 – evaluating model performance
- Step 5 – improving model performance
- Summary
6: Forecasting Numeric Data – Regression Methods
- Understanding regression: Simple linear regression; Ordinary least squares estimation; Correlations; Multiple linear regression
- Example – predicting medical expenses using linear regression
- Step 1 – collecting data
- Step 2 – exploring and preparing the data: Exploring relationships among features – the correlation matrix; Visualizing relationships among features – the scatterplot matrix
- Step 3 – training a model on the data
- Step 4 – evaluating model performance
- Step 5 – improving model performance: Model specification – adding non-linear relationships; Transformation – converting a numeric variable to a binary indicator; Model specification – adding interaction effects; Putting it all together – an improved regression model
- Understanding regression trees and model trees: Adding regression to trees
- Example – estimating the quality of wines with regression trees and model trees
- Step 1 – collecting data
- Step 2 – exploring and preparing the data
- Step 3 – training a model on the data: Visualizing decision trees
- Step 4 – evaluating model performance: Measuring performance with mean absolute error
- Step 5 – improving model performance
- Summary
7: Black Box Methods – Neural Networks and Support Vector Machines
- Understanding neural networks: From biological to artificial neurons; Activation functions
- Network topology: The number of layers; The direction of information travel; The number of nodes in each layer
- Training neural networks with backpropagation
- Modeling the strength of concrete with ANNs
- Step 1 – collecting data
- Step 2 – exploring and preparing the data
- Step 3 – training a model on the data
- Step 4 – evaluating model performance
- Step 5 – improving model performance
- Understanding Support Vector Machines: Classification with hyperplanes
- Finding the maximum margin: The case of linearly separable data; The case of non-linearly separable data
- Using kernels for non-linear spaces
- Performing OCR with SVMs
- Step 1 – collecting data
- Step 2 – exploring and preparing the data
- Step 3 – training a model on the data
- Step 4 – evaluating model performance
- Step 5 – improving model performance
- Summary
8: Finding Patterns – Market Basket Analysis Using Association Rules
- Understanding association rules
- The Apriori algorithm for association rule learning: Measuring rule interest – support and confidence; Building a set of rules with the Apriori principle
- Example – identifying frequently purchased groceries with association rules
- Step 1 – collecting data
- Step 2 – exploring and preparing the data: Data preparation – creating a sparse matrix for transaction data; Visualizing item support – item frequency plots; Visualizing transaction data – plotting the sparse matrix
- Step 3 – training a model on the data
- Step 4 – evaluating model performance
- Step 5 – improving model performance: Sorting the set of association rules; Taking subsets of association rules; Saving association rules to a file or data frame
- Summary
9: Finding Groups of Data – Clustering with k-means
- Understanding clustering: Clustering as a machine learning task
- The k-means algorithm for clustering: Using distance to assign and update clusters; Choosing the appropriate number of clusters
- Finding teen market segments using k-means clustering
- Step 1 – collecting data
- Step 2 – exploring and preparing the data: Data preparation – dummy coding missing values; Data preparation – imputing missing values
- Step 3 – training a model on the data
- Step 4 – evaluating model performance
- Step 5 – improving model performance
- Summary
10: Evaluating Model Performance
Measuring performance for classification; Working with classification prediction data in R; A closer look at confusion matrices; Using confusion matrices to measure performance; Beyond accuracy – other measures of performance; The kappa statistic; Sensitivity and specificity; Precision
and recall The F-measure Visualizing performance tradeoffs ROC curves Estimating future performance The holdout method Cross-validation Bootstrap sampling Summary 11: Improving Model Performance Tuning stock models for better performance Using caret for automated parameter tuning Creating a simple tuned model Customizing the tuning process Improving model performance with meta-learning Understanding ensembles Bagging Boosting Random forests Training random forests Evaluating random forest performance Summary DEEP LEARNING with R 1: Getting Started with Deep Learning What is deep learning? Conceptual overview of neural networks Deep neural networks R packages for deep learning Setting up reproducible results Neural networks The deepnet package The darch package The H2O package Connecting R and H2O Initializing H2O Linking datasets to an H2O cluster Summary 2: Training a Prediction Model Neural networks in R Building a neural network Generating predictions from a neural network The problem of overfitting data – the consequences explained Use case – build and apply a neural network Summary 3: Preventing Overfitting L1 penalty L1 penalty in action L2 penalty L2 penalty in action Weight decay (L2 penalty in neural networks) Ensembles and model averaging Use case – improving out-of-sample model performance using dropout Summary 4: Identifying Anomalous Data Getting started with unsupervised learning How do auto-encoders work? 
Regularized auto-encoders Penalized auto-encoders Denoising auto-encoders Training an auto-encoder in R Use case – building and applying an auto-encoder model Fine-tuning auto-encoder models Summary 5: Training Deep Prediction Models Getting started with deep feedforward neural networks Common activation functions – rectifiers, hyperbolic tangent, and maxout Picking hyperparameters Training and predicting new data from a deep neural network Use case – training a deep neural network for automatic classification Working with model results Summary 6: Tuning and Optimizing Models Dealing with missing data Solutions for models with low accuracy Grid search Random search Summary DEEP LEARNING WITH PYTHON I Introduction 1 Welcome Deep Learning The Wrong Way Deep Learning With Python Summary II Background 2 Introduction to Theano What is Theano? How to Install Theano Simple Theano Example Extensions and Wrappers for Theano More Theano Resources Summary 3 Introduction to TensorFlow What is TensorFlow? How to Install TensorFlow Your First Examples in TensorFlow Simple TensorFlow Example More Deep Learning Models Summary 4 Introduction to Keras What is Keras? 
How to Install Keras Theano and TensorFlow Backends for Keras Build Deep Learning Models with Keras Summary 5 Project: Develop Large Models on GPUs Cheaply In the Cloud Project Overview Setup Your AWS Account Launch Your Server Instance Login, Configure and Run Build and Run Models on AWS Close Your EC2 Instance Tips and Tricks for Using Keras on AWS More Resources For Deep Learning on AWS Summary III Multilayer Perceptrons 6 Crash Course In Multilayer Perceptrons Crash Course Overview Multilayer Perceptrons Neurons Networks of Neurons Training Networks Summary 7 Develop Your First Neural Network With Keras Tutorial Overview Pima Indians Onset of Diabetes Dataset Load Data Define Model Compile Model Fit Model Evaluate Model Tie It All Together Summary 8 Evaluate The Performance of Deep Learning Models Empirically Evaluate Network Configurations Data Splitting Manual k-Fold Cross Validation Summary 9 Use Keras Models With Scikit-Learn For General Machine Learning Overview Evaluate Models with Cross Validation Grid Search Deep Learning Model Parameters Summary 10 Project: Multiclass Classification Of Flower Species Iris Flowers Classification Dataset Import Classes and Functions Initialize Random Number Generator Load The Dataset Encode The Output Variable Define The Neural Network Model Evaluate The Model with k-Fold Cross Validation Summary 11 Project: Binary Classification Of Sonar Returns Sonar Object Classification Dataset Baseline Neural Network Model Performance Improve Performance With Data Preparation Tuning Layers and Neurons in The Model Summary 12 Project: Regression Of Boston House Prices Boston House Price Dataset Develop a Baseline Neural Network Model Lift Performance By Standardizing The Dataset Tune The Neural Network Topology Summary IV Advanced Multilayer Perceptrons and Keras 13 Save Your Models For Later With Serialization Tutorial Overview . 
Save Your Neural Network Model to JSON Save Your Neural Network Model to YAML Summary 14 Keep The Best Models During Training With Checkpointing Checkpointing Neural Network Models Checkpoint Neural Network Model Improvements Checkpoint Best Neural Network Model Only Loading a Saved Neural Network Model Summary 15 Understand Model Behavior During Training By Plotting History Access Model Training History in Keras Visualize Model Training History in Keras Summary 16 Reduce Overfitting With Dropout Regularization Dropout Regularization For Neural Networks Dropout Regularization in Keras Using Dropout on the Visible Layer Using Dropout on Hidden Layers Tips For Using Dropout Summary 17 Lift Performance With Learning Rate Schedules Learning Rate Schedule For Training Models Ionosphere Classification Dataset Time-Based Learning Rate Schedule Drop-Based Learning Rate Schedule Tips for Using Learning Rate Schedules Summary V Convolutional Neural Networks 18 Crash Course In Convolutional Neural Networks The Case for Convolutional Neural Networks Building Blocks of Convolutional Neural Networks Convolutional Layers Pooling Layers Fully Connected Layers Worked Example Convolutional Neural Networks Best Practices Summary 19 Project: Handwritten Digit Recognition Handwritten Digit Recognition Dataset Loading the MNIST dataset in Keras Baseline Model with Multilayer Perceptrons Simple Convolutional Neural Network for MNIST Larger Convolutional Neural Network for MNIST Summary 20 Improve Model Performance With Image Augmentation Keras Image Augmentation API Point of Comparison for Image Augmentation Feature Standardization ZCA Whitening Random Rotations Random Shifts Random Flips Saving Augmented Images to File Tips For Augmenting Image Data with Keras Summary 21 Project Object Recognition in Photographs Photograph Object Recognition Dataset Loading The CIFAR-10 Dataset in Keras Simple CNN for CIFAR-10 Larger CNN for CIFAR-10 Extensions To Improve Model Performance Summary 22 
Project: Predict Sentiment From Movie Reviews Movie Review Sentiment Classification Dataset Load the IMDB Dataset With Keras Word Embeddings Simple Multilayer Perceptron Model One-Dimensional Convolutional Neural Network Summary VI Recurrent Neural Networks 23 Crash Course In Recurrent Neural Networks Support For Sequences in Neural Networks Recurrent Neural Networks Long Short-Term Memory Networks Summary 24 Time Series Prediction with Multilayer Perceptrons Problem Description: Time Series Prediction Multilayer Perceptron Regression Multilayer Perceptron Using the Window Method Summary 25 Time Series Prediction with LSTM Recurrent Neural Networks LSTM Network For Regression LSTM For Regression Using the Window Method LSTM For Regression with Time Steps LSTM With Memory Between Batches Stacked LSTMs With Memory Between Batches Summary 26 Project: Sequence Classification of Movie Reviews Simple LSTM for Sequence Classification LSTM For Sequence Classification With Dropout LSTM and CNN For Sequence Classification Summary 27 Understanding Stateful LSTM Recurrent Neural Networks Problem Description: Learn the Alphabet LSTM for Learning One-Char to One-Char Mapping LSTM for a Feature Window to One-Char Mapping LSTM for a Time Step Window to One-Char Mapping LSTM State Maintained Between Samples Within A Batch Stateful LSTM for a One-Char to One-Char Mapping LSTM with Variable Length Input to One-Char Output Summary 28 Project: Text Generation With Alice in Wonderland Problem Description: Text Generation Develop a Small LSTM Recurrent Neural Network Generating Text with an LSTM Network Larger LSTM Recurrent Neural Network Extension Ideas to Improve the Model Summary
bigdatar Programming with Big Data in R 21 hours Introduction to Programming Big Data with R (pbdR) Setting up your environment to use pbdR Scope and tools available in pbdR Packages commonly used with Big Data alongside pbdR Message Passing Interface (MPI) Using pbdR MPI 5 Parallel processing Point-to-point communication Send Matrices Summing Matrices Collective communication Summing Matrices with Reduce Scatter / Gather Other MPI communications Distributed Matrices Creating a distributed diagonal matrix SVD of a distributed matrix Building a distributed matrix in parallel Statistics Applications Monte Carlo Integration Reading Datasets Reading on all processes Broadcasting from one process Reading partitioned data Distributed Regression Distributed Bootstrap
datamodeling Pattern Recognition 35 hours This course provides an introduction to the field of pattern recognition and machine learning. It touches on practical applications in statistics, computer science, signal processing, computer vision, data mining, and bioinformatics. The course is interactive and includes plenty of hands-on exercises, instructor feedback, and testing of knowledge and skills acquired. Audience: Data analysts; PhD students, researchers and practitioners Introduction Probability theory, model selection, decision and information theory Probability distributions Linear models for regression and classification Neural networks Kernel methods Sparse kernel machines Graphical models Mixture models and EM Approximate inference Sampling methods Continuous latent variables Sequential data Combining models
MLFWR1 Machine Learning Fundamentals with R 14 hours The aim of this course is to provide basic proficiency in applying Machine Learning methods in practice. Using the R programming platform and its various libraries, and working from a multitude of practical examples, this course teaches how to use the most important building blocks of Machine Learning, how to make data modeling decisions, how to interpret the outputs of the algorithms and how to validate the results. Our goal is to give you the skills to understand and use the most fundamental tools from the Machine Learning toolbox confidently and to avoid the common pitfalls of Data Science applications. Introduction to Applied Machine Learning Statistical learning vs. Machine learning Iteration and evaluation Bias-Variance trade-off Regression Linear regression Generalizations and Nonlinearity Exercises Classification Bayesian refresher Naive Bayes Logistic regression K-Nearest neighbors Exercises Cross-validation and Resampling Cross-validation approaches Bootstrap Exercises Unsupervised Learning K-means clustering Examples Challenges of unsupervised learning and beyond K-means
kdd Knowledge Discovery in Databases (KDD) 21 hours Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunication and manufacturing. In this course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes. Audience: Data analysts or anyone interested in learning how to interpret data to solve problems Format of the course: After a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations. Introduction KDD vs data mining Establishing the application domain Establishing relevant prior knowledge Understanding the goal of the investigation Creating a target data set Data cleaning and preprocessing Data reduction and projection Choosing the data mining task Choosing the data mining algorithms Interpreting the mined patterns
rlang R 21 hours Day 1 Introduction and preliminaries Making R more friendly, R and available GUIs RStudio Related software and documentation R and statistics Using R interactively An introductory session Getting help with functions and features R commands, case sensitivity, etc. Recall and correction of previous commands Executing commands from or diverting output to a file Data permanency and removing objects Simple manipulations; numbers and vectors Vectors and assignment Vector arithmetic Generating regular sequences Logical vectors Missing values Character vectors Index vectors; selecting and modifying subsets of a data set Other types of objects Objects, their modes and attributes Intrinsic attributes: mode and length Changing the length of an object Getting and setting attributes The class of an object Ordered and unordered factors A specific example The function tapply() and ragged arrays Ordered factors Arrays and matrices Arrays Array indexing. Subsections of an array Index matrices The array() function Mixed vector and array arithmetic. 
The recycling rule The outer product of two arrays Generalized transpose of an array Matrix facilities Matrix multiplication Linear equations and inversion Eigenvalues and eigenvectors Singular value decomposition and determinants Least squares fitting and the QR decomposition Forming partitioned matrices, cbind() and rbind() The concatenation function, c(), with arrays Frequency tables from factors Day 2 Lists and data frames Lists Constructing and modifying lists Concatenating lists Data frames Making data frames attach() and detach() Working with data frames Attaching arbitrary lists Managing the search path Data manipulation Selecting, subsetting observations and variables Filtering, grouping Recoding, transformations Aggregation, combining data sets Character manipulation, stringr package Reading data Txt files CSV files XLS, XLSX files SPSS, SAS, Stata,… and other data formats Exporting data to txt, csv and other formats Accessing data from databases using SQL language Probability distributions R as a set of statistical tables Examining the distribution of a set of data One- and two-sample tests Grouping, loops and conditional execution Grouped expressions Control statements Conditional execution: if statements Repetitive execution: for loops, repeat and while Day 3 Writing your own functions Simple examples Defining new binary operators Named arguments and defaults The '...' 
argument Assignments within functions More advanced examples Efficiency factors in block designs Dropping all names in a printed array Recursive numerical integration Scope Customizing the environment Classes, generic functions and object orientation Statistical analysis in R Linear regression models Generic functions for extracting model information Updating fitted models Generalized linear models Families The glm() function Classification Logistic Regression Linear Discriminant Analysis Unsupervised learning Principal Components Analysis Clustering Methods (k-means, hierarchical clustering, k-medoids) Survival analysis Survival objects in R Kaplan-Meier estimate Confidence bands Cox PH models, constant covariates Cox PH models, time-dependent covariates Graphical procedures High-level plotting commands The plot() function Displaying multivariate data Display graphics Arguments to high-level plotting functions Basic visualisation graphs Multivariate relations with lattice and ggplot package Using graphics parameters Graphics parameters list Automated and interactive reporting Combining output from R with text Creating html, pdf documents
nlpwithr NLP: Natural Language Processing with R 21 hours It is estimated that unstructured data accounts for more than 90 percent of all data, much of it in the form of text. Blog posts, tweets, social media, and other digital publications continuously add to this growing body of data. This course centers around extracting insights and meaning from this data. Utilizing the R Language and Natural Language Processing (NLP) libraries, we combine concepts and techniques from computer science, artificial intelligence, and computational linguistics to algorithmically understand the meaning behind text data. Data samples are available in various languages per customer requirements. By the end of this training participants will be able to prepare data sets (large and small) from disparate sources, then apply the right algorithms to analyze and report on their significance. Audience: Linguists and programmers Format of the course: Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding Introduction NLP and R vs Python Installing and configuring RStudio Installing R packages related to Natural Language Processing (NLP) An overview of R’s text manipulation capabilities Getting started with an NLP project in R Reading and importing data files into R Text manipulation with R Document clustering in R Parts of speech tagging in R Sentence parsing in R Working with regular expressions in R Named-entity recognition in R Topic modeling in R Text classification in R Working with very large data sets Visualizing your results Optimization Integrating R with other languages (Java, Python, etc.) Closing remarks
webappsr Building Web Applications in R with Shiny 7 hours Description: This is a course designed to teach R users how to create web apps without needing to learn cross-browser HTML, JavaScript, and CSS. Objective: Covers the basics of how Shiny apps work. Covers all commonly used input/output/rendering/paneling functions from the Shiny library. An overview of Shiny Installation of Shiny for local use Basic Shiny concepts Basic control accessories - Buttons, sliders, drop down menus Program structure ui.r, server.r Building first application Running your application Customizing interface Html links in Shiny JavaScript and Shiny Advanced control accessories Showing and Hiding elements of UI Dynamic user interfaces Advanced reactivity Animation Downloading and uploading data Sharing Shiny web applications An overview of Shiny extensions
BigData_ A practical introduction to Data Analysis and Big Data 35 hours Participants who complete this training will gain a practical, real-world understanding of Big Data and its related technologies, methodologies and tools. Participants will have the opportunity to put this knowledge into practice through hands-on exercises. Group interaction and instructor feedback make up an important component of the class. The course starts with an introduction to elemental concepts of Big Data, then progresses into the programming languages and methodologies used to perform Data Analysis. Finally, we discuss the tools and infrastructure that enable Big Data storage, Distributed Processing, and Scalability. Audience: Developers / programmers, IT consultants Format of the course: Part lecture, part discussion, hands-on practice and implementation, occasional quizzing to measure progress. Introduction to Data Analysis and Big Data What makes Big Data "big"? Velocity, Volume, Variety, Veracity (VVVV) Limits to traditional Data Processing Distributed Processing Statistical Analysis Types of Machine Learning Analysis Data Visualization Languages used for Data Analysis R language Why R for Data Analysis? Data manipulation, calculation and graphical display Python Why Python for Data Analysis? Manipulating, processing, cleaning, and crunching data Approaches to Data Analysis Statistical Analysis Time Series analysis Forecasting with Correlation and Regression models Inferential Statistics (estimating) Descriptive Statistics in Big Data sets (e.g. 
calculating mean) Machine Learning Supervised vs unsupervised learning Classification and clustering Estimating cost of specific methods Filtering Natural Language Processing Processing text Understanding meaning of the text Automatic text generation Sentiment/Topic Analysis Computer Vision Acquiring, processing, analyzing, and understanding images Reconstructing, interpreting and understanding 3D scenes Using image data to make decisions Big Data infrastructure Data Storage Relational databases (SQL) MySQL Postgres Oracle Non-relational databases (NoSQL) Cassandra MongoDB Neo4j Understanding the nuances Hierarchical databases Object-oriented databases Document-oriented databases Graph-oriented databases Other Distributed Processing Hadoop HDFS as a distributed filesystem MapReduce for distributed processing Spark All-in-one in-memory cluster computing framework for large-scale data processing Structured streaming Spark SQL Machine Learning libraries: MLlib Graph processing with GraphX Scalability Public cloud AWS, Google, Aliyun, etc. Private cloud OpenStack, Cloud Foundry, etc. Auto-scalability Choosing the right solution for the problem The future of Big Data Closing remarks
rintrob Introductory R for Biologists 28 hours I. Introduction and preliminaries 1. Overview Making R more friendly, R and available GUIs RStudio Related software and documentation R and statistics Using R interactively An introductory session Getting help with functions and features R commands, case sensitivity, etc. Recall and correction of previous commands Executing commands from or diverting output to a file Data permanency and removing objects Good programming practice: self-contained scripts, good readability (e.g. structured scripts, documentation, markdown) Installing packages; CRAN and Bioconductor 2. Reading data Txt files (read.delim) CSV files 3. Simple manipulations; numbers and vectors + arrays Vectors and assignment Vector arithmetic Generating regular sequences Logical vectors Missing values Character vectors Index vectors; selecting and modifying subsets of a data set Arrays Array indexing. Subsections of an array Index matrices The array() function + simple operations on arrays e.g. multiplication, transposition Other types of objects 4. Lists and data frames Lists Constructing and modifying lists Concatenating lists Data frames Making data frames Working with data frames Attaching arbitrary lists Managing the search path 5. Data manipulation Selecting, subsetting observations and variables Filtering, grouping Recoding, transformations Aggregation, combining data sets Forming partitioned matrices, cbind() and rbind() The concatenation function, c(), with arrays Character manipulation, stringr package Short intro to grep and regexpr 6. More on Reading data XLS, XLSX files readr and readxl packages SPSS, SAS, Stata,… and other data formats Exporting data to txt, csv and other formats 6. 
Grouping, loops and conditional execution Grouped expressions Control statements Conditional execution: if statements Repetitive execution: for loops, repeat and while Intro to apply, lapply, sapply, tapply 7. Functions Creating functions Optional arguments and default values Variable number of arguments Scope and its consequences 8. Simple graphics in R Creating a Graph Density Plots Dot Plots Bar Plots Line Charts Pie Charts Boxplots Scatter Plots Combining Plots II. Statistical analysis in R 1. Probability distributions R as a set of statistical tables Examining the distribution of a set of data 2. Testing of Hypotheses Tests about a Population Mean Likelihood Ratio Test One- and two-sample tests Chi-Square Goodness-of-Fit Test Kolmogorov-Smirnov One-Sample Statistic Wilcoxon Signed-Rank Test Two-Sample Test Wilcoxon Rank Sum Test Mann-Whitney Test Kolmogorov-Smirnov Test 3. Multiple Testing of Hypotheses Type I Error and FDR ROC curves and AUC Multiple Testing Procedures (BH, Bonferroni etc.) 4. Linear regression models Generic functions for extracting model information Updating fitted models Generalized linear models Families The glm() function Classification Logistic Regression Linear Discriminant Analysis Unsupervised learning Principal Components Analysis Clustering Methods (k-means, hierarchical clustering, k-medoids) 5. Survival analysis (survival package) Survival objects in R Kaplan-Meier estimate, log-rank test, parametric regression Confidence bands Censored (interval censored) data analysis Cox PH models, constant covariates Cox PH models, time-dependent covariates Simulation: Model comparison (Comparing regression models) 6. Analysis of Variance One-Way ANOVA Two-Way Classification of ANOVA MANOVA III. 
Worked problems in bioinformatics Short introduction to limma package Microarray data analysis workflow Data download from GEO: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE1397 Data processing (QC, normalisation, differential expression) Volcano plot Clustering examples + heatmaps
DatSci7 Data Science Programme 245 hours The explosion of information and data in today’s world is unparalleled, and our ability to innovate and push the boundaries of the possible is growing faster than it ever has. The Data Scientist role is among the most in demand across industry today. We offer much more than learning through theory; we deliver practical, marketable skills that bridge the gap between the world of academia and the demands of industry. This 7-week curriculum can be tailored to your specific industry requirements; please contact us for further information or visit the Nobleprog Institute website www.inobleprog.co.uk Audience: This programme is aimed at post level graduates as well as anyone with the required pre-requisite skills, which will be determined by an assessment and interview. Delivery: Delivery of the course will be a mixture of Instructor Led Classroom and Instructor Led Online; typically the 1st week will be 'classroom led', weeks 2 - 6 'virtual classroom' and week 7 back to 'classroom led'. Week 1 Big Data concepts VVVV (Velocity, Volume, Variety, Veracity) definition Limits to traditional data processing capacity Distributed Processing Statistical Analysis Machine Learning Analysis Types Data Visualization Distributed Processing (e.g. map-reduce) Introduction to the languages used R language crash course Python crash course Weeks 2&3 Performing Data Analysis Statistical Analysis Descriptive Statistics in Big Data sets (e.g. 
calculating mean) Inferential Statistics (estimating) Forecasting with Correlation and Regression models Time Series analysis Basics of Machine Learning Supervised vs unsupervised learning Classification and clustering Estimating cost of specific methods Filtering Week 4 Natural Language Processing Processing text Understanding meaning of the text Automatic text generation Sentiment/Topic Analysis Computer Vision Weeks 5&6 Tooling concept Data storage solution (SQL, NoSQL, hierarchical, object oriented, document oriented) (MySQL, Cassandra, MongoDB, Elasticsearch, HDFS, etc.) Choosing the right solution to the problem Distributed Processing Spark Machine Learning with Spark (MLlib) Spark SQL Scalability Public cloud (AWS, Google, etc.) Private cloud (OpenStack, Cloud Foundry) Autoscalability Week 7 Soft Skills Advisory & Leadership Skills Making an impact: data-driven story telling Understanding your audience Effective data presentation - getting your message across Influence effectiveness and change leadership Handling difficult situations Exam End of Programme graduation exam
rintro Introduction to R 21 hours R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has also found followers among statisticians, engineers and scientists without computer programming skills who find it easy to use. Its popularity is due to the increasing use of data mining for various goals such as setting ad prices, finding new drugs more quickly or fine-tuning financial models. R has a wide variety of packages for data mining. This course covers the manipulation of objects in R including reading data, accessing R packages, writing R functions, and making informative graphs. It includes analyzing data using common statistical models. The course teaches how to use the R software (http://www.r-project.org) both on a command line and in a graphical user interface (GUI). Introduction and preliminaries Making R more friendly, R and available GUIs The R environment Related software and documentation R and statistics Using R interactively An introductory session Getting help with functions and features R commands, case sensitivity, etc. Recall and correction of previous commands Executing commands from or diverting output to a file Data permanency and removing objects Simple manipulations; numbers and vectors Vectors and assignment Vector arithmetic Generating regular sequences Logical vectors Missing values Character vectors Index vectors; selecting and modifying subsets of a data set Other types of objects Objects, their modes and attributes Intrinsic attributes: mode and length Changing the length of an object Getting and setting attributes The class of an object Ordered and unordered factors A specific example The function tapply() and ragged arrays Ordered factors Arrays and matrices Arrays Array indexing. Subsections of an array Index matrices The array() function Mixed vector and array arithmetic. 
The recycling rule The outer product of two arrays Generalized transpose of an array Matrix facilities Matrix multiplication Linear equations and inversion Eigenvalues and eigenvectors Singular value decomposition and determinants Least squares fitting and the QR decomposition Forming partitioned matrices, cbind() and rbind() The concatenation function, (), with arrays Frequency tables from factors Lists and data frames Lists Constructing and modifying lists Concatenating lists Data frames Making data frames attach() and detach() Working with data frames Attaching arbitrary lists Managing the search path Reading data from files The read.table()function The scan() function Accessing builtin datasets Loading data from other R packages Editing data Probability distributions R as a set of statistical tables Examining the distribution of a set of data One- and two-sample tests Grouping, loops and conditional execution Grouped expressions Control statements Conditional execution: if statements Repetitive execution: for loops, repeat and while Writing your own functions Simple examples Defining new binary operators Named arguments and defaults The '...' 
argument Assignments within functions More advanced examples Efficiency factors in block designs Dropping all names in a printed array Recursive numerical integration Scope Customizing the environment Classes, generic functions and object orientation Statistical models in R Defining statistical models; formulae Contrasts Linear models Generic functions for extracting model information Analysis of variance and model comparison ANOVA tables Updating fitted models Generalized linear models Families The glm() function Nonlinear least squares and maximum likelihood models Least squares Maximum likelihood Some non-standard models Graphical procedures High-level plotting commands The plot() function Displaying multivariate data Display graphics Arguments to high-level plotting functions Low-level plotting commands Mathematical annotation Hershey vector fonts Interacting with graphics Using graphics parameters Permanent changes: The par() function Temporary changes: Arguments to graphics functions Graphics parameters list Graphical elements Axes and tick marks Figure margins Multiple figure environment Device drivers PostScript diagrams for typeset documents Multiple graphics devices Dynamic graphics Packages Standard packages Contributed packages and CRAN Namespaces
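To give a flavour of the early topics in this outline (vector arithmetic and the recycling rule, tapply() with a factor, and a function with a named default argument), here is a minimal sketch in base R. The data values and the function name standardise are made up for illustration and are not taken from the course material.

```r
# Vector arithmetic with the recycling rule: the shorter vector is reused
# until it matches the length of the longer one.
x <- c(1, 2, 3, 4, 5, 6)
y <- x * c(10, 100)

# tapply() applies a function to the groups defined by a factor.
# The grouping below is invented purely for this example.
region <- factor(c("north", "south", "north", "south", "north", "south"))
group_means <- tapply(x, region, mean)

# A user-defined function (hypothetical name) with a named argument
# whose default value is computed from another argument.
standardise <- function(v, centre = mean(v)) {
  (v - centre) / sd(v)
}
z <- standardise(x)

print(y)
print(group_means)
print(round(z, 3))
```

Typing these lines at the R prompt one at a time is a reasonable way to see how each result is built up.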
rdataana R for Data Analysis and Research 7 hours

Audience: managers, developers, scientists, students

Format of the course: on-line instruction and discussion OR face-to-face workshops

The list below gives an idea of the topics that will be covered in the workshop. The number of topics covered depends on the duration of the workshop (i.e. one, two or three days). In a one- or two-day workshop it may not be possible to cover all topics, so the workshop will be tailored to the specific needs of the learners.

A first R session
Syntax for analysing one-dimensional data arrays
Syntax for analysing two-dimensional data arrays
Reading and writing data files
Sub-setting data, sorting, ranking and ordering data
Merging arrays
Set membership
The main statistical functions in R
The Normal Distribution (correlation, probabilities, tests for normality and confidence intervals)
Ordinary Least Squares Regression
T-tests, Analysis of Variance and Multivariate Analysis of Variance
Chi-square tests for categorical variables
Writing functions in R
Writing software (scripts) in R
Control structures (e.g. loops)
Graphical methods (including scatterplots, bar charts, pie charts, histograms, box plots and dot charts)
Graphical User Interfaces for R
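As an illustration of a few of the statistical topics listed above (ordinary least squares regression, a t-test, a test for normality and confidence intervals), the following base R sketch uses the built-in mtcars dataset. The choice of dataset and variables is an assumption made for this example and is not part of the course description.

```r
# Ordinary least squares: fuel consumption (mpg) regressed on weight (wt).
fit <- lm(mpg ~ wt, data = mtcars)
print(summary(fit))

# 95% confidence intervals for the regression coefficients.
ci <- confint(fit)
print(ci)

# Two-sample t-test: mpg for automatic (am = 0) vs manual (am = 1) cars.
tt <- t.test(mpg ~ am, data = mtcars)
print(tt)

# Shapiro-Wilk test for normality of mpg.
sw <- shapiro.test(mtcars$mpg)
print(sw)
```

Each call returns an object whose components (for example tt$p.value) can be inspected or reused in a script.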
frcr Forecasting with R 14 hours

This course enables delegates to fully automate the process of forecasting with R.

Forecasting with R
  Introduction to Forecasting
  Exponential Smoothing
  ARIMA models
  The forecast package

Package 'forecast'
  accuracy, Acf, arfima, Arima, arima.errors, auto.arima, bats, BoxCox, BoxCox.lambda, croston, CV, dm.test, dshw, ets, fitted.Arima, forecast, forecast.Arima, forecast.bats, forecast.ets, forecast.HoltWinters, forecast.lm, forecast.stl, forecast.StructTS, gas, gold, logLik.ets, ma, meanf, monthdays, msts, na.interp, naive, ndiffs, nnetar, plot.bats, plot.ets, plot.forecast, rwf, seasadj, seasonaldummy, seasonplot, ses, simulate.ets, sindexf, splinef, subset.ts, taylor, tbats, thetaf, tsdisplay, tslm, wineind, woolyrnq
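A minimal sketch of what automated forecasting with some of the functions listed above can look like. It assumes the forecast package is already installed from CRAN and uses the wineind series that ships with it. The 12-month horizon is an arbitrary choice for illustration.

```r
library(forecast)  # assumes install.packages("forecast") has been run

# Let auto.arima() select an ARIMA model for the monthly wine sales series.
fit_arima <- auto.arima(wineind)

# Fit an exponential smoothing (ETS) model to the same series.
fit_ets <- ets(wineind)

# Forecast 12 months ahead with each model (horizon chosen for illustration).
fc_arima <- forecast(fit_arima, h = 12)
fc_ets   <- forecast(fit_ets, h = 12)

# In-sample accuracy measures for each fitted model.
print(accuracy(fit_arima))
print(accuracy(fit_ets))
```

Calling plot(fc_arima) or plot(fc_ets) afterwards draws the forecasts with their prediction intervals.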



