Big Data Training in Bradford

Big Data is a term that refers to solutions designed for storing and processing large data sets. Initially developed by Google, these Big Data solutions have evolved and inspired similar projects, many of which are available as open source. Examples include Apache Hadoop, Cassandra and Cloudera Impala. According to Gartner’s reports, Big Data is the next big step in IT after Cloud Computing and will be a leading trend over the next several years.

NobleProg onsite live Big Data training courses start with an introduction to the elemental concepts of Big Data, then progress into the programming languages and methodologies used to perform Data Analysis. Tools and infrastructure for enabling Big Data storage, Distributed Processing, and Scalability are discussed, compared and implemented in demo practice sessions.

Big Data training is available in various formats, including onsite live training and live instructor-led training using an interactive, remote desktop setup. Local Big Data training can be carried out live on customer premises or in NobleProg local training centers.

Bradford - Carlisle Business Centre

Carlisle Business Centre
60 Carlisle Road
Bradford, WYK BD8 8BD
United Kingdom
West Yorkshire GB
Carlisle Business Centre is a not-for-private-profit organisation, driven by a dedicated workforce and supported by a voluntary board of directors.

Big Data Course Events - Bradford

Code Name Venue Duration Course Date Course Price [Remote / Classroom]
iotemi IoT (Internet of Things) for Entrepreneurs, Managers and Investors Bradford - Carlisle Business Centre 21 hours Wed, 2018-03-07 09:30 £3900 / £4650
sparkpython Python and Spark for Big Data (PySpark) Bradford - Carlisle Business Centre 21 hours Wed, 2018-03-07 09:30 £3300 / £4050
nifidev Apache NiFi for Developers Bradford - Carlisle Business Centre 7 hours Thu, 2018-03-08 09:30 £1100 / £1350
scylladb Scylla database Bradford - Carlisle Business Centre 21 hours Mon, 2018-03-12 09:30 £3300 / £4050
kdd Knowledge Discovery in Databases (KDD) Bradford - Carlisle Business Centre 21 hours Mon, 2018-03-12 09:30 £3300 / £4050
matlabfundamentalsfinance MATLAB Fundamentals + MATLAB for Finance Bradford - Carlisle Business Centre 35 hours Mon, 2018-03-12 09:30 £5500 / £6750
apacheh Administrator Training for Apache Hadoop Bradford - Carlisle Business Centre 35 hours Mon, 2018-03-12 09:30 £5500 / £6750
BigData_ A practical introduction to Data Analysis and Big Data Bradford - Carlisle Business Centre 35 hours Mon, 2018-03-12 09:30 £5500 / £6750
bdbiga Big Data Business Intelligence for Govt. Agencies Bradford - Carlisle Business Centre 35 hours Mon, 2018-03-12 09:30 £5500 / £6750
apachedrill Apache Drill for On-the-Fly Analysis of Multiple Big Data Formats Bradford - Carlisle Business Centre 21 hours Tue, 2018-03-13 09:30 £3300 / £4050
datavault Data Vault: Building a Scalable Data Warehouse Bradford - Carlisle Business Centre 28 hours Tue, 2018-03-13 09:30 £4400 / £5400
pythonmultipurpose Advanced Python Bradford - Carlisle Business Centre 28 hours Tue, 2018-03-13 09:30 £4400 / £5400
hadoopdev Hadoop for Developers (4 days) Bradford - Carlisle Business Centre 28 hours Tue, 2018-03-13 09:30 £4400 / £5400
ApHadm1 Apache Hadoop: Manipulation and Transformation of Data Performance Bradford - Carlisle Business Centre 21 hours Tue, 2018-03-13 09:30 £3300 / £4050
alluxio Alluxio: Unifying disparate storage systems Bradford - Carlisle Business Centre 7 hours Tue, 2018-03-13 09:30 £1100 / £1350
hadoopadm1 Hadoop For Administrators Bradford - Carlisle Business Centre 21 hours Wed, 2018-03-14 09:30 £3300 / £4050
datamin Data Mining Bradford - Carlisle Business Centre 21 hours Wed, 2018-03-14 09:30 £3900 / £4650
samza Samza for stream processing Bradford - Carlisle Business Centre 14 hours Wed, 2018-03-14 09:30 £2200 / £2700
rprogda R Programming for Data Analysis Bradford - Carlisle Business Centre 14 hours Mon, 2018-03-19 09:30 £2200 / £2700
rneuralnet Neural Network in R Bradford - Carlisle Business Centre 14 hours Mon, 2018-03-19 09:30 £2600 / £3100
ApacheIgnite Apache Ignite: Improve speed, scale and availability with in-memory computing Bradford - Carlisle Business Centre 14 hours Tue, 2018-03-20 09:30 £2200 / £2700
68736 Hadoop for Developers (2 days) Bradford - Carlisle Business Centre 14 hours Tue, 2018-03-20 09:30 £2200 / £2700
hadoopadm Hadoop Administration Bradford - Carlisle Business Centre 21 hours Tue, 2018-03-20 09:30 £3300 / £4050
neo4j Beyond the relational database: neo4j Bradford - Carlisle Business Centre 21 hours Tue, 2018-03-20 09:30 £3300 / £4050
vespa Vespa: Serving large-scale data in real-time Bradford - Carlisle Business Centre 14 hours Wed, 2018-03-21 09:30 £2200 / £2700
bigddbsysfun Big Data & Database Systems Fundamentals Bradford - Carlisle Business Centre 14 hours Wed, 2018-03-21 09:30 £2200 / £2700
druid Druid: Build a fast, real-time data analysis system Bradford - Carlisle Business Centre 21 hours Wed, 2018-03-21 09:30 £3300 / £4050
hypertable Hypertable: Deploy a BigTable like database Bradford - Carlisle Business Centre 14 hours Wed, 2018-03-21 09:30 £2200 / £2700
hadoopforprojectmgrs Hadoop for Project Managers Bradford - Carlisle Business Centre 14 hours Wed, 2018-03-21 09:30 £2200 / £2700
deckgl deck.gl: Visualizing Large-scale Geospatial Data Bradford - Carlisle Business Centre 14 hours Thu, 2018-03-22 09:30 £2200 / £2700
psr Introduction to Recommendation Systems Bradford - Carlisle Business Centre 7 hours Thu, 2018-03-22 09:30 £1100 / £1350
osovv OpenStack Overview Bradford - Carlisle Business Centre 7 hours Mon, 2018-03-26 09:30 £1100 / £1350
smtwebint Semantic Web Overview Bradford - Carlisle Business Centre 7 hours Mon, 2018-03-26 09:30 £1100 / £1350
68780 Apache Spark Bradford - Carlisle Business Centre 14 hours Mon, 2018-03-26 09:30 £2200 / £2700
dmmlr Data Mining & Machine Learning with R Bradford - Carlisle Business Centre 14 hours Mon, 2018-03-26 09:30 £2600 / £3100
matlabpredanalytics Matlab for Predictive Analytics Bradford - Carlisle Business Centre 21 hours Mon, 2018-03-26 09:30 £3300 / £4050
bigdarch Big Data Architect Bradford - Carlisle Business Centre 35 hours Mon, 2018-03-26 09:30 £5500 / £6750
kylin Apache Kylin: From classic OLAP to real-time data warehouse Bradford - Carlisle Business Centre 14 hours Mon, 2018-03-26 09:30 £2200 / £2700
flink Flink for scalable stream and batch data processing Bradford - Carlisle Business Centre 28 hours Mon, 2018-03-26 09:30 £4400 / £5400
magellan Magellan: Geospatial Analytics with Spark Bradford - Carlisle Business Centre 14 hours Tue, 2018-03-27 09:30 £2200 / £2700
glusterfs GlusterFS for System Administrators Bradford - Carlisle Business Centre 21 hours Mon, 2018-04-02 09:30 £3300 / £4050
bigdatar Programming with Big Data in R Bradford - Carlisle Business Centre 21 hours Wed, 2018-04-04 09:30 £3300 / £4050
hdp Hortonworks Data Platform (HDP) for administrators Bradford - Carlisle Business Centre 21 hours Wed, 2018-04-04 09:30 £3300 / £4050
bigdatastore Big Data Storage Solution - NoSQL Bradford - Carlisle Business Centre 14 hours Thu, 2018-04-05 09:30 £2200 / £2700
storm Apache Storm Bradford - Carlisle Business Centre 28 hours Mon, 2018-04-09 09:30 £4400 / £5400
matlab2 MATLAB Fundamentals Bradford - Carlisle Business Centre 21 hours Mon, 2018-04-09 09:30 £3300 / £4050
solrdev Solr for Developers Bradford - Carlisle Business Centre 21 hours Mon, 2018-04-09 09:30 £3300 / £4050
d2dbdpa From Data to Decision with Big Data and Predictive Analytics Bradford - Carlisle Business Centre 21 hours Mon, 2018-04-09 09:30 £3900 / £4650
mdlmrah Model MapReduce and Apache Hadoop Bradford - Carlisle Business Centre 14 hours Mon, 2018-04-09 09:30 £2200 / £2700
accumulo Apache Accumulo: Building highly scalable big data applications Bradford - Carlisle Business Centre 21 hours Mon, 2018-04-09 09:30 £3300 / £4050
TalendDI Talend Open Studio for Data Integration Bradford - Carlisle Business Centre 28 hours Mon, 2018-04-16 09:30 N/A / £5400
apex Apache Apex: Processing big data-in-motion Bradford - Carlisle Business Centre 21 hours Mon, 2018-04-16 09:30 £3300 / £4050
graphcomputing Introduction to Graph Computing Bradford - Carlisle Business Centre 28 hours Mon, 2018-04-16 09:30 £4400 / £5400
kdbplusandq kdb+ and q: Analyze time series data Bradford - Carlisle Business Centre 21 hours Tue, 2018-04-17 09:30 £3300 / £4050
aifortelecom AI Awareness for Telecom Bradford - Carlisle Business Centre 14 hours Tue, 2018-04-17 09:30 £2200 / £2700
PentahoDI Pentaho Data Integration Fundamentals Bradford - Carlisle Business Centre 21 hours Wed, 2018-04-18 09:30 £3300 / £4050
hadoopba Hadoop for Business Analysts Bradford - Carlisle Business Centre 21 hours Wed, 2018-04-18 09:30 £3300 / £4050
apachemdev Apache Mahout for Developers Bradford - Carlisle Business Centre 14 hours Wed, 2018-04-18 09:30 £2600 / £3100
IntroToAvro Apache Avro: Data serialization for distributed applications Bradford - Carlisle Business Centre 14 hours Thu, 2018-04-19 09:30 £2200 / £2700
flockdb Flockdb: A Simple Graph Database for Social Media Bradford - Carlisle Business Centre 7 hours Fri, 2018-04-20 09:30 £1100 / £1350
hadoopmapr Hadoop Administration on MapR Bradford - Carlisle Business Centre 28 hours Mon, 2018-04-23 09:30 £4400 / £5400
sparkdev Spark for Developers Bradford - Carlisle Business Centre 21 hours Mon, 2018-04-23 09:30 £3300 / £4050
datashrinkgov Data Shrinkage for Government Bradford - Carlisle Business Centre 14 hours Tue, 2018-04-24 09:30 £2200 / £2700
voldemort Voldemort: Setting up a key-value distributed data store Bradford - Carlisle Business Centre 14 hours Tue, 2018-04-24 09:30 £2200 / £2700
zeppelin Zeppelin for interactive data analytics Bradford - Carlisle Business Centre 14 hours Thu, 2018-04-26 09:30 £2200 / £2700
datameer Datameer for Data Analysts Bradford - Carlisle Business Centre 14 hours Thu, 2018-04-26 09:30 £2200 / £2700
bigdatabicriminal Big Data Business Intelligence for Criminal Intelligence Analysis Bradford - Carlisle Business Centre 35 hours Mon, 2018-04-30 09:30 £5500 / £6750
iotemi IoT (Internet of Things) for Entrepreneurs, Managers and Investors Bradford - Carlisle Business Centre 21 hours Mon, 2018-04-30 09:30 £3900 / £4650
dataar Data Analytics With R Bradford - Carlisle Business Centre 21 hours Mon, 2018-04-30 09:30 £3300 / £4050
bdbitcsp Big Data Business Intelligence for Telecom and Communication Service Providers Bradford - Carlisle Business Centre 35 hours Mon, 2018-04-30 09:30 £5500 / £6750
nifi Apache NiFi for Administrators Bradford - Carlisle Business Centre 21 hours Tue, 2018-05-01 09:30 £3300 / £4050
alluxio Alluxio: Unifying disparate storage systems Bradford - Carlisle Business Centre 7 hours Tue, 2018-05-01 09:30 £1100 / £1350
rintrob Introductory R for Biologists Bradford - Carlisle Business Centre 28 hours Tue, 2018-05-01 09:30 £4400 / £5400
hadoopdeva Advanced Hadoop for Developers Bradford - Carlisle Business Centre 21 hours Tue, 2018-05-01 09:30 £3300 / £4050
scylladb Scylla database Bradford - Carlisle Business Centre 21 hours Wed, 2018-05-02 09:30 £3300 / £4050
samza Samza for stream processing Bradford - Carlisle Business Centre 14 hours Thu, 2018-05-03 09:30 £2200 / £2700
apacheh Administrator Training for Apache Hadoop Bradford - Carlisle Business Centre 35 hours Mon, 2018-05-07 09:30 £5500 / £6750
matlabfundamentalsfinance MATLAB Fundamentals + MATLAB for Finance Bradford - Carlisle Business Centre 35 hours Mon, 2018-05-07 09:30 £5500 / £6750
datavault Data Vault: Building a Scalable Data Warehouse Bradford - Carlisle Business Centre 28 hours Mon, 2018-05-07 09:30 £4400 / £5400
datamin Data Mining Bradford - Carlisle Business Centre 21 hours Mon, 2018-05-07 09:30 £3900 / £4650
bdbiga Big Data Business Intelligence for Govt. Agencies Bradford - Carlisle Business Centre 35 hours Mon, 2018-05-07 09:30 £5500 / £6750
kdd Knowledge Discovery in Databases (KDD) Bradford - Carlisle Business Centre 21 hours Tue, 2018-05-08 09:30 £3300 / £4050
ApHadm1 Apache Hadoop: Manipulation and Transformation of Data Performance Bradford - Carlisle Business Centre 21 hours Tue, 2018-05-08 09:30 £3300 / £4050
hadoopdev Hadoop for Developers (4 days) Bradford - Carlisle Business Centre 28 hours Tue, 2018-05-08 09:30 £4400 / £5400
rneuralnet Neural Network in R Bradford - Carlisle Business Centre 14 hours Tue, 2018-05-08 09:30 £2600 / £3100
pythonmultipurpose Advanced Python Bradford - Carlisle Business Centre 28 hours Tue, 2018-05-08 09:30 £4400 / £5400
BigData_ A practical introduction to Data Analysis and Big Data Bradford - Carlisle Business Centre 35 hours Tue, 2018-05-08 09:30 £5500 / £6500
sparkpython Python and Spark for Big Data (PySpark) Bradford - Carlisle Business Centre 21 hours Tue, 2018-05-08 09:30 £3300 / £4050
ApacheIgnite Apache Ignite: Improve speed, scale and availability with in-memory computing Bradford - Carlisle Business Centre 14 hours Wed, 2018-05-09 09:30 £2200 / £2700
hadoopadm1 Hadoop For Administrators Bradford - Carlisle Business Centre 21 hours Wed, 2018-05-09 09:30 £3300 / £4050
68736 Hadoop for Developers (2 days) Bradford - Carlisle Business Centre 14 hours Wed, 2018-05-09 09:30 £2200 / £2700
bigddbsysfun Big Data & Database Systems Fundamentals Bradford - Carlisle Business Centre 14 hours Thu, 2018-05-10 09:30 £2200 / £2700
rprogda R Programming for Data Analysis Bradford - Carlisle Business Centre 14 hours Thu, 2018-05-10 09:30 £2200 / £2700
vespa Vespa: Serving large-scale data in real-time Bradford - Carlisle Business Centre 14 hours Thu, 2018-05-10 09:30 £2200 / £2700
hadoopforprojectmgrs Hadoop for Project Managers Bradford - Carlisle Business Centre 14 hours Thu, 2018-05-10 09:30 £2200 / £2700
psr Introduction to Recommendation Systems Bradford - Carlisle Business Centre 7 hours Fri, 2018-05-11 09:30 £1100 / £1350
hypertable Hypertable: Deploy a BigTable like database Bradford - Carlisle Business Centre 14 hours Mon, 2018-05-14 09:30 £2200 / £2700
hadoopadm Hadoop Administration Bradford - Carlisle Business Centre 21 hours Mon, 2018-05-14 09:30 £3300 / £4050
neo4j Beyond the relational database: neo4j Bradford - Carlisle Business Centre 21 hours Mon, 2018-05-14 09:30 £3300 / £4050
druid Druid: Build a fast, real-time data analysis system Bradford - Carlisle Business Centre 21 hours Mon, 2018-05-14 09:30 £3300 / £4050

Course Outlines

Code Name Duration Outline
bigdatar Programming with Big Data in R 21 hours
sparkdev Spark for Developers 21 hours

OBJECTIVE:

This course will introduce Apache Spark. Students will learn how Spark fits into the Big Data ecosystem, and how to use Spark for data analysis. The course covers the Spark shell for interactive data analysis, Spark internals, Spark APIs, Spark SQL, Spark Streaming, machine learning and GraphX.

AUDIENCE :

Developers / Data Analysts
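
As a preview of the transformations the course practices, the sketch below mimics Spark's `flatMap` and `reduceByKey` in plain Python. The function names are local stand-ins for illustration, not Spark's API, and no cluster is needed:

```python
from collections import defaultdict
from functools import reduce

def flat_map(func, data):
    """Local stand-in for Spark's RDD.flatMap: apply func, then flatten."""
    return [item for element in data for item in func(element)]

def reduce_by_key(func, pairs):
    """Local stand-in for RDD.reduceByKey: merge all values sharing a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}

lines = ["spark makes big data simple", "big data big results"]
words = flat_map(str.split, lines)              # tokenize every line
counts = reduce_by_key(lambda a, b: a + b,      # classic word count
                       [(word, 1) for word in words])
print(counts["big"])  # 3
```

In real Spark the same pipeline runs distributed over partitions; the data flow is identical.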

voldemort Voldemort: Setting up a key-value distributed data store 14 hours

Voldemort is an open-source distributed data store that is designed as a key-value store.  It is used at LinkedIn by numerous critical services powering a large portion of the site.

This course will introduce the architecture and capabilities of Voldemort and walk participants through the setup and application of a key-value distributed data store.

Audience
    Software developers
    System administrators
    DevOps engineers

Format of the course
    Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding
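
As a taste of the key-value model the course sets up, this toy Python sketch routes each key to one of several "nodes" by hashing, the way Voldemort partitions data across servers. The class and node layout are illustrative, not Voldemort's actual API:

```python
import hashlib

class PartitionedStore:
    """Toy sketch of a Voldemort-style key-value store: each key is
    hashed to one of N nodes, each node being a plain dict here."""
    def __init__(self, num_nodes=3):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        digest = hashlib.md5(key.encode()).hexdigest()
        return self.nodes[int(digest, 16) % len(self.nodes)]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

store = PartitionedStore()
store.put("user:42", {"name": "Ada"})
print(store.get("user:42"))  # {'name': 'Ada'}
```

Because the hash is deterministic, reads always land on the node that holds the key; real Voldemort adds replication and failure handling on top of this idea.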

flink Flink for scalable stream and batch data processing 28 hours

Apache Flink is an open-source framework for scalable stream and batch data processing.

This instructor-led, live training introduces the principles and approaches behind distributed stream and batch data processing, and walks participants through the creation of a real-time, data streaming application.

By the end of this training, participants will be able to:

  • Set up an environment for developing data analysis applications
  • Package, execute, and monitor Flink-based, fault-tolerant, data streaming applications
  • Manage diverse workloads
  • Perform advanced analytics using Flink ML
  • Set up a multi-node Flink cluster
  • Measure and optimize performance
  • Integrate Flink with different Big Data systems
  • Compare Flink capabilities with those of other big data processing frameworks

Audience

  • Developers
  • Architects
  • Data engineers
  • Analytics professionals
  • Technical managers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
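
The windowed stream aggregations the course builds can be previewed conceptually in plain Python. The function below assigns events to fixed, non-overlapping (tumbling) windows, the shape of a Flink keyBy/window/sum job; the names and data are illustrative, not Flink's API:

```python
from collections import Counter

def tumbling_window_counts(events, window_size):
    """Assign (timestamp, key) events to fixed non-overlapping windows
    and count keys per window -- the shape of a keyBy/window/sum job."""
    windows = {}
    for timestamp, key in events:
        window_start = (timestamp // window_size) * window_size
        windows.setdefault(window_start, Counter())[key] += 1
    return windows

events = [(1, "click"), (3, "view"), (6, "click"), (7, "click")]
per_window = tumbling_window_counts(events, 5)
print(per_window[5]["click"])  # 2
```

Real Flink computes the same result incrementally over an unbounded stream, with fault tolerance; the batch version above shows only the windowing logic.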
sparkpython Python and Spark for Big Data (PySpark) 21 hours

Python is a high-level programming language famous for its clear syntax and code readability. Spark is a data processing engine used in querying, analyzing, and transforming big data. PySpark allows users to interface Spark with Python.

In this instructor-led, live training, participants will learn how to use Python and Spark together to analyze big data as they work on hands-on exercises.

By the end of this training, participants will be able to:

  • Learn how to use Spark with Python to analyze Big Data
  • Work on exercises that mimic real world circumstances
  • Use different tools and techniques for big data analysis using PySpark

Audience

  • Developers
  • IT Professionals
  • Data Scientists

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
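
PySpark pipelines boil down to transformations like the one sketched here in plain Python. `group_avg` is a hypothetical local stand-in for a `groupBy(...).avg(...)` step, shown so the data flow can be tried without a Spark cluster; it is not the PySpark API itself:

```python
from collections import defaultdict

def group_avg(rows, key_col, value_col):
    """Local stand-in for a groupBy(key).avg(value) aggregation."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in rows:
        sums[row[key_col]] += row[value_col]
        counts[row[key_col]] += 1
    return {key: sums[key] / counts[key] for key in sums}

rows = [
    {"city": "Bradford", "temp": 8.0},
    {"city": "Bradford", "temp": 10.0},
    {"city": "Leeds", "temp": 9.0},
]
print(group_avg(rows, "city", "temp"))  # {'Bradford': 9.0, 'Leeds': 9.0}
```

In PySpark the rows would be a distributed DataFrame and the aggregation would run across the cluster, but the logical operation is the same.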
psr Introduction to Recommendation Systems 7 hours

Audience

Marketing department employees, IT strategists and other people involved in decisions related to the design and implementation of recommender systems.

Format

Short theoretical background followed by analysis of working examples and short, simple exercises.
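
A minimal example of one core idea behind many recommender systems, user-to-user cosine similarity, in plain Python. The data and function names are made up for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two rating dicts (item -> rating)."""
    shared = set(a) & set(b)
    dot = sum(a[i] * b[i] for i in shared)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def recommend(target, others):
    """Suggest items the most similar user rated that the target has not."""
    best = max(others, key=lambda user: cosine(target, others[user]))
    return sorted(set(others[best]) - set(target))

ratings = {
    "alice": {"film_a": 5, "film_b": 3},
    "bob":   {"film_a": 5, "film_b": 3, "film_c": 4},
    "carol": {"film_d": 5},
}
others = {u: r for u, r in ratings.items() if u != "alice"}
print(recommend(ratings["alice"], others))  # ['film_c']
```

Production recommenders add matrix factorization, implicit feedback and scale, but the similarity-then-suggest loop above is the starting point.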

hbasedev HBase for Developers 21 hours

This course introduces HBase – a NoSQL store on top of Hadoop. The course is intended for developers who will be using HBase to develop applications, and administrators who will manage HBase clusters.

We will walk developers through HBase architecture, data modelling, and application development on HBase. The course also discusses using MapReduce with HBase, and some administration topics related to performance optimization. The course is very hands-on, with lots of lab exercises.


Duration : 3 days

Audience : Developers  & Administrators

BigData_ A practical introduction to Data Analysis and Big Data 35 hours

Participants who complete this training will gain a practical, real-world understanding of Big Data and its related technologies, methodologies and tools.

Participants will have the opportunity to put this knowledge into practice through hands-on exercises. Group interaction and instructor feedback make up an important component of the class.

The course starts with an introduction to elemental concepts of Big Data, then progresses into the programming languages and methodologies used to perform Data Analysis. Finally, we discuss the tools and infrastructure that enable Big Data storage, Distributed Processing, and Scalability.

Audience

  • Developers / programmers
  • IT consultants

Format of the course

  • Part lecture, part discussion, hands-on practice and implementation, occasional quizzing to measure progress.
alluxio Alluxio: Unifying disparate storage systems 7 hours

Alluxio is an open-source virtual distributed storage system that unifies disparate storage systems and enables applications to interact with data at memory speed. It is used by companies such as Intel, Baidu and Alibaba.

In this instructor-led, live training, participants will learn how to use Alluxio to bridge different computation frameworks with storage systems and efficiently manage multi-petabyte scale data as they step through the creation of an application with Alluxio.

By the end of this training, participants will be able to:

  • Develop an application with Alluxio
  • Connect big data systems and applications while preserving one namespace
  • Efficiently extract value from big data in any storage format
  • Improve workload performance
  • Deploy and manage Alluxio standalone or clustered

Audience

  • Data scientist
  • Developer
  • System administrator

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
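
The unified-namespace idea at Alluxio's core can be sketched in a few lines of Python. The dicts below stand in for real under-stores such as HDFS or S3, and the class is a conceptual illustration, not Alluxio's API:

```python
class UnifiedNamespace:
    """Toy sketch of a single logical namespace whose path prefixes are
    mounted onto different backing stores (plain dicts here, standing in
    for HDFS, S3, and so on)."""
    def __init__(self):
        self.mounts = {}

    def mount(self, prefix, store):
        self.mounts[prefix] = store

    def read(self, path):
        for prefix, store in self.mounts.items():
            if path.startswith(prefix):
                return store[path[len(prefix):]]
        raise FileNotFoundError(path)

ns = UnifiedNamespace()
ns.mount("/hdfs/", {"logs.txt": b"hdfs data"})
ns.mount("/s3/", {"model.bin": b"s3 data"})
print(ns.read("/s3/model.bin"))  # b's3 data'
```

Applications address one path tree while data lives in different systems; Alluxio additionally caches hot data in memory, which the sketch omits.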
apachedrill Apache Drill for On-the-Fly Analysis of Multiple Big Data Formats 21 hours

Apache Drill is a schema-free, distributed, in-memory columnar SQL query engine for Hadoop, NoSQL and other cloud and file storage systems. Apache Drill's power lies in its ability to join data from multiple data stores using a single query. Apache Drill supports numerous NoSQL databases and file systems, including HBase, MongoDB, MapR-DB, HDFS, MapR-FS, Amazon S3, Azure Blob Storage, Google Cloud Storage, Swift, NAS and local files.

In this instructor-led, live training, participants will learn the fundamentals of Apache Drill, then leverage the power and convenience of SQL to interactively query big data without writing code. Participants will also learn how to optimize their Drill queries for distributed SQL execution.

By the end of this training, participants will be able to:

  • Perform "self-service" exploration on structured and semi-structured data on Hadoop
  • Query known as well as unknown data using SQL queries
  • Understand how Apache Drill receives and executes queries
  • Write SQL queries to analyze different types of data, including structured data in Hive, semi-structured data in HBase or MapR-DB tables, and data saved in files such as Parquet and JSON.
  • Use Apache Drill to perform on-the-fly schema discovery, bypassing the need for complex ETL and schema operations
  • Integrate Apache Drill with BI (Business Intelligence) tools such as Tableau, Qlikview, MicroStrategy and Excel

Audience

  • Data analysts
  • Data scientists
  • SQL programmers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
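
Drill's distinguishing feature is joining heterogeneous stores with one SQL statement. As a runnable local stand-in, sqlite3 is used here purely to show the single-query join shape; in Drill the two "tables" could be a Hive table and an HBase table, queried without any loading step:

```python
import sqlite3

# sqlite3 stands in locally so the join shape can be tried without a
# cluster; table names hint at the stores Drill would query directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hive_orders (id INTEGER, customer TEXT)")
conn.execute("CREATE TABLE hbase_customers (name TEXT, region TEXT)")
conn.execute("INSERT INTO hive_orders VALUES (1, 'acme')")
conn.execute("INSERT INTO hbase_customers VALUES ('acme', 'Yorkshire')")
row = conn.execute(
    """SELECT o.id, c.region
       FROM hive_orders o
       JOIN hbase_customers c ON o.customer = c.name"""
).fetchone()
print(row)  # (1, 'Yorkshire')
```
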
hadoopmapr Hadoop Administration on MapR 28 hours

Audience:

This course is intended to demystify Big Data/Hadoop technology and to show that it is not difficult to understand.

bigddbsysfun Big Data & Database Systems Fundamentals 14 hours

The course is part of the Data Scientist skill set (Domain: Data and Technology).

DatSci7 Data Science Programme 245 hours

The explosion of information and data in today’s world is unparalleled; our ability to innovate and push the boundaries of the possible is growing faster than it ever has. The role of Data Scientist is one of the most in-demand across industry today.

We offer much more than learning through theory; we deliver practical, marketable skills that bridge the gap between the world of academia and the demands of industry.

This 7-week curriculum can be tailored to your specific industry requirements; please contact us for further information or visit the NobleProg Institute website www.inobleprog.co.uk

Audience:

This programme is aimed at postgraduates, as well as anyone with the required prerequisite skills, which will be determined by an assessment and interview.

Delivery:

Delivery of the course will be a mixture of instructor-led classroom and instructor-led online; typically the first week will be classroom-led, weeks 2–6 virtual classroom, and week 7 back to classroom-led.
apex Apache Apex: Processing big data-in-motion 21 hours

Apache Apex is a YARN-native platform that unifies stream and batch processing. It processes big data-in-motion in a way that is scalable, performant, fault-tolerant, stateful, secure, distributed, and easily operable.

This instructor-led, live training introduces Apache Apex's unified stream processing architecture and walks participants through the creation of a distributed application using Apex on Hadoop.

By the end of this training, participants will be able to:

  • Understand data processing pipeline concepts such as connectors for sources and sinks, common data transformations, etc.
  • Build, scale and optimize an Apex application
  • Process real-time data streams reliably and with minimum latency
  • Use Apex Core and the Apex Malhar library to enable rapid application development
  • Use the Apex API to write and re-use existing Java code
  • Integrate Apex into other applications as a processing engine
  • Tune, test and scale Apex applications

Audience

  • Developers
  • Enterprise architects

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
aifortelecom AI Awareness for Telecom 14 hours

AI is a collection of technologies for building intelligent systems capable of understanding data and the activities surrounding the data to make "intelligent decisions". For Telecom providers, building applications and services that make use of AI could open the door for improved operations and servicing in areas such as maintenance and network optimization.

In this course we examine the various technologies that make up AI and the skill sets required to put them to use. Throughout the course, we examine AI's specific applications within the Telecom industry.

Audience

  • Network engineers
  • Network operations personnel
  • Telecom technical managers

Format of the course

  •     Part lecture, part discussion, hands-on exercises
smtwebint Semantic Web Overview 7 hours

The Semantic Web is a collaborative movement led by the World Wide Web Consortium (W3C) that promotes common formats for data on the World Wide Web. The Semantic Web provides a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.

bigdarch Big Data Architect 35 hours

Day 1 - provides a high-level overview of essential Big Data topic areas. The module is divided into a series of sections, each of which is accompanied by a hands-on exercise.

Day 2 - explores a range of topics that relate analysis practices and tools for Big Data environments. It does not get into implementation or programming details, but instead keeps coverage at a conceptual level, focusing on topics that enable participants to develop a comprehensive understanding of the common analysis functions and features offered by Big Data solutions.

Day 3 - provides an overview of the fundamental and essential topic areas relating to Big Data solution platform architecture. It covers Big Data mechanisms required for the development of a Big Data solution platform and architectural options for assembling a data processing platform. Common scenarios are also presented to provide a basic understanding of how a Big Data solution platform is generally used. 

Day 4 - builds upon Day 3 by exploring advanced topics relating to Big Data solution platform architecture. In particular, different architectural layers that make up the Big Data solution platform are introduced and discussed, including data sources, data ingress, data storage, data processing and security.

Day 5 - covers a number of exercises and problems designed to test the delegates' ability to apply knowledge of the topics covered in Days 3 and 4.

rprogda R Programming for Data Analysis 14 hours

This course is part of the Data Scientist skill set (Domain: Data and Technology)

ApHadm1 Apache Hadoop: Manipulation and Transformation of Data Performance 21 hours


This course is intended for developers, architects, data scientists or any profile that requires access to data either intensively or on a regular basis.

The major focus of the course is data manipulation and transformation.

Among the tools in the Hadoop ecosystem this course includes the use of Pig and Hive both of which are heavily used for data transformation and manipulation.

This training also addresses performance metrics and performance optimisation.

The course is entirely hands on and is punctuated by presentations of the theoretical aspects.

vespa Vespa: Serving large-scale data in real-time 14 hours

Vespa is an open-source big data processing and serving engine created by Yahoo. It is used to respond to user queries, make recommendations, and provide personalized content and advertisements in real time.

This instructor-led, live training introduces the challenges of serving large-scale data and walks participants through the creation of an application that can compute responses to user requests, over large datasets in real-time.

By the end of this training, participants will be able to:

  • Use Vespa to quickly compute data (store, search, rank, organize) at serving time while a user waits
  • Implement Vespa into existing applications involving feature search, recommendations, and personalization
  • Integrate and deploy Vespa with existing big data systems such as Hadoop and Storm.

Audience

  • Developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
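
The "compute responses while a user waits" idea amounts to scoring and ranking documents at query time. This toy Python sketch shows the shape of such a ranking step; the scoring function and documents are made up for illustration and are far simpler than Vespa's ranking expressions:

```python
def rank(documents, query_terms, top_k=2):
    """Toy query-time ranking: score each document by how many query
    terms appear in its text, then return the best top_k titles."""
    def score(doc):
        words = doc["text"].lower().split()
        return sum(words.count(term) for term in query_terms)
    ordered = sorted(documents, key=score, reverse=True)
    return [doc["title"] for doc in ordered[:top_k]]

docs = [
    {"title": "A", "text": "big data serving engine"},
    {"title": "B", "text": "cat pictures"},
    {"title": "C", "text": "serving data fast data"},
]
print(rank(docs, ["data", "serving"]))  # ['C', 'A']
```

Vespa evaluates far richer ranking expressions over distributed indexes, but the score-then-sort-at-serving-time flow is the same.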
mdlmrah Model MapReduce and Apache Hadoop 14 hours

The course is intended for IT specialists who work with distributed processing of large data sets across clusters of computers.
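
The MapReduce model itself can be sketched end to end in plain Python, using the classic maximum-temperature-per-year example. The record format and function names here are made up for illustration; Hadoop runs the same three phases distributed across a cluster:

```python
from collections import defaultdict

def map_phase(record):
    """Map: emit a (year, temperature) pair from a raw 'year,temp' line."""
    year, temp = record.split(",")
    return year, int(temp)

def reduce_phase(year, temps):
    """Reduce: keep the maximum temperature observed for the year."""
    return year, max(temps)

records = ["1949,111", "1949,78", "1950,22", "1950,0"]
shuffled = defaultdict(list)              # the shuffle groups values by key
for year, temp in map(map_phase, records):
    shuffled[year].append(temp)
results = dict(reduce_phase(y, ts) for y, ts in shuffled.items())
print(results)  # {'1949': 111, '1950': 22}
```
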

datashrinkgov Data Shrinkage for Government 14 hours
dmmlr Data Mining & Machine Learning with R 14 hours

R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has a wide variety of packages for data mining.

hadoopforprojectmgrs Hadoop for Project Managers 14 hours

As more and more software and IT projects migrate from local processing and data management to distributed processing and big data storage, Project Managers are finding the need to upgrade their knowledge and skills to grasp the concepts and practices relevant to Big Data projects and opportunities.

This course introduces Project Managers to the most popular Big Data processing framework: Hadoop.  

In this instructor-led training, participants will learn the core components of the Hadoop ecosystem and how these technologies can be used to solve large-scale problems. In learning these foundations, participants will also improve their ability to communicate with the developers and implementers of these systems as well as the data scientists and analysts that many IT projects involve.

Audience

  • Project Managers wishing to implement Hadoop into their existing development or IT infrastructure
  • Project Managers needing to communicate with cross-functional teams that include big data engineers, data scientists and business analysts

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
ApacheIgnite Apache Ignite: Improve speed, scale and availability with in-memory computing 14 hours

Apache Ignite is an in-memory computing platform that sits between the application and data layer to improve speed, scale and availability.

In this instructor-led, live training, participants will learn the principles behind persistent and pure in-memory storage as they step through the creation of a sample in-memory computing project.

By the end of this training, participants will be able to:

  • Use Ignite for in-memory, on-disk persistence as well as a purely distributed in-memory database
  • Achieve persistence without syncing data back to a relational database
  • Use Ignite to carry out SQL and distributed joins
  • Improve performance by moving data closer to the CPU, using RAM as a storage
  • Spread data sets across a cluster to achieve horizontal scalability
  • Integrate Ignite with RDBMS, NoSQL, Hadoop and machine learning processors

Audience

  • Developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
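
Ignite's position "between the application and data layer" is, at its simplest, a cache in front of a slower store. This toy Python write-through cache illustrates that idea only; it is not Ignite's API, and real Ignite adds clustering, SQL and persistence on top:

```python
class WriteThroughCache:
    """Toy in-memory layer between application and backing store:
    reads hit RAM when possible; writes go to both cache and store."""
    def __init__(self, backing_store):
        self.backing_store = backing_store
        self.cache = {}
        self.hits = 0

    def put(self, key, value):
        self.cache[key] = value
        self.backing_store[key] = value   # write-through keeps both in sync

    def get(self, key):
        if key in self.cache:
            self.hits += 1                # served from memory
            return self.cache[key]
        value = self.backing_store[key]   # cache miss: load and remember
        self.cache[key] = value
        return value

disk = {"a": 1}
fast = WriteThroughCache(disk)
fast.put("b", 2)
print(fast.get("b"), fast.get("a"), disk["b"])  # 2 1 2
```
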
rneuralnet Neural Network in R 14 hours

This course is an introduction to applying neural networks in real world problems using R-project software.

matlab2 MATLAB Fundamentals 21 hours

This three-day course provides a comprehensive introduction to the MATLAB technical computing environment. The course is intended for beginning users and those looking for a review. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course. Topics include:

  • Working with the MATLAB user interface
  • Entering commands and creating variables
  • Analyzing vectors and matrices
  • Visualizing vector and matrix data
  • Working with data files
  • Working with data types
  • Automating commands with scripts
  • Writing programs with logic and flow control
  • Writing functions
altdomexp Analytics Domain Expertise 7 hours

This course is part of the Data Scientist skill set (Domain: Analytics Domain Expertise).

storm Apache Storm 28 hours

Apache Storm is a distributed, real-time computation engine used for enabling real-time business intelligence. It does so by enabling applications to reliably process unbounded streams of data (a.k.a. stream processing).

"Storm is for real-time processing what Hadoop is for batch processing!"

In this instructor-led live training, participants will learn how to install and configure Apache Storm, then develop and deploy an Apache Storm application for processing big data in real-time.

Topics covered in this training include:

  • Apache Storm in the context of Hadoop
  • Working with unbounded data
  • Continuous computation
  • Real-time analytics
  • Distributed RPC and ETL processing
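Storm topologies are built from spouts (stream sources) and bolts (processing steps). As a language-neutral sketch only — real Storm applications use the Storm APIs, and all names below are invented — a continuous word count over an unbounded stream can be modeled with Python generators:

```python
# Illustrative sketch of the spout/bolt pipeline idea behind Storm,
# modeled with Python generators (not Storm's actual API).
from collections import Counter

def sentence_spout(sentences):
    """Spout: emits tuples from a (potentially unbounded) source."""
    for s in sentences:
        yield s

def split_bolt(stream):
    """Bolt: splits each sentence into words."""
    for sentence in stream:
        for word in sentence.lower().split():
            yield word

def count_bolt(stream, counts):
    """Bolt: keeps running totals (continuous computation)."""
    for word in stream:
        counts[word] += 1
        yield word, counts[word]

counts = Counter()
pipeline = count_bolt(split_bolt(sentence_spout(
    ["storm processes streams", "storm is real time"])), counts)
for word, running_total in pipeline:
    pass  # in a real topology this would feed downstream bolts

print(counts["storm"])  # → 2
```

In Storm itself the same topology would run distributed across a cluster, with tuples acknowledged for reliable processing.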

Request this course now!

Audience

  • Software and ETL developers
  • Mainframe professionals
  • Data scientists
  • Big data analysts
  • Hadoop professionals

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
datameer Datameer for Data Analysts 14 hours

Datameer is a business intelligence and analytics platform built on Hadoop. It allows end-users to access, explore and correlate large-scale, structured, semi-structured and unstructured data in an easy-to-use fashion.

In this instructor-led, live training, participants will learn how to use Datameer to overcome Hadoop's steep learning curve as they step through the setup and analysis of a series of big data sources.

By the end of this training, participants will be able to:

  • Create, curate, and interactively explore an enterprise data lake
  • Access business intelligence data warehouses, transactional databases and other analytic stores
  • Use a spreadsheet user-interface to design end-to-end data processing pipelines
  • Access pre-built functions to explore complex data relationships
  • Use drag-and-drop wizards to visualize data and create dashboards
  • Use tables, charts, graphs, and maps to analyze query results

Audience

  • Data analysts

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
bdbiga Big Data Business Intelligence for Govt. Agencies 35 hours

Advances in technologies and the increasing amount of information are transforming how business is conducted in many industries, including government. Government data generation and digital archiving rates are on the rise due to the rapid growth of mobile devices and applications, smart sensors and devices, cloud computing solutions, and citizen-facing portals. As digital information expands and becomes more complex, information management, processing, storage, security, and disposition become more complex as well. New capture, search, discovery, and analysis tools are helping organizations gain insights from their unstructured data. The government market is at a tipping point, realizing that information is a strategic asset, and government needs to protect, leverage, and analyze both structured and unstructured information to better serve and meet mission requirements. As government leaders strive to evolve data-driven organizations to successfully accomplish their missions, they are laying the groundwork to correlate dependencies across events, people, processes, and information.

High-value government solutions will be created from a mashup of the most disruptive technologies:

  • Mobile devices and applications
  • Cloud services
  • Social business technologies and networking
  • Big Data and analytics

IDC predicts that by 2020, the IT industry will reach $5 trillion, approximately $1.7 trillion larger than today, and that 80% of the industry's growth will be driven by these 3rd Platform technologies. In the long term, these technologies will be key tools for dealing with the complexity of increased digital information. Big Data is one of the intelligent industry solutions and allows government to make better decisions by taking action based on patterns revealed by analyzing large volumes of data — related and unrelated, structured and unstructured.

But accomplishing these feats takes far more than simply accumulating massive quantities of data. “Making sense of these volumes of Big Data requires cutting-edge tools and technologies that can analyze and extract useful knowledge from vast and diverse streams of information,” Tom Kalil and Fen Zhao of the White House Office of Science and Technology Policy wrote in a post on the OSTP Blog.

The White House took a step toward helping agencies find these technologies when it established the National Big Data Research and Development Initiative in 2012. The initiative included more than $200 million to make the most of the explosion of Big Data and the tools needed to analyze it.

The challenges that Big Data poses are nearly as daunting as its promise is encouraging. Storing data efficiently is one of these challenges. As always, budgets are tight, so agencies must minimize the per-megabyte price of storage and keep the data within easy access so that users can get it when they want it and how they need it. Backing up massive quantities of data heightens the challenge.

Analyzing the data effectively is another major challenge. Many agencies employ commercial tools that enable them to sift through the mountains of data, spotting trends that can help them operate more efficiently. (A recent study by MeriTalk found that federal IT executives think Big Data could help agencies save more than $500 billion while also fulfilling mission objectives.)

Custom-developed Big Data tools also are allowing agencies to address the need to analyze their data. For example, the Oak Ridge National Laboratory’s Computational Data Analytics Group has made its Piranha data analytics system available to other agencies. The system has helped medical researchers find a link that can alert doctors to aortic aneurysms before they strike. It’s also used for more mundane tasks, such as sifting through résumés to connect job candidates with hiring managers.

osovv OpenStack Overview 7 hours

The course is dedicated to IT engineers and architects who are looking for a solution to host a private or public IaaS (Infrastructure as a Service) cloud.
It is also a great opportunity for IT managers to gain an overview of the possibilities that OpenStack enables.

Before you spend a lot of money on an OpenStack implementation, you can weigh all the pros and cons by attending our course.
This topic is also available as individual consultancy.

Course goal:

  • gaining basic knowledge regarding OpenStack

cpb100 Google Cloud Platform Fundamentals: Big Data & Machine Learning 8 hours

This one-day instructor-led course introduces participants to the big data capabilities of Google Cloud Platform. Through a combination of presentations, demos, and hands-on labs, participants get an overview of the Google Cloud platform and a detailed view of the data processing and machine learning capabilities. This course showcases the ease, flexibility, and power of big data solutions on Google Cloud Platform.

This course teaches participants the following skills:

  • Identify the purpose and value of the key Big Data and Machine Learning products in the Google Cloud Platform.
  • Use Cloud SQL and Cloud Dataproc to migrate existing MySQL and Hadoop/Pig/Spark/Hive workloads to Google Cloud Platform.
  • Employ BigQuery and Cloud Datalab to carry out interactive data analysis.
  • Train and use a neural network using TensorFlow.
  • Employ ML APIs.
  • Choose between different data processing products on the Google Cloud Platform.

This class is intended for the following:

  • Data analysts, Data scientists, Business analysts getting started with Google Cloud Platform.
  • Individuals responsible for designing pipelines and architectures for data processing, creating and maintaining machine learning and statistical models, querying datasets, visualizing query results and creating reports.
  • Executives and IT decision makers evaluating Google Cloud Platform for use by data scientists.
glusterfs GlusterFS for System Administrators 21 hours

GlusterFS is an open-source distributed file storage system that can scale up to petabytes of capacity. GlusterFS is designed to provide additional space depending on the user's storage requirements. A common application for GlusterFS is cloud computing storage systems.

In this instructor-led training, participants will learn how to use normal, off-the-shelf hardware to create and deploy a storage system that is scalable and always available. 

By the end of the course, participants will be able to:

  • Install, configure, and maintain a full-scale GlusterFS system.
  • Implement large-scale storage systems in different types of environments.

Audience

  • System administrators
  • Storage administrators

Format of the Course

  • Part lecture, part discussion, exercises and heavy hands-on practice.
datavault Data Vault: Building a Scalable Data Warehouse 28 hours

Data vault modeling is a database modeling technique that provides long-term historical storage of data that originates from multiple sources. A data vault stores a single version of the facts, or "all the data, all of the time". Its flexible, scalable, consistent and adaptable design encompasses the best aspects of 3rd normal form (3NF) and star schema.
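The Data Vault pattern separates business keys (hubs), relationships (links) and time-stamped descriptive history (satellites). A minimal sketch — table and field names are invented for illustration, not Data Vault 2.0 standard DDL — shows how a satellite keeps every version of the facts:

```python
# Minimal Data Vault sketch (illustrative names only): a hub holds the
# business key, and its insert-only satellite holds time-stamped
# attributes, so history is never overwritten -
# "all the data, all of the time".
from datetime import datetime

hub_customer = {}        # business_key -> hub row
sat_customer = []        # append-only descriptive history

def load_customer(business_key, attributes, load_ts):
    if business_key not in hub_customer:
        hub_customer[business_key] = {"key": business_key,
                                      "first_seen": load_ts}
    # Satellites are insert-only: a change adds a new row, never an update.
    sat_customer.append({"key": business_key, "load_ts": load_ts,
                         **attributes})

load_customer("C001", {"city": "Bradford"}, datetime(2018, 1, 1))
load_customer("C001", {"city": "Leeds"},    datetime(2018, 2, 1))

history = [row["city"] for row in sat_customer if row["key"] == "C001"]
print(history)   # both versions retained
```

Because loads only ever insert, the same pattern makes the ETL process naturally repeatable and auditable.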

In this instructor-led, live training, participants will learn how to build a Data Vault.

By the end of this training, participants will be able to:

  • Understand the architecture and design concepts behind Data Vault 2.0, and its interaction with Big Data, NoSQL and AI.
  • Use data vaulting techniques to enable auditing, tracing, and inspection of historical data in a data warehouse
  • Develop a consistent and repeatable ETL (Extract, Transform, Load) process
  • Build and deploy highly scalable and repeatable warehouses

Audience

  • Data modelers
  • Data warehousing specialists
  • Business Intelligence specialists
  • Data engineers
  • Database administrators

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
iotemi IoT (Internet of Things) for Entrepreneurs, Managers and Investors 21 hours

Unlike other technologies, IoT is far more complex, encompassing almost every branch of core engineering: mechanical, electronics, firmware, middleware, cloud, analytics and mobile. For each of its engineering layers, there are aspects of economics, standards, regulations and the evolving state of the art. For the first time, a modest course is offered to cover all of these critical aspects of IoT engineering.

Summary

  • An advanced training program covering the current state of the art in Internet of Things

  • Cuts across multiple technology domains to develop awareness of an IoT system and its components and how it can help businesses and organizations.

  • Live demo of model IoT applications to showcase practical IoT deployments across different industry domains, such as Industrial IoT, Smart Cities, Retail, Travel & Transportation and use cases around connected devices & things

Target Audience

  • Managers responsible for business and operational processes within their respective organizations who want to know how to harness IoT to make their systems and processes more efficient.

  • Entrepreneurs and Investors who are looking to build new ventures and want to develop a better understanding of the IoT technology landscape to see how they can leverage it in an effective manner.

Estimates for Internet of Things or IoT market value are massive, since by definition the IoT is an integrated and diffused layer of devices, sensors, and computing power that overlays entire consumer, business-to-business, and government industries. The IoT will account for an increasingly huge number of connections: 1.9 billion devices today, and 9 billion by 2018. That year, it will be roughly equal to the number of smartphones, smart TVs, tablets, wearable computers, and PCs combined.

In the consumer space, many products and services have already crossed over into the IoT, including kitchen and home appliances, parking, RFID, lighting and heating products, and a number of applications in Industrial Internet.

However, the underlying technologies of IoT are nothing new: M2M communication has existed since the birth of the Internet. What has changed in the last couple of years is the emergence of a number of inexpensive wireless technologies, combined with the overwhelming adoption of smartphones and tablets in every home. The explosive growth of mobile devices has led to the present demand for IoT.

Due to the unbounded opportunities in the IoT business, a large number of small and medium-sized entrepreneurs have jumped on the bandwagon of the IoT gold rush. With the emergence of open-source electronics and IoT platforms, the cost of developing an IoT system and managing its sizable production is increasingly affordable. Existing electronic product owners are experiencing pressure to integrate their devices with the Internet or mobile apps.

This training is intended for a technology and business review of an emerging industry so that IoT enthusiasts/entrepreneurs can grasp the basics of IoT technology and business.

Course Objective

The main objective of the course is to introduce emerging technological options, platforms and case studies of IoT implementation in home & city automation (smart homes and cities), Industrial Internet, healthcare, government, mobile cellular and other areas.

  1. Basic introduction to all the elements of IoT: mechanical, electronics/sensor platforms, wireless and wireline protocols, mobile-to-electronics integration, mobile-to-enterprise integration, data analytics and the total control plane

  2. M2M wireless protocols for IoT - WiFi, Zigbee/Z-Wave, Bluetooth, ANT+: when and where to use which one?

  3. Mobile/desktop/web apps for registration, data acquisition and control - available M2M data acquisition platforms for IoT such as Xively, Omega and NovoTech, etc.

  4. Security issues and security solutions for IoT

  5. Open-source/commercial electronics platforms for IoT - Raspberry Pi, Arduino, ArmMbedLPC, etc.

  6. Open-source/commercial enterprise cloud platforms for IoT apps - AWS IoT, Azure IoT and Watson IoT, in addition to other minor IoT clouds

  7. Studies of the business and technology of some common IoT devices - home automation, smoke alarms, vehicles, military, home health, etc.

rintrob Introductory R for Biologists 28 hours

R is a free, open-source programming language for statistical computing, data analysis, and graphics. R is used by a growing number of managers and data analysts inside corporations and academia. R has also found followers among statisticians, engineers and scientists without computer programming skills who find it easy to use. Its popularity is due to the increasing use of data mining for various goals, such as setting ad prices, finding new drugs more quickly or fine-tuning financial models. R has a wide variety of packages for data mining.

cassadmin Cassandra Administration 14 hours

This course will introduce Cassandra –  a popular NoSQL database.  It will cover Cassandra principles, architecture and data model.   Students will learn data modeling  in CQL (Cassandra Query Language) in hands-on, interactive labs.  This session also discusses Cassandra internals and some admin topics.

kylin Apache Kylin: From classic OLAP to real-time data warehouse 14 hours

Apache Kylin is an extreme, distributed analytics engine for big data.

In this instructor-led live training, participants will learn how to use Apache Kylin to set up a real-time data warehouse.

By the end of this training, participants will be able to:

  • Consume real-time streaming data using Kylin
  • Utilize Apache Kylin's powerful features, including snowflake schema support, a rich SQL interface, spark cubing and subsecond query latency

Note

  • We use the latest version of Kylin (as of this writing, Apache Kylin v2.0)

Audience

  • Big data engineers
  • Big Data analysts

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
deckgl deck.gl: Visualizing Large-scale Geospatial Data 14 hours

deck.gl is an open-source, WebGL-powered library for exploring and visualizing data assets at scale. Created by Uber, it is especially useful for gaining insights from geospatial data sources, such as data on maps.

This instructor-led, live training introduces the concepts and functionality behind deck.gl and walks participants through the set up of a demonstration project.

By the end of this training, participants will be able to:

  • Take data from very large collections and turn it into compelling visual representations
  • Visualize data collected from transportation and journey-related use cases, such as pick-up and drop-off experiences, network traffic, etc.
  • Apply layering techniques to geospatial data to depict changes in data over time
  • Integrate deck.gl with React (for Reactive programming) and Mapbox GL (for visualizations on Mapbox based maps).
  • Understand and explore other use cases for deck.gl, including visualizing points collected from a 3D indoor scan, visualizing machine learning models in order to optimize their algorithms, etc.

Audience

  • Developers
  • Data scientists

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
bdbitcsp Big Data Business Intelligence for Telecom and Communication Service Providers 35 hours

Overview

Communications service providers (CSP) are facing pressure to reduce costs and maximize average revenue per user (ARPU), while ensuring an excellent customer experience, but data volumes keep growing. Global mobile data traffic will grow at a compound annual growth rate (CAGR) of 78 percent to 2016, reaching 10.8 exabytes per month.

Meanwhile, CSPs are generating large volumes of data, including call detail records (CDR), network data and customer data. Companies that fully exploit this data gain a competitive edge. According to a recent survey by The Economist Intelligence Unit, companies that use data-directed decision-making enjoy a 5-6% boost in productivity. Yet 53% of companies leverage only half of their valuable data, and one-fourth of respondents noted that vast quantities of useful data go untapped. The data volumes are so high that manual analysis is impossible, and most legacy software systems can’t keep up, resulting in valuable data being discarded or ignored.

With Big Data & Analytics' high-speed, scalable big data software, CSPs can mine all their data for better decision making in less time. Different Big Data products and techniques provide an end-to-end software platform for collecting, preparing, analyzing and presenting insights from big data. Application areas include network performance monitoring, fraud detection, customer churn detection and credit risk analysis. Big Data & Analytics products scale to handle terabytes of data, but implementing such tools requires a new kind of cloud-based database system like Hadoop or massively parallel computing processors (KPU, etc.).

This course on Big Data BI for Telco covers all the emerging areas in which CSPs are investing for productivity gains and to open up new business revenue streams. The course provides a complete 360-degree overview of Big Data BI in Telco, so that decision makers and managers can have a wide and comprehensive view of the possibilities of Big Data BI in Telco for productivity and revenue gains.

Course objectives

The main objective of the course is to introduce new Big Data business intelligence techniques in the 4 sectors of the telecom business (marketing/sales, network operation, financial operation and customer relationship management). Students will be introduced to the following:

  • Introduction to Big Data - what the 4Vs (volume, velocity, variety and veracity) mean in Big Data: generation, extraction and management from a Telco perspective
  • How Big Data analytics differs from legacy data analytics
  • In-house justification of Big Data - a Telco perspective
  • Introduction to the Hadoop ecosystem - familiarity with all the Hadoop tools like Hive, Pig and Spark: when and how they are used to solve Big Data problems
  • How Big Data is extracted for analytics tools - how business analysts can reduce their pain points of collecting and analyzing data through an integrated Hadoop dashboard approach
  • Basic introduction to insight analytics, visualization analytics and predictive analytics for Telco
  • Customer churn analytics and Big Data - how Big Data analytics can reduce customer churn and customer dissatisfaction in Telco: case studies
  • Network failure and service failure analytics from network metadata and IPDR
  • Financial analysis - fraud, wastage and ROI estimation from sales and operational data
  • Customer acquisition - target marketing, customer segmentation and cross-selling from sales data
  • Introduction and summary of all Big Data analytics products and where they fit into the Telco analytics space
  • Conclusion - how to take a step-by-step approach to introducing Big Data Business Intelligence in your organization

Target Audience

  • Network operation, Financial Managers, CRM managers and top IT managers in Telco CIO office.
  • Business Analysts in Telco
  • CFO office managers/analysts
  • Operational managers
  • QA managers
bigdatastore Big Data Storage Solution - NoSQL 14 hours

When traditional storage technologies can't handle the amount of data you need to store, there are hundreds of alternatives. This course guides participants through the alternatives for storing and analyzing Big Data, and their pros and cons.

This course is mostly focused on discussion and presentation of solutions, though hands-on exercises are available on demand.

dsbda Data Science for Big Data Analytics 35 hours

Big data refers to data sets that are so voluminous and complex that traditional data processing software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy.

matlabfundamentalsfinance MATLAB Fundamentals + MATLAB for Finance 35 hours

This course provides a comprehensive introduction to the MATLAB technical computing environment + an introduction to using MATLAB for financial applications. The course is intended for beginning users and those looking for a review. No prior programming experience or knowledge of MATLAB is assumed. Themes of data analysis, visualization, modeling, and programming are explored throughout the course. Topics include:

  • Working with the MATLAB user interface
  • Entering commands and creating variables
  • Analyzing vectors and matrices
  • Visualizing vector and matrix data
  • Working with data files
  • Working with data types
  • Automating commands with scripts
  • Writing programs with logic and flow control
  • Writing functions
  • Using the Financial Toolbox for quantitative analysis
kdbplusandq kdb+ and q: Analyze time series data 21 hours

kdb+ is an in-memory, column-oriented database and q is its built-in, interpreted, vector-based language. In kdb+, tables are columns of vectors, and q is used to perform operations on the table data as if it were a list. kdb+ and q are commonly used in high-frequency trading and are popular with major financial institutions, including Goldman Sachs, Morgan Stanley, Merrill Lynch and JP Morgan.
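The column-oriented idea can be sketched in plain Python (not q): a table becomes a dict of equal-length column vectors, and an operation touches whole columns at once rather than scanning row objects. The table and values below are invented for illustration:

```python
# Sketch of the columnar model behind kdb+ (pure Python, not q):
# a table is a dict of equal-length column vectors.
trades = {
    "sym":   ["AAPL", "MSFT", "AAPL"],
    "price": [150.0, 95.0, 152.0],
    "size":  [100, 200, 50],
}

# A column operation: average AAPL price, reading only the two columns
# involved - the "size" column is never touched, which is why columnar
# layouts are fast for analytics on time series data.
aapl_prices = [p for s, p in zip(trades["sym"], trades["price"])
               if s == "AAPL"]
avg = sum(aapl_prices) / len(aapl_prices)
print(avg)   # 151.0
```

In q the same query is a one-liner over the column vectors, which is what the course means by working with data "at a higher level than the standard function(arguments) approach".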

In this instructor-led, live training, participants will learn how to create a time series data application using kdb+ and q.

By the end of this training, participants will be able to:

  • Understand the difference between a row-oriented database and a column-oriented database
  • Select data, write scripts and create functions to carry out advanced analytics
  • Analyze time series data such as stock and commodity exchange data
  • Use kdb+'s in-memory capabilities to store, analyze, process and retrieve large data sets at high speed
  • Think of functions and data at a higher level than the standard function(arguments) approach common in non-vector languages
  • Explore other time-sensitive applications for kdb+, including energy trading, telecommunications, sensor data, log data, and machine and network usage monitoring

Audience

  • Developers
  • Database engineers
  • Data scientists
  • Data analysts

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
apacheh Administrator Training for Apache Hadoop 35 hours

Audience:

The course is intended for IT specialists looking for a solution to store and process large data sets in a distributed system environment

Goal:

Deep knowledge on Hadoop cluster administration.

hadoopdeva Advanced Hadoop for Developers 21 hours

Apache Hadoop is one of the most popular frameworks for processing Big Data on clusters of servers. This course delves into data management in HDFS, advanced Pig, Hive, and HBase.  These advanced programming techniques will be beneficial to experienced Hadoop developers.

Audience: developers

Duration: three days

Format: lectures (50%) and hands-on labs (50%).

 

IntroToAvro Apache Avro: Data serialization for distributed applications 14 hours

This course is intended for

  • Developers

Format of the course

  • Lectures, hands-on practice, small tests along the way to gauge understanding
TalendDI Talend Open Studio for Data Integration 28 hours

Talend Open Studio for Data Integration is an open-source data integration product used to combine, convert and update data in various locations across a business.

In this instructor-led, live training, participants will learn how to use the Talend ETL tool to carry out data transformation, data extraction, and connectivity with Hadoop, Hive, and Pig.
 
By the end of this training, participants will be able to:

  • Explain the concepts behind ETL (Extract, Transform, Load) and propagation
  • Define ETL methods and ETL tools to connect with Hadoop
  • Efficiently amass, retrieve, digest, consume, transform and shape big data in accordance with business requirements
  • Upload to and extract large records from Hadoop, Hive, and NoSQL databases

Audience

  • Business intelligence professionals
  • Project managers
  • Database professionals
  • SQL Developers
  • ETL Developers
  • Solution architects
  • Data architects
  • Data warehousing professionals
  • System administrators and integrators

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
hypertable Hypertable: Deploy a BigTable like database 14 hours

Hypertable is an open-source software database management system based on the design of Google's Bigtable.

In this instructor-led, live training, participants will learn how to set up and manage a Hypertable database system.

By the end of this training, participants will be able to:

  • Install, configure and upgrade a Hypertable instance
  • Set up and administer a Hypertable cluster
  • Monitor and optimize the performance of the database
  • Design a Hypertable schema
  • Work with Hypertable's API
  • Troubleshoot operational issues

Audience

  • Developers
  • Operations engineers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice


hadoopadm Hadoop Administration 21 hours

The course is dedicated to IT specialists who are looking for a solution to store and process large data sets in a distributed system environment.

Course goal:

Gaining knowledge of Hadoop cluster administration

hadoopdev Hadoop for Developers (4 days) 28 hours

Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. This course will introduce a developer to the various components (HDFS, MapReduce, Pig, Hive and HBase) of the Hadoop ecosystem.

 

DM7 Getting started with DM7 21 hours

Audience

  • Beginner or intermediate database developers
  • Beginner or intermediate database administrators
  • Programmers

Format of the course

  • Heavy emphasis on hands-on practice. Most of the concepts are learned through samples, exercises and hands-on development
pythonmultipurpose Advanced Python 28 hours

In this instructor-led training, participants will learn advanced Python programming techniques, including how to apply this versatile language to solve problems in areas such as distributed applications, finance, data analysis and visualization, UI programming and maintenance scripting.

Audience

  • Developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

Notes

  • If you wish to add, remove or customize any section or topic within this course, please contact us to arrange.
flockdb Flockdb: A Simple Graph Database for Social Media 7 hours

FlockDB is an open source distributed, fault-tolerant graph database for managing wide but shallow network graphs. It was initially used by Twitter to store relationships among users.

In this instructor-led, live training, participants will learn how to set up and use a FlockDB database to help answer social media questions such as who follows whom, who blocks whom, etc.

By the end of this training, participants will be able to:

  • Install and configure FlockDB
  • Understand the unique features of FlockDB, relative to other graph databases such as Neo4j
  • Use FlockDB to maintain a large graph dataset
  • Use FlockDB together with MySQL to provide distributed storage capabilities
  • Query, create and update extremely fast graph edges
  • Scale FlockDB horizontally for use in on-line, low-latency, high throughput web environments
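A "wide but shallow" graph means queries only ever look one hop away, so simple per-user edge sets suffice. A toy sketch of the follow/block semantics (pure Python, not FlockDB's API; the function names are invented):

```python
# Toy model of a wide-but-shallow social graph (not FlockDB's API):
# every query is a single hop, so a set of edges per user is enough.
from collections import defaultdict

follows = defaultdict(set)
blocks = defaultdict(set)

def follow(a, b):
    """a follows b, unless a block exists in either direction."""
    if b not in blocks[a] and a not in blocks[b]:
        follows[a].add(b)

def block(a, b):
    """a blocks b; any existing follow edge is removed."""
    blocks[a].add(b)
    follows[a].discard(b)
    follows[b].discard(a)

follow("alice", "bob")
follow("carol", "alice")
block("alice", "bob")

print(follows["alice"])             # set() - follow removed by the block
print("alice" in follows["carol"])  # True
```

FlockDB stores these edge sets sharded over MySQL, which is how it scales the same one-hop queries horizontally.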

Audience

  • Developers
  • Database engineers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
datamin Data Mining 21 hours

The course can be delivered using any tools, including free open-source data mining software and applications.

cassdev Cassandra for Developers 21 hours

This course will introduce Cassandra –  a popular NoSQL database.  It will cover Cassandra principles, architecture and data model.   Students will learn data modeling  in CQL (Cassandra Query Language) in hands-on, interactive labs.  This session also discusses Cassandra internals and some admin topics.

Audience : Developers

neo4j Beyond the relational database: neo4j 21 hours

Relational, table-based databases such as Oracle and MySQL have long been the standard for organizing and storing data. However, the growing size and fluidity of data have made it difficult for these traditional systems to efficiently execute highly complex queries on the data. Imagine replacing rows-and-columns-based data storage with object-based data storage, whereby entities (e.g., a person) could be stored as data nodes, then easily queried on the basis of their vast, multi-linear relationships with other nodes. And imagine querying these connections and their associated objects and properties using a compact syntax, up to 20 times lighter than SQL. This is what graph databases, such as neo4j, offer.

In this hands-on course, we will set up a live project and put into practice the skills to model, manage and access your data. We contrast and compare graph databases with SQL-based databases as well as other NoSQL databases and clarify when and where it makes sense to implement each within your infrastructure.

Audience

  • Database administrators (DBAs)
  • Data analysts
  • Developers
  • System Administrators
  • DevOps engineers
  • Business Analysts
  • CTOs
  • CIOs

Format of the course

  • Heavy emphasis on hands-on practice. Most of the concepts are learned through samples, exercises and hands-on development.
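The node-and-relationship model described above can be sketched in plain Python (this is an illustration of the property-graph idea, not the neo4j driver; names and data are invented):

```python
# Nodes carry a label and properties; relationships are first-class
# (source, type, target) triples - the shape a graph database stores.
nodes = {
    1: {"label": "Person", "name": "Alice"},
    2: {"label": "Person", "name": "Bob"},
    3: {"label": "Person", "name": "Carol"},
}
rels = [(1, "KNOWS", 2), (2, "KNOWS", 3)]

def neighbours(node_id, rel_type):
    """Follow all outgoing relationships of a given type."""
    return [t for s, r, t in rels if s == node_id and r == rel_type]

# Friends-of-friends of Alice, analogous to the Cypher query
#   MATCH (a:Person {name:'Alice'})-[:KNOWS]->()-[:KNOWS]->(fof) RETURN fof
fof = [f for friend in neighbours(1, "KNOWS") for f in neighbours(friend, "KNOWS")]
print([nodes[n]["name"] for n in fof])  # ['Carol']
```

In a real graph database the relationship hop is a pointer dereference rather than a table join, which is where the query-cost advantage over SQL comes from.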
PentahoDI Pentaho Data Integration Fundamentals 21 hours

Pentaho Data Integration is an open-source data integration tool for defining jobs and data transformations.

In this instructor-led, live training, participants will learn how to use Pentaho Data Integration's powerful ETL capabilities and rich GUI to manage an entire big data lifecycle, maximizing the value of data to the organization.

By the end of this training, participants will be able to:

  • Create, preview, and run basic data transformations containing steps and hops
  • Configure and secure the Pentaho Enterprise Repository
  • Harness disparate sources of data and generate a single, unified version of the truth in an analytics-ready format
  • Provide results to third-party applications for further processing

Audience

  • Data Analyst
  • ETL developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
nifi Apache NiFi for Administrators 21 hours

Apache NiFi (Hortonworks DataFlow) is a real-time integrated data logistics and simple event processing platform that enables the moving, tracking and automation of data between systems. It is written using flow-based programming and provides a web-based user interface to manage dataflows in real time.

In this instructor-led, live training, participants will learn how to deploy and manage Apache NiFi in a live lab environment.

By the end of this training, participants will be able to:

  • Install and configure Apache NiFi
  • Source, transform and manage data from disparate, distributed data sources, including databases and big data lakes
  • Automate dataflows
  • Enable streaming analytics
  • Apply various approaches for data ingestion
  • Transform Big Data into business insights

Audience

  • System administrators
  • Data engineers
  • Developers
  • DevOps

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
d2dbdpa From Data to Decision with Big Data and Predictive Analytics 21 hours

Audience

If you are trying to make sense of the data you have access to, or want to analyse unstructured data available on the net (such as Twitter, LinkedIn, etc.), this course is for you.

It is mostly aimed at decision makers and people who need to choose what data is worth collecting and what is worth analyzing.

It is not aimed at people configuring the solution, those people will benefit from the big picture though.

Delivery Mode

During the course delegates will be presented with working examples of mostly open source technologies.

Short lectures will be followed by presentations and simple exercises for the participants.

Content and Software used

All software used is updated each time the course is run, so we work with the newest versions available.

The course covers the process of obtaining, formatting, processing and analysing data, and explains how to automate the decision-making process with machine learning.

hadoopba Hadoop for Business Analysts 21 hours

Apache Hadoop is the most popular framework for processing Big Data. Hadoop provides rich and deep analytics capability, and it is making inroads into the traditional BI analytics world. This course introduces analysts to the core components of the Hadoop ecosystem and its analytics capabilities.

Audience

Business Analysts

Duration

three days

Format

Lectures and hands on labs.

kdd Knowledge Discovery in Databases (KDD) 21 hours

Knowledge discovery in databases (KDD) is the process of discovering useful knowledge from a collection of data. Real-life applications for this data mining technique include marketing, fraud detection, telecommunication and manufacturing.

In this course, we introduce the processes involved in KDD and carry out a series of exercises to practice the implementation of those processes.

Audience
    Data analysts or anyone interested in learning how to interpret data to solve problems

Format of the course
    After a theoretical discussion of KDD, the instructor will present real-life cases which call for the application of KDD to solve a problem. Participants will prepare, select and cleanse sample data sets and use their prior knowledge about the data to propose solutions based on the results of their observations.
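The KDD steps the course exercises walk through - select, cleanse, transform, mine, interpret - can be sketched end-to-end on a toy dataset (the records and the frequency threshold below are invented for illustration):

```python
from collections import Counter

raw = [
    {"age": 25, "bought": "milk,bread"},
    {"age": None, "bought": "milk"},        # missing value -> dropped
    {"age": 41, "bought": "bread,butter"},
    {"age": 33, "bought": "milk,bread"},
]

# 1. Selection + cleansing: keep complete records only
clean = [r for r in raw if r["age"] is not None]

# 2. Transformation: turn the basket string into a set of items
baskets = [set(r["bought"].split(",")) for r in clean]

# 3. Mining: count item frequencies (a minimal frequent-itemset step)
counts = Counter(item for b in baskets for item in b)

# 4. Interpretation: items appearing in >= 2 baskets are "frequent"
frequent = {item for item, n in counts.items() if n >= 2}
print(sorted(frequent))  # ['bread', 'milk']
```

Real KDD work spends most of its effort in steps 1-2; the mining step only produces useful patterns when the input has been prepared carefully.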

hdp Hortonworks Data Platform (HDP) for administrators 21 hours

Hortonworks Data Platform is an open-source Apache Hadoop support platform that provides a stable foundation for developing big data solutions on the Apache Hadoop ecosystem.

This instructor-led, live training introduces Hortonworks and walks participants through the deployment of a Spark + Hadoop solution.

By the end of this training, participants will be able to:

  • Use Hortonworks to reliably run Hadoop at a large scale
  • Unify Hadoop's security, governance, and operations capabilities with Spark's agile analytic workflows.
  • Use Hortonworks to investigate, validate, certify and support each of the components in a Spark project
  • Process different types of data, including structured, unstructured, in-motion, and at-rest.

Audience

  • Hadoop administrators

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
nifidev Apache NiFi for Developers 7 hours

Apache NiFi (Hortonworks DataFlow) is a real-time integrated data logistics and simple event processing platform that enables the moving, tracking and automation of data between systems. It is written using flow-based programming and provides a web-based user interface to manage dataflows in real time.

In this instructor-led, live training, participants will learn the fundamentals of flow-based programming as they develop a number of demo extensions, components and processors using Apache NiFi.

By the end of this training, participants will be able to:

  • Understand NiFi's architecture and dataflow concepts
  • Develop extensions using NiFi and third-party APIs
  • Develop their own custom Apache NiFi processors
  • Ingest and process real-time data from disparate and uncommon file formats and data sources

Audience

  • Developers
  • Data engineers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
apachemdev Apache Mahout for Developers 14 hours

Audience

Developers involved in projects that use machine learning with Apache Mahout.

Format

Hands on introduction to machine learning. The course is delivered in a lab format based on real world practical use cases.

dataar Data Analytics With R 21 hours

R is a very popular, open-source environment for statistical computing, data analytics and graphics. This course introduces the R programming language to students. It covers language fundamentals, libraries and advanced concepts, along with advanced data analytics and graphing using real-world data.

Audience

Developers / data analysts

Duration

3 days

Format

Lectures and Hands-on

scylladb Scylla database 21 hours

Scylla is an open-source distributed NoSQL data store. It is compatible with Apache Cassandra but performs at significantly higher throughputs and lower latencies.

In this course, participants will learn about Scylla's features and architecture while obtaining practical experience with setting up, administering, monitoring, and troubleshooting Scylla.  

Audience
    Database administrators
    Developers
    System Engineers

Format of the course
    The course is interactive and includes discussions of the principles and approaches for deploying and managing Scylla distributed databases and clusters. The course includes a heavy component of hands-on exercises and practice.

magellan Magellan: Geospatial Analytics on Spark 14 hours

Magellan is an open-source distributed execution engine for geospatial analytics on big data. Implemented on top of Apache Spark, it extends Spark SQL and provides a relational abstraction for geospatial analytics.

This instructor-led, live training introduces the concepts and approaches for implementing geospatial analytics and walks participants through the creation of a predictive analysis application using Magellan on Spark.

By the end of this training, participants will be able to:

  • Efficiently query, parse and join geospatial datasets at scale
  • Implement geospatial data in business intelligence and predictive analytics applications
  • Use spatial context to extend the capabilities of mobile devices, sensors, logs, and wearables

Audience

  • Application developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
matlabpredanalytics Matlab for Predictive Analytics 21 hours

Predictive analytics is the process of using data analytics to make predictions about the future. This process uses data along with data mining, statistics, and machine learning techniques to create a predictive model for forecasting future events.

In this instructor-led, live training, participants will learn how to use Matlab to build predictive models and apply them to large sample data sets to predict future events based on the data.

By the end of this training, participants will be able to:

  • Create predictive models to analyze patterns in historical and transactional data
  • Use predictive modeling to identify risks and opportunities
  • Build mathematical models that capture important trends
  • Use data from devices and business systems to reduce waste, save time, or cut costs

Audience

  • Developers
  • Engineers
  • Domain experts

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
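The workflow described above - fit a model on historical data, then forecast - reduces, in its simplest form, to regression. The course uses Matlab; the sketch below illustrates the same idea in plain Python with ordinary least squares on invented monthly sales figures:

```python
months = [1, 2, 3, 4, 5, 6]
sales  = [10.0, 12.1, 13.9, 16.2, 18.0, 20.1]  # invented historical data

n = len(months)
mx = sum(months) / n
my = sum(sales) / n
# Closed-form simple linear regression: slope and intercept
slope = sum((x - mx) * (y - my) for x, y in zip(months, sales)) / \
        sum((x - mx) ** 2 for x in months)
intercept = my - slope * mx

def predict(month):
    """The fitted model, used to forecast a future period."""
    return intercept + slope * month

print(round(predict(7), 1))  # forecast for the next month: 22.1
```

Matlab's fitting functions do the same estimation with far richer model families; the essential pattern - estimate parameters from history, apply them forward - is unchanged.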
68736 Hadoop for Developers (2 days) 14 hours
hadoopadm1 Hadoop For Administrators 21 hours

Apache Hadoop is the most popular framework for processing Big Data on clusters of servers. In this three (optionally, four) day course, attendees will learn about the business benefits and use cases for Hadoop and its ecosystem, how to plan cluster deployment and growth, and how to install, maintain, monitor, troubleshoot and optimize Hadoop. They will also practice cluster bulk data loads, get familiar with various Hadoop distributions, and practice installing and managing Hadoop ecosystem tools. The course finishes with a discussion of securing the cluster with Kerberos.

“…The materials were very well prepared and covered thoroughly. The Lab was very helpful and well organized”
— Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising

Audience

Hadoop administrators

Format

Lectures and hands-on labs, approximate balance 60% lectures, 40% labs.

accumulo Apache Accumulo: Building highly scalable big data applications 21 hours

Apache Accumulo is a sorted, distributed key/value store that provides robust, scalable data storage and retrieval. It is based on the design of Google's BigTable and is powered by Apache Hadoop, Apache Zookeeper, and Apache Thrift.
 
This course covers the working principles behind Accumulo and walks participants through the development of a sample application on Apache Accumulo.

Audience

  • Application developers
  • Software engineers
  • Technical consultants

Format of the course

  • Part lecture, part discussion, hands-on development and implementation, occasional tests to gauge understanding
zeppelin Zeppelin for interactive data analytics 14 hours

Apache Zeppelin is a web-based notebook for capturing, exploring, visualizing and sharing Hadoop and Spark based data.

This instructor-led, live training introduces the concepts behind interactive data analytics and walks participants through the deployment and usage of Zeppelin in a single-user or multi-user environment.

By the end of this training, participants will be able to:

  • Install and configure Zeppelin
  • Develop, organize, execute and share data in a browser-based interface
  • Visualize results without referring to the command line or cluster details
  • Execute and collaborate on long workflows
  • Work with any of a number of plug-in language/data-processing backends, such as Scala (with Apache Spark), Python (with Apache Spark), Spark SQL, JDBC, Markdown and Shell
  • Integrate Zeppelin with Spark, Flink and Map Reduce
  • Secure multi-user instances of Zeppelin with Apache Shiro

Audience

  • Data engineers
  • Data analysts
  • Data scientists
  • Software developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
graphcomputing Introduction to Graph Computing 28 hours

A large number of real world problems can be described in terms of graphs. For example, the Web graph, the social network graph, the train network graph and the language graph. These graphs tend to be extremely large; processing them requires a specialized set of tools and mindset referred to as graph computing.

In this instructor-led, live training, participants will learn about the various technology offerings and implementations for processing graph data. The aim is to identify real-world objects, their characteristics and relationships, then model these relationships and process them as data using graph computing approaches. We start with a broad overview and narrow in on specific tools as we step through a series of case studies, hands-on exercises and live deployments.

By the end of this training, participants will be able to:

  • Understand how graph data is persisted and traversed
  • Select the best framework for a given task (from graph databases to batch processing frameworks)
  • Implement Hadoop, Spark, GraphX and Pregel to carry out graph computing across many machines in parallel
  • View real-world big data problems in terms of graphs, processes and traversals

Audience

  • Developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
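The Pregel-style processing the course covers - vertices exchanging messages in synchronized supersteps - can be sketched locally with breadth-first search. This toy single-machine version only illustrates the superstep shape; Hadoop, Spark and GraphX distribute it across machines:

```python
# Each "vertex" activates only when it receives its first message,
# computes its distance, and messages its neighbours for the next superstep.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

dist = {v: None for v in graph}
dist["a"] = 0
frontier = {"a"}

while frontier:                      # one superstep per iteration
    messages = set()
    for v in frontier:
        for nbr in graph[v]:
            if dist[nbr] is None:    # vertex activates on first message
                dist[nbr] = dist[v] + 1
                messages.add(nbr)
    frontier = messages

print(dist)  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
```

Because each superstep touches only the active frontier, the same program parallelizes naturally: vertices on different machines process their messages independently between synchronization barriers.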
68780 Apache Spark 14 hours
solrdev Solr for Developers 21 hours

This course introduces students to the Solr platform. Through a combination of lecture, discussion and labs, students will gain hands-on experience configuring effective search and indexing.

The class begins with basic Solr installation and configuration then teaches the attendees the search features of Solr. Students will gain experience with faceting, indexing and search relevance among other features central to the Solr platform. The course wraps up with a number of advanced topics including spell checking, suggestions, Multicore and SolrCloud.

Duration: 3 days

Audience: Developers, business users, administrators

druid Druid: Build a fast, real-time data analysis system 21 hours

Druid is an open-source, column-oriented, distributed data store written in Java. It was designed to quickly ingest massive quantities of event data and execute low-latency OLAP queries on that data. Druid is commonly used in business intelligence applications to analyze high volumes of real-time and historical data. It is also well suited for powering fast, interactive, analytic dashboards for end-users. Druid is used by companies such as Alibaba, Airbnb, Cisco, eBay, Netflix, PayPal, and Yahoo.

In this course we explore some of the limitations of data warehouse solutions and discuss how Druid can complement those technologies to form a flexible and scalable streaming analytics stack. We walk through many examples, offering participants the chance to implement and test Druid-based solutions in a lab environment.

Audience
    Application developers
    Software engineers
    Technical consultants
    DevOps professionals
    Architecture engineers

Format of the course
    Part lecture, part discussion, heavy hands-on practice, occasional tests to gauge understanding
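Part of what makes the ingestion described above fast is rollup: Druid pre-aggregates raw events by time bucket and dimensions as they arrive. A toy version of that idea (invented events; Druid does this column-oriented and at scale):

```python
from collections import defaultdict

events = [
    {"ts": 1000, "page": "/home", "clicks": 1},
    {"ts": 1010, "page": "/home", "clicks": 1},
    {"ts": 1015, "page": "/buy",  "clicks": 1},
    {"ts": 1070, "page": "/home", "clicks": 1},
]

GRANULARITY = 60  # seconds per time bucket
rollup = defaultdict(int)
for e in events:
    bucket = e["ts"] // GRANULARITY * GRANULARITY
    rollup[(bucket, e["page"])] += e["clicks"]

# OLAP-style query: clicks per page in the bucket starting at t=960
print({k: v for k, v in rollup.items() if k[0] == 960})
```

Because the aggregation happens at ingestion time, a later query only scans the (much smaller) rolled-up rows, which is one reason Druid's OLAP queries stay low-latency.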

samza Samza for stream processing 14 hours

Apache Samza is an open-source, near-realtime, asynchronous computational framework for stream processing. It uses Apache Kafka for messaging and Apache Hadoop YARN for fault tolerance, processor isolation, security, and resource management.

This instructor-led, live training introduces the principles behind messaging systems and distributed stream processing, while walking participants through the creation of a sample Samza-based project and job execution.

By the end of this training, participants will be able to:

  • Use Samza to simplify the code needed to produce and consume messages
  • Decouple the handling of messages from an application
  • Use Samza to implement near-realtime asynchronous computation
  • Use stream processing to provide a higher level of abstraction over messaging systems

Audience

  • Developers

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice
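The decoupling of message handling from the application, noted above, can be sketched with an in-memory queue standing in for a Kafka partition. This is a hypothetical illustration of the message-passing shape, not the Samza API:

```python
from collections import deque

topic = deque()                       # stands in for a Kafka partition

def produce(message):
    """Producers only append to the topic; they know nothing of consumers."""
    topic.append(message)

def samza_like_task(process):
    """Drain the topic, applying a per-message process() callback -
    roughly the envelope-at-a-time shape of a Samza StreamTask."""
    out = []
    while topic:
        out.append(process(topic.popleft()))
    return out

produce("page_view:/home")
produce("page_view:/buy")
results = samza_like_task(lambda m: m.upper())
print(results)  # ['PAGE_VIEW:/HOME', 'PAGE_VIEW:/BUY']
```

The application code is reduced to the `process` callback; the framework owns delivery, ordering within a partition, and recovery - which is the simplification the course's first bullet point refers to.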
bigdatabicriminal Big Data Business Intelligence for Criminal Intelligence Analysis 35 hours

Advances in technologies and the increasing amount of information are transforming how law enforcement is conducted. The challenges that Big Data pose are nearly as daunting as Big Data's promise. Storing data efficiently is one of these challenges; effectively analyzing it is another.

In this instructor-led, live training, participants will learn the mindset with which to approach Big Data technologies, assess their impact on existing processes and policies, and implement these technologies for the purpose of identifying criminal activity and preventing crime. Case studies from law enforcement organizations around the world will be examined to gain insights on their adoption approaches, challenges and results.

By the end of this training, participants will be able to:

  • Combine Big Data technology with traditional data gathering processes to piece together a story during an investigation
  • Implement industrial big data storage and processing solutions for data analysis
  • Prepare a proposal for the adoption of the most adequate tools and processes for enabling a data-driven approach to criminal investigation

Audience

  • Law Enforcement specialists with a technical background

Format of the course

  • Part lecture, part discussion, exercises and heavy hands-on practice

