Apache Spark Online Training
Learn how to use Apache Spark. The Keen Technologies Apache Spark training covers common tasks in Spark, tools for analyzing big data, how to set up a Spark cluster, the history of Spark, and more.
1. Introduction To Big Data And Spark
In this module of the Spark training, learn how to apply data science techniques with parallel programming to explore big (and small) data.
- Introduction to Big Data
- Challenges with Big Data
- Batch vs. Real-Time Big Data Analytics
- Batch Analytics – Hadoop Ecosystem Overview
- Real-Time Analytics Options
- Streaming Data – Storm
- In-Memory Data – Spark
- What is Spark?
- Modes of Spark
- Spark Installation Demo
- Overview of Spark on a cluster
- Spark Standalone Cluster
2. Spark Baby Steps
In this module, learn how to invoke the Spark shell, build a Spark project with sbt, use distributed persistence, and much more.
- Invoking Spark Shell
- Creating the Spark Context
- Loading a File in Shell
- Performing Some Basic Operations on Files in Spark Shell
- Building a Spark Project with sbt
- Running Spark Project with sbt
- Caching Overview
- Distributed Persistence
- Spark Streaming Overview
- Example: Streaming Word Count
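The first topics in this module (creating the Spark context, loading a file, basic operations, and caching) can be sketched in a few lines of Scala. This is a minimal illustration rather than course material: the object name `BabySteps`, the helper `countMatching`, and the input file `README.md` are assumptions made for the example. Inside the Spark shell a context is already available as `sc`, so only the body of `countMatching` would be typed there.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object BabySteps {
  // Hypothetical helper: counts the lines of a text file that contain `word`
  // and returns up to `sampleSize` of those lines.
  def countMatching(sc: SparkContext, path: String, word: String,
                    sampleSize: Int = 3): (Long, Seq[String]) = {
    val lines: RDD[String] = sc.textFile(path)      // load the file as an RDD of lines
    val matching = lines.filter(_.contains(word))   // transformation: evaluated lazily
      .cache()                                      // keep the filtered lines in memory
    val total = matching.count()                    // first action: computes and caches
    val sample = matching.take(sampleSize).toSeq    // second action: reuses the cache
    (total, sample)
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark in-process on all cores; a cluster URL would go here instead.
    val conf = new SparkConf().setAppName("BabySteps").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // "README.md" is an assumed input file, not one named by the course.
      val (total, sample) = countMatching(sc, "README.md", "Spark")
      println(s"Lines mentioning Spark: $total")
      sample.foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Calling `cache()` before the first action matters here: without it, `take` would re-read and re-filter the file instead of reusing the result computed by `count`.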
3. Playing With RDDs In Spark
The main abstraction Spark provides is a resilient distributed dataset (RDD), which is a collection of elements partitioned across the nodes of the cluster that can be operated on in parallel.
- Spark Transformations in RDD
- Actions in RDD
- Loading Data in RDD
- Saving Data through RDD
- Spark Key-Value Pair RDD
- Map Reduce and Pair RDD Operations in Spark
- Scala and Hadoop Integration Hands-On
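To make the RDD topics above concrete, here is a small word-count sketch that chains transformations (`flatMap`, `map`, `reduceByKey`) on a key-value pair RDD and finishes with an action (`collect`). The object name `PairRddWordCount` and the helper `wordCounts` are illustrative assumptions, and the code assumes Spark 1.3 or later, where pair RDD operations are available without an extra import.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object PairRddWordCount {
  // Hypothetical helper: counts word occurrences across the given lines.
  def wordCounts(sc: SparkContext, lines: Seq[String]): Map[String, Int] = {
    val rdd = sc.parallelize(lines, numSlices = 2)  // distribute the data over 2 partitions
    val counts = rdd
      .flatMap(_.split("\\s+"))                     // transformation: line -> words
      .filter(_.nonEmpty)                           // drop empty tokens
      .map(word => (word.toLowerCase, 1))           // build a key-value pair RDD
      .reduceByKey(_ + _)                           // sum the counts per key
    counts.collect().toMap                          // action: bring results to the driver
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("PairRddWordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val result = wordCounts(sc, Seq("spark makes big data simple", "big data with spark"))
      result.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word: $n") }
    } finally {
      sc.stop()
    }
  }
}
```

Nothing is computed until `collect()` runs; the transformations before it only describe the work to be done on each partition.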
4. Shark: When Spark Meets Hive
Shark is a component of Spark, an open-source, distributed, fault-tolerant, in-memory analytics system that can be installed on the same cluster as Hadoop. This module of the Spark training gives an overview of Shark.
- Why Shark?
- Installing Shark
- Running Shark
- Loading of Data
- Hive Queries through Spark
- Testing Tips in Scala
- Performance Tuning Tips in Spark
- Shared Variables: Broadcast Variables
- Shared Variables: Accumulators
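The two shared-variable topics that close this module can be illustrated together. In the sketch below, a broadcast variable ships a read-only lookup table to the executors once, and an accumulator counts records that the table does not cover. The object name `SharedVariables`, the helper `resolveCodes`, and the sample country codes are assumptions for illustration; `longAccumulator` assumes Spark 2.0 or later.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SharedVariables {
  // Hypothetical helper: maps known country codes to names and counts unknown ones.
  def resolveCodes(sc: SparkContext, codes: Seq[String]): (Seq[String], Long) = {
    // Broadcast variable: a read-only table sent to each executor once.
    val lookup = sc.broadcast(Map("US" -> "United States", "IN" -> "India"))
    // Accumulator: tasks can only add to it; the driver reads the total.
    val unknown = sc.longAccumulator("unknownCodes")

    val rdd = sc.parallelize(codes)
    // Update the accumulator inside an action so each record is counted once.
    rdd.foreach(code => if (!lookup.value.contains(code)) unknown.add(1))

    val names = rdd
      .filter(code => lookup.value.contains(code))
      .map(code => lookup.value(code))
      .collect()
      .toSeq
    (names, unknown.value.longValue())
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("SharedVariables").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val (names, unknownCount) = resolveCodes(sc, Seq("US", "XX", "IN", "YY"))
      println(s"Resolved: ${names.mkString(", ")}; unknown codes: $unknownCount")
    } finally {
      sc.stop()
    }
  }
}
```

The accumulator is updated in `foreach` rather than in `map` or `filter` because updates made inside transformations may be applied more than once if a task is re-executed.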