Hadoop Online Training
HADOOP ONLINE TRAINING COURSE INTRODUCTION:
Hadoop is an open-source software project that enables the distributed processing of large data sets across clusters of commodity servers. Hadoop changes the economics and the dynamics of large-scale computing. The impact of Hadoop training and expertise can be boiled down to four salient characteristics. For this reason, certified Hadoop professionals are now in high demand all over the world, with attractive pay packages and strong growth potential.
HADOOP TRAINING COURSE CONTENT:
1. BASICS OF HADOOP
- What is the motivation for Hadoop?
- Large-scale system training
- Survey of data storage literature
- Literature survey of data processing
- Overview of networking constraints
- New approach requirements
2. BASIC CONCEPTS OF HADOOP
- Hadoop Introduction
- Distributed file system of Hadoop
- How Hadoop MapReduce works
- Hadoop cluster and its anatomy
- Hadoop daemons
- Master daemons
- NameNode
- JobTracker
- Secondary NameNode
- Slave daemons
- TaskTracker
- Hadoop Distributed File System (HDFS)
- Splits and blocks
- Input splits
- HDFS splits
- Replication of data
- Hadoop rack awareness
- High availability of data
- Block placement and cluster architecture
- Hadoop case studies
- Best practices and performance tuning
- Development of MapReduce programs
- Local mode
- Running without HDFS
- Pseudo-distributed mode
- All daemons running on a single node
- Fully distributed mode
- Daemons running on dedicated nodes
3. HADOOP ADMINISTRATION
- Setting up a Hadoop cluster
- Configure and install Apache Hadoop on a multi-node cluster
- Configure and install the Cloudera distribution in distributed mode
- Configure and install the Hortonworks distribution in fully distributed mode
- Configure the Greenplum distribution in fully distributed mode
- Monitor the cluster
- Get familiar with the Hortonworks and Cloudera management consoles
- NameNode in safe mode
- Data backup
- Case studies
- Monitoring of clusters
4. HADOOP DEVELOPMENT
- What is a MapReduce program?
- Sample MapReduce program
- Basic API concepts
- Driver code
- Mapper
- Reducer
- Hadoop Streaming API
- Performing several Hadoop jobs
- The configure and close methods
- Sequence files
- RecordReader
- RecordWriter
- Reporter and its role
- Counters
- OutputCollector
- Accessing HDFS
- ToolRunner
- Using the distributed cache
- Several MapReduce jobs (in detail)
- Search using MapReduce
- Generating recommendations using MapReduce
- Processing log files using MapReduce
- Mapper Identification
- Reducer Identification
- Exploring the problems using this application
- Debugging the MapReduce Programs
- MRUnit testing
- Logging
- Debugging strategies
- Advanced MapReduce Programming
- Secondary sort
- Output and input format customization
- MapReduce joins
- Monitoring & debugging on a Production Cluster
- Counters
- Skipping Bad Records
- Running in local mode
- MapReduce performance tuning
- Reducing network traffic with a combiner
- Partitioners
- Reducing the amount of input data
- Using Compression
- Reusing the JVM
- Speculative execution
- Performance Aspects
- Case studies
5. CDH4 ENHANCEMENTS
- NameNode high availability
- NameNode federation
- Fencing
- MapReduce
6. HADOOP ANALYST
- Hive Concepts
- Hive and its architecture
- Install and configure Hive on a cluster
- Types of tables in Hive
- Hive library functions
- Buckets
- Partitions
- Joins (inner joins and outer joins)
- Hive UDF
7. PIG
- Basics of Pig
- Install and configure Pig
- Pig library functions
- Pig vs. Hive
- Writing sample Pig Latin scripts
- Modes of running: Grunt shell, Java program
- Pig UDFs
- Pig macros
- Debugging Pig
8. IMPALA
- Differences between Impala, Hive and Pig
- Does Impala give good performance?
- Exclusive features
- Impala and its challenges
- Use cases
9. NOSQL
- Introduction to HBase
- HBase concepts
- Overview of HBase architecture
- Server architecture
- File storage architecture
- Column access
- Scans
- HBase use cases
- Installation and configuration of HBase on a multi-node cluster
- Creating a database; developing and running sample applications
- Accessing data stored in HBase using clients such as Python, Java and Perl
- MapReduce client
- HBase and Hive Integration
- HBase administration tasks
- Defining a schema and basic operations
- Cassandra Basics
- MongoDB Basics
10. ECOSYSTEM COMPONENTS
- Sqoop
- Install and configure Sqoop
- Connecting to an RDBMS
- Installing MySQL
- Importing data from Oracle/MySQL into Hive
- Exporting data to Oracle/MySQL
- Internal mechanism
11. OOZIE
- Oozie and its architecture
- XML file
- Installing and configuring Apache Oozie
- Workflow specification
- Action nodes
- Control nodes
- Job coordinator
- Avro, Scribe, Flume, Chukwa, Thrift
- Concepts of Flume and Chukwa
- Use cases of Scribe, Thrift and Avro
- Installation and configuration of Flume
- Creation of a sample application