37 - ITC - Information Technology - Miscellaneous


ITC 174 - Hands on Hadoop: The Complete Guide

Code      Start Date           Duration   Venue
ITC 174   26 September 2022    5 Days     Istanbul
ITC 174   21 November 2022     5 Days     Istanbul
ITC 174   18 December 2023     5 Days     Istanbul
Please contact us for fees

 

Course Description

Hadoop is an open-source, Java-based framework for storing and processing big data. The data is stored on inexpensive commodity servers that run as clusters. Its distributed file system enables concurrent processing and fault tolerance. Hadoop addresses two key challenges of traditional databases: capacity and speed. In this course, participants will learn about the Hadoop cluster and its architecture, and will discuss Hadoop cluster administration and maintenance.

Course Objectives

  • Understanding Big Data and Hadoop
  • Learning about Hadoop cluster administration and maintenance
  • Understanding computational frameworks, managing resources and scheduling
  • Discussing Pig and Hive installation and operation
  • Understanding Oozie 

Who Should Attend?

  • Business and IT professionals
  • Data analysts
  • Data engineers
  • Individuals who have no knowledge or experience in data engineering

Course Details/Schedule

Day 1

  • Understanding Big Data and Hadoop
  • Introduction to big data 
  • Common big data domain scenarios
  • Limitations of traditional solutions 
  • What is Hadoop?
  • Hadoop 1.0 ecosystem and its Core Components
  • Hadoop 2.x ecosystem and its Core Components
  • Application submission in YARN
  • Hadoop Cluster and its Architecture
  • Distributed File System 
  • Hadoop Cluster Architecture
  • Replication rules 
  • Hadoop Cluster Modes
  • Rack awareness theory 
  • Hadoop cluster administrator responsibilities
  • Understand working of HDFS 
  • NTP server
  • Initial configuration required before installing Hadoop
  • Deploying Hadoop in a pseudo distributed mode
  • Hadoop Cluster Setup and Working
  • OS Tuning for Hadoop Performance 
  • Prerequisites for installing Hadoop
  • Hadoop Configuration Files
  • Stale Configuration
  • RPC and HTTP Server Properties
  • Properties of Namenode, Datanode and Secondary Namenode
  • Log Files in Hadoop 
  • Deploying a multi-node Hadoop cluster

Day 2

  • Hadoop Cluster Administration And Maintenance
  • Commissioning and Decommissioning of Nodes
  • HDFS Balancer
  • Namenode Federation in Hadoop 
  • High Availability in Hadoop
  • Trash Functionality 
  • Checkpointing in Hadoop
  • Distcp 
  • Disk balancer
  • Computational Frameworks, Managing Resources and Scheduling
  • Different Processing Frameworks 
  • Different phases in MapReduce
  • Spark and its Features 
  • Application Workflow in YARN
  • YARN Metrics 
  • YARN Capacity Scheduler and Fair Scheduler
  • Service Level Authorization (SLA)

Day 3

  • Hadoop 2.x Cluster: Planning and Management
  • Planning a Hadoop 2.x cluster 
  • Cluster sizing
  • Hardware, Network and Software considerations
  • Popular Hadoop distributions
  • Workload and usage patterns 
  • Industry recommendations
  • Pig, Hive Installation and Working (Self-paced)
  • Explain Hive 
  • Hive Setup
  • Hive Configuration 
  • Working with Hive
  • Setting Hive in local and remote metastore mode
  • Pig setup
  • Working with Pig

Day 4

  • HBase, ZooKeeper Installation and Working (Self-paced)
  • What is NoSQL Database 
  • HBase data model
  • HBase Architecture 
  • MemStore, WAL, BlockCache
  • HBase HFile
  • Compactions
  • HBase Read and Write 
  • HBase balancer and hbck
  • HBase setup 
  • Working with HBase
  • Installing ZooKeeper
  • Understanding Oozie (Self-paced)
  • Oozie overview
  • Oozie Features
  • Oozie workflow, coordinator and bundle
  • Start, End and Error Node
  • Action Node
  • Join and Fork
  • Decision Node
  • Oozie CLI
  • Install Oozie

Day 5

  • Data Ingestion using Sqoop and Flume (Self-paced)
  • Types of Data Ingestion
  • HDFS data loading commands
  • Purpose and features of Sqoop
  • Performing operations such as Sqoop Import, Export and Hive Import
  • Sqoop 2
  • Install Sqoop
  • Import data from RDBMS into HDFS
  • Flume features and architecture
  • Types of flow
  • Install Flume
  • Hadoop Security and Cluster Monitoring
  • Monitoring Hadoop Clusters
  • Hadoop Security System Concepts
  • Securing a Hadoop Cluster with Kerberos
  • Common Misconfigurations
  • Overview of Kerberos
  • Checking log files to understand Hadoop clusters for troubleshooting