37 - ITC - Information Technology - Miscellaneous
ITC 174 - Hands on Hadoop: The Complete Guide
Code | Start Date | Duration | Venue |
---|---|---|---|
ITC 174 | 14 October 2024 | 5 Days | Istanbul |
ITC 174 | 18 November 2024 | 5 Days | Istanbul |
ITC 174 | 23 December 2024 | 5 Days | Istanbul |
Course Description
Hadoop is an open-source, Java-based framework used for storing and processing big data. The data is stored on inexpensive commodity servers that run as clusters. Its distributed file system enables concurrent processing and fault tolerance. Hadoop addresses two key limitations of traditional databases: capacity and speed. In this course, participants will learn about the Hadoop cluster and its architecture, and will discuss Hadoop cluster administration and maintenance.
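To make the storage model concrete: HDFS splits each file into fixed-size blocks and keeps several copies (replicas) of every block on different servers, which is where its fault tolerance comes from. The sketch below is a back-of-the-envelope calculation, not code that talks to a real cluster. It assumes the Hadoop 2.x and later defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); the function name `hdfs_footprint` is hypothetical.

```python
import math
from typing import Tuple

BLOCK_SIZE_MB = 128  # assumed dfs.blocksize (128 MB default in Hadoop 2.x and later)
REPLICATION = 3      # assumed dfs.replication (default 3)


def hdfs_footprint(file_size_mb: float,
                   block_size_mb: int = BLOCK_SIZE_MB,
                   replication: int = REPLICATION) -> Tuple[int, float]:
    """Hypothetical helper: return (number of HDFS blocks, raw MB used by all replicas)."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    # A partially filled last block only uses as much disk as the data it holds,
    # so raw usage is file size times replication, not blocks times block size.
    raw_mb = file_size_mb * replication
    return blocks, raw_mb


blocks, raw_mb = hdfs_footprint(1000)
print(f"{blocks} blocks, {raw_mb} MB of raw storage")  # 8 blocks, 3000 MB of raw storage
```

With three replicas, a block stays readable as long as at least one of the DataNodes holding it is still available.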
Course Objectives
- Understanding Big Data and Hadoop
- Learning about Hadoop cluster administration and maintenance
- Understanding computational frameworks, managing resources and scheduling
- Discussing Pig and Hive installation and working
- Understanding Oozie
Who Should Attend?
- Business and IT professionals
- Data analysts
- Data Engineers
- Individuals who have no knowledge or experience in data engineering
Course Details/Schedule
Day 1
- Understanding Big Data and Hadoop
  - Introduction to big data
  - Common big data domain scenarios
  - Limitations of traditional solutions
  - What is Hadoop?
  - Hadoop 1.0 ecosystem and its Core Components
  - Hadoop 2.x ecosystem and its Core Components
  - Application submission in YARN
- Hadoop Cluster and its Architecture
  - Distributed File System
  - Hadoop Cluster Architecture
  - Replication rules
  - Hadoop Cluster Modes
  - Rack awareness theory
  - Hadoop cluster administrator responsibilities
  - Understanding how HDFS works
  - NTP server
  - Initial configuration required before installing Hadoop
  - Deploying Hadoop in pseudo-distributed mode
- Hadoop Cluster Setup and Working
  - OS Tuning for Hadoop Performance
  - Prerequisites for installing Hadoop
  - Hadoop Configuration Files
  - Stale Configuration
  - RPC and HTTP Server Properties
  - Properties of NameNode, DataNode and Secondary NameNode
  - Log Files in Hadoop
  - Deploying a multi-node Hadoop cluster
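The replication rules and rack awareness topics in Day 1 can be illustrated with a toy placement function. It follows the default policy described in the HDFS architecture documentation for a replication factor of 3: one replica on the writer's node, one on a node in a different rack, and the third on a different node in that same remote rack. This is a simplified sketch, not the NameNode's code. It assumes a hypothetical rack-to-nodes map, picks any random node when the writer is not a DataNode, and requires another rack with at least two nodes, an assumption made only to keep the sketch short.

```python
import random
from typing import Dict, List, Optional


def place_replicas(topology: Dict[str, List[str]],
                   writer: Optional[str],
                   rng: random.Random) -> List[str]:
    """Hypothetical helper: pick DataNodes for the 3 replicas of one block.

    topology maps a (hypothetical) rack name to the DataNodes in that rack.
    """
    node_rack = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Replica 1: the writer's own node if it is a DataNode, else any random node.
    first = writer if writer in node_rack else rng.choice(sorted(node_rack))
    # Replicas 2 and 3: two different nodes on one rack other than replica 1's rack.
    candidate_racks = sorted(rack for rack, nodes in topology.items()
                             if rack != node_rack[first] and len(nodes) >= 2)
    if not candidate_racks:
        raise ValueError("sketch assumes another rack with at least two DataNodes")
    remote_rack = rng.choice(candidate_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [first, second, third]


# Hypothetical three-rack cluster with two DataNodes per rack.
topology = {
    "/rack1": ["dn1", "dn2"],
    "/rack2": ["dn3", "dn4"],
    "/rack3": ["dn5", "dn6"],
}
print(place_replicas(topology, writer="dn1", rng=random.Random(42)))
```

Keeping two replicas on one remote rack limits cross-rack write traffic, while the copy on a second rack means the block survives the loss of an entire rack.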
Day 2
- Hadoop Cluster Administration and Maintenance
  - Commissioning and Decommissioning of Nodes
  - HDFS Balancer
  - NameNode Federation in Hadoop
  - High Availability in Hadoop
  - Trash Functionality
  - Checkpointing in Hadoop
  - DistCp
  - Disk Balancer
- Computational Frameworks, Managing Resources and Scheduling
  - Different Processing Frameworks
  - Different phases in MapReduce
  - Spark and its Features
  - Application Workflow in YARN
  - YARN Metrics
  - YARN Capacity Scheduler and Fair Scheduler
  - Service Level Authorization (SLA)
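The MapReduce phases covered on Day 2 (map, shuffle and sort, reduce) can be traced with an in-memory word count. This is a single-process sketch for illustration only: it does not use the Hadoop MapReduce API, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in every input line.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_and_sort(pairs: Iterable[Tuple[str, int]]) -> List[Tuple[str, List[int]]]:
    # Shuffle: group all values by key. Sort: hand keys to the reducer in order.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_phase(grouped: Iterable[Tuple[str, List[int]]]) -> Dict[str, int]:
    # Reduce: sum the counts collected for each word.
    return {key: sum(values) for key, values in grouped}


counts = reduce_phase(shuffle_and_sort(map_phase(["big data", "Big clusters of data"])))
print(counts)  # {'big': 2, 'clusters': 1, 'data': 2, 'of': 1}
```

On a real cluster the map and reduce functions run as many parallel tasks, and the shuffle moves intermediate pairs over the network so that all values for one key reach the same reducer.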
Day 3
- Hadoop 2.x Cluster: Planning and Management
  - Planning a Hadoop 2.x cluster
  - Cluster sizing
  - Hardware, Network and Software considerations
  - Popular Hadoop distributions
  - Workload and usage patterns
  - Industry recommendations
- Pig, Hive Installation and Working (Self-paced)
  - Introduction to Hive
  - Hive Setup
  - Hive Configuration
  - Working with Hive
  - Setting up Hive in local and remote metastore modes
  - Pig setup
  - Working with Pig
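For the cluster sizing topic on Day 3, one common starting point is a storage-bound estimate: data retained, multiplied by the replication factor, plus headroom for intermediate data, divided by the disk each DataNode can give to HDFS. The sketch below implements that rule of thumb. Every default (25% temporary-data overhead, 48 TB of disk per node, 80% of it usable by HDFS) is an illustrative assumption, not an industry standard, and real sizing also weighs compression, CPU, memory and network; the function name is hypothetical.

```python
import math


def datanodes_needed(daily_ingest_tb: float,
                     retention_days: int,
                     replication: int = 3,
                     temp_overhead: float = 0.25,
                     disk_per_node_tb: float = 48.0,
                     usable_fraction: float = 0.8) -> int:
    """Hypothetical storage-bound DataNode estimate; all defaults are assumptions."""
    logical_tb = daily_ingest_tb * retention_days            # data kept in HDFS
    raw_tb = logical_tb * replication * (1 + temp_overhead)  # replicas plus intermediate/temp space
    usable_per_node_tb = disk_per_node_tb * usable_fraction  # reserve room for OS, logs, non-HDFS use
    return math.ceil(raw_tb / usable_per_node_tb)


# 1 TB/day kept for a year: 365 TB logical, about 1,369 TB raw, about 38 TB usable per node.
print(datanodes_needed(daily_ingest_tb=1.0, retention_days=365))  # 36
```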
Day 4
- HBase, ZooKeeper Installation and Working (Self-paced)
  - What is a NoSQL database?
  - HBase data model
  - HBase Architecture
  - MemStore, WAL, BlockCache
  - HBase HFile
  - Compactions
  - HBase reads and writes
  - HBase balancer and hbck
  - HBase setup
  - Working with HBase
  - Installing ZooKeeper
- Understanding Oozie (Self-paced)
  - Oozie overview
  - Oozie features
  - Oozie workflow, coordinator and bundle
  - Start, End and Error nodes
  - Action node
  - Join and Fork nodes
  - Decision node
  - Oozie CLI
  - Installing Oozie
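The HBase write and read path covered on Day 4 (WAL, MemStore, HFiles, compactions) can be modeled in a few lines. This toy class is a hypothetical illustration, not HBase's API. Writes go to a write-ahead log and then to an in-memory MemStore; the MemStore is flushed to an immutable, sorted "file" once it holds a fixed number of rows (real HBase flushes by size, controlled by `hbase.hregion.memstore.flush.size`); reads check the newest data first; and a major compaction merges all files. WAL replay and truncation, BlockCache and deletes are omitted.

```python
from typing import Dict, List, Optional, Tuple


class ToyRegionStore:
    """Hypothetical model of one HBase column family's write/read path."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.wal: List[Tuple[str, str]] = []    # write-ahead log: every edit is appended first
        self.memstore: Dict[str, str] = {}      # in-memory buffer of recent edits
        self.hfiles: List[Dict[str, str]] = []  # immutable sorted "files", newest last
        self.flush_threshold = flush_threshold  # row count, a simplification of size-based flushing

    def put(self, row: str, value: str) -> None:
        self.wal.append((row, value))  # durability first: log the edit
        self.memstore[row] = value     # then apply it in memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Persist the MemStore as a new immutable, sorted file and start a fresh one.
        if self.memstore:
            self.hfiles.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def get(self, row: str) -> Optional[str]:
        # Newest data wins: check the MemStore, then files from newest to oldest.
        if row in self.memstore:
            return self.memstore[row]
        for hfile in reversed(self.hfiles):
            if row in hfile:
                return hfile[row]
        return None

    def major_compact(self) -> None:
        # Merge all files into one, keeping only the newest value for each row.
        merged: Dict[str, str] = {}
        for hfile in self.hfiles:  # oldest to newest, so later values overwrite earlier ones
            merged.update(hfile)
        self.hfiles = [dict(sorted(merged.items()))] if merged else []


store = ToyRegionStore(flush_threshold=2)
for row, value in [("r1", "a"), ("r2", "b"), ("r1", "c"), ("r3", "d")]:
    store.put(row, value)
print(len(store.hfiles), store.get("r1"))  # 2 c
store.major_compact()
print(len(store.hfiles), store.get("r1"))  # 1 c
```

The growing number of files between compactions is why reads get slower until a compaction runs, which is the trade-off behind the compaction topic.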
Day 5
- Data Ingestion using Sqoop and Flume (Self-paced)
  - Types of data ingestion
  - HDFS data loading commands
  - Purpose and features of Sqoop
  - Performing operations such as Sqoop import, export and Hive import
  - Sqoop 2
  - Installing Sqoop
  - Importing data from an RDBMS into HDFS
  - Flume features and architecture
  - Types of flow
  - Installing Flume
- Hadoop Security and Cluster Monitoring
  - Monitoring Hadoop clusters
  - Hadoop security system concepts
  - Securing a Hadoop cluster with Kerberos
  - Common misconfigurations
  - Overview of Kerberos
  - Checking log files to troubleshoot Hadoop clusters
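For the Sqoop import topic on Day 5: Sqoop parallelizes an import by querying the minimum and maximum of a split column (the table's primary key unless `--split-by` names another column) and giving each map task (four by default, set with `-m`) a slice of that range. The helper below is a simplified sketch of that idea for an integer key, not Sqoop's own splitter, so its boundaries may differ from what Sqoop generates; the function name and the `orders` table are hypothetical.

```python
from typing import List, Tuple


def split_ranges(min_id: int, max_id: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Hypothetical helper: divide the inclusive key range [min_id, max_id] into at most
    num_mappers contiguous, non-overlapping ranges, one per map task."""
    if max_id < min_id or num_mappers < 1:
        raise ValueError("need max_id >= min_id and at least one mapper")
    total = max_id - min_id + 1
    num = min(num_mappers, total)        # never create empty splits
    base, extra = divmod(total, num)     # spread any remainder over the first splits
    ranges = []
    lo = min_id
    for i in range(num):
        size = base + (1 if i < extra else 0)
        hi = lo + size - 1
        ranges.append((lo, hi))
        lo = hi + 1
    return ranges


# Each range becomes the WHERE clause of one mapper's query (hypothetical "orders" table).
for lo, hi in split_ranges(1, 100, 4):
    print(f"SELECT * FROM orders WHERE id >= {lo} AND id <= {hi}")
```

Because the ranges are computed from the key's minimum and maximum, a skewed split column (for example, large gaps in the IDs) leaves some mappers with much more data than others.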