Big Data And Hadoop

    • Description

      Data has become an integral part of every organization, small or large, and maintaining it in a usable form has become difficult. Hadoop is an open-source software framework for distributed storage and processing of large data sets. The Hadoop platform is used to structure data and resolve formatting problems ahead of analysis. Hadoop Administration is a specialization within the Hadoop framework that covers Hadoop installation, Hadoop security, setting up Hadoop clusters and log files, and designing, testing, and building Hadoop environments.

      Course Objective

      After completing this course, trainees will:

      1. Understand how Hadoop solves Big Data problems, along with Hadoop cluster architecture, its core components, and its ecosystem
      2. Know the different Hadoop components and understand how HDFS, Hadoop cluster modes, and configuration files work
      3. Be proficient in Hadoop 1.0 cluster setup and configuration, set up Hadoop clients using Hadoop 1.0, and resolve problems simulated from real-world environments
      4. Work with the Secondary NameNode and a distributed Hadoop cluster, enable rack awareness, use maintenance mode, and add or remove cluster nodes on an ad hoc basis
      5. Gain knowledge of day-to-day cluster administration tasks: balancing data across the cluster, protecting data by enabling trash, attempting a manual failover, creating backups within or across clusters, safeguarding metadata, and performing metadata recovery or manual NameNode failover
      6. Be able to plan a cluster in a Hadoop 2.0 environment, covering cluster sizing; hardware, network, and software considerations; popular Hadoop distributions; workload and usage patterns; and industry recommendations

      Prepare for Certification

      Our training and certification program gives you a solid understanding of the key topics covered on the Cloudera Certified Administrator for Apache Hadoop (CCAH) exam. In addition to boosting your income potential, getting certified in Hadoop Administration demonstrates that you have the skills necessary to be an effective Hadoop professional. The certification validates your ability to produce reliable, high-quality results with increased efficiency and consistency.

    • Unit 1: What is Big Data

      1. Need for a different technique for Data Storage
      2. Need for a different paradigm for Data Analysis
      3. The 3 V’s of Big Data
      4. Different distributions of Hadoop

      Unit 2: The Case for Apache Hadoop

      1. A Brief History of Hadoop
      2. Core Hadoop Components
      3. Fundamental Concepts
      4. Hadoop Eco-Systems – Overview

      Unit 3: The Hadoop Distributed File System

      1. HDFS Features
      2. HDFS Design Assumptions
      3. Overview of HDFS Architecture
      4. Writing and Reading Files

      Unit 4: MapReduce

      1. What Is MapReduce?
      2. Features of MapReduce
      3. Basic MapReduce Concepts
      4. Architectural Overview
      5. What is a Combiner?
      6. What is a Partitioner?
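
      As a rough preview of the concepts in this unit, the sketch below simulates a word-count job in plain Python. It is not Hadoop code and does not use the Hadoop API; the function names (map_phase, combine, partition, reduce_phase, run_job) are hypothetical and chosen only for illustration. It assumes each input line is handled by its own mapper, that the combiner pre-aggregates that mapper's output locally, and that a hash of the key decides which reducer receives it.

```python
from collections import defaultdict
import zlib


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output before the shuffle."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return list(totals.items())


def partition(key, num_reducers):
    """Partitioner: pick the reducer for a key using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_phase(word, counts):
    """Reducer: sum all partial counts received for one word."""
    return word, sum(counts)


def run_job(lines, num_reducers=2):
    """Simulate map -> combine -> partition/shuffle -> reduce in memory."""
    # One bucket per simulated reducer: word -> list of partial counts.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # assumption: each line is one mapper's input
        for word, count in combine(map_phase(line)):
            buckets[partition(word, num_reducers)][word].append(count)

    result = {}
    for bucket in buckets:
        for word, counts in bucket.items():
            key, total = reduce_phase(word, counts)
            result[key] = total
    return result


if __name__ == "__main__":
    print(run_job(["big data big cluster", "data node data"]))
```

      In this sketch the combiner reduces how many pairs cross the simulated shuffle, and the hash-based partitioner sends every occurrence of a word to the same reducer, which is why each reducer can total its words independently.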

      Unit 5: An Overview of the Hadoop Ecosystem

      1. What is the Hadoop Ecosystem?
      2. Integration Tools
      3. Analysis Tools
      4. Data Storage and Retrieval Tools

      Unit 6: Planning your Hadoop Cluster

      1. General Planning Considerations
      2. Choosing the Right Hardware
      3. Network Considerations
      4. Configuring Nodes

      Unit 7: Hadoop Installation

      1. Deployment Types
      2. Installing Hadoop
      3. Basic Configuration Parameters
      4. Hands-On Exercise on a Pseudo-Distributed Cluster
      5. Hands-On Exercise on a Multi-Node Cluster

      Unit 8: Advanced Configuration

      1. Advanced Parameters
      2. core-site.xml parameters
      3. mapred-site.xml parameters
      4. hdfs-site.xml parameters
      5. Configuring Rack Awareness

      Unit 9: Hadoop Security

      1. Why Hadoop Security Is Important
      2. Hadoop’s Security System Concepts
      3. What Kerberos Is and How it Works
      4. Integrating a Secure Cluster with Other Systems

      Unit 10: Managing and Scheduling Jobs

      1. Managing Running Jobs
      2. The FIFO Scheduler
      3. The Fair Scheduler
      4. The Capacity Scheduler
      5. Configuring the Fair Scheduler
      6. Evaluating the Different Schedulers

      Unit 11: Cluster Maintenance

      1. Checking HDFS Status
      2. Copying Data Between Clusters
      3. Adding and Removing Cluster Nodes
      4. Rebalancing the Cluster
      5. NameNode Metadata Backup
      6. Cluster Upgrading

      Unit 12: Cluster Monitoring and Troubleshooting

      1. General System Monitoring
      2. Managing Hadoop’s Log Files
      3. Using the NameNode and JobTracker Web UIs
      4. Cluster Monitoring with Ganglia
      5. Common Troubleshooting Issues
      6. Benchmarking Your Cluster

      Unit 13: Installing and Managing Other Hadoop Projects

      1. Hive
      2. Pig
      3. HBase
      4. Oozie
    • “The course provides adequate knowledge and information on business intelligence which helps to improve business efficiency and management” – Mehul Thakkar
      “This course is best for any business enthusiast, the course explains in detail data reporting and warehousing methods” – Ankit Doshi

Copyrights @2018-All rights reserved