Big Data Engineer

Course Information

Start Date: Anytime
End Date: 3-Month Access
Mode: Self-Paced E-Learning
Fee: $1,299 (excluding GST)
Contact: 6720 3333 (Ms. Felicia), training.aventis@gmail.com

Course Overview

IBM is the second-largest predictive analytics and machine learning solutions provider globally (The Forrester Wave report, September 2018). A partnership between Simplilearn and IBM introduces students to integrated blended learning, making them experts in Big Data engineering. This Big Data Engineer certification course, developed in collaboration with IBM, prepares students to start their careers as Big Data Engineers.

IBM is a leading cognitive solution and cloud platform company, headquartered in Armonk, New York, offering a wide range of technology and consulting services. Each year, IBM invests $6 billion in research and development. It has earned five Nobel Prizes, nine US National Medals of Technology, five US National Medals of Science, six Turing Awards, and 10 inductions into the US Inventors Hall of Fame.

This self-paced e-learning course is offered in partnership with Simplilearn, with over 120 hours of live interactive learning led by industry experts.


The Big Data Engineer certification course is ideal for anyone who wishes to pursue a career in Big Data engineering. There are no prerequisites for this course, but prior knowledge of the following skills and technologies is beneficial:

  • Algorithms and data structures
  • SQL
  • Programming knowledge of Python and Java
  • Cloud platforms and distributed systems
  • Data pipelines

Key Takeaways

The Big Data Engineer learning path ensures that you master the various components of the Hadoop ecosystem, such as MapReduce, Pig, Hive, Impala, HBase, and Sqoop, and learn real-time processing in Spark and Spark SQL. By the end of this Big Data Engineer certification course, you will: 

  • Gain insights on how to improve business productivity by processing Big Data on platforms that can handle its volume, velocity, variety, and veracity
  • Master the various components of the Hadoop ecosystem, such as Hadoop, Yarn, MapReduce, Pig, Hive, Impala, HBase, ZooKeeper, Oozie, Sqoop, and Flume
  • Become an expert in MongoDB by gaining an in-depth knowledge of NoSQL and mastering the skills of data modeling, ingestion, query, sharding, and data replication
  • Learn how Kafka is used in the real world, including its architecture and components, get hands-on experience connecting Kafka to Spark, and work with Kafka Connect
  • Get a solid understanding of the fundamentals of the Scala language, its tooling, and the development process
  • Identify AWS concepts, terminologies, benefits, and deployment options to meet the business requirements
  • Understand how to use Amazon EMR to process data with Hadoop ecosystem tools
  • Understand how to use Amazon Kinesis for real-time big data processing
  • Analyze and transform big data using Kinesis Streams
  • Visualize data and perform queries using Amazon QuickSight

Course Content

Course 1: Big Data for Data Engineering

01: Welcome & Introduction

02: What is Big Data

03: Beyond the Hype

04: Big Data and Data Science

05: Big Data Use Cases

06: Processing Big Data

07: Course Summary

Free Course: Data Engineering with Hadoop, Data Engineering with Scala


Course 2: Big Data Hadoop and Spark Developer

01: Course Introduction

02: Introduction to Big Data and Hadoop

03: Hadoop Architecture, Distributed Storage (HDFS) and YARN

04: Data Ingestion into Big Data Systems and ETL

05: Distributed Processing – MapReduce Framework and Pig

06: Apache Hive

07: NoSQL Databases – HBase

08: Basics of Functional Programming and Scala

09: Apache Spark Next Generation Big Data Framework

10: Spark Core Processing RDD

11: Spark SQL – Processing DataFrames

12: Spark MLLib – Modelling BigData with Spark

13: Stream Processing Frameworks and Spark Streaming

14: Spark GraphX

15: Practice Projects

Free Course: Core Java, Linux Training


Course 3: PySpark Training Course

01: Introduction to PySpark

02: Resilient Distributed Datasets

03: Resilient Distributed Datasets and Actions

04: DataFrames and Transformations

05: Data Processing with Spark DataFrames

Free Course: Python for Data Science


Course 4: Apache Kafka

01: Introduction to Apache Kafka

02: Kafka Producer

03: Kafka Consumer

04: Kafka operations and performance tuning

05: Kafka cluster architecture and administering Kafka

06: Kafka monitoring and schema registry

07: Kafka streams and Kafka connectors

08: Integration of Kafka with Storm

09: Kafka Integration with Spark and Flume

10: Admin client and securing Kafka


Course 5: MongoDB Developer and Administrator

01: Course Introduction 

02: Introduction to NoSQL databases

03: MongoDB: a database for the modern web

04: CRUD operations in MongoDB

05: Indexing and Aggregation

06: Replication and sharding

07: Developing Java and Node.js applications with MongoDB

08: Administration of MongoDB cluster operations


Course 6: AWS Big Data Certification Training (Section 1: Self-paced Curriculum)

01: Big Data on AWS Certification Course Overview

02: Introduction 

03: AWS Big Data Collection Services

04: AWS Big Data Storage Services

05: AWS Big Data Processing Services

06: Analysis

07: Visualization

08: Security

(Section 2: Live Virtual Class Curriculum)

01: Course Introduction

02: AWS in Big Data Introduction

03: Collection

04: Storage

05: Processing I

06: Processing II

07: ETL with Redshift

08: Analysis with Machine Learning

09: Analysis and Visualization

10: Security

Practice Projects 

Free Course: AWS Technical Essentials


Course 7: Big Data Capstone