Big Data Engineering

Continuing Education Center

Course Overview:
- Big Data Engineering is critical for data-driven decision-making in industries such as finance, healthcare, and e-commerce. With the explosion of data, companies need experts who can build scalable pipelines that ensure efficient storage, processing, and analysis. Certified professionals can secure roles such as Data Engineer, Cloud Architect, or Analytics Specialist with competitive salaries.
Learning Outcomes:
- Big Data Tools: Students will gain expertise in technologies like HDFS, Spark, Hive, and Flink for both real-time and batch data processing.
- Data Processing Frameworks: Learners will work with tools like MapReduce, Elasticsearch, HBase, and Kafka for distributed data management.
- Big Data Strategy: The course prepares trainees to store, analyze, and manage large datasets, implementing effective big data strategies for organizations.
Prerequisites:
- Network Essentials.
Target Certification:
- Participants who pass the course exam will obtain a certificate from MIU.
- This level grants one certificate:
- Huawei HCIP Big Data (H13-723) certificate.
Who Should Attend:
- Graduates or students in Computer Science, Communications, Electronics, or Computer Engineering fields looking to enter the big data industry.
Estimated Time to Completion: 80 hours
Hours/day | No. of days/week | Total no. of days |
6 hours/day | 4 days/week | 14 days |
Contents:
Huawei HCIP Big Data (H13-723) (80hrs)
- Big Data Application Development Overall Guide
- Introduction to Big Data
- Mainstream big data technologies
- Big data scenario-based solution
- Big data application development
- Experience with big data processing frameworks
- Offline batch processing solution
- HDFS: Hadoop Distributed File System
- MapReduce: distributed offline batch processing, and YARN (Yet Another Resource Negotiator) for resource management
- ZooKeeper: distributed cluster coordination service
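As a brief taste of the MapReduce topic listed above, the sketch below shows a word count written in the Hadoop Streaming style, where the mapper and reducer read and write plain text on standard input/output. The file name, the map/reduce switch, and the local `sort` standing in for Hadoop's shuffle phase are illustrative assumptions, not course material.

```python
#!/usr/bin/env python3
"""Word-count sketch in the Hadoop Streaming style (illustrative only).

Run the same file as mapper or reducer, e.g.:
    ./wordcount.py map < input.txt | sort | ./wordcount.py reduce
(The local `sort` stands in for Hadoop's shuffle-and-sort phase.)
"""
import sys


def mapper(stream):
    # Emit one "<word>\t1" line per word.
    for line in stream:
        for word in line.strip().lower().split():
            print(f"{word}\t1")


def reducer(stream):
    # Input arrives grouped by key (already sorted), so counts can be summed run by run.
    current, total = None, 0
    for line in stream:
        word, count = line.rstrip("\n").split("\t")
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")


if __name__ == "__main__":
    mapper(sys.stdin) if sys.argv[1] == "map" else reducer(sys.stdin)
```

On a real cluster the same two roles would be handed to the Hadoop Streaming jar as the mapper and reducer, with input and output paths on HDFS.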
- Non-relational databases
- HBase: distributed NoSQL database
- HBase Overview and Data Models
- HBase Architecture
- HBase Performance Tuning
- Common Shell Commands of HBase
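To illustrate the HBase data model and basic read/write paths covered above, here is a minimal sketch using the third-party happybase client against an HBase Thrift server; the host, table name, and 'info' column family are assumptions for illustration only.

```python
import happybase

# Connect to an HBase Thrift server (host and port are assumptions).
connection = happybase.Connection("hbase-thrift-host", port=9090)
table = connection.table("user_profiles")  # table and 'info' column family assumed to exist

# Write a row keyed by user id; every cell lives under a column family.
table.put(b"user-001", {b"info:name": b"Alice", b"info:city": b"Cairo"})

# Point read by row key, then a short prefix scan.
print(table.row(b"user-001"))
for key, data in table.scan(row_prefix=b"user-"):
    print(key, data)

connection.close()
```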
- Distributed search engine: Elasticsearch
- System Architecture
- Basic functions and concepts of Elasticsearch
- Key Features
- Concepts of Shard and Replica
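The indexing and search concepts above can be explored with a few calls to the official Python client; the sketch below assumes an 8.x-style client and a single local node, and the index name and documents are made up for illustration.

```python
from elasticsearch import Elasticsearch

# Single local node; the index name and documents are illustrative.
es = Elasticsearch("http://localhost:9200")

es.index(index="courses", id="1", document={"title": "Big Data Engineering", "hours": 80})
es.index(index="courses", id="2", document={"title": "Network Essentials", "hours": 40})
es.indices.refresh(index="courses")  # make the new documents searchable immediately

resp = es.search(index="courses", query={"match": {"title": "big data"}})
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"])
```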
- Compute Engines
- Spark2x: in-memory distributed computing engine
- Spark Overview
- Spark Data Structure
- Spark Principles and Architecture
- Spark SQL: offline analysis tool
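As a small preview of the Spark and Spark SQL topics above, this sketch builds a tiny DataFrame and runs the same aggregation through a SQL view and through the DataFrame API; the local[*] master and the sample data are assumptions for experimentation, not part of the courseware.

```python
from pyspark.sql import SparkSession, functions as F

# Local session for experimentation; on a cluster the master is set by the launcher.
spark = SparkSession.builder.appName("spark_sql_sketch").master("local[*]").getOrCreate()

# Tiny in-memory DataFrame standing in for a real dataset.
df = spark.createDataFrame(
    [("finance", 120.0), ("health", 80.5), ("finance", 45.0)],
    ["sector", "amount"],
)

# The same aggregation via a SQL view and via the DataFrame API.
df.createOrReplaceTempView("transactions")
spark.sql("SELECT sector, SUM(amount) AS total FROM transactions GROUP BY sector").show()
df.groupBy("sector").agg(F.sum("amount").alias("total")).show()

spark.stop()
```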
- Flink: stream processing and batch processing platform
- Flink Principles and Architecture
- Flink Time and Window
- Flink Watermark
- Flink Fault Tolerance Mechanism
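The Flink time, window, and watermark topics above boil down to the idea sketched below: group out-of-order events into event-time windows and emit a window only once the watermark (the highest event time seen minus an allowed lateness) has passed its end. This is a plain-Python conceptual sketch, not the Flink API; the window size, lateness, and sample events are made up.

```python
from collections import defaultdict

WINDOW_SIZE = 10          # tumbling event-time windows of 10 time units
MAX_OUT_OF_ORDERNESS = 3  # watermark trails the highest event time seen by 3 units

# (event_time, key, value) records arriving slightly out of order.
events = [(1, "a", 1), (4, "b", 2), (3, "a", 1), (12, "a", 5), (9, "b", 1), (15, "b", 2)]

windows = defaultdict(int)      # (window_start, key) -> running sum
max_event_time = float("-inf")
emitted = set()

for event_time, key, value in events:
    # Assign the event to its tumbling window and update the aggregate.
    window_start = (event_time // WINDOW_SIZE) * WINDOW_SIZE
    windows[(window_start, key)] += value

    # Advance the watermark; windows ending at or before it are considered complete.
    max_event_time = max(max_event_time, event_time)
    watermark = max_event_time - MAX_OUT_OF_ORDERNESS
    for (start, k), total in sorted(windows.items()):
        if start + WINDOW_SIZE <= watermark and (start, k) not in emitted:
            print(f"window [{start}, {start + WINDOW_SIZE}) key={k} sum={total}")
            emitted.add((start, k))
```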
- Extract Transform Load (ETL) tools
- Sqoop: data transfer between relational databases and Hadoop
- Flume: massive log aggregation
- Apache NiFi
- Apache Airflow
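To show how the orchestration side of ETL (Apache Airflow, listed above) typically looks, here is a minimal DAG sketch assuming Airflow 2.x; the DAG id, schedule, and echo commands are placeholders for real extract/transform/load steps.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# A minimal daily pipeline with three placeholder shell steps.
with DAG(
    dag_id="daily_etl_sketch",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting source data")
    transform = BashOperator(task_id="transform", bash_command="echo transforming records")
    load = BashOperator(task_id="load", bash_command="echo loading into the warehouse")

    extract >> transform >> load
```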
- Converged Data Warehouse
- Introduction to DWS
- Hive: distributed data warehouse
- Kafka: distributed publish-subscribe messaging system
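Finally, the publish-subscribe model behind Kafka can be demonstrated with the third-party kafka-python client; the broker address, topic name, and messages below are assumptions for a local test setup.

```python
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"  # assumed broker address
TOPIC = "clickstream"      # assumed topic name

# Produce a few messages.
producer = KafkaProducer(bootstrap_servers=BROKER)
for i in range(3):
    producer.send(TOPIC, key=str(i).encode(), value=f"event-{i}".encode())
producer.flush()

# Consume them from the beginning of the topic.
consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating if no new messages arrive
)
for record in consumer:
    print(record.partition, record.offset, record.key, record.value)
```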