When you enroll in this course, you'll also be enrolled in this Professional Certificate.
Learn new concepts from industry experts
Gain a foundational understanding of a subject or tool
Develop job-relevant skills with hands-on projects
Earn a shareable career certificate from Microsoft
There are 5 modules in this course
This course introduces distributed computing frameworks and big data visualization techniques. Learners will explore MapReduce, work with Apache Spark, implement transformations with PySpark, and use Spark SQL for large-scale analysis. The course concludes with building compelling dashboards and reports using Power BI for actionable business insights.
By the end of this course, you will be able to:
- Explain distributed computing and MapReduce concepts
- Process large datasets using Apache Spark and PySpark
- Apply Spark SQL for advanced queries and transformations
- Create dashboards and visualizations using Power BI
Tools & Software:
Apache Spark, PySpark, Azure Databricks, Power BI
Skills:
Distributed computing, Data analysis, PySpark, Spark SQL, Data visualization
Distributed Computing and MapReduce Concepts explores the foundational principles that enable modern organizations to process massive datasets that have outgrown the limits of single-machine computing. Through real-world examples, visual walkthroughs, hands-on labs, and guided design activities, you'll examine how data is broken into parallel tasks and executed across clusters of machines, how the Map, shuffle, and Reduce phases work together, and how common MapReduce patterns—such as counting, filtering, joining, and aggregation—solve practical big data problems efficiently and at scale.
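As a rough illustration of how the Map, shuffle, and Reduce phases described above fit together, here is a minimal single-machine sketch of the counting pattern (word count) in plain Python. It is not taken from the course materials. The function names `map_phase`, `shuffle_phase`, `reduce_phase`, and `word_count` are hypothetical, and a real framework would run these steps in parallel across a cluster rather than in one process.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key so each key is reduced once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data big clusters", "data moves between clusters"]
    print(word_count(sample))
```

Each input line can be mapped independently of the others, which is what lets a cluster split the work into parallel tasks. Only the shuffle step needs to bring together pairs that share a key.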
MapReduce Patterns and Applications Assessment•30 minutes
Distributed Computing and MapReduce Mastery Graded Quiz•30 minutes
Apache Spark Architecture and Fundamentals
Module 2•6 hours to complete
Apache Spark Architecture and Fundamentals provides a comprehensive introduction to the distributed processing engine that revolutionized big data analytics by overcoming traditional MapReduce limitations. Through real-world examples, visual walkthroughs, hands-on labs, and guided design activities, you'll examine Spark's core components, including the driver, executors, and cluster manager, explore how in-memory processing delivers dramatic performance improvements, and learn to configure and manage Spark clusters and applications for efficient large-scale data processing.
What's included
7 videos•3 readings•9 assignments
7 videos•Total 40 minutes
Spark's Revolution in Big Data Processing•4 minutes
Spark Cluster Setup and Configuration – Part 1•6 minutes
Spark Cluster Setup and Configuration – Part 2•5 minutes
Data Processing with PySpark RDDs and DataFrames focuses on practical data processing using PySpark's Python API for Apache Spark. Through real-world examples, visual walkthroughs, hands-on labs, and guided design activities, you'll implement data processing operations using both RDDs and DataFrames, develop transformation pipelines, apply common data cleaning and preparation techniques, and optimize PySpark code for better performance across enterprise-scale big data scenarios.
What's included
6 videos•3 readings•10 assignments
6 videos•Total 37 minutes
Python Meets Big Data with PySpark•4 minutes
PySpark Development Workflow•9 minutes
DataFrames: Structured Big Data Made Simple•4 minutes
DataFrame Operations and Schema Management•8 minutes
Advanced Analytics with PySpark Transformations•5 minutes
Building Complex Transformation Pipelines•7 minutes
3 readings•Total 30 minutes
PySpark Development Environment and Best Practices•10 minutes
PySpark DataFrame Programming Guide•10 minutes
Advanced PySpark DataFrame Operations•10 minutes
10 assignments•Total 300 minutes
PySpark Environment Setup•30 minutes
PySpark Development Environment•30 minutes
PySpark Development Fundamentals Assessment•30 minutes
DataFrame Schema and Operations•30 minutes
DataFrame Data Cleaning Pipeline•30 minutes
DataFrame Operations and Schema Assessment•30 minutes
PySpark Data Processing Mastery Graded Quiz•30 minutes
Advanced Data Processing with Spark SQL
Module 4•6 hours to complete
Advanced Data Processing with Spark SQL introduces Spark SQL as a powerful interface for structured data processing in distributed environments. Through real-world examples, visual walkthroughs, hands-on labs, and guided design activities, you'll master SQL operations at scale, from basic queries to complex analytical operations, learn to create and manage temporary views and tables, and optimize query performance for production workloads that would overwhelm traditional database systems.
What's included
6 videos•3 readings•10 assignments
6 videos•Total 35 minutes
SQL at Scale with Spark SQL•4 minutes
Spark SQL Environment and Basic Queries•7 minutes
Enterprise Analytics with Advanced Spark SQL•5 minutes
Implementing Complex Analytical Queries•7 minutes
Optimizing Spark SQL for Production Performance•5 minutes
Query Performance Analysis and Tuning•7 minutes
3 readings•Total 30 minutes
Spark SQL Architecture and Programming Model•10 minutes
Advanced Spark SQL Operations and Optimization•10 minutes
Spark SQL Performance Tuning and Optimization•10 minutes
Data Visualization for Big Data with Power BI introduces comprehensive visualization techniques specifically designed for big data environments using Microsoft Power BI. Through real-world examples, visual walkthroughs, hands-on labs, and guided design activities, you'll learn to connect Power BI to various big data sources, create effective visualizations for large datasets, build interactive dashboards that enable self-service analytics, and implement best practices for handling performance challenges when visualizing massive datasets.
What's included
7 videos•3 readings•10 assignments
7 videos•Total 42 minutes
Bridging Big Data and Business Intelligence•4 minutes
Big Data Source Configuration•9 minutes
The Art of Big Data Storytelling•5 minutes
Building Effective Big Data Visualizations•9 minutes
Interactive Analytics at Enterprise Scale•5 minutes
Advanced Dashboard Development - Part 1•5 minutes
Advanced Dashboard Development - Part 2•6 minutes
3 readings•Total 30 minutes
Power BI Big Data Connectivity Guide•10 minutes
Big Data Visualization Design Principles•10 minutes
Interactive Dashboard Design for Big Data•10 minutes
Our goal at Microsoft is to empower every individual and organization on the planet to achieve more.
In this next revolution of digital transformation, growth is being driven by technology. Our integrated cloud approach creates an unmatched platform for digital transformation. We address the real-world needs of customers by seamlessly integrating Microsoft 365, Dynamics 365, LinkedIn, GitHub, Microsoft Power Platform, and Azure to unlock business value for every organization—from large enterprises to family-run businesses. The backbone and foundation of this is Azure.
When will I have access to the lectures and assignments?
To access the course materials and assignments and to earn a Certificate, you will need to purchase the Certificate experience when you enroll in a course. You can try a Free Trial instead, or apply for Financial Aid. The course may offer 'Full Course, No Certificate' instead. This option lets you see all course materials, submit required assessments, and get a final grade, but you will not be able to purchase a Certificate experience.
What will I get if I subscribe to this Certificate?
When you enroll in the course, you get access to all of the courses in the Certificate, and you earn a certificate when you complete the work. Your electronic Certificate will be added to your Accomplishments page. From there, you can print your Certificate or add it to your LinkedIn profile.