Building Big Data Pipelines with PySpark + MongoDB + Bokeh
Introduction
Project Files
Python Installation
Installing Third Party Libraries
Installing Apache Spark
Installing Java (Optional)
Testing Apache Spark Installation
Installing MongoDB
Installing NoSQL Booster for MongoDB
Integrating PySpark with Jupyter Notebook
Data Extraction
Data Transformation
Loading Data into MongoDB
Data Pre-processing
Building the Predictive Model
Creating the Prediction Dataset
Loading the Data Sources from MongoDB
Creating a Map Plot
Creating a Bar Chart
Creating a Magnitude Plot
Creating a Grid Plot
Installing Visual Studio Code
Creating the PySpark ETL Script
Creating the Machine Learning Script
Creating the Dashboard Server
Source Code