Big Data Analytics
Opened: Monday, 2 June 2025, 12:00 AM
Due: Saturday, 7 June 2025, 4:00 PM
Practical Take-home Assignment
The goal of this take-home task is to do hands-on research: work out how to prepare working environments that run selected Big Data tools inside Oracle VirtualBox.
Note: “A mind ready to dive into technical research, documenting every stage and eager to connect the missing dots, is probably going to be your friend on this journey.”
1. Essential Downloads
A] Download Oracle VirtualBox 7.1.8 for Windows hosts and the appropriate Extension Pack(s).
B] Download Ubuntu 24.04.2 LTS from its official site.
C] Download the Kaggle chronic illness dataset from:
https://www.kaggle.com/datasets/flaredown/flaredown-autoimmune-symptom-tracker
Also download anything else you feel is necessary to complete the assigned tasks.
FOR FULL TASK DETAILS SEE ATTACHED FILE
Setting Up Apache Spark/PySpark on a Windows 11/10 Machine
Open the link above and follow its instructions for installing Apache Spark on Windows.
Take a screenshot of every successfully completed step and compile the screenshots into a single final PDF.
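Before capturing screenshots, it can help to confirm that the environment variables are in place. The sketch below is not part of the linked guide; it assumes the common Windows setup in which JAVA_HOME, SPARK_HOME and HADOOP_HOME are each set to an existing folder. The function name check_spark_env is hypothetical, and the guide you follow may require different or additional variables.

```python
import os
from pathlib import Path

# Assumption: a typical Windows Spark setup relies on these three variables.
# The guide you follow may name others.
REQUIRED_VARS = ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME")


def check_spark_env(env=None):
    """Return a list of problems found; an empty list means the basics look fine."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not Path(value).is_dir():
            problems.append(f"{name} points to a missing folder: {value}")
    return problems


if __name__ == "__main__":
    issues = check_spark_env()
    for issue in issues:
        print("PROBLEM:", issue)
    if not issues:
        print("Environment variables look fine.")
```

Run it from a new Command Prompt or PowerShell window so that recently edited variables are picked up. A clean result only shows that the folders exist, not that Spark itself starts.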
Opened: Saturday, 24 May 2025, 12:00 AM
Due: Saturday, 31 May 2025, 1:00 PM
Share your final PDF of the screenshots taken during the installation exercise above.
Example to try out after Apache Spark Installation
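The course's own example is not reproduced on this page. As a stand-in, here is a minimal smoke test, assuming the pyspark package is importable from your Python installation. It sums the squares of 0 to 9 on a local Spark session and prints the result next to a plain-Python reference value. The function names and the app name are illustrative only.

```python
def expected_sum_of_squares(n):
    """Plain-Python reference value: 0^2 + 1^2 + ... + (n-1)^2."""
    return sum(i * i for i in range(n))


def spark_sum_of_squares(n):
    """Compute the same sum on a local Spark session."""
    # Imported here so the file still loads when pyspark is not installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")              # run on this machine, using all cores
        .appName("install-smoke-test")   # hypothetical app name
        .getOrCreate()
    )
    try:
        rdd = spark.sparkContext.parallelize(range(n))
        return rdd.map(lambda x: x * x).sum()
    finally:
        spark.stop()


if __name__ == "__main__":
    n = 10
    try:
        result = spark_sum_of_squares(n)
        print("Spark result:", result, "expected:", expected_sum_of_squares(n))
    except Exception as exc:  # diagnostic script: report any failure, do not crash
        print("Spark check failed:", exc)
```

If the two numbers match, Python can reach Spark and Spark can run a job locally. A failure message is a useful thing to bring to the troubleshooting forum.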
Due: Saturday, 24 May 2025, 2:38 PM
Apache Spark installation on Windows: forum discussion for sharing troubleshooting ideas.