Outline & Objectives

Course Outline
This course studies technologies for big data analysis, such as the storage and retrieval of large-scale data, as well as data analysis algorithms.

Learning Objectives
Understand the concepts of big data analysis and examine major big data analysis tools and platforms. Study the fundamental concepts and techniques used to build big data analysis tools and platforms. Conduct a research project to improve big data analysis performance.

Evaluation Criteria

Midterm Exam: 30%
Final Exam: 30%
Assignments: 20%
Presentation: 10%
Attendance: 10%

Lecture Schedule

Week: Lecture Topics and Contents
Week 1: Lecture Overview
Week 2: Introduction to Hadoop
Week 3: Importing and Modeling Structured Data: Sqoop, Impala, Hive
Week 4: Importing and Modeling Structured Data: Modeling & Managing Data
Week 5: Importing and Modeling Structured Data: Data Formats & Data File Partitioning
Week 6: Ingesting and Streaming Data: Capturing Data with Apache Flume
Week 7: Distributed Data Processing with Spark: Spark Basics
Week 8: Midterm Exam
Week 9: Distributed Data Processing with Spark: Working with RDDs; Aggregating Data with Pair RDDs
Week 10: Distributed Data Processing with Spark: Writing and Deploying Applications
Week 11: Distributed Data Processing with Spark: Parallel Processing & RDD Persistence
Week 12: Distributed Data Processing with Spark: Common Patterns
Week 13: Distributed Data Processing with Spark: DataFrames/SQL
Week 14: Project Presentation
Week 15: Final Exam
Week 16: Reserved