Outline & Objectives
Course Outline | This course covers technologies for big data analysis, including the storage and retrieval of large-scale data and data analysis algorithms. |
Learning Objectives | Understand the concepts of big data analysis and examine major big data analysis tools and platforms. Study the fundamental concepts and techniques used for building big data analysis tools and platforms. Conduct a research project for improving big data analysis performance. |
Evaluation Criteria
Midterm Exam | 30% |
Final Exam | 30% |
Assignments | 20% |
Presentation | 10% |
Attendance | 10% |
Lecture Schedule
Week | Lecture Topics and Contents |
---|---|
Week 1 | Lecture Overview |
Week 2 | Introduction to Hadoop |
Week 3 | Importing and Modeling Structured Data: Sqoop, Impala, Hive |
Week 4 | Importing and Modeling Structured Data: Modeling & Managing Data |
Week 5 | Importing and Modeling Structured Data: Data Formats & Data File Partitioning |
Week 6 | Ingesting and Streaming Data: Capturing Data with Apache Flume |
Week 7 | Distributed Data Processing with Spark: Spark Basics |
Week 8 | Midterm Exam |
Week 9 | Distributed Data Processing with Spark: Working with RDDs; Aggregating/Pair RDDs |
Week 10 | Distributed Data Processing with Spark: Writing and Deploying Applications |
Week 11 | Distributed Data Processing with Spark: Parallel Processing & RDD Persistence |
Week 12 | Distributed Data Processing with Spark: Common Patterns |
Week 13 | Distributed Data Processing with Spark: DataFrames/SQL |
Week 14 | Project Presentation |
Week 15 | Final Exam |
Week 16 | Reserved |