CONTENTS:
Module 1: Overview of Big Data
- What is big data
- The big data pipeline
- Big data architectural principles
Module 2: Big data ingestion and transfer
- Overview: Data ingestion
- Transferring data
Module 3: Big data streaming and Amazon Kinesis
- Stream processing of big data
- Amazon Kinesis
- Amazon Kinesis Data Firehose
- Amazon Kinesis Video Streams
- Amazon Kinesis Data Analytics
- Hands-on lab 1: Streaming and Processing Apache Server Logs Using Amazon Kinesis
Module 4: Big data storage solutions
- AWS data storage options
- Storage solutions concepts
- Factors in choosing a data store
Module 5: Big data processing and analytics
- Big data processing and analytics
- Amazon Athena
- Hands-on lab 2: Using Amazon Athena to Analyze Log Data
Module 6: Apache Hadoop and Amazon EMR
- Introduction to Amazon EMR and Apache Hadoop
- Best practices for ingesting data
- Amazon EMR
- Amazon EMR architecture
- Hands-on lab 3: Storing and Querying Data on Amazon DynamoDB
Module 7: Using Amazon EMR
- Developing and running your application
- Launching your cluster
- Handling output from your completed jobs
Module 8: Hadoop programming frameworks
- Hadoop frameworks
- Other frameworks for use on Amazon EMR
- Hands-on lab 4: Processing Server Logs with Hive on Amazon EMR
Module 9: Web interfaces on Amazon EMR
- Hue on Amazon EMR
- Monitoring your cluster
- Hands-on lab 5: Running Pig Scripts in Hue on Amazon EMR
Module 10: Apache Spark on Amazon EMR
- Apache Spark
- Using Spark
- Hands-on lab 6: Processing NY Taxi Data Using Apache Spark
Module 11: Using AWS Glue to automate ETL workloads
- What is AWS Glue?
- AWS Glue: Job orchestration
Module 12: Amazon Redshift and big data
- Data warehouses vs. traditional databases
- Amazon Redshift
- Amazon Redshift architecture
Module 13: Securing your Amazon deployments
- Securing your Amazon deployments
- Amazon EMR security overview
- AWS Identity and Access Management (IAM) overview
- Securing data
- Amazon Kinesis security overview
- Amazon DynamoDB security overview
- Amazon Redshift security overview
Module 14: Managing big data costs
- Total cost considerations for Amazon EMR
- Amazon EC2 pricing models
- Amazon Kinesis pricing models
- Cost considerations for Amazon DynamoDB
- Cost considerations and pricing models for Amazon Redshift
- Optimizing cost with AWS
Module 15: Visualizing and orchestrating big data
- Visualizing big data
- Amazon QuickSight
- Orchestrating a big data workflow
- Hands-on lab 7: Using TIBCO Spotfire to visualize data
Module 16: Big data design patterns
Module 17: Course wrap-up