From the course: AWS Essential Training for Developers

Data Analytics in AWS

- [Instructor] To extract information from the raw files in your data lake, you first have to tell AWS that this blob of files in your S3 bucket is raw data, and then convert it into a semi-structured form that's usable by most AWS services, such as Redshift Spectrum or Athena. Redshift isn't magic: it can't directly read your messy text files and understand their meaning, so you might first need to convert them into a columnar format like Parquet. To convert your raw data into something usable, you'll need an ETL (extract, transform, and load) pipeline, and Amazon's tool for this is AWS Glue. With Glue, you create a data catalog that other AWS services can use, for example to tell Athena that this folder of files contains web server access logs. Glue can then read from your messy flat files, or even from other databases, perform any necessary transformations, and then put your newly structured or semi-structured data…
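
To make the "transform" step more concrete, here is a minimal sketch in plain Python of what an ETL job does conceptually with web server access logs. It is not Glue code and does not call any AWS API. The log layout (Common Log Format), the field names, and the helper functions `parse_line`, `transform`, and `to_json_lines` are all assumptions made for illustration, and the output is JSON Lines rather than Parquet so the example needs only the standard library.

```python
import json
import re
from typing import Iterable, List, Optional

# Assumed input: Common Log Format lines, for example:
# 203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_line(line: str) -> Optional[dict]:
    """Turn one raw log line into a structured record, or None if it doesn't match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Give numeric fields real types so downstream queries can filter and sum.
    record["status"] = int(record["status"])
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


def transform(lines: Iterable[str]) -> List[dict]:
    """Extract structured records from raw lines, skipping lines that don't parse."""
    records = []
    for line in lines:
        record = parse_line(line.strip())
        if record is not None:
            records.append(record)
    return records


def to_json_lines(records: Iterable[dict]) -> str:
    """Serialize records as JSON Lines, one record per line (the 'load' format here)."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


if __name__ == "__main__":
    raw = [
        '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
        "this line is not a valid log entry",
    ]
    print(to_json_lines(transform(raw)))
```

In a real pipeline, a managed service such as Glue would run this kind of parsing at scale and write the result back to S3 in a format that query engines can read. The sketch only shows the shape of the work: messy text in, typed records out.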
