AWS Redshift Spectrum

Overview of AWS Redshift Spectrum

What is AWS Redshift Spectrum?

Redshift Spectrum is a feature of the AWS Redshift data warehousing service.

How is Redshift Spectrum different from other features of AWS Redshift?

With AWS Redshift Spectrum, users can query and retrieve data from files in Amazon S3 without loading the data into Redshift tables. This is especially useful when query performance is not the top priority and you want to avoid the complexity of moving data.

What are the key features of AWS Redshift Spectrum?

  1. Redshift Spectrum is highly cost effective because of its pay-per-use pricing. You pay for the amount of data scanned, i.e. $5 per TB.
  2. Redshift Spectrum is easy to set up and use. It requires minimal administration compared to other features of Redshift.
  3. Redshift Spectrum is serverless and highly scalable. AWS scales the capacity with the user load, and Spectrum can handle a high volume of concurrent queries.
  4. Redshift Spectrum can act as a lakehouse layer, letting you query data in S3 alongside data stored in Redshift tables.

How to set up, configure & use AWS Redshift Spectrum?

AWS Redshift Spectrum requires an external data catalog service. AWS recommends AWS Glue. You can also use a Hive metastore running on EMR. Spectrum also needs a base Redshift cluster, since all queries go through the cluster. To set up, configure & use AWS Redshift Spectrum, follow the steps below:

  • Build a Redshift cluster with a minimum configuration.
  • Create the required IAM roles & policies with the necessary permissions.
  • Integrate with the external data catalog.
  • Create external tables with locations pointing to S3 bucket paths.
  • To query the data using JDBC & ODBC drivers, download the drivers and install them in your JDBC/ODBC tools.
  • User management can be handled locally on the Redshift cluster, or you can integrate with Active Directory (AD) through SSO.
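
The catalog integration and external table steps above come down to two SQL statements run on the cluster. The Python sketch below only builds those statements as strings so they can be reviewed before being run in a SQL client. Every name in it (schema, catalog database, role ARN, bucket, table, and columns) is a hypothetical placeholder, and it assumes an AWS Glue Data Catalog and Parquet files.

```python
from typing import Sequence, Tuple

Column = Tuple[str, str]  # (column name, Redshift data type)


def create_external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """Build the statement that maps a data catalog database to a Redshift schema."""
    return (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def create_external_table_sql(
    schema: str,
    table: str,
    columns: Sequence[Column],
    partition_columns: Sequence[Column],
    s3_location: str,
) -> str:
    """Build a CREATE EXTERNAL TABLE statement for Parquet files under s3_location."""
    cols = ",\n".join(f"  {name} {dtype}" for name, dtype in columns)
    parts = ", ".join(f"{name} {dtype}" for name, dtype in partition_columns)
    partition_clause = f"PARTITIONED BY ({parts})\n" if parts else ""
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n{cols}\n)\n"
        f"{partition_clause}"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(create_external_schema_sql(
        "spectrum_demo", "demo_db",
        "arn:aws:iam::123456789012:role/ExampleSpectrumRole",
    ))
    print(create_external_table_sql(
        "spectrum_demo", "sales",
        [("sale_id", "INT"), ("amount", "DECIMAL(10,2)")],
        [("sale_date", "DATE")],
        "s3://example-bucket/sales/",
    ))
```

The helpers interpolate their arguments directly into the SQL text, so they are only suitable for trusted, hard-coded values such as the ones in this example.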

What are some AWS Redshift Spectrum best practices?

  • Ensure the data in the S3 buckets referenced by Spectrum tables is stored in a columnar format. Parquet is particularly suitable.
  • Implement partitioning where possible. Partitioning lets queries that filter on the partition columns skip irrelevant data instead of scanning the entire data set, which improves performance.
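
As a rough illustration of the partitioning advice, the sketch below generates the statements that register one partition per day for an external table. It assumes the data is laid out in Hive-style prefixes such as sale_date=2024-01-30/. The table name, partition column, and bucket are hypothetical placeholders.

```python
from datetime import date, timedelta
from typing import List


def add_partition_sql(table: str, column: str, day: date, base_location: str) -> str:
    """Build one ALTER TABLE statement that registers a single daily partition."""
    value = day.isoformat()
    prefix = base_location.rstrip("/")
    return (
        f"ALTER TABLE {table} ADD IF NOT EXISTS "
        f"PARTITION ({column}='{value}') "
        f"LOCATION '{prefix}/{column}={value}/';"
    )


def daily_partitions(
    table: str, column: str, start: date, days: int, base_location: str
) -> List[str]:
    """Build statements for `days` consecutive daily partitions beginning at `start`."""
    return [
        add_partition_sql(table, column, start + timedelta(days=i), base_location)
        for i in range(days)
    ]


if __name__ == "__main__":
    # Placeholder table and bucket names for illustration only.
    for stmt in daily_partitions(
        "spectrum_demo.sales", "sale_date", date(2024, 1, 30), 3,
        "s3://example-bucket/sales/",
    ):
        print(stmt)
```

With partitions registered this way, a query that filters on the partition column (for example WHERE sale_date = '2024-01-30') only needs to read the files under the matching prefix.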

Redshift Spectrum Pricing

With Redshift Spectrum, you can query existing S3 data files using Spectrum nodes. You are charged for the number of bytes scanned, rounded up to the next megabyte, with a 10 MB minimum per query. The charge is $5 per TB of data scanned.
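
The scan charge described above can be estimated with a few lines of arithmetic. This is a minimal sketch, not an official calculator. It assumes binary units (1 MB = 1024^2 bytes, 1 TB = 1024^4 bytes), applies the 10 MB minimum to every query, and covers the scan charge only. The function names are made up for this example.

```python
MB = 1024 ** 2          # assumption: binary megabyte
TB = 1024 ** 4          # assumption: binary terabyte
PRICE_PER_TB_USD = 5.0  # scan price quoted in this article
MIN_BILLED_MB = 10      # per-query minimum quoted in this article


def billed_megabytes(bytes_scanned: int) -> int:
    """Round the scanned bytes up to whole megabytes and apply the 10 MB minimum."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    rounded_up = -(-bytes_scanned // MB)  # integer ceiling division
    return max(rounded_up, MIN_BILLED_MB)


def scan_cost_usd(bytes_scanned: int) -> float:
    """Estimate the Spectrum scan charge for one query (cluster and S3 costs excluded)."""
    return billed_megabytes(bytes_scanned) * MB / TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    for scanned in (1, 10 * MB + 1, 250 * 1024 ** 3, TB):
        print(f"{scanned} bytes scanned -> ${scan_cost_usd(scanned):.6f}")
```

Under these assumptions, a query that scans a single byte is billed as 10 MB, and a query that scans exactly 1 TB costs $5.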

On top of the scan cost, you are charged for the Amazon Redshift cluster used to query data with Redshift Spectrum. Because Redshift Spectrum queries data directly in Amazon S3, you are also charged standard S3 rates for storing objects in your S3 buckets and for requests made against them.

With Amazon Redshift Serverless, there is no separate charge for Redshift Spectrum. Spectrum usage is part of the Amazon Redshift Serverless cost, which is measured in Redshift Processing Units (RPUs).

For more details about Amazon Redshift Pricing, check out our post –