AWS Redshift Best Practices

Amazon Redshift is a fully transactional, ACID compliant Cloud data warehouse.

Redshift has three Cluster types

  1. Redshift : Leader node and Compute Node with Data stored locally on compute nodes. Computes nodes can be of Dense Compute (DC2) and Dense Storage (DS2 types)
  2. Redshift Spectrum : Redshift Spectrum nodes with data stored in S3. You still need a Redshift cluster .
  3. Redshift Managed Storage : Storage and compute completely decoupled. Storage will be managed by S3 and data transfers back & forth between Compute and Storage layers through Redshift AQUA.

Redshift Key Components

Leader Node : Stores Metadata. Interacts with Worker Nodes for executing the SQL Processing in Parallel.
Compute Nodes : Data stored in Local disks and executes queries in Parallel.

Best practices

  1. Create sort key on frequently filtered columns with lowest cardinality.
  2. Do not create more than four sort key columns.
  3. Implement compression on columns (use AZ64 with best performance where possible).
  4. Keep data types as wide as possible.
  5. If columns do not cause skew, co-locate large tables using DISTSTYLE KEY.
  6. Limit Commits to reduce commit overhead.
  7. For Data Ingestion utilizing COPY statement, split the single file in to equal number of available slices for parallelism and improved performance. To find slices – SELECT count (slice) from stv_slices.
  8. Redshift spectrum –
    1. Save cluster resources for querying and reporting rather than on ELT.
    2. Aggregate incoming data.
    3. Select subset of columns and/or rows.
  9. Consider using DROP TABLE or TRUNCATE instead of DELETE Staging.
  10. For copying millions of rows, use ALTER TABLE APPEND instead of INSERT INTO SELECT.
  11. For staging tables, use temporary table or table with “BACKUP NO” option.
  12. WLM –
    1. Use Auto WLM , if your workload is unpredictable.
    2. Use Manual WLM, if you have well architected workload patterns.
    3. Keep number of WLM queues to Minimum (around 3).
  13. Cluster setup – Keep at least 20% of Free space.

Pricing

Redshift Pricing depends on the type of cluster and usage details. For more information about Redshift Pricing , check out our post at http://www.cloudinfonow.com/aws-redshift-pricing/