Amazon Athena

Overview of Amazon Athena, Features, Architecture, Best Practices, Pricing

What is Amazon Athena?

Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Following are some of the features of Athena:
  1. Start querying instantly – Serverless, no ETL – Athena is serverless. You can quickly query your data without having to set up and manage any servers or data warehouses.
  2. Pay per query – Only pay for data scanned – With Amazon Athena, you pay only for the queries that you run. You are charged $5 per terabyte scanned by your queries.
  3. Open, powerful, standard – Built on Presto, runs standard SQL – Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet.
  4. Highly available & durable – Amazon Athena is highly available and executes queries using compute resources across multiple facilities, automatically routing queries appropriately if a particular facility is unreachable. 
  5. Federated query – Athena enables you to run SQL queries across data stored in relational, non-relational, object, and custom data sources.
  6. Athena Grafana Plug-in – Simple configuration setup, a built-in sample dashboard, a SQL query input interface, and visualization of Athena results.
  7. Step Functions integration – Embedded console experience, use case templates, and state machine access.
  8. Glue Partition Indexes – Athena supports AWS Glue Partition Indexes, which reduce metadata transfer and improve query performance.
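
To make the "no servers to manage" point concrete, here is a minimal sketch of how a query could be submitted from Python. It assumes the third-party boto3 SDK, configured AWS credentials, and an S3 location for query results. The database name, table name, and bucket are hypothetical placeholders, and the helper names are our own for illustration.

```python
from typing import Any, Dict, Optional


def build_query_request(
    sql: str,
    database: str,
    output_location: str,
    workgroup: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    request: Dict[str, Any] = {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }
    if workgroup is not None:
        request["WorkGroup"] = workgroup
    return request


def submit_query(request: Dict[str, Any]) -> str:
    """Submit the request and return the query execution ID.

    boto3 is imported lazily so build_query_request works without the SDK.
    """
    import boto3  # third-party AWS SDK; needs credentials and a region

    client = boto3.client("athena")
    response = client.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical database, table, and results bucket, for illustration only.
request = build_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="weblogs",
    output_location="s3://example-athena-results/",
)
```

Calling submit_query(request) would start the query asynchronously. Results are written to the output location and can be polled for using the returned execution ID.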

Amazon Athena Architectural Patterns

Data exploration – In this pattern, data is ingested from multiple data sources, crawled and cataloged by AWS Glue, and queried through Athena. Amazon QuickSight and other BI tools then connect to Athena for data exploration.

Amazon Athena – Data Exploration Pattern

ETL and Query – In this pattern, Athena performs ETL processing with CTAS (CREATE TABLE AS SELECT) queries that transform raw data into curated data, which is then crawled into the Glue Data Catalog and queried by Amazon Athena.

Amazon Athena – ETL query Pattern
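
As a rough illustration of the CTAS step in this pattern, the sketch below builds a CREATE TABLE AS SELECT statement that writes the curated result as Parquet. The helper name, the table names, and the bucket are hypothetical. The table properties shown (format, external_location, partitioned_by) are the ones we understand Athena CTAS to accept, so check them against the Athena documentation before relying on them.

```python
from typing import Sequence


def build_ctas(
    target_table: str,
    select_sql: str,
    external_location: str,
    partition_columns: Sequence[str] = (),
) -> str:
    """Return a CTAS statement that writes the SELECT result as Parquet.

    Partition columns are expected to come last in the SELECT list.
    """
    properties = [
        "format = 'PARQUET'",
        f"external_location = '{external_location}'",
    ]
    if partition_columns:
        quoted = ", ".join(f"'{col}'" for col in partition_columns)
        properties.append(f"partitioned_by = ARRAY[{quoted}]")
    with_clause = ",\n    ".join(properties)
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (\n    {with_clause}\n)\n"
        f"AS\n{select_sql}"
    )


# Hypothetical raw and curated tables, for illustration only.
ctas = build_ctas(
    target_table="curated.orders",
    select_sql=(
        "SELECT order_id, amount, order_date "
        "FROM raw.orders WHERE amount IS NOT NULL"
    ),
    external_location="s3://example-curated-bucket/orders/",
    partition_columns=["order_date"],
)
print(ctas)
```

Writing the curated data in a columnar, partitioned layout means later queries scan less data, which matters because Athena charges by data scanned.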

Data Integration – In this pattern, Amazon Athena is integrated with multiple data sources using its federated query capabilities.

Amazon Athena – Data Integration

Machine Learning – In this pattern, Amazon Athena is integrated with Amazon SageMaker to query the inference data.

Amazon Athena – Machine Learning

Amazon Athena ACID Transactions

ACID transactions enable multiple users to concurrently and reliably add and delete Amazon S3 objects in an atomic manner, while isolating any existing queries by maintaining read consistency for queries against the data lake.

Athena ACID transactions add single-table support for write, delete, update, and time travel operations to the Athena SQL data manipulation language (DML). 

Amazon Athena Governed Tables

Athena supports read operations using AWS Lake Formation governed tables. The ACID features help ensure that queries are reliable in the face of complex changes to the underlying data. Governed tables in AWS Lake Formation provide the following capabilities:

  1. ACID transactions – Read and write to and from multiple tables in your Amazon S3 data lake using ACID (atomic, consistent, isolated, and durable) transactions. 
  2. Time travel and version travel queries – Each governed table maintains a versioned manifest of the Amazon S3 objects that it comprises. Previous versions of the manifest can be used for time travel and version travel queries. 
  3. Automatic data compaction – For improved performance, Lake Formation automatically compacts small Amazon S3 objects from governed tables into larger objects.
  4. Security – Supports row-level, column-level, and cell-level permissions.

Amazon Athena Iceberg Tables

Apache Iceberg is an open table format for very large analytic datasets. Iceberg manages large collections of files as tables, and it supports modern analytical data lake operations such as record-level insert, update, delete, and time travel queries.

Athena supports read, time travel, and write queries for Apache Iceberg tables that use the Apache Parquet format for data and the AWS Glue catalog for their metastore.

Amazon Athena Iceberg tables
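
The following sketch shows what creating an Iceberg table and running a time travel query might look like as SQL text generated from Python. The helper names, table name, columns, and bucket are hypothetical. The time travel clause has differed between Athena engine versions, so treat the FOR TIMESTAMP AS OF form below as an assumption to verify against the engine version in use.

```python
from typing import Dict


def create_iceberg_table_sql(
    table: str, columns: Dict[str, str], location: str
) -> str:
    """Return DDL that registers a new Iceberg table at an S3 location."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE TABLE {table} (\n  {cols}\n)\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('table_type' = 'ICEBERG')"
    )


def time_travel_sql(table: str, as_of_utc: str) -> str:
    """Return a query that reads the table as it was at a UTC timestamp."""
    return (
        f"SELECT * FROM {table} "
        f"FOR TIMESTAMP AS OF TIMESTAMP '{as_of_utc} UTC'"
    )


# Hypothetical table, columns, and bucket, for illustration only.
ddl = create_iceberg_table_sql(
    table="lake.events",
    columns={"event_id": "bigint", "payload": "string", "event_time": "timestamp"},
    location="s3://example-lake-bucket/events/",
)
snapshot_query = time_travel_sql("lake.events", "2024-01-01 10:00:00")
print(ddl)
print(snapshot_query)
```

The table_type property is what marks the table as Iceberg. The time travel query reads an earlier snapshot without changing the current table state.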

Following are some of the features:

  1. Athena supports Iceberg tables in Parquet file format only. ORC and Avro are not supported.
  2. Schema Evolution – Iceberg schema updates are metadata-only changes. No data files are changed when you perform a schema update.
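
As a small illustration of schema evolution, a column could be added with a single DDL statement. The sketch below only builds the statement text. The helper, table, and column names are hypothetical.

```python
def add_column_sql(table: str, column: str, sql_type: str) -> str:
    """Return an ALTER TABLE statement that adds one column to a table."""
    return f"ALTER TABLE {table} ADD COLUMNS ({column} {sql_type})"


# Hypothetical table and column, for illustration only.
statement = add_column_sql("lake.events", "source", "string")
print(statement)  # ALTER TABLE lake.events ADD COLUMNS (source string)
```

Because the change is metadata-only, existing Parquet files are left as they are.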

More information on Amazon Athena Transactions can be found at https://docs.aws.amazon.com/athena/latest/ug/acid-transactions.html

Amazon Athena Pricing

With Amazon Athena, you pay only for the queries that you run. You are charged based on the amount of data scanned by each query, so Athena has a simple pricing model. For more information on Amazon Athena pricing, refer to our post http://www.cloudinfonow.com/amazon-athena-pricing/
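
A back-of-the-envelope estimate of the per-query charge can be sketched as follows. It uses the $5 per terabyte rate quoted earlier in this post, and the actual rate can vary by region. The rounding up to whole megabytes, the 10 MB minimum per query, the binary (1024-based) units, and the zero charge for a zero-byte scan are assumptions here. Confirm all of them against the current AWS pricing page.

```python
import math

PRICE_PER_TB_USD = 5.00      # rate quoted in this post; may differ by region
BYTES_PER_MB = 1024 ** 2     # assumption: binary units
BYTES_PER_TB = 1024 ** 4
MIN_BILLED_MB = 10           # assumption: 10 MB minimum per query


def estimate_query_cost(bytes_scanned: int) -> float:
    """Estimate the USD charge for one query from the bytes it scanned."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    if bytes_scanned == 0:
        return 0.0  # assumption: nothing scanned, nothing billed
    # Round up to whole megabytes, then apply the per-query minimum.
    billed_mb = max(math.ceil(bytes_scanned / BYTES_PER_MB), MIN_BILLED_MB)
    return billed_mb * BYTES_PER_MB / BYTES_PER_TB * PRICE_PER_TB_USD


full_scan = estimate_query_cost(1024 ** 4)          # one terabyte scanned
pruned_scan = estimate_query_cost(100 * 1024 ** 3)  # 100 GB after pruning
print(f"full scan: ${full_scan:.2f}, pruned scan: ${pruned_scan:.4f}")
```

Comparing the two estimates shows why reducing the data a query scans, for example through partitioning or columnar formats, lowers its cost proportionally.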
