Overview of Amazon Athena, Features, Architecture, Best Practices, Pricing
What is Amazon Athena?
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.
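To make "query data in Amazon S3 using standard SQL" concrete, here is a minimal sketch of how a query request might be assembled in Python. The helper `build_query_request`, the database `weblogs`, the table `access_logs`, and the results bucket are hypothetical names used only for illustration. The dictionary keys follow the parameters of the boto3 Athena client's `start_query_execution` call, which is shown only in a comment so the snippet runs without AWS credentials.

```python
def build_query_request(sql, database, output_location, workgroup="primary"):
    """Return keyword arguments for boto3's Athena start_query_execution call.

    All argument values are supplied by the caller; nothing here contacts AWS.
    """
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
        "WorkGroup": workgroup,
    }


# Hypothetical database, table, and bucket names, for illustration only.
request = build_query_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="weblogs",
    output_location="s3://example-athena-results/queries/",
)

# With AWS credentials configured, the request could be submitted like this:
#   import boto3
#   response = boto3.client("athena").start_query_execution(**request)
#   query_id = response["QueryExecutionId"]
```

Athena runs queries asynchronously, so a real client would then poll for completion and read the results from the output location.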
Following are some of the features of Athena:
- Start querying instantly – Serverless, no ETL – Athena is serverless. You can quickly query your data without having to set up and manage any servers or data warehouses.
- Pay per query – Only pay for data scanned – With Amazon Athena, you pay only for the queries that you run. You are charged $5 per terabyte scanned by your queries.
- Open, powerful, standard – Built on Presto, runs standard SQL – Amazon Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet.
- Highly available & durable – Amazon Athena is highly available and executes queries using compute resources across multiple facilities, automatically routing queries appropriately if a particular facility is unreachable.
- Federated query – Athena enables you to run SQL queries across data stored in relational, non-relational, object, and custom data sources.
- Athena Grafana Plug-in – Simple configuration setup, a built-in sample dashboard, a SQL query input interface, and visualization of Athena results.
- Step Functions integration – Embedded console experience, use case templates, and state machine access.
- Glue Partition Indexes – Athena supports AWS Glue Partition Indexes, which reduce metadata transfer and improve query performance.
Amazon Athena Architectural Patterns
Data exploration – In this pattern, data is ingested from multiple data sources, crawled and cataloged through AWS Glue, and queried through Athena, with Amazon QuickSight or other BI tools connecting to Athena for data exploration.
ETL and Query – In this pattern, Athena performs ETL processing with CTAS (CREATE TABLE AS SELECT) queries that transform raw data into curated data, which is then crawled into the Glue Data Catalog and queried by Amazon Athena.
Data Integration – In this pattern, Amazon Athena is integrated with multiple data sources using its federated query capabilities.
Machine Learning – In this pattern, Amazon Athena is integrated with Amazon SageMaker to query inference data.
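The ETL and Query pattern above relies on CTAS statements. The sketch below shows roughly what such a statement could look like, assembled as a string in Python. The helper `build_ctas`, the table names `raw.orders` and `curated.orders_clean`, the column names, and the S3 path are hypothetical. The `format` and `external_location` properties are standard Athena CTAS table properties, but the right set of properties depends on your data.

```python
def build_ctas(target_table, select_sql, external_location, file_format="PARQUET"):
    """Build an Athena CTAS statement that writes query results to S3.

    This only formats a string; it does not validate or run the SQL.
    """
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (\n"
        f"  format = '{file_format}',\n"
        f"  external_location = '{external_location}'\n"
        f")\n"
        f"AS\n"
        f"{select_sql}"
    )


# Hypothetical raw and curated tables, for illustration only.
ctas_sql = build_ctas(
    target_table="curated.orders_clean",
    select_sql="SELECT order_id, customer_id, total FROM raw.orders WHERE total > 0",
    external_location="s3://example-data-lake/curated/orders_clean/",
)
print(ctas_sql)
```

Writing the curated data in a columnar format such as Parquet is a common choice here because later queries can scan less data.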
Amazon Athena ACID Transactions
ACID transactions enable multiple users to concurrently and reliably add and delete Amazon S3 objects in an atomic manner, while isolating any existing queries by maintaining read consistency for queries against the data lake.
Athena ACID transactions add single-table support for write, delete, update, and time travel operations to the Athena SQL data manipulation language (DML).
Amazon Athena Governed Tables
Athena supports read operations using AWS Lake Formation governed tables. The ACID features help ensure that queries are reliable in the face of complex changes to the underlying data. Governed tables in AWS Lake Formation provide the following capabilities:
- ACID transactions – Read and write to and from multiple tables in your Amazon S3 data lake using ACID (atomic, consistent, isolated, and durable) transactions.
- Time travel and version travel queries – Each governed table maintains a versioned manifest of the Amazon S3 objects that it comprises. Previous versions of the manifest can be used for time travel and version travel queries.
- Automatic data compaction – For improved performance, Lake Formation automatically compacts small Amazon S3 objects from governed tables into larger objects.
- Security – Supports row-level, cell-level, and column-level permissions.
Amazon Athena Iceberg Tables
Apache Iceberg is an open table format for very large analytic datasets. Iceberg manages large collections of files as tables, and it supports modern analytical data lake operations such as record-level insert, update, delete, and time travel queries.
Athena supports read, time travel, and write queries for Apache Iceberg tables that use the Apache Parquet format for data and the AWS Glue catalog for their metastore.
Following are some of the features:
- Athena supports Iceberg tables in Parquet file format only. ORC and Avro are not supported.
- Schema Evolution – Iceberg schema updates are metadata-only changes. No data files are changed when you perform a schema update.
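As an illustration of the time travel queries mentioned above, the following sketch formats time travel and version travel SELECT statements for an Iceberg table. The helpers `time_travel_query` and `version_travel_query`, the table `sales.orders`, and the snapshot ID are hypothetical. The `FOR TIMESTAMP AS OF` and `FOR VERSION AS OF` clauses follow the form documented for recent Athena engine versions; the syntax has differed between engine versions, so check it against the engine your workgroup uses.

```python
from datetime import datetime, timezone


def time_travel_query(table, as_of):
    """Build a SELECT that reads an Iceberg table as of a point in time."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    # Normalize to UTC so the literal in the query is unambiguous.
    ts = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return f"SELECT * FROM {table} FOR TIMESTAMP AS OF TIMESTAMP '{ts} UTC'"


def version_travel_query(table, snapshot_id):
    """Build a SELECT that reads a specific Iceberg snapshot."""
    return f"SELECT * FROM {table} FOR VERSION AS OF {int(snapshot_id)}"


# Hypothetical table name and snapshot ID, for illustration only.
print(time_travel_query("sales.orders", datetime(2023, 1, 1, 10, 0, 0, tzinfo=timezone.utc)))
print(version_travel_query("sales.orders", 949530903748831860))
```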
More information on Amazon Athena Transactions can be found at https://docs.aws.amazon.com/athena/latest/ug/acid-transactions.html
Amazon Athena Pricing
With Amazon Athena, you only pay for the queries that you run. You are charged based on the amount of data scanned by each query. Amazon Athena has a simple pricing model. For more information on Amazon Athena Pricing, refer to our post http://www.cloudinfonow.com/amazon-athena-pricing/
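As a rough worked example of the per-scan pricing, the sketch below estimates the cost of a query from the bytes it scans, using the $5 per terabyte figure quoted in the features list. The helper `estimate_query_cost` is hypothetical, and treating 1 TB as 1024^4 bytes is an assumption. The sketch also ignores any per-query minimum or rounding, so treat the result as an approximation rather than a bill.

```python
BYTES_PER_TB = 1024 ** 4   # assumption: binary terabyte
PRICE_PER_TB_USD = 5.0     # rate quoted earlier in this article


def estimate_query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Approximate the cost in USD of a query that scans bytes_scanned bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb


# A query that scans 200 GB (200 * 1024**3 bytes).
cost = estimate_query_cost(200 * 1024 ** 3)
print(f"Estimated cost: ${cost:.4f}")
```

Because the charge scales with bytes scanned, anything that reduces the scan, such as partitioning, compression, or columnar formats, reduces the estimate proportionally.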