A data analyst must run ad-hoc SQL queries on JSON and CSV files already sitting in S3, with no infrastructure to manage and pay only for what is queried. Which service fits?

Amazon Athena. Amazon Athena is serverless and queries S3 data in place with standard SQL, billing per terabyte scanned, so there is no cluster to size and no idle cost. Redshift and EMR require managing clusters, and RDS would first require loading the data into a database.

An Athena query over a 10 TB raw CSV dataset is both slow and expensive. Which change most reduces the cost per query?

Convert the data to partitioned Parquet and select only needed columns. Athena bills $5 per terabyte scanned, so converting to columnar, date-partitioned Parquet and projecting only needed columns lets queries read a tiny fraction of the data and slashes the bill. Athena is serverless, so there are no worker nodes or auto-scaling groups to tune, and bucket size has no effect on scanned bytes.

An operations team needs to centralize application logs, run full-text searches with relevance ranking, and build interactive log dashboards. Which service is the best fit?

Amazon OpenSearch Service. Amazon OpenSearch Service is purpose-built for full-text search with relevance scoring and log analytics, and ships with OpenSearch Dashboards for interactive log visualization. QuickSight is a BI presentation layer, Athena is SQL-on-S3 without search relevance, and Redshift is a structured data warehouse.

Business users need interactive dashboards built on data already queryable through Athena, with fast rendering and billing only for the people who actually view reports. Which service should be added?

Amazon QuickSight using SPICE with per-session pricing. Amazon QuickSight is serverless BI that reads from Athena, accelerates rendering with the in-memory SPICE engine, and offers per-session pricing so you pay only for active viewers. EMR is for big-data processing, OpenSearch Dashboards target log/search data, and a self-managed Grafana server reintroduces the servers and patching the requirement wants to avoid.

Analytics Services — Athena, OpenSearch, and | Free Guide 2026

Key Takeaways

Amazon Athena runs standard SQL directly on S3 data with no infrastructure and bills $5 per TB of data scanned, so cost is controlled by scanning less data, not by sizing servers.
Converting data to columnar Parquet or ORC, partitioning by a query predicate (such as date), and compressing files can cut Athena's scanned bytes by 30–90 percent and the bill with it.
Amazon OpenSearch Service (successor to Elasticsearch) is the answer for full-text search, log analytics, and dashboards; ingest logs via Kinesis Data Firehose or CloudWatch Logs subscription filters.
Amazon QuickSight is serverless business intelligence with the in-memory SPICE engine and per-session pricing, ideal for embedded dashboards and occasional viewers.
Match the workload: Athena for ad-hoc SQL on a data lake, OpenSearch for search and log analytics, QuickSight for visual BI, Redshift for a structured data warehouse, EMR for big-data ETL.

Amazon Athena: Serverless SQL on S3

Amazon Athena is a serverless, interactive query service that runs standard SQL (Trino/Presto engine) directly against data in Amazon S3. There is nothing to provision; you define a table schema in the AWS Glue Data Catalog and query immediately. Athena is the default answer whenever a question says "run SQL on data already in S3" or "analyze CloudTrail / VPC Flow Logs / ALB access logs without a database."

Pricing is the load-bearing fact: $5 per terabyte of data scanned. There are no compute hours and no idle charge, so the only lever is reducing bytes read.

Cost-reduction technique	Effect on bytes scanned
Columnar format (Parquet or ORC)	Reads only needed columns; 30–90% less
Partition by query predicate (e.g., date)	Skips irrelevant partitions entirely
Compress files (Snappy, gzip)	Fewer bytes per row read
Avoid `SELECT *`; project specific columns	Reads fewer columns

Worked example: A 10 TB raw CSV log queried with SELECT * scans the full 10 TB ($50). Converting to date-partitioned Parquet and selecting three columns for one day might scan 5 GB instead, dropping the same query to a few cents.

Amazon OpenSearch Service: Search and Log Analytics

Amazon OpenSearch Service (the successor to the former Amazon Elasticsearch Service) provides full-text search with relevance scoring, log and trace analytics, and OpenSearch Dashboards (the Kibana successor) for visualization. It is the exam's answer for "search application logs and build dashboards," "full-text search with relevance ranking," or "real-time observability."

There are two flavors: provisioned domains (you choose instance types and node counts, scaling to petabytes) and OpenSearch Serverless (auto-scaling collections, no node management). Choose serverless when the question stresses unpredictable load and no capacity planning.

Typical ingestion architectures the exam expects you to recognize:

CloudWatch Logs --(subscription filter)--> Lambda --> OpenSearch
VPC Flow Logs / App Logs --> Kinesis Data Firehose --> OpenSearch

OpenSearch strength	When it is the answer
Full-text search + relevance	"search a product catalog", "fuzzy text search"
Log / trace analytics	"centralized log analysis with dashboards"
Near-real-time indexing	"observe events seconds after they occur"

Trap: if the requirement is plain SQL on S3 with no search and no dashboards, that is Athena, not OpenSearch. OpenSearch shines specifically at text relevance and operational log exploration.

Amazon QuickSight and Choosing the Right Tool

Amazon QuickSight is AWS's serverless business intelligence (BI) service for interactive dashboards and visualizations. Its in-memory engine, SPICE (Super-fast, Parallel, In-memory Calculation Engine), pre-loads data for fast charting without hammering the source. QuickSight connects to S3, Athena, Redshift, RDS, and on-premises databases, supports per-session pricing for occasional readers, and can embed dashboards in your own applications. ML Insights adds anomaly detection and forecasting.

QuickSight is the answer for "business users need interactive dashboards/reports" or "embed analytics in our app and pay only for active viewers." It is a presentation layer, not a query engine; it reads from sources like Athena or Redshift.

Pick-the-service matrix

Need	Service
Ad-hoc SQL on S3, pay per scan	Athena
Full-text search + log dashboards	OpenSearch
Business dashboards / visual BI	QuickSight
Structured data warehouse, complex joins	Redshift
Spark / Hadoop big-data ETL	EMR
Streaming SQL / Flink in real time	Managed Service for Apache Flink

Combined pattern: a classic data-lake answer is S3 (Parquet) -> Athena (query) -> QuickSight (dashboard). Athena does the serverless querying, QuickSight visualizes, and neither requires a running cluster. If the question instead emphasizes sub-second text search over millions of log lines with relevance ranking, switch the query layer to OpenSearch.

The Glue Data Catalog and the Wider Analytics Family

Athena, Redshift Spectrum, and EMR all read table definitions from the AWS Glue Data Catalog, a central metadata store. A Glue crawler scans S3 and infers schema and partitions automatically, so "discover the schema of files in S3 without writing DDL" points to a Glue crawler feeding the catalog, which Athena then queries. AWS Glue itself is the serverless extract-transform-load (ETL) service for converting raw CSV/JSON into the partitioned Parquet that makes Athena cheap.

Know where each neighboring service fits so you can eliminate distractors:

Amazon Redshift is a petabyte-scale columnar data warehouse for repeated, complex joins and aggregations over structured data; pick it over Athena when the same heavy queries run constantly and predictable performance matters more than pay-per-scan. Redshift Spectrum lets a Redshift cluster query S3 directly, blending warehouse and lake.
Amazon EMR runs managed Apache Spark, Hadoop, Hive, and Presto for large-scale custom ETL and machine learning; choose it when the workload needs Spark code or framework control that Athena's SQL-only model cannot express.
Amazon Managed Service for Apache Flink (formerly Kinesis Data Analytics) runs continuous SQL or Flink over streaming data for real-time aggregations.
Amazon Kinesis Data Firehose is the near-real-time loader that buffers streaming records and delivers them to S3, Redshift, or OpenSearch, often the ingestion arrow drawn in analytics architecture diagrams.

Question signal	Service
"infer schema from S3 automatically"	Glue crawler + Data Catalog
"serverless ETL to Parquet"	AWS Glue
"constant complex BI queries, predictable perf"	Redshift
"Spark / Hadoop custom processing"	EMR
"real-time streaming aggregation"	Managed Service for Apache Flink

AWS Solutions Architect Associate

AWS Solutions Architect

7.3 Analytics Services — Athena, OpenSearch, and QuickSight

Key Takeaways

Amazon Athena: Serverless SQL on S3

Amazon OpenSearch Service: Search and Log Analytics

Amazon QuickSight and Choosing the Right Tool

Pick-the-service matrix

The Glue Data Catalog and the Wider Analytics Family

AWS Solutions Architect Associate

1Introduction

2Domain 1: Design Secure Architectures (30%)

3Domain 2: Design Resilient Architectures (26%)

4Domain 3: Design High-Performing Architectures (24%)

5Domain 4: Design Cost-Optimized Architectures (20%)

6VPC and Networking Deep Dive

7Migration, Transfer, and Hybrid Services

8Serverless Architecture and Application Services

9Advanced Topics and Exam Scenarios

AWS Solutions Architect

7.3 Analytics Services — Athena, OpenSearch, and QuickSight

Key Takeaways

Amazon Athena: Serverless SQL on S3

Amazon OpenSearch Service: Search and Log Analytics

Amazon QuickSight and Choosing the Right Tool

Pick-the-service matrix

The Glue Data Catalog and the Wider Analytics Family