7.3 Analytics Services — Athena, OpenSearch, and QuickSight
Key Takeaways
- Amazon Athena queries S3 data directly using SQL without loading data into a database — serverless, pay per query (\$5 per TB scanned).
- Amazon OpenSearch (successor to Elasticsearch) provides full-text search, log analytics, and real-time application monitoring with dashboards.
- Amazon QuickSight is a serverless BI tool that creates interactive dashboards and visualizations from multiple data sources.
- Use Athena for ad-hoc SQL queries on S3 data lakes; OpenSearch for full-text search and log analytics; QuickSight for business intelligence dashboards.
- Athena performance improves with columnar formats (Parquet, ORC) and partitioning — reducing scanned data reduces cost.
Analytics Services — Athena, OpenSearch, and QuickSight
Quick Answer: Athena = SQL queries on S3 (serverless, pay per scan). OpenSearch = full-text search + log analytics + dashboards. QuickSight = serverless BI dashboards. Use Athena for ad-hoc S3 queries, OpenSearch for log/search use cases, QuickSight for business reporting.
Amazon Athena
Athena is a serverless, interactive query service that analyzes data in S3 using standard SQL.
| Feature | Detail |
|---|---|
| Serverless | No infrastructure to manage |
| Query language | Standard SQL (Presto engine) |
| Data source | S3 (CSV, JSON, Parquet, ORC, Avro) |
| Cost | $5 per TB of data scanned |
| Integration | Uses Glue Data Catalog for schemas |
| Federated query | Query data in RDS, DynamoDB, and other sources |
Athena Cost Optimization
| Technique | Impact |
|---|---|
| Use columnar formats (Parquet, ORC) | 30-90% less data scanned |
| Partition data | Skip scanning irrelevant partitions (e.g., by date) |
| Compress data | Less data = less scanning = less cost |
| Use LIMIT | Limits data processed in simple queries |
When to Use Athena
- Ad-hoc queries on data in S3
- Query CloudTrail logs, VPC Flow Logs, ELB access logs
- Data exploration in a data lake
- One-time analysis without setting up a database
Amazon OpenSearch Service
OpenSearch (successor to Amazon Elasticsearch Service) provides search, log analytics, and real-time application monitoring.
| Feature | Detail |
|---|---|
| Search | Full-text search with relevance scoring |
| Analytics | Log analytics, trace analytics, metrics |
| Dashboards | Built-in OpenSearch Dashboards (Kibana successor) |
| Managed | Automated patching, backups, monitoring |
| Scaling | Up to 3 PB of data, horizontal scaling |
| Serverless | OpenSearch Serverless option (auto-scaling) |
Common OpenSearch Architecture
CloudWatch Logs → Subscription Filter → Lambda → OpenSearch
VPC Flow Logs → Kinesis Data Firehose → OpenSearch
Application Logs → Kinesis Data Firehose → OpenSearch
On the Exam: "Search through application logs and create dashboards" → OpenSearch. "Run SQL queries on data in S3" → Athena.
Amazon QuickSight
QuickSight is a serverless business intelligence (BI) service for creating visualizations and dashboards.
| Feature | Detail |
|---|---|
| Serverless | No servers to manage |
| Data sources | S3, RDS, Redshift, Athena, DynamoDB, on-premises databases |
| SPICE | In-memory engine for fast dashboard performance |
| ML Insights | Anomaly detection, forecasting, natural language queries |
| Embedding | Embed dashboards in applications |
| Per-session pricing | Pay per user session (cost-effective for occasional use) |
Analytics Service Comparison
| Service | Type | Best For |
|---|---|---|
| Athena | Ad-hoc SQL on S3 | Data lake queries, log analysis |
| OpenSearch | Full-text search + analytics | Log analytics, search applications |
| QuickSight | BI dashboards | Business reporting, visualizations |
| Redshift | Data warehouse | Complex analytics on structured data |
| EMR | Big data processing | Spark, Hadoop, large-scale ETL |
| Kinesis Data Analytics | Real-time SQL/Flink | Streaming data analytics |
A data analyst needs to run SQL queries on CSV files stored in S3 without setting up any infrastructure. Which service should they use?
How can you reduce the cost of Amazon Athena queries?