By using workload management, you can run queries in different query queues so that you don't need to wait for another query to complete. The workload manager creates a separate queue, called the Superuser queue, that you can use for troubleshooting. To use the Superuser queue, log on as a superuser and set the query group to 'superuser' using the SET command. After running your commands, reset the query group using the RESET command.

To cancel a query using the Superuser queue, run these commands.

SET query_group TO 'superuser';
CANCEL 610;
RESET query_group;
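
The value passed to CANCEL (610 in the example above) is the process ID of the query to stop. As a minimal sketch, assuming your user can read the STV_RECENTS system table, you can look up the process IDs of running queries like this. The column aliases and the 40-character truncation are illustrative choices only.

```sql
-- List currently running queries with their process IDs (pid).
-- Use the pid of the query you want to stop as the argument to CANCEL.
SELECT pid,
       TRIM(user_name)         AS user_name,
       starttime,
       SUBSTRING(query, 1, 40) AS query_text
FROM stv_recents
WHERE status = 'Running';
```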

Task 8: Clean up your resources

If you deployed a cluster to complete this exercise, delete the cluster when you are finished. Deleting the cluster stops it from accruing charges to your AWS account.

To delete the cluster, follow the steps in Deleting a cluster in the Amazon Redshift Cluster Management Guide.

If you want to keep the cluster, keep the sample data for reference. Most of the examples in this guide use the tables that you create in this exercise. The size of the data won't have any significant effect on your available storage.

If you want to keep the cluster, but want to clean up the sample data, run the following command to drop the SALESDB database.

DROP DATABASE SALESDB;

If you didn't create a SALESDB database, or if you don't want to drop the database, run the following commands to drop just the tables.

DROP TABLE DEMO;
DROP TABLE users;
DROP TABLE venue;
DROP TABLE category;
DROP TABLE date;
DROP TABLE event;
DROP TABLE listing;
DROP TABLE sales;

Getting started with querying data sources outside your Amazon Redshift database

Following, you can find information about how to get started querying data on remote sources, including remote Amazon Redshift clusters. You can also find information about training machine learning (ML) models using Amazon Redshift.

Topics

• Getting started querying your data lake

• Getting started querying data on remote data sources

• Getting started accessing data in other Amazon Redshift clusters

• Getting started training machine learning models with Amazon Redshift data

Getting started querying your data lake

You can use Amazon Redshift Spectrum to query data in Amazon S3 files without having to load the data into Amazon Redshift tables. You can query data in many formats, including Parquet, ORC, RCFile, TextFile, SequenceFile, RegexSerde, OpenCSV, and AVRO. To define the structure of the files in Amazon S3, you create external schemas and tables. Then, you use an external data catalog such as AWS Glue or your own Apache Hive metastore. Changes to either type of data catalog are immediately available to any of your Amazon Redshift clusters.

After your data is registered with an AWS Glue Data Catalog and enabled with AWS Lake Formation, you can query it by using Redshift Spectrum.

Redshift Spectrum resides on dedicated Amazon Redshift servers that are independent of your cluster.

Redshift Spectrum pushes many compute-intensive tasks, such as predicate filtering and aggregation, to the Redshift Spectrum layer. Redshift Spectrum also scales intelligently to take advantage of massively parallel processing.

You can partition the external tables on one or more columns to optimize query performance through partition elimination. You can query and join the external tables with Amazon Redshift tables. You can access external tables from multiple Amazon Redshift clusters and query the Amazon S3 data from any cluster in the same AWS Region. When you update Amazon S3 data files, the data is immediately available for queries from any of your Amazon Redshift clusters.
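
The steps described above can be sketched as follows. This is an illustration under stated assumptions, not a definitive setup: the schema name spectrum_demo, the catalog database spectrum_demo_db, the IAM role ARN, the bucket example-bucket, and the sales_ext columns are all placeholders. The join assumes the event table from the sample data used in this guide.

```sql
-- Register an external schema backed by the AWS Glue Data Catalog.
-- Schema name, catalog database, and IAM role ARN are placeholders.
CREATE EXTERNAL SCHEMA spectrum_demo
FROM DATA CATALOG
DATABASE 'spectrum_demo_db'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-spectrum-role'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Describe Parquet files in Amazon S3, partitioned by sale date.
-- The bucket path and column list are placeholders.
CREATE EXTERNAL TABLE spectrum_demo.sales_ext (
    salesid   INTEGER,
    eventid   INTEGER,
    pricepaid DECIMAL(8,2)
)
PARTITIONED BY (saledate DATE)
STORED AS PARQUET
LOCATION 's3://example-bucket/sales/';

-- Register one partition so that queries can prune on saledate.
ALTER TABLE spectrum_demo.sales_ext
ADD IF NOT EXISTS PARTITION (saledate = '2008-01-01')
LOCATION 's3://example-bucket/sales/saledate=2008-01-01/';

-- Join the external table with a local Amazon Redshift table.
-- The filter on the partition column allows partition elimination.
SELECT e.eventname, SUM(s.pricepaid) AS total_paid
FROM spectrum_demo.sales_ext AS s
JOIN event AS e ON e.eventid = s.eventid
WHERE s.saledate = '2008-01-01'
GROUP BY e.eventname;
```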

For more information about Redshift Spectrum, including how to work with Redshift Spectrum and data lakes, see Getting started with Amazon Redshift Spectrum in the Amazon Redshift Database Developer Guide.

Getting started querying data on remote data sources

You can join data from an Amazon RDS database, an Amazon Aurora database, or Amazon S3 with data in your Amazon Redshift database using a federated query. You can use Amazon Redshift to query operational data directly (without moving it), apply transformations, and insert data into your Redshift tables. Some of the computation for federated queries is distributed to the remote data sources.

To run federated queries, Amazon Redshift first makes a connection to the remote data source. Amazon Redshift then retrieves metadata about the tables in the remote data source, issues queries, and then retrieves the result rows. Amazon Redshift then distributes the result rows to Amazon Redshift compute nodes for further processing.
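
As a rough sketch of that flow for a PostgreSQL-compatible source, you map a remote schema into Amazon Redshift and then query it alongside local tables. Every name here is an assumption made for illustration: the schema federated_pg, the remote database exampledb, the host name, both ARNs, and the remote orders table with its columns. The local sales table is the one from the sample data used in this guide.

```sql
-- Map a schema in a remote PostgreSQL-compatible database into Amazon Redshift.
-- Database, host, IAM role ARN, and secret ARN are placeholders.
CREATE EXTERNAL SCHEMA federated_pg
FROM POSTGRES
DATABASE 'exampledb' SCHEMA 'public'
URI 'example-instance.abc123example.us-west-2.rds.amazonaws.com' PORT 5432
IAM_ROLE 'arn:aws:iam::123456789012:role/example-federated-role'
SECRET_ARN 'arn:aws:secretsmanager:us-west-2:123456789012:secret:example-secret';

-- Join live operational rows (hypothetical remote table "orders")
-- with a local Amazon Redshift table, without copying the remote data first.
SELECT o.order_id, o.status, s.pricepaid
FROM federated_pg.orders AS o
JOIN sales AS s ON s.salesid = o.sales_id;
```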

For information about setting up your environment for federated queries, see one of the following topics in the Amazon Redshift Database Developer Guide:

• Getting started with using federated queries to PostgreSQL

• Getting started with using federated queries to MySQL

Getting started accessing data in other Amazon Redshift clusters

Using Amazon Redshift data sharing, you can securely share live data across Amazon Redshift clusters or AWS accounts for read purposes. You get instant, granular, and high-performance access to data across Amazon Redshift clusters without needing to manually copy or move it. Your users can see the most up-to-date and consistent information as it's updated in Amazon Redshift clusters.

Amazon Redshift data sharing is especially useful for these use cases:

• Centralizing business-critical workloads – Use a central extract, transform, and load (ETL) cluster that shares data with multiple business intelligence (BI) or analytic clusters. This approach provides read workload isolation and chargeback for individual workloads.

• Sharing data between environments – Share data among development, test, and production environments. You can improve team agility by sharing data at different levels of granularity.
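
A minimal sketch of sharing one table between two clusters might look like the following. The datashare name salesshare, the consumer database name sales_db, and both namespace values are placeholders. Substitute the namespace identifiers of your own producer and consumer clusters.

```sql
-- On the producer cluster: create a datashare and add objects to it.
CREATE DATASHARE salesshare;
ALTER DATASHARE salesshare ADD SCHEMA public;
ALTER DATASHARE salesshare ADD TABLE public.sales;

-- Grant the consumer cluster access (placeholder namespace identifier).
GRANT USAGE ON DATASHARE salesshare
TO NAMESPACE '<consumer-namespace-id>';

-- On the consumer cluster: create a local database that points to the datashare
-- (placeholder producer namespace identifier).
CREATE DATABASE sales_db
FROM DATASHARE salesshare
OF NAMESPACE '<producer-namespace-id>';

-- Query the shared, read-only data with three-part notation.
SELECT COUNT(*) FROM sales_db.public.sales;
```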

For more information about data sharing, see Getting started data sharing in the Amazon Redshift Database Developer Guide.

Getting started training machine learning models with Amazon Redshift data

Using Amazon Redshift machine learning (Amazon Redshift ML), you can train a model by providing the data to Amazon Redshift. Then Amazon Redshift ML creates models that capture patterns in the input data. You can then use these models to generate predictions for new input data without incurring additional costs. By using Amazon Redshift ML, you can train machine learning models using SQL statements and invoke them in SQL queries for prediction. You can continue to improve the accuracy of the predictions by iteratively changing parameters and improving your training data.

Amazon Redshift ML makes it easier for SQL users to create, train, and deploy machine learning models using familiar SQL commands. By using Amazon Redshift ML, you can use your data in Amazon Redshift clusters to train models with Amazon SageMaker. The models are then localized so that predictions can be made within an Amazon Redshift database.
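
To illustrate the train-then-predict pattern, here is a hedged sketch. It assumes a hypothetical customer_activity table with the columns shown, a placeholder IAM role, and a placeholder S3 bucket for training artifacts. The model and function names are also invented for this example.

```sql
-- Train a model from the rows returned by a query.
-- Table, columns, role ARN, and bucket are placeholders.
CREATE MODEL customer_churn_model
FROM (
    SELECT age, state, monthly_spend, churned
    FROM customer_activity
    WHERE signup_date < '2021-01-01'
)
TARGET churned
FUNCTION predict_customer_churn
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-ml-role'
SETTINGS (S3_BUCKET 'example-bucket');

-- After training completes, call the generated function in ordinary SQL.
SELECT customer_id,
       predict_customer_churn(age, state, monthly_spend) AS predicted_churn
FROM customer_activity
WHERE signup_date >= '2021-01-01';
```

Training runs asynchronously, so the prediction function becomes usable only after the model is ready.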

For more information about Amazon Redshift ML, see Getting started with Amazon Redshift ML in the Amazon Redshift Database Developer Guide.

Additional resources

When you have completed these tutorials, we recommend that you continue to learn more about the concepts introduced in this guide by using the following Amazon Redshift resources:

• Amazon Redshift Cluster Management Guide: This guide builds upon this Amazon Redshift Getting Started Guide. It provides in-depth information about the concepts and tasks for creating, managing, and monitoring clusters.

• Amazon Redshift Database Developer Guide: This guide also builds upon this Amazon Redshift Getting Started Guide. It provides in-depth information for database developers about designing, building, querying, and maintaining the databases that make up your data warehouse.

• SQL reference: This topic describes SQL commands and function references for Amazon Redshift.

• System tables and views: This topic describes system tables and views for Amazon Redshift.

• Tutorials for Amazon Redshift: This topic shows tutorials to learn about Amazon Redshift features.

• Using spatial SQL functions with Amazon Redshift: This tutorial demonstrates how to use some of the spatial SQL functions with Amazon Redshift.

• Loading data from Amazon S3: This tutorial describes how to load data into your Amazon Redshift database tables from data files in an Amazon S3 bucket.

• Querying nested data with Amazon Redshift Spectrum: This tutorial describes how to use Redshift Spectrum to query nested data in Parquet, ORC, JSON, and Ion file formats using external tables.

• Configuring manual workload management (WLM) queues: This tutorial describes how to configure manual workload management (WLM) in Amazon Redshift.

• Feature videos: These videos help you learn about Amazon Redshift features.

• To learn how to get started with Amazon Redshift, watch the following video: Getting started with Amazon Redshift.

• To learn how Amazon Redshift data sharing works, watch the following video: Amazon Redshift data sharing workflow.

• To learn how Amazon Redshift Machine Learning (ML) works, watch the following video: Amazon Redshift ML.

• To learn how to monitor, isolate, and optimize your queries using the query monitoring features on the Amazon Redshift console, watch the following video: Query Monitoring with Amazon Redshift.

• What's new: This webpage lists Amazon Redshift new features and product updates.

Document history

The following table describes the important changes for the Amazon Redshift Getting Started Guide.

Latest documentation update: June 30, 2021

Change | Description | Release date
Documentation update | Updated the guide to include new sections about getting started with common database tasks, querying your data lake, querying data on remote sources, sharing data, and training machine learning models with Amazon Redshift data. | June 30, 2021
New feature | Updated the guide to describe the new sample load procedure. | June 4, 2021
Documentation update | Updated the guide to remove the original Amazon Redshift console and improve step flow. | August 14, 2020
New console | Updated the guide to describe the new Amazon Redshift console. | November 11, 2019
New feature | Updated the guide to describe the quick-launch cluster procedure. | August 10, 2018
New feature | Updated the guide to launch clusters from the Amazon Redshift dashboard. | July 28, 2015
New feature | Updated the guide to use new node type names. | June 9, 2015
Documentation update | Updated screenshots and procedure for configuring VPC security groups. | April 30, 2015
Documentation update | Updated screenshots and procedures to match the current console. | November 12, 2014
Documentation update | Moved loading data from Amazon S3 information into its own section and moved next steps section into the final step for better discoverability. | May 13, 2014
Documentation update | Removed the Welcome page and incorporated the content into the main Getting Started page. | March 14, 2014
Documentation update | This is a new release of the Amazon Redshift Getting Started Guide that addresses customer feedback and service updates. | March 14, 2014
New guide | This is the first release of the Amazon Redshift Getting Started Guide. | February 14, 2013
