• 沒有找到結果。

ORDER BY clause

在文檔中 Amazon Kinesis Data Analytics (頁 79-83)

ORDER BY clause

A streaming query can use ORDER BY if its leading expression is time-based and monotonic. For example, a streaming query whose leading expression is based on the ROWTIME column can use ORDER BY to do the following operations:

• Sort the results of a streaming GROUP BY.

• Sort a batch of rows arriving within a fixed time window of a stream.

• Perform streaming ORDER BY on windowed-joins.

The "time-based and monotonic" requirement on the leading expression means that the query

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM DISTINCT ticker FROM trades ORDER BY ticker

will fail, but the query

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM DISTINCT rowtime, ticker FROM trades ORDER BY ROWTIME, ticker

will succeed.

ORDER BY clause

NoteThe preceding examples use the DISTINCT clause to remove duplicate instances of the same ticker symbol from the result set, so that the results will be monotonic.

Streaming ORDER BY sorts rows using SQL-2008 compliant syntax for the ORDER BY clause. It can be combined with a UNION ALL statement, and can sort on expressions, such as:

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM x, y FROM t1 UNION ALL

SELECT STREAM a, b FROM t2 ORDER BY ROWTIME, MOD(x, 5)

The ORDER BY clause can specify ascending or descending sort order, and can use column ordinals, as well as ordinals specifying (referring to) the position of items in the select list.

NoteThe UNION statement in the preceding query collects records from two separate streams for ordering.

Streaming ORDER BY SQL Declarations

The streaming ORDER BY clause includes the following functional attributes:

• Gathers rows until the monotonic expression in streaming ORDER BY clause does not change.

• Does not require streaming GROUP BY clause in the same statement.

• Can use any column with a basic SQL data type of TIMESTAMP, DATE, DECIMAL, INTEGER, FLOAT, CHAR, VARCHAR.

• Does not require that columns/expressions in the ORDER BY clause be present in the SELECT list of the statement.

• Applies all the standard SQL validation rules for ORDER BY clause.

The following query is an example of streaming ORDER BY:

CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"

SELECT STREAM state, city, SUM(amount) FROM orders

GROUP BY FLOOR(ROWTIME TO HOUR), state, city

ORDER BY FLOOR(ROWTIME TO HOUR), state, SUM(amount)

T-sorting Stream Input

Amazon Kinesis Data Analytics real-time analytics use the fact that arriving data is ordered by ROWTIME.

However, sometimes data arriving from multiple sources may not be time-synchronized.

While Amazon Kinesis Data Analytics can sort data from individual data sources that have been independently inserted into Amazon Kinesis Data Analytics application's native stream, in some cases such data may have already combined from multiple sources (such as for efficient consumption at an earlier stage in processing). At other times, high volume data sources could make direct insertion impossible.

In addition, an unreliable data source could block progress by forcing Amazon Kinesis Data Analytics application to wait indefinitely, unable to proceed until all connected data sources deliver. In this case, data from this source could be unsynchronized.

You can use the ORDER BY clause to resolve these issues. Amazon Kinesis Data Analytics uses a sliding time-based window of incoming rows to reorder those rows by ROWTIME.

ROWTIME

Syntax

You specify the time-based parameter for sorting and the time-based window in which the streaming rows are to be time-sorted, using the following syntax:

 ORDER BY <timestamp_expr> WITHIN <interval_literal>

Restrictions

The T-sort has the following restrictions:

• The datatype of the ORDER BY expression must be timestamp.

• The partially-ordered expression <timestamp_expr> must be present in the select list of the query with the alias ROWTIME.

• The leading expression of the ORDER BY clause must not contain the ROWTIME function and must not use the DESC keyword.

• The ROWTIME column needs to be fully qualified. For example:

• ORDER BY FLOOR(ROWTIME TO MINUTE), ... fails.

• ORDER BY FLOOR(s.ROWTIME TO MINUTE), ... works.

If any of these requirements are not met, the statement will fail with errors.

Additional notes:

• You cannot use incoming rowtimebounds. These are ignored by the system.

• If <timestamp_expr> evaluates to NULL, the corresponding row is discarded.

ROWTIME

ROWTIME is an operator and system column that returns the time at which a particular row of a stream was created.

It is used in four distinct ways:

• As an operator

• As a system column of a stream

• As a column alias, to override the timestamp on the current row

• As an ordinary column in a table

For more details, see the topics Timestamp, ROWTIME, and CURRENT_ROW_TIMESTAMP (p. 150) in this guide.

ROWTIME operator

When used in the SELECT clause of a streaming query, without being qualified by a preceding 'alias.' , ROWTIME is an operator that evaluates to the timestamp of the row that is just about to be generated.

Its type is always TIMESTAMP NOT NULL.

ROWTIME system column

ROWTIME

Every stream has a ROWTIME column. To reference this column from within a query, qualify it with the stream name (or alias). For example, the following join query returns three timestamp columns: the system columns of its input streams, and the timestamp of the generated row.

SELECT STREAM

 o.ROWTIME AS leftRowtime,  s.ROWTIME AS rightRowtime,  ROWTIME AS joinRowtime FROM Orders AS o

 JOIN Shipments OVER (RANGE INTERVAL '1' HOUR FOLLOWING) AS s  ON o.orderId = s.orderId

leftRowtime         rightRowtime        joinRowtime

=================== =================== ===================

2008-02-20 10:15:00 2008-02-20 10:30:00 2008-02-20 10:15:00 2008-02-20 10:25:00 2008-02-20 11:15:00 2008-02-20 10:25:00 2008-02-20 10:25:30 2008-02-20 11:05:00 2008-02-20 10:25:30

As it happens, leftRowtime is always equal to joinRowtime, because the join is specified such that the output row timestamp is always equal to the ROWTIME column from the Orders stream.

It follows that every streaming query has a ROWTIME column. However, the ROWTIME column is not returned from a top-level JDBC query unless you explicitly include it in the SELECT clause. For example:

CREATE STREAM Orders(

 "orderId" INTEGER NOT NULL,  "custId" INTEGER NOT NULL);

SELECT columnName

SELECT STREAM ROWTIME, * FROM Orders;

This is mainly to ensure compatibility with JDBC: the stream Orders declares two columns, so it makes sense that "SELECT STREAM *" should return two columns.

在文檔中 Amazon Kinesis Data Analytics (頁 79-83)

相關文件