The SQL SELECT Statement - TIBCO ActiveSpaces®

A SQL SELECT statement is used to query data in the data grid. The table to query is determined by parsing the SELECT string when creating the Statement object. The rows that satisfy the query are returned in a ResultSet. A WHERE clause can be used in the SELECT statement to control which rows of a table must be used in the query result. An ORDER BY clause can be appended after the WHERE clause to sort the resulting rows of the query. A LIMIT clause can be appended as the last clause of the SELECT string to control the number of rows ultimately returned by the query.

SQL keywords, table, and column identifiers are not treated as case sensitive when used in a SQL SELECT statement. However, string values are treated as case sensitive and must be surrounded by single quotes.

Unsupported SQL Features

TIBCO ActiveSpaces^® does not support the following SQL features:

● GLOB operator

● UNIQUE

● EXISTS

The SELECT syntax supported has the following format:

SELECT <select list> <FROM clause> [<WHERE clause>] [<GROUP BY clause>] [<ORDER BY clause>] [<LIMIT clause>]

Notice that <FROM clause> is not optional. Use of a SELECT statement with ActiveSpaces is always intended to query data in the data grid and the table to query must always be specified by including

<FROM clause>.

The order of rows returned for a query is non-deterministic unless an ORDER BY clause is included in the SELECT statement.

The FROM Clause

The FROM clause specifies the table to query. When using ActiveSpaces, a FROM clause is required in a SELECT statement. The FROM clause syntax supported by ActiveSpaces is the following:

FROM <table name> [ [AS] <correlation name> ]

A correlation name is an identifier that is associated with the specified table and can be used in place of the table name anywhere within the SELECT statement. The following code snippet serves as an example:

SELECT t.* FROM mytable AS t WHERE t.col1 = 100

The Select List

The select list specifies the columns of the rows in the query result. These columns are not necessarily columns from a table, but may be columns whose value is derived by applying a function or equation to the rows of the table selected for the query.

ActiveSpaces uses the following select list syntax:

[<identifier>.]<asterisk> | <value expression> [ [AS] <label> ]

Asterisk

An asterisk, '*', used in the select list refers to all columns of the table that is specified in the FROM clause. Each column of the table is included exactly once in each row of the query results.

To select all the columns of a table or correlation, use ".*' (dot asterisk) as the suffix as shown in the following example:

SELECT mytable.* FROM mytable

When an asterisk is used in the select list, the names of the columns as defined for the table are used as the label when accessing the columns in the rows of the query result.

Value Expression

A value expression specifies the value returned by a particular column for each row of the query results.

A value expression can consist of the following items:

TIBCO ActiveSpaces^® Concepts

● Column Identifiers

● Functions

● Literals (For example, 1234, 'somestring')

● Expressions (For example, x+y, x AND y)

A value expression can be given a label to use when accessing the data for each column in a row of the query results as shown in the following example:

SELECT col1 AS c1 FROM mytable

If a value expression is not given a label, the entire string used to specify the value expression is used as the label as shown in the following example:

SELECT col1, col2 + col3, ‘row totals’ FROM mytable

To access the columns in the rows of the query result for the SELECT statement above, the following labels would be used:

● 'col1'

● 'col2 + col3'

● 'row totals'

For more information about expressions, see SQL Expressions.

Restrictions on Using a Value Expression

A value expression can be composed of any of the items described in Value Expression, but with the following restrictions:

● Functions must be supported by ActiveSpaces.

● A value expression cannot contain an aggregate function.

● A select list that contains an aggregate function cannot also contain a value expression.

● A value expression cannot contain parameters.

● A value expression cannot contain SELECT statements.

● Nested expressions are limited to a depth of 100.

For information on the limitations of using value expressions that are aggregate functions, see the section on "Aggregate Functions" in Functions Used in The Select List .

CASE Expressions used in the Select List

You can use CASE expressions in the select list of SELECT statements.

For more details on CASE Expressions, see CASE Expressions.

Using a CASE expression in the select list of a SELECT statement can either help categorize results based on their values or help apply different calculations to a result based on its value. For example, the following query can be used to find out the number of employees that are in 'in-state' versus 'out-of-state':

SELECT empid, CASE state WHEN ‘IL’ THEN ‘in-state’ ELSE ‘out-of-state’ END AS location FROM employee

Case Expression Restrictions

In addition to the restrictions on value expressions listed in Restrictions on Using a Value Expression, the use of a CASE expression in a select list has the following additional restrictions:

● CASE expressions cannot contain functions that use the DISTINCT keyword.

● All resulting values of the CASE expression must be of the same data type. The resulting data types are not evaluated until the query is executed. Therefore, it is possible that a bad SELECT statement can be created, which later causes an exception to be returned after executing the query.

● CASE expressions that contain subexpressions are parsed, but do not return an error.

Additionally, an internal limit is applied to nested CASE expressions to prevent the exhaustion of resources. This limit is different from the expression depth limit and can vary based on the composition of the query. The limit is applied during the parsing of the nested CASE expression. When the limit is reached, the following error is returned:

Error Code = Resource limit reached Description = SQL parser stack overflow

Functions Used in The Select List

TIBCO ActiveSpaces supports the use of aggregate functions, date and time functions, and string functions as a value expression or as part of a value expression.

When used as the value expression, the result of the function is used as the value for columns in the select list.

Date and Time Functions

Date and time function usage is supported for value expressions as shown in the following example:

SELECT date(dtmcol) FROM mytable

For more information on date and time functions, see Date and Time Functions.

When using a timevalue of ^now with a date and time function in a value expression, ^now is converted to the current date and time by each node of the data grid as the query is executed and rows are found to use in the query results. Carefully analyze your use of ^now to ensure the query returns the results you expect. The same holds true for using the SQL variables CURRENT_DATE CURRENT_TIME and CURRENT_TIMESTAMP in a value expression.

String Functions

String functions can be used in value expressions. For example, to return the ^idcolumn's value and the uppercase value of the ^lastnamecolumn from all rows of a table you can use the following query:

SELECT id, UPPER(lastname) FROM mytable

For more information on the string functions supported by ActiveSpaces, see SQL String Functions.

Aggregate Functions

Aggregate functions as value expressions use the column values from multiple rows to calculate a single value. Aggregate functions are applied to column values of the rows selected to be used for the query results.

Depending on the syntax used for the SELECT statement, these rows can be the result of the GROUP BY clause or the WHERE clause. Rows from the GROUP BY clause take precedence over rows from the WHERE clause. If both the GROUP BY and WHERE clauses are omitted, values from all rows of the table are used for the aggregation.

Aggregate functions ignore NULL values. If all the values are NULL or cannot be converted to the appropriate data type for the function, the result of the function is NULL. The aggregate functions supported are listed in the following table:

TIBCO ActiveSpaces^® Concepts

Function Description

AVG(<column>) Computes the arithmetic mean of the non-NULL values in the column. The type of the result is always a Double, regardless of the data type of the underlying column. If the data type of the underlying column is String, the value is converted to a Double. If the data type of the underlying column is Opaque, the data is treated as a String and then converted to a Double.

COUNT(<column>) COUNT(*) Computes the number of non-NULL values in the column, or the number of rows. The result is always of type Long.

COUNT() is not supported.

MIN(<column>) The minimum non-NULL value in the column. The type of the result matches the type of the underlying column.

MAX(<column>) The maximum non-NULL value in the column. The type of the result matches the type of the underlying column.

SUM(<column>) Computes the sum of the non-NULL values in the column.

The type of the result is Long if the data type of the

underlying column is Long, otherwise it is Double. If the data type of the underlying column is String, the value is

converted to a Double. If the data type of the underlying column is Opaque, the data is treated as a String and then converted to a Double.

GROUP_CONCAT( DISTINCT

<column>) Concatenates the unique values for a column in the group into a comma separated list. The DISTINCT keyword is required. The order of the values in the list is not defined.

The type of the result is always a String, irrespective of the data type of the underlying column. If the data type of the underlying column is Long or Double, the value is converted to a String. If the data type of the underlying column is Opaque, the data is treated as String.

For more information on the aggregate functions supported by ActiveSpaces, see Aggregate Functions.

The WHERE Clause

A WHERE clause is used to determine whether a row of a table must be used when composing the results of a query.

When applied to the row of a table, the WHERE clause must result in a boolean value indicating whether or not a row must be used when calculating the results of a query. The WHERE clause syntax supported by ActiveSpaces is the following:

WHERE [NOT] <predicate> [ AND | OR [NOT] <predicate> ] . . .

A predicate is a condition expression that evaluates to true or false. If the predicate evaluates to true for a row, that row is used when calculating the results of the query. For example:

city='Chicago' percent<=75.0

The AND and OR operators are used to filter rows based on more than one condition. For example:

city='Chicago' AND lastname='Dailey'

The NOT operator includes a row if the condition is not true. For example:

NOT city='Chicago'

When you create an SQL statement, the efficiency of the SQL statement depends on how well you construct a WHERE clause. For example, a WHERE clause must be used to prevent a full table scan from being done. For more information about constructing an efficient WHERE clause, see Tips on Constructing an Efficient WHERE Clause.

Tips on Constructing an Efficient WHERE Clause

When you create a SQL statement, it is important to construct an efficient WHERE clause.

Indexes and WHERE Clauses

The performance of a SQL statement that includes a WHERE clause depends partly on the way you construct your WHERE clause and partly on the definition of table indexes by the data grid

administrator. In general, you want to construct a WHERE clause that results in the primary key or a secondary index to be selected for evaluating the WHERE clause predicates against a subset of the rows of a table.

Omitting the WHERE clause or using a WHERE clause that results in no key or index being selected might end up in a full table scan. Full table scans iterate over every row in a table to see if the row must be used by the SQL statement or whether they must be avoided as they are very inefficient.

For example, if the primary key for a table is defined on the empid column, the following query can directly access one row of the table:

SELECT * FROM mytable WHERE empid=’ID-1234’

If there are no secondary indexes defined for the table, the following query performs a full table scan:

SELECT * FROM mytable WHERE lastname LIKE ‘B%’

However, if there is a secondary index defined on the ^lastname column, then the query above would reduce the number of rows scanned to only those rows that begin with a 'B' or 'b'.

Programmers: consult your data grid administrator for information about the definition of indexes.

Programmers and administrators can use the rules of thumb in the following sections to help promote efficiency and high performance of queries.

Primary Key and Secondary Indexes

Rule of Thumb: Construct WHERE clauses in which every predicate refers to columns of the primary key or a secondary index.

A WHERE Clause is composed of one or more predicates. A predicate is a condition expression that evaluates to either true or false. Each predicate must be composed of columns from the primary key or a secondary index. This allows the primary key or a secondary index to be selected to find the rows of the table to use for evaluating the predicate. If a primary key or secondary index cannot be selected for even one predicate in the WHERE clause, a full table scan is performed even if the primary key or a secondary index is selected for all of the other predicates in the WHERE clause.

Left-Most Columns

Rule of Thumb: Construct WHERE clauses that reference to the left-most columns of a primary key or secondary index using the operators =, ==, <=, >=, <, >, IN, IS, or BETWEEN.

Not all columns of the primary key or secondary indexes need to be referenced by a WHERE clause predicate. When a primary key or secondary index includes more than one column, the administrator has defined them from left to right. For the primary key or a secondary index to be considered for use in evaluating a predicate, at least the left-most column of the key or index must be referenced by the predicate. The primary key or a secondary index is not selected, if the WHERE clause skips the left-most column but refers only to columns defined further to the right.

TIBCO ActiveSpaces^® Concepts

Similarly WHERE clauses that refer to the left-most two columns can be even more efficient. Queries can achieve maximum efficiency when they use WHERE clauses that refer to all of the columns of the primary key or a secondary index.

When a WHERE clause does not contain the left-most column of the primary key or any of the

secondary indexes, a full table scan results. A query that does a full table scan is least efficient and must be avoided.

The order in which columns appear in the WHERE clause does not affect query efficiency. Only the order of columns when defining the index matters.

Avoid Left-Most NOT

Rule of Thumb: Do not construct WHERE clauses that reference the left-most columns of the primary key or secondary index using the operators NOT, IS NOT, !=, <>, ISNULL, IS NULL, NOTNULL, NOT NULL, and IS NOT NULL.

In contrast to the rule of left-most columns, a WHERE clause that references the left-most columns with these operators have the opposite effect: to guarantee a full table scan, which is the least efficient.

This rule does not imply that operators in the NOT family are always inefficient. For example, a query can still be efficient if it obeys the left-most columns rule and also tests columns further to the right using NOT. For example, if the administrator defined an index on the columns ^lastnameand

firstname, then this WHERE clause can be efficient:

WHERE lastname='Smith' AND firstname IS NOT 'Dan'

Limit Range Queries From Both Ends

Rule of Thumb: When using the operators > or >=, which specify a lower limit on a column's value, also include the opposite operators, < or <=, to specify an upper limit on the same column.

A query searches an index from its lower limit to its upper limit. If you omit the upper limit, the query continues searching to the end of the index. If you omit the lower limit, the query begins with the first row of the index.

Operators Used in the WHERE Clause

A predicate can use the operators as described in Operators. For detailed information about using the LIKE operator, see LIKE Operator.

The following sections contain additional information about using the LIKE operator in the predicate of a WHERE clause.

Indexed Columns with LIKE Operator

If the left operand of the LIKE operator is the name of an indexed column of type string, ActiveSpaces converts the LIKE operator into a range query using >= and <. This enables the ActiveSpaces index selection algorithm to select the index and use it for scanning rows when processing the query. For example, take a look at the following statement:

SELECT * FROM mytable WHERE lastname LIKE ‘Long’

Internally this statement is converted to the following statement and the index on ^lastname can be used to perform a range scan.

SELECT * FROM mytable WHERE lastname >= ‘Long’ AND lastname < ‘long’

However, the index for ^lastname cannot be used if <character sequence pattern> is a ^long value or starts with a wildcard (%, _) or digit.

Pattern Matching with LIKE Operator

When using the LIKE operator in a WHERE clause, using a character sequence pattern that begins with a wild card results in a full table scan.

It is also important to be aware that you might end up scanning more rows than you would expect because of the way pattern matching works when matching is not case sensitive. For example, if you have an index defined for ^lastnameand you run the following query:

SELECT * FROM mytable WHERE lastname LIKE 'm%'

The index is used, but all the rows are scanned where the value of ^lastnamestart with "M" through "Z"

and "a" through "m".

Using Date and Time Functions in the WHERE Clause

Date and time information can be stored in the data grid as date and time strings, Julian Day long or double values, or as tibDateTime values.

Date and time functions can be used in the WHERE clause for querying these different types of columns as shown in the following example:

在文檔中 TIBCO ActiveSpaces® (頁 42-52)