TIBCO ActiveSpaces®

Concepts

Version 4.6.1 February 2021

Contents

TIBCO Documentation and Support Services
About This Product
Overview of TIBCO ActiveSpaces
Why ActiveSpaces?
What Is ActiveSpaces?
Benefits of ActiveSpaces
Attributes of ActiveSpaces
Redesigned from the Ground Up
Terminology Used to Address the TIBCO FTL Realm
Grid Computing in ActiveSpaces
What Is a Data Grid?
How Is the Data Stored in a Data Grid?
Replication
Processes in ActiveSpaces
The Workflow for a PUT Operation
Log Levels

Gridsets
Types of Data Grids
Mirroring
Best Practices for a Development Environment
Best Practices for a Production Environment
Programming with ActiveSpaces
Structuring Programs
Task A: Initializing ActiveSpaces Objects
Task B: Performing Data Grid Operations
Task C: Cleaning up and Closing the Connection
Connection
Session
Table
Statement
SQL Identifiers
Column Data Types
tibDateTime
Querying a Data Grid Table
Table Iterator
Session Statement
Data Consistency for Queries
The SQL SELECT Statement
Modifying Data in a Table
The SQL INSERT Statement
The INSERT OR REPLACE Statement
SQL Expressions
Operators
LIKE Operator
Negation
Compound Expressions

Order of Operations
CASE Expressions
SQL Functions
Aggregate Functions
Date and Time Functions
SQL String Functions
The ActiveSpaces JDBC Driver
Connecting to the Data Grid by Using ActiveSpaces JDBC Driver
Setting up the Environment
Registering the ActiveSpaces JDBC Driver with the Driver Manager
Creating the ActiveSpaces JDBC Connection
Using the ActiveSpaces JDBC Driver With Third Party Tools
JDBC Implementation Notes
JDBC Data Types
DatabaseMetaData Pattern Parameters
ResultSetMetaData and Function Return Values
JDBC Compliance
Sizing Guide
Example of a Sizing Calculation
Comparison Matrix
Error Codes
Legal and Third-Party Notices

TIBCO Documentation and Support Services

How to Access TIBCO Documentation

Documentation for TIBCO products is available on the TIBCO Product Documentation website, mainly in HTML and PDF formats.

The TIBCO Product Documentation website is updated frequently and is more current than any other documentation included with the product. To access the latest documentation, visit https://docs.tibco.com.

Product-Specific Documentation

Documentation for TIBCO ActiveSpaces® is available on the TIBCO ActiveSpaces® Product Documentation page.

To directly access documentation for this product, double-click the following file:

TIBCO_HOME/release_notes/TIB_as_4.6.1_docinfo.html

where TIBCO_HOME is the top-level directory in which TIBCO products are installed. On Windows, the default TIBCO_HOME is C:\tibco. On UNIX systems, the default TIBCO_HOME is /opt/tibco.

The following documents for this product can be found in the TIBCO Documentation site:

• TIBCO ActiveSpaces® Release Notes

• TIBCO ActiveSpaces® Installation

• TIBCO ActiveSpaces® Concepts

• TIBCO ActiveSpaces® Administration

• TIBCO ActiveSpaces® API Reference

• TIBCO ActiveSpaces® Security Guidelines

How to Contact TIBCO Support

You can contact TIBCO Support in the following ways:

• For an overview of TIBCO Support, visit http://www.tibco.com/services/support.

• For accessing the Support Knowledge Base and getting personalized content about products you are interested in, visit the TIBCO Support portal at https://support.tibco.com.

• For creating a Support case, you must have a valid maintenance or support contract with TIBCO. You also need a user name and password to log in to https://support.tibco.com. If you do not have a user name, you can request one by clicking Register on the website.

How to Join TIBCO Community

TIBCO Community is the official channel for TIBCO customers, partners, and employee subject matter experts to share and access their collective experience. TIBCO Community offers access to Q&A forums, product wikis, and best practices. It also offers access to extensions, adapters, solution accelerators, and tools that extend and enable customers to gain full value from TIBCO products. In addition, users can submit and vote on feature requests from within the TIBCO Ideas Portal. For a free registration, go to https://community.tibco.com.

About This Product

The TIBCO ActiveSpaces® software is a distributed in-memory data grid product. Some features of ActiveSpaces® include use of familiar database concepts, high I/O capacity, and network scalability.

ActiveSpaces features a complete redesign and reimplementation of the product and is straightforward to understand, use, and administer.

Product Editions

ActiveSpaces is now available in two editions: Community Edition and Enterprise Edition.

ActiveSpaces - Community Edition

ActiveSpaces® - Community Edition is ideal for getting started with ActiveSpaces, for implementing application projects (including proof of concept projects), for testing, and for deploying applications in a production environment. Although the community license limits the number of production instances, you can easily upgrade to the enterprise edition as your use of ActiveSpaces expands.

The community edition is available free of charge. It is a full installation of the ActiveSpaces product, with one limitation: you can run up to 25 nodes (a total of the copyset nodes or proxies in your data grid).

ActiveSpaces - Community Edition is compatible with both the enterprise and community editions of TIBCO FTL®.

ActiveSpaces - Enterprise Edition

ActiveSpaces® - Enterprise Edition is ideal for all application development projects, and for deploying and managing applications in the production environment of an enterprise. It includes all features presented in this documentation set, and you also have access to TIBCO Support. Choose the enterprise edition for production deployments with more than 25 nodes (a total of the copyset nodes or proxies in your data grid) and for enterprise monitoring using dashboards.

ActiveSpaces - Enterprise Edition depends on the enterprise edition of TIBCO FTL for monitoring and management of data grid components and secure communication.

“Node” means, for TIBCO ActiveSpaces, a copyset node or proxy, where each copyset node or proxy is an operating system process with a unique process ID. For the purposes of the definition of Node, “Process ID” means a standard computer industry term that uniquely identifies each operating system process. For the purposes of the definition of Node, “Copyset” means a logical grouping of nodes such that a portion of the data is shared uniformly by all the nodes that form a copyset.

Overview of TIBCO ActiveSpaces

TIBCO ActiveSpaces software is a distributed in-memory data grid product.

To lift the burden of big data, ActiveSpaces provides a distributed in-memory data grid that can increase processing speed to reduce reliance on costly transactional systems.

ActiveSpaces provides an infrastructure for building highly scalable, fault-tolerant applications. It creates virtual data caches from the aggregate memory of participating nodes, scaling automatically as nodes join the data grid. By combining the features and performance of databases, caching systems, and messaging software, it supports very large, highly volatile data sets and event-driven applications.

Why ActiveSpaces?

A traditional RDBMS fails to keep up with the growing volume of data and the high rate of I/O operations per second. This drawback of RDBMS can impact the performance and slow down the system.

TIBCO ActiveSpaces is ideal for enterprises that handle a large amount of data or have a high volume of I/O activities per second. ActiveSpaces provides horizontal scalability, where you have the flexibility to segregate the data across a group of computers. For example, if you have 25000 operations per second, they can be divided across 10 computers that each handle 2500 operations per second. Or, if your enterprise needs to store 10 TB of data, it can be distributed across 5 computers that contribute 2 TB each to store the data.

What Is ActiveSpaces?

ActiveSpaces uses the concepts of grid computing to provide a scalable, distributed, and durable data grid. The data grid serves as a system of record to store terabytes of data in an enterprise. ActiveSpaces provides a fast, consistent, and fault-tolerant system that supports a high rate of I/O operations in a scalable manner.

For more information about the ActiveSpaces concepts, see the Grid Computing in ActiveSpaces section.

Benefits of ActiveSpaces

ActiveSpaces offers many advantages as compared with a traditional RDBMS.

The following list highlights the benefits of using ActiveSpaces:

• Accelerates performance and customer experience

• Requires minimum investment because the system scales on low cost commodity or virtualized hardware

• Updates data and systems continually and provides immediate and accurate response

• Supports many hardware and software platforms, so programs running on different kinds of computers in a network can communicate seamlessly

• Scales linearly and transparently when nodes are added (an increase in the number of nodes produces a corresponding increase in the memory and processing power available to the data grid)

• Enables smooth and continuous working of your application without code modification or restarts

• Provides location transparency without the hassles of determining where or how to store data and search for it

• Notifies applications automatically as soon as data is modified

Attributes of ActiveSpaces

These attributes of ActiveSpaces set it apart from traditional RDBMS.

Scalability

The biggest advantage of using ActiveSpaces is scalability. You can scale up the system horizontally to hold terabytes of data without bringing the system down. You also have complete administrative control over data redistribution.

System of Record

Faster Access to Data

ActiveSpaces supports queries and indexes that improve performance. Queries run faster because data is cached in memory. The ActiveSpaces query language is a subset of SQL. The filtering and indexing capabilities offered by ActiveSpaces expedite the execution of queries.

TIBCO FTL® for Secure Communication

ActiveSpaces uses the capabilities of TIBCO FTL® 6.1.0 or later. A specific FTL realm contains configuration information and connectivity parameters for communication between the ActiveSpaces data grid processes and client applications. ActiveSpaces uses TIBCO FTL for the following key tasks:

• Communication between application programs and the data grid

• Internal communication among data grid component processes

• Configuration, monitoring, and management of data grid components

Note: With TIBCO FTL 6.1.0 or later, ActiveSpaces uses the realm service capabilities or processes of the TIBCO FTL server. In this documentation, the term "realm service" is used to refer to TIBCO FTL 5.x realm server or TIBCO FTL 6.x realm service.

For the versions of TIBCO FTL that are compatible with TIBCO ActiveSpaces, see the readme.txt file.

High-Performance ACID-Compliant Data Grid

The data grid provides atomicity, consistency, isolation, and durability (ACID) of data.

ACID-compliance is achieved by using transactions and concurrency control across multiple tables.

Transaction Isolation

A transaction comprises a set of operations that can modify the content of the data grid. Owing to transaction isolation, an ongoing transaction does not affect the queries that are being run on the content. For example, if another system is trying to read a row that you are trying to modify, that system either gets the data before the modification or it gets the data after the modification. The system does not get partially committed transactions. Even if the transaction is distributed over a network that involves multiple rows and tables, transaction isolation ensures that there are no uncommitted reads (dirty reads).

To achieve the highest level of transaction isolation, pessimistic transactions are used. This guarantees that a row is consistently accessed by the operation that initially accessed the row until the transaction commits or rolls back. This blocks any operation that can violate database consistency or isolation. Transactions take care of rolling back partially committed transactions.

Easy-to-Use APIs

ActiveSpaces provides tools for data definitions that are akin to the SQL language. You can also define how data is distributed across a configurable number of nodes. The support functions in the API are easy to use. You can use the functions to retrieve metadata information about the data grid, a specific table, or a result set.

Real-Time Push Events

ActiveSpaces pushes real-time events over the network to servers and client applications when data in the data grid changes. Table listeners receive data change events through callback notifications.

Cloud Ready

It is easy to deploy TIBCO ActiveSpaces on cloud, on-premises, or hybrid environments.

You can easily build TIBCO ActiveSpaces into microservices with container deployment products such as Docker.

Redesigned from the Ground Up

Starting with version 3.0, ActiveSpaces software has been completely redesigned and reimplemented to make it more user-friendly for both end users and administrators.

ActiveSpaces 3.x is not backward compatible with the earlier versions of the product.

ActiveSpaces 3.x is faster because it relies on TIBCO FTL for the underlying communication.

ActiveSpaces 3.x and later use the terminology of a traditional RDBMS. See the Comparison Matrix.

Terminology Used to Address the TIBCO FTL Realm

With TIBCO FTL 6.1 or later, ActiveSpaces uses the realm service capabilities or processes of the TIBCO FTL server. The following changes are made to the terminology to generically address the components of TIBCO FTL 5.x and TIBCO FTL 6.x:

The Term Used in the Document | The Equivalent Component in TIBCO FTL 5.4.1 | The Equivalent Component in TIBCO FTL 6.1 or Later
Realm service | Realm server | Realm service running on the TIBCO FTL server
Realm service URL | Realm server URL | TIBCO FTL server URL
Backup realm service | Backup realm server | TIBCO FTL server that is a member of a cluster of three or more TIBCO FTL servers
Primary Realm | Primary Realm Server and its Backup Realm Server | A cluster of primary TIBCO FTL servers that provide realm services for the data grid.
Satellite Realm | Satellite Realm Server and its Backup Realm Server | A cluster of satellite TIBCO FTL servers that are connected to a cluster of primary TIBCO FTL servers.

Grid Computing in ActiveSpaces

ActiveSpaces uses grid computing to bring together computers in your network that can contribute their processing power, memory, and storage to solve a complex problem.

ActiveSpaces uses grid computing concepts to store and process the contents of a data grid.

What Is a Data Grid?

ActiveSpaces stores data in data grids. In a data grid, data is stored in the form of tables.

A data grid is equivalent to a database of a traditional RDBMS.

Tables

A table comprises multiple rows that are spread out in the data grid. The tables are similar to the tables in a traditional RDBMS, made of rows and columns. Unlike a traditional RDBMS, where all the data in a table resides on one computer, a data grid segregates the table row-wise and stores the rows in different ActiveSpaces processes called nodes.

Rows

Like the traditional RDBMS, a row comprises a set of columns and is uniquely identified by the primary index. A row becomes the unit of measurement for the data grid. Rows are distributed across the data grid. When scaling up, the data grid controls where a newly added row must be stored.

Columns

A row is made of a collection of columns. Every column uniquely identifies a piece of information. Every column has a type and a value associated with it. For example, the Employee Name column is of data type String and has the value "Joe Smith".

Primary Index

Uniquely identifies a row in a table. It is equivalent to a primary key in a traditional RDBMS. You can have more than one column that forms a primary index.

Secondary Index

Is similar to a primary index but can refer to multiple rows in a table. A secondary index comprises one or more columns of a table and is used to efficiently retrieve the rows of a table by reducing the number of rows scanned for retrieval by queries. Without a secondary index, this would involve a full table scan to identify which rows match the query. With a secondary index, additional space is used to help speed up the query and quickly identify matching rows without a full-table scan.
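To illustrate why a secondary index avoids a full table scan, the following minimal Python sketch models a table as a dictionary keyed by its primary index and keeps a separate mapping from a non-unique column value to the matching primary keys. The table contents, column names, and helper functions are hypothetical and do not represent the ActiveSpaces API.

```python
from collections import defaultdict

# Hypothetical employee table keyed by its primary index (employee_id).
rows = {
    101: {"employee_id": 101, "name": "Joe Smith", "department": "Sales"},
    102: {"employee_id": 102, "name": "Ann Lee", "department": "Support"},
    103: {"employee_id": 103, "name": "Raj Patel", "department": "Sales"},
}


def build_secondary_index(table, column):
    """Map each value of `column` to the primary keys of the rows holding it."""
    index = defaultdict(list)
    for key, row in table.items():
        index[row[column]].append(key)
    return index


def full_scan(table, column, value):
    """Without a secondary index: examine every row in the table."""
    return [row for row in table.values() if row[column] == value]


def index_lookup(table, index, value):
    """With a secondary index: touch only the rows that match."""
    return [table[key] for key in index.get(value, [])]


department_index = build_secondary_index(rows, "department")
sales_rows = index_lookup(rows, department_index, "Sales")
```

The index costs extra memory (one entry per row), which is the trade-off described above: additional space in exchange for fewer rows scanned.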

Supported Data Types

A column can be of the following data types:

• long

• double

• string

• datetime

• opaque

How Is the Data Stored in a Data Grid?

Unlike traditional RDBMS, a data grid is not stored in one place. An ActiveSpaces data grid leverages the storage capacity and computing power from multiple computers.

To understand how the data is stored, you must first familiarize yourself with the following concepts:

Nodes

A node is an ActiveSpaces process running within a computer. The node holds a portion of the data forming the data grid both in memory and on disk. The smallest unit of data held by a node is a row. Other than storing the data of a row, the node is also responsible for handling requests to read or update the row. As a result, the data spanning a group of nodes collectively forms a data grid.

Nodes can be run from a physical computer, a virtual machine, or a Docker container.

Copysets

A copyset is a logical grouping of nodes such that a portion of the data is shared uniformly by all the nodes that form the copyset. This ensures fault tolerance. Every node in the copyset, also known as a replica, has an identical copy of the data. For example, assume that a row (R1) comprises employee name, employee ID, and department. Copyset1 contains nodes N1, N2, and N3, and each of them stores an identical copy of R1. When you add new data or request an update on a row in a copyset, the update is written to all the nodes in the copyset before the success of the operation is acknowledged. Keeping the nodes of a copyset on different computers helps prevent data loss during system failures.

Copysets help you scale your data horizontally. When you add a new copyset to a data grid, you can redistribute the existing data to the new nodes of that copyset, thereby distributing the load on the data grid with the help of the newly added copyset.
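The write behavior of a copyset (every replica is updated before the operation is acknowledged) can be sketched as follows. This is a simplified in-memory model under stated assumptions: the Node and Copyset classes and their methods are hypothetical names for illustration, not ActiveSpaces classes, and real nodes also persist data to disk.

```python
class Node:
    """Hypothetical stand-in for one node process holding rows in memory."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def apply(self, key, row):
        self.rows[key] = dict(row)
        return True


class Copyset:
    """A group of nodes that all hold identical copies of the same rows."""

    def __init__(self, nodes):
        self.nodes = list(nodes)

    def put(self, key, row):
        # Write to every replica first; acknowledge only if all succeeded.
        results = [node.apply(key, row) for node in self.nodes]
        return all(results)


copyset1 = Copyset([Node("N1"), Node("N2"), Node("N3")])
r1 = {"employee_name": "Joe Smith", "employee_id": 7, "department": "Sales"}
acknowledged = copyset1.put(7, r1)
```

After the call returns, N1, N2, and N3 each hold an identical copy of R1, which is what allows any one of them to fail without losing the row.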

The following image is a logical diagram showing how rows of three tables are distributed across two copysets in the data grid.

Rows Distributed Across Copysets

In the following image, the rows of a table are broken down into four sets (each owned by a different copyset). The nodes running in a given copyset are identical replicas of each other.

How One Table is Distributed in a Data Grid with Four Copysets and Three Nodes

To understand more about sizing a copyset and a data grid, see Sizing Guide.

Primary Node

When a copyset has more than one node, one of the nodes is the primary node, which stores data and provides read access. The other nodes in the copyset are secondary nodes that store backup copies of the data. The key role of the primary node is to interact with the proxy process. The primary node receives the client operation and replicates it to the other nodes in the copyset. The client operation is applied in parallel at the primary node and all secondary nodes. The primary node is responsible for sharing the result of the request with the proxy.

If the primary node goes down for some reason, one of the other nodes in the copyset

reside on different machines to ensure that one machine failure does not cause data loss.

Reasons for Using Multiple Nodes

There are several reasons for using multiple nodes:

• Nodes in different copysets are created with the goal of scaling horizontally. Thus, multiple copysets are created, each with a slice of the data.

• Nodes in the same copyset are created to provide multiple replicas for fault tolerance. These contain identical copies of the data.

• In a production environment, you might decide to use multiple nodes for a combination of reasons. For example, you might choose to have two replicas per copyset and multiple copysets (say three) to scale horizontally. In this example, your environment would have a total of six nodes.

To sum it up, the data is stored in copysets as described in the previous sections. The copysets put together form a data grid.
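The node count in the example above is a simple product, and the slice of data held by each copyset follows from dividing the total. The short sketch below works through that arithmetic; the function names are hypothetical, and the even split across copysets is an assumption made for illustration (see the Sizing Guide for an actual sizing calculation).

```python
def total_nodes(copysets, replicas_per_copyset):
    """Every copyset runs the same number of replica nodes."""
    return copysets * replicas_per_copyset


def share_per_copyset(total_amount, copysets):
    """Assumes data is spread evenly across copysets (illustration only)."""
    return total_amount / copysets


nodes = total_nodes(copysets=3, replicas_per_copyset=2)
terabytes_each = share_per_copyset(total_amount=12, copysets=3)
```

Note that adding replicas raises the node count without reducing the share each copyset holds; only adding copysets spreads the data further.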

Replication

To replicate data, you must configure the copysets in the data grid such that copyset_size is greater than 1.

The copyset_size configuration setting applies to all copysets in the data grid. When the copyset_size is greater than 1, one node in each copyset acts as the primary node that stores data and provides read access to that data. The other nodes in each copyset are secondary nodes that store copies of the data on the primary node. Every time data is written to the primary node, data is synchronized at the primary node and all secondary nodes in the copyset.

When the primary node of a copyset is down, one of the secondary nodes in the copyset takes over as the primary node. As each secondary node of the copyset contains copies of the same data that resides on the primary node, no data loss occurs and data grid operations continue as long as at least one node of the copyset remains running.
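The takeover behavior can be modeled in a few lines. The sketch below is a hypothetical illustration, not ActiveSpaces code: the CopysetState class is invented for this example, and promoting the first remaining secondary is an arbitrary assumption, because the text does not say how the new primary is chosen.

```python
class CopysetState:
    """Hypothetical view of which nodes of one copyset are running."""

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("a copyset needs at least one node")
        self.running = list(node_names)
        self.primary = self.running[0]

    def node_down(self, name):
        self.running.remove(name)
        if not self.running:
            self.primary = None  # no replica left, data is unavailable
        elif name == self.primary:
            # Assumption: promote the first remaining secondary.
            self.primary = self.running[0]

    def available(self):
        return self.primary is not None


state = CopysetState(["N1", "N2", "N3"])
state.node_down("N1")  # the primary fails, a secondary takes over
```

With a copyset_size of 3, the copyset in this model stays available through two node failures and becomes unavailable only when the last node stops.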

Processes in ActiveSpaces

The following processes are involved in creating, maintaining, and querying the data grid:

• TIBCO ActiveSpaces Client Applications

• Proxy

• Realm Service

• State Keeper

• Node

TIBCO ActiveSpaces Client Applications

The client applications use the API libraries shipped with the product to build custom applications. Client applications interact with the data grid by using the proxy process.

Proxy

A proxy is a mediator between a client request and the data grid. Based on the client request, the proxy identifies the primary node in a copyset and interacts with the primary node until the request is processed and the result is shared with the client. You can have many proxies in a data grid.

Realm Service

A data grid is run inside a TIBCO FTL realm. A TIBCO FTL realm serves as a repository for data grid configuration information and provides communication services that enable all data grid processes to communicate with each other.

A client application accesses the data grid by using the realm service URL. In TIBCO FTL 6.0.0 or later, the realm service URL is the URL of the TIBCO FTL server. The realm service offers the following capabilities:

• Stores data grid definitions

• Communicates with the administrative tools to store and retrieve data grid

Fault Tolerance in Realm Services Used in TIBCO FTL 6.0.0 or Later

TIBCO FTL 6.1.0 or later uses a quorum-based fault tolerance mechanism. A cluster of at least three TIBCO FTL core servers must be run. Each core server provides a realm service. Those realm services all cooperate to provide fault tolerance for the data grid.

Fault tolerance is assured as long as a quorum of servers is always running. Each core server must be run on a separate machine. Clients receive a list of URLs at which they can connect to those TIBCO FTL core servers.

State Keeper

A state keeper runs internally in the data grid and tracks all the data in the data grid.

Each state keeper saves the data locally on the disk. When you start the realm service, the state keeper receives the data grid configuration information from the realm service. State keepers are responsible for the following functions:

• Tracking and managing all the copysets in a data grid

• Tracking the proxies in a data grid

• Identifying a primary node in each copyset

• Promoting one of the secondary nodes as primary, in case the primary node of a copyset goes down

• Ensuring consistency as the data grid scales up

Fault Tolerance in State Keepers

It is good practice to have three state keepers running in a production environment.

A set of fault-tolerant state keeper processes protects the data grid's run time state information and ensures nonstop access to it. One of the state keepers is designated the lead state keeper and supplies this information to the proxies and copyset nodes. If the lead state keeper goes down, one of the secondary state keepers takes over as the lead. In a fault-tolerant set of three state keepers, a quorum of two state keepers must always be running to ensure data consistency in split brain scenarios.

If a state keeper is restarted while a quorum is running, one of the running state keepers updates the state of the restarted state keeper. If the number of running state keepers falls below the quorum and the state of a copyset changes (for example, a node goes down), operations on the data grid fail. When this happens, the remaining state keepers must be brought down and then all state keepers must be restarted.
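The quorum rule for three state keepers (two must be running) matches a strict-majority rule. The sketch below generalizes it with that formula; treating quorum as a strict majority for other set sizes is an assumption, since the text only states the three-keeper case, and the function names are hypothetical.

```python
def quorum_size(total_state_keepers):
    """Strict majority of the configured state keepers (assumed rule)."""
    return total_state_keepers // 2 + 1


def has_quorum(running, total_state_keepers):
    """True when enough state keepers are running to form a quorum."""
    return running >= quorum_size(total_state_keepers)


needed = quorum_size(3)
one_down = has_quorum(running=2, total_state_keepers=3)
two_down = has_quorum(running=1, total_state_keepers=3)
```

A majority rule is what prevents two disconnected halves of a split cluster from both believing they are authoritative: at most one side can hold a majority.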

Node

For more information on nodes, see the "Nodes" section in How Is the Data Stored in a Data Grid?.

Fault Tolerance in Nodes

To prevent data loss, you can run up to three nodes per copyset. For production deployments, TIBCO recommends using at least two nodes per copyset.

The Workflow for a PUT Operation

A client application initiates a PUT request. The request reaches the proxy. Like all the ActiveSpaces processes, the proxy identifies the data grid by the data grid name and the realm service URL (In TIBCO FTL 6.0.0 or later, the realm service URL is the URL of the TIBCO FTL server). The proxy forwards the request to the appropriate primary node. The primary node handles the processing of the data. After all the secondary nodes are updated with the changes, the result is returned to the proxy and the proxy then shares the result with the client application. The realm service and the state keeper run outside of the operation datapath.
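The sequence of hops described above can be written out as an ordered list. This is only a schematic trace under stated assumptions: the function and the node names N1, N2, and N3 are hypothetical, and the text also notes that the operation is applied in parallel on the primary and secondary nodes, which a flat list does not capture.

```python
def put_request_path(primary, secondaries):
    """Return the ordered (sender, receiver) hops of one PUT request."""
    hops = [("client", "proxy"), ("proxy", primary)]
    # The primary replicates the change to every secondary node.
    hops += [(primary, secondary) for secondary in secondaries]
    # Only then does the result travel back to the client.
    hops += [(primary, "proxy"), ("proxy", "client")]
    return hops


path = put_request_path("N1", ["N2", "N3"])
participants = {name for hop in path for name in hop}
```

The realm service and the state keeper never appear among the participants, which reflects the statement that they run outside the operation data path.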

The Workflow

Log Levels

The log level determines the level of detail and the quantity of log statements. Typically, log levels must not be adjusted because producing excess log output can affect performance. However, there are situations such as debugging an issue where different log levels must be configured.

ActiveSpaces uses the logging mechanism provided by TIBCO FTL. For more information, see "Log Levels" in TIBCO FTL® Development.

The tibdg client library as well as the tibdgkeeper, tibdgproxy, and tibdgnode processes can all be configured with nondefault log levels. The client library has an API used to set the log level. The data grid processes can be configured by using the -t command-line parameter. The log levels are set by using one of the following forms:

element:level

or

element:level;element:level;element:level

Often, additional debug log statements can be gathered by using tibdg:debug as the log level. The output of this command shows more log statements than the default log level (tibdg:info). The specific syntax when used with the tibdgproxy data grid process would be:

tibdgproxy -r http://realm_url:port -t tibdg:debug -n p_01
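To make the element:level syntax concrete, the following sketch splits a log level string into its element and level pairs. It only illustrates the format shown above; the parse_log_spec function is hypothetical and is not how the data grid processes or the client library parse the value.

```python
def parse_log_spec(spec):
    """Split 'element:level;element:level' into a dict of element -> level."""
    levels = {}
    for part in spec.split(";"):
        element, separator, level = part.partition(":")
        if not separator or not element or not level:
            raise ValueError(f"expected element:level, got {part!r}")
        levels[element] = level
    return levels


single = parse_log_spec("tibdg:debug")
combined = parse_log_spec("tibdg:info;tibdgapi:debug3")
```

Here single holds one pair and combined holds two, one for each semicolon-separated entry.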

You might be asked to set other log levels or elements when investigating specific issues.

You can use the logs to trace client API calls on a thread basis. To trace the calls, use the client API to set the log level to tibdgapi:debug3. This triggers the client library to produce log statements for calls to API functions.

Transaction Isolation

ActiveSpaces enforces the highest level of transaction isolation: serializable. As a result, serialization can delay database operations as transactions wait for other transactions to commit or roll back.

ActiveSpaces uses a pessimistic transaction model, which blocks any operation that can violate database consistency or isolation. For example, when an operation in transaction A refers to table row R, and an operation in a second transaction B also refers to row R, then the second operation blocks until transaction A either commits or rolls back.

Similarly, an operation within a transaction can block operations in non-transacted sessions.
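The blocking behavior of the pessimistic model can be imitated with one lock per row. The sketch below is a toy model under stated assumptions: the RowLocks class is hypothetical, it uses Python's standard threading.Lock, and it says nothing about how ActiveSpaces implements locking internally. A short timeout stands in for "transaction B blocks".

```python
import threading


class RowLocks:
    """Hypothetical per-row locks held until a transaction ends."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, row_key):
        # Create the lock for a row on first use, safely across threads.
        with self._guard:
            return self._locks.setdefault(row_key, threading.Lock())

    def acquire(self, row_key, timeout=-1):
        """Return True once the row is locked, False if the wait timed out."""
        return self._lock_for(row_key).acquire(timeout=timeout)

    def release(self, row_key):
        """Called when the owning transaction commits or rolls back."""
        self._lock_for(row_key).release()


locks = RowLocks()
a_has_row = locks.acquire("R")                    # transaction A touches row R
b_blocked = not locks.acquire("R", timeout=0.05)  # B cannot proceed yet
locks.release("R")                                # A commits or rolls back
b_has_row = locks.acquire("R", timeout=0.05)      # now B can proceed
```

In a real deployment B would simply wait rather than time out, which is why serializable isolation can delay operations as described above.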

Checkpoints

TIBCO ActiveSpaces checkpoints provide the ability to save the state and data of a running data grid. A checkpoint can then be used to restore a complete data grid on the same computer, to move the entire data grid, or to replicate a data grid to another data grid for disaster recovery.

The data collected by a checkpoint is guaranteed to be logically consistent across the entire data grid. A checkpoint does not contain the data from a partially committed transaction.

Creating an ActiveSpaces checkpoint involves the following activities:

• The realm database is saved.

• The configuration of the data grid in the realm is saved.

• Each state keeper's internal governing state information is saved.

• The relevant files needed to restore each node of a data grid are saved in the checkpoints subdirectory of each node's data directory.

Creating a checkpoint fails in the following scenarios:

• A realm is not reachable.

• A quorum of state keepers is not running.

Checkpoint Types

A manual checkpoint is created by using the tibdg administrative tool. A periodic checkpoint is created automatically when the data grid is configured to create periodic checkpoints.

Manual Checkpoints

Periodic Checkpoints

• Configured at the data grid level.

• Taken at a fixed interval while the data grid is running.

• Cannot be given a name.

Both manual and periodic checkpoints can be manually removed, and are subject to removal based on the retention setting.

Disaster Recovery

Disaster Recovery is a situation where a set of running systems must be replaced by another set of running systems due to failure, damage, loss of connectivity, or other traumatic event. Disaster Recovery is a large scale event and is not intended to replace fault tolerance where the failure of individual components can be recovered or otherwise accommodated without stopping a running system.

In a disaster recovery scenario, running systems are not expected to seamlessly or automatically failover to backup or alternative systems. Recovering from a disaster scenario implies a substantial and potentially large scale system stoppage and a restart of an entirely new instance of the previously running system. It is not intended for short term outages such as normal maintenance operations.

Note: The replacement systems activated during disaster recovery are designed to remain in operation for days, weeks, or even months depending on the severity of the disaster.

ActiveSpaces supports disaster recovery by creating a gridset.

Gridsets

The purpose of gridsets is to help set up the disaster recovery process. A group of data grids that share the same set of consistent data is referred to as a gridset. Each gridset has a name, which exists in the same namespace as data grid names (For example, you cannot have a data grid named “prod” and a gridset named “prod”). Each gridset also has a single primary data grid. Within a gridset, there is a single authoritative schema, which is owned by the primary data grid of the gridset.

Each data grid can belong to at most one gridset. Data grids in a gridset do not need to have the same replication factor, number of copysets, or number of state keepers, but

Types of Data Grids

The following list differentiates the types of data grids in ActiveSpaces:

Stand-Alone Data Grids

Any data grid that does not belong to a gridset is a stand-alone data grid. All operations included in the ActiveSpaces API are permitted on stand-alone data grids.

Primary Data Grids

Any data grid that is listed as the primary of a gridset is a primary grid. All operations included in the ActiveSpaces API are permitted on primary grids. Primary grids are responsible for supplying mirror grids with data on request.

Mirror Data Grids

Any data grid that is included in a gridset, but is not currently the primary of that gridset, is a mirror grid. Only read operations are allowed on mirror grids (for example, GET, queries, and iterators). Read operations are executed against the most recent checkpoint that has been mirrored from the primary grid. Mirror grids are responsible for requesting updates from the primary grid.

Mirroring

The process by which data is copied from one data grid to another is called mirroring.

Data mirrored between data grids is a logical copy of the user data available on the primary grid and is copied to the mirror grid only if a checkpoint has been taken at the primary grid (either a user-created checkpoint or a periodic checkpoint causes mirroring).

Until all copysets in the primary grid have confirmed that their data has been mirrored, data in the checkpoint being mirrored is not available.

Bulk Mirroring

If a mirror grid has no previous checkpoints available or if the primary grid has insufficient information to identify the rows that changed since the last mirrored checkpoint, bulk mirroring is used. During bulk mirroring, all rows present in the checkpoint being mirrored are sent to the mirror grid.

Incremental Mirroring

ActiveSpaces attempts to minimize the data sent between grids whenever possible by using incremental mirroring. When a mirror grid has a previous checkpoint as the starting point, and the primary grid has sufficient information to identify all rows that changed, only rows that were updated or deleted are sent to the mirror grid.
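
The difference between bulk and incremental mirroring can be illustrated with a small toy model. This sketch is not the ActiveSpaces implementation: it represents each checkpoint as a plain Python dictionary keyed by primary key, and it finds changes by comparing two checkpoints, whereas the product tracks changed rows itself. The function names are hypothetical, and treating newly added rows as "updated" is an assumption.

```python
def bulk_payload(checkpoint):
    """Bulk mirroring: every row in the checkpoint is sent."""
    return {"upserts": dict(checkpoint), "deletes": set()}


def incremental_payload(previous, checkpoint):
    """Incremental mirroring: only changed or deleted rows are sent."""
    # Rows that are new or differ from the previously mirrored checkpoint.
    upserts = {k: v for k, v in checkpoint.items() if previous.get(k) != v}
    # Keys present in the previous checkpoint but gone from the new one.
    deletes = set(previous) - set(checkpoint)
    return {"upserts": upserts, "deletes": deletes}


def mirror_payload(previous, checkpoint):
    """Fall back to bulk mirroring when no previous checkpoint is available."""
    if previous is None:
        return bulk_payload(checkpoint)
    return incremental_payload(previous, checkpoint)


previous = {1: "a", 2: "b", 3: "c"}
current = {1: "a", 2: "B", 4: "d"}

first_sync = mirror_payload(None, current)      # all three rows are sent
next_sync = mirror_payload(previous, current)   # rows 2 and 4 sent, row 3 deleted
```

In this model the incremental payload carries two rows and one delete, while the bulk payload carries the whole checkpoint.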

Best Practices for a Development Environment

In many enterprises, programmers act as administrators during the development and test phases of a project.

To develop and test application programs that use ActiveSpaces software, deploy the following processes:

Processes                    Number

Realm service                One

State keeper                 One

Node                         One

Proxy                        One

Your application programs    As appropriate

In a development environment, you can run all of these processes on the same host computer.

Sample Scripts

Refer to the TIBCO_HOME/as/<version>/samples/readme.md before using the sample scripts.

The following scripts are available:

TIBCO_HOME/as/<version>/samples/scripts/as-start defines a simple data grid and starts its component processes.

TIBCO_HOME/as/<version>/samples/scripts/as-stop stops those component processes.

Sample Docker Environment

The docker-compose sample environment is provided to demonstrate how to deploy an ActiveSpaces data grid in Docker. For more information, see TIBCO_HOME/as/<version>/samples/docker/README.md.

Note: The installation environment of ActiveSpaces is referenced as TIBCO_HOME. For example, on Microsoft Windows, TIBCO_HOME might be C:\tibco.

When you are ready to use ActiveSpaces to scale your data beyond one computer, you can create additional copysets and nodes in the data grid and run the nodes on separate computers.

Best Practices for a Production Environment

To use ActiveSpaces software in a production environment, deploy the following processes.

Realm service

Minimum required: Three (You need at least three processes to run a full quorum.)

Description: TIBCO recommends that you run a fault-tolerant set of realm services. Run a quorum of realm services. Each realm service must run on a separate host computer.

State keeper

Minimum required: Three state keeper processes

Description: To ensure high availability during a network partition or hardware failure, each state keeper process must run on a separate host computer. Not doing so might result in grid-wide data loss. At any given time, you must maintain a quorum of running state keepers. To run more than one state keeper, configure three state keepers and ensure you have at least two running state keepers.

Node

Minimum required: Two nodes per copyset

Description: For greater data protection, you can run three nodes per copyset.

In a fault-tolerant setup, if there is more than one node, one node acts as the primary node and the other nodes are secondary nodes.

Note: Additional copies can become expensive in two ways:

• Increasing the node count by one adds one complete copy of all the data.

• Every node process must run on a separate host computer. Usually this requirement determines the number of host computers you must maintain. For example, a data grid with three copysets and two nodes per copyset requires six nodes, all on separate hosts. Increasing to three nodes per copyset would require nine nodes, all on separate hosts.

Proxy

Minimum required: One proxy process

Description: You can run additional proxies to increase the capacity for client programs and to improve response time. For best results, run proxy processes on a separate host computer.

Your application programs

Minimum required: Run as many processes as appropriate.
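
The node arithmetic in the example above is a simple product of the number of copysets and the number of nodes per copyset. A minimal sketch follows; the function name is hypothetical and is used only for illustration.

```python
def nodes_required(copysets: int, nodes_per_copyset: int) -> int:
    """Return the number of node processes, and so of separate node hosts."""
    if copysets < 1 or nodes_per_copyset < 1:
        raise ValueError("both counts must be at least 1")
    return copysets * nodes_per_copyset


two_per_copyset = nodes_required(copysets=3, nodes_per_copyset=2)    # 6
three_per_copyset = nodes_required(copysets=3, nodes_per_copyset=3)  # 9
```

Because every node process runs on its own host, the result is also the number of host computers needed for nodes alone.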

Components Sharing a Host Computer

You can reduce the number of host computers in a production environment by running more than one component per host.

For example, you can run a realm service, a state keeper, a node, and a proxy, all on one host. (In contrast, do not run two state keepers on the same host.) For effective fault tolerance, run the nodes of each copyset on separate host computers.

Warning: Combining component processes on a host computer increases the risk that a single point of failure on the host can disrupt all those processes simultaneously. Assess the risk tolerance of your enterprise.

Best Practices for Cloud Environments

For cloud environments, TIBCO recommends using a persistent, local Solid-State Drive (SSD) type that provides consistent performance and does not artificially throttle the input/output operations per second (IOPS). An example of throttling IOPS is the burst throttling done by gp2 SSD types on AWS. For more information on different EBS volume types provided by Amazon, look for "Amazon EBS Volume Types" on https://docs.aws.amazon.com. For information on monitoring the performance of your EBS volume, look for "EBS Performance - I/O Characteristics and Monitoring" on https://docs.aws.amazon.com.

Programming with ActiveSpaces

These concepts and definitions pave the way to a more detailed understanding of applications programming with ActiveSpaces software.

Data Grid

A distributed database, including all the component processes that implement it.

Connection

An application program connects to a data grid. A Connection object is analogous to a traditional database connection.

Session

An application program interacts with a data grid through one or more Session objects.

Each session insulates the data grid interactions within one program thread from the interactions in other threads.

A session can be transacted or non-transacted. GET, PUT, and DELETE operations in a transacted session occur within a transaction, and do not take effect until the program explicitly commits the transaction.

A session can be used to define the tables and indexes of the data grid by using SQL Data Definition Language (DDL) statements such as CREATE TABLE, DROP TABLE, CREATE INDEX, and DROP INDEX.

Table

An ActiveSpaces data grid organizes and presents data as rows in tables, like a traditional relational database.

Administrators define tables within the data grid.

Programs can GET a row from a table, PUT a row into a table, and DELETE a row from a table.

Programs can query a table for the rows that match a filter.

Primary Key

Each table distinguishes a primary key, or more briefly, the key.

Values of the key are unique: no two rows in a table have the same key value.

Secondary Index

A table can have zero or more secondary indexes, which facilitate queries. The tibdg tool can be used to create a secondary index on a table by using the index create command.

Iterator

An iterator is associated with a single table. An application can use an iterator to perform queries on a table. An iterator always returns entire rows from a table for its results.

Statement

A Statement is used to execute SQL statements. A Statement is not tied to a particular table. The table to act on is obtained from the SQL string used to create the Statement object. Statements can be used to execute SQL Data Manipulation Language (DML) statements. SQL DML statements include SELECT, INSERT, and so on.

ResultSet

A ResultSet contains the results of a SQL SELECT statement executed by using a Statement object. A ResultSet is used to iterate over the rows that satisfy the conditions of the SELECT statement. The columns of the rows in a ResultSet are determined by the select list specified in the SQL SELECT statement.

Metadata

Metadata contains descriptive information about the data grid. There are two types of metadata: GridMetadata and ResultSetMetadata.

GridMetadata is retrieved from a Connection object. GridMetadata can be used to programmatically retrieve ActiveSpaces version information, the data grid name, and information about the tables that have been defined in the data grid. The table information includes the names of the tables that have been defined and also the information about the columns and indexes defined for each table.

ResultSetMetadata is retrieved from a ResultSet object. ResultSetMetadata can be used to find information about the columns in each row of a ResultSet. This column information includes the number of columns in a row, the labels and data types of the

Structuring Programs

These steps outline the main structural components of most application programs that access an ActiveSpaces data grid. The steps assume that a table has already been configured for the data grid. When updating and querying data in ActiveSpaces, you can use Table objects or Statement objects or both in your application. Table objects provide a key/value interface to the data grid and Statement objects provide a SQL interface to the data grid.

An Overview of the Tasks

The following procedure summarizes the tasks ActiveSpaces application programs perform.

Procedure

1. Initialize the ActiveSpaces objects as listed in Task A: Initializing ActiveSpaces Objects.

2. On a specific table in the data grid, perform the appropriate operations as listed in Task B: Performing Data Grid Operations.

3. After you are done, clean up the objects as listed in Task C: Cleaning up and Closing the Connection.

Task A: Initializing ActiveSpaces Objects

Procedure

1. Initialize the ActiveSpaces library, if required.

• C API - call tibdg_Open()

• Java API - not required

• Go API - not required

2. Connect to a data grid. For details, see Connection.

3. Create Session objects.

See Session.

4. Create objects that stay open for the duration of the Connection.

a. Open Table objects to execute key/value operations on the table. See Table.

b. Create TableListener objects to monitor events corresponding to changes in the table. See Table Listener.

c. Create Statement objects to support executing the same SQL more than once (commonly known as a prepared statement). See Statement.

Task B: Performing Data Grid Operations

Procedure

1. Access the data grid by using the appropriate APIs.

a. Use key/value or iterator methods of a table object. See the "Table Operations" section in Table.

b. Query the data grid by executing queries from SQL SELECT statements. See Statement.

c. Modify the data grid by executing updates from SQL DML statements. See Statement.

2. Close any object created when accessing the data grid.

a. Destroy all Row objects.

b. Close all Iterator objects.

c. Close all ResultSet objects.

Task C: Cleaning up and Closing the Connection

Procedure

1. Close all Statement objects.

2. Close all objects.

5. Close the data grid Connection object.
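
The overall shape of Tasks A, B, and C can be sketched without any ActiveSpaces API calls. In the following toy model, Resource is a hypothetical stand-in for objects such as Connection, Session, Table, and Statement, and Python's contextlib.ExitStack closes them in the reverse of their creation order. Reverse-creation order is an assumption made for this sketch; it is consistent with closing Statement objects first and the Connection object last, but it is not taken from the product documentation.

```python
from contextlib import ExitStack

closed = []  # records the order in which the stand-in objects are closed


class Resource:
    """Hypothetical stand-in for a Connection, Session, Table, or Statement."""

    def __init__(self, name):
        self.name = name

    def close(self):
        closed.append(self.name)


with ExitStack() as long_lived:
    # Task A: create the long-lived objects and register each for cleanup.
    for name in ("connection", "session", "table", "statement"):
        long_lived.callback(Resource(name).close)

    # Task B: objects created per operation are closed as soon as they are done.
    with ExitStack() as per_operation:
        per_operation.callback(Resource("result_set").close)

# Task C runs when the outer block exits: callbacks fire in reverse order.
```

After the outer block exits, the closed list shows the result set closed first and the connection closed last.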

Connection

Programs begin their interactions with an ActiveSpaces data grid by first creating a Connection object. The Connection object can then be used to retrieve grid metadata or to create Session objects.

From the Connection object, a program can create one or many Session objects.

Grid Metadata

Grid metadata is retrieved from a Connection object by calling the get grid metadata API.

Each time the GridMetadata is retrieved, the information returned reflects the current table information in the data grid.

A program must destroy the GridMetadata object after it has finished using it and before making any subsequent calls to retrieve updated GridMetadata.

To learn more about grid metadata, see the section on "Metadata" in Programming with ActiveSpaces.

Table Metadata

Table information is retrieved from the GridMetadata object as a TableMetadata object. A TableMetadata object is retrieved by using the table's name.

If the application program does not know the names of the tables that have been defined in the data grid, the GridMetadata object provides a method to get an array of all table names that have been defined. This array can then be used to get a single TableMetadata object.

A column name or index name is used to get information about a particular column or index from a TableMetadata object. If the application program does not have the names of the columns or indexes of a table, the TableMetadata object provides methods to get an array of the column names or index names.

A separate method to get the name of the primary index is provided by the TableMetadata object. The name of the primary index is then used to retrieve information about the columns of the primary index. The columns of the primary index make up the primary key of the table.

The TableMetadata object and strings retrieved from it do not have to be destroyed as these objects are owned by the GridMetadata object and are destroyed when the grid metadata is destroyed.

Session

Programs use sessions to insulate data grid operations within program threads and to group operations into atomic transactions.

Sessions and Threads

It is good practice to create a separate session for each program thread that accesses the data grid.

Programs must use sessions in a thread-safe way. That is, two threads must not simultaneously access the same session. Violating this constraint can yield unpredictable results.

Sessions and Transactions

Each session can be either transacted or non-transacted. Programs determine this semantic property when creating each session.

In a transacted session, all GET, PUT, UPDATE and DELETE operations occur within a transaction, which is bound to the session. The session implicitly starts the transaction.

Programs explicitly call the session's commit and rollback methods. (As these methods complete, they automatically start a new transaction in the session.)

If a program operates within several open transactions simultaneously, use a separate session and thread for each transaction.

In a non-transacted session, GET, PUT, UPDATE, and DELETE operations are immediate: that is, when the method completes, the effect of the operation is also complete.

However, operations in a transacted session can block operations in a non-transacted session. For further explanation, see Transaction Isolation.
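
The contrast between transacted and non-transacted sessions can be illustrated with a toy model. ToySession below is a hypothetical class, not the ActiveSpaces Session API, and the data grid is reduced to a Python dictionary. Buffering operations until commit is only an illustrative device: it shows that changes do not take effect before the commit, and it does not model how a transacted session can block operations in other sessions.

```python
class ToySession:
    """Toy stand-in for a session over a dict-backed grid."""

    def __init__(self, grid: dict, transacted: bool):
        self.grid = grid
        self.transacted = transacted
        self.pending = []  # operations of the current transaction

    def put(self, key, row):
        self._apply_or_buffer(("put", key, row))

    def delete(self, key):
        self._apply_or_buffer(("delete", key, None))

    def commit(self):
        for operation in self.pending:
            self._apply(operation)
        self.pending = []  # a new transaction starts implicitly

    def rollback(self):
        self.pending = []  # discard the work; a new transaction starts

    def _apply_or_buffer(self, operation):
        if self.transacted:
            self.pending.append(operation)
        else:
            self._apply(operation)  # immediate in a non-transacted session

    def _apply(self, operation):
        kind, key, row = operation
        if kind == "put":
            self.grid[key] = row
        else:
            self.grid.pop(key, None)


grid = {}
tx = ToySession(grid, transacted=True)
tx.put(1, {"name": "Ann"})
visible_before_commit = 1 in grid   # False: not yet committed
tx.commit()
visible_after_commit = 1 in grid    # True

tx.put(2, {"name": "Bob"})
tx.rollback()                       # row 2 never reaches the grid

direct = ToySession(grid, transacted=False)
direct.put(3, {"name": "Cy"})       # takes effect as soon as put() returns
```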

Only GET, PUT, UPDATE and DELETE operations are affected by a transacted session, the

Sessions and Defining Tables

After a session has been created, it can be used to define tables programmatically. For more information, see "Defining a Table by Using SQL DDL Commands" in the TIBCO ActiveSpaces® Administration guide.

Table

Table objects represent data grid tables within an application program.

Tables and Sessions

A program opens a table object by calling a session's open table method. The program can use the table object's methods to operate on the corresponding table in the data grid.

Opening a table object does not lock the table in the data grid.

If the session is transacted, then table operations occur within the session's transaction.

Within a transaction you can interact with multiple tables.

If the session is non-transacted, then table operations are not transacted.

Table Operations

Tables support the following data grid operations:

• PUT a row into the table

• GET a row of the table

• UPDATE a row in the table

• DELETE a row from the table

• Create an iterator to present the results of a table query

Primary Key

Every table requires a primary key, which can consist of one or more columns. The data type of primary key columns can be long, string, or tibDateTime.

Examples of primary keys include employee number, invoice number, or MAC address.

The value of the primary key always remains unique across all the rows of the table. That is, database operations can never create two rows with the same key value; instead, they overwrite data in the existing row with that key value.

Creating Tables

Before a program can use a table or its rows for operations, the table must first be defined. A table can be defined programmatically by using a Session object, or an administrator can define a table by using the ActiveSpaces administration tool. For details, see "Defining a Table" and "Defining a Table by Using SQL DDL Commands" in TIBCO ActiveSpaces Administration.

PUT

The PUT operation adds a row to a data grid table.

Before calling the put() method, your program must first create a row object and set its columns with values.

The row object must contain a value in all columns of the primary key. The value of the key is unique. If the table already contains a row with that key value, then the PUT operation replaces the existing row within the table. The PUT operation overwrites any unchanged columns in the row. The columns that are not part of the primary key can either contain data or be NULL.

GET

The GET operation retrieves a row of a data grid table.

Before calling the get() method, your program must first create a row object and set a value in all columns of the primary key. The value of the key is unique.

If the table contains a row with that key value, then the GET operation returns the contents of that row in a new row object. If the table does not contain a row with that key value, then the method returns null.

UPDATE

The UPDATE operation modifies rows that already exist in a data grid table. Before calling the update() method, you must first create a row object and set the primary key columns to uniquely identify the row that is going to be updated. Next, set any non-primary key

If a row with the primary key exists in the table, the update() method returns 1 whether or not any columns are updated. If a row with the primary key does not exist in the table, no update is done and 0 is returned.

DELETE

The DELETE operation deletes a row from a data grid table.

Before calling the delete() method, your program must first create a row object and set a value in all columns of the primary key. The value of the key is unique.

If the table contains a row with that key value, then the DELETE operation deletes that row from the table.

If the table does not contain a row with that key value, then the method returns without changing the table.
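
The key/value behavior described for PUT, GET, UPDATE, and DELETE can be summarized in a toy model. ToyTable below is a hypothetical class that keeps rows in a Python dictionary; it is not the ActiveSpaces Table API. Rows are plain dictionaries, and the assumption that UPDATE changes only the columns set in the supplied row is made for this sketch.

```python
class ToyTable:
    """Toy key/value table keyed by one or more primary key columns."""

    def __init__(self, key_columns):
        self.key_columns = tuple(key_columns)
        self.rows = {}

    def _key(self, row):
        # Every primary key column must carry a value.
        missing = [c for c in self.key_columns if row.get(c) is None]
        if missing:
            raise ValueError(f"primary key columns not set: {missing}")
        return tuple(row[c] for c in self.key_columns)

    def put(self, row):
        self.rows[self._key(row)] = dict(row)  # replaces any existing row

    def get(self, key_row):
        found = self.rows.get(self._key(key_row))
        return dict(found) if found is not None else None

    def update(self, row):
        key = self._key(row)
        if key not in self.rows:
            return 0                  # no such row: nothing is updated
        self.rows[key].update(row)    # assumption: only the set columns change
        return 1

    def delete(self, key_row):
        self.rows.pop(self._key(key_row), None)  # no error if the row is absent


table = ToyTable(["id"])
table.put({"id": 7, "name": "Ann", "city": "Oslo"})
table.put({"id": 7, "name": "Ann B."})          # same key: the row is replaced
after_put = table.get({"id": 7})                # "city" is no longer present

updated = table.update({"id": 7, "city": "Rome"})   # 1
not_found = table.update({"id": 8, "city": "Rome"}) # 0
after_update = table.get({"id": 7})

table.delete({"id": 7})
after_delete = table.get({"id": 7})             # None
```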

Iterator

A table iterator is used to iterate over all of the rows or a specific subset of the rows in the table. The create iterator operation submits a query on a data grid table and creates an iterator object to present the query results.

Supply a filter string as an argument to the create iterator operation. The filter string follows the syntax of the WHERE clause of a SQL SELECT statement excluding the WHERE keyword. All rows in the table for which the filter string evaluates to true are returned by the iterator.

An iterator object receives batches of matching rows from the data grid. The prefetch property of the iterator determines the batch size.

Properties can be set when an iterator is created thereby affecting the query behavior. For more information, see Data Consistency for Queries.

The iterator object presents the program with the individual rows that match the query one at a time.

To release resources within the data grid component processes, the program must close the iterator object and close each row object retrieved by using the iterator.

An implicit timeout limits the lifespan of iterator objects. Program calls that access an iterator after that timeout elapses return an error.

Avoid queries that result in full table scans, which can be resource-intensive and time-consuming.
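
How a filter and the prefetch property shape iteration can be sketched with a toy model. The functions below are hypothetical and do not use the ActiveSpaces API: the table is a Python list, and the filter is a Python callable standing in for the SQL-style filter string. Matching rows are pulled in batches of the prefetch size and then presented one at a time.

```python
from itertools import islice


def fetch_batches(rows, matches, prefetch):
    """Yield lists of at most `prefetch` rows for which the filter is true."""
    if prefetch < 1:
        raise ValueError("prefetch must be at least 1")
    source = (row for row in rows if matches(row))
    while True:
        batch = list(islice(source, prefetch))
        if not batch:
            return
        yield batch


def iterate(rows, matches, prefetch):
    """Present the matching rows to the caller one at a time."""
    for batch in fetch_batches(rows, matches, prefetch):
        yield from batch


# Seven rows; the odd ids belong to "CA" and the even ids to "NY".
rows = [{"id": i, "state": "CA" if i % 2 else "NY"} for i in range(1, 8)]


def is_california(row):
    return row["state"] == "CA"


batch_sizes = [len(batch) for batch in fetch_batches(rows, is_california, 3)]
matching_ids = [row["id"] for row in iterate(rows, is_california, 3)]
```

With four matching rows and a prefetch of 3, the rows arrive in one batch of three and one batch of one, yet the caller still sees them individually.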

Table Listener

A table listener is used to monitor events corresponding to changes in a table. A table listener is created from a specific table.

When you use a table listener, you can either monitor the contents of a specified table or a subset of rows in a specific table. For example, by using a filter, your application can track a table containing customer data and more specifically, can track the activity in a particular region to know when new customers are added or deleted, or when customers move to another region.

Events

An event indicates that the data in a table has changed.

An event can be of the following types:

• PUT: Indicates that new data has been added to the table. PUT is also used to indicate that existing data in the table has been updated.

PUT events have a current value, which is a copy of the row that was added to or updated in the table. If the PUT operation replaces an existing row, or the UPDATE operation modifies an existing row, the event additionally has a previous value. The previous value is a copy of the row before the PUT or UPDATE operation.

• DELETE: Indicates that a row has been deleted from the table. DELETE events have a previous value, which is a copy of the row before the DELETE operation.

• ERROR: Indicates that something has happened in the system that disrupts the flow of events. ERROR events have an error code and an error description. The application must destroy the table listener. Depending on the error code, it might or might not make sense for the application to re-create the table listener. The ActiveSpaces API documentation provides more details on the specific error codes that are possible.

• EXPIRED: Indicates that a row has expired. When rows are removed from a table due

application. The callback function executes in a thread that is internal to the ActiveSpaces client library and is expected to complete in a timely fashion. The client library retains ownership of the events and the rows they contain so any data that is required outside of the callback must be copied and managed by the application itself.

Filtering Events

When the table listener is being created you can optionally specify a filter string to further narrow the scope of events received.

The filter string specifies the criteria that events must match in order for them to be delivered to the table listener. The filter is equivalent to the WHERE clause of a SQL SELECT statement, excluding the WHERE keyword, and is applied to both the current and previous values for the row that has changed.

For example, in a table containing customer data with a column called state, the filter state = “CA” limits the events delivered to only those involving customers in California.

Listening to Specific Event Types

When the table listener is being created, you can optionally provide a Properties object containing a list of event types that restrict the listener to only receiving events of those types. This feature is commonly used to listen to expired events, but to ignore any PUT or DELETE events that occur on the table.

The property name is TIBDG_LISTENER_PROPERTY_STRING_EVENT_TYPE_LIST and the property value must be a comma-separated list of string event types. The valid choices are any combination of "put", "delete", and "expired". For example, to listen to only expired events, you would use a property value of "expired". To listen to both expired events and delete events, you would use a property value of "delete,expired".
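
Building the property value is a matter of joining valid event type names with commas. The following sketch uses a hypothetical helper and a plain Python dictionary in place of a Properties object; using the property name as a dictionary key is a placeholder for illustration, not an ActiveSpaces API call.

```python
VALID_EVENT_TYPES = ("put", "delete", "expired")


def event_type_list(*types):
    """Return a comma-separated property value, rejecting unknown types."""
    unknown = [t for t in types if t not in VALID_EVENT_TYPES]
    if unknown:
        raise ValueError(f"unsupported event types: {unknown}")
    return ",".join(types)


only_expired = event_type_list("expired")
delete_and_expired = event_type_list("delete", "expired")

# Hypothetical stand-in for the Properties object passed to the listener.
listener_properties = {
    "TIBDG_LISTENER_PROPERTY_STRING_EVENT_TYPE_LIST": delete_and_expired,
}
```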

Statement

Statement objects are used to run SQL commands on the data grid. Queries (SELECT statements) and data manipulation language (DML) SQL commands can be run by using Statement objects. Statement objects are created by invoking the createStatement() method on the Session object.

A Statement object is created for each individual SQL command. A Statement can be run multiple times and must be closed when it is no longer needed.

You can either create a query by using the SELECT statement or update rows by using an INSERT or an INSERT OR REPLACE statement. The INSERT statement is supported for both transacted and non-transacted sessions. For details about the INSERT statement, see The SQL INSERT Statement. For information about the INSERT OR REPLACE statement, see The INSERT OR REPLACE Statement.

Properties

Statement properties affect the behavior of the statement. Statement properties can be set when a statement is first created or when a Statement is executed.

Examples of the properties that can be set are:

• Query prefetch - Number of rows to return in a batch when a query is first executed and each time more rows are requested while iterating through the results.

• Query fetch timeout - Number of seconds to wait for a batch of rows to be returned before the method waiting for the rows times out.

For specific information about Statement properties, see each language API's documentation for the following tasks:

• Creating a Statement from a Session object

• Executing a query by using a Statement object

• Executing a DML command by using a Statement object

Parameters

Parameters serve as placeholders for values in a SQL command. Parameters are used to separate the data of a SQL command from the command itself. This is useful when the same command is run multiple times with only the data varying, which can improve the performance of the data grid. Parameters can also be used to prevent SQL injection attacks in queries.
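
The general idea of parameters can be demonstrated with Python's built-in sqlite3 module, which is used here only as a stand-in because it happens to accept question mark placeholders. The example does not use ActiveSpaces or its Statement API, and the customers table is hypothetical. It shows the same SQL text being reused with different values, and a hostile value being treated as data rather than as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, state TEXT)")

# One SQL string, run several times with different data.
insert_sql = "INSERT INTO customers (id, state) VALUES (?, ?)"
for row in [(1, "CA"), (2, "NY"), (3, "CA")]:
    conn.execute(insert_sql, row)

select_sql = "SELECT id FROM customers WHERE state = ? ORDER BY id"
ca_ids = [r[0] for r in conn.execute(select_sql, ("CA",))]

# The whole value is compared as a string, so no row matches and
# the embedded OR clause never becomes part of the command.
hostile_value = "CA' OR '1'='1"
hostile_ids = [r[0] for r in conn.execute(select_sql, (hostile_value,))]

conn.close()
```

Had the hostile value been concatenated into the SQL text instead, the WHERE clause would have matched every row.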

Parameters in a SQL command are specified by using '?' (question mark). For SELECT statements, parameters are supported for the values of comparisons in WHERE clauses. For