TIBCO ActiveSpaces®

Concepts

Software Release 4.2

August 2019

Important Information

SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH EMBEDDED OR BUNDLED TIBCO SOFTWARE IS SOLELY TO ENABLE THE FUNCTIONALITY (OR PROVIDE LIMITED ADD-ON FUNCTIONALITY) OF THE LICENSED TIBCO SOFTWARE. THE EMBEDDED OR BUNDLED SOFTWARE IS NOT LICENSED TO BE USED OR ACCESSED BY ANY OTHER TIBCO SOFTWARE OR FOR ANY OTHER PURPOSE.

USE OF TIBCO SOFTWARE AND THIS DOCUMENT IS SUBJECT TO THE TERMS AND CONDITIONS OF A LICENSE AGREEMENT FOUND IN EITHER A SEPARATELY EXECUTED SOFTWARE LICENSE AGREEMENT, OR, IF THERE IS NO SUCH SEPARATE AGREEMENT, THE CLICKWRAP END USER LICENSE AGREEMENT WHICH IS DISPLAYED DURING DOWNLOAD OR INSTALLATION OF THE SOFTWARE (AND WHICH IS DUPLICATED IN THE LICENSE FILE) OR IF THERE IS NO SUCH SOFTWARE LICENSE AGREEMENT OR CLICKWRAP END USER LICENSE AGREEMENT, THE LICENSE(S) LOCATED IN THE “LICENSE” FILE(S) OF THE SOFTWARE. USE OF THIS DOCUMENT IS SUBJECT TO THOSE TERMS AND CONDITIONS, AND YOUR USE HEREOF SHALL CONSTITUTE ACCEPTANCE OF AND AN AGREEMENT TO BE BOUND BY THE SAME.

ANY SOFTWARE ITEM IDENTIFIED AS THIRD PARTY LIBRARY IS AVAILABLE UNDER SEPARATE SOFTWARE LICENSE TERMS AND IS NOT PART OF A TIBCO PRODUCT. AS SUCH, THESE SOFTWARE ITEMS ARE NOT COVERED BY THE TERMS OF YOUR AGREEMENT WITH TIBCO, INCLUDING ANY TERMS CONCERNING SUPPORT, MAINTENANCE, WARRANTIES, AND INDEMNITIES. DOWNLOAD AND USE OF THESE ITEMS IS SOLELY AT YOUR OWN DISCRETION AND SUBJECT TO THE LICENSE TERMS APPLICABLE TO THEM. BY PROCEEDING TO DOWNLOAD, INSTALL OR USE ANY OF THESE ITEMS, YOU ACKNOWLEDGE THE FOREGOING DISTINCTIONS BETWEEN THESE ITEMS AND TIBCO PRODUCTS.

This document is subject to U.S. and international copyright laws and treaties. No part of this document may be reproduced in any form without the written authorization of TIBCO Software Inc.

TIBCO, the TIBCO logo, and the TIBCO O logo are either registered trademarks or trademarks of TIBCO Software Inc. in the United States and/or other countries.

TIBCO FTL® is an embedded and bundled component of TIBCO ActiveSpaces® Enterprise Edition.

Java and all Java based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates.

All other product and company names and marks mentioned in this document are the property of their respective owners and are mentioned for identification purposes only.

This software may be available on multiple operating systems. However, not all operating system platforms for a specific software version are released at the same time. Please see the readme.txt file for the availability of this software version on a specific operating system platform.

THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.

THIS DOCUMENT COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN; THESE CHANGES WILL BE INCORPORATED IN NEW EDITIONS OF THIS DOCUMENT. TIBCO SOFTWARE INC. MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S)

Figures

Rows Distributed Across Copysets

How One Table is Distributed in a Data Grid with Four Copysets and Three Nodes

The Workflow

TIBCO Documentation and Support Services

How to Access TIBCO Documentation

Documentation for TIBCO products is available on the TIBCO Product Documentation website, mainly in HTML and PDF formats.

The TIBCO Product Documentation website is updated frequently and is more current than any other documentation included with the product. To access the latest documentation, visit https://docs.tibco.com.

Product-Specific Documentation

The following documentation for TIBCO ActiveSpaces® is available on the TIBCO ActiveSpaces® Product Documentation page:

● TIBCO ActiveSpaces® Concepts

● TIBCO ActiveSpaces® Administration

● TIBCO ActiveSpaces® Release Notes

● TIBCO ActiveSpaces® Installation

● TIBCO ActiveSpaces® API Reference

How to Contact TIBCO Support

You can contact TIBCO Support in the following ways:

● For an overview of TIBCO Support, visit http://www.tibco.com/services/support.

● For accessing the Support Knowledge Base and getting personalized content about products you are interested in, visit the TIBCO Support portal at https://support.tibco.com.

● For creating a Support case, you must have a valid maintenance or support contract with TIBCO. You also need a user name and password to log in to https://support.tibco.com. If you do not have a user name, you can request one by clicking Register on the website.

How to Join TIBCO Community

TIBCO Community is the official channel for TIBCO customers, partners, and employee subject matter experts to share and access their collective experience. TIBCO Community offers access to Q&A forums, product wikis, and best practices. It also offers access to extensions, adapters, solution accelerators, and tools that extend and enable customers to gain full value from TIBCO products. In addition, users can submit and vote on feature requests from within the TIBCO Ideas Portal. For a free registration, go to https://community.tibco.com.

About This Product

The TIBCO ActiveSpaces® software is a distributed in-memory data grid product. Some features of ActiveSpaces® include use of familiar database concepts, high I/O capacity, and network scalability.

ActiveSpaces features a complete redesign and reimplementation of the product and is straightforward to understand, use, and administer.

Product Editions

ActiveSpaces is now available in two editions: Community Edition and Enterprise Edition.

TIBCO ActiveSpaces® Community Edition

ActiveSpaces® Community Edition is ideal for getting started with ActiveSpaces, for implementing application projects, including proof of concept projects, for testing, and for deploying applications in a production environment. Although the community license limits the number of production instances, you can easily upgrade to the enterprise edition as your use of ActiveSpaces expands.

The community edition is available free of charge. It is a full installation of the ActiveSpaces product.

The community edition limits you to running up to 25 nodes (a total of the copyset nodes or proxies in your data grid).

ActiveSpaces Community Edition is compatible with both the enterprise and community editions of TIBCO FTL®.

TIBCO ActiveSpaces® Enterprise Edition

ActiveSpaces is now available in a community edition and an enterprise edition.

ActiveSpaces® Enterprise Edition is ideal for all application development projects, and for deploying and managing applications in an enterprise production environment. It includes all features presented in this documentation set, and access to TIBCO Support. Choose the enterprise edition for production deployments with more than 25 nodes (a total of the copyset nodes or proxies in your data grid) and for enterprise monitoring using dashboards.

ActiveSpaces Enterprise Edition depends on the enterprise edition of TIBCO FTL for monitoring and management of grid components and secure communication.

Overview of TIBCO ActiveSpaces

TIBCO ActiveSpaces software is a distributed in-memory data grid product.

To lift the burden of big data, ActiveSpaces provides a distributed in-memory data grid that can increase processing speed to reduce reliance on costly transactional systems. ActiveSpaces provides an infrastructure for building highly scalable, fault-tolerant applications. It creates virtual data caches from the aggregate memory of participating nodes, scaling automatically as nodes join the data grid. By combining the features and performance of databases, caching systems, and messaging software, it supports very large, highly volatile data sets and event-driven applications.

Why ActiveSpaces?

A traditional RDBMS fails to keep up with the growing volume of data and the high rate of I/O operations per second. This drawback of RDBMS can impact the performance and slow down the system.

TIBCO ActiveSpaces is ideal for enterprises that handle a large amount of data or have a high volume of I/O activities per second. ActiveSpaces provides horizontal scalability, where you have the flexibility to segregate the data across a group of computers. For example, if you have 25000 operations per second, they can be divided across 10 computers that handle 2500 operations per second. Or, if your enterprise needs to store 10 TB of data, it can be distributed across 5 computers that contribute 2 TB each to store the data.
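The arithmetic in this example can be checked with a short Python sketch. The function name per_computer_share is hypothetical and is not part of any ActiveSpaces API. The sketch assumes a perfectly even split, which real workloads rarely achieve.

```python
def per_computer_share(total: float, computers: int) -> float:
    """Return the even share of a workload carried by each computer."""
    if computers <= 0:
        raise ValueError("computers must be a positive number")
    return total / computers


# 25000 operations per second spread over 10 computers
ops_per_computer = per_computer_share(25_000, 10)

# 10 TB of data spread over 5 computers
tb_per_computer = per_computer_share(10, 5)

print(ops_per_computer, tb_per_computer)
```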

What Is ActiveSpaces?

ActiveSpaces uses the concepts of grid computing to provide a scalable, distributed, and durable data grid. The data grid serves as a system of record to store terabytes of data in an enterprise. ActiveSpaces provides a fast, consistent, and fault-tolerant system that supports a high rate of I/O operations in a scalable manner.

For more information about the ActiveSpaces concepts, see the Grid Computing in ActiveSpaces section.

Benefits of ActiveSpaces

ActiveSpaces offers many advantages as compared with a traditional RDBMS.

The following list highlights the benefits of using ActiveSpaces:

● Accelerates performance and customer experience

● Requires minimum investment because the system scales on low cost commodity or virtualized hardware

● Updates data and systems continually and provides immediate and accurate response

● Supports many hardware and software platforms, so programs running on different kinds of computers in a network can communicate seamlessly

● Scales linearly and transparently when nodes are added (an increase in the number of nodes produces a corresponding increase in the memory and processing power available to the data grid)

● Enables smooth and continuous working of your application without code modification or restarts

● Provides location transparency without the hassles of determining where or how to store data and search for it

● Notifies applications automatically as soon as data is modified

Attributes of ActiveSpaces

These attributes of ActiveSpaces set it apart from traditional RDBMS.

Scalability

The biggest advantage of using ActiveSpaces is scalability. You can scale up the system horizontally to hold terabytes of data without bringing the system down. You also have complete administrative control over data redistribution.

System of Record

ActiveSpaces serves as a distributed, large-scale system of record that spans across nodes in an enterprise. The system of record uses the concepts of the traditional RDBMS such as table, rows, and columns. In addition, every node saves a portion of the data locally in a fault-tolerant and durable manner.

Faster Access to Data

ActiveSpaces supports queries and indexes that improve performance. Queries run faster because data is cached in memory. Queries in ActiveSpaces use a subset of the SQL language. The filtering and indexing capabilities offered by ActiveSpaces expedite the execution of queries.

TIBCO FTL® for Secure Communication

ActiveSpaces uses the capabilities of TIBCO FTL® 6.1.0 or later. A specific FTL realm contains configuration information and connectivity parameters for communication between the ActiveSpaces data grid processes and client applications. ActiveSpaces uses TIBCO FTL for the following key tasks:

● Communication between application programs and the data grid.

● Internal communication among data grid component processes

● Configuration, monitoring, and management of data grid components

With TIBCO FTL 6.1.0 or later, ActiveSpaces uses the realm service capabilities or processes of the TIBCO FTL server. In this documentation, the term "realm service" is used to refer to TIBCO FTL 5.x realm server or TIBCO FTL 6.x realm service.

For the versions of TIBCO FTL that are compatible with TIBCO ActiveSpaces, see the readme.txt file.

High-Performance ACID-Compliant Data Grid

The data grid provides atomicity, consistency, isolation, and durability (ACID) of data. ACID compliance is achieved by using transactions and concurrency control across multiple tables.

Transaction Isolation

A transaction comprises a set of operations that can modify the content of the data grid. Owing to transaction isolation, an ongoing transaction does not affect the queries that are being run on the content. For example, if another system is trying to read a row that you are trying to modify, that system gets the data either as it was before the modification or as it is after the modification. The system never sees partially committed transactions. Even if the transaction is distributed over a network and involves multiple rows and tables, transaction isolation ensures that there are no uncommitted reads (dirty reads).
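The "before or after, never in between" behavior can be illustrated with a single-process Python sketch. The TinyGrid and Transaction classes and the sample rows are hypothetical and do not represent the ActiveSpaces API or its internal locking. They only show staged changes staying invisible to readers until the commit publishes them in one step.

```python
import threading


class TinyGrid:
    """Toy in-memory store; readers only ever see committed rows."""

    def __init__(self):
        self._lock = threading.Lock()
        self._rows = {}

    def get(self, key):
        with self._lock:
            row = self._rows.get(key)
            return dict(row) if row is not None else None

    def commit(self, staged):
        # Every staged row becomes visible under one lock acquisition,
        # so a reader sees all of the transaction or none of it.
        with self._lock:
            for key, row in staged.items():
                self._rows[key] = dict(row)


class Transaction:
    """Collects changes privately and publishes them on commit."""

    def __init__(self, grid):
        self._grid = grid
        self._staged = {}

    def put(self, key, row):
        self._staged[key] = row  # not visible to readers yet

    def commit(self):
        self._grid.commit(self._staged)
        self._staged = {}


grid = TinyGrid()
txn = Transaction(grid)
txn.put(1, {"name": "Joe Smith"})
txn.put(2, {"name": "Ann Lee"})

seen_before_commit = grid.get(1)  # the uncommitted row is not readable
txn.commit()
seen_after_commit = grid.get(1)   # the committed row is readable
```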

To achieve the highest level of transaction isolation, pessimistic transactions are used. This guarantees that a row is consistently accessed by the operation that initially accessed the row until the transaction

easy to use. You can use the functions to retrieve metadata information about the data grid, a specific table, or a result set.

Real-Time Push Events

ActiveSpaces pushes real-time events over the network to servers and client applications when data in the data grid changes. Table listeners receive these data change events through callback notifications.
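The callback pattern behind table listeners can be sketched in plain Python. The ListenableTable class, its method names, and the event dictionary layout are assumptions made for illustration only. The actual listener interfaces are described in the TIBCO ActiveSpaces API Reference.

```python
from typing import Callable, Dict, List, Optional

Event = Dict[str, object]


class ListenableTable:
    """Toy table that notifies registered callbacks about data changes."""

    def __init__(self) -> None:
        self._rows: Dict[object, dict] = {}
        self._listeners: List[Callable[[Event], None]] = []

    def add_listener(self, callback: Callable[[Event], None]) -> None:
        self._listeners.append(callback)

    def put(self, key, row: dict) -> None:
        old: Optional[dict] = self._rows.get(key)
        self._rows[key] = row
        self._notify({"type": "PUT", "key": key, "old": old, "new": row})

    def delete(self, key) -> None:
        old = self._rows.pop(key, None)
        if old is not None:
            self._notify({"type": "DELETE", "key": key, "old": old, "new": None})

    def _notify(self, event: Event) -> None:
        # Push the event to every listener as soon as the change is applied.
        for callback in self._listeners:
            callback(event)


received: List[Event] = []
table = ListenableTable()
table.add_listener(received.append)  # the callback simply records events

table.put(1, {"name": "Joe Smith"})
table.delete(1)
```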

Cloud Ready

It is easy to deploy TIBCO ActiveSpaces on cloud, on-premises, or hybrid environments. You can easily build TIBCO ActiveSpaces into microservices with container deployment products such as Docker.

Redesigned from the Ground Up

Since the 3.0 version of ActiveSpaces, ActiveSpaces software is completely redesigned and reimplemented to make it more user-friendly for both end users and administrators. ActiveSpaces 3.x is not backward compatible with the earlier versions of the product. ActiveSpaces 3.x is faster because it relies on TIBCO FTL for the underlying communication.

ActiveSpaces 3.x and later use the terminology of a traditional RDBMS. See the Comparison Matrix.

Grid Computing in ActiveSpaces

ActiveSpaces uses grid computing to bring together computers in your network that can contribute their processing power, memory, and storage to solve a complex problem. ActiveSpaces uses grid computing concepts to store and process the contents of a data grid.

What Is a Data Grid?

ActiveSpaces stores data in data grids. In a data grid, data is stored in the form of tables. A data grid is equivalent to a database of a traditional RDBMS.

Tables

A table comprises multiple rows that are spread out in the data grid. The tables are similar to the tables in a traditional RDBMS, made of rows and columns. Unlike a traditional RDBMS, where all the data in a table resides on one computer, a data grid partitions the table by row and stores the rows in different ActiveSpaces processes called nodes.

Rows

Like the traditional RDBMS, a row comprises a set of columns and is uniquely identified by the primary index. A row becomes the unit of measurement for the data grid. Rows are distributed across the data grid. When scaling up, the data grid controls where a newly added row must be stored.

Columns

A row is made of a collection of columns. Every column uniquely identifies a piece of information.

Every column has a type and a value associated with it. For example, the Employee Name column is of data type String and has the value "Joe Smith".

Primary Index

Uniquely identifies a row in a table. It is equivalent to a primary key in a traditional RDBMS. You can have more than one column that forms a primary index.

Secondary Index

Is similar to a primary index but can refer to multiple rows in a table. A secondary index comprises one or more columns of a table and is used to efficiently retrieve the rows of a table by reducing the number of rows scanned for retrieval by queries. Without a secondary index, this would involve a full table scan to identify which rows match the query. With a secondary index, additional space is used to help speed up the query and quickly identify matching rows without a full-table scan.
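The trade-off described here, extra space in exchange for fewer rows scanned, can be sketched with plain Python dictionaries. The sample rows and function names are hypothetical. The sketch does not show how ActiveSpaces implements its indexes internally.

```python
from collections import defaultdict

# Rows keyed by their primary index (an employee ID in this sketch).
rows = {
    1: {"name": "Joe Smith", "department": "Sales"},
    2: {"name": "Ann Lee", "department": "Support"},
    3: {"name": "Raj Patel", "department": "Sales"},
}


def full_table_scan(table, column, value):
    """Check every row: the cost grows with the size of the table."""
    return sorted(key for key, row in table.items() if row[column] == value)


def build_secondary_index(table, column):
    """Map each column value to the primary keys of the matching rows."""
    index = defaultdict(set)
    for key, row in table.items():
        index[row[column]].add(key)
    return index


def indexed_lookup(index, value):
    """Jump straight to the matching keys without touching other rows."""
    return sorted(index.get(value, ()))


department_index = build_secondary_index(rows, "department")

scan_result = full_table_scan(rows, "department", "Sales")
index_result = indexed_lookup(department_index, "Sales")
```

Both calls return the same keys. The index answers the query from one dictionary lookup, at the cost of the memory that the index itself occupies.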

Supported Data Types

A column can be of the following data types:

● long

● double

● string

● datetime

● opaque
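As an illustration of typed columns, the following Python sketch checks a row against a table definition. The mapping from ActiveSpaces type names to Python types is an assumption made for this sketch (for example, opaque is treated as raw bytes), and validate_row and the employee columns are hypothetical names, not product APIs.

```python
from datetime import datetime

# Assumed mapping from the column types listed above to Python types.
PYTHON_TYPES = {
    "long": int,
    "double": float,
    "string": str,
    "datetime": datetime,
    "opaque": bytes,
}


def validate_row(schema, row):
    """Raise TypeError if a supplied column value has the wrong type."""
    for column, type_name in schema.items():
        if column in row and not isinstance(row[column], PYTHON_TYPES[type_name]):
            raise TypeError(f"column {column!r} expects type {type_name}")
    return True


employee_schema = {
    "employee_id": "long",
    "employee_name": "string",
    "hired": "datetime",
}

valid = validate_row(
    employee_schema,
    {"employee_id": 7, "employee_name": "Joe Smith", "hired": datetime(2019, 8, 1)},
)
```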

How Is the Data Stored in a Data Grid?

Unlike traditional RDBMS, a data grid is not stored in one place. An ActiveSpaces data grid leverages the storage capacity and computing power from multiple computers.

To understand how the data is stored, you must first familiarize yourself with the following concepts:

Nodes

A node is an ActiveSpaces process running within a computer. The node holds a portion of the data forming the data grid both in memory and on disk. The smallest unit of data held by a node is a row.

Other than storing data of a row, the node is also responsible for handling requests to read or update the row. As a result, the data spanning across a group of nodes collectively form a data grid.

Nodes can be run from a physical computer, a virtual machine, or a Docker container.

Persistence on Nodes

ActiveSpaces supports the Shared Nothing mode of persistence where every node saves its data locally to the disk.

Copysets

A copyset is a logical grouping of nodes such that a portion of the data is shared uniformly by all the nodes that form the copyset. This ensures fault tolerance. Every node in the copyset, also known as a replica, has an identical copy of the data. For example, assume that a row (R1) comprises employee name, employee ID, and department. Copyset1 contains the nodes N1, N2, and N3, and each of them stores an identical copy of R1. When you add new data or request an update to a row in a copyset, the update is written to all the nodes in the copyset before the success of the operation is acknowledged. Keeping the nodes of a copyset on different computers helps prevent data loss during system failures.

Copysets help you scale your data horizontally. When you add a new copyset to a grid, you can redistribute the existing data to the new nodes of that copyset, thereby distributing the load on the grid with the help of the newly added copyset.
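One way to picture rows being spread across copysets is a hash of the primary key. This document does not describe the algorithm that ActiveSpaces uses to place rows, so the hash-modulo scheme below is only a stand-in, and copyset_for_key and distribute are hypothetical names.

```python
import zlib


def copyset_for_key(primary_key: str, copyset_count: int) -> int:
    """Pick a copyset index from a stable hash of the primary key."""
    digest = zlib.crc32(primary_key.encode("utf-8"))
    return digest % copyset_count


def distribute(keys, copyset_count):
    """Group keys by the copyset that would own them."""
    placement = {index: [] for index in range(copyset_count)}
    for key in keys:
        placement[copyset_for_key(key, copyset_count)].append(key)
    return placement


keys = [f"employee-{n}" for n in range(1000)]
two_copysets = distribute(keys, 2)
three_copysets = distribute(keys, 3)  # after adding a third copyset

for index, owned in three_copysets.items():
    print(f"copyset {index} owns {len(owned)} rows")
```

With this naive scheme, adding a copyset changes the owner of many existing keys. That movement of existing rows corresponds to the redistribution step mentioned above.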

The following image is a logical diagram showing how rows of three tables are distributed across two copysets in the data grid.

Rows Distributed Across Copysets

In the following image, the rows of a table are broken down into four sets (each owned by a different copyset). The nodes running in a given copyset are identical replicas of each other.

How One Table is Distributed in a Data Grid with Four Copysets and Three Nodes

To understand more about sizing a copyset and a data grid, see Sizing Guide.

Primary Node

When a copyset has more than one node, one of the nodes is the primary node, which stores data and provides read access. The other nodes in the copyset are secondary nodes that store backup copies of the data. The key role of the primary node is to interact with the proxy process. The primary node receives the client operation and replicates it to the other nodes in the copyset. The client operation is applied in parallel at the primary node and all secondary nodes. The primary node is responsible for sharing the result of the request with the proxy.

If the primary node goes down for some reason, one of the other nodes in the copyset takes over as the primary node. Updates from client applications continue as usual without any loss of data because all of the data has been replicated from the original primary node to all of the other nodes in the copyset.

The nodes of a copyset must reside on different machines to ensure that one machine failure does not cause data loss.

Reasons for Using Multiple Nodes

There are several reasons for using multiple nodes:

● Nodes in different copysets are created with the goal of scaling horizontally. Thus, multiple copysets are created, each with a slice of the data.

● Nodes in the same copyset are created to provide multiple replicas for fault tolerance. These contain identical copies of the data.

● In a production environment, you might decide to use multiple nodes for a combination of reasons. For example, you might choose to have two replicas per copyset and multiple copysets (say three) to scale horizontally. In this example, your environment would have a total of six nodes.
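The node count in the last example can be worked out with a small Python sketch. The function names are hypothetical. The community edition check assumes that the 25-node limit counts copyset nodes plus proxies, which is one reading of the limit stated for that edition.

```python
COMMUNITY_EDITION_NODE_LIMIT = 25  # limit stated for the community edition


def copyset_node_count(copysets: int, replicas_per_copyset: int) -> int:
    """Total nodes when every copyset runs the same number of replicas."""
    if copysets < 1 or replicas_per_copyset < 1:
        raise ValueError("both counts must be at least 1")
    return copysets * replicas_per_copyset


def fits_community_edition(copysets: int, replicas_per_copyset: int, proxies: int) -> bool:
    # Assumption: the limit applies to copyset nodes plus proxies.
    total = copyset_node_count(copysets, replicas_per_copyset) + proxies
    return total <= COMMUNITY_EDITION_NODE_LIMIT


# Three copysets with two replicas each, as in the example above.
nodes = copyset_node_count(copysets=3, replicas_per_copyset=2)
print(nodes)
```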

To sum it up, the data is stored in copysets as described in the previous sections. The copysets put together form a data grid.

Replication

To replicate data, you must configure the copysets in the data grid such that copyset_size is greater than 1.

The copyset_size configuration setting applies to all copysets in the data grid. When the copyset_size is greater than 1, one node in each copyset acts as the primary node that stores data and provides read access to that data. The other nodes in each copyset are secondary nodes that store copies of the data on the primary node. Every time data is written to the primary node, data is synchronized at the primary node and all secondary nodes in the copyset.

When the primary node of a copyset is down, one of the secondary nodes in the copyset takes over as the primary node. Because each secondary node of the copyset contains a copy of the same data that resides on the primary node, no data loss occurs and data grid operations continue as long as at least one node of the copyset remains running.
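The replication and takeover behavior can be sketched as a toy Python model of one copyset. The Copyset class is hypothetical. In this sketch the next running node simply becomes primary, whereas in ActiveSpaces the state keeper is responsible for promoting a secondary node.

```python
class Copyset:
    """Toy model of one copyset; the first running node acts as primary."""

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("a copyset needs at least one node")
        self.data = {name: {} for name in node_names}
        self.running = list(node_names)

    @property
    def primary(self):
        if not self.running:
            raise RuntimeError("no running node left in the copyset")
        return self.running[0]

    def put(self, key, row):
        _ = self.primary  # raises if the whole copyset is down
        # The write is applied on every running node before it is acknowledged.
        for name in self.running:
            self.data[name][key] = dict(row)
        return "OK"

    def get(self, key):
        return self.data[self.primary].get(key)

    def fail(self, name):
        # Simulate a node going down; a surviving node becomes primary.
        self.running.remove(name)


copyset = Copyset(["N1", "N2", "N3"])  # copyset_size of 3
copyset.put("R1", {"name": "Joe Smith", "department": "Sales"})

first_primary = copyset.primary
copyset.fail("N1")
second_primary = copyset.primary
row_after_failover = copyset.get("R1")
```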

Processes in ActiveSpaces

The following processes are involved in creating, maintaining, and querying the data grid:

● TIBCO ActiveSpaces Client Applications

● Proxy

● State Keeper

● Realm Service

● Node

TIBCO ActiveSpaces Client Applications

The client applications use the API libraries shipped with the product to build custom applications.

Client applications interact with the data grid by using the proxy process.

Proxy

A proxy is a mediator between a client request and the data grid. Based on the client request, the proxy identifies the primary node in a copyset and interacts with the primary node till the request is processed and shared with the client. You can have many proxies in a data grid.

Realm Service

A data grid is run inside a TIBCO FTL realm. A TIBCO FTL realm serves as a repository for data grid configuration information and provides communication services that enable all data grid processes to communicate with each other.

A client application accesses the data grid by using the realm service URL. In TIBCO FTL 6.0.0 or later, the realm service URL is the URL of the TIBCO FTL server. The realm service offers the following capabilities:

● Stores data grid definitions

● Communicates with the administrative tools to store and retrieve data grid definitions

● Communicates with all the processes running in the data grid and updates the internal configuration if anything changes

● Collects monitoring data from all processes

Fault Tolerance in Realm Services Used in TIBCO FTL 6.0.0 or Later

TIBCO FTL 6.1.0 or later uses a quorum-based fault tolerance mechanism. A cluster of at least three TIBCO FTL core servers must be run. Each core server provides a realm service. Those realm services all cooperate to provide fault tolerance for the data grid. Fault tolerance is assured as long as a quorum of servers is always running. Each core server must be run on a separate machine. Clients receive a list of URLs at which they can connect to those TIBCO FTL core servers.

State Keeper

A state keeper runs internally in the data grid and tracks all the data in the data grid. Each state keeper saves the data locally on the disk. When you start the realm service, the state keeper receives the grid configuration information from the realm service. State keepers are responsible for the following functions:

● Tracking and managing all the copysets in a data grid

● Tracking the proxies in a data grid

● Identifying a primary node in each copyset

● Promoting one of the secondary nodes as primary, in case the primary node of a copyset goes down

● Ensuring consistency as the data grid scales up

Fault Tolerance in State Keepers

It is a good practice to have one to three state keepers running in a production environment. A set of fault-tolerant state keeper processes protects the data grid's runtime state information and ensures nonstop access to it. One of the state keepers is designated the lead state keeper and supplies this information to the proxies and copyset nodes. If the lead state keeper goes down, one of the secondary state keepers takes over as the lead. In a fault-tolerant set of three state keepers, a quorum of two state keepers must always be running to ensure data consistency in split-brain scenarios. If a state keeper is restarted while a quorum is running, one of the running state keepers updates the state of the restarted state keeper. If the number of running state keepers falls below the quorum and the state of a copyset changes (for example, a node goes down), operations on the data grid fail. When this happens, the remaining state keepers must be brought down and then all state keepers must be restarted.
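The quorum rule can be expressed as a short Python sketch. It assumes a simple majority quorum, which matches the stated case of two out of three state keepers. This document does not specify a rule for other set sizes, and the function names are hypothetical.

```python
def quorum_size(configured: int) -> int:
    """Smallest majority of the configured state keepers (assumed rule)."""
    if configured < 1:
        raise ValueError("at least one state keeper must be configured")
    return configured // 2 + 1


def has_quorum(configured: int, running: int) -> bool:
    """True while enough state keepers are running to form a quorum."""
    return running >= quorum_size(configured)


for running in (3, 2, 1):
    print(f"{running} of 3 running: quorum = {has_quorum(3, running)}")
```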

Node

For more information on nodes, see Nodes.

Fault Tolerance in Nodes

To prevent data loss, you can run up to three nodes per copyset. Every node must have at least one backup node that has an identical copy of the data. For production deployments, TIBCO recommends using at least two nodes per copyset.

Terminology Used to Address the TIBCO FTL Realm

With TIBCO FTL 6.1 or later, ActiveSpaces uses the realm service capabilities or processes of the TIBCO FTL server. The following changes are made to the terminology to generically address the components of TIBCO FTL 5.x and TIBCO FTL 6.x:

Term Used in This Document | Equivalent Component in TIBCO FTL 5.4.1 | Equivalent Component in TIBCO FTL 6.1 or Later
Realm service | Realm server | Realm service running on the FTL server
Realm service URL | Realm server URL | FTL server URL
Backup realm service | Backup realm server | FTL server that is a member of a cluster of three or more FTL servers
Primary Realm | Primary Realm Server and its Backup Realm Server | A cluster of primary FTL servers that provide realm services for the data grid.
Satellite Realm | Satellite Realm Server and its Backup Realm Server | A cluster of satellite FTL servers that are connected to a cluster of primary FTL servers.

The Workflow for a PUT Operation

A client application initiates a PUT request. The request reaches the proxy. Like all the ActiveSpaces processes, the proxy identifies the data grid by the data grid name and the realm service URL (In TIBCO FTL 6.0.0 or later, the realm service URL is the URL of the TIBCO FTL server). The proxy forwards the request to the appropriate primary node. The primary node handles the processing of the data. After all the secondary nodes are updated with the changes, the result is returned to the proxy and the proxy then shares the result with the client application. The realm service and the state keeper run outside of the operation datapath.
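The path of a PUT request can be traced with a toy Python simulation. The Node and Proxy classes, the node names, and the hash-based choice of copyset are assumptions made for this sketch and are not the ActiveSpaces client API. The realm service and state keeper do not appear because they run outside the operation datapath.

```python
import zlib


class Node:
    """Toy node that stores rows in memory."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def apply(self, key, row):
        self.rows[key] = dict(row)


class Proxy:
    """Routes each request to the primary node of the owning copyset."""

    def __init__(self, copysets):
        # Each copyset is a list of nodes; the first node is the primary.
        self.copysets = copysets
        self.trace = []

    def put(self, key, row):
        index = zlib.crc32(key.encode("utf-8")) % len(self.copysets)
        primary, *secondaries = self.copysets[index]
        self.trace.append(f"proxy -> {primary.name}")
        primary.apply(key, row)
        for node in secondaries:
            node.apply(key, row)  # replicate before acknowledging
            self.trace.append(f"{primary.name} -> {node.name}")
        self.trace.append(f"{primary.name} -> proxy")
        return "OK"  # the proxy hands this result back to the client


copysets = [[Node("A1"), Node("A2")], [Node("B1"), Node("B2")]]
proxy = Proxy(copysets)
result = proxy.put("employee-1", {"name": "Joe Smith"})
print(result, proxy.trace)
```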

The Workflow

Checkpoints

TIBCO ActiveSpaces checkpoints provide the ability to save the state and data of a running data grid. A checkpoint can then be used to restore a complete data grid on the same computer, to move the entire data grid, or to replicate a data grid to another data grid for disaster recovery.

The data collected by a checkpoint is guaranteed to be logically consistent across the entire data grid. A checkpoint does not contain the data from a partially committed transaction.

When a checkpoint is created, ActiveSpaces performs the following activities:

● The realm database is saved.

● The configuration of the data grid in the realm is saved.

● Each state keeper's internal governing state information is saved.

● The relevant files needed to restore each node of a data grid are saved in the checkpoints subdirectory of each node's data directory.

Creating a checkpoint fails in the following scenarios:

● A realm is not reachable.

● A quorum of state keepers is not running.

Checkpoint Types

A manual checkpoint is created manually using the tibdg administrative tool and a periodic checkpoint is created automatically by configuring the data grid to create periodic checkpoints.

Manual Checkpoints

● Initiated using the tibdg administrative tool.

● Are given a unique name to help with checkpoint identification.

Periodic Checkpoints

● Configured at the grid level.

● Taken at a fixed interval while the data grid is running.

● Cannot be given a name.

Both manual and periodic checkpoints can be manually removed, and are subject to removal based on the retention setting.

Disaster Recovery Concepts

Disaster Recovery is a situation where a set of running systems must be replaced by another set of running systems due to failure, damage, loss of connectivity, or other traumatic event. Disaster Recovery is a large scale event and is not intended to replace fault tolerance where the failure of individual components can be recovered or otherwise accommodated without stopping a running system.

In a disaster recovery scenario, running systems are not expected to seamlessly or automatically failover to backup or alternative systems. Recovering from a disaster scenario implies a substantial and potentially large scale system stoppage and a restart of an entirely new instance of the previously running system. It is not intended for short term outages such as normal maintenance operations.

The replacement systems activated during disaster recovery are designed to remain in operation for days, weeks, or even months depending on the severity of the disaster.

ActiveSpaces supports disaster recovery by creating a gridset.

Gridsets

The purpose of gridsets is to help set up the disaster recovery process. A group of data grids that share the same set of consistent data is referred to as a gridset. Each gridset has a name, which exists in the same namespace as grid names (For example, you cannot have a grid named “prod” and a gridset named “prod”). Each gridset also has a single primary grid. Within a gridset, there is a single authoritative schema, which is owned by the primary grid of the gridset.

Each grid can belong to at most one gridset. Grids in a gridset do not need to have the same replication factor, number of copysets, or number of state keepers, but care must be taken to ensure that a mirror grid has sufficient capacity to handle the required load if administrators choose to make it the primary grid.

Types of Grids

The following list differentiates the types of data grids in ActiveSpaces:

Stand-Alone Grids

Any grid that does not belong to a gridset is a stand-alone grid. All operations included in the ActiveSpaces API are permitted on stand-alone grids.

Primary Grids

Any grid that is listed as the primary of a gridset is a primary grid. All operations included in the ActiveSpaces API are permitted on primary grids. Primary grids are responsible for supplying mirror grids with data on request.

Mirror Grids

Any grid that is included in a gridset, but is not currently the primary of that gridset is a mirror grid.

Only read operations are allowed on mirror grids (For example, GET, queries, iterators, and so on).

Read operations are executed against the most recent checkpoint that has been mirrored from the primary grid. Mirror grids are responsible for requesting updates from the primary grid.

checkpoint or a periodic checkpoint causes mirroring). Until all copysets in the primary grid have confirmed that their data has been mirrored, data in the checkpoint being mirrored is not available.

Bulk Mirroring

If a mirror grid has no previous checkpoints available or if the primary grid has insufficient information to identify the rows that changed since the last mirrored checkpoint, bulk mirroring is used. During bulk mirroring, all rows present in the checkpoint being mirrored are sent to the mirror grid.

Incremental Mirroring

ActiveSpaces attempts to minimize the data sent between grids whenever possible by using incremental mirroring. When a mirror grid has a previous checkpoint as the starting point, and the primary grid has sufficient information to identify all rows that changed, only rows that were updated or deleted are sent to the mirror grid.
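The difference between bulk and incremental mirroring can be illustrated with a minimal sketch. It is not ActiveSpaces code: checkpoints are modeled as plain dictionaries keyed by primary key, the function name rows_to_mirror is hypothetical, and treating newly added rows as updated rows is an assumption.

```python
def rows_to_mirror(previous, current):
    """Return (upserts, deletes) to send to a mirror grid.

    previous: last mirrored checkpoint as {key: row}, or None when the
              mirror has no usable starting point (forces bulk mirroring).
    current:  checkpoint being mirrored as {key: row}.
    """
    if previous is None:
        # Bulk mirroring: every row present in the checkpoint is sent.
        return dict(current), set()
    # Incremental mirroring: only rows that changed or disappeared are sent.
    upserts = {k: v for k, v in current.items() if previous.get(k) != v}
    deletes = set(previous) - set(current)
    return upserts, deletes
```

In this model, an unchanged row costs nothing to mirror, which is why a usable previous checkpoint matters for large tables.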


Best Practices for a Development Environment

In many enterprises, programmers act as administrators during the development and test phases of a project.

To develop and test application programs that use ActiveSpaces software, deploy the following processes:

Processes                    Number

Realm service                One

State keeper                 One

Node                         One

Proxy                        One

Your application programs    As appropriate

In a development environment, you can run all of these processes on the same host computer.

Sample Scripts

Refer to the TIBCO_HOME/as/<version>/samples/readme.md before using the sample scripts.

The following scripts are available:

TIBCO_HOME/as/<version>/samples/scripts/as-start defines a simple data grid and starts its component processes.

TIBCO_HOME/as/<version>/samples/scripts/as-stop stops those component processes.

Sample Docker Environment

The docker-compose sample environment is provided to demonstrate how to deploy an ActiveSpaces data grid in Docker. For more information, see TIBCO_HOME/as/<version>/samples/docker/README.md.

When you are ready to explore using ActiveSpaces to scale your data beyond one machine, you can create additional copysets and nodes in the grid and run the nodes on separate machines.


Best Practices for a Production Environment

To use ActiveSpaces software in a production environment, deploy the following processes.

Process: Realm service

Minimum required: One

Description: TIBCO recommends that you run a fault-tolerant set of realm services. Run a quorum of realm services. Each realm service must run on a separate host computer.

Process: State keeper

Minimum required: Three state keeper processes

Description: To ensure high availability during a network partition or hardware failure, each state keeper process must run on a separate host computer. Not doing so might result in grid-wide data loss. At any given time, you must maintain a quorum of running state keepers. If you want to run more than one state keeper, configure three state keepers and make sure you have at least two running state keepers.

Process: Node

Minimum required: Two nodes per copyset

Description: For greater data protection, you can run three nodes per copyset. In a fault-tolerant setup, if there is more than one node, one node acts as the primary node and the other nodes are secondary nodes.

Additional copies can become expensive in two ways:

● Increasing the node count by one adds one complete copy of all the data.

● Every node process must run on a separate host computer. Usually this requirement determines the number of host computers you must maintain.

For example, a data grid with three copysets and two nodes per copyset requires six nodes, all on separate hosts. Increasing to three nodes per copyset would require nine nodes, all on separate hosts.

Process: Proxy

Minimum required: One proxy process

Description: You can run additional proxies to increase the capacity for client programs and to improve response time. For best results, run proxy processes on separate host computers.

Process: Your application programs

Run as many processes as appropriate.
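The sizing arithmetic in the node and state keeper requirements can be sketched as follows. The function names are hypothetical, and reading "quorum" as a simple majority of the configured state keepers is an assumption that happens to match the three-configured, two-running guidance above.

```python
def nodes_required(copysets, nodes_per_copyset):
    """Total node processes; each one needs its own host computer."""
    return copysets * nodes_per_copyset


def state_keeper_quorum(configured):
    """Simple majority of the configured state keepers (assumed meaning)."""
    return configured // 2 + 1


print(nodes_required(3, 2))    # 6
print(nodes_required(3, 3))    # 9
print(state_keeper_quorum(3))  # 2
```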

Components Sharing a Host Computer

You can reduce the number of host computers in a production environment by running more than one component per host.

For example, you can run a realm service, a state keeper, a node, and a proxy, all on one host. (In contrast, do not run two state keepers on the same host.) For effective fault tolerance, run the nodes of each copyset on separate host computers.


For more information on different EBS volume types provided by Amazon, look for "Amazon EBS Volume Types" on https://docs.aws.amazon.com. For information on monitoring the performance of your EBS volume, look for "EBS Performance - I/O Characteristics and Monitoring" on https://docs.aws.amazon.com.


Programming Concepts

These concepts and definitions pave the way to a more detailed understanding of application programming with ActiveSpaces software.

Data Grid

A distributed database, including all the component processes that implement it.

Connection

An application program connects to a data grid. A connection object is analogous to a traditional database connection.

Session

An application program interacts with a data grid through one or more session objects. Each session insulates the data grid interactions within one program thread from the interactions in other threads.

A session can be transacted or non-transacted. Get, put, and delete operations in a transacted session occur within a transaction, and do not take effect until the program explicitly commits the transaction.

Table

An ActiveSpaces data grid organizes and presents data as rows in tables, like a traditional relational database.

Administrators define tables within the data grid.

Programs can get a row from a table, put a row into a table, and delete a row from a table.

Programs can query a table for the rows that match a filter.

Primary Key

Each table distinguishes a primary key, or more briefly, the key.

Values of the key are unique: no two rows in a table have the same key value.

Secondary Index

A table can have zero or more secondary indexes, which facilitate queries. The tibdg tool can be used to create a secondary index on a table by using the index create command.

Iterator

An iterator presents an application program with the results of a data grid query, one row at a time.

Structuring Programs

These steps outline the main structural components of most application programs that access an ActiveSpaces data grid.

Procedure

Task A Initialize ActiveSpaces Objects

1. Initialize the ActiveSpaces library.

Task B Data Grid Operations

5. Access the data grid using methods of a table object.

See Table Operations.

Task C Clean-Up

6. Close table objects.

7. Destroy session objects.

8. Close the data grid connection object.

Session

Programs use sessions to insulate data grid operations within program threads and to unite operations within transactions.

An application program usually creates only one connection to a data grid. From that connection object, a program can create one or many session objects.

Session objects are lightweight.

Sessions and Threads

It is good practice to create a separate session for each program thread that accesses the data grid.

Programs must use sessions in a thread-safe way. That is, two threads must not simultaneously access the same session. Violating this constraint can yield unpredictable results.

Sessions and Transactions

Each session can be either transacted or non-transacted. Programs determine this semantic property when creating each session.

In a transacted session, all get, put, and delete operations occur within a transaction, which is bound to the session. The session implicitly starts the transaction. Programs explicitly call the session's commit and rollback methods. (As these methods complete, they automatically start a new transaction in the session.)

If a program operates within several open transactions simultaneously, use a separate session and thread for each transaction.

In a non-transacted session, put, get, and delete operations are immediate: that is, when the method completes, the effect of the operation is also complete.

However, operations in a transacted session can block operations in a non-transacted session. For further explanation, see Transaction Isolation.

Only get, put, and delete operations are affected by a transacted session and its commit and rollback APIs. Other commands, such as iterators, queries, and DDL updates, behave the same way in a transacted session as in a non-transacted session.
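The contrast between transacted and non-transacted sessions can be illustrated with a toy model. This is a minimal sketch, not the ActiveSpaces API: the class ToySession and its methods are hypothetical, the data grid is a plain dictionary, and the blocking behavior described under Transaction Isolation is deliberately left out.

```python
class ToySession:
    """Hypothetical stand-in for a session; not the ActiveSpaces API."""

    def __init__(self, store, transacted=False):
        self.store = store            # shared dict standing in for the data grid
        self.transacted = transacted
        self.pending = []             # operations of the currently open transaction

    def put(self, key, row):
        if self.transacted:
            self.pending.append(("put", key, row))
        else:
            self.store[key] = row     # immediate effect

    def delete(self, key):
        if self.transacted:
            self.pending.append(("delete", key, None))
        else:
            self.store.pop(key, None)

    def commit(self):
        for op, key, row in self.pending:
            if op == "put":
                self.store[key] = row
            else:
                self.store.pop(key, None)
        self.pending = []             # a new transaction starts implicitly

    def rollback(self):
        self.pending = []             # discard; a new transaction starts implicitly
```

In this model, a put through a transacted session is invisible in the shared dictionary until commit is called, while a put through a non-transacted session is visible as soon as the method returns.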

Sessions and Defining Tables

Once a session has been created, it can be used to define tables programmatically. For more information, see "Defining a Table Using SQL DDL Commands" in the TIBCO ActiveSpaces Administration guide.


Table

Table objects represent data grid tables within an application program.

Tables and Sessions

A program opens a table object by calling a session's open table method. The program can use the table object's methods to operate on the corresponding table in the data grid.

Opening a table object does not lock the table in the data grid.

If the session is transacted, then table operations occur within the session's transaction. Within a transaction you can interact with multiple tables.

If the session is non-transacted, then table operations are not transacted.

Table Operations

Tables support the following data grid operations:

● Put a row into the table

● Get a row of the table

● Delete a row from the table

● Create an iterator to present the results of a table query

Primary Key

Every table requires a primary key, which can consist of one or more columns. The data type of key columns must be either long or string.

Examples of primary keys include employee number, invoice number, or MAC address.

The value of the primary key always remains unique across all the rows of the table. That is, database operations can never create two rows with the same key value; instead, they overwrite data in the existing row with that key value.

Creating Tables

Before a program can use a table or its rows for operations, an administrator must first create the table within the data grid. See "Defining a Table" in TIBCO ActiveSpaces Administration.

Put

The put operation adds a row to a data grid table.

Before calling the put method, your program must first create a row object and set its columns with values.

The row object must contain a value in all columns of the primary key. The value of the key is unique. If the table already contains a row with that key value, then the put operation replaces the existing row within the table. The put operation overwrites any unchanged columns in the row. The columns that are not part of the primary key can either contain data or be NULL.


If the table contains a row with that key value, then the get operation returns the contents of that row in a new row object. If the table does not contain a row with that key value, then the method returns null.

Delete

The delete operation deletes a row from a data grid table.

Before calling the delete method, your program must first create a row object and set a value in all columns of the primary key. The value of the key is unique.

If the table contains a row with that key value, then the delete operation deletes that row from the table.

If the table does not contain a row with that key value, then the method returns without changing the table.
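The key-based behavior of put, get, and delete can be modeled in a few lines. This is a minimal sketch under stated assumptions: ToyTable is a hypothetical in-memory class, not the ActiveSpaces table object, and rows are plain dictionaries.

```python
class ToyTable:
    """Hypothetical in-memory model of a table; not the ActiveSpaces API."""

    def __init__(self, key_columns):
        self.key_columns = tuple(key_columns)
        self.rows = {}

    def _key(self, row):
        # Every primary-key column must carry a value.
        missing = [c for c in self.key_columns if row.get(c) is None]
        if missing:
            raise ValueError(f"missing key columns: {missing}")
        return tuple(row[c] for c in self.key_columns)

    def put(self, row):
        # Replaces any existing row that has the same key value.
        self.rows[self._key(row)] = dict(row)

    def get(self, key_row):
        # Returns a new row object, or None when no row has that key.
        found = self.rows.get(self._key(key_row))
        return dict(found) if found is not None else None

    def delete(self, key_row):
        # Returns without changing the table when the key is absent.
        self.rows.pop(self._key(key_row), None)
```

Because put replaces the stored row as a whole in this model, a second put with the same key and fewer columns leaves only the columns of the second row.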

Create Iterator

The create iterator operation submits a query on a data grid table and creates an iterator object to present the query results.

Supply a filter string as an argument. This filter specifies the content of the query: that is, criteria for selecting a subset of rows from the table.

An iterator object receives batches of matching rows from the data grid. The prefetch property of the iterator determines the batch size.

The iterator consistency property allows the client to choose between the following snapshot consistency levels:

● global snapshot (default level)

This level ensures that none of the results in the snapshot are from a partially committed transaction, although more coordination is required when taking the snapshot.

● snapshot

This level makes no guarantee about the results in the snapshot containing partially committed transactions but requires less coordination when taking the snapshot.

The iterator object presents the program with the individual rows that match the query, one at a time.

The program must close the iterator object to release resources within the data grid component processes. For more information, see Query Behavior.

An implicit timeout limits the lifespan of iterator objects. Program calls that access an iterator after that timeout elapses return an error.

Avoid queries that result in full table scans, which can be resource-intensive and time-consuming. For more information, see Efficiency of Filters.
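The effect of the prefetch property can be pictured with a small generator that hands rows to the program one at a time while requesting them in batches. This is only an illustration: iterate_in_batches and the fetch_batch callable are hypothetical names, and the offset-based paging is an assumption made to keep the sketch short.

```python
def iterate_in_batches(fetch_batch, prefetch):
    """Yield rows one at a time while requesting them `prefetch` at a time.

    fetch_batch(offset, limit) is a hypothetical callable standing in for
    one round trip to the data grid; it returns at most `limit` rows.
    """
    offset = 0
    while True:
        batch = fetch_batch(offset, prefetch)
        if not batch:
            return                    # no more matching rows
        yield from batch              # present rows individually
        offset += len(batch)
```

A larger prefetch value means fewer round trips but more rows held in memory per batch.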

Statement

Statement objects are used to run SQL commands on the data grid. Queries (SELECT statements) and data manipulation language (DML) SQL commands can be run by using Statement objects. Statement objects are created by invoking the createStatement() method on the Session object.

A Statement object is created for each individual SQL command. A Statement can be run multiple times and must be closed when it is no longer needed.

You can either create a query using the SELECT statement or update rows using an INSERT or an INSERT OR REPLACE statement. The INSERT statement is supported for both transacted and non-transacted sessions. For details on the INSERT statement, see The SQL INSERT Statement. For information about the INSERT OR REPLACE statement, see The INSERT OR REPLACE Statement.


Parameters

Parameters are used to separate the data of an SQL command from the command itself. This is useful when the same command is run multiple times with only its data varying, which can improve the performance of the data grid. Parameters can also be used to prevent SQL injection attacks in queries.

Parameters in an SQL command are specified using '?' (question mark). For SELECT statements, parameters are supported for the values of comparisons in WHERE clauses. For INSERT statements, parameters are supported for column values.

The Statement interface provides methods for setting the values of any parameters used in an SQL command. Separate methods for setting parameter values are provided for each data type supported by ActiveSpaces. The setNull() method is provided to specify that a parameter's value must be empty (SQL NULL). All parameter values must be specified prior to running the statement or an error is returned.
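The idea of '?' parameter markers can be tried out with Python's standard sqlite3 module, which uses the same marker. This is a stand-in for illustration only: it does not use the ActiveSpaces Statement interface, and the table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (key INTEGER PRIMARY KEY, name TEXT)")

# One INSERT command, run several times with different parameter values.
insert_sql = "INSERT INTO mytable (key, name) VALUES (?, ?)"
for key, name in [(1, "ann"), (2, "bob"), (3, None)]:  # None is stored as SQL NULL
    conn.execute(insert_sql, (key, name))

# One SELECT command; the value never becomes part of the SQL text.
select_sql = "SELECT key FROM mytable WHERE key > ?"
above_1 = sorted(r[0] for r in conn.execute(select_sql, (1,)))
above_2 = sorted(r[0] for r in conn.execute(select_sql, (2,)))
conn.close()
```

Because the value is passed separately from the command text, input such as a string containing quote characters cannot change the structure of the command.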

Running Statements

The Statement interface provides two methods for executing the statement. The executeQuery() method is used to run statements, which have been created using a SELECT command. The executeQuery() method returns a ResultSet object that contains the resulting rows of a query.

The executeUpdate() method is used to run statements which have been created using a DML command. The executeUpdate() method returns the number of rows that were successfully processed.

If the wrong method is used to run a statement, an error is returned. For information about the current list of DML commands supported, see the TIBCO ActiveSpaces Release Notes.

ResultSet

A ResultSet contains the set of rows that make up the result of a query. A ResultSet object is returned when executeQuery() is invoked on a Statement created for a SELECT statement. The ResultSet object must be closed when it is no longer needed. A ResultSet object is returned even if no rows were found for the query.

The ResultSet object contains methods that allow you to iterate over the rows of the query result. A ResultSet object can be iterated over only once. The hasNext() method is used to check if there is a row that can be retrieved. The next() method is used to retrieve the next row object of the result.

Row Objects

A row object retrieved from a ResultSet contains the columns specified in the result list of a query. Each row object retrieved from a ResultSet must be closed when it is no longer needed. A row object contains the methods to find out the data type of each of its columns, whether a column has a value, and the methods to retrieve a column's value by its data type.

The label of a column in the result list is used to access the data for that column. For example,

SELECT col1 AS myname FROM table1

where myname is used as the column label.

If a label was not specified for a column in the result list, the column's name from the table is used as the label.

If a label was not specified and the column of the result list is an expression, the expression string is used as the label. For example,

SELECT col1, date('now') FROM table1 WHERE col1 <= 10

where the expression string, date('now'), is used as the column label.

When a label is specified for a column, or the column is an expression, the label or expression must be used exactly as it was specified in the original query string and is case-sensitive. If a label was not specified for a column and the column is from the table, the label is not case-sensitive.

ResultSet MetaData

The result of running a SELECT statement using the executeQuery() method is a ResultSet object, which contains the resulting rows of the SELECT statement. Information about the rows in the ResultSet is obtained when the statement is first created by invoking the getResultSetMetadata() method of the Statement object.

This information includes the number of columns in a result row, each column's data type, the label given to the column, the name of the column from the table, and the name of the table for each column.

If a label was not specified for a column, the name of the column from the table is used as the label. If a column is not from a table, but is an expression that is calculated as part of the result, the entire expression string is used as the label. It is always safest to access a column of a ResultSet by its label as a ResultSet column always has a label, but might not necessarily have a name from a table.
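Accessing result columns by label can be illustrated with Python's standard sqlite3 module, used here purely as a stand-in for the ActiveSpaces ResultSet and metadata objects. The table contents are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT)")
conn.execute("INSERT INTO table1 VALUES (1, 'x')")

# col1 is given the label myname; col2 keeps its table column name.
cur = conn.execute("SELECT col1 AS myname, col2 FROM table1")
labels = [d[0] for d in cur.description]  # labels of the result columns
row = dict(zip(labels, cur.fetchone()))   # look up values by label
cur.close()
conn.close()
```

Here labels is ['myname', 'col2'], so the first column must be read as row['myname'] rather than row['col1'].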

Metadata

Application programs can use the metadata APIs to retrieve metadata information about the data grid or a specific table.

The outline of the process involved in retrieving metadata information:

1. Your program needs to get a reference to a valid connection object if you do not already have one.

For convenience, a program can use a table accessor method to get the reference to the parent session and then the parent connection.

2. Call the get grid metadata API for the connection.

This makes a network request to the data grid for the metadata information (including all tables) at that point in time.

The program must destroy the grid metadata object after it has finished using it. If updated grid metadata is needed, the existing metadata object must be destroyed. Another request must be made using the same API to retrieve the latest metadata information.

3. The table metadata object and any strings retrieved from it do not need to be destroyed since they are all owned by the grid metadata object and are destroyed as part of its destroy method.

Grid Metadata Object

The grid metadata object contains the array of table metadata objects that exist in the data grid at the time the request was made.

A single metadata object for a table is retrieved using its table name. If the names of the tables are not known to the program, the grid metadata object provides a method to get the array of all table names, which in turn can be used to get a single table metadata object as described previously.

Similarly, a column name or index name is used to get information about that column or index from a table metadata object. If the column names or index names are not known to the program, the table metadata object provides a method to get the array of those names.


Additionally, a method to get the primary index name exists in the table metadata object and is used to determine the table's primary index and then the column (or columns) that make up that index (often known as the primary key(s) of the table).

Querying a Data Grid Table

Application programs have the following options to query tables in a data grid:

● Table iterator

A table iterator is used when you have created a table object and need to iterate over all of the rows or a specific subset of the rows in the table.

You can create an iterator to query the contents of the table and then iterate over the query results.

The contents of the query results for the iterator are controlled by the use of a filter string when the iterator is first created. Using a NULL filter string when creating the iterator returns all of the rows of the table.

To restrict the query results to a subset of the rows of a table, provide a non-NULL filter string which contains a filter expression as described in Query Language.

For more information, see Table Iterator.

● Session statement

A statement is created from a session object and is not tied to any one particular table. A session statement is created using a SQL SELECT string and the table for the query is determined by parsing the SELECT string. The SQL SELECT string supported has the form:

SELECT * FROM table_name WHERE filter

The syntax for a filter string is the same as for table iterators with the exception that parameter markers using '?' are supported. Parameter markers allow you to optimize the scenario where you want to run the same query several times with different values in a filter expression. For example,

SELECT * FROM mytable WHERE key > ?

Before a query is run for the session statement, you must provide values for all of the parameter markers in the SQL SELECT string by calling the appropriate Statement.setXXX() method for each parameter value depending on the data type of the value.

The parameter number is required when calling the setXXX() method, with the first parameter in the statement being numbered 1.

For more information, see Session Statement.

The advantages of using a session statement over a table iterator:

● The same query can be run multiple times using a single statement object.

● You can use parameter markers to vary the results of your query each time it is run.

● You do not have to create a table object before you can query a table.

● You can specify a subset of the columns of a table to be returned for the query results.

● To aid with data security, you can decouple the query from the data values used in the query.


Table Iterator

Application programs can query a data grid table to retrieve a subset of its rows.

Procedure

1. Formulate a filter string that specifies the query, that is, the constraints that determine the rows that the query retrieves.

See Query Language.

2. Call the create iterator method of the table object.

Supply the filter string as an argument.

The API library sends the filter string to a proxy process, which retrieves matching rows from the data grid, and funnels them to the application through the iterator object.

3. Iterate over the resulting rows.

The iterator presents the rows in an order that is deterministic and repeatable; however, the application program cannot influence that order.

4. Close the iterator object.

Closing the iterator releases the resources that hold the query results within the component processes of the data grid.

The SQL SELECT Statement

An SQL SELECT statement is used to query a table. The table to query is determined by parsing the SELECT string when creating the SELECT statement. The rows which satisfy the query are returned in a ResultSet. A WHERE clause can be used in the SELECT statement to control which rows of a table should be returned. An ORDER BY clause can be optionally appended after the WHERE clause to sort the resulting rows of the query. A LIMIT clause can be optionally appended as the last clause of the SELECT string to control the number of rows ultimately returned for the query.

Procedure

1. Formulate an SQL SELECT for the query you want to use to retrieve rows from a table.

2. Call the create statement method of the session object.

Supply the SQL SELECT string as an argument.

3. Call the statement methods to set the parameter values for the query.

4. Run the statement by calling the executeQuery() method of the statement object.

5. Read the rows of the query ResultSet by looping and calling the following methods until no more rows are available to read:

a) Call the hasNext() method of the ResultSet to see if there is a row to read.

b) Call the next() method of the ResultSet to read the next row.

c) When all of the rows have been read, close the ResultSet object.

6. Optional: Set different parameter values and rerun the statement by repeating Step 3 through Step 5.

7. Close the statement object.
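The shape of this procedure can be sketched with Python's standard sqlite3 module as a stand-in. The sketch does not use the ActiveSpaces Statement or ResultSet objects; the table, its rows, and the helper name run are invented, and the step comments only map loosely onto the procedure above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)",
                 [(1, "a"), (5, "b"), (8, "c"), (12, "d")])

# Step 1: formulate the SELECT with WHERE, ORDER BY, and LIMIT clauses.
sql = "SELECT col1, col2 FROM table1 WHERE col1 <= ? ORDER BY col1 DESC LIMIT 2"


def run(max_col1):
    cur = conn.execute(sql, (max_col1,))  # Steps 3 and 4: set the value and run
    try:
        return [row for row in cur]       # Step 5: read rows until none remain
    finally:
        cur.close()                       # close the result when done


first = run(10)   # rows with col1 <= 10, highest two first
second = run(1)   # Step 6: same command, different parameter value
conn.close()
```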

Iterator Query Language

The rows to be returned when querying a table with an iterator can be constrained by specifying a filter string when the iterator is created. The filter string is the equivalent of a SQL WHERE clause, which must result in a boolean value indicating whether or not a row must be included in the results of a query.

To specify a query when using an iterator, supply a filter expression. For example:

column_name > 100


Iterator queries return all columns of a table rather than a subset of specific columns. That is, all queries implicitly begin with SELECT * FROM table_name WHERE. Programs do not specify this string; they specify only the filter that would follow the WHERE keyword.

SQL keywords and table and column identifiers are not case sensitive when used in a filter expression.

String values, surrounded by single quotes, are case-sensitive.
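The two case-sensitivity rules can be observed with Python's standard sqlite3 module, whose default behavior happens to follow the same rules for these cases. It is used only as a stand-in; the people table is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (lastname TEXT)")
conn.executemany("INSERT INTO people VALUES (?)", [("Long",), ("long",)])

# Keywords and identifiers: letter case does not matter.
mixed_case = conn.execute(
    "select LASTNAME from PEOPLE where LastName = 'Long'").fetchall()

# String values: letter case matters, so 'LONG' matches neither row.
upper_value = conn.execute(
    "SELECT lastname FROM people WHERE lastname = 'LONG'").fetchall()
conn.close()
```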

Filter Expression Syntax Reference

Query filter expressions have the following form.

[ NOT ] column operator value { [ AND | OR ] [ NOT ] column operator value }*

The following sections describe further details of filter expression syntax and semantics.

Column

column can be the name of any column defined in the table.

Operator

operator can be any operator in the following table:

Operator                    Description

=

==

IS

!=

<>

IS NOT

>

<

>=

<=

ISNULL, IS NULL             Tests that the row does not contain a value in this column.

NOTNULL, NOT NULL           Tests that the row contains a value in this column.

IN ( value [, value ]* )    Requires a set of values, separated by commas, surrounded by parentheses.
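Several of these operators can be tried out with Python's standard sqlite3 module, which accepts the same operator spellings. This is a stand-in for experimentation, not ActiveSpaces itself; the table t and the helper ids are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [(1, "Oslo"), (2, None), (3, "Lima"), (4, "Rome")])


def ids(filter_expr):
    """Return the ids of the rows that match a filter expression."""
    # The filter is spliced into the SQL text only because it is a fixed
    # literal in this sketch; values from users belong in parameters.
    sql = f"SELECT id FROM t WHERE {filter_expr} ORDER BY id"
    return [row[0] for row in conn.execute(sql)]


null_city = ids("city ISNULL")             # rows without a city value
has_city = ids("city NOTNULL")             # rows with a city value
in_set = ids("city IN ('Oslo', 'Rome')")   # membership in a value set
not_one = ids("id <> 1")                   # inequality
equal_three = ids("id == 3")               # equality
```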

Value

value can be any value of the same data type as the column's data type.

Surround string values in single quote characters: for example, 'My Value'.

Conjunctions

● AND joins multiple conditions. The overall condition is true if and only if every individual condition is true.

● OR joins multiple conditions. The overall condition is true if at least one of the individual conditions is true.

Negation

NOT reverses the boolean value of a logical expression that follows it. For example, you can use the operators NOT BETWEEN and NOT IN.

You can also precede an operator clause with NOT, for example:

NOT column operator value

Order of Operations

Order of operations is similar to SQL. NOT takes precedence over conjunctions. The conjunction AND takes precedence over OR.

You can use parentheses to group expressions, overriding that order.
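The precedence rules can be checked by counting matching rows over every combination of three flag columns. Python's standard sqlite3 module is used as a stand-in because it applies the same SQL precedence; the flags table and the helper count are invented for the example.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flags (a INTEGER, b INTEGER, c INTEGER)")
# One row for each of the 8 combinations of 0 and 1.
conn.executemany("INSERT INTO flags VALUES (?, ?, ?)",
                 itertools.product([0, 1], repeat=3))


def count(filter_expr):
    """Count the rows that satisfy a fixed, literal filter expression."""
    sql = f"SELECT COUNT(*) FROM flags WHERE {filter_expr}"
    return conn.execute(sql).fetchone()[0]


implicit = count("a = 1 OR b = 1 AND c = 1")        # AND binds tighter than OR
explicit = count("a = 1 OR (b = 1 AND c = 1)")      # same grouping, written out
regrouped = count("(a = 1 OR b = 1) AND c = 1")     # parentheses change the result
not_first = count("NOT a = 1 AND b = 1")            # NOT applies before AND
not_group = count("NOT (a = 1 AND b = 1)")          # NOT applied to the whole group
```

The first two counts agree, while the parenthesized variants select different sets of rows.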

Performance

See Efficiency of Filters.

SQL LIKE Operator

The LIKE operator is used in a WHERE clause to search for a specified pattern in a column's value.

Syntax:

SELECT <result_list> FROM <table_name> WHERE <column_name> [NOT] LIKE <column_value> [ESCAPE <char>]

Indexed Columns

If the left side of the LIKE operator is the name of an indexed column of type string, ActiveSpaces converts the LIKE operator into a range query using >= and <. This enables the ActiveSpaces index selection algorithm to select the index and use it for scanning rows when processing the query. For example, take a look at the following statement:

SELECT * FROM table1 WHERE lastname LIKE 'Long'

Internally, this statement is converted to the following statement:

SELECT * FROM table1 WHERE lastname >= 'LONG' AND lastname < 'long'

Even if the column is indexed, using LIKE <column_value> results in a full table scan if <column_value> is a string that starts with a wildcard (%, _) or a digit, or if <column_value> is a long value.

