
TIBCO ActiveSpaces®


Concepts

Software Release 4.0 November 2018

Two-Second Advantage®


Important Information

SOME TIBCO SOFTWARE EMBEDS OR BUNDLES OTHER TIBCO SOFTWARE. USE OF SUCH EMBEDDED OR BUNDLED TIBCO SOFTWARE IS SOLELY TO ENABLE THE FUNCTIONALITY (OR PROVIDE LIMITED ADD-ON FUNCTIONALITY) OF THE LICENSED TIBCO SOFTWARE. THE EMBEDDED OR BUNDLED SOFTWARE IS NOT LICENSED TO BE USED OR ACCESSED BY ANY OTHER TIBCO SOFTWARE OR FOR ANY OTHER PURPOSE.

USE OF TIBCO SOFTWARE AND THIS DOCUMENT IS SUBJECT TO THE TERMS AND CONDITIONS OF A LICENSE AGREEMENT FOUND IN EITHER A SEPARATELY EXECUTED SOFTWARE LICENSE AGREEMENT, OR, IF THERE IS NO SUCH SEPARATE AGREEMENT, THE CLICKWRAP END USER LICENSE AGREEMENT WHICH IS DISPLAYED DURING DOWNLOAD OR INSTALLATION OF THE SOFTWARE (AND WHICH IS DUPLICATED IN THE LICENSE FILE) OR IF THERE IS NO SUCH SOFTWARE LICENSE AGREEMENT OR CLICKWRAP END USER LICENSE AGREEMENT, THE LICENSE(S) LOCATED IN THE “LICENSE” FILE(S) OF THE SOFTWARE. USE OF THIS DOCUMENT IS SUBJECT TO THOSE TERMS AND CONDITIONS, AND YOUR USE HEREOF SHALL CONSTITUTE ACCEPTANCE OF AND AN AGREEMENT TO BE BOUND BY THE SAME.

ANY SOFTWARE ITEM IDENTIFIED AS THIRD PARTY LIBRARY IS AVAILABLE UNDER SEPARATE SOFTWARE LICENSE TERMS AND IS NOT PART OF A TIBCO PRODUCT. AS SUCH, THESE SOFTWARE ITEMS ARE NOT COVERED BY THE TERMS OF YOUR AGREEMENT WITH TIBCO, INCLUDING ANY TERMS CONCERNING SUPPORT, MAINTENANCE, WARRANTIES, AND INDEMNITIES. DOWNLOAD AND USE OF THESE ITEMS IS SOLELY AT YOUR OWN DISCRETION AND SUBJECT TO THE LICENSE TERMS APPLICABLE TO THEM. BY PROCEEDING TO DOWNLOAD, INSTALL OR USE ANY OF THESE ITEMS, YOU ACKNOWLEDGE THE FOREGOING DISTINCTIONS BETWEEN THESE ITEMS AND TIBCO PRODUCTS.

This document contains confidential information that is subject to U.S. and international copyright laws and treaties. No part of this document may be reproduced in any form without the written authorization of TIBCO Software Inc.

TIBCO, Two-Second Advantage, ActiveSpaces, and FTL are either registered trademarks or trademarks of TIBCO Software Inc. in the United States and/or other countries.

TIBCO FTL® is an embedded and bundled component of TIBCO ActiveSpaces® - Enterprise Edition.

Enterprise Java Beans (EJB), Java Platform Enterprise Edition (Java EE), Java 2 Platform Enterprise Edition (J2EE), and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle Corporation in the U.S. and other countries.

All other product and company names and marks mentioned in this document are the property of their respective owners and are mentioned for identification purposes only.

THIS SOFTWARE MAY BE AVAILABLE ON MULTIPLE OPERATING SYSTEMS. HOWEVER, NOT ALL OPERATING SYSTEM PLATFORMS FOR A SPECIFIC SOFTWARE VERSION ARE RELEASED AT THE SAME TIME. SEE THE README FILE FOR THE AVAILABILITY OF THIS SOFTWARE VERSION ON A SPECIFIC OPERATING SYSTEM PLATFORM.

THIS DOCUMENT IS PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.

THIS DOCUMENT COULD INCLUDE TECHNICAL INACCURACIES OR TYPOGRAPHICAL ERRORS. CHANGES ARE PERIODICALLY ADDED TO THE INFORMATION HEREIN; THESE CHANGES WILL BE INCORPORATED IN NEW EDITIONS OF THIS DOCUMENT. TIBCO SOFTWARE INC. MAY MAKE IMPROVEMENTS AND/OR CHANGES IN THE PRODUCT(S) AND/OR THE PROGRAM(S) DESCRIBED IN THIS DOCUMENT AT ANY TIME.

THE CONTENTS OF THIS DOCUMENT MAY BE MODIFIED AND/OR QUALIFIED, DIRECTLY OR INDIRECTLY, BY OTHER DOCUMENTATION WHICH ACCOMPANIES THIS SOFTWARE, INCLUDING BUT NOT LIMITED TO ANY RELEASE NOTES AND "READ ME" FILES.


Copyright © 2009-2018. TIBCO Software Inc. All Rights Reserved.


Contents

Figures
About This Product
TIBCO Documentation and Support Services
Overview of TIBCO ActiveSpaces®
Why ActiveSpaces?
What Is ActiveSpaces?
Benefits of ActiveSpaces
Attributes of ActiveSpaces
Redesigned from the Ground Up
Grid Computing in ActiveSpaces
What Is a Data Grid?
How Is the Data Stored in a Data Grid?
Replication
Processes in ActiveSpaces
The Workflow for a PUT Operation
Checkpoints
Checkpoint Types
Disaster Recovery Concepts
Gridsets
Types of Grids
Mirroring
Best Practices for a Development Environment
Best Practices for a Production Environment
Programming Concepts
Structuring Programs
Session
Table
Put
Get
Delete
Create Iterator
Metadata
Querying a Data Grid Table
Table Iterator
Session Statement
Query Language
Filter Expression Syntax Reference
Special Characters in Column Names
Efficiency of Filters
Query Behavior
Transaction Isolation
Listeners
Filtering Events
Sizing Guide
Example of a Sizing Calculation
Comparison Matrix


Figures

Rows Distributed Across Copysets
How One Table is Distributed in a Data Grid with Four Copysets and Three Nodes
The Workflow


About This Product

The TIBCO ActiveSpaces® software is a distributed in-memory data grid product. Some features of ActiveSpaces® include use of familiar database concepts, high I/O capacity, and network scalability.

ActiveSpaces features a complete redesign and reimplementation of the product and is straightforward to understand, use, and administer.

Product Editions

ActiveSpaces is now available in two editions: Community Edition and Enterprise Edition.

ActiveSpaces Community Edition

ActiveSpaces - Community Edition is ideal for getting started with ActiveSpaces, for implementing application projects, including proof of concept projects, for testing, and for deploying applications in a production environment. Although the community license limits the number of production instances, you can easily upgrade to the enterprise edition as your use of ActiveSpaces expands.

The community edition is available free of charge. It is a full installation of the ActiveSpaces product.

The community edition limits you to running at most 25 nodes (a total of the copyset nodes or proxies in your data grid).

ActiveSpaces - Community Edition is compatible with both the enterprise and community editions of TIBCO FTL®.

ActiveSpaces Enterprise Edition


ActiveSpaces - Enterprise Edition is ideal for all application development projects, and for deploying and managing applications in an enterprise production environment. It includes all features presented in this documentation set, and access to TIBCO Support. Choose the enterprise edition for production deployments with more than 25 nodes (a total of the copyset nodes or proxies in your data grid) and for enterprise monitoring using dashboards.

ActiveSpaces - Enterprise Edition depends on the enterprise edition of TIBCO FTL for monitoring and management of grid components and secure communication.


TIBCO Documentation and Support Services

How to Access TIBCO Documentation

Documentation for TIBCO products is available on the TIBCO Product Documentation website, mainly in HTML and PDF formats.

The TIBCO Product Documentation website is updated frequently and is more current than any other documentation included with the product. To access the latest documentation, visit https://docs.tibco.com.

Product-Specific Documentation

The following documentation for TIBCO ActiveSpaces® is available on the TIBCO ActiveSpaces® Product Documentation page:

TIBCO ActiveSpaces® Concepts

TIBCO ActiveSpaces® Administration

TIBCO ActiveSpaces® Release Notes

TIBCO ActiveSpaces® Installation

TIBCO ActiveSpaces® API Reference

How to Contact TIBCO Support

You can contact TIBCO Support in the following ways:

For an overview of TIBCO Support, visit http://www.tibco.com/services/support.

For accessing the Support Knowledge Base and getting personalized content about products you are interested in, visit the TIBCO Support portal at https://support.tibco.com.

For creating a Support case, you must have a valid maintenance or support contract with TIBCO.

You also need a user name and password to log in to https://support.tibco.com. If you do not have a user name, you can request one by clicking Register on the website.

How to Join TIBCO Community

TIBCO Community is the official channel for TIBCO customers, partners, and employee subject matter experts to share and access their collective experience. TIBCO Community offers access to Q&A forums, product wikis, and best practices. It also offers access to extensions, adapters, solution accelerators, and tools that extend and enable customers to gain full value from TIBCO products. In addition, users can submit and vote on feature requests from within the TIBCO Ideas Portal. For a free registration, go to https://community.tibco.com.


Overview of TIBCO ActiveSpaces®

TIBCO ActiveSpaces® software is a distributed in-memory data grid product.

To lift the burden of big data, TIBCO ActiveSpaces® provides a distributed in-memory data grid that can increase processing speed to reduce reliance on costly transactional systems. ActiveSpaces® provides an infrastructure for building highly scalable, fault-tolerant applications. It creates virtual data caches from the aggregate memory of participating nodes, scaling automatically as nodes join the data grid. By combining the features and performance of databases, caching systems, and messaging software, it supports very large, highly volatile data sets and event-driven applications.

Why ActiveSpaces?

A traditional RDBMS fails to keep up with the growing volume of data and the high rate of I/O operations per second. This drawback of RDBMS can impact the performance and slow down the system.

TIBCO ActiveSpaces is ideal for enterprises that handle a large amount of data or have a high volume of I/O activities per second. ActiveSpaces provides horizontal scalability, where you have the flexibility to segregate the data across a group of computers. For example, if you have 25000 operations per second, they can be divided across 10 computers that handle 2500 operations per second. Or, if your enterprise needs to store 10 TB of data, it can be distributed across 5 computers that contribute 2 TB each to store the data.
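The arithmetic behind the two examples above can be checked with a few lines of Python. This is only a sketch: the function name per_node_share is made up for illustration, and it assumes the load is divided evenly across the computers, which a real deployment may only approximate.

```python
def per_node_share(total: float, node_count: int) -> float:
    """Return the share each computer carries when a workload is split evenly.

    Assumption: perfectly even distribution across node_count computers.
    """
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return total / node_count


# 25000 operations per second divided across 10 computers
ops_per_computer = per_node_share(25_000, 10)

# 10 TB of data distributed across 5 computers
tb_per_computer = per_node_share(10, 5)

print(ops_per_computer, tb_per_computer)
```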

What Is ActiveSpaces?

ActiveSpaces uses the concepts of grid computing to provide a scalable, distributed, and durable data grid. The data grid serves as a system of record to store terabytes of data in an enterprise. ActiveSpaces provides a fast, consistent, and fault-tolerant system that supports a high rate of I/O operations in a scalable manner.

For more information about the ActiveSpaces concepts, see the Grid Computing in ActiveSpaces section.

Benefits of ActiveSpaces

ActiveSpaces offers many advantages as compared with a traditional RDBMS.

The following list highlights the benefits of using ActiveSpaces:

Accelerates performance and customer experience

Requires minimum investment because the system scales on low cost commodity or virtualized hardware

Updates data and systems continually and provides immediate and accurate response

Supports many hardware and software platforms, so programs running on different kinds of computers in a network can communicate seamlessly

Scales linearly and transparently when nodes are added (an increase in the number of nodes produces a corresponding increase in the memory and processing power available to the data grid)

Enables smooth and continuous working of your application without code modification or restarts

Provides location transparency without the hassles of determining where or how to store data and search for it

Notifies applications automatically as soon as data is modified


Attributes of ActiveSpaces

These attributes of ActiveSpaces set it apart from traditional RDBMS.

Scalability

The biggest advantage of using ActiveSpaces is scalability. You can scale up the system horizontally to hold terabytes of data without bringing the system down. You also have complete administrative control over data redistribution.

System of Record

ActiveSpaces serves as a distributed, large-scale system of record that spans across nodes in an enterprise. The system of record uses the concepts of the traditional RDBMS such as table, rows, and columns. In addition, every node saves a portion of the data locally in a fault-tolerant and durable manner.

Data Caching

Nodes in ActiveSpaces can temporarily hold data in memory for faster read and write operations before it is saved to the disk.

Faster Access to Data

ActiveSpaces supports queries and indexes that improve performance. Queries run faster because data is cached in memory. The query syntax in ActiveSpaces is a subset of the SQL language. The filtering and indexing capabilities offered by ActiveSpaces expedite the execution of queries.

TIBCO FTL® for Secure Communication


ActiveSpaces uses the high-speed TIBCO FTL® messaging software for these key tasks:

Communication between application programs and the data grid. Presenting the data results of queries to application programs

Internal communication among data grid component processes

Configuration, monitoring, and management of data grid components

For the versions of TIBCO FTL that are compatible with TIBCO ActiveSpaces, see the readme.txt file.

High-Performance ACID-Compliant Data Grid

The data grid provides atomicity, consistency, isolation, and durability (ACID) of data. ACID compliance is achieved by using transactions and concurrency control across multiple tables.

Transaction Isolation

A transaction comprises a set of operations that can modify the content of the data grid. Owing to transaction isolation, an ongoing transaction does not affect the queries that are being run on the content. For example, if another system is trying to read a row that you are trying to modify, that system gets the data either as it was before the modification or as it is after the modification. The system does not see partially committed transactions. Even if the transaction is distributed over a network and involves multiple rows and tables, transaction isolation ensures that there are no uncommitted reads (dirty reads).

To achieve the highest level of isolation, pessimistic locks are used. A pessimistic lock isolates a resource from the time a transaction begins until the transaction is completed. This blocks any operation that could violate database consistency or isolation. Transactions take care of rolling back partially committed transactions.
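The effect of a pessimistic lock can be illustrated with a single-process analogy in Python. This sketch does not use or describe the ActiveSpaces API; the LockedRow class and its columns are hypothetical, and a threading.Lock stands in for the lock that the data grid manages internally. A writer moves an amount between two columns of one row while a reader takes snapshots. Because both hold the lock for the whole operation, every snapshot shows the row either before or after a modification, never in between.

```python
import threading


class LockedRow:
    """Hypothetical row guarded by a pessimistic lock (illustration only)."""

    def __init__(self, a: int, b: int) -> None:
        self._lock = threading.Lock()
        self._values = {"a": a, "b": b}

    def transfer(self, amount: int) -> None:
        # The lock is held from the start of the modification to its end,
        # so no reader can observe the state between the two updates.
        with self._lock:
            self._values["a"] -= amount
            self._values["b"] += amount

    def read(self) -> dict:
        # Readers wait for the lock and therefore see a consistent row.
        with self._lock:
            return dict(self._values)


def run_demo(transfers: int = 1000):
    row = LockedRow(1000, 0)
    totals = []

    def writer() -> None:
        for _ in range(transfers):
            row.transfer(1)

    def reader() -> None:
        for _ in range(transfers):
            snapshot = row.read()
            totals.append(snapshot["a"] + snapshot["b"])

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return row.read(), totals


final_state, observed_totals = run_demo()
# Every snapshot preserves the invariant a + b == 1000.
print(final_state, set(observed_totals))
```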

Easy-to-Use APIs


ActiveSpaces provides tools for data definitions that are akin to the SQL language. You can also define how data is distributed across a configurable number of nodes. The support functions in the API are easy to use. You can use the functions to retrieve metadata information about the data grid, a specific table, or a result set.

Real-Time Push Events

ActiveSpaces pushes real-time events over the network to servers and client applications when data in the data grid changes. Table listeners receive data change events through callback notifications.

Cloud Ready

It is easy to deploy TIBCO ActiveSpaces on cloud, on-premises, or hybrid environments. You can easily build TIBCO ActiveSpaces into microservices with container deployment products such as Docker.


Redesigned from the Ground Up

Since the 3.0 version of ActiveSpaces, ActiveSpaces software is completely redesigned and reimplemented to make it more user-friendly for both end users and administrators. ActiveSpaces 3.x is not backward compatible with the earlier versions of the product. ActiveSpaces 3.x is faster because it relies on TIBCO FTL for the underlying communication.

ActiveSpaces 3.x and later use the terminology of a traditional RDBMS. See the Comparison Matrix.


Grid Computing in ActiveSpaces

ActiveSpaces uses grid computing to bring together computers in your network that can contribute their processing power, memory, and storage to solve a complex problem. ActiveSpaces uses grid computing concepts to store and process the contents of a data grid.

What Is a Data Grid?

ActiveSpaces stores data in data grids. In a data grid, data is stored in the form of tables. A data grid is equivalent to a database of a traditional RDBMS.

Tables

A table comprises multiple rows that are spread out in the data grid. The tables are similar to the tables in a traditional RDBMS, made of rows and columns. Unlike the traditional RDBMS where all the data in the table reside on one computer, a data grid segregates the table row-wise and stores the rows in different ActiveSpaces processes called nodes.

Rows

Like the traditional RDBMS, a row comprises a set of columns and is uniquely identified by the primary index. A row becomes the unit of measurement for the data grid. Rows are distributed across the data grid. When scaling up, the data grid controls where a newly added row must be stored.

Columns

A row is made of a collection of columns. Every column uniquely identifies a piece of information.

Every column has a type and a value associated with it. For example, the Employee Name column is of data type String and has the value "Joe Smith".

Primary Index

Uniquely identifies a row in a table. It is equivalent to a primary key in a traditional RDBMS. You can have more than one column that forms a primary index.

Secondary Index

Is similar to a primary index but can refer to multiple rows in a table. A secondary index comprises one or more columns of a table and is used to efficiently retrieve the rows of a table by reducing the number of rows scanned for retrieval by queries. Without a secondary index, this would involve a full table scan to identify which rows match the query. With a secondary index, additional space is used to help speed up the query and quickly identify matching rows without a full-table scan.
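The trade-off described above (extra space in exchange for fewer rows scanned) can be sketched with plain Python data structures. This is a toy model, not the ActiveSpaces implementation: the table contents, the column names, and the functions full_scan and build_secondary_index are all invented for illustration.

```python
from collections import defaultdict

# Hypothetical table: "id" plays the role of the primary index.
rows = [
    {"id": 1, "name": "Joe Smith", "dept": "Sales"},
    {"id": 2, "name": "Ann Lee", "dept": "Engineering"},
    {"id": 3, "name": "Raj Patel", "dept": "Sales"},
    {"id": 4, "name": "Mia Chen", "dept": "Support"},
]


def full_scan(table, column, value):
    """Without an index, every row is examined to find the matches."""
    scanned = 0
    matches = []
    for row in table:
        scanned += 1
        if row[column] == value:
            matches.append(row)
    return matches, scanned


def build_secondary_index(table, column):
    """Extra space: a map from each column value to the rows holding it."""
    index = defaultdict(list)
    for row in table:
        index[row[column]].append(row)
    return index


scan_matches, rows_scanned = full_scan(rows, "dept", "Sales")

dept_index = build_secondary_index(rows, "dept")
# A lookup goes straight to the matching rows; one value can map to many rows.
index_matches = dept_index.get("Sales", [])

print(rows_scanned, len(index_matches))
```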

Supported Data Types

A column can be of the following data types:

long

double

string

datetime

opaque

How Is the Data Stored in a Data Grid?

Unlike traditional RDBMS, a data grid is not stored in one place. An ActiveSpaces data grid leverages the storage capacity and computing power from multiple computers.

To understand how the data is stored, you must first familiarize yourself with the following concepts:

Nodes


A node is an ActiveSpaces process running within a computer. The node holds a portion of the data forming the data grid both in memory and on disk. The smallest unit of data held by a node is a row.

Other than storing data of a row, the node is also responsible for handling requests to read or update the row. As a result, the data spanning across a group of nodes collectively form a data grid.

Nodes can be run from a physical computer, a virtual machine, or a Docker container.

Persistence on Nodes

ActiveSpaces supports the Shared Nothing mode of persistence where every node saves its data locally to the disk.

Copysets

A copyset is a logical grouping of nodes such that a portion of the data is shared uniformly by all the nodes that form the copyset. This ensures fault tolerance. Every node in the copyset, also known as a replica, has an identical copy of the data. For example, assume that a row (R1) comprises employee name, employee ID, and department. There are three nodes, N1, N2, and N3, in copyset1. N1, N2, and N3 store identical copies of R1. When you add new data or request an update on a row in a copyset, the update is written to all the nodes in the copyset before the success of the operation is acknowledged. Keeping the nodes of a copyset on different computers helps prevent data loss during system failures.

Copysets help you scale your data horizontally. When you add a new copyset to a grid, you can redistribute the existing data to the new nodes of that copyset, thereby distributing the load on the grid with the help of the newly added copyset.

The following image is a logical diagram showing how rows of three tables are distributed across two copysets in the data grid.

Rows Distributed Across Copysets

In the following image, the rows of a table are broken down into four sets (each owned by a different copyset). The nodes running in a given copyset are identical replicas of each other.


How One Table is Distributed in a Data Grid with Four Copysets and Three Nodes

To understand more about sizing a copyset and a data grid, see Sizing Guide.

Primary Node

When a copyset has more than one node, one of the nodes is the primary node, which stores data and provides read access. The other nodes in the copyset are secondary nodes that store backup copies of the data. The key role of the primary node is to interact with the proxy process. The primary node receives the client operation and replicates it to the other nodes in the copyset. The client operation is applied in parallel at the primary node and all secondary nodes. The primary node is responsible for sharing the result of the request with the proxy.

If the primary node goes down for some reason, one of the other nodes in the copyset takes over as the primary node. Updates from client applications continue as usual without any loss of data because all of the data has been replicated from the original primary node to all of the other nodes in the copyset.

The nodes of a copyset must reside on different machines to ensure that one machine failure does not cause data loss.

Reasons for Using Multiple Nodes

There are several reasons for using multiple nodes:

Nodes in different copysets are created with the goal of scaling horizontally. Thus, multiple copysets are created, each with a slice of the data.

Nodes in the same copyset are created to provide multiple replicas for fault tolerance. These contain identical copies of the data.

In a production environment, you might decide to use multiple nodes for a combination of reasons. For example, you might choose to have two replicas per copyset and multiple copysets (say three) to scale horizontally. In this example, your environment would have a total of six nodes.
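The example configuration above (three copysets with two replicas each) can be modeled in a few lines of Python. This is a sketch under stated assumptions: this document does not say how the data grid chooses the copyset for a row, so the sketch hashes the primary key with CRC32 purely for illustration, and the names build_grid, owning_copyset, and put are hypothetical.

```python
import zlib


def build_grid(copyset_count, replicas_per_copyset):
    """Model a grid as a list of copysets; each copyset maps node name -> rows."""
    grid = []
    for c in range(copyset_count):
        nodes = {
            f"copyset{c + 1}.node{r + 1}": {}
            for r in range(replicas_per_copyset)
        }
        grid.append(nodes)
    return grid


def owning_copyset(grid, primary_key):
    # Assumption: hash-based placement. The real placement rule is not
    # described in this document.
    digest = zlib.crc32(str(primary_key).encode("utf-8"))
    return grid[digest % len(grid)]


def put(grid, primary_key, row):
    # Every replica in the owning copyset stores an identical copy of the row.
    for store in owning_copyset(grid, primary_key).values():
        store[primary_key] = dict(row)


grid = build_grid(copyset_count=3, replicas_per_copyset=2)
for employee_id in range(1, 11):
    put(grid, employee_id, {"employee_id": employee_id})

total_nodes = sum(len(nodes) for nodes in grid)
rows_per_copyset = [len(next(iter(nodes.values()))) for nodes in grid]
print(total_nodes, sum(rows_per_copyset))
```

Each copyset holds a slice of the ten rows, and the slices add up to the whole table; within a copyset, both nodes hold the same slice.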


To sum it up, the data is stored in copysets as described in the previous sections. The copysets put together form a data grid.

Replication

To replicate data, you must configure the copysets in the data grid such that copyset_size is greater than 1.

The copyset_size configuration setting applies to all copysets in the data grid. When the copyset_size is greater than 1, one node in each copyset acts as the primary node that stores data and provides read access to that data. The other nodes in each copyset are secondary nodes that store copies of the data on the primary node. Every time data is written to the primary node, data is synchronized at the primary node and all secondary nodes in the copyset.

When the primary node of a copyset is down, one of the secondary nodes in the copyset takes over as the primary node. As each secondary node of the copyset contains copies of the same data that resides on the primary node, no data loss occurs and data grid operations continue as long as at least one node of the copyset remains running.
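The failover behavior can be sketched as a small Python simulation. It is not ActiveSpaces code: the Copyset class is hypothetical, and promoting the next node in the list is an assumption made for simplicity (in the product, the promotion decision belongs to the state keeper).

```python
class Copyset:
    """Toy copyset with copyset_size == len(node_names) (illustration only)."""

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("a copyset needs at least one node")
        self.data = {name: {} for name in node_names}
        # The first running node acts as the primary.
        self.running = list(node_names)

    @property
    def primary(self):
        return self.running[0] if self.running else None

    def put(self, key, value):
        if not self.running:
            raise RuntimeError("no node of the copyset is running")
        # A write is synchronized at the primary and all secondary nodes.
        for name in self.running:
            self.data[name][key] = value

    def get(self, key):
        if self.primary is None:
            raise RuntimeError("no node of the copyset is running")
        return self.data[self.primary][key]

    def fail(self, name):
        # Assumption: the next running node becomes the primary.
        self.running.remove(name)


copyset = Copyset(["N1", "N2", "N3"])
copyset.put("R1", {"employee": "Joe Smith"})

copyset.fail("N1")  # the primary goes down
value_after_failover = copyset.get("R1")
new_primary = copyset.primary
print(new_primary, value_after_failover)
```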


Processes in ActiveSpaces

The following processes are involved in creating, maintaining, and querying the data grid:

TIBCO ActiveSpaces Client Applications

Proxy

State Keeper

Realm Server

Node

TIBCO ActiveSpaces Client Applications

The client applications use the API libraries shipped with the product to build custom applications.

Client applications interact with the data grid by using the proxy process.

Proxy

A proxy is a mediator between a client request and the data grid. Based on the client request, the proxy identifies the primary node in a copyset and interacts with the primary node until the request is processed and the result is shared with the client. You can have many proxies in a data grid.

Realm Server

A data grid is run inside a TIBCO FTL realm. A TIBCO FTL realm server supplies configuration data to the data grid components.

A client application accesses the data grid using the realm server URL. The realm server offers the following capabilities:

Stores data grid definitions

Communicates with the administrative tools to store and retrieve data grid definitions

Communicates with all the processes running in the data grid and updates the internal configuration if anything changes

Collects monitoring data from all processes

Fault Tolerance in Realm Servers

In fault tolerant mode, only the primary realm server accepts updates on realm server configuration.

In case the primary realm server is down, the secondary realm server takes over as the primary. In a production environment, it is a good practice to have two realm servers running in separate host computers.

State Keeper

A state keeper runs internally in the data grid and tracks all the data in the data grid. Each state keeper saves the data locally on the disk. When you start the realm server, the state keeper receives the grid configuration information from the realm server. State keepers are responsible for the following functions:

Tracking and managing all the copysets in a data grid

Tracking the proxies in a data grid

Identifying a primary node in each copyset

Promoting one of the secondary nodes to primary, in case the primary node of a copyset goes down

Ensuring consistency as the data grid scales up

Fault Tolerance in State Keepers

In fault tolerant mode, one of the state keepers is the primary state keeper. In case the primary state keeper is down, one of the secondary state keepers takes over as the primary. It is a good practice to have one to three state keepers running in a production environment.

A set of fault-tolerant state keeper processes protects this crucial information and ensures nonstop access to it. One of the state keepers is designated the primary state keeper and supplies this information to the proxies and copyset nodes. If the primary state keeper goes down, one of the secondary state keepers takes over as the primary. In a fault-tolerant set of three state keepers, a quorum of two state keepers must always be running to ensure data consistency in split-brain scenarios. If a state keeper is restarted while a quorum is running, one of the running state keepers updates the state of the restarted state keeper. If the number of running state keepers falls below the quorum and the state of a copyset changes (for example, a node goes down), operations on the data grid fail. When this happens, the remaining state keepers must be brought down and then all state keepers must be restarted.
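The quorum rule can be written down as a short check. This is a sketch of one reading of the paragraph above, not product code. The text states only that a set of three state keepers needs a quorum of two; treating the quorum as a simple majority for other set sizes is an assumption, and both function names are hypothetical.

```python
def quorum_size(total_state_keepers: int) -> int:
    """Assumption: the quorum is a simple majority of the configured set."""
    if total_state_keepers <= 0:
        raise ValueError("at least one state keeper is required")
    return total_state_keepers // 2 + 1


def operations_allowed(total: int, running: int, copyset_state_changed: bool) -> bool:
    """Operations fail only when the quorum is lost AND a copyset changes state."""
    if running >= quorum_size(total):
        return True
    return not copyset_state_changed


print(quorum_size(3))                                         # 2
print(operations_allowed(3, 2, copyset_state_changed=True))   # True
print(operations_allowed(3, 1, copyset_state_changed=True))   # False
```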

Node

For more information on nodes, see Nodes.

Fault Tolerance in Nodes

To prevent data loss, you can run up to three nodes per copyset. Every node must have at least one backup node that has an identical copy of the data. For production deployments, TIBCO recommends using at least two nodes per copyset.


The Workflow for a PUT Operation

A client application initiates a PUT request. The request reaches the proxy. Like all the ActiveSpaces processes, the proxy identifies the data grid by the data grid name and the realm server URL. The proxy forwards the request to the appropriate primary node. The primary node handles the processing of the data. After all the secondary nodes are updated with the changes, the result is returned to the proxy and the proxy then shares the result with the client application. The realm server and the state keeper run outside of the operation datapath.
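The order of the steps can be traced with a small Python model. None of the classes below belong to the ActiveSpaces API; Node, Proxy, and client_put are hypothetical stand-ins. The sketch applies the operation to the nodes one after another for simplicity, whereas the product applies it at the primary and secondary nodes in parallel. The realm server and the state keeper are left out because they run outside the operation datapath.

```python
class Node:
    """Hypothetical copyset node that stores rows in memory."""

    def __init__(self, name):
        self.name = name
        self.rows = {}

    def apply(self, key, row):
        self.rows[key] = dict(row)


class Proxy:
    """Hypothetical proxy that already knows the primary of the target copyset."""

    def __init__(self, primary, secondaries):
        self.primary = primary
        self.secondaries = secondaries

    def put(self, key, row, trace):
        trace.append("proxy forwards PUT to primary " + self.primary.name)
        self.primary.apply(key, row)
        trace.append("primary " + self.primary.name + " applies PUT")
        for node in self.secondaries:
            node.apply(key, row)
            trace.append("secondary " + node.name + " applies PUT")
        # The result goes back only after all secondaries are updated.
        trace.append("primary returns result to proxy")
        return "OK"


def client_put(proxy, key, row):
    trace = ["client sends PUT to proxy"]
    result = proxy.put(key, row, trace)
    trace.append("proxy returns result to client")
    return result, trace


n1, n2 = Node("N1"), Node("N2")
result, trace = client_put(Proxy(n1, [n2]), "R1", {"employee": "Joe Smith"})
for step in trace:
    print(step)
```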

The Workflow


Checkpoints

TIBCO ActiveSpaces checkpoints provide the ability to save the state and data of a running data grid. A checkpoint can then be used to restore a complete data grid on the same computer, to move the entire data grid, or to replicate a data grid to another data grid for disaster recovery.

The data collected by a checkpoint is guaranteed to be logically consistent across the entire data grid. A checkpoint does not contain the data from a partially committed transaction.

When a checkpoint is created, ActiveSpaces performs the following activities:

The primary realm server’s database is saved.

The configuration of the data grid in the realm server is saved.

Each state keeper's internal governing state information is saved.

The relevant files needed to restore each node of a data grid are saved in the checkpoints subdirectory of each node's data directory.

Creating a checkpoint fails in the following scenarios:

A realm server is not reachable (for example, if the primary and backup realm servers are down).

A quorum of state keepers is not running.

Checkpoint Types

A manual checkpoint is created manually using the tibdg administrative tool and a periodic checkpoint is created automatically by configuring the data grid to create periodic checkpoints.

Manual Checkpoints

Initiated using the tibdg administrative tool.

Must be manually removed by using the tibdg tool.

Are given a unique name to help with checkpoint identification.

Periodic Checkpoints

Configured at the grid level.

Taken at a fixed interval while the data grid is running.

Removed based on a retention setting.

Cannot be given a name.


Disaster Recovery Concepts

Disaster Recovery is a situation where a set of running systems must be replaced by another set of running systems due to failure, damage, loss of connectivity, or other traumatic event. Disaster Recovery is a large scale event and is not intended to replace fault tolerance where the failure of individual components can be recovered or otherwise accommodated without stopping a running system.

In a disaster recovery scenario, running systems are not expected to seamlessly or automatically failover to backup or alternative systems. Recovering from a disaster scenario implies a substantial and potentially large scale system stoppage and a restart of an entirely new instance of the previously running system. It is not intended for short term outages such as normal maintenance operations.

After a disaster recovery plan has been activated, it is anticipated that the replacement systems are in operation for days, weeks, or even months depending on the severity of the disaster.

ActiveSpaces supports disaster recovery by creating a gridset.

Gridsets

The purpose of gridsets is to help set up the disaster recovery process. A group of data grids that share the same set of consistent data is referred to as a gridset. Each gridset has a name, which exists in the same namespace as grid names (for example, you cannot have a grid named “prod” and a gridset named “prod”). Each gridset also has a single primary grid. Within a gridset, there is a single authoritative schema, which is owned by the primary grid of the gridset.

Each grid can belong to at most one gridset. Grids in a gridset do not need to have the same replication factor, number of copysets, or number of state keepers, but care must be taken to ensure that a mirror grid has sufficient capacity to handle the required load if administrators choose to make it the primary grid.

Types of Grids

The following list differentiates the types of data grids in ActiveSpaces:

Stand-Alone Grids

Any grid that does not belong to a gridset is a stand-alone grid. All operations included in the ActiveSpaces API are permitted on stand-alone grids.

Primary Grids

Any grid that is listed as the primary of a gridset is a primary grid. All operations included in the ActiveSpaces API are permitted on primary grids. Primary grids are responsible for supplying mirror grids with data upon request.

Mirror Grids

Any grid that is included in a gridset, but is not currently the primary of that gridset, is a mirror grid. Only read operations are allowed on mirror grids (for example, GET, queries, and iterators). Read operations are executed against the most recent checkpoint that has been mirrored from the primary grid. Mirror grids are responsible for requesting updates from the primary grid.

Mirroring

The process by which data is copied from one grid to another is called mirroring.

Data mirrored between grids is a logical copy of the user data available on the primary grid. Data is copied to the mirror grid only when a checkpoint has been taken at the primary grid (either a user-created checkpoint or a periodic checkpoint triggers mirroring). Data in the checkpoint being mirrored is not available until all copysets in the primary grid have confirmed that their data has been mirrored.

Bulk Mirroring

If a mirror grid has no previous checkpoints available or if the primary grid has insufficient information to identify the rows that changed since the last mirrored checkpoint, bulk mirroring is used. During bulk mirroring, all rows present in the checkpoint being mirrored are sent to the mirror grid.

Incremental Mirroring

ActiveSpaces attempts to minimize the data sent between grids whenever possible by using incremental mirroring. When a mirror grid has a previous checkpoint as the starting point, and the primary grid has sufficient information to identify all rows that changed, only rows that were updated or deleted are sent to the mirror grid.
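The difference between bulk and incremental mirroring can be sketched with plain dictionaries. The following is a minimal illustration only, not ActiveSpaces code: a checkpoint is modeled as a Python dict keyed by primary key, and the function name rows_to_mirror and its return shape are assumptions made for this example.

```python
from typing import Optional

# Hypothetical model: a checkpoint is a dict mapping primary key -> row.
Checkpoint = dict


def rows_to_mirror(previous: Optional[Checkpoint], current: Checkpoint):
    """Return (mode, changed_rows, deleted_keys) for mirroring `current`.

    With no previous checkpoint to start from, fall back to bulk
    mirroring and send every row. Otherwise send only the rows that
    were put (added or updated) or deleted since `previous`.
    """
    if previous is None:
        return "bulk", dict(current), set()
    changed = {
        key: row
        for key, row in current.items()
        if key not in previous or previous[key] != row
    }
    deleted = set(previous) - set(current)
    return "incremental", changed, deleted


old = {1: {"name": "a"}, 2: {"name": "b"}, 3: {"name": "c"}}
new = {1: {"name": "a"}, 2: {"name": "B"}, 4: {"name": "d"}}
print(rows_to_mirror(None, new))  # bulk: all three rows are sent
print(rows_to_mirror(old, new))   # incremental: rows 2 and 4 sent, key 3 deleted
```

In this sketch, row 1 is unchanged and is therefore never sent in incremental mode, which is the saving that incremental mirroring aims for.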


Best Practices for a Development Environment

In many enterprises, programmers act as administrators during the development and test phases of a project.

To develop and test application programs that use ActiveSpaces software, deploy the following processes:

Processes                   Numbers
Realm server                One process
State keeper                One process
Node                        One process
Proxy                       One process
Your application programs   As appropriate

In a development environment, you can run all of these processes on the same host computer.

Sample Scripts

Refer to the TIBCO_HOME/as/<version>/samples/readme.md before using the sample scripts.

The following scripts are available:

TIBCO_HOME/as/<version>/samples/scripts/as-start defines a simple data grid and starts its component processes.

TIBCO_HOME/as/<version>/samples/scripts/as-stop stops those component processes.

Sample Docker Environment

The docker-compose sample environment is provided to demonstrate how to deploy an ActiveSpaces data grid in Docker. For more information, see TIBCO_HOME/as/<version>/samples/docker/README.md.

When you are ready to explore using ActiveSpaces to scale your data beyond one machine, you can create additional copysets and nodes in the grid and run the nodes on separate machines.


Best Practices for a Production Environment

To use ActiveSpaces software in a production environment, deploy the following processes.

Each entry below lists the process, the minimum required number of processes, and a description.

Realm server

Minimum required: One primary realm server

Description: In fault tolerance mode, you can run a backup realm server. The primary realm server and its backup must run on two separate host computers.

State keeper

Minimum required: Three state keeper processes

Description: To ensure high availability during a network partition or hardware failure, each state keeper process must run on a separate host computer. Not doing so might result in grid-wide data loss. At any given time, you must maintain a quorum of running state keepers. If you want to run more than one state keeper, configure three state keepers and make sure you have at least two running state keepers.

Node

Minimum required: Two nodes per copyset

Description: For greater data protection, you can run three nodes per copyset. In a fault tolerant setup, if there is more than one node, one node acts as the primary node and the other nodes are secondary nodes. Additional copies can become expensive in two ways: increasing the node count by one adds one complete copy of all the data, and every node process must run on a separate host computer. Usually this requirement determines the number of host computers you must maintain. For example, a data grid with three copysets and two nodes per copyset requires six nodes, all on separate hosts. Increasing to three nodes per copyset would require nine nodes, all on separate hosts.

Proxy

Minimum required: One proxy process

Description: You can run additional proxies to increase the capacity for client programs and to improve response time. For best results, run proxy processes on separate host computers.

Your application programs

Run as many processes as appropriate.
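The node and state keeper arithmetic described above can be checked with a few lines of Python. This is a planning sketch, not part of the product: the function names are hypothetical, and treating a quorum as a strict majority of the configured state keepers is an assumption that is consistent with the "at least two of three" guidance but is not stated in general form in this guide.

```python
def node_count(copysets: int, nodes_per_copyset: int) -> int:
    """Total node processes; in production each one needs its own host."""
    return copysets * nodes_per_copyset


def has_quorum(running: int, configured: int = 3) -> bool:
    """Assumption: quorum means a strict majority of configured state keepers."""
    return running > configured // 2


print(node_count(3, 2))  # 6 nodes, as in the example above
print(node_count(3, 3))  # 9 nodes, as in the example above
print(has_quorum(2))     # True: two of three state keepers are running
print(has_quorum(1))     # False: only one of three is running
```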

Components Sharing a Host Computer

You can reduce the number of host computers in a production environment by running more than one component per host.

For example, you can run a realm server, a state keeper, a node, and a proxy, all on one host. (In contrast, do not run two state keepers on the same host.) For effective fault tolerance, run the nodes of each copyset on separate host computers.

Combining component processes on a host computer increases the risk that a single point of failure on the host could disrupt all those processes simultaneously. Assess the risk tolerance of your enterprise.


Programming Concepts

These concepts and definitions pave the way to a more detailed understanding of applications programming with ActiveSpaces software.

Data Grid

A distributed database, including all the component processes that implement it.

Connection

An application program connects to a data grid. A connection object is analogous to a traditional database connection.

Session

An application program interacts with a data grid through one or more session objects. Each session insulates the data grid interactions within one program thread from the interactions in other threads.

A session can be transacted or non-transacted. Get, put, and delete operations in a transacted session occur within a transaction, and do not take effect until the program explicitly commits the transaction.

Table

An ActiveSpaces data grid organizes and presents data as rows in tables, like a traditional relational database.

Administrators define tables within the data grid.

Programs can get a row from a table, put a row into a table, and delete a row from a table.

Programs can query a table for the rows that match a filter.

Primary Key

Each table distinguishes a primary key, or more briefly, the key.

Values of the key are unique: no two rows in a table have the same key value.

Secondary Index

A table can have zero or more secondary indexes, which facilitate queries. The tibdg tool can be used to create a secondary index on a table by using the index create command.

Iterator

An iterator presents an application program with the results of a data grid query, one row at a time.

Structuring Programs

These steps outline the main structural components of most application programs that access an ActiveSpaces data grid.

Procedure

Task A Initialize ActiveSpaces Objects

1. Initialize the ActiveSpaces library.

2. Connect to a data grid.

3. Create session objects.

See Session.

4. Open table objects.

See Table.

Task B Data Grid Operations

5. Access the data grid using methods of a table object.

See Table Operations.

Task C Clean-Up

6. Close table objects.

7. Destroy session objects.

8. Close the data grid connection object.

Session

Programs use sessions to insulate data grid operations within program threads and to unite operations within transactions.

An application program usually creates only one connection to a data grid. From that connection object, a program can create one or many session objects.

Session objects are lightweight.

Sessions and Threads

It is good practice to create a separate session for each program thread that accesses the data grid.

Programs must use sessions in a thread-safe way. That is, two threads must not simultaneously access the same session. Violating this constraint can yield unpredictable results.

Sessions and Transactions

Each session can be either transacted or non-transacted. Programs determine this semantic property when creating each session.

In a transacted session, all get, put, and delete operations occur within a transaction, which is bound to the session. The session implicitly starts the transaction. Programs explicitly call the session's commit and rollback methods. (As these methods complete, they automatically start a new transaction in the session.)

If a program operates within several open transactions simultaneously, use a separate session and thread for each transaction.

In a non-transacted session, put, get, and delete operations are immediate: that is, when the method completes, the effect of the operation is also complete.

However, operations in a transacted session can block operations in a non-transacted session. For further explanation, see Transaction Isolation.

Only get, put, and delete operations are affected by a transacted session and its commit and rollback APIs. Other commands, such as iterators, queries, and DDL updates, behave the same way on a transacted session as on a non-transacted session.
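The commit and rollback behavior of a transacted session can be pictured with a toy in-memory model. The classes below are hypothetical stand-ins written for this illustration and are not the ActiveSpaces API. The sketch also assumes that a get inside a transaction sees that transaction's own pending writes, and it does not model blocking or isolation between sessions.

```python
class ToyGrid:
    """In-memory stand-in for one data grid table: primary key -> row."""

    def __init__(self):
        self.rows = {}


_DELETED = object()  # marker for a pending delete


class ToyTransactedSession:
    def __init__(self, grid):
        self.grid = grid
        self.pending = {}  # the implicitly started transaction

    def put(self, key, row):
        self.pending[key] = row

    def delete(self, key):
        self.pending[key] = _DELETED

    def get(self, key):
        # Assumption: pending writes are visible inside their own transaction.
        if key in self.pending:
            value = self.pending[key]
            return None if value is _DELETED else value
        return self.grid.rows.get(key)

    def commit(self):
        # Pending operations take effect only now.
        for key, value in self.pending.items():
            if value is _DELETED:
                self.grid.rows.pop(key, None)
            else:
                self.grid.rows[key] = value
        self.pending = {}  # a new transaction starts automatically

    def rollback(self):
        self.pending = {}  # discard; a new transaction starts automatically


grid = ToyGrid()
session = ToyTransactedSession(grid)
session.put(1, {"name": "Ann"})
print(grid.rows)   # still empty: nothing has been committed
session.commit()
print(grid.rows)   # the row is now visible in the grid
session.delete(1)
session.rollback()
print(grid.rows)   # the delete was rolled back, so the row remains
```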

Sessions and Defining Tables

Once a session has been created, it can be used to define tables programmatically. For more information, see "Defining a Table Using SQL DDL Commands" in the TIBCO ActiveSpaces Administration guide.


Table

Table objects represent data grid tables within an application program.

Tables and Sessions

A program opens a table object by calling a session's open table method. The program can use the table object's methods to operate on the corresponding table in the data grid.

Opening a table object does not lock the table in the data grid.

If the session is transacted, then table operations occur within the session's transaction. Within a transaction you can interact with multiple tables.

If the session is non-transacted, then table operations are not transacted.

Table Operations

Tables support the following data grid operations:

Put a row into the table

Get a row of the table

Delete a row from the table

Create an iterator to present the results of a table query

Primary Key

Every table requires a primary key, which can consist of one or more columns. The data type of key columns must be either long or string.

Examples of primary keys include employee number, invoice number, or MAC address.

The value of the primary key always remains unique across all the rows of the table. That is, database operations can never create two rows with the same key value; instead, they overwrite data in the existing row with that key value.

Creating Tables

Before a program can use a table or its rows for operations, an administrator must first create the table within the data grid. See "Defining a Table" in TIBCO ActiveSpaces Administration.

Put

The put operation adds a row to a data grid table.

Before calling the put method, your program must first create a row object and set its columns with values.

The row object must contain a value in all columns of the primary key. The value of the key is unique. If the table already contains a row with that key value, then the put operation replaces the existing row within the table.

All other columns can either contain or omit values.

Get

The get operation retrieves a row of a data grid table.

Before calling the get method, your program must first create a row object and set a value in all columns of the primary key. The value of the key is unique.


If the table contains a row with that key value, then the get operation returns the contents of that row in a new row object. If the table does not contain a row with that key value, then the method returns null.

Delete

The delete operation deletes a row from a data grid table.

Before calling the delete method, your program must first create a row object and set a value in all columns of the primary key. The value of the key is unique.

If the table contains a row with that key value, then the delete operation deletes that row from the table.

If the table does not contain a row with that key value, then the method returns without changing the table.
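The put, get, and delete semantics described above can be summarized with a small in-memory model. ToyTable is a hypothetical class written for this illustration and is not the ActiveSpaces table object; rows are modeled as Python dicts, and None stands in for the null return value.

```python
class ToyTable:
    """Toy table keyed by one or more primary key columns."""

    def __init__(self, key_columns):
        self.key_columns = tuple(key_columns)
        self._rows = {}

    def _key(self, row):
        # Every primary key column must carry a value.
        missing = [c for c in self.key_columns if c not in row]
        if missing:
            raise ValueError(f"row is missing key columns: {missing}")
        return tuple(row[c] for c in self.key_columns)

    def put(self, row):
        # A row with the same key value is replaced, never duplicated.
        self._rows[self._key(row)] = dict(row)

    def get(self, key_row):
        found = self._rows.get(self._key(key_row))
        return dict(found) if found is not None else None

    def delete(self, key_row):
        # Deleting a key that is not present leaves the table unchanged.
        self._rows.pop(self._key(key_row), None)


employees = ToyTable(["employee_id"])
employees.put({"employee_id": 7, "name": "Ann"})
employees.put({"employee_id": 7, "name": "Ann Lee"})  # same key: replaces the row
print(employees.get({"employee_id": 7}))   # the replaced row
print(employees.get({"employee_id": 8}))   # None: no such key
employees.delete({"employee_id": 8})       # no effect
employees.delete({"employee_id": 7})
print(employees.get({"employee_id": 7}))   # None after the delete
```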

Create Iterator

The create iterator operation submits a query on a data grid table and creates an iterator object to present the query results.

Supply a filter string as an argument. This filter specifies the content of the query: that is, criteria for selecting a subset of rows from the table.

An iterator object receives batches of matching rows from the data grid. The prefetch property of the iterator determines the batch size.

The iterator consistency property allows the client to choose between the following snapshot consistency levels:

global snapshot (default level)

This level ensures that none of the results in the snapshot are from a partially committed transaction, although more coordination is required when taking the snapshot.

snapshot

This level makes no guarantee about the results in the snapshot containing partially committed transactions but requires less coordination when taking the snapshot.

The iterator object presents the program with the individual rows that match the query, one at a time.

The program must close the iterator object to release resources within the data grid component processes. For more information, see Query Behavior.

An implicit timeout limits the lifespan of iterator objects. Program calls that access an iterator after that timeout elapses return an error.

Avoid queries that result in full table scans, which can be resource-intensive and time-consuming. For more information, see Efficiency of Filters.

Metadata

Application programs can use the metadata APIs to retrieve metadata information about the data grid or a specific table.

The outline of the process involved in retrieving metadata information:

1. Your program needs to get a reference to a valid connection object if it does not already have one.

If the program only has a reference to a single table, it must use accessor methods to get the reference to the parent session and then the parent connection.

2. Call the get grid metadata API for the connection.

This makes a network request to the data grid for the metadata information (including all tables) at that point in time.


The program destroys the grid metadata after the request is completed. If updated grid metadata is needed, the existing metadata object must be destroyed, and another request must be made using the same API to retrieve the latest metadata information.

3. The table metadata object and any strings retrieved from it do not need to be destroyed since they are all owned by the grid metadata object and are destroyed as part of its destroy method.

Grid Metadata Object

The grid metadata object contains the array of table metadata objects that exist in the data grid at the time the request was made.

A single metadata object for a table is retrieved using its table name. If the names of the tables are not known to the program, the grid metadata object provides a method to get the array of all table names, which in turn can be used to get a single table metadata object as described previously.

Similarly, a column name or index name is used to get information about that column or index from a table metadata object. If the column names or index names are not known to the program, the table metadata object provides a method to get the array of those names.

Additionally, a method to get the primary index name exists in the table metadata object and is used to determine the table's primary index and then the column (or columns) that make up that index (often known as the primary key(s) of the table).

Querying a Data Grid Table

Application programs can query tables in a data grid in the following ways:

Table iterator

A table iterator is used when you have created a table object and need to iterate over all of the rows or a specific subset of the rows in the table.

You can create an iterator to query the contents of the table and then iterate over the query results.

The contents of the query results for the iterator are controlled by the use of a filter string when the iterator is first created. Using a NULL filter string when creating the iterator returns all of the rows of the table.

To restrict the query results to a subset of the rows of a table, provide a non-NULL filter string which contains a filter expression as described in Query Language.

For more information, see Table Iterator.

Session statement

A statement is created from a session object and is not tied to any one particular table. A session statement is created using a SQL SELECT string and the table for the query is determined by parsing the SELECT string. The SQL SELECT string supported has the form:

SELECT * FROM table_name WHERE filter

The syntax for a filter string is the same as for table iterators with the exception that parameter markers using '?' are supported. Parameter markers allow you to optimize the scenario where you want to run the same query several times with different values in a filter expression. For example,

SELECT * FROM mytable WHERE key > ?

Before a query is executed for the session statement, you must provide values for all of the parameter markers in the SQL SELECT string by calling the appropriate tibdgStatement_SetXXX method for each parameter value depending upon the data type of the value.

The parameter number is required when calling the SetXXX method with the first parameter in the statement being numbered 1.

For more information, see Session Statement.


The advantages of using a session statement over a table iterator:

The same query can be run multiple times using a single statement object.

You can use parameter markers to vary the results of your query each time it is run.

You do not have to create a table object before you can query a table.

Statements and result sets must be created on a non-transacted session because they do not interact with the commit and rollback APIs on a transacted session.

Table Iterator

Application programs can query a data grid table to retrieve a subset of its rows.

Procedure

1. Formulate a filter string that specifies the query, that is, the constraints that determine the rows that the query retrieves.

See Query Language.

2. Call the create iterator method of the table object.

Supply the filter string as an argument.

The API library sends the filter string to a proxy process, which retrieves matching rows from the data grid, and funnels them to the application through the iterator object.

3. Iterate over the resulting rows.

The iterator presents the rows in an order that is deterministic and repeatable; however, the application program cannot influence that order.

4. Close the iterator object.

Closing the iterator releases the resources that hold the query results within the component processes of the data grid.

Session Statement

A session statement is created using a SQL SELECT string and the table for the query is determined by parsing the SELECT string.

Procedure

1. Formulate a SQL SELECT string for the query you want to use to retrieve rows from a table.

2. Call the create statement method of the session object.

Supply the SQL SELECT string as an argument.

3. Call the statement methods to set the parameter values for the query.

4. Run the query by calling the statement method to execute the query.

A result set object is returned which contains the rows which satisfy the query.

5. Read the rows of the query result by looping and calling the following methods until no more rows are available to read:

a) Call the has next method of the result set to see if there is a row to read.

b) Call the next method of the result set to read the next row.

6. When all of the rows have been read, close the result set object.

7. Optionally set different parameter values and re-execute the query by repeating steps 3 through 6.

8. Close the statement object.
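The general pattern of a reusable query with a '?' parameter marker can be tried out with Python's built-in sqlite3 module. This is an analogy only: sqlite3 is a different database with a different API, and the table name, column names, and data below are invented for the example. The ActiveSpaces statement and result set methods are described in the API documentation.

```python
import sqlite3

# An in-memory SQLite database stands in for the data grid (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")],
)

# Step 1: formulate the SELECT string once, with a parameter marker.
QUERY = "SELECT * FROM mytable WHERE id > ?"


def run(lower_bound):
    """Steps 3 to 6: bind a value, execute, read all rows, close the result."""
    cursor = conn.execute(QUERY, (lower_bound,))
    try:
        # Sorted in Python so that the output order is explicit.
        return sorted(cursor.fetchall())
    finally:
        cursor.close()


# Step 7: re-run the same query text with different parameter values.
print(run(2))
print(run(3))
```

Binding values through the marker, rather than building a new query string for each value, is what allows one statement to be reused across executions.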


Query Language

ActiveSpaces software supports a restricted subset of the SQL query language.

To specify a query, supply a filter expression. For example:

column_name > 100

Queries return all columns of a table rather than a subset of specific columns. That is, all queries implicitly begin with SELECT * FROM table_name WHERE. Nonetheless, programs do not specify this string; they specify only the filter that would follow the WHERE keyword.

Filter expressions are not case sensitive. Query evaluation converts keywords and column names to lower case before evaluation.

Filter Expression Syntax Reference

Query filter expressions have the following form.

[ NOT ] column operator value { [ AND | OR ] [ NOT ] column operator value }*

The following sections describe further details of filter expression syntax and semantics.

Column

column can be the name of any column defined in the table.

Operator

operator can be any operator in the following table:

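Operator                          Description
=, ==, IS                         Tests that the column value is equal to the value.
!=, <>, IS NOT                    Tests that the column value is not equal to the value.
>                                 Tests that the column value is greater than the value.
<                                 Tests that the column value is less than the value.
>=                                Tests that the column value is greater than or equal to the value.
<=                                Tests that the column value is less than or equal to the value.
ISNULL, IS NULL                   Tests that the row does not contain a value in this column.
NOTNULL, NOT NULL, IS NOT NULL    Tests that the row contains a value in this column.
BETWEEN value_1 and value_2       Requires two values, separated by the keyword and. The range includes the end values.
IN ( value [, value ]* )          Requires a set of values, separated by commas, surrounded by parentheses.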

Value

value can be any value of the same data type as the column's data type.

Surround string values in single quote characters: for example, 'My Value'.

Conjunctions

AND joins multiple conditions. The overall condition is true if and only if every individual condition is true.

OR joins multiple conditions. The overall condition is true if at least one of the individual conditions is true.

Negation

NOT reverses the boolean value of a logical expression that follows it. For example, you can use the operators NOT BETWEEN and NOT IN.

You can also precede an operator clause with NOT, for example:

NOT column operator value

Order of Operations

Order of operations is similar to SQL. NOT takes precedence over conjunctions. The conjunction AND takes precedence over OR.

You can use parentheses to group expressions, overriding that order.
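Python's boolean operators happen to share the same relative precedence (not, then and, then or), so the effect of grouping can be demonstrated without a query engine. In this sketch the rows, column names, and the filter state = 'CA' OR NOT tier = 'basic' AND age > 40 are all invented for illustration.

```python
customers = [
    {"name": "A", "state": "CA", "tier": "basic", "age": 30},
    {"name": "B", "state": "NY", "tier": "gold", "age": 50},
    {"name": "C", "state": "NY", "tier": "basic", "age": 50},
    {"name": "D", "state": "CA", "tier": "gold", "age": 50},
]


def default_order(r):
    # state = 'CA' OR NOT tier = 'basic' AND age > 40
    # groups as: state = 'CA' OR ((NOT tier = 'basic') AND age > 40)
    return r["state"] == "CA" or not r["tier"] == "basic" and r["age"] > 40


def with_parentheses(r):
    # (state = 'CA' OR NOT tier = 'basic') AND age > 40
    return (r["state"] == "CA" or not r["tier"] == "basic") and r["age"] > 40


print([r["name"] for r in customers if default_order(r)])     # ['A', 'B', 'D']
print([r["name"] for r in customers if with_parentheses(r)])  # ['B', 'D']
```

Customer A matches only the ungrouped form, because the AND binds to the NOT clause before the OR is applied.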

Performance

See Efficiency of Filters.

Unsupported SQL Operators

TIBCO ActiveSpaces® does not support the following SQL operators and clauses.

Unsupported

GLOB

UNIQUE

EXISTS

ORDER BY

LIMIT

Special Characters in Column Names

Column names with special characters require special treatment.

It is good practice for administrators to define column names that follow the SQL identifier rules. (See "Column Names" in TIBCO ActiveSpaces Administration.)

Nonetheless, in some situations, a table might contain non-standard column names. For example, a table copied from a legacy database might have columns with names that contain a space character.

If you must refer to non-standard column names in a filter expression, surround the column name with any of the following escape characters:

Technique                   Example
Single quotes               'column name'
Double quotes               "column name"
Escaped double quotes       \"column name\"
Square brackets             [column name]
Back ticks (accent grave)   `column name`

Efficiency of Filters

The efficient use of queries depends in part upon the way you construct filter expressions, and in part upon the way the administrator constructs table indexes. Programmers and administrators can use these rules of thumb to help promote efficiency and high performance.

Programmers: consult your database administrator for information about the definition of indexes.

Keys and Indexes

Rule of Thumb: Construct filter expressions in which every conjunct refers to a key or index.

A filter expression that does not refer to a key or index results in a full table scan, which is inefficient.

A compound filter expression, which combines conjuncts using AND or OR, results in a full table scan for each conjunct that does not refer to a key or index.

Left-Most Columns

Rule of Thumb: Construct filter expressions that reference the left-most columns of an index using the predicates =, ==, <=, >=, <, >, IN, or BETWEEN.

When an index includes more than one column, the administrator has defined them in a specific order: from left to right. Queries with filter expressions that refer to the left-most column of an index can be more efficient than filter expressions that skip the left-most column and instead refer only to columns to its right. Similarly, queries with filter expressions that refer to the left-most two columns can be even more efficient. Queries can achieve maximum efficiency when they use filter expressions that refer to all the columns of an index.

In contrast, omitting the left-most column from the filter expression results in a full table scan, which is the least efficient.

The order in which the columns appear within the filter expression does not affect efficiency. Only the order of columns when defining the index matters.
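The reason the left-most column matters can be seen in a simplified model of a composite index as a sorted list of tuples. This is a generic sketch of how sorted multi-column indexes behave, not a description of the ActiveSpaces internals; the index contents and function names are invented for the example.

```python
import bisect

# Toy composite index on (lastname, firstname), kept in sorted order.
index = sorted([
    ("Jones", "Amy"),
    ("Smith", "Bob"),
    ("Smith", "Dan"),
    ("Smith", "Zoe"),
    ("Young", "Al"),
])


def lookup_by_lastname(idx, lastname):
    """Left-most column given: binary search, then read one contiguous run."""
    start = bisect.bisect_left(idx, (lastname,))
    matches = []
    for entry in idx[start:]:
        if entry[0] != lastname:
            break  # past the run of matching entries; stop early
        matches.append(entry)
    return matches


def lookup_by_firstname(idx, firstname):
    """Left-most column skipped: matches are scattered, so scan every entry."""
    return [entry for entry in idx if entry[1] == firstname]


print(lookup_by_lastname(index, "Smith"))
print(lookup_by_firstname(index, "Dan"))
```

Because entries are ordered by the left-most column first, a lookup on lastname can jump straight to the matching run, whereas a lookup on firstname alone has no such starting point.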

Avoid Left-Most NOT

Rule of Thumb: Do not construct filter expressions that reference the left-most columns of an index using the predicates NOT, IS NOT, !=, <>, IS, ISNULL, IS NULL, NOTNULL, NOT NULL, or IS NOT NULL.

In contrast to the rule of left-most columns, filter expressions that reference the left-most columns with these operators have the opposite effect: they guarantee a full table scan, which is the least efficient.

The presence of the predicate IS in this list could be counterintuitive. See the next rule.

However, this rule does not imply that predicates in the NOT family are always inefficient. A query can still be efficient if it obeys the left-most columns rule and also tests columns further to the right using NOT. For example, if the administrator defined an index on the columns lastname and firstname, then this filter expression can be efficient:

lastname='Smith' and firstname IS NOT 'Dan'

Avoid IS

Rule of Thumb: Use the predicates = or ==, rather than IS.

Even though the predicates =, ==, and IS are semantic synonyms, the behavior of IS differs dramatically. Namely, IS guarantees a full table scan, which is the least efficient.

Bound Ranges from Both Ends

Rule of Thumb: When using the predicates > or >=, which specify a lower limit on a column's value, also include the opposite predicates, < or <=, to specify an upper limit on the same column.

A query searches an index from its lower limit to its upper limit. If you omit the upper limit, the query continues searching to the end of the index. If you omit the lower limit, the query begins with the first row of the index.
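The effect of bounding a range at both ends can be estimated with a toy sorted index. The function below counts how many index entries a range scan would visit; it is a generic illustration with invented names and data, not a measurement of ActiveSpaces behavior.

```python
import bisect


def entries_visited(sorted_keys, lower=None, upper=None):
    """Count entries scanned for an inclusive range (>= lower, <= upper).

    A missing lower bound starts the scan at the first entry; a missing
    upper bound lets the scan run to the end of the index.
    """
    start = 0 if lower is None else bisect.bisect_left(sorted_keys, lower)
    end = len(sorted_keys) if upper is None else bisect.bisect_right(sorted_keys, upper)
    return max(0, end - start)


keys = list(range(1, 1001))  # toy index holding the key values 1..1000

print(entries_visited(keys, lower=100))             # 901: runs to the end
print(entries_visited(keys, upper=200))             # 200: starts at the first row
print(entries_visited(keys, lower=100, upper=200))  # 101: bounded at both ends
```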

Query Behavior

Queries remain consistent until the application program closes the iterator object. That is, a query captures a snapshot of the matching rows, and the iterator presents the rows of the snapshot.

Subsequent changes to the table, such as put or delete calls, do not affect the snapshot.

After the application program closes the iterator object, the data grid can delete the snapshot and release its resources.

Snapshots are inexpensive if the table data changes slowly, but can become expensive if the data changes rapidly. To limit memory growth within data grid components, administrators can limit the number of concurrent snapshots.


Transaction Isolation

ActiveSpaces enforces the highest level of transaction isolation: serializable. As a result, serialization can delay database operations as transactions wait for other transactions to commit or roll back.

ActiveSpaces uses a pessimistic transaction model: blocking any operations that could violate database consistency or isolation. For example, when an operation in transaction A refers to table row R, and an operation in a second transaction B also refers to row R, then the second operation blocks until transaction A either commits or rolls back. Similarly, an operation within a transaction can block operations in non-transacted sessions.

Listeners

Listeners are used to monitor events corresponding to changes in a table.

Using listeners you can either monitor the contents of a specific table or a subset of rows in a specific table.

For example, your application could track a table containing customer data. More specifically, it could track the activity in a particular geographic region to know when customers are added or deleted, or when customers move to another region.

An event indicates that the data in a table has changed.

An event has one of the following types:

PUT: Indicates new data has been added to the table.

PUT events have a current value which is a copy of the row that was PUT into the table. If the PUT operation replaces an existing row, the event additionally has a previous value, which is a copy of the row prior to the PUT operation.

DELETE: Indicates that a row has been deleted from the table. DELETE events have a previous value, which is a copy of the row prior to the DELETE operation.

ERROR: Indicates that something has happened in the system that disrupts the flow of events. ERROR events have an error code and an error description. The application must destroy the table listener. Depending on the error code, it might or might not make sense for the application to recreate the table listener. The ActiveSpaces API documentation provides more details on the specific error codes that are possible.

EXPIRED: Indicates that a row has expired. When rows are removed from a table due to expiration, table listeners on the table receive EXPIRED events, if the expired rows match the listeners’ filters.

When creating a table listener you must specify the table that is the source of the events of interest and a callback function that is invoked when events are delivered to the application. The callback function executes in a thread that is internal to the ActiveSpaces client library and is expected to complete in a timely fashion. The client library retains ownership of the events and the rows they contain so any data that is required outside of the callback must be copied and managed by the application itself.

Filtering Events

When the table listener is being created you can optionally specify a filter string to further narrow the scope of events received.

The filter string specifies the criteria that events must match in order for them to be delivered to the table listener. The filter is equivalent to the WHERE clause of an SQL query and is applied to both the current and previous values for the row that has changed. If either the current or previous value matches the filter, then the event is delivered to the listener.

For example, in a table containing customer data with a column called state, the filter state = 'CA' limits the events delivered to only those involving customers in California.
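The rule that an event is delivered when either the current or the previous value matches the filter can be expressed in a few lines. In this sketch, events are modeled as Python dicts with hypothetical "current" and "previous" entries, and the filter is a plain Python predicate; none of these names come from the ActiveSpaces API.

```python
def should_deliver(event, predicate):
    """Deliver if the current OR the previous row value matches the filter."""
    values = (event.get("current"), event.get("previous"))
    return any(value is not None and predicate(value) for value in values)


def in_california(row):
    # Stands in for the filter string: state = 'CA'
    return row.get("state") == "CA"


new_ca_customer = {"type": "PUT", "current": {"id": 1, "state": "CA"}}
moved_out_of_ca = {
    "type": "PUT",
    "current": {"id": 1, "state": "NY"},
    "previous": {"id": 1, "state": "CA"},
}
moved_ny_to_tx = {
    "type": "PUT",
    "current": {"id": 2, "state": "TX"},
    "previous": {"id": 2, "state": "NY"},
}
deleted_ca_customer = {"type": "DELETE", "previous": {"id": 3, "state": "CA"}}

for event in (new_ca_customer, moved_out_of_ca, moved_ny_to_tx, deleted_ca_customer):
    print(event["type"], should_deliver(event, in_california))
```

Matching on the previous value is what lets a listener learn that a customer has left California: the new row no longer matches the filter, but the old one did.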


Sizing Guide

Usually, the total data set is partitioned horizontally into copysets, where each copyset holds a fraction of the data. A copyset in production typically includes more than one node for redundancy (where each node is an exact replica of the data in that copyset). To keep the calculation simple, let us start with the assumption that the data resides on a single node per copyset.

The size of a copyset is determined by the following factors:

The number of rows

The size of a row in bytes (which is determined by number of columns, the column data types, and the actual values placed in each column)

Indexes

ActiveSpaces provides an Excel spreadsheet that can be used to calculate the size of your data grid.

Download the ActiveSpaces Sizing Guide from https://docs.tibco.com/products/tibco-activespaces-enterprise-edition-4-0-0. Review the spreadsheet for information about how the byte counts in the example used in this guide were determined.

Example of a Sizing Calculation

Consider a scenario where the purchasing details of a customer are stored in the purchase table.

There are around five million rows in this table with the following schema:

Name of the field              Data Type   Estimated disk space consumed (in bytes)
customer_id (Primary Index)    Long        8
purchase_id                    Long        8
customer_first_name            String      10
customer_last_name             String      10
customer_post_code             Long        8
payload                        String      10K

Size of Rows Without Secondary Indexes

Size of Row (in bytes) = 8 + 8 + 10 + 10 + 8 + 10K = 10,044

Estimated Internal Overhead + Primary Index (Long) Overhead per Row = (32 + 27) = 59 bytes

Size of Row Including Overhead = 10,103

Size of All Rows with No Secondary Indexes = 5M x 10,103 = 50.5GB

Size of Rows with Secondary Indexes

Index Overhead per Row = 45 bytes (might vary depending on actual values being indexed)

purchase_id_idx = 5M x (45 + 8) = 0.27GB

customer_full_name_idx = 5M x (45 + 10 + 10) = 0.33GB

Size of Secondary Indexes = 0.6GB

Total Size In Bytes
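The arithmetic in this example can be reproduced with a short script. The sketch below uses the per-column byte counts from the row-size formula (so customer_last_name is taken as 10 bytes and 10K as 10,000 bytes) and the overhead figures quoted above. The variable and function names are illustrative and are not part of ActiveSpaces or its sizing spreadsheet.

```python
ROWS = 5_000_000

# Per-column byte estimates, as used in the row-size formula.
column_bytes = {
    "customer_id": 8,
    "purchase_id": 8,
    "customer_first_name": 10,
    "customer_last_name": 10,
    "customer_post_code": 8,
    "payload": 10_000,  # "10K" read as 10,000 bytes, which yields the 10,044 total
}

ROW_OVERHEAD = 32 + 27  # estimated internal overhead + primary index (Long) overhead
INDEX_OVERHEAD = 45     # per row, per secondary index; may vary with the indexed values

# Secondary indexes and the columns each one covers.
secondary_indexes = {
    "purchase_id_idx": ["purchase_id"],
    "customer_full_name_idx": ["customer_first_name", "customer_last_name"],
}

def index_bytes(columns):
    """Total bytes for one secondary index across all rows."""
    return ROWS * (INDEX_OVERHEAD + sum(column_bytes[c] for c in columns))

row_size = sum(column_bytes.values())        # bytes of data per row
row_with_overhead = row_size + ROW_OVERHEAD  # bytes per row including overhead
rows_total = ROWS * row_with_overhead        # all rows, no secondary indexes
index_totals = {name: index_bytes(cols) for name, cols in secondary_indexes.items()}
indexes_total = sum(index_totals.values())
grand_total = rows_total + indexes_total

for label, value in [("rows", rows_total), ("indexes", indexes_total), ("total", grand_total)]:
    print(f"{label}: {value:,} bytes ({value / 1e9:.3f} GB)")
```

Keeping the sizes as integer byte counts and converting to gigabytes only for display avoids compounding the rounding in the intermediate figures. The script uses decimal gigabytes (10^9 bytes), which matches the 50.5GB figure above.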
