Other recommendations

Beyond the performance-tuning techniques discussed previously, an HBase cluster should be configured differently for different use cases or workloads, such as the following:

• Heavy writes: Written data goes into the MemStore, which is flushed to form new HFiles, and these HFiles are later compacted. As a best practice, flushes, compactions, and splits should not happen too often, because they increase I/O and therefore slow down the cluster. Some recommendations are as follows:

° Keep the region size larger to avoid splits at write time

° Keep the HFile size larger to reduce how often compactions run

• Heavy sequential reads: Some recommendations are as follows:

° Use a larger block size to read more data per seek

° Disable block caching on the table, so that large scans do not evict frequently read blocks from the cache

• Heavy random reads: Effective use of the cache and finer-grained indexing improve performance. A few recommendations are as follows:

° Give more memory to the block cache and lower the MemStore limit

° For finer-grained indexing, use a smaller block size

° Use Bloom filters at the column family level
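Several of these recommendations correspond to table and column family attributes that can be changed from the HBase shell. The following is a minimal sketch of a hypothetical helper, `alter_for_workload`, that prints an `alter` statement for one of three assumed profiles (`heavy-write`, `sequential-read`, `random-read`). The sizes are illustrative assumptions rather than tuned values, and the attribute syntax should be checked against `help 'alter'` in your HBase version:

```shell
# Hypothetical helper: print an HBase shell `alter` statement for a workload
# profile. All sizes below are illustrative assumptions, not tuned values.
alter_for_workload() {
  local table="$1" family="$2" profile="$3"
  case "$profile" in
    heavy-write)
      # Larger regions (10 GB) split less often; larger MemStore flushes
      # (256 MB) write bigger HFiles, so compactions run less often.
      echo "alter '$table', MAX_FILESIZE => '10737418240', MEMSTORE_FLUSHSIZE => '268435456'"
      ;;
    sequential-read)
      # Larger blocks (256 KB) read more data per seek; block caching is
      # turned off so long scans do not evict frequently read blocks.
      echo "alter '$table', {NAME => '$family', BLOCKSIZE => '262144', BLOCKCACHE => 'false'}"
      ;;
    random-read)
      # Smaller blocks (16 KB) give a finer-grained block index; a row-level
      # Bloom filter lets a Get skip HFiles that cannot contain the row.
      echo "alter '$table', {NAME => '$family', BLOCKSIZE => '16384', BLOOMFILTER => 'ROW', BLOCKCACHE => 'true'}"
      ;;
    *)
      echo "unknown profile: $profile" >&2
      return 1
      ;;
  esac
}
```

The output can be piped into the shell, for example `alter_for_workload usertable cf random-read | hbase shell`, where `usertable` and `cf` are placeholder names. Depending on the HBase version and configuration, the table may need to be disabled before the `alter` and enabled again afterward. Cluster-wide settings, such as the block cache and MemStore limits mentioned for random reads, are configured in hbase-site.xml and are outside the scope of this helper.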

For mixed workloads with both heavy reads and heavy writes, all of the performance-tuning parameters need careful review, and reaching an optimal configuration usually takes several rounds of tuning.

Troubleshooting

An HBase cluster sometimes does not run smoothly or as expected, especially when it is badly configured. This section briefly covers tools and techniques for troubleshooting an HBase cluster whose state is unclear. The following are some of the important tools that administrators should know:

• jps: This tool shows the Java processes running for the current user.

$ $JAVA_HOME/bin/jps

• jmap: This tool is used to view a summary of the Java heap. For example, the following command shows the heap summary of the HRegionServer daemon, here running with process ID 1812:

$ $JAVA_HOME/bin/jmap -heap 1812

• ps: This tool is used to view the memory used by processes.

The following command uses the k -rss sort option to list processes in descending order of resident set size (RSS):

$ ps auxk -rss | less

• jstat: This tool is used to monitor Java Virtual Machine statistics. The following command prints a summary of the garbage collection statistics of an HRegionServer process running with process ID 1812, taking a new sample every 1000 milliseconds:

hadoop@slave1$ jstat -gcutil 1812 1000
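The commands above hardcode PID 1812. A minimal sketch that looks the PID up from `jps` output and extracts only the old-generation column from `jstat -gcutil` could look like the following; the helper names `pid_for_class` and `old_gen_util` are hypothetical, and the JDK tools are assumed to be on the PATH. The old-generation column is located by its header, `O`, because the other columns differ between JDK versions (for example, `P` for the permanent generation on Java 7, and `M` and `CCS` for Metaspace on Java 8).

```shell
# Hypothetical helper: print the PID of the first JVM whose main class, as
# listed by jps ("<pid> <ClassName>" per line), equals $1. Reads stdin.
pid_for_class() {
  awk -v cls="$1" '$2 == cls { print $1; exit }'
}

# Hypothetical helper: read `jstat -gcutil` output on stdin and print the
# old-generation utilization (column "O", a percentage) of every sample.
# The column is found by its header name because other columns vary by JDK.
old_gen_util() {
  awk 'NR == 1 { for (i = 1; i <= NF; i++) if ($i == "O") col = i; next }
       col { print $col }'
}

# Example: five samples, one second apart, of the HRegionServer's old
# generation. Runs only if jps and jstat are on the PATH.
if command -v jps >/dev/null 2>&1 && command -v jstat >/dev/null 2>&1; then
  rs_pid=$(jps | pid_for_class HRegionServer)
  if [ -n "$rs_pid" ]; then
    jstat -gcutil "$rs_pid" 1000 5 | old_gen_util
  fi
fi
```

An old generation that stays close to 100 percent between samples is a hint to look at heap sizing and GC settings. On JDK 9 and later, `jmap -heap` is no longer available; `jhsdb jmap --heap --pid <pid>` provides the equivalent heap summary.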

Apart from knowing the preceding tools, an administrator should be able to recognize the common errors that occur in production environments. A few of them are discussed as follows:

Too many open files error: HBase runs on top of Hadoop, and together they keep many files open at the same time. Operating systems such as Linux limit the number of file descriptors that a process can open (the default value is commonly 1024). If the open file count exceeds this OS-defined limit, the following error appears:

java.io.FileNotFoundException: /usr/local/hadoop/var/dfs/data/current/subdir6/blk_-34458031297234453 (Too many open files)

To fix this issue, raise the open file limit for the user that runs the Hadoop and HBase daemons by adding the following entries to the /etc/security/limits.conf file:

$ vi /etc/security/limits.conf

<username> soft nofile 65535

<username> hard nofile 65535

Also, add the following line to the /etc/pam.d/login file:

$ vi /etc/pam.d/login

session required pam_limits.so

Once this is done, log out, log in again as the user, and restart the Hadoop and HBase clusters. The new open file limit can be verified using the following command:

$ ulimit -n
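`ulimit -n` reports the limit of the current shell only. A daemon that was started before limits.conf was changed keeps its old limit until it is restarted, so it can be useful to check the running process directly. The following Linux-only sketch, built around a hypothetical `fd_usage` helper, reads a process's open descriptor count and its soft limit from the /proc filesystem:

```shell
# Hypothetical helper (Linux only): print "<open descriptors> <soft limit>"
# for a running process, read from the /proc filesystem.
fd_usage() {
  local pid="$1" used limit
  # Each entry in /proc/<pid>/fd is one open file descriptor.
  used=$(( $(ls "/proc/$pid/fd" | wc -l) ))
  # The "Max open files" row lists the soft limit in its fourth field.
  limit=$(awk '/^Max open files/ { print $4 }' "/proc/$pid/limits")
  echo "$used $limit"
}
```

For example, `fd_usage 1812` prints the descriptor count and soft limit of the HRegionServer process used in the earlier examples. Listing another user's /proc/<pid>/fd directory requires running as that user or as root.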

Unable to create new native thread error: The OS limits the number of processes (on Linux, more precisely, threads) that a user can run simultaneously. Under high load and with a low nproc value, the HBase cluster might throw an OutOfMemoryError such as the following:

DataStreamer Exception: java.lang.OutOfMemoryError: unable to create new native thread

To fix this issue, raise the process limit (nproc) for the user by adding the following entries to the /etc/security/limits.conf file:

$ vi /etc/security/limits.conf

<username> soft nproc 35000

<username> hard nproc 35000

Also, add the following line to the /etc/pam.d/login file:

$ vi /etc/pam.d/login

session required pam_limits.so

Once this is done, log out, log in again as the user, and restart the Hadoop and HBase clusters. The new process limit can be verified using the following command:

$ ulimit -u
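As with open files, `ulimit -u` shows only the current shell's limit. On Linux, the nproc limit caps the number of threads owned by a user, and a busy region server JVM can run many threads. The following Linux-only sketch, built around a hypothetical `thread_usage` helper, prints a running process's thread count next to its soft process limit:

```shell
# Hypothetical helper (Linux only): print "<threads> <soft nproc limit>" for a
# running process, read from the /proc filesystem.
thread_usage() {
  local pid="$1" threads limit
  # /proc/<pid>/status reports the process's current thread count.
  threads=$(awk '/^Threads:/ { print $2 }' "/proc/$pid/status")
  # The "Max processes" row lists the soft limit in its third field.
  limit=$(awk '/^Max processes/ { print $3 }' "/proc/$pid/limits")
  echo "$threads $limit"
}
```

Because the limit applies to all threads of the user, the thread count of a single daemon is only a lower bound on the user's total; other daemons and tasks running as the same user count against the same limit.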

ZooKeeper client connection error: ZooKeeper's maxClientCnxns property limits the number of concurrent connections that a single client, identified by its IP address, can make to a single member of the ZooKeeper ensemble. This error usually occurs when running a MapReduce job over an HBase cluster. In an HBase cluster, region servers act as ZooKeeper clients, and if the number of concurrent connections from one host exceeds the limit defined by ZooKeeper, the following error occurs:

java.io.IOException: Connection reset by peer

To fix this error, add/update the following property in the ZooKeeper configuration file (zoo.cfg) on every ZooKeeper quorum node:

$ vi $ZOOKEEPER_HOME/conf/zoo.cfg

maxClientCnxns=100

Restart the ZooKeeper servers to apply the change.
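To find which client hosts are approaching maxClientCnxns, the connections a ZooKeeper server currently holds can be counted per client IP address. The sketch below parses the output of ZooKeeper's `cons` four-letter command; the helper name `zk_conns_per_ip` is hypothetical, and it assumes IPv4 clients and `cons` lines of the form ` /10.0.0.5:53412[1](queued=0,...)`. On ZooKeeper 3.5 and later, `cons` must also be permitted through the `4lw.commands.whitelist` property.

```shell
# Hypothetical helper: read ZooKeeper `cons` output on stdin and print
# "<client IP> <connection count>", busiest client first.
# Assumes IPv4 clients and lines such as " /10.0.0.5:53412[1](queued=0,...)".
# sed keeps only the client IP of each connection line, uniq -c counts the
# IPs, awk swaps the columns to "<ip> <count>", and the final sort orders
# the result by count in descending order.
zk_conns_per_ip() {
  sed -n 's|^[[:space:]]*/\([0-9.]*\):[0-9]*\[.*|\1|p' |
    sort | uniq -c |
    awk '{ print $2, $1 }' |
    sort -k2,2nr
}
```

For example, `echo cons | nc zk1.example.com 2181 | zk_conns_per_ip`, where the host name is a placeholder and `nc` is assumed to be installed. A host whose count is close to the configured maxClientCnxns value is the likely source of the "Connection reset by peer" errors.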

Summary

In this chapter, we looked at HBase cluster administration techniques. We also discussed different ways to monitor an HBase cluster, ranging from the HBase metrics framework and JMX to third-party tools such as Ganglia and Nagios.

In the last sections, we learned which performance-tuning areas need attention to achieve optimal performance for a given workload. This chapter also shed some light on troubleshooting an HBase cluster.



HBase Essentials

A practical guide to realizing the seamless potential of storing and managing high-volume, high-velocity data quickly and painlessly with HBase