Beyond the tuning techniques discussed earlier, an HBase cluster should be configured differently for different use cases or workloads, such as:
• Heavy write: Written data goes into the MemStore and is flushed to form new HFiles, which are later compacted. As a best practice, flushing, compaction, and splitting should not happen too often, because these processes increase I/O and slow the cluster down. Some recommendations are as follows:
° Keep the region size larger to avoid splits at write time
° Keep the HFile size larger to reduce compactions
• Heavy sequential reads: Some recommendations are as follows:
° Use a higher block size to read more data per seek
° Avoid block caching on the table
• Heavy random reads: Effective use of the cache and better indexing yield higher performance. A few recommendations are as follows:
° Give the block cache a larger share of memory and lower the MemStore limit
° For better indexing, use a smaller block size
° Use bloom filters at the column family level
With a mixed workload of heavy reads and heavy writes, every performance tuning parameter deserves close attention, and reaching an optimized configuration usually takes several rounds of tuning.
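The read-oriented recommendations above map onto per-column-family attributes that can be changed from the HBase shell. The following is a minimal sketch, not a definitive recipe: the function name alter_for_workload, the workload labels, and the specific values (128 KB and 16 KB blocks) are assumptions chosen for illustration, and only the BLOCKSIZE, BLOCKCACHE, and BLOOMFILTER attributes are covered. Region size and MemStore limits are set elsewhere and are not shown.

```shell
# Print an HBase shell "alter" statement for a workload profile.
# The attribute values below are illustrative assumptions, not tuned numbers.
alter_for_workload() {
  workload="$1" table="$2" family="$3"
  case "$workload" in
    sequential-read)
      # Larger blocks than the 64 KB default, and no block caching.
      attrs="BLOCKSIZE => '131072', BLOCKCACHE => 'false'" ;;
    random-read)
      # Smaller blocks for a finer-grained index, plus a row-level bloom filter.
      attrs="BLOCKSIZE => '16384', BLOOMFILTER => 'ROW'" ;;
    *)
      echo "unknown workload: $workload" >&2
      return 1 ;;
  esac
  printf "alter '%s', {NAME => '%s', %s}\n" "$table" "$family" "$attrs"
}
```

The function only prints the statement, so it can be reviewed before it is applied, for example by piping the output of alter_for_workload random-read orders d into hbase shell (orders and d are hypothetical table and column family names). Depending on the HBase version and configuration, the table may need to be disabled before it can be altered.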
Troubleshooting
An HBase cluster does not always run smoothly or as expected, especially when it is badly configured. This section briefly covers the tools and techniques for troubleshooting an HBase cluster whose state is unclear. Administrators should be familiar with the following tools:
• jps: This tool shows the Java processes running for the current user.
$ $JAVA_HOME/bin/jps
• jmap: This tool is used to view the Java heap summary. For example, the following command shows the heap summary of an HRegionServer daemon running with process ID 1812:
$ $JAVA_HOME/bin/jmap -heap 1812
• ps: This tool is used to view the memory occupied by processes.
The following command uses the -rss sort key to list processes in descending order of resident set size:
$ ps auxk -rss | less
• jstat: This tool is used for monitoring the Java Virtual Machine. The following command prints a summary of the garbage collection statistics of an HRegionServer process running with process ID 1812, sampled every 1000 milliseconds:
hadoop@slave1$ jstat -gcutil 1812 1000
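The jps and jstat tools above can be combined so that the process ID does not have to be looked up by hand. The following is a rough sketch under stated assumptions: pid_of and old_gen_alerts are hypothetical helper names, the 90 percent threshold is arbitrary, and the old generation column is located through the O heading that jstat -gcutil prints in its header line.

```shell
# Read "jps" output on stdin and print the PID of the named Java process.
pid_of() {
  awk -v name="$1" '$2 == name { print $1 }'
}

# Read "jstat -gcutil" output on stdin and report every sample whose
# old generation utilization (the column headed "O") exceeds the limit.
old_gen_alerts() {
  awk -v limit="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "O") col = i; next }
    col && $col + 0 > limit + 0 {
      printf "old gen at %s%% (limit %s%%)\n", $col, limit
    }'
}
```

A possible use is pid=$(jps | pid_of HRegionServer) followed by jstat -gcutil "$pid" 1000 | old_gen_alerts 90. Finding the column by its heading rather than by position keeps the sketch independent of the exact column layout, which differs between JVM versions.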
Beyond knowing these tools, an administrator should recognize several common errors that occur in production environments. A few of them are discussed as follows:
Too many open files error: HBase runs on top of Hadoop, which keeps many files open at the same time. Operating systems such as Linux limit the number of file descriptors that a process can open (the default value is 1024). When the user's open file count exceeds this limit, the following error appears:
java.io.FileNotFoundException: /usr/local/hadoop/var/dfs/data/current/subdir6/blk_-34458031297234453 (Too many open files)
To fix this issue, raise the open file limit for the user by adding the following lines to the /etc/security/limits.conf file:
$ vi /etc/security/limits.conf
<username> soft nofile 65535
<username> hard nofile 65535
Also, add the following line to the /etc/pam.d/login file:
$ vi /etc/pam.d/login
session required pam_limits.so
Once done, log out and log in again as the user and restart the Hadoop and HBase clusters. The upper limit for files can be verified using the following command:
$ ulimit -n
Unable to create a new native thread error: The OS limits the number of processes that a user can run simultaneously. Under high load with a low nproc value, the HBase cluster might throw an OutOfMemoryError exception such as:
DataStreamer Exception: java.lang.OutOfMemoryError: unable to create new native thread
To fix this issue, raise the process limit for the user by adding the following lines to the /etc/security/limits.conf file:
$ vi /etc/security/limits.conf
<username> soft nproc 35000
<username> hard nproc 35000
Also, add the following line to the /etc/pam.d/login file:
$ vi /etc/pam.d/login
session required pam_limits.so
Once done, log out and log in again as the user and restart the Hadoop and HBase clusters. The upper limit for processes can be verified using the following command:
$ ulimit -u
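The ulimit commands report the limits of the current login shell, while a daemon that was started before the change keeps the limits it was launched with. On Linux, the limits in effect for a running process are listed in /proc/<pid>/limits. The following is a small sketch for reading a soft limit from that file; soft_limit is a hypothetical helper name, and the resource labels are the ones the kernel prints there.

```shell
# Read the text of /proc/<pid>/limits on stdin and print the soft limit
# for the resource whose row starts with the given label,
# for example "Max open files" or "Max processes".
soft_limit() {
  awk -v name="$1" '
    index($0, name) == 1 {
      rest = substr($0, length(name) + 1)  # columns after the label
      split(rest, f, " ")                  # f[1] = soft limit, f[2] = hard limit
      print f[1]
    }'
}
```

For the HRegionServer process used in the earlier examples, soft_limit "Max open files" < /proc/1812/limits and soft_limit "Max processes" < /proc/1812/limits would show whether the new nofile and nproc values have taken effect.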
ZooKeeper client connection error: ZooKeeper's maxClientCnxns property limits the number of concurrent connections that a single client, identified by its IP address, can make to a member of the ZooKeeper ensemble. This error usually occurs when running a MapReduce job over an HBase cluster. In an HBase cluster, each region server acts as a ZooKeeper client, and if a region server's concurrent connection count exceeds the limit defined by ZooKeeper, the following error occurs:
java.io.IOException: Connection reset by peer
To fix this error, add or update the following property in the ZooKeeper configuration file (zoo.cfg) on every ZooKeeper quorum node:
$ vi $ZOOKEEPER_HOME/conf/zoo.cfg
maxClientCnxns=100
Restart ZooKeeper to apply the changes.
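Before raising the limit, it can help to see which clients actually hold the connections. The following is a rough sketch that counts connections per client address from the output of ZooKeeper's cons four-letter command. The helper name connections_per_client is hypothetical, the sketch assumes that each connection line starts with /address:port followed by a bracketed field, and newer ZooKeeper releases may require four-letter commands to be enabled explicitly.

```shell
# Read the output of ZooKeeper's "cons" command on stdin and print
# "<client address> <connection count>", busiest client first.
connections_per_client() {
  sed -n 's|^ */\(.*\):[0-9][0-9]*\[.*|\1|p' |  # keep only the client address
    sort | uniq -c | sort -rn |                 # count per address, highest first
    awk '{ print $2, $1 }'
}
```

A possible invocation is echo cons | nc zk-host 2181 | connections_per_client, where zk-host stands for one member of the ensemble and 2181 is the usual client port. A client whose count sits close to maxClientCnxns is the likely source of the connection resets.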
Summary
In this chapter, we looked at the HBase cluster administration techniques. We also discussed the different ways to monitor the HBase cluster starting from using the HBase monitoring framework and JMX to third-party tools, such as Ganglia and Nagios.
In the last section, we learned about the performance tuning areas that need consideration to get optimized performance for different workloads. This chapter also shed some light on troubleshooting the HBase cluster.