MapReduce Service (MRS)

Developer Guide (for versions 2.x and earlier)

Document version: 16

Release date: 2021-07-27

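Without the prior written consent of the company, no organization or individual may excerpt or copy any part or all of this document, or distribute it in any form.

Trademark Notice

The Huawei logo and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.

All other trademarks and registered trademarks mentioned in this document are the property of their respective owners.

Notice

The products, services, and features you purchase are subject to the commercial contract and terms of Huawei. All or part of the products, services, and features described in this document may not be within the scope of your purchase or use. Unless otherwise specified in the contract, Huawei makes no express or implied statement or warranty about the content of this document.

This document is updated from time to time because of product version upgrades or for other reasons. Unless otherwise agreed, this document serves only as a usage guide, and none of its statements, information, or recommendations constitutes a warranty of any kind, express or implied.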

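1 Introduction

Overview

This document describes the application development process and provides operation guidance for the Hadoop, Spark, HBase, Hive, MapReduce, Kafka, and Storm big data components. It also provides a sample project for each component, answers to frequently asked questions, and development specifications.

Preparations

● You have subscribed to the HUAWEI CLOUD MapReduce Service (MRS).

● You have a basic understanding of the Hadoop, Spark, HBase, Hive, MapReduce, Kafka, and Storm big data components.

● You have a basic understanding of Java syntax.

● You are familiar with HUAWEI CLOUD Elastic Cloud Server (ECS) and with the MRS development components.

● You have a basic understanding of Maven and know how to build projects with it.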

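2 Building the MapReduce Service Sample Projects

Build Process

Building an MRS sample project involves three main steps:

1. Download the Maven project source code and configuration files of the sample project. See "Obtaining the Sample Projects".

2. Configure the Maven mirror repository of the SDK on the HUAWEI CLOUD mirror site. See "Configuring the HUAWEI CLOUD Open-Source Mirror".

3. Build the complete Maven project according to your needs by following the "Environment Preparation" section of each component.

Obtaining the Sample Projects

● MRS versions earlier than 1.8: http://mapreduceservice.obs-website.cn-north-1.myhuaweicloud.com/

● MRS 1.8.x: https://github.com/huaweicloud/huaweicloud-mrs-example/tree/mrs-1.8

● MRS 1.9.x: https://github.com/huaweicloud/huaweicloud-mrs-example/tree/mrs-1.9

● MRS 2.1.x: https://github.com/huaweicloud/huaweicloud-mrs-example/tree/mrs-2.1

Figure 2-1 Downloading the sample code

Configuring the HUAWEI CLOUD Open-Source Mirror

HUAWEI CLOUD provides an open-source mirror site at https://mirrors.huaweicloud.com/. The Huawei-provided JAR packages that the MRS sample projects depend on must be downloaded from this mirror site. Download the remaining open-source JAR packages directly from the Maven Central repository.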

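Note

Before downloading dependency JAR packages with a development tool in your local environment, confirm the following:

● The local network is working properly.

Open a browser and visit the Huawei open-source mirror site (https://mirrors.huaweicloud.com/repository/maven/huaweicloudsdk/) to check whether it is accessible. If it is not, fix the local network connection first.

● The proxy of the development tool is disabled. The proxy must be off before JAR packages are downloaded.

For example, in IntelliJ IDEA 2020.2, choose "File > Settings > Appearance & Behavior > System Settings > HTTP Proxy", select "No proxy", and click "OK" to save the configuration.

To configure the HUAWEI CLOUD open-source mirror, perform the following steps:

Step 1 Ensure that JDK 1.8 or later and Maven 3.0 or later are installed.

Step 2 Open https://mirrors.huaweicloud.com/ and log in to the Huawei open-source mirror site.

Step 3 Download the "settings.xml" file provided by the Huawei open-source mirror site and use it to overwrite "<local Maven installation directory>/conf/settings.xml".

If the file cannot be downloaded directly, search for and click the "HuaweiCloud SDK" section on the mirror site, and follow the setup instructions displayed on the page.

Step 4 Manually modify the "settings.xml" configuration file, or the "pom.xml" file of the component sample project, to configure the mirror repository address, using one of the following methods.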

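● Method 1:

Manually add the open-source mirror repository address to the mirrors node of settings.xml:

<mirror>
    <id>repo2</id>
    <mirrorOf>central</mirrorOf>
    <url>https://repo1.maven.org/maven2/</url>
</mirror>

Add the following repository address to the profiles node of settings.xml:

<profile>
    <id>huaweicloudsdk</id>
    <repositories>
        <repository>
            <id>huaweicloudsdk</id>
            <url>https://repo.huaweicloud.com/repository/maven/huaweicloudsdk/</url>
        </repository>
    </repositories>
</profile>

Add the following entry to the activeProfiles node of settings.xml:

<activeProfile>huaweicloudsdk</activeProfile>

Note

The HUAWEI CLOUD open-source mirror site does not provide third-party open-source JAR packages. After configuring the HUAWEI CLOUD mirror, also configure a third-party Maven repository address.

● Method 2:

Add the following repository addresses to the pom.xml file of the sample project:

<repositories>
    <repository>
        <id>huaweicloudsdk</id>
        <url>https://mirrors.huaweicloud.com/repository/maven/huaweicloudsdk/</url>
    </repository>
    <repository>
        <id>central</id>
        <name>Maven Central</name>
        <url>https://repo1.maven.org/maven2/</url>
    </repository>
</repositories>

----End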

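3 HBase Application Development

3.1 Overview

3.1.1 Introduction to Application Development

Introduction to HBase

HBase is a highly reliable, high-performance, column-oriented, and scalable distributed storage system. It is designed to overcome the limitations of relational databases in processing massive amounts of data.

HBase suits scenarios with the following characteristics:

● Massive data volumes (TB or PB scale and above).

● High throughput.

● Efficient random reads over massive data.

● Good scalability.

● Both structured and unstructured data must be handled.

● The full ACID properties of a traditional relational database are not required. ACID stands for atomicity, consistency, isolation, and durability.

Tables in HBase have the following characteristics:

– Large: a table can have hundreds of millions of rows and millions of columns.

– Column-oriented: storage and permission control are organized by column (family), and each column (family) is retrieved independently.

– Sparse: empty (null) columns occupy no storage space, so tables can be designed to be very sparse.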
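The sparse, column-oriented layout described above can be pictured with a toy in-memory model. This is only an illustrative sketch using plain Java collections, not the HBase API; the class name SparseTableSketch and its methods are hypothetical. It shows that a column that is never written simply has no entry, so it takes no space.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy in-memory model of a sparse, column-oriented table (not the HBase API). */
class SparseTableSketch {
    // rowKey -> ("family:qualifier" -> value); both levels are kept sorted.
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    /** Stores one cell; a column that is never written has no entry at all. */
    void put(String rowKey, String family, String qualifier, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .put(family + ":" + qualifier, value);
    }

    /** Returns the stored value, or null when the cell was never written. */
    String get(String rowKey, String family, String qualifier) {
        Map<String, String> cells = rows.get(rowKey);
        return cells == null ? null : cells.get(family + ":" + qualifier);
    }

    /** Counts the cells that actually occupy space. */
    int storedCells() {
        int count = 0;
        for (Map<String, String> cells : rows.values()) {
            count += cells.size();
        }
        return count;
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.put("row1", "info", "name", "Zhang San");
        table.put("row1", "info", "age", "19");
        table.put("row2", "info", "name", "Li Wanting"); // no age cell for row2
        System.out.println("stored cells: " + table.storedCells());
        System.out.println("row2 age: " + table.get("row2", "info", "age"));
    }
}
```

In this model, two rows with a two-column schema hold three cells rather than four, which is the sense in which the table is sparse.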

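Introduction to Interface Types

Because HBase itself is developed in Java, and Java is concise, general-purpose, and easy to understand, you are advised to develop HBase applications in Java.

The interfaces used by HBase are consistent with those of Apache HBase. For details, see http://hbase.apache.org/apidocs/index.html.

Table 3-1 lists the functions that HBase provides through its interfaces.

Table 3-1 Functions provided by HBase interfaces

Function | Description
-------- | -----------
CRUD data read and write | Create, read, update, and delete data
Advanced features | Filters and coprocessors
Management functions | Table management and cluster management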

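3.1.2 Common Concepts

● Filter

Filters are a powerful feature that helps users process table data in HBase more efficiently. Users can use the filters predefined in HBase or implement custom filters.

● Coprocessor

Coprocessors allow users to perform region-level operations and provide functions similar to triggers in an RDBMS.

● Client

The client faces the user directly and accesses the server through the Java API or HBase Shell to read and write HBase tables. In this document, the HBase client refers specifically to the HBase client installation package downloaded from the MRS Manager of a cluster where the HBase service is installed. The package contains sample code for accessing HBase through the Java API.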

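3.1.3 Development Process

This document mainly describes HBase application development based on the Java API.

Figure 3-1 and Table 3-2 describe each phase of the development process.

Figure 3-1 HBase application development process

Table 3-2 HBase application development process

Phase | Description | Reference
----- | ----------- | ---------
Understand basic concepts | Before developing an application, learn the basic concepts of HBase, understand the scenario requirements, and design the tables. | 3.1.2 Common Concepts
Prepare the development and operating environments | Java is currently recommended for developing HBase applications, and Eclipse can be used as the tool. The HBase operating environment is the HBase client; install and configure the client as instructed. | 3.2.1 Development and Operating Environments
Prepare a project | HBase provides sample programs for different scenarios. You can import a sample project to learn from it, or create a new HBase project as instructed. | 3.2.3 Configuring and Importing a Sample Project
Develop a project based on the scenario | A Java sample project is provided that covers the whole process from creating a table and writing data to deleting the table. | 3.3.1 Typical Scenario Development
Compile and run the application | Compile the developed application and submit it for running. | Commissioning the Application
View the running results | The results are written to the path specified by the user. You can also check how the application is running on the UI. | Viewing Commissioning Results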

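3.2 Environment Preparation

3.2.1 Development and Operating Environments

Table 3-3 lists the development environment to prepare for secondary development. You also need a Linux environment for running and commissioning, to verify that the application runs properly.

Table 3-3 Development environment

Item | Description
---- | -----------
Operating system | Windows. Windows 7 or later is recommended.
JDK | Basic configuration of the development environment. Version 1.8 or later is required.
Eclipse | Tool used to develop HBase applications.
Maven | Used to compile the sample project.
Network | Ensure that the client and the HBase service hosts can communicate with each other over the network.

● In the Windows development environment, install Eclipse and the JDK.

Install JDK 1.8 or later. Use an Eclipse version that supports JDK 1.8 or later, and install the JUnit plug-in.

Note

● If you use IBM JDK, ensure that the JDK configured in Eclipse is IBM JDK.

● If you use Oracle JDK, ensure that the JDK configured in Eclipse is Oracle JDK.

● Different Eclipse installations must not share the same workspace or sample projects in the same path.

● Prepare a Linux environment for running and testing the application.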

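Preparing the Running and Commissioning Environment

Step 1 On the ECS management console, apply for a new Elastic Cloud Server (ECS) for application development, running, and commissioning.

● The security group of the ECS must be the same as that of the Master node of the MRS cluster.

● The ECS must be in the same VPC as the MRS cluster.

● The NIC of the ECS must be in the same network segment as the MRS cluster.

Step 2 Apply for an elastic IP address (EIP), bind it to the new ECS, and configure the inbound and outbound rules of the security group.

Step 3 Download the client program. See "Downloading the MRS Client".

Step 4 Log in to the node where the downloaded client is stored, and install the client.

1. Run the following commands to decompress the client package:

cd /opt
tar -xvf /opt/MRS_Services_Client.tar

2. Run the following command to verify the installation package:

sha256sum -c /opt/MRS_Services_ClientConfig.tar.sha256

The expected output is:

MRS_Services_ClientConfig.tar: OK

3. Run the following command to decompress the installation package:

tar -xvf /opt/MRS_Services_ClientConfig.tar

4. Run the following commands to install the client in a specified directory (absolute path), for example "/opt/client". The directory is created automatically.

cd /opt/MRS_Services_ClientConfig
sh install.sh /opt/client

The following message indicates that the installation succeeded:

Components client installation is complete.

----End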

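3.2.2 Preparing a Development User

The development user is used to run the sample project. The user must have HBase permissions to run the HBase sample project.

Prerequisites

Kerberos authentication is enabled for the MRS cluster. Skip this section for clusters where Kerberos authentication is not enabled.

Procedure

Step 1 Log in to MRS Manager. See "Logging In to MRS Manager".

Step 2 On MRS Manager, choose "System Settings > Role Management > Add Role", as shown in Figure 3-2.

Figure 3-2 Adding a role

1. Enter a role name, for example hbaserole.

2. Edit the role. In the "Permissions" table, choose "HBase > HBase Scope", select "Admin", "Create", "Read", "Write", and "Execute", and click "OK" to save, as shown in Figure 3-3.

Figure 3-3 Editing a role

Step 3 Choose "System Settings > User Management > Add User" to create a user for the sample project.

Step 4 Enter a user name, for example hbaseuser. Set the user type to "Machine-machine", add the user to the supergroup user group, set its primary group to supergroup, bind the hbaserole role to grant permissions, and click "OK", as shown in Figure 3-4.

Figure 3-4 Adding a user

Step 5 On MRS Manager, choose "System Settings > User Management", select hbaseuser in the user name list, and choose "More > Download Authentication Credential" in the "Operation" column on the right. Save and decompress the file to obtain the user.keytab and krb5.conf files of the user. These files are used for security authentication in the sample project, as shown in Figure 3-5.

Figure 3-5 Downloading the authentication credential

----End

Reference Information

If component configuration parameters are modified, download the client configuration files again and update the client in the running and commissioning environment.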

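3.2.3 Configuring and Importing a Sample Project

Prerequisites

Ensure that the time difference between the local PC and the MRS cluster is less than 5 minutes. The MRS cluster time is displayed in the upper right corner of the MRS Manager page.

Figure 3-6 MRS cluster time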
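The 5-minute requirement can be checked with a few lines of Java once the cluster time has been read from MRS Manager. This is a minimal sketch: the class ClockSkewCheck, the method withinSkew, and the example cluster timestamp are all hypothetical, and the cluster time still has to be entered by hand.

```java
import java.time.Duration;
import java.time.Instant;

/** Checks whether two clocks differ by less than the allowed skew. */
class ClockSkewCheck {
    // The prerequisite above: the difference must be less than 5 minutes.
    static final Duration MAX_SKEW = Duration.ofMinutes(5);

    /** Returns true when |local - cluster| is strictly less than MAX_SKEW. */
    static boolean withinSkew(Instant local, Instant cluster) {
        return Duration.between(local, cluster).abs().compareTo(MAX_SKEW) < 0;
    }

    public static void main(String[] args) {
        Instant local = Instant.now();
        // Hypothetical value: replace with the time shown on MRS Manager.
        Instant cluster = Instant.parse("2021-07-27T08:00:00Z");
        System.out.println("within 5 minutes: " + withinSkew(local, cluster));
    }
}
```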

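Procedure

Step 1 Obtain the HBase sample project from the address given in "Obtaining the Sample Projects".

Step 2 In the root directory of the HBase sample project (the directory that contains "pom.xml"), open a cmd window and run mvn install to compile the project.

Step 3 In the cmd window opened in Step 2, run mvn eclipse:eclipse to create the Eclipse project.

Step 4 Configure the Eclipse development environment.

1. On the Eclipse menu bar, choose "Window > Preferences".

The "Preferences" window is displayed.

2. In the navigation tree on the left, choose "General > Workspace". In the "Text file encoding" area, select "Other", set the value to "UTF-8", and click "Apply", as shown in Figure 3-7.

Figure 3-7 Setting the Eclipse encoding format

3. In the navigation tree on the left, choose "Maven > User Settings". Under "User Settings", click "Browse" to import the Maven settings.xml file, click "Apply", and then click "OK", as shown in Figure 3-8.

Figure 3-8 Setting the Maven development environment in Eclipse

Step 5 In the application development environment, import the sample project into Eclipse.

1. Choose "File > Import > General > Existing Projects into Workspace > Next > Browse".

The "Browse Folder" dialog box is displayed.

2. Select the sample project folder and click "Finish".

----End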

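3.3 Developing the Application

3.3.1 Typical Scenario Development

A typical scenario helps you quickly learn the HBase development process and become familiar with the key interface functions.

Scenario Description

Assume that a user develops an application to manage information about the users of service A in an enterprise, as shown in Table 3-4. The workflow of service A is as follows:

● Create a user information table.

● Add users' education background, professional title, and other information to the user information.

● Query a user's name and address by user ID.

● Query by user name.

● Query information about users aged 20 to 29 (inclusive).

● Collect statistics: the number of people in the user information table, and their maximum, minimum, and average age.

● Deregister a user by deleting that user's data from the user information table.

● Delete the user information table after service A ends.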

Table 3-4 User information

ID | Name | Gender | Age | Address
-- | ---- | ------ | --- | -------
12005000201 | Zhang San | Male | 19 | Shenzhen City, Guangdong Province
12005000202 | Li Wanting | Female | 23 | Hangzhou City, Zhejiang Province
12005000203 | Wang Ming | Male | 26 | Ningbo City, Zhejiang Province
12005000204 | Li Gang | Male | 18 | Xiangyang City, Hubei Province
12005000205 | Zhao Enru | Female | 21 | Shangrao City, Jiangxi Province
12005000206 | Chen Long | Male | 32 | Zhuzhou City, Hunan Province
12005000207 | Zhou Wei | Female | 29 | Nanyang City, Henan Province
12005000208 | Yang Yiwen | Female | 30 | Wenzhou City, Zhejiang Province
12005000209 | Xu Bing | Male | 26 | Weinan City, Shaanxi Province
12005000210 | Xiao Kai | Male | 25 | Dalian City, Liaoning Province

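Data Planning

A well-designed table structure, row key, and set of column names make full use of the strengths of HBase. In this sample project, the unique ID is used as the RowKey, and all columns are stored in the info column family.

Function Breakdown

Based on the preceding service scenario, Table 3-5 lists the functions to be developed.

Table 3-5 Functions to be developed in HBase

No. | Step | Code implementation
--- | ---- | -------------------
1 | Create a table based on the information in Table 3-4. | See 3.3.4 Creating a Table.
2 | Import user data. | See 3.3.7 Inserting Data.
3 | Add the "education information" column family, and add users' education background, professional title, and other information to the user information. | See 3.3.6 Modifying a Table.
4 | Query a user's name and address by user ID. | See 3.3.9 Reading Data Using Get.
5 | Query by user name. | See 3.3.11 Using a Filter.
6 | Deregister a user by deleting that user's data from the user information table. | See 3.3.8 Deleting Data.
7 | Delete the user information table after service A ends. | See 3.3.5 Deleting a Table.

Key Design Principles

HBase is a distributed database system that sorts rows lexicographically by RowKey. The RowKey design has a major impact on performance, so design the RowKey with the specific service in mind.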
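To make the effect of lexicographic ordering concrete, the following sketch sorts a few keys byte by byte, the way row keys are ordered. It uses only the Java standard library; the class RowKeyOrderSketch and its methods are hypothetical helpers, not part of the sample project. It illustrates why numeric IDs are usually zero-padded to a fixed width before being used as row keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Demonstrates byte-wise lexicographic ordering of row keys. */
class RowKeyOrderSketch {
    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** Returns the keys in the order a byte-sorted store would keep them. */
    static List<String> sortAsRowKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((p, q) -> compareBytes(
                p.getBytes(StandardCharsets.UTF_8),
                q.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    /** Zero-pads a numeric ID to a fixed width. */
    static String pad(long id, int width) {
        return String.format("%0" + width + "d", id);
    }

    public static void main(String[] args) {
        // Unpadded: "10" sorts before "2" because '1' < '2'.
        System.out.println(sortAsRowKeys(Arrays.asList("9", "10", "2")));
        // Padded to a fixed width: numeric order and byte order agree.
        System.out.println(sortAsRowKeys(Arrays.asList(pad(9, 3), pad(10, 3), pad(2, 3))));
    }
}
```

With unpadded keys the order is 10, 2, 9; with three-digit padding it is 002, 009, 010.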

3.3.2 Creating a Configuration

Function

HBase obtains its configuration items, including the user login configuration items, by loading configuration files.

Sample Code

The following code snippet is in the com.huawei.bigdata.hbase.examples package.

Calling the init() method of the TestMain class initializes the Configuration object:

private static void init() throws IOException {
    // Load the HBase client information.
    if (clientInfo == null) {
        clientInfo = new ClientInfo(CONF_DIR + HBASE_CLIENT_PROPERTIES);
        restServerInfo = clientInfo.getRestServerInfo();
    }
    // By default, load the configuration from the conf directory.
    conf = HBaseConfiguration.create();
    conf.addResource(CONF_DIR + "core-site.xml");
    conf.addResource(CONF_DIR + "hdfs-site.xml");
    conf.addResource(CONF_DIR + "hbase-site.xml");
}

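3.3.3 Creating a Connection

Function

HBase creates a Connection object through the ConnectionFactory.createConnection(configuration) method. The parameter passed is the Configuration created in the previous step.

A Connection encapsulates the underlying connections to the actual servers and to ZooKeeper. It is instantiated through the ConnectionFactory class. Creating a Connection is a heavyweight operation. A Connection is thread-safe, so multiple client threads can share one Connection.

In typical usage, a client program shares a single Connection, and each thread obtains its own Admin or Table instance and then calls the operation interfaces provided by that Admin or Table object. Caching or pooling Table and Admin instances is not recommended. The caller maintains the life cycle of the Connection and releases its resources by calling close().

Sample Code

The following code snippet is an example of creating a Connection:

private TableName tableName = null;
private Configuration conf = null;
private Connection conn = null;
public static final String TABLE_NAME = "hbase_sample_table";

public HBaseExample(Configuration conf) throws IOException {
    this.conf = conf;
    this.tableName = TableName.valueOf(TABLE_NAME);
    this.conn = ConnectionFactory.createConnection(conf);
}

Note

1. The sample code contains many operations, such as creating, querying, and deleting tables. Only table creation (testCreateTable) and table deletion (dropTable) are mentioned here. See the samples in the corresponding sections.

2. The Admin object required for creating a table is obtained from the Connection object.

3. Avoid calling the login code repeatedly.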

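3.3.4 Creating a Table

Function

HBase creates a table through the createTable method of the org.apache.hadoop.hbase.client.Admin object, specifying the table name and column family name. A table can be created in either of two ways. Creating the table with pre-split regions is recommended.

● Quick table creation: after creation, the whole table has only one region, which is automatically split into multiple regions as the data volume grows.

● Table creation with pre-split regions: multiple regions are allocated in advance when the table is created. This speeds up writes during the initial phase of loading a large amount of data.

Note

Column names and column family names cannot contain special characters. They can consist of letters, digits, and underscores.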
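For the pre-split approach, the region boundaries have to be computed before the table is created. The sketch below is one hypothetical way to derive evenly spaced split keys for zero-padded decimal row keys, using only the Java standard library; the class PreSplitKeysSketch, its methods, and the key range are assumptions for illustration. The resulting byte[][] is the kind of value accepted by the createTable overload of Admin that takes split keys in addition to the table descriptor.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Computes evenly spaced split keys for zero-padded decimal row keys. */
class PreSplitKeysSketch {
    /**
     * Returns regions - 1 split keys between min and max, each zero-padded
     * to the given width. N regions need N - 1 boundaries.
     */
    static List<String> splitKeys(long min, long max, int regions, int width) {
        if (regions < 2 || max - min < regions) {
            throw new IllegalArgumentException("need at least 2 regions and a wide enough range");
        }
        long step = (max - min) / regions;
        List<String> keys = new ArrayList<>();
        for (int i = 1; i < regions; i++) {
            keys.add(String.format("%0" + width + "d", min + i * step));
        }
        return keys;
    }

    /** Converts the split keys to the byte[][] form used for table creation. */
    static byte[][] toBytes(List<String> keys) {
        byte[][] result = new byte[keys.size()][];
        for (int i = 0; i < keys.size(); i++) {
            result[i] = keys.get(i).getBytes(StandardCharsets.UTF_8);
        }
        return result;
    }

    public static void main(String[] args) {
        // Four regions over the hypothetical key range 000..100.
        List<String> keys = splitKeys(0, 100, 4, 3);
        System.out.println(keys);
        System.out.println("boundaries: " + toBytes(keys).length);
    }
}
```

Even spacing only balances the regions when the row keys themselves are evenly distributed; for skewed keys the boundaries would need to follow the actual data distribution.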

Sample Code

The following code snippet is in the testCreateTable method of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.

For MRS 3.x and later versions, use the following code to create a table:

public void testCreateTable() {
    LOG.info("Entering testCreateTable.");
    // Set the column family name.
    byte[] fam = Bytes.toBytes("info");
    ColumnFamilyDescriptor familyDescriptor = ColumnFamilyDescriptorBuilder.newBuilder(fam)
        // Set the data block encoding. HBase provides DIFF, FAST_DIFF, and PREFIX.
        // HBase 2.0 removed the PREFIX_TREE data block encoding from column families.
        .setDataBlockEncoding(DataBlockEncoding.FAST_DIFF)
        // Set the compression method. HBase provides two default compression
        // methods: GZ and SNAPPY.
        // GZ has the highest compression ratio but low compression and
        // decompression efficiency, so it suits cold data.
        // SNAPPY has a lower compression ratio but high compression and
        // decompression efficiency, so it suits hot data.
        // SNAPPY is recommended.
        .setCompressionType(Compression.Algorithm.SNAPPY)
        .build();
    TableDescriptor htd =
        TableDescriptorBuilder.newBuilder(tableName).setColumnFamily(familyDescriptor).build();
    Admin admin = null;
    try {
        // Instantiate an Admin object.
        admin = conn.getAdmin();
        if (!admin.tableExists(tableName)) {
            LOG.info("Creating table...");
            // Create the table.
            admin.createTable(htd);
            LOG.info(admin.getClusterMetrics());
            LOG.info(admin.listNamespaceDescriptors());
            LOG.info("Table created successfully.");
        } else {
            LOG.warn("table already exists");
        }
    } catch (IOException e) {
        LOG.error("Create table failed.", e);
    } finally {
        if (admin != null) {
            try {
                // Close the Admin object.
                admin.close();
            } catch (IOException e) {
                LOG.error("Failed to close admin ", e);
            }
        }
    }
    LOG.info("Exiting testCreateTable.");
}

For versions earlier than MRS 3.x, use the following code to create a table:

public void testCreateTable() {
    LOG.info("Entering testCreateTable.");
    // Specify the table descriptor.
    HTableDescriptor htd = new HTableDescriptor(tableName);
    // Set the column family name to info.
    HColumnDescriptor hcd = new HColumnDescriptor("info");
    // Set the data block encoding. HBase provides DIFF, FAST_DIFF, PREFIX,
    // and PREFIX_TREE.
    hcd.setDataBlockEncoding(DataBlockEncoding.FAST_DIFF);
    // Set the compression method. HBase provides two default compression
    // methods: GZ and SNAPPY.
    // GZ has the highest compression ratio but low compression and
    // decompression efficiency, so it suits cold data.
    // SNAPPY has a lower compression ratio but high compression and
    // decompression efficiency, so it suits hot data.
    // SNAPPY is recommended.
    hcd.setCompressionType(Compression.Algorithm.SNAPPY);
    htd.addFamily(hcd);
    Admin admin = null;
    try {
        // Instantiate an Admin object.
        admin = conn.getAdmin();
        if (!admin.tableExists(tableName)) {
            LOG.info("Creating table...");
            // Create the table.
            admin.createTable(htd);
            LOG.info(admin.getClusterStatus());
            LOG.info(admin.listNamespaceDescriptors());
            LOG.info("Table created successfully.");
        } else {
            LOG.warn("table already exists");
        }
    } catch (IOException e) {
        LOG.error("Create table failed.", e);
    } finally {
        if (admin != null) {
            try {
                // Close the Admin object.
                admin.close();
            } catch (IOException e) {
                LOG.error("Failed to close admin ", e);
            }
        }
    }
    LOG.info("Exiting testCreateTable.");
}

3.3.5 Deleting a Table

Function

HBase deletes a table through the deleteTable method of org.apache.hadoop.hbase.client.Admin.

Sample Code

The following code snippet is in the dropTable method of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.

public void dropTable() {
    LOG.info("Entering dropTable.");
    Admin admin = null;
    try {
        admin = conn.getAdmin();
        if (admin.tableExists(tableName)) {
            // Disable the table before deleting it.
            admin.disableTable(tableName); // Note [1]
            // Delete the table.
            admin.deleteTable(tableName);
        }
        LOG.info("Drop table successfully.");
    } catch (IOException e) {
        LOG.error("Drop table failed ", e);
    } finally {
        if (admin != null) {
            try {
                // Close the Admin object.
                admin.close();
            } catch (IOException e) {
                LOG.error("Close admin failed ", e);
            }
        }
    }
    LOG.info("Exiting dropTable.");
}

Precautions

Note [1]: A table can be deleted only after it has been disabled. For this reason, deleteTable is often used together with disableTable, enableTable, tableExists, isTableEnabled, and isTableDisabled.

3.3.6 Modifying a Table

Function

HBase modifies table information through the modifyTable method of org.apache.hadoop.hbase.client.Admin.

Sample Code

The following code snippet is in the testModifyTable method of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.

For MRS 3.x and later versions, use the following code to modify a table:

public void testModifyTable() {
    LOG.info("Entering testModifyTable.");
    // Specify the column family name.
    byte[] familyName = Bytes.toBytes("education");
    Admin admin = null;
    try {
        // Instantiate an Admin object.
        admin = conn.getAdmin();
        // Obtain the descriptor of the existing table.
        TableDescriptor htd = admin.getDescriptor(tableName);
        // Check whether the column family already exists before modification.
        if (!htd.hasColumnFamily(familyName)) {
            // Create the column family descriptor.
            ColumnFamilyDescriptor cfd = ColumnFamilyDescriptorBuilder.of(familyName);
            TableDescriptor td = TableDescriptorBuilder.newBuilder(htd)
                .setColumnFamily(cfd)
                .build();
            // Disable the table to take it offline before modifying it.
            admin.disableTable(tableName);
            // Submit a modifyTable request.
            admin.modifyTable(td);
            // Enable the table to bring it back online after modifying it.
            admin.enableTable(tableName);
        }
        LOG.info("Modify table successfully.");
    } catch (IOException e) {
        LOG.error("Modify table failed ", e);
    } finally {
        if (admin != null) {
            try {
                // Close the Admin object.
                admin.close();
            } catch (IOException e) {
                LOG.error("Close admin failed ", e);
            }
        }
    }
    LOG.info("Exiting testModifyTable.");
}

For versions earlier than MRS 3.x, use the following code to modify a table:

public void testModifyTable() {
    LOG.info("Entering testModifyTable.");
    // Specify the column family name.
    byte[] familyName = Bytes.toBytes("education");
    Admin admin = null;
    try {
        // Instantiate an Admin object.
        admin = conn.getAdmin();
        // Obtain the table descriptor.
        HTableDescriptor htd = admin.getTableDescriptor(tableName);
        // Check whether the column family already exists before modification.
        if (!htd.hasFamily(familyName)) {
            // Create the column family descriptor.
            HColumnDescriptor hcd = new HColumnDescriptor(familyName);
            htd.addFamily(hcd);
            // Disable the table to take it offline before modifying it.
            admin.disableTable(tableName); // Note [1]
            // Submit a modifyTable request.
            admin.modifyTable(tableName, htd);
            // Enable the table to bring it back online after modifying it.
            admin.enableTable(tableName);
        }
        LOG.info("Modify table successfully.");
    } catch (IOException e) {
        LOG.error("Modify table failed ", e);
    } finally {
        if (admin != null) {
            try {
                // Close the Admin object.
                admin.close();
            } catch (IOException e) {
                LOG.error("Close admin failed ", e);
            }
        }
    }
    LOG.info("Exiting testModifyTable.");
}

Note

Note [1]: modifyTable takes effect only when the table is disabled.

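3.3.7 Inserting Data

Function

HBase is a column-oriented database. A row of data may span multiple column families, and a column family may contain multiple columns. When writing data, you usually need to specify the target column (both the column family name and the column name). HBase writes data through the put method of HTable. The data can be a single row or a set of rows.

Sample Code

The following code snippet is in the testPut method of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.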

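public void testPut() {
    LOG.info("Entering testPut.");
    // Specify the column family name.
    byte[] familyName = Bytes.toBytes("info");
    // Specify the column names.
    byte[][] qualifiers = {Bytes.toBytes("name"), Bytes.toBytes("gender"),
        Bytes.toBytes("age"), Bytes.toBytes("address")};
    Table table = null;
    try {
        // Instantiate an HTable object.
        table = conn.getTable(tableName);
        List<Put> puts = new ArrayList<Put>();
        // Instantiate one Put object per row. Each row needs its own Put.
        // Gender and age values follow Table 3-4.
        Put put = new Put(Bytes.toBytes("012005000201"));
        put.addColumn(familyName, qualifiers[0], Bytes.toBytes("Zhang San"));
        put.addColumn(familyName, qualifiers[1], Bytes.toBytes("Male"));
        put.addColumn(familyName, qualifiers[2], Bytes.toBytes("19"));
        put.addColumn(familyName, qualifiers[3], Bytes.toBytes("Shenzhen, Guangdong"));
        puts.add(put);

        put = new Put(Bytes.toBytes("012005000202"));
        put.addColumn(familyName, qualifiers[0], Bytes.toBytes("Li Wanting"));
        put.addColumn(familyName, qualifiers[1], Bytes.toBytes("Female"));
        put.addColumn(familyName, qualifiers[2], Bytes.toBytes("23"));
        put.addColumn(familyName, qualifiers[3], Bytes.toBytes("Shijiazhuang, Hebei"));
        puts.add(put);

        put = new Put(Bytes.toBytes("012005000203"));
        put.addColumn(familyName, qualifiers[0], Bytes.toBytes("Wang Ming"));
        put.addColumn(familyName, qualifiers[1], Bytes.toBytes("Male"));
        put.addColumn(familyName, qualifiers[2], Bytes.toBytes("26"));
        put.addColumn(familyName, qualifiers[3], Bytes.toBytes("Ningbo, Zhejiang"));
        puts.add(put);

        put = new Put(Bytes.toBytes("012005000204"));
        put.addColumn(familyName, qualifiers[0], Bytes.toBytes("Li Gang"));
        put.addColumn(familyName, qualifiers[1], Bytes.toBytes("Male"));
        put.addColumn(familyName, qualifiers[2], Bytes.toBytes("18"));
        put.addColumn(familyName, qualifiers[3], Bytes.toBytes("Xiangyang, Hubei"));
        puts.add(put);

        put = new Put(Bytes.toBytes("012005000205"));
        put.addColumn(familyName, qualifiers[0], Bytes.toBytes("Zhao Enru"));
        put.addColumn(familyName, qualifiers[1], Bytes.toBytes("Female"));
        put.addColumn(familyName, qualifiers[2], Bytes.toBytes("21"));
        put.addColumn(familyName, qualifiers[3], Bytes.toBytes("Shangrao, Jiangxi"));
        puts.add(put);

        put = new Put(Bytes.toBytes("012005000206"));
        put.addColumn(familyName, qualifiers[0], Bytes.toBytes("Chen Long"));
        put.addColumn(familyName, qualifiers[1], Bytes.toBytes("Male"));
        put.addColumn(familyName, qualifiers[2], Bytes.toBytes("32"));
        put.addColumn(familyName, qualifiers[3], Bytes.toBytes("Zhuzhou, Hunan"));
        puts.add(put);

        put = new Put(Bytes.toBytes("012005000207"));
        put.addColumn(familyName, qualifiers[0], Bytes.toBytes("Zhou Wei"));
        put.addColumn(familyName, qualifiers[1], Bytes.toBytes("Female"));
        put.addColumn(familyName, qualifiers[2], Bytes.toBytes("29"));
        put.addColumn(familyName, qualifiers[3], Bytes.toBytes("Nanyang, Henan"));
        puts.add(put);

        put = new Put(Bytes.toBytes("012005000208"));
        put.addColumn(familyName, qualifiers[0], Bytes.toBytes("Yang Yiwen"));
        put.addColumn(familyName, qualifiers[1], Bytes.toBytes("Female"));
        put.addColumn(familyName, qualifiers[2], Bytes.toBytes("30"));
        put.addColumn(familyName, qualifiers[3], Bytes.toBytes("Kaixian, Chongqing"));
        puts.add(put);

        put = new Put(Bytes.toBytes("012005000209"));
        put.addColumn(familyName, qualifiers[0], Bytes.toBytes("Xu Bing"));
        put.addColumn(familyName, qualifiers[1], Bytes.toBytes("Male"));
        put.addColumn(familyName, qualifiers[2], Bytes.toBytes("26"));
        put.addColumn(familyName, qualifiers[3], Bytes.toBytes("Weinan, Shaanxi"));
        puts.add(put);

        put = new Put(Bytes.toBytes("012005000210"));
        put.addColumn(familyName, qualifiers[0], Bytes.toBytes("Xiao Kai"));
        put.addColumn(familyName, qualifiers[1], Bytes.toBytes("Male"));
        put.addColumn(familyName, qualifiers[2], Bytes.toBytes("25"));
        put.addColumn(familyName, qualifiers[3], Bytes.toBytes("Dalian, Liaoning"));
        puts.add(put);

        // Submit a put request.
        table.put(puts);
        LOG.info("Put successfully.");
    } catch (IOException e) {
        LOG.error("Put failed ", e);
    } finally {
        if (table != null) {
            try {
                // Close the HTable object.
                table.close();
            } catch (IOException e) {
                LOG.error("Close table failed ", e);
            }
        }
    }
    LOG.info("Exiting testPut.");
}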

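Precautions

Multiple threads must not share the same HTable instance at the same time. HTable is not thread-safe, so using one HTable instance from several threads concurrently can cause concurrency problems.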

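3.3.8 Deleting Data

Function

HBase deletes data through the delete method of a Table instance. The data can be a single row or a set of rows.

Sample Code

The following code snippet is in the testDelete method of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.

public void testDelete() {
    LOG.info("Entering testDelete.");
    byte[] rowKey = Bytes.toBytes("012005000201");
    Table table = null;
    try {
        // Instantiate an HTable object.
        table = conn.getTable(tableName);
        // Instantiate a Delete object.
        Delete delete = new Delete(rowKey);
        // Submit a delete request.
        table.delete(delete);
        LOG.info("Delete table successfully.");
    } catch (IOException e) {
        LOG.error("Delete table failed ", e);
    } finally {
        if (table != null) {
            try {
                // Close the HTable object.
                table.close();
            } catch (IOException e) {
                LOG.error("Close table failed ", e);
            }
        }
    }
    LOG.info("Exiting testDelete.");
}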

3.3.9 Reading Data Using Get

Function

To read a row from a table, first instantiate the Table instance for that table, and then create a Get object. You can also set parameters on the Get object, such as the column family name and column name. The row data returned by the query is stored in a Result object, which can hold multiple Cells.

Sample Code

The following code snippet is in the testGet method of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.

public void testGet() {
    LOG.info("Entering testGet.");
    // Specify the column family name.
    byte[] familyName = Bytes.toBytes("info");
    // Specify the column names.
    byte[][] qualifier = {Bytes.toBytes("name"), Bytes.toBytes("address")};
    // Specify the RowKey.
    byte[] rowKey = Bytes.toBytes("012005000201");
    Table table = null;
    try {
        // Instantiate a Table object.
        table = conn.getTable(tableName);
        // Instantiate a Get object.
        Get get = new Get(rowKey);
        // Set the column family name and column names.
        get.addColumn(familyName, qualifier[0]);
        get.addColumn(familyName, qualifier[1]);
        // Submit a get request.
        Result result = table.get(get);
        // Print the query results.
        for (Cell cell : result.rawCells()) {
            LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + ":"
                + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
                + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
                + Bytes.toString(CellUtil.cloneValue(cell)));
        }
        LOG.info("Get data successfully.");
    } catch (IOException e) {
        LOG.error("Get data failed ", e);
    } finally {
        if (table != null) {
            try {
                // Close the Table object.
                table.close();
            } catch (IOException e) {
                LOG.error("Close table failed ", e);
            }
        }
    }
    LOG.info("Exiting testGet.");
}

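3.3.10 Reading Data Using Scan

Function

To read data from a table, first instantiate the Table instance for that table, then create a Scan object and set its parameters according to the query conditions. To improve query efficiency, specify StartRow and StopRow. The rows returned by the query are held in a ResultScanner object. Each row is stored as a Result object, and each Result holds multiple Cells.

Sample Code

The following code snippet is in the testScanData method of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.

public void testScanData() {
    LOG.info("Entering testScanData.");
    Table table = null;
    // Declare a ResultScanner object.
    ResultScanner rScanner = null;
    try {
        // Instantiate a Table object.
        table = conn.getTable(tableName);
        // Instantiate a Scan object.
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
        // Set the StartRow.
        scan.setStartRow(Bytes.toBytes("012005000202")); // Note [1]
        // Set the StopRow.
        scan.setStopRow(Bytes.toBytes("012005000210")); // Note [1]
        // Set the Caching size.
        scan.setCaching(1000); // Note [2]
        // Set the Batch size.
        scan.setBatch(100); // Note [2]
        // Submit a scan request.
        rScanner = table.getScanner(scan);
        // Print the query results.
        for (Result r = rScanner.next(); r != null; r = rScanner.next()) {
            for (Cell cell : r.rawCells()) {
                LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + ":"
                    + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
                    + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
                    + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
        LOG.info("Scan data successfully.");
    } catch (IOException e) {
        LOG.error("Scan data failed ", e);
    } finally {
        if (rScanner != null) {
            // Close the scanner object.
            rScanner.close();
        }
        if (table != null) {
            try {
                // Close the Table object.
                table.close();
            } catch (IOException e) {
                LOG.error("Close table failed ", e);
            }
        }
    }
    LOG.info("Exiting testScanData.");
}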

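Precautions

1. Specify StartRow and StopRow when scanning. A Scan with a definite range performs better.

2. The key parameters Batch and Caching can be set.

– Batch

The maximum number of cells returned in each Result when next is called on a Scan. It relates to the number of columns read at a time.

– Caching

The maximum number of rows returned by one RPC request for next. It relates to the number of rows fetched per RPC.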
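The range semantics of StartRow and StopRow can be simulated with a sorted map. This sketch uses only the Java standard library and is not the HBase API; the class ScanRangeSketch and its methods are hypothetical. It assumes the usual HBase convention that the start row is inclusive and the stop row is exclusive, and it reuses the row keys of the sample.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Simulates a range scan over sorted row keys: start inclusive, stop exclusive. */
class ScanRangeSketch {
    /** Builds a sorted table holding the ten sample row keys ...201 to ...210. */
    static TreeMap<String, String> sampleTable() {
        TreeMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 10; i++) {
            table.put(String.format("0120050002%02d", i), "user-" + i);
        }
        return table;
    }

    /** Returns the row keys in [startRow, stopRow). */
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        List<String> rows = scan(sampleTable(), "012005000202", "012005000210");
        // Rows ...202 through ...209 are returned; ...201 and ...210 are not.
        System.out.println(rows.size() + " rows, first " + rows.get(0)
                + ", last " + rows.get(rows.size() - 1));
    }
}
```

Under this assumption, the scan in testScanData would return eight rows; to include row 012005000210, the stop row would have to be a key that sorts after it.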

3.3.11 Using a Filter

Function

An HBase Filter filters data during Scan and Get operations. Filtering is done by setting filter conditions, for example on the RowKey, column name, or column value.

Sample Code

The following code snippet is in the testFilterList method of the "HBaseSample" class in the com.huawei.bigdata.hbase.examples package.

public void testFilterList() {
    LOG.info("Entering testFilterList.");
    Table table = null;
    // Declare a ResultScanner object.
    ResultScanner rScanner = null;
    try {
        // Instantiate a Table object.
        table = conn.getTable(tableName);
        // Instantiate a Scan object.
        Scan scan = new Scan();
        scan.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"));
        // Instantiate a FilterList object in which the filters have an "and"
        // relationship with each other.
        FilterList list = new FilterList(Operator.MUST_PASS_ALL);
        // Obtain data with age greater than or equal to 20.
        // For MRS 3.x and later versions, use CompareOperator instead of CompareOp.
        list.addFilter(new SingleColumnValueFilter(Bytes.toBytes("info"), Bytes.toBytes("age"),
            CompareOp.GREATER_OR_EQUAL, Bytes.toBytes(20L)));
        // Obtain data with age less than or equal to 29.
        // For MRS 3.x and later versions, use CompareOperator instead of CompareOp.
        list.addFilter(new SingleColumnValueFilter(Bytes.toBytes("info"), Bytes.toBytes("age"),
            CompareOp.LESS_OR_EQUAL, Bytes.toBytes(29L)));
        scan.setFilter(list);
        // Submit a scan request.
        rScanner = table.getScanner(scan);
        // Print the query results.
        for (Result r = rScanner.next(); r != null; r = rScanner.next()) {
            for (Cell cell : r.rawCells()) {
                LOG.info(Bytes.toString(CellUtil.cloneRow(cell)) + ":"
                    + Bytes.toString(CellUtil.cloneFamily(cell)) + ","
                    + Bytes.toString(CellUtil.cloneQualifier(cell)) + ","
                    + Bytes.toString(CellUtil.cloneValue(cell)));
            }
        }
        LOG.info("Filter list successfully.");
    } catch (IOException e) {
        LOG.error("Filter list failed ", e);
    } finally {
        if (rScanner != null) {
            // Close the scanner object.
            rScanner.close();
        }
        if (table != null) {
            try {
                // Close the Table object.
                table.close();
            } catch (IOException e) {
                LOG.error("Close table failed ", e);
            }
        }
    }
    LOG.info("Exiting testFilterList.");
}
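A value filter of this kind compares the stored bytes with the filter bytes, so both sides need the same encoding. The sketch below illustrates the point with plain Java, assuming a byte-wise unsigned lexicographic comparison (the behaviour of HBase's default binary comparator); the class AgeEncodingSketch and its helper methods are hypothetical. Note that testPut stores the age as a string such as "19", whereas the filter above passes the bounds as 8-byte long values.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Shows why stored values and filter bounds must share one encoding. */
class AgeEncodingSketch {
    /** Unsigned lexicographic comparison of two byte arrays. */
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    /** Encodes a number as its decimal string, e.g. 23 becomes the bytes of "23". */
    static byte[] asString(long value) {
        return Long.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    /** Encodes a number as an 8-byte big-endian long. */
    static byte[] asLong(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    /** Byte-wise check of low <= stored <= high. */
    static boolean inRange(byte[] stored, byte[] low, byte[] high) {
        return compareBytes(stored, low) >= 0 && compareBytes(stored, high) <= 0;
    }

    public static void main(String[] args) {
        // String-encoded value against long-encoded bounds: no match.
        System.out.println(inRange(asString(23), asLong(20), asLong(29)));
        // Same encoding on both sides: match.
        System.out.println(inRange(asString(23), asString(20), asString(29)));
        System.out.println(inRange(asLong(23), asLong(20), asLong(29)));
    }
}
```

If the ages are stored as strings, passing string-encoded bounds to the filter is one option, but string comparison only agrees with numeric order when all values have the same number of digits. Storing the age as a long and filtering with long bounds avoids that restriction for non-negative values.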

3.3.12 Adding a Secondary Index

Function

You can manage HIndexes with the methods provided by org.apache.hadoop.hbase.hindex.client.HIndexAdmin. This class provides methods for adding indexes to an existing table.

Depending on whether index data should be built while the index is being added, there are two different methods for adding an index to a table:

● addIndicesWithData()

● addIndices()

Sample Code

The following code snippet is in the addIndicesExample method of the "HIndexExample" class in the com.huawei.bigdata.hbase.examples package.

addIndices(): adds an index to a table that contains no data.

public void addIndicesExample() {
    LOG.info("Entering Adding a Hindex.");
    // Create the index instance.
    TableIndices tableIndices = new TableIndices();
    HIndexSpecification spec = new HIndexSpecification(indexNameToAdd);
    // For MRS 3.x and later versions, ColumnFamilyDescriptorBuilder is recommended
    // instead of HColumnDescriptor for building the index column, for example:
    // spec.addIndexColumn(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("info")).build(),
    //     "name", ValueType.STRING, null);
    spec.addIndexColumn(new HColumnDescriptor("info"), "name", ValueType.STRING);
    tableIndices.addIndex(spec);
    Admin admin = null;
    HIndexAdmin iAdmin = null;
    try {
        admin = conn.getAdmin();
        iAdmin = HIndexClient.newHIndexAdmin(admin);
        // Add the index to the table.
        iAdmin.addIndices(tableName, tableIndices);
        // Alternatively, add the specified indices together with their data:
        // iAdmin.addIndicesWithData(tableName, tableIndices);
        LOG.info("Successfully added indices to the table " + tableName);
    } catch (IOException e) {
        LOG.error("Add Indices failed for table " + tableName + "." + e);
    } finally {
        if (iAdmin != null) {
            try {
                // Close the HIndexAdmin object.
                iAdmin.close();
            } catch (IOException e) {
                LOG.error("Failed to close HIndexAdmin ", e);
            }
        }
        if (admin != null) {
            try {
                // Close the Admin object.
                admin.close();
            } catch (IOException e) {
                LOG.error("Failed to close admin ", e);
            }
        }
    }
    LOG.info("Exiting Adding a Hindex.");
}

The following code snippet is in the addIndicesExampleWithData method of the "HIndexExample" class in the com.huawei.bigdata.hbase.examples package.

addIndicesWithData(): adds an index to a table that already contains a large amount of data.

public void addIndicesExampleWithData() {
    LOG.info("Entering Adding a Hindex With Data.");
    // Create the index instance.
    TableIndices tableIndices = new TableIndices();
    HIndexSpecification spec = new HIndexSpecification(indexNameToAdd);
    // For MRS 3.x and later versions, ColumnFamilyDescriptorBuilder is recommended
    // instead of HColumnDescriptor for building the index column, for example:
    // spec.addIndexColumn(ColumnFamilyDescriptorBuilder.newBuilder(Bytes.toBytes("info")).build(),
    //     "age", ValueType.STRING, null);
    spec.addIndexColumn(new HColumnDescriptor("info"), "age", ValueType.STRING);
    tableIndices.addIndex(spec);
    Admin admin = null;
    HIndexAdmin iAdmin = null;
    try {
        admin = conn.getAdmin();
        iAdmin = HIndexClient.newHIndexAdmin(admin);
        // Add the index to the table.


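Contents

1 Introduction
2 Building the MapReduce Service Sample Projects
3 HBase Application Development
4 Hive Application Development
5 MapReduce Application Development
6 HDFS Application Development
7 Spark Application Development
8 Storm Application Development
9 Kafka Application Development
10 Presto Application Development
11 OpenTSDB Application Development
12 Flink Application Development
13 Impala Application Development
14 Alluxio Application Development
15 Appendix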
