IMPORT - Verifying Sqoop - About the Tutorial

Step 8: Verifying Sqoop

3. IMPORT

emp_add:

id hno street city

1201 288A vgiri jublee

1202 108I aoc sec-bad

1203 144Z pgutta hyd

1204 78B old city sec-bad

1205 720X hitec sec-bad

emp_contact:

id phno email

1201 2356742 [email protected]

1202 1661663 [email protected]

1203 8887776 [email protected]

1204 9988774 [email protected]

1205 1231231 [email protected]

Importing a Table

Sqoop tool ‘import’ is used to import table data from the table to the Hadoop file system as a text file or a binary file.

The following command is used to import the emp table from MySQL database server to HDFS.

$ sqoop import \

--connect jdbc:mysql://localhost/userdb \ --username root \

--table emp --m 1

16 If it is executed successfully, then you get the following output.

14/12/22 15:24:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5 14/12/22 15:24:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.

14/12/22 15:24:56 INFO tool.CodeGenTool: Beginning code generation

14/12/22 15:24:58 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1

14/12/22 15:24:58 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/local/hadoop

14/12/22 15:25:11 INFO orm.CompilationManager: Writing jar file:

/tmp/sqoop-hadoop/compile/cebe706d23ebb1fd99c1f063ad51ebd7/emp.jar ---

---

14/12/22 15:25:40 INFO mapreduce.Job: The url to track the job:

http://localhost:8088/proxy/application_1419242001831_0001/

14/12/22 15:26:45 INFO mapreduce.Job: Job job_1419242001831_0001 running in uber mode : false

14/12/22 15:26:45 INFO mapreduce.Job: map 0% reduce 0%

14/12/22 15:28:08 INFO mapreduce.Job: map 100% reduce 0%

14/12/22 15:28:16 INFO mapreduce.Job: Job job_1419242001831_0001 completed successfully

--- ---

14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Transferred 145 bytes in 177.5849 seconds (0.8165 bytes/sec)

14/12/22 15:28:17 INFO mapreduce.ImportJobBase: Retrieved 5 records.

To verify the imported data in HDFS, use the following command.

$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*

It shows you the emp table data and fields are separated with comma (,).

17 1201, gopal, manager, 50000, TP

1202, manisha, preader, 50000, TP 1203, kalil, php dev, 30000, AC 1204, prasanth,php dev, 30000, AC 1205, kranthi, admin, 20000, TP

Importing into Target Directory

We can specify the target directory while importing table data into HDFS using the Sqoop import tool.

Following is the syntax to specify the target directory as option to the Sqoop import command.

--target-dir <new or exist directory in HDFS>

The following command is used to import emp_add table data into ‘/queryresult’

directory.

$ sqoop import \

--connect jdbc:mysql://localhost/userdb \ --username root \

--table emp_add \ --m 1 \

--target-dir /queryresult

The following command is used to verify the imported data in /queryresult directory form emp_add table.

$ $HADOOP_HOME/bin/hadoop fs -cat /queryresult/part-m-*

It will show you the emp_add table data with comma (,) separated fields.

1201, 288A, vgiri, jublee 1202, 108I, aoc, sec-bad 1203, 144Z, pgutta, hyd 1204, 78B, oldcity, sec-bad 1205, 720C, hitech, sec-bad

Import Subset of Table Data

We can import a subset of a table using the ‘where’ clause in Sqoop import tool. It executes the corresponding SQL query in the respective database server and stores the result in a target directory in HDFS.

The syntax for where clause is as follows.

--where <condition>

The following command is used to import a subset of emp_add table data. The subset query is to retrieve the employee id and address, who lives in Secunderabad city.

--where “city =’sec-bad’” \ --target-dir /wherequery

The following command is used to verify the imported data in /wherequery directory from the emp_add table.

$ $HADOOP_HOME/bin/hadoop fs -cat /wherequery/part-m-*

It will show you the emp_add table data with comma (,) separated fields.

1202, 108I, aoc, sec-bad 1204, 78B, oldcity,sec-bad 1205, 720C, hitech, sec-bad

Incremental Import

Incremental import is a technique that imports only the newly added rows in a table.

It is required to add ‘incremental’, ‘check-column’, and ‘last-value’ options to perform the incremental import.

The following syntax is used for the incremental option in Sqoop import command.

--incremental <mode>

--check-column <column name>

19 --last value <last check column value>

Let us assume the newly added data into emp table is as follows:

1206, satish p, grp des, 20000, GR

The following command is used to perform the incremental import in the emp table.

$ sqoop import \

--connect jdbc:mysql://localhost/userdb \ --username root \

--table emp \ --m 1 \

--incremental append \ --check-column id \ -last value 1205

The following command is used to verify the imported data from emp table to HDFS emp/ directory.

$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*

It shows you the emp table data with comma (,) separated fields.

1201, gopal, manager, 50000, TP 1202, manisha, preader, 50000, TP 1203, kalil, php dev, 30000, AC 1204, prasanth, php dev, 30000, AC 1205, kranthi, admin, 20000, TP 1206, satish p, grp des, 20000, GR

The following command is used to see the modified or newly added rows from the emp table.

$ $HADOOP_HOME/bin/hadoop fs -cat /emp/part-m-*1

It shows you the newly added rows to the emp table with comma (,) separated fields.

1206, satish p, grp des, 20000, GR

20 This chapter describes how to import all the tables from the RDBMS database server to the HDFS. Each table data is stored in a separate directory and the directory name is same as the table name.

Syntax

The following syntax is used to import all tables.

$ sqoop import-all-tables (generic-args) (import-args)

$ sqoop-import-all-tables (generic-args) (import-args)

Example

Let us take an example of importing all tables from the userdb database. The list of tables that the database userdb contains is as follows.

+---+

| Tables | +---+

| emp |

| emp_add |

| emp_contact | +---+

The following command is used to import all the tables from the userdb database.

$ sqoop import \

--connect jdbc:mysql://localhost/userdb \ --username root

Note: If you are using the import-all-tables, it is mandatory that every table in that database must have a primary key field.

The following command is used to verify all the table data to the userdb database in HDFS.

$ $HADOOP_HOME/bin/hadoop fs -ls

4. IMPORT-ALL-TABLES

21 It will show you the list of table names in userdb database as directories.

Output

drwxr-xr-x - hadoop supergroup 0 2014-12-22 22:50 _sqoop drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:46 emp drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:50 emp_add drwxr-xr-x - hadoop supergroup 0 2014-12-23 01:52 emp_contact

22 This chapter describes how to export data back from the HDFS to the RDBMS database. The target table must exist in the target database. The files which are given as input to the Sqoop contain records, which are called rows in table. Those are read and parsed into a set of records and delimited with user-specified delimiter.

The default operation is to insert all the record from the input files to the database table using the INSERT statement. In update mode, Sqoop generates the UPDATE statement that replaces the existing record into the database.

Syntax

The following is the syntax for the export command.

$ sqoop export (generic-args) (export-args)

$ sqoop-export (generic-args) (export-args)

Example

Let us take an example of the employee data in file, in HDFS. The employee data is available in emp_data file in ‘emp/’ directory in HDFS. The emp_data is as follows.

1201, gopal, manager, 50000, TP

It is mandatory that the table to be exported is created manually and is present in the database from where it has to be exported.

The following query is used to create the table ‘employee’ in mysql command line.

$ mysql

mysql> USE db;

mysql> CREATE TABLE employee ( id INT NOT NULL PRIMARY KEY, name VARCHAR(20),

deg VARCHAR(20),

在文檔中 About the Tutorial (頁 18-26)