Learn how to work with the Hive interactive shell.
Learn how to create tables in Hive.
Learn how to load data into Hive tables.
Learn how to run basic Hive queries.
This section shows the basic usage of Apache Hive. Hive runs on top of Hadoop and uses a SQL-like language called HiveQL. Instead of writing raw MapReduce programs, you can perform data warehouse tasks using a simple and familiar query language. After completing this section, you will be able to use HiveQL to query big data.
Interactive shell
In the sample code below we will continue to use the same event tuple patient data. Let's start the Hive CLI interactive shell first by typing hive in the command line.
Create table
Before loading data, we first need to define a table, just as we would with a SQL database server.
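A minimal sketch of such a definition, assuming each event tuple is stored as one comma-separated line with four fields. The table name events and the column names and types are illustrative assumptions, not taken from the data itself, so adjust them to match your files.

```sql
-- Hypothetical schema for the event tuple patient data.
-- The table name, column names, and column types are assumptions.
-- The delimiter clause assumes comma-separated text input.
CREATE TABLE events (
  patient_id  STRING,
  event_name  STRING,
  date_offset INT,
  value       INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```

Hive applies the schema when it reads the data, not when the data is loaded, so a mismatch between the declared columns and the file layout typically shows up as NULL values in query results instead of as a load error.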
You can check the existing tables and a table's schema with the commands SHOW TABLES; and DESCRIBE table_name; respectively.
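For instance, assuming a table named events exists (the name is a placeholder for whichever table you created), the two checks might look like this:

```sql
-- List the tables in the current database.
SHOW TABLES;

-- Show the column names and types of one table.
-- "events" is an assumed table name.
DESCRIBE events;
```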
Load data
Next we'll load data into the table.
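One way to do this is with a LOAD DATA statement. The sketch below assumes the input sits in a local path named data and that the target table is called events. Both names are placeholders.

```sql
-- Copy files from the local file system into the table's storage location.
-- 'data' and "events" are assumed names.
-- OVERWRITE replaces any rows already in the table.
LOAD DATA LOCAL INPATH 'data' OVERWRITE INTO TABLE events;
```

With the LOCAL keyword Hive reads the path from the local file system. Without it, the path is resolved in HDFS and the files are moved into the table's directory.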
Query
Basic
With the data loaded you can run familiar SQL statements like:
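As an illustration, the following query counts the events recorded for each patient. It assumes a table named events with a patient_id column. Both names are hypothetical.

```sql
-- Count events per patient.
-- "events" and "patient_id" are assumed names.
SELECT patient_id, COUNT(*) AS event_count
FROM events
GROUP BY patient_id;
```

Because of the GROUP BY, Hive runs this query as a MapReduce job, so expect a noticeable delay before results appear even on small data.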
Save result
You can also save query results to a directory in the local file system:
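A sketch of this using INSERT OVERWRITE LOCAL DIRECTORY, assuming an output directory named tmp_result and the same hypothetical events table and patient_id column:

```sql
-- Write the query result into a local directory.
-- 'tmp_result', "events", and "patient_id" are assumed names.
-- OVERWRITE replaces whatever the directory already contains.
INSERT OVERWRITE LOCAL DIRECTORY 'tmp_result'
SELECT patient_id, COUNT(*) AS event_count
FROM events
GROUP BY patient_id;
```

Hive writes one or more plain files into the directory. By default the fields are separated by the Ctrl-A character (\001), not by commas.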
You can learn more about Hive syntax from the language manual.
Beyond the shell
Besides running commands in the interactive shell, you can also run a script in batch mode. For example, in the sample/hive folder, you can run the entire sample.hql script with the command hive -f sample.hql.
The script contains all of the commands that we ran in the shell, plus one additional statement that drops the existing table if necessary:
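Such a statement might look like the following, assuming the table is named events (a placeholder for the name used in the script):

```sql
-- Remove a table left over from a previous run so the script can be re-run.
-- "events" is an assumed table name.
DROP TABLE IF EXISTS events;
```

The IF EXISTS clause keeps the script from failing on its first run, when there is no table to drop yet.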
It is also possible to run Hive as a server (HiveServer2) and connect to it through JDBC or with the Beeline client.
Related tools
Hive translates queries into a series of MapReduce jobs, and each query pays the startup overhead of those jobs, so Hive is not suitable for real-time use cases. Alternative tools inspired and influenced by Hive, such as Cloudera Impala and Spark SQL, have been getting more attention lately.