DDL and DML in Hive: A Walkthrough with Examples
This article walks through Hive's DDL and DML statements with examples: the syntax of each statement, followed by sample sessions run in the Hive CLI.
Hive is built on top of Hadoop:
Hive data is stored on HDFS
Hive metadata can be stored in an RDBMS
1. DDL: Data Definition Language
1.1 Syntax for creating a database in Hive:
CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name
[COMMENT database_comment] -- database comment
[LOCATION hdfs_path] -- path of the database on HDFS; default: under /user/hive/warehouse/
[WITH DBPROPERTIES (property_name=property_value, ...)];
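1.2 Default storage path of a database created in Hive: /user/hive/warehouse/database_name.db (a directory named after the database, with a .db suffix)
1.3 default is the database that Hive provides out of the box.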
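The path rule above can be sketched in a few lines of Python. This is only an illustration, assuming hive.metastore.warehouse.dir is left at /user/hive/warehouse; the function names default_db_location and default_table_location are made up for this example and are not part of Hive.

```python
import posixpath

# Assumption: hive.metastore.warehouse.dir keeps its default value.
WAREHOUSE_DIR = "/user/hive/warehouse"


def default_db_location(db_name, warehouse_dir=WAREHOUSE_DIR):
    """Directory used for a database that was created without a LOCATION clause."""
    name = db_name.lower()  # Hive treats database names as case-insensitive
    if name == "default":
        # Tables of the built-in default database sit directly in the warehouse directory.
        return warehouse_dir
    return posixpath.join(warehouse_dir, name + ".db")


def default_table_location(db_name, table_name, warehouse_dir=WAREHOUSE_DIR):
    """Directory used for a managed table that was created without a LOCATION clause."""
    return posixpath.join(default_db_location(db_name, warehouse_dir), table_name.lower())


print(default_db_location("gw_db"))              # /user/hive/warehouse/gw_db.db
print(default_table_location("default", "emp"))  # /user/hive/warehouse/emp
```

A database created with an explicit LOCATION does not follow this rule; its tables live under the path given in the statement.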
1.4 Syntax for dropping a database in Hive:
DROP (DATABASE|SCHEMA) [IF EXISTS] database_name [RESTRICT|CASCADE];
RESTRICT is the default. The CASCADE keyword forces the drop, removing the database's tables along with it.
DROP DATABASE IF EXISTS gw_db CASCADE;
By default, Hive refuses to drop a database that still contains tables. Drop the tables first, or the statement fails:
hive> drop database gw_db;
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database gw_db is not empty. One or more tables exist.)
hive> DROP DATABASE IF EXISTS gw_db CASCADE; -- adding the CASCADE keyword forces the database to be dropped
OK
Time taken: 2.292 seconds
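The difference between RESTRICT and CASCADE can be modelled with a small in-memory catalog. This is a toy sketch of the semantics described above, not Hive code: the catalog dictionary, the drop_database function and the DatabaseNotEmptyError class are all hypothetical names introduced for the illustration.

```python
class DatabaseNotEmptyError(Exception):
    """Raised when a non-empty database is dropped without CASCADE."""


def drop_database(catalog, name, if_exists=False, cascade=False):
    """Toy model of DROP DATABASE; catalog maps a database name to its set of tables.

    Returns True if a database was removed, False if it was absent and
    if_exists suppressed the error.
    """
    if name not in catalog:
        if if_exists:
            return False
        raise KeyError("Database does not exist: " + name)
    if catalog[name] and not cascade:
        # RESTRICT behaviour (the default): refuse while tables remain.
        raise DatabaseNotEmptyError(
            "Database " + name + " is not empty. One or more tables exist."
        )
    del catalog[name]  # with CASCADE the tables go away together with the database
    return True


catalog = {"gw_db": {"some_table"}}
try:
    drop_database(catalog, "gw_db")
except DatabaseNotEmptyError as err:
    print("FAILED:", err)
print(drop_database(catalog, "gw_db", if_exists=True, cascade=True))  # True
print(drop_database(catalog, "gw_db", if_exists=True))                # False
```

The last call shows why IF EXISTS is convenient in scripts: dropping a database that is already gone is reported as a no-op instead of an error.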
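1.5 Common Hive database commands:
create -- create a database
alter -- modify a database
drop -- drop a database
show databases; -- list all databases
desc database xxx; -- show information about a database
use -- switch to a database
2. DML: Data Manipulation Language
2.1 Syntax for creating a table in Hive: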
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name -- (Note: TEMPORARY available in Hive 0.14.0 and later)
[(col_name data_type [COMMENT col_comment], ... [constraint_specification])]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[SKEWED BY (col_name, col_name, ...) -- (Note: Available in Hive 0.10.0 and later)]
ON ((col_value, col_value, ...), (col_value, col_value, ...), ...)
[STORED AS DIRECTORIES]
[
[ROW FORMAT row_format]
[STORED AS file_format]
| STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)] -- (Note: Available in Hive 0.6.0 and later)
]
[LOCATION hdfs_path]
[TBLPROPERTIES (property_name=property_value, ...)] -- (Note: Available in Hive 0.6.0 and later)
[AS select_statement]; -- (Note: Available in Hive 0.5.0 and later; not supported for external tables)
CREATE [TEMPORARY] [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name
LIKE existing_table_or_view_name
[LOCATION hdfs_path];
data_type
: primitive_type
| array_type
| map_type
| struct_type
| union_type -- (Note: Available in Hive 0.7.0 and later)
primitive_type
: TINYINT
| SMALLINT
| INT
| BIGINT
| BOOLEAN
| FLOAT
| DOUBLE
| DOUBLE PRECISION -- (Note: Available in Hive 2.2.0 and later)
| STRING
| BINARY -- (Note: Available in Hive 0.8.0 and later)
| TIMESTAMP -- (Note: Available in Hive 0.8.0 and later)
| DECIMAL -- (Note: Available in Hive 0.11.0 and later)
| DECIMAL(precision, scale) -- (Note: Available in Hive 0.13.0 and later)
| DATE -- (Note: Available in Hive 0.12.0 and later)
| VARCHAR -- (Note: Available in Hive 0.12.0 and later)
| CHAR -- (Note: Available in Hive 0.13.0 and later)
array_type
: ARRAY < data_type >
map_type
: MAP < primitive_type, data_type >
struct_type
: STRUCT < col_name : data_type [COMMENT col_comment], ...>
union_type
: UNIONTYPE < data_type, data_type, ... > -- (Note: Available in Hive 0.7.0 and later)
row_format
: DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char]
[MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
[NULL DEFINED AS char] -- (Note: Available in Hive 0.13 and later)
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
: SEQUENCEFILE
| TEXTFILE -- (Default, depending on hive.default.fileformat configuration)
| RCFILE -- (Note: Available in Hive 0.6.0 and later)
| ORC -- (Note: Available in Hive 0.11.0 and later)
| PARQUET -- (Note: Available in Hive 0.13.0 and later)
| AVRO -- (Note: Available in Hive 0.14.0 and later)
| INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
constraint_specification:
: [, PRIMARY KEY (col_name, ...) DISABLE NOVALIDATE ]
[, CONSTRAINT constraint_name FOREIGN KEY (col_name, ...) REFERENCES table_name(col_name, ...) DISABLE NOVALIDATE ]
Commonly used primitive data types:
Numeric types: int bigint float double decimal
String type: string
2.2 Delimiters
Rows: \n (newline)
Columns: \001, which is displayed as ^A
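To make the default delimiters concrete, the following Python sketch writes and reads rows in the same layout: fields separated by the \001 control character (byte 0x01) and rows separated by a newline. The helper names encode_rows, decode_rows and caret are invented for this example.

```python
FIELD_DELIM = "\x01"  # \001 in octal, the default column delimiter
LINE_DELIM = "\n"     # default row delimiter


def encode_rows(rows):
    """Join each row's fields with \\001 and end every row with a newline."""
    return "".join(FIELD_DELIM.join(fields) + LINE_DELIM for fields in rows)


def decode_rows(text):
    """Split text back into rows and fields, skipping the empty piece after the last newline."""
    return [line.split(FIELD_DELIM) for line in text.split(LINE_DELIM) if line]


def caret(text):
    """Show the control character the way terminals and editors display it."""
    return text.replace(FIELD_DELIM, "^A")


rows = [["7839", "KING", "10"], ["7369", "SMITH", "20"]]
encoded = encode_rows(rows)
print(caret(encoded), end="")  # 7839^AKING^A10 and 7369^ASMITH^A20, one per line
print(decode_rows(encoded) == rows)  # True
```

Because \001 almost never occurs in ordinary text, it is a safe default. Files produced by other tools usually use tabs or commas, which is why the table in the next section declares FIELDS TERMINATED BY '\t' explicitly.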
2.3 Creating a table
Create the ruozedata_emp table:
hive> use ruozedata;
hive> create table if not exists ruozedata_emp
> (empno int, ename string, job string, mgr int, hiredate string, salary double, comm double, deptno int)
> ROW FORMAT DELIMITED
> FIELDS TERMINATED BY '\t' ;
OK
Time taken: 0.262 seconds
Inspect the ruozedata_emp table:
hive> desc formatted ruozedata_emp;
OK
# col_name data_type comment
empno int
ename string
job string
mgr int
hiredate string
salary double
comm double
deptno int
# Detailed Table Information
Database: ruozedata -- database name
Owner: hadoop
CreateTime: Thu Jun 21 13:20:31 CST 2018 -- creation time
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://hadoop002:9000/ruozedata_03/ruozedata_emp -- path on HDFS
Table Type: MANAGED_TABLE -- the default table type is a managed (internal) table
Table Parameters:
transient_lastDdlTime 1529558431
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.184 seconds, Fetched: 34 row(s)
2.4 Loading a data file into a table:
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
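filepath: path of the file to load
[LOCAL]:
with LOCAL: load data from the local file system into the Hive table
without LOCAL: load data from HDFS into the Hive table
[OVERWRITE]:
with OVERWRITE: the loaded data replaces the data already in the table
without OVERWRITE: the loaded data is appended
[PARTITION (partcol1=val1, partcol2=val2 ...)]: the target partition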
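How the optional clauses combine can be sketched as a small statement builder in Python. This is an illustration only: build_load_data is a made-up helper, it does no quoting or escaping, and the partitioned table emp_by_day with its dt column in the second call is hypothetical.

```python
def build_load_data(filepath, table, local=False, overwrite=False, partition=None):
    """Assemble a LOAD DATA statement from the options described above.

    partition is an optional dict of partition column -> value.
    Sketch only: values are inserted as-is, so pass trusted input.
    """
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")          # read from the local file system instead of HDFS
    parts.append("INPATH '" + filepath + "'")
    if overwrite:
        parts.append("OVERWRITE")      # replace existing data instead of appending
    parts.append("INTO TABLE " + table)
    if partition:
        spec = ", ".join(col + "='" + str(val) + "'" for col, val in partition.items())
        parts.append("PARTITION (" + spec + ")")
    return " ".join(parts) + ";"


print(build_load_data("/home/hadoop/data/emp.txt", "ruozedata_emp",
                      local=True, overwrite=True))
# Hypothetical partitioned table, loaded from HDFS in append mode:
print(build_load_data("/data/emp_20180621.txt", "emp_by_day",
                      partition={"dt": "2018-06-21"}))
```

The first call reproduces the statement used in the session that follows; the second shows the HDFS, append and PARTITION combination.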
Load data into the ruozedata_emp table:
hive> LOAD DATA LOCAL INPATH '/home/hadoop/data/emp.txt' OVERWRITE INTO TABLE ruozedata_emp;
Loading data to table ruozedata.ruozedata_emp
Table ruozedata.ruozedata_emp stats: [numFiles=1, numRows=0, totalSize=652, rawDataSize=0]
OK
Time taken: 1.053 seconds
View the data in the ruozedata_emp table:
hive> select * from ruozedata_emp;
OK
7839 KING PRESIDENT NULL 1981-11-17 5000 NULL 10
7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30
7876 ADAMS CLERK 7788 1987-05-23 1100 NULL 20
7900 JAMES CLERK 7698 1981-12-03 950 NULL 30
7902 FORD ANALYST 7566 1981-12-03 3000 NULL 20
7934 MILLER CLERK 7782 1982-01-23 1300 NULL 10
7369 SMITH CLERK 7902 1980-12-17 800 NULL 20
7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
7521 WARD SALESMAN 7698 1981-02-22 1250 500 30
7566 JONES MANAGER 7839 1981-04-02 2975 NULL 20
7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
7698 BLAKE MANAGER 7839 1981-05-01 2850 NULL 30
7782 CLARK MANAGER 7839 1981-06-09 2450 NULL 10
7788 SCOTT ANALYST 7566 1987-04-19 3000 NULL 20
Time taken: 0.205 seconds, Fetched: 15 row(s)
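Hive applies the table schema when the file is read, not when it is loaded, so each text line is split and converted at query time. The Python sketch below imitates that for the ruozedata_emp columns. It is a simplification, assuming tab-separated fields as declared in the CREATE TABLE statement; parse_line and SCHEMA are names invented for this example, and real SerDe behaviour has more cases than shown here.

```python
# Column names and types of ruozedata_emp, mapped to Python converters.
SCHEMA = [
    ("empno", int), ("ename", str), ("job", str), ("mgr", int),
    ("hiredate", str), ("salary", float), ("comm", float), ("deptno", int),
]


def parse_line(line, schema=SCHEMA, delim="\t"):
    """Split one text line and convert each field; bad or missing values become None."""
    fields = line.rstrip("\n").split(delim)
    row = {}
    for i, (name, cast) in enumerate(schema):
        if i >= len(fields):
            row[name] = None          # fewer fields than columns: treated as NULL
            continue
        try:
            row[name] = cast(fields[i])
        except ValueError:
            row[name] = None          # text that is not a valid number: treated as NULL
    return row


line = "7839\tKING\tPRESIDENT\tNULL\t1981-11-17\t5000\tNULL\t10"
row = parse_line(line)
print(row["ename"], row["mgr"], row["salary"], row["comm"])  # KING None 5000.0 None
```

This is also why LOAD DATA reports success even for a file that does not match the table: the mismatch only shows up as NULL values when the table is queried.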
List the files on HDFS:
[hadoop@hadoop002 app]$ hadoop fs -ls hdfs://hadoop002:9000/ruozedata_03/ruozedata_emp
18/06/21 13:51:49 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rwxr-xr-x 1 hadoop supergroup 652 2018-06-21 13:25 hdfs://hadoop002:9000/ruozedata_03/ruozedata_emp/emp.txt
View the contents of emp.txt:
[hadoop@hadoop002 app]$ hadoop fs -text hdfs://hadoop002:9000/ruozedata_03/ruozedata_emp/emp.txt
18/06/21 13:52:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
7839 KING PRESIDENT NULL 1981-11-17 5000 NULL 10
7844 TURNER SALESMAN 7698 1981-09-08 1500 0 30
7876 ADAMS CLERK 7788 1987-05-23 1100 NULL 20
7900 JAMES CLERK 7698 1981-12-03 950 NULL 30
7902 FORD ANALYST 7566 1981-12-03 3000 NULL 20
7934 MILLER CLERK 7782 1982-01-23 1300 NULL 10
7369 SMITH CLERK 7902 1980-12-17 800 NULL 20
7499 ALLEN SALESMAN 7698 1981-02-20 1600 300 30
7521 WARD SALESMAN 7698 1981-02-22 1250 500 30
7566 JONES MANAGER 7839 1981-04-02 2975 NULL 20
7654 MARTIN SALESMAN 7698 1981-09-28 1250 1400 30
7698 BLAKE MANAGER 7839 1981-05-01 2850 NULL 30
7782 CLARK MANAGER 7839 1981-06-09 2450 NULL 10
7788 SCOTT ANALYST 7566 1987-04-19 3000 NULL 20
[hadoop@hadoop002 app]$
2.5 Other ways to create a table
Create a new table from an existing one, copying both structure and data; this runs a MapReduce job:
CREATE table ruozedata_emp2 as select * from ruozedata_emp;
Create a new table from an existing one with the structure only and no data; this still runs a MapReduce job:
CREATE table ruozedata_emp3 as select * from ruozedata_emp where 1=2;
Create a new table from an existing one with the structure only and no data, without running a MapReduce job:
CREATE table ruozedata_emp4 like ruozedata_emp;
2.6 Renaming a table
ALTER TABLE table_name RENAME TO new_table_name;
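2.7 Deleting table data
DELETE FROM tablename [WHERE expression]; -- delete rows (Hive 0.14.0 and later, only on tables that support ACID transactions)
TRUNCATE TABLE table_name [PARTITION partition_spec]; -- remove all rows from the table or partition
2.8 Dropping a table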
DROP TABLE [IF EXISTS] table_name [PURGE];
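2.9 Common Hive table commands:
create -- create a table
alter -- modify a table
drop -- drop a table
show tables -- list all tables in the current database
show create table xxx; -- show the statement that creates table xxx
desc [formatted] -- show table information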