Book

Apache Hive Essentials

Author: Dayong Du
Level: beginner
Language: English

Apache Hive is essentially a layer on top of Hadoop that lets you write queries in the SQL everyone knows and loves; those queries are then translated into MapReduce jobs. Many BI tools ship with a Hive ODBC driver for accessing Big Data systems.

To quote the book:

«Hive is a standard for SQL queries over petabytes of data in Hadoop. It provides SQL-like access for data in HDFS making Hadoop to be used like a warehouse structure. The Hive Query Language (HQL) has similar semantics and functions as standard SQL in the relational database so that experienced database analysts can easily get their hands on it. Hive's query language can run on different computing frameworks, such as MapReduce, Tez, and Spark for better performance. Hive's data model provides a high-level, table-like structure on top of HDFS. It supports three data structures: tables, partitions, and buckets, where tables correspond to HDFS directories and can be divided into partitions, which in turn can be divided into buckets. Hive supports a majority of primitive data formats such as TIMESTAMP, STRING, FLOAT, BOOLEAN, DECIMAL, DOUBLE, INT, SMALLINT, BIGINT, and complex data types, such as UNION, STRUCT, MAP, and ARRAY».
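
To make the table, partition, and bucket hierarchy from the quote more concrete, here is a minimal Python sketch of how such a layout could map onto HDFS paths. It is a toy model, not Hive code: the class `HiveTableLayout`, the table name `employee`, the partition columns `year` and `month`, and the warehouse root are all assumptions chosen for illustration. The bucket rule shown (key modulo bucket count) is a simplification that only mirrors Hive's behavior for non-negative integer keys.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HiveTableLayout:
    """Toy model of a table -> partition -> bucket hierarchy on HDFS."""

    warehouse_dir: str      # assumed warehouse root directory
    table: str              # hypothetical table name
    partition_cols: tuple   # partition column names, outermost first
    num_buckets: int        # number of buckets inside each partition

    def partition_dir(self, **values) -> str:
        # Each partition column becomes one "column=value" subdirectory
        # nested under the table directory.
        parts = [f"{col}={values[col]}" for col in self.partition_cols]
        return "/".join([self.warehouse_dir, self.table, *parts])

    def bucket_index(self, key: int) -> int:
        # Simplified rule for non-negative integer keys only:
        # a row lands in bucket (key mod num_buckets).
        if key < 0:
            raise ValueError("this sketch only handles non-negative keys")
        return key % self.num_buckets


# Hypothetical table partitioned by year and month, with 4 buckets.
layout = HiveTableLayout(
    warehouse_dir="/user/hive/warehouse",
    table="employee",
    partition_cols=("year", "month"),
    num_buckets=4,
)

print(layout.partition_dir(year=2015, month=2))
print(layout.bucket_index(10))
```

The point of the sketch is the nesting: a table is a directory, each partition is a subdirectory named after its column values, and rows inside a partition are spread across a fixed number of buckets by their key.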