Hive Overview

The Sisense ODBC Driver with SQL Connector for Apache Hive enables standard SQL-92 access directly to Apache Hive distributions. It efficiently maps SQL to HiveQL delivering full SQL application functionality and real-time analytic and reporting capabilities to viewers.

Downloading and Connecting the Sisense Hive ODBC Driver

  1. Download and install the Sisense Hive ODBC Driver.
  2. In ElastiCube Manager, click Add Data and then, Generic ODBC Driver.
  3. Select DSN. If a DSN file has already been created, select it in the dropdown list, and click Test Connection. If you need to add a DSN, see Adding a DSN. If your connection has been set up, you can click OK and review the tables generated by the Sisense Hive ODBC driver.
    ODBC PROPS
    OR
    Select Connection String (DSN-Less) and enter your Connection String. Click Test Connection.  For more information, see Connecting without a DSN.

Adding a DSN

If a DSN has not yet been created, you must add the DSN manually.

Note: To add a DSN you must run the Sisense ElastiCube Manager as an administrator.

To add a DSN:

  1. Click Add DSN.
    ODBC PROPS
  2. Select the System Data Source option. The created file applies to all users in a specific machine only. Click Next.
    CreateNewDataSource
  3. Select the Sisense Hive ODBC Driver, and click Next.
    Hive Drivers
  4. Click Finish.
    Hive Drivers1
  5. In the Sisense Hive ODBC Driver DSN Setup window, do the following:
    Hive Setup
FieldProcedure
Data Source NameEnter your name for the DSN that is to be displayed in the DSN dropdown list.
DescriptionEnter your description of the DSN.
Hive Server TypeSelect the server type used by your data source.

Note: If you are connecting through Apache ZooKeeper, then Hive Server 1 is not supported.
Service Discovery ModeSelect the Service Discovery Mode. There are two possible options you can select:

No Service Discovery: The driver connects to Hive without using the ZooKeeper service.

ZooKeeper: The driver discovers Hive Server 2 services via the ZooKeeper service.
Host(s)If you selected No Service Discovery above as the Service Discovery Mode, then enter the IP address or hostname of the Hive server.

OR

If you selected ZooKeeper as the Service Discovery Mode, enter a comma-separated list of ZooKeeper servers.

Use the following format, where zk_host is the IP address or hostname of the ZooKeeper server and zk_port is the number of the port that the ZooKeeper server uses: zk_host1:zk_port1,zk_host2:zk_port2 .
PortIf you selected No Service Discovery above as the Service Discovery Mode, then enter the number of the TCP port on which the Hive server is listening. Otherwise, do not type a value in the field.
DatabaseEnter the name of the database schema to use when a schema is not explicitly specified in a query. You can still issue queries on other schemas by explicitly specifying the schema in the query. To inspect your databases and determine the appropriate schema to use, type the show databases command at the Hive command prompt.
ZooKeeper NamespaceIf you selected ZooKeeper as the Service Discovery Mode, enter the namespace on ZooKeeper under which Hive Server 2 znodes are added. Otherwise, do not type a value in the field.
AuthenticationSome Hive servers are configured to require authentication for access. To connect to a Hive server, you must configure the Sisense Hive ODBC Driver to use the authentication mechanism that matches the access requirements of the server and provides the necessary credentials

Hive Server 1 does not support authentication.

Most default configurations of Hive Server 2 require User Name authentication. To verify the authentication mechanism that you need to use for your connection, check the configuration of your Hadoop/Hive distribution.
MechanismIf your Hive server is configured to use authentication, select the appropriate authentication mechanism and provide the necessary credentials and authentication settings in the fields below the list.
RealmThis field is available when Kerberos is selected as your authentication mechanism.

Enter the realm of the Hive Server 2 host.

If your Kerberos configuration already defines the realm of the Hive Server 2 host as the default realm, then you do not need to configure this option.
Host FQDNThis field is available when Kerberos is selected as your authentication mechanism.

Enter the HS2HostFQDN key to the fully qualified domain name of the Hive Server 2 host.
Service NameThis field is available when Kerberos is selected as your authentication mechanism.

Enter the HS2KrbServiceName key to the service name of the Hive Server 2 host.
User NameThis field is available when User Name, User Name and Password, or Windows Azure is selected as your authentication mechanism.

The user name that you use to access Hive Server 2.
PasswordThis field is available when User Name and Password or Windows Azure is selected as your authentication mechanism.

The password corresponding to the user name that you provided in the User Name field (the UID key).
Delegation UIDEnter the name of the user to be delegated in the Delegation UID field.

The Delegation UID represents a user that is different than the authenticated user for the connection.

This option is applicable only when connecting to a Hive Server 2 instance that supports this feature.
Thrift TransportSelect the transport protocol to use in the Thrift layer. For Hive Server 1, select Binary. When using an authentication mechanism, the Binary transport protocol is not supported.

For Hive Server 2, select SASL.

If you select HTTP, the HTTP Options button is enabled. See Configuring HTTP Options for more information.

To configure client-server verification over SSL, click SSL Options. See SSL Options for more information.
  1. (Optional) To configure advanced driver options, click Advanced Options. For more information, see Advanced Options.
  2. (Optional) To configure logging behavior for the driver, click Logging Options. For more information, see Logging Options.
  3. Click Test. A pop window is displayed that indicates the connection was successful.
  4. Click OK.
  5. In the ODBC Connectivity Properties window, click Test Connection.
  6. Click OK to save and close the ODBC Data Source Administrator.

Configuring HTTP Options

You can configure options such as custom headers when using the HTTP transport
protocol in the Thrift layer.

To configure HTTP options:

  1. In the Sisense Hive ODBC Driver DSN Setup window, select HTTP as the HTTP transport protocol from the Thrift Transport list.
  2. Click HTTP Options. The HTTP Properties window is displayed.
    HTTP Properties
  3. In the HTTP Path field, enter the partial URL corresponding to the Hive server.
  4. Click Add. The Edit Property window is displayed.
    EditProperties
  5. In the Key and Value fields, enter the appropriate keys and values and click OK.
  6. After you add an HTTP header, you can edit or delete the header by clicking Editor Delete.
  7. Click OK.

SSL Options

You can configure verification between the client and the Hive server over SSL.

To configure SSL verification:

  1. In the Sisense Hive ODBC Driver DSN Setup window, click SSL Options. The SSL Options window is displayed.
    HiveSSLProperties
  2. Select Enable SSL.
  3. Select the Allow Self-signed Server Certificate checkbox to allow self-signed certificates from the server.
  4. Select the Allow Common Name Host Name Mismatch checkbox to allow the common name of a CA-issued SSL certificate to not match the host name of the Hive server.
  5. In the Trusted Certificates field, enter the path or click Browse and navigate to the location of your PEM file if you want to the driver to load your SSL certificates when verifying the server.
    OR
    To use the trusted CA certificates PEM file that is installed with the driver, leave the Trusted Certificates field empty.
  6. Select the Two-Way SSL checkbox if you want to configure two-way SSL verification. The Client authentication fields are enabled.
  7. In the Client Certificate File field, enter the path or click Browse and navigate to the location of your PEM containing the client’s certificate.
  8. In the Client Private Key File field, enter the path or click Browse and navigate to the location of your file containing the client’s private key.
  9. In the Client Private Key Password field, if the private key file is protected with a password, enter the password. Select the Save Password (Encrypted) checkbox to save the password.
  10. Click OK.

Advanced Options

You can configure advanced driver by clicking Advanced Options in the Sisense Hive ODBC Driver DSN Setup window.
This displays the Advanced Options window from which you can configure the following options:
HiveAdvanced

FieldDescription
Use Native QuerySelect to disable the SQL Connector feature.

Select Use Native Query to disable the SQL Connector feature and allow the driver to execute HiveQL directly.

When Use Native Query is selected, the driver does not transform the queries emitted by an application, so the native query is used.

When Use Native Query is disabled, the driver transforms the queries emitted by an application and converts them into an equivalent form for Hive.
Fast SQLPrepareSelect to defer query execution to SQLExecute.
Driver Config Take PrecedenceSelect to allow driver-wide configurations to take precedence over connection and DSN settings.
Use Async ExecSelect to use the asynchronous version of the API call against Hive for executing a query.

This option is applicable only when connecting to a Hive cluster running Hive 0.12.0 or later.
Get Tables With QuerySelect to retrieve the names of tables in a database by using the SHOW TABLES query.

This option is applicable only when connecting to Hive Server 2.
Unicode SQL character typesSelect to enable the driver to return SQL_WVARCHAR instead of SQL_VARCHAR for STRING and VARCHAR columns, and SQL_WCHAR instead of SQL_CHAR for CHAR columns.
Show System TableSelect to enable the driver to return the hive_system table for catalog function calls such as SQLTables and SQLColumns.
Use only SSPISelect to handle Kerberos authentication using the SSPI plugin instead of Kerberos by default.
Rows fetched per blockEnter the number of rows to be fetched per block.
Default string column lengthEnter the maximum data length for STRING columns.
Binary column lengthEnter the maximum data length for BINARY columns.
Decimal columnEnter the maximum number of digits to the right of the decimal point for numeric data types.
Async Exec Poll Interval (ms)Enter the time in milliseconds between each poll for the query execution status. This option is applicable only to HDInsight clusters.
Socket Timeout (s)Define the amount of time until the connection times out. The default amount of time is 30 seconds.
  1. Click Server Side Properties to configure the driver to apply configuration properties to the Hive server. For more information, see Configuring Server Side Properties.
  2. Click OK.

Configuring Server-Side Properties

You can use the driver to apply configuration properties to the Hive server.
The Sisense ODBC Driver with SQL Connector for Apache Hive allows you to set server-side properties via a DSN. Server-side properties specified in a DSN affect only the connection that is established using the DSN.

To create a server-side property:

  1. In the Sisense Hive ODBC Driver DSN Setup window, click Advanced Options > Side Server Properties. The Server Side Properties window is displayed.
    Hive Server Side Properties
  2. Click Add. The Edit Property window is displayed.
    EditProperties
  3. In the Key and Value fields, enter the appropriate keys and values and click OK.
    For a list of all Hadoop and Hive server-side properties that your implementation supports, type set -v at the Hive CLI command line or Beeline. You can also execute the set -v query after connecting using the driver. After you add your properties, you can edit or delete the properties by clicking Edit or Delete.
  4. Click OK.
  5. Select the Apply properties with queries checkbox to configure the driver to apply each server-side property by executing a query when opening a session to the Hive server.
    OR
    Clear the Apply Server Side Properties with Queries checkbox to configure the driver to use a more efficient method for applying server-side properties that does not involve additional network round-tripping.
    The more efficient method is not available for Hive Server 1, and it might not be compatible with some Hive Server 2 builds. If the server-side properties do not take effect when the checkbox is clear, then select the checkbox.
  6. Select the Convert Key Name to Lower Case checkbox to force the driver to convert server-side property key names to all lower case characters.
  7. Click OK.

Logging Options

You can configure logging options by clicking Logging Options in the Sisense Hive ODBC Driver DSN Setup window. This displays the Logging Options window.

HiveLog

The ODBC Data Source Administrator provides tracing functionality, which you can activate to help troubleshoot issues.

Important: Only enable logging long enough to capture an issue. Logging decreases performance
and can consume a large quantity of disk space.

The driver allows you to set the amount of detail included in log files. The table below  lists the logging levels provided by the Sisense ODBC Driver with SQL Connector for Hive, in order from least verbose to most verbose.

In the Logging Options window, you can configure the following information:

FieldDescription
Log LevelSelect the Log Level. There are seven possible options:

LOG_OFF: Disables all logging.
LOG_FATAL: Logs very severe error events that lead the driver to abort.
LOG_ERROR: Logs error events that might still allow the driver to continue running.
LOG_WARNING:Logs potentially harmful situations.
LOG_INFO: Logs general information that describes the progress of the driver.
LOG_DEBUG: Logs detailed information that is useful for debugging the driver.
LOG_TRACE: Logs more detailed information than the DEBUG level.
Log PathEnter the full path to the folder where you want to save log files.
OR
Click Browse and select the folder where you want to save log files.
Log NamespaceEnter a name for the log.

Connecting without a DSN

Some client applications provide support for connecting to a data source using a driver without a DSN. To configure a DSN-less connection, you can use a connection string.

The following is an example of a connection string for a DSN-less connection:

 

Driver=DriverNameOrFile;HOST=MyHiveServer;PORT=PortNumber; Schema=DefaultSchema;HiveServerType=ServerType

 

The table below provides a list of possible keys you can add to your string and their descriptions:

Key NameDescription
DriverNameOrFileThe absolute path of the shared object file for the driver.
MyHiveServerThe IP address or host name of the Hive server.
PortNumberThe number of the port that the Hive server uses.
DefaultSchemaThe database schema to use when a schema is not explicitly specified in a query.
ServerTypeThe server type, either 1 (for Hive Server 1) or 2 (for Hive Server 2).

Adding Hive Tables to your Project

After setting up the DSN or the DSN-less connection, the Add table from Hive Tables window is displayed.

From this window, you add your Hive tables to your ElastiCube Manager. In addition, you can view the SQL syntax in the Query Preview section and click Edit to customize it.

To add Hive Tables to your Project:

  1. Connect to Hive via the Sisense ODBC Tool.
  2. Add your data source.
  3. In the Add table from Hive Tables window, select the table you want to add to the ElastiCube Manager.
  4. Click Add. The selected tables are added to your ElastiCube Manager.