Forked from vinodkc/Spark HWC integration - HDP 3 Secure cluster.md
Created February 12, 2019 11:27
Prerequisites:
>> Kerberized cluster
>> Hive Interactive Server (LLAP) enabled in Hive
>> The following details from Hive, needed by Spark:
spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
spark.datasource.hive.warehouse.metastoreUri thrift://c420-node3.squadron-labs.com:9083
Basic testing:
1) Create a table employee in Hive and load some data, e.g.:
Create table
----------------
CREATE TABLE IF NOT EXISTS employee ( eid int, name String, salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
Load a data.txt file with the following contents into HDFS:
---------------
1201,Gopal,45000,Technical manager
1202,Manisha,45000,Proof reader
1203,Masthanvali,40000,Technical writer
1204,Kiran,40000,Hr Admin
1205,Kranthi,30000,Op Admin
Copy the file to HDFS (adjust the local path as needed): hdfs dfs -put data.txt /tmp/data.txt
Then load it into the table:
LOAD DATA INPATH '/tmp/data.txt' OVERWRITE INTO TABLE employee;
2) kinit as the spark user, then run:
spark-shell --master yarn --conf "spark.security.credentials.hiveserver2.enabled=false" --conf "spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;principal=hive/[email protected]" --conf "spark.datasource.hive.warehouse.metastoreUri=thrift://c420-node3.squadron-labs.com:9083" --conf "spark.datasource.hive.warehouse.load.staging.dir=/tmp/" --conf "spark.hadoop.hive.llap.daemon.service.hosts=@llap0" --conf "spark.hadoop.hive.zookeeper.quorum=c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar
Note: spark.security.credentials.hiveserver2.enabled should be set to false for YARN client deploy mode, and to true (the default) for YARN cluster deploy mode. This setting is required on a Kerberized cluster.
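For comparison, a YARN cluster-mode launch would leave that setting at true. A minimal sketch (the application class com.example.MyApp and jar myapp.jar are placeholders; the HWC assembly path is the one used above):

```shell
# YARN cluster deploy mode: the driver runs on the cluster, so leave
# spark.security.credentials.hiveserver2.enabled at its default (true)
# so HiveServer2 delegation tokens can be obtained for the application.
spark-submit --master yarn --deploy-mode cluster \
  --conf "spark.security.credentials.hiveserver2.enabled=true" \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar \
  --class com.example.MyApp myapp.jar
```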
3) Run the following code in the Scala shell to view the table data:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
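HWC can also write a DataFrame back to a Hive managed table. A minimal sketch, assuming the same session as above (the target table name employee_copy is hypothetical):

```scala
import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR

val hive = HiveWarehouseSession.session(spark).build()

// Read the table through LLAP, then write the result to a new managed table.
val df = hive.executeQuery("select * from employee")
df.write
  .format(HIVE_WAREHOUSE_CONNECTOR)   // the HWC data source
  .option("table", "employee_copy")   // hypothetical target table
  .save()
```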
4) To apply the common properties by default, add the following settings to the Ambari Spark2 custom configuration:
spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;principal=hive/[email protected]
spark.datasource.hive.warehouse.metastoreUri=thrift://c420-node3.squadron-labs.com:9083
spark.datasource.hive.warehouse.load.staging.dir=/tmp/
spark.hadoop.hive.llap.daemon.service.hosts=@llap0
spark.hadoop.hive.zookeeper.quorum=c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181
5) spark-shell --master yarn --conf "spark.security.credentials.hiveserver2.enabled=false" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar
Note: the common properties are now read from the Spark default properties.
6) Run the following code in the Scala shell to view the Hive table data:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
7) To integrate HWC with Livy2:
a) Add the following property in Custom livy2-conf:
livy.file.local-dir-whitelist=/usr/hdp/current/hive_warehouse_connector/
b) Add hive-site.xml to /usr/hdp/current/spark2-client/conf on all cluster nodes.
c) Log in to Zeppelin and add the following to the livy2 interpreter settings:
livy.spark.hadoop.hive.llap.daemon.service.hosts @llap0
livy.spark.security.credentials.hiveserver2.enabled true
livy.spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
livy.spark.sql.hive.hiveserver2.jdbc.url.principal hive/[email protected]
livy.spark.yarn.security.credentials.hiveserver2.enabled true
livy.spark.jars file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar
d) Restart the livy2 interpreter.
e) In the first paragraph, add:
%livy2
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
f) In the second paragraph, add:
%livy2
hive.executeQuery("select * from employee").show