Forked from vinodkc/Spark HWC integration - HDP 3 Secure cluster.md
Created February 12, 2019 11:27
Prerequisites:
>> Kerberized cluster
>> Hive Interactive Server (LLAP) enabled in Hive
>> The following details from Hive, needed by Spark:
spark.hadoop.hive.llap.daemon.service.hosts @llap0
spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
spark.datasource.hive.warehouse.metastoreUri thrift://c420-node3.squadron-labs.com:9083
Basic testing:
1) Create a table employee in Hive and load some data, e.g.:
Create table
----------------
CREATE TABLE IF NOT EXISTS employee ( eid int, name String, salary String, destination String)
COMMENT 'Employee details'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
Load a data.txt file with the following contents into HDFS:
---------------
1201,Gopal,45000,Technical manager
1202,Manisha,45000,Proof reader
1203,Masthanvali,40000,Technical writer
1204,Kiran,40000,Hr Admin
1205,Kranthi,30000,Op Admin
Copy the file to HDFS (adjust the local path as needed): hdfs dfs -put data.txt /tmp/data.txt
Then load it into the table:
LOAD DATA INPATH '/tmp/data.txt' OVERWRITE INTO TABLE employee;
2) kinit as the spark user, then run:
spark-shell --master yarn --conf "spark.security.credentials.hiveserver2.enabled=false" --conf "spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;principal=hive/[email protected]" --conf "spark.datasource.hive.warehouse.metastoreUri=thrift://c420-node3.squadron-labs.com:9083" --conf "spark.datasource.hive.warehouse.load.staging.dir=/tmp/" --conf "spark.hadoop.hive.llap.daemon.service.hosts=@llap0" --conf "spark.hadoop.hive.zookeeper.quorum=c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar
Note: spark.security.credentials.hiveserver2.enabled should be set to false for YARN client deploy mode, and to true (the default) for YARN cluster deploy mode. This setting is required on a Kerberized cluster.
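For comparison, a YARN cluster-mode launch would leave that setting at true. A minimal sketch (the application class com.example.MyApp and jar myapp.jar are placeholders; the HWC assembly path is the one used above):

```shell
# YARN cluster deploy mode: the driver runs on the cluster, so leave
# spark.security.credentials.hiveserver2.enabled at its default (true)
# so HiveServer2 delegation tokens can be obtained for the application.
spark-submit --master yarn --deploy-mode cluster \
  --conf "spark.security.credentials.hiveserver2.enabled=true" \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar \
  --class com.example.MyApp myapp.jar
```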
3) Run the following code in the Scala shell to view the table data:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
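HWC can also write a DataFrame back to a Hive managed table. A minimal sketch, assuming the same session as above (the target table name employee_copy is hypothetical):

```scala
import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession.HIVE_WAREHOUSE_CONNECTOR

val hive = HiveWarehouseSession.session(spark).build()

// Read the table through LLAP, then write the result to a new managed table.
val df = hive.executeQuery("select * from employee")
df.write
  .format(HIVE_WAREHOUSE_CONNECTOR)   // the HWC data source
  .option("table", "employee_copy")   // hypothetical target table
  .save()
```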
4) To apply the common properties by default, add the following settings to the Ambari Spark2 custom configuration:
spark.sql.hive.hiveserver2.jdbc.url=jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive;principal=hive/[email protected]
spark.datasource.hive.warehouse.metastoreUri=thrift://c420-node3.squadron-labs.com:9083
spark.datasource.hive.warehouse.load.staging.dir=/tmp/
spark.hadoop.hive.llap.daemon.service.hosts=@llap0
spark.hadoop.hive.zookeeper.quorum=c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181
5) spark-shell --master yarn --conf "spark.security.credentials.hiveserver2.enabled=false" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar
Note: the common properties are now read from the Spark default properties.
6) Run the following code in the Scala shell to view the Hive table data:
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
hive.execute("show tables").show
hive.executeQuery("select * from employee").show
7) To integrate HWC with Livy2:
a) Add the following property in Custom livy2-conf:
livy.file.local-dir-whitelist=/usr/hdp/current/hive_warehouse_connector/
b) Add hive-site.xml to /usr/hdp/current/spark2-client/conf on all cluster nodes.
c) Log in to Zeppelin and add the following to the livy2 interpreter settings:
livy.spark.hadoop.hive.llap.daemon.service.hosts @llap0
livy.spark.security.credentials.hiveserver2.enabled true
livy.spark.sql.hive.hiveserver2.jdbc.url jdbc:hive2://c420-node2.squadron-labs.com:2181,c420-node3.squadron-labs.com:2181,c420-node4.squadron-labs.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
livy.spark.sql.hive.hiveserver2.jdbc.url.principal hive/[email protected]
livy.spark.yarn.security.credentials.hiveserver2.enabled true
livy.spark.jars file:///usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.1.0-187.jar
d) Restart the livy2 interpreter.
e) In the first paragraph, add:
%livy2
import com.hortonworks.hwc.HiveWarehouseSession
val hive = HiveWarehouseSession.session(spark).build()
f) In the second paragraph, add:
%livy2
hive.executeQuery("select * from employee").show