Using Hive on Spark

September 7, 2016

Running Hive on Spark rather than its default engine, MapReduce, can be noticeably faster. The easiest way to do this is to set the Hive execution engine to spark in your Hive session.

# Launch the Hive CLI on your queue
hive --hiveconf mapreduce.job.queuename=<your_queue>

-- Inside the Hive session, switch the execution engine to Spark
set hive.execution.engine=spark;

Next, you’ll want to set the number of executor instances, executor cores, and executor memory. For guidance on choosing values that fit your cluster, take a look at this documentation. Tuning these settings is essential to getting the job to run quickly. For example:

set spark.executor.instances=15;
set spark.executor.cores=4;
set spark.executor.memory=3g;

Then run a query just as you would for any Hive job (such as the examples earlier in this guide); it should now execute on Spark and, typically, finish faster.
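As a minimal sketch, a query like the one below would be submitted exactly as before; the table and column names here (web_logs, page, log_date) are illustrative placeholders, not tables from this guide.

-- Hypothetical example query; substitute your own table and columns
SELECT page, COUNT(*) AS hits
FROM web_logs
WHERE log_date = '2016-09-01'
GROUP BY page
ORDER BY hits DESC
LIMIT 10;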
