Use HCatalog in HDInsight

HCatalog adds a lot of functionality when we want to reuse schemas between processing tools (Pig, MapReduce). It greatly simplifies data consumption by exposing the table definitions held in the Hive metastore to those tools. The HCatalog documentation covers its usage in detail. Now, let's look at this scenario: one developer created an external table in Hive, and another wants to use it in Pig.

For example, a Pig developer wants to use the HiveSampleTable. This is a sample table that is created when you create an HDInsight cluster. Without HCatalog, the Pig developer would have to know where the data is stored and how it is structured. First things first: she needs to instruct Pig to use HCatalog by passing this switch when Pig is called:

C:\apps\dist\hadoop-2.4.0.2.1.15.1-1234>%pig_home%\bin\pig -useHCatalog

The above command allows Pig to leverage HCatalog.

Then, she can declare a variable that points to the HiveSampleTable in Hive.

SampleTable = LOAD 'HiveSampleTable' USING org.apache.hive.hcatalog.pig.HCatLoader();
2015-09-11 23:01:59,308 [main] INFO hive.metastore - Trying to connect to metastore with URI thrift://headnodehost:9083
2015-09-11 23:01:59,391 [main] INFO hive.metastore - Connected to metastore.
2015-09-11 23:02:00,058 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS

Now, if she calls a describe command, here is what she gets:

grunt> DESCRIBE SampleTable;
2015-09-11 23:03:58,829 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
SampleTable: {clientid: chararray,querytime: chararray,market: chararray,deviceplatform: chararray,devicemake: chararray,devicemodel: chararray,state
le,sessionid: long,sessionpagevieworder: long}

We clearly see here that Pig is using the HCatalog metastore, since it has recognized the underlying file and its structure. Moving forward, we can work with this variable as usual, without needing to know the file location or the schema.
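As a minimal sketch of what that interaction could look like, the script below filters and aggregates the table using only column names shown in the DESCRIBE output. It assumes Pig was started with the -useHCatalog switch. The aliases (UsOnly, ByPlatform, Counts) and the 'en-US' market value are illustrative assumptions, not something taken from the cluster.

```pig
-- Sketch only: assumes Pig was launched with -useHCatalog.
SampleTable = LOAD 'HiveSampleTable' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Keep rows for a single market ('en-US' is an assumed example value).
UsOnly = FILTER SampleTable BY market == 'en-US';

-- Group by device platform and count the rows in each group.
ByPlatform = GROUP UsOnly BY deviceplatform;
Counts = FOREACH ByPlatform GENERATE group AS deviceplatform, COUNT(UsOnly) AS row_count;

-- Print the result to the console.
DUMP Counts;
```

Note that the script never mentions a file path, a delimiter, or a column type: all of that comes from the metastore through HCatLoader.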

Happy HDInsight coding! :)

 
