To work with JSON data in hive we need to use some JSON serde .

JsonSerDe - HCatalog

https://github.com/apache/hcatalog/blob/branch-0.5/core/src/main/java/org/apache/hcatalog/data/JsonSerDe.java

Usage :

  1. Download hcatalog-core jar .

    Path : http://mvnrepository.com/artifact/org.apache.hcatalog/hcatalog-core/0.11.0

  2. Copy this jar to lib folder $HIVE_HOME (where hive is installed ). If you are not adding to lib folder you have to run following command on hive client/shell before interacting with JSON Data. This command will add to auxillary hive path .

    Examples :

cp /home/sohanvir/Downloads/hcatalog-core-0.11.0.jar /usr/local/hive/lib/

or

add jar /home/sohanvir/Downloads/hcatalog-core-0.11.0.jar;

  1. Create Table :

CREATE TABLE employee (id string,name string,age string ) ROW FORMAT SERDE 'org.apache.hcatalog.data.JsonSerDe' STORED AS TEXTFILE;

  1. Create some file jsonData.txt at some location on your system . And Following Records . Complete Json Record should be in single line . NO Sepration by /n .

    Example :

     File Location : /home/sohanvir/hive-tutorial/jsonData.txt
    
     Data in file  : 
             {"id":1,"age":43,"name":"sohi"}
             {"id":2,"age":3,"name":"dj"}
    
  2. Load data to table :

    LOAD DATA LOCAL INPATH '/home/sohanvir/hive-tutorial/jsonData.txt' OVERWRITE INTO TABLE employee ;

  3. Fetch Data :

    select * from employee ;

Note : It is not necessary that the order of columns of table in hive should be same as order of fields in json record , as data will be accessed by keys in json record not by order.

Resources :

  1. https://github.com/apache/hcatalog/blob/branch-0.5/core/src/main/java/org/apache/hcatalog/data/JsonSerDe.java
  2. http://ottomata.org/tech/too-many-hive-json-serdes/
  3. https://github.com/rcongiu/Hive-JSON-Serde
  4. http://stackoverflow.com/questions/11479247/how-do-you-make-a-hive-table-out-of-json-data

Issues Found :

1. Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: org.codehaus.jackson.JsonParseException: Unexpected end-of-input: expected close marker for OBJECT (from [Source:java.io.ByteArrayInputStream@1d5a594e; line: 1, column: 0])

Sol ==> http://stackoverflow.com/questions/30117572/cant-get-hive-to-accept-json-file


2. Null Pointer exception while working on json data in hive . Failed with exception java.io.IOException:java.lang.NullPointerException

Sol : Cause can be that table might not contain some column which actual records are having . Vice verse will not cause any issue ,that is if table has some column and records do not have any value for that then for that column value will be NULL .