Hive does not fully support row-level updates; it has certain limitations in this area.
Almost every project needs incremental updates, and to achieve them in Hive we have to follow the steps below.
Scenario:
Assume the target table TableA has 50 records with ids 1, 2, 3, ..., 50.
The incremental table TableB has 10 records: 5 updated records (ids 1, 2, 3, 4, 5) and 5 new records (ids 51, 52, 53, 54, 55).
We need to load TableB into TableA, with the condition that the target table must not end up with any duplicates.
To achieve this, first find the records that exist only in the target table TableA, that is, the records with ids 6, 7, 8, ..., 50. Then union those records with all of TableB, so that what you load into the target table is TableB plus the TableA-only records: 45 + 10 = 55 records, with no duplicate ids.
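The steps above can be sketched in a small runnable example. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for Hive so the example runs anywhere, the `val` column and the `staging` table are hypothetical names that are not part of the scenario, and the join key is assumed to be `id`. The SELECT itself (a left outer join that keeps the TableA-only rows, followed by UNION ALL with TableB) is plain SQL that should carry over to HiveQL; in Hive the result would typically be written back with INSERT OVERWRITE TABLE rather than the delete and insert used here.

```python
import sqlite3

# Assumption: SQLite is used only as a stand-in for Hive.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: the scenario only mentions ids, "val" is added for illustration.
cur.execute("CREATE TABLE TableA (id INTEGER, val TEXT)")
cur.execute("CREATE TABLE TableB (id INTEGER, val TEXT)")

# Target table: 50 records with ids 1..50.
cur.executemany("INSERT INTO TableA VALUES (?, ?)",
                [(i, "old") for i in range(1, 51)])

# Incremental table: 5 updated records (1..5) and 5 new records (51..55).
incremental_ids = [1, 2, 3, 4, 5, 51, 52, 53, 54, 55]
cur.executemany("INSERT INTO TableB VALUES (?, ?)",
                [(i, "new") for i in incremental_ids])

# Step 1: rows that exist only in TableA (no matching id in TableB).
# Step 2: union them with every row of TableB.
MERGE_SQL = """
SELECT a.id, a.val
FROM TableA a
LEFT OUTER JOIN TableB b ON a.id = b.id
WHERE b.id IS NULL
UNION ALL
SELECT b.id, b.val
FROM TableB b
"""

# Materialize the merged result in a staging table, then replace the target.
cur.execute("CREATE TABLE staging AS " + MERGE_SQL)
cur.execute("DELETE FROM TableA")
cur.execute("INSERT INTO TableA SELECT id, val FROM staging")
conn.commit()

rows = cur.execute("SELECT id, val FROM TableA ORDER BY id").fetchall()
print(len(rows))  # 45 TableA-only rows + 10 TableB rows = 55
```

Going through a staging table keeps the read of TableA separate from the write to it, which is a cautious choice rather than a requirement stated in the scenario.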
If you face any problems loading incremental data into Hive, please leave a comment here.