Hadoop Cascading framework: updating specific column data


I have a MongoDB collection that looks like this:

id   name  createtime  updatetime  age  country  verificationstatus
id1  abc   10-7-2013   10-7-2013   21   xxxx     initial_mail
id2  efg   9-7-2013    10-7-2013   22   xxxx     first_reminder
id3  hij   8-7-2013    10-7-2013   45   xxxx     initial_mail

I have a Cascading job that evaluates this collection, and I want to update the “verificationstatus” and “updatetime” columns by “id” without disturbing the other columns.

But in Cascading, if I set only these 2 columns, I lose the other column data. I am left with this:

id   updatetime  verificationstatus
id1  11-7-2013   blocked
id2  11-7-2013   second_reminder
id3  11-7-2013   first_reminder

SinkMode UPDATE works by updating transaction by transaction (whole records), not individual column data.

How can I approach this issue?

PS: A join or merge doesn’t work, since by Cascading's design the source and sink cannot point to the same collection.

option 1:

Write a Cascading Function that updates the 2 columns above, pass the original fields in the pipe into that function, and then use Fields.REPLACE to replace those columns with the new column values.
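
To make the idea behind Option 1 concrete, here is a minimal sketch in plain Java. It is not Cascading API code; it only models the data flow: the function sees the whole row and emits the whole row, so nothing is lost. The class and method names (ReplaceColumnsSketch, row, replaceColumns) are hypothetical, and the new status and update time are passed in as parameters because the post does not say how they are computed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java model of Option 1. This is not Cascading API code.
final class ReplaceColumnsSketch {

    // Hypothetical helper: builds an ordered row from alternating
    // column names and values.
    static Map<String, String> row(String... namesAndValues) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            r.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return r;
    }

    // Receives the whole row and returns a copy in which only the two
    // target columns are overwritten; every other column passes through.
    static Map<String, String> replaceColumns(Map<String, String> original,
                                              String newStatus,
                                              String newUpdateTime) {
        Map<String, String> result = new LinkedHashMap<>(original);
        result.put("verificationstatus", newStatus);
        result.put("updatetime", newUpdateTime);
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> id1 = row(
            "id", "id1", "name", "abc",
            "createtime", "10-7-2013", "updatetime", "10-7-2013",
            "age", "21", "country", "xxxx",
            "verificationstatus", "initial_mail");
        // Prints the full row with only the two columns changed.
        System.out.println(replaceColumns(id1, "blocked", "11-7-2013"));
    }
}
```

In a real job, the equivalent would presumably be a Function whose argument selector includes every field you want to keep, so that the tuple written to the sink is complete.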

option 2:

You could create 2 pipes: one with the original column data you want to keep, which includes the id field you mention in your post, and another pipe that updates the columns. Then use CoGroup to bring these datasets together.
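
The effect of Option 2 can be sketched the same way, again in plain Java rather than Cascading API code. One list stands in for the pipe of original rows and another for the pipe of updated columns; they are joined on "id". The names (JoinByIdSketch, row, joinById) are hypothetical, and keeping rows that have no matching update (a left join) is an assumption of this sketch, not something the post specifies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of Option 2. This is not Cascading API code.
final class JoinByIdSketch {

    // Hypothetical helper: builds an ordered row from alternating
    // column names and values.
    static Map<String, String> row(String... namesAndValues) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            r.put(namesAndValues[i], namesAndValues[i + 1]);
        }
        return r;
    }

    // Joins the original rows with the update rows on "id". Columns that
    // appear in an update row overwrite the original values; all other
    // columns are kept. Rows without a matching update pass through as-is.
    static List<Map<String, String>> joinById(List<Map<String, String>> originals,
                                              List<Map<String, String>> updates) {
        Map<String, Map<String, String>> updatesById = new HashMap<>();
        for (Map<String, String> u : updates) {
            updatesById.put(u.get("id"), u);
        }
        List<Map<String, String>> joined = new ArrayList<>();
        for (Map<String, String> o : originals) {
            Map<String, String> merged = new LinkedHashMap<>(o);
            Map<String, String> u = updatesById.get(o.get("id"));
            if (u != null) {
                merged.putAll(u);
            }
            joined.add(merged);
        }
        return joined;
    }

    public static void main(String[] args) {
        List<Map<String, String>> originals = new ArrayList<>();
        originals.add(row("id", "id2", "name", "efg", "createtime", "9-7-2013",
            "updatetime", "10-7-2013", "age", "22", "country", "xxxx",
            "verificationstatus", "first_reminder"));
        List<Map<String, String>> updates = new ArrayList<>();
        updates.add(row("id", "id2", "updatetime", "11-7-2013",
            "verificationstatus", "second_reminder"));
        // Prints the complete row for id2 with the two columns updated.
        System.out.println(joinById(originals, updates));
    }
}
```

The joined rows carry every column, so writing them to the sink no longer drops name, createtime, age, or country.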

