I am working with stock transaction log files. Each line represents one trade transaction as 20 tab-separated values. I am using Hadoop to process the file and perform benchmarking of the trades. Each line requires its own separate benchmark calculation, so there is no need for a reduce function in the MapReduce job. In order to perform the benchmark calculation for a line, I have to query a Sybase database to obtain the standard values corresponding to that line. The database is indexed on two values from each line: [trade ID, stock ID]. My question: should I use tradeId and stockId as the key in my MapReduce program, or should I choose some other value (or combination of values) as the key?
So, for each line of input, you are going to query the database and perform the benchmark calculations for that line separately. After you finish the benchmark calculations, you are going to output each line along with its benchmark value.
In that case, you can either not use a reducer at all, or use the identity reducer.
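To run with no reducer at all, you set the number of reduce tasks to zero in the driver; the map output is then written straight to HDFS, skipping the sort/shuffle phase entirely. A minimal driver sketch, assuming the Hadoop 2.x `org.apache.hadoop.mapreduce` API (the class names `BenchmarkDriver` and `BenchmarkMapper` are illustrative, not from the original post):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BenchmarkDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "trade benchmark");
        job.setJarByClass(BenchmarkDriver.class);
        job.setMapperClass(BenchmarkMapper.class); // your mapper class
        job.setNumReduceTasks(0);                  // map-only job: no reduce phase
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With `setNumReduceTasks(0)`, each mapper's output becomes a final output file (`part-m-NNNNN`), which is exactly what you want here since every line is processed independently.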
So your map function would read in a line, fire a query to the Sybase database for the standard values, and perform the benchmark calculations. Since you want to output each line along with its benchmark value, the map function should output the line as the key and the benchmark value as the value, i.e. <line, benchmarkValue>.
Your map function would look something like this (I'm assuming the benchmark value is an integer):
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // with TextInputFormat, the key is the byte offset and the value is the line
        String line = value.toString(); // this becomes the key in the final output

        /* parse the 20 tab-separated fields of the line */
        /* standardValues = <return value of the Sybase query>; */
        /* perform the benchmark calculations and obtain benchmarkValue */

        context.write(new Text(line), new IntWritable(benchmarkValue));
    }
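One practical note on the database query: opening a new connection to Sybase for every input record would be very expensive. A common pattern is to open one connection per mapper in `setup()`, reuse a prepared statement for every record, and close both in `cleanup()`. A sketch under that assumption (the JDBC URL, credentials, table, and column names are placeholders, not from the original post):

```java
import java.io.IOException;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class BenchmarkMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private Connection conn;
    private PreparedStatement stmt;

    @Override
    protected void setup(Context context) throws IOException, InterruptedException {
        try {
            // one connection per mapper task, reused for every record it processes
            conn = DriverManager.getConnection(
                    "jdbc:sybase:Tds:dbhost:5000/mydb", "user", "password");
            stmt = conn.prepareStatement(
                    "SELECT std_value FROM standards WHERE trade_id = ? AND stock_id = ?");
        } catch (SQLException e) {
            throw new IOException("could not connect to Sybase", e);
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        try {
            if (stmt != null) stmt.close();
            if (conn != null) conn.close();
        } catch (SQLException e) {
            throw new IOException(e);
        }
    }
}
```

Keep in mind that a large number of concurrent mappers all querying the same Sybase instance can become the bottleneck, so the number of map tasks may need to be tuned to what the database can handle.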