java - How to select the optimal key in map reduce?


I am working with stock transaction log files. Each line denotes a trade transaction with 20 tab-separated values. I am using Hadoop to process this file and benchmark the trades. Right now, for each line I have to perform separate benchmark calculations, hence there is no need for a reduce function in MapReduce. In order to perform the benchmark calculation for each line, I have to query a Sybase database to obtain the standard values corresponding to that line. The database is indexed on two values of each line [trade ID, stock ID]. My question is: should I use the trade ID and stock ID as the key in my MapReduce program, or should I choose some other value or combination of values as the key?

So, for each line of input, you're going to query a database and then perform the benchmark calculations for each line separately. After you finish the benchmark calculations, you are going to output each line with its benchmark value.

In this case, you can either not use a reducer at all, or use an identity reducer.

So your map function will read in a line, fire a query to the Sybase database for the standard values, and then perform the benchmark calculations. Since you want to output each line with its benchmark value, you can have the map function output the line as the key and the benchmark value as the value, i.e. <line, benchmark value>.

Your map function would look like this (I'm assuming the benchmark value is an integer):

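    // Assumes the default TextInputFormat: key = byte offset, value = the line.
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();  // this will be the key in the final output

        /* perform operations on the line */

        /* standard values = <return value of the Sybase query>; */

        /* perform benchmark calculations and obtain the benchmark value */

        context.write(new Text(line), new IntWritable(benchmarkValue));
    }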
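To make the per-line step concrete, here is a minimal sketch in plain Java (no Hadoop dependency) of what the body of the map function might do. With no grouping in a reducer, the trade ID and stock ID are only needed to look up the standard values, while the line itself is what gets written out. Everything specific here is an assumption for illustration: the column positions, the `TradeBenchmark` class and its method names, the "price minus standard value" formula, and the in-memory `Map` that stands in for the Sybase query.

```java
import java.util.HashMap;
import java.util.Map;

public class TradeBenchmark {
    // Hypothetical column positions; the question only says each line
    // has 20 tab-separated values.
    static final int TRADE_ID_COL = 0;
    static final int STOCK_ID_COL = 1;
    static final int PRICE_COL = 2;
    static final int EXPECTED_FIELDS = 20;

    /** Builds a composite lookup key mirroring the database index [trade ID, stock ID]. */
    static String lookupKey(String[] fields) {
        return fields[TRADE_ID_COL] + "|" + fields[STOCK_ID_COL];
    }

    /**
     * Computes a hypothetical benchmark for one line: trade price minus the
     * standard value. The map stands in for the Sybase query result.
     */
    static int benchmark(String line, Map<String, Integer> standardValues) {
        // Limit -1 keeps trailing empty fields so the count check is reliable.
        String[] fields = line.split("\t", -1);
        if (fields.length != EXPECTED_FIELDS) {
            throw new IllegalArgumentException(
                    "expected " + EXPECTED_FIELDS + " fields, got " + fields.length);
        }
        Integer standard = standardValues.get(lookupKey(fields));
        if (standard == null) {
            throw new IllegalStateException("no standard value for " + lookupKey(fields));
        }
        return Integer.parseInt(fields[PRICE_COL]) - standard;
    }

    public static void main(String[] args) {
        Map<String, Integer> standardValues = new HashMap<>();
        standardValues.put("T1|S9", 100);

        // Build a sample line: three meaningful fields plus 17 filler fields.
        StringBuilder line = new StringBuilder("T1\tS9\t105");
        for (int i = 3; i < EXPECTED_FIELDS; i++) {
            line.append("\tx");
        }

        // Mirrors the <line, benchmark value> output of the map function.
        System.out.println(line + "\t" + benchmark(line.toString(), standardValues));
    }
}
```

In a real mapper, the lookup would be a database call. Opening the connection once per task rather than once per line is usually worth considering, though that is a design choice outside what the question specifies.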
