Mahout Optimization: Multithreading TopItems.getTopUsers() and TopItems.getTopItems()


We have the following system in place:
Number of users: ~500k
Number of items: ~100k

    UserSimilarity userSimilarity = new TanimotoCoefficientSimilarity(dataModel);
    UserNeighborhood neighborhood = new NearestNUserNeighborhood(neighborhoodSize, userSimilarity, dataModel);
    GenericBooleanPrefUserBasedRecommender recommender = new GenericBooleanPrefUserBasedRecommender(dataModel, neighborhood, userSimilarity);

With the above recommender we were getting an average response time of 600 ms with a neighbourhood size of 400.

We tried to bring this below 100 ms (it is an online engine), and we did achieve that by using custom multithreaded versions of TopItems.getTopUsers() and TopItems.getTopItems(), with the number of threads equal to the number of cores. Average time taken by these functions:
topUsers(): ~30-40 ms
topItems(): ~50-60 ms

However, when we tried to make many concurrent requests (even on the order of 25), the response time shoots up to seconds.

We can afford to precompute the neighbourhood for each user, but topItems() is still a clear bottleneck under concurrent requests.

Would you suggest a way to improve the response time for concurrent requests while using multithreading?

The fallback option is to store precomputed recommendations in a NoSQL DB. It is going to be a little expensive to precompute on a regular basis for users who are not active. We could pick the active users and precompute their recommendations more often than those of the not-so-active users.
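To make the "more often for active users" idea concrete, here is a minimal sketch of a refresh policy. Everything in it is an assumption for illustration: the class name `RefreshPolicy`, the activity tiers (1 day, 7 days) and the refresh intervals (1 hour, 1 day, 7 days) are hypothetical and would need tuning against real traffic. It uses only the JDK, not Mahout or any particular NoSQL client.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper: decides how often a user's stored recommendations
// should be recomputed, based on how recently the user was last seen.
final class RefreshPolicy {

    // Assumed tiers: seen within 1 day -> hourly, within 7 days -> daily, else weekly.
    static Duration refreshInterval(Instant lastSeen, Instant now) {
        Duration idle = Duration.between(lastSeen, now);
        if (idle.compareTo(Duration.ofDays(1)) <= 0) {
            return Duration.ofHours(1);
        }
        if (idle.compareTo(Duration.ofDays(7)) <= 0) {
            return Duration.ofDays(1);
        }
        return Duration.ofDays(7);
    }

    // True when the stored recommendations are at least as old as the user's refresh interval.
    static boolean isStale(Instant computedAt, Instant lastSeen, Instant now) {
        Duration age = Duration.between(computedAt, now);
        return age.compareTo(refreshInterval(lastSeen, now)) >= 0;
    }
}
```

A batch job could walk the user table, call `isStale` for each user and recompute only the stale entries, so inactive users cost one recomputation per week instead of one per hour.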

Any thoughts?

Yes, multi-threading does not increase the overall throughput of the system. It just means you can answer one request faster by bringing more threads to bear. When the number of concurrent requests equals the number of cores, it's back where it started, more or less; in fact, the overhead of threading may make it slower.
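A rough back-of-the-envelope model shows why. The sketch below assumes the work is purely CPU-bound, splits perfectly across threads, and has no threading overhead; the class name `ThroughputEstimate` and the 8-core figure used afterwards are hypothetical, and the 600 ms is the single-threaded response time from the question, treated here as CPU time per request.

```java
// Idealized capacity model: each request needs a fixed amount of CPU time in
// total, no matter how many threads that work is spread over.
final class ThroughputEstimate {

    // Upper bound on sustained requests per second for the whole machine.
    static double maxRequestsPerSecond(int cores, double cpuMillisPerRequest) {
        return cores * 1000.0 / cpuMillisPerRequest;
    }

    // Best-case latency when `concurrentRequests` arrive together and share
    // all cores fairly. With a single request this is cpuMillisPerRequest / cores.
    static double latencyUnderLoadMillis(double cpuMillisPerRequest, int cores, int concurrentRequests) {
        return cpuMillisPerRequest * Math.max(1, concurrentRequests) / cores;
    }
}
```

With the assumed 8 cores, `latencyUnderLoadMillis(600.0, 8, 1)` gives 75 ms for a lone request, while `latencyUnderLoadMillis(600.0, 8, 25)` gives 1875 ms, which is in the same range as the multi-second responses reported in the question. The throughput ceiling, `maxRequestsPerSecond(8, 600.0)`, is about 13 requests per second however the threads are arranged.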

Of course, you can try adding more machines and maintaining N instances of the service.

This is what you get when you're relying on a neighborhood-based model. The item-neighborhood versions have more levers to pull: you can control the sampling of the number of items considered. That can help.
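As an illustration of what capping the number of items considered means, here is a generic sketch that draws a uniform random sample of at most `maxCandidates` item IDs before any scoring happens. It is plain JDK code using reservoir sampling; `CandidateSampler` is a hypothetical name and this is not Mahout's own sampling API, whose knobs should be looked up in the Mahout documentation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: bounds the scoring work per request by sampling candidates.
final class CandidateSampler {

    // Returns all candidates if there are at most maxCandidates of them,
    // otherwise a uniform random subset of exactly maxCandidates (Algorithm R).
    static List<Long> sample(List<Long> candidates, int maxCandidates, Random rng) {
        if (maxCandidates < 0) {
            throw new IllegalArgumentException("maxCandidates must be >= 0");
        }
        if (candidates.size() <= maxCandidates) {
            return new ArrayList<>(candidates);
        }
        // Fill the reservoir with the first maxCandidates items.
        List<Long> reservoir = new ArrayList<>(candidates.subList(0, maxCandidates));
        // Item i replaces a random reservoir slot with probability maxCandidates / (i + 1).
        for (int i = maxCandidates; i < candidates.size(); i++) {
            int j = rng.nextInt(i + 1);
            if (j < maxCandidates) {
                reservoir.set(j, candidates.get(i));
            }
        }
        return reservoir;
    }
}
```

The trade-off is that scoring cost becomes bounded by `maxCandidates` rather than by the catalogue size, at the price of occasionally missing a good item that was not sampled.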

Beyond that, you need to look at models that are built to scale better. I favor matrix factorization-based techniques, which are better in this way.

