hadoop - Block Replication Limits in HDFS


I'm rebuilding our servers, which host our region-servers and data nodes. When I take down a data node, after 10 minutes its blocks are re-replicated among the other data nodes, as they should be. We have 10 data-nodes, and I see heavy network traffic while blocks are being re-replicated. However, I'm only seeing about 500-600 Mbps of traffic per server (the machines all have gigabit interfaces), so it's not network-bound.

I'm trying to figure out what is limiting the speed at which the data-nodes send and receive blocks. Each data-node has six 7200 rpm SATA drives, and IO usage is low during this, peaking at 20-30% per drive. Is there a limit built into HDFS that restricts the speed at which blocks are replicated?

The rate of replication work is throttled by HDFS so that it does not interfere with regular cluster traffic when failures happen during normal cluster load.

The properties that control this are dfs.namenode.replication.work.multiplier.per.iteration (2), dfs.namenode.replication.max-streams (2) and dfs.namenode.replication.max-streams-hard-limit (4). The first controls the rate of work scheduled to a DN at every heartbeat that occurs, and the other two further limit the maximum parallel threaded network transfers done by a DataNode at a time. The values in () indicate their defaults. A description of each is available at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
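For reference, one way to check what your cluster is currently using for these keys is to query the effective configuration with the standard hdfs CLI; note this reads the local configuration files, so run it on the NameNode host:

    # Print the effective value of each replication-throttle property
    hdfs getconf -confKey dfs.namenode.replication.work.multiplier.per.iteration
    hdfs getconf -confKey dfs.namenode.replication.max-streams
    hdfs getconf -confKey dfs.namenode.replication.max-streams-hard-limit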

You can perhaps try increasing that set of values to (10, 50, 100) respectively to spruce up network usage (this requires a NameNode restart), but note that DN memory usage may increase as a result of more block information being propagated to it. A reasonable heap size for the DN role with these values is around 4 GB.
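As a sketch only, setting the values suggested above would look like this in hdfs-site.xml on the NameNode (the numbers are just the example from this answer, not tested recommendations), followed by a NameNode restart:

    <!-- hdfs-site.xml: allow more re-replication work per heartbeat and more parallel transfer streams -->
    <property>
      <name>dfs.namenode.replication.work.multiplier.per.iteration</name>
      <value>10</value>
    </property>
    <property>
      <name>dfs.namenode.replication.max-streams</name>
      <value>50</value>
    </property>
    <property>
      <name>dfs.namenode.replication.max-streams-hard-limit</name>
      <value>100</value>
    </property>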

P.S. I have not tried these values on production systems personally. You also do not want to max out the re-replication workload such that it affects regular cluster work, as recovery of 1 of 3 replicas may be of lesser priority than missing job/query SLAs due to lack of network resources (unless you have a really fast network that is under-utilised even during loaded periods). Try to tune it until you're satisfied with the results.

