Hadoop Cluster Performance Testing
Disk reads:
# hdparm -tT --direct /dev/vdb1

/dev/vdb1:
 Timing O_DIRECT cached reads: 3286 MB in 2.00 seconds = 1613.15 MB/sec
 Timing O_DIRECT disk reads: 3000 MB in 3.01 seconds = 1022.49 MB/sec

Network IO
A point-to-point file copy between nodes averaged 101.6 MB/s.
iperf measured an average network throughput of about 110 MB/s.
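To set the disk and network figures side by side, the small sketch below converts the two network rates to Mbit/s and divides the hdparm O_DIRECT disk rate by each of them. It assumes "MB" means 10^6 bytes in all three tools, which may not hold exactly (hdparm and iperf can report 2^20-byte megabytes); the ratios themselves do not depend on that choice as long as the tools use the same unit. The function name compare_throughput is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: compare a local disk read rate with network rates.
# Arguments: disk MB/s, copy MB/s, iperf MB/s.
# Assumption: 1 MB = 10^6 bytes, so 1 MB/s = 8 Mbit/s.
compare_throughput() {
    awk -v d="$1" -v c="$2" -v i="$3" 'BEGIN {
        printf "copy:  %.1f Mbit/s\n", c * 8
        printf "iperf: %.1f Mbit/s\n", i * 8
        # How many times faster the local disk read was than each network figure.
        printf "disk/copy:  %.1fx\n", d / c
        printf "disk/iperf: %.1fx\n", d / i
    }'
}

# Figures copied from the hdparm, copy and iperf measurements above.
compare_throughput 1022.49 101.6 110
```

With these numbers the local O_DIRECT read rate comes out roughly ten times the network rate, and both network figures stay below 1,000 Mbit/s.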
Hadoop Benchmark
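Benchmark tools
There are quite a few Hadoop benchmark tools available online. Roughly, they are:
- Hadoop's built-in tests
- Intel's HiBench
- BigDataBench from the Chinese Academy of Sciences
- Berkeley's benchmark
- eBay's benchmark (I can't remember its name)
These are the better-known Hadoop benchmarks I have found so far. After narrowing things down, I planned to pick one of the first three. Each has its own strengths, but since this round only tests IO and I don't have root access to the cluster, I went with the least hassle: the tests bundled with Hadoop.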
Script
I wrote a small script that runs TestDFSIO from hadoop-test-mr1.jar.
#!/bin/sh
# The test jar is expected in the current directory.
jar_path=hadoop-test-mr1.jar
main_class=TestDFSIO

echo "Starting the Hadoop cluster test!"
echo "-------------------------------------------------------------"
echo "Cleaning the test directory!"
# -clean deletes the data that earlier TestDFSIO runs left in HDFS
hadoop jar "$jar_path" "$main_class" -clean

echo "Starting the tiny-file test!"
echo "-------------------------------------------------------------"
echo "Writing and reading 1000 files of 10 B each"
hadoop jar "$jar_path" "$main_class" -write -nrFiles 1000 -size "10B"
hadoop jar "$jar_path" "$main_class" -read -nrFiles 1000 -size "10B"
......
hadoop jar "$jar_path" "$main_class" -clean

echo "Starting the huge-file test!"
echo "-------------------------------------------------------------"
echo "Writing and reading 5 files of 100 GB each"
hadoop jar "$jar_path" "$main_class" -write -nrFiles 5 -size "100GB"
hadoop jar "$jar_path" "$main_class" -read -nrFiles 5 -size "100GB"

Test results
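Each run appends its results to TestDFSIO_results.log in the current directory. Below are the write and read results of the tiny-file test. The tiny "Total MBytes processed" value is simply 1000 files × 10 B = 10,000 B, i.e. 10000 / 1048576 ≈ 0.0095 MB.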
----- TestDFSIO ----- : write
           Date & time: Tue Apr 12 12:20:18 CST 2016
       Number of files: 1000
Total MBytes processed: 0.009536743
     Throughput mb/sec: 9.813281434897923E-5
Average IO rate mb/sec: 9.844686428550631E-5
 IO rate std deviation: 5.294680350263851E-6
    Test exec time sec: 184.055

----- TestDFSIO ----- : read
           Date & time: Tue Apr 12 12:23:37 CST 2016
       Number of files: 1000
Total MBytes processed: 0.009536743
     Throughput mb/sec: 0.0029361893978024937
Average IO rate mb/sec: 0.003687877906486392
 IO rate std deviation: 0.002046490931134166
    Test exec time sec: 184.024
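Since every run is appended to the same log, it can help to pull out the key numbers in one pass. The sketch below is a hypothetical helper (summarize_dfsio is not part of Hadoop) and assumes the field labels look exactly like the sample above. It prints one line per run, emitted when it reaches the "Test exec time sec" line, which is the last field of each run.

```shell
#!/bin/sh
# Hypothetical helper: one summary line per TestDFSIO run.
# Argument: path to the results log (defaults to TestDFSIO_results.log).
# Assumes "label: value" lines as in the sample log above.
summarize_dfsio() {
    awk -F': ' '
        /^----- TestDFSIO ----- :/ { op = $2 }
        /Number of files/          { files = $2 }
        /Total MBytes processed/   { mb = $2 }
        /Throughput mb\/sec/       { tput = $2 }
        /Test exec time sec/ {
            # Last field of a run: print the collected values.
            printf "%-5s files=%s MB=%s throughput=%s MB/s time=%ss\n", op, files, mb, tput, $2
        }
    ' "${1:-TestDFSIO_results.log}"
}
```

Run it in the directory that holds the log, for example summarize_dfsio TestDFSIO_results.log; with no argument it reads that file name by default.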