Cluster Benchmark - Preparation - All About Free

The Cloudera CDH-based cluster is now deployed. Its current configuration is as follows:

  • CPU: i7-4790K @ 4.00GHz
  • MEM: 16GB
  • HDD: 1TB
  • NET: 1000M LAN

The cluster has 4 hosts in total, running services such as Hadoop, PostgreSQL, MongoDB, Spark, and Hive.

Before running the cluster benchmark, some preparation is needed, such as setting up the tools and the data.

Tools

  • HiBench, used mainly to compare MapReduce and Spark performance and to tune the configuration based on the results

Datasets

  • The datasets bundled with HiBench
  • Self-generated electricity meter consumption data in 2 GB, 20 GB, and 200 GB sizes

    Name,Date,T1,T2,T3,T4,T5,T6,T7,T8,T9,T10,T11,T12,T13,T14,T15,T16,T17,T18,T19,T20,T21,T22,T23,T24
    tpm1_R1-12-47-1_tm_1_7c3c3e39dd7a9d33,2006-01-01,3.1272327499999997,10.394874999999999,5.391951000000001,10.32085,10.383375000000001,10.424949999999999,0.8456075000000001,3.4217275000000003,3.3851625,3.3802950000000003,0.911879,0.90341175,0.87138125,3.2868905,0.82968875,5.8568282499999995,6.073462499999999,6.343275,7.5142075,6.383839999999999,8.80295,6.207680000000001,10.90155,3.2379179999999996
    tpm1_R1-12-47-1_tm_2_7c3c3e39dd7a9d33,2006-01-01,0.41440199999999994,0.370113,0.35526250000000004,0.3458965,0.35339425,0.39913449999999995,0.53686825,0.6503465,0.61882925,0.6125167499999999,0.610577,0.6083605,1.8160477499999996,0.5580125,0.5445135,0.57932425,0.7016105,0.90579425,0.9704775,0.93271975,0.89949725,0.8396155,0.70478725,0.5901097499999999
    tpm2_R1-12-47-1_tm_2_7c3c3e39dd7a9d33,2006-01-01,0.60541825,0.627177,0.5498865,0.6200257499999999,0.6303885,0.6933225000000001,0.8008432499999999,1.01453125,0.8872774999999999,0.9067015,0.9040405,0.8735145,0.8466045,0.8330110000000001,0.8138880000000001,0.8343885,1.0614575,1.3469475,1.43749,1.384125,1.3378824999999999,1.2823624999999998,1.065415,0.79980075
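
The generator I used is not shown in this post, so the following is only a minimal sketch of how rows in the same layout (a name, a date, and 24 hourly readings T1 to T24) could be produced. The meter IDs, the value range, and the function names are all hypothetical and chosen purely for illustration.

```python
import csv
import io
import random

# Same column layout as the sample above: Name, Date, T1..T24.
HEADER = ["Name", "Date"] + [f"T{hour}" for hour in range(1, 25)]


def meter_rows(meter_ids, dates, seed=0):
    """Yield one row per (date, meter) with 24 random hourly readings.

    The uniform range 0.3..11.0 is an assumption loosely based on the
    sample rows; real consumption data would follow a daily pattern.
    """
    rng = random.Random(seed)
    for date in dates:
        for meter_id in meter_ids:
            readings = [round(rng.uniform(0.3, 11.0), 6) for _ in range(24)]
            yield [meter_id, date] + readings


def write_csv(out, meter_ids, dates, seed=0):
    """Write the header plus all rows to a file-like object; return the row count."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(HEADER)
    count = 0
    for row in meter_rows(meter_ids, dates, seed):
        writer.writerow(row)
        count += 1
    return count


# Small in-memory demo with hypothetical meter IDs.
buf = io.StringIO()
count = write_csv(buf, ["meter_a", "meter_b"], ["2006-01-01", "2006-01-02"])
lines = buf.getvalue().splitlines()
```

To reach a target size such as 2 GB, one would write to a real file and keep adding meters or dates until the file is large enough.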
    

HiBench Setup

Make sure Maven is installed on the machine. If it is not, download the Maven binary archive, extract it, and add its bin directory to the system PATH.

$ mvn --version
Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 2015-11-11T00:41:47+08:00)
Maven home: /opt/apache-maven
Java version: 1.8.0_77, vendor: Oracle Corporation
Java home: /usr/java/jdk1.8.0_77/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.13.0-83-generic", arch: "amd64", family: "unix"

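Because our cluster sits behind a proxy and has no direct Internet access, Maven needs its own proxy configuration. Without it, Maven cannot download dependencies and the build will not run.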

$ mkdir -p ~/.m2
$ cp /opt/apache-maven/conf/settings.xml ~/.m2/settings.xml  # Maven is installed in /opt/apache-maven
$ vi ~/.m2/settings.xml
# Find the proxies section, uncomment it, and fill in the proxy details.
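
For reference, the sketch below builds the kind of `<proxies>` block that ends up in settings.xml once the section is uncommented. The element names (`id`, `active`, `protocol`, `host`, `port`, `nonProxyHosts`) are the ones Maven's settings file uses; the host `proxy.example.com`, port 8080, and the ID `cluster-proxy` are placeholders, not our real proxy.

```python
import xml.etree.ElementTree as ET


def proxy_element(host, port, protocol="http",
                  proxy_id="cluster-proxy", non_proxy_hosts="localhost"):
    """Build one <proxy> entry for the <proxies> section of settings.xml.

    All argument values here are placeholders to be replaced with the
    details of the actual proxy.
    """
    proxy = ET.Element("proxy")
    fields = [
        ("id", proxy_id),
        ("active", "true"),
        ("protocol", protocol),
        ("host", host),
        ("port", str(port)),
        ("nonProxyHosts", non_proxy_hosts),
    ]
    for tag, value in fields:
        ET.SubElement(proxy, tag).text = value
    return proxy


proxies = ET.Element("proxies")
proxies.append(proxy_element("proxy.example.com", 8080))

# Serialize to text so it can be compared with the section in settings.xml.
xml_text = ET.tostring(proxies, encoding="unicode")
print(xml_text)
```

If the proxy requires authentication, Maven also accepts `username` and `password` elements inside the same `<proxy>` entry.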

Next, download the HiBench source. I downloaded the packaged source of the 5.0 release directly. After downloading and extracting it, enter the src directory and build.

$ cd HiBench-HiBench-5.0/src
$ mvn clean package -D spark1.6 -D MR2  # select the Spark and MapReduce versions, then build

Once the build finishes, configure HiBench:

$ cd ../conf  # conf sits next to src in the HiBench root directory
$ cp 99-user_defined_properties.conf.template 99-user_defined_properties.conf

Set the values to match the Hadoop and Spark versions in our cluster:

hibench.hadoop.home             /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/hadoop
hibench.spark.home              /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/lib/spark
hibench.hdfs.master             hdfs://Slave2:8020
hibench.spark.master            yarn-client
hibench.hadoop.executable       /opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/bin/hadoop
hibench.hadoop.version          hadoop2
hibench.hadoop.release          cdh5
hibench.hadoop.mapreduce.home   /opt/cloudera/parcels/CDH/jars
hibench.spark.version           spark1.6
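
Since a typo in this file only surfaces when a workload is launched, a quick sanity check can help. The sketch below assumes the format shown above (one whitespace-separated key/value pair per line, with `#` starting a comment) and checks that the keys used in this post are present. The `REQUIRED` list is just those keys, not an official list from HiBench, and the function names are my own.

```python
def parse_hibench_conf(text):
    """Parse 'key <whitespace> value' lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


# Keys taken from the configuration in this post (not an official list).
REQUIRED = [
    "hibench.hadoop.home",
    "hibench.spark.home",
    "hibench.hdfs.master",
    "hibench.spark.master",
    "hibench.hadoop.executable",
    "hibench.hadoop.version",
    "hibench.hadoop.release",
    "hibench.hadoop.mapreduce.home",
    "hibench.spark.version",
]


def missing_keys(props, required=REQUIRED):
    """Return the required keys that are absent from the parsed properties."""
    return [key for key in required if key not in props]


sample = """
# excerpt of 99-user_defined_properties.conf
hibench.hdfs.master             hdfs://Slave2:8020
hibench.spark.master            yarn-client
hibench.hadoop.version          hadoop2
"""
props = parse_hibench_conf(sample)
missing = missing_keys(props)
print(missing)
```

In practice the text would come from reading 99-user_defined_properties.conf, and an empty result means all of the keys above are set.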

Also set the memory and related parameters for MapReduce and Spark.
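
As a rough starting point for the Spark executor memory, here is a back-of-the-envelope calculation. It assumes Spark on YARN adds an overhead of max(384 MB, 10% of the executor heap) to each container, which is the default for the Spark version used here. The 4 GB reserved for the OS and the other services on each host, and the choice of 2 executors per host, are my own assumptions rather than measured values, and the calculation ignores YARN rounding containers up to its allocation increment.

```python
def executor_heap_mb(node_mem_mb, reserved_mb, executors_per_node,
                     overhead_fraction=0.10, min_overhead_mb=384):
    """Largest executor heap (MB) such that heap + overhead fits its container."""
    container_mb = (node_mem_mb - reserved_mb) // executors_per_node
    # Case 1: overhead is the fractional part, heap * (1 + fraction) <= container.
    heap = int(container_mb / (1 + overhead_fraction))
    # Case 2: the fractional overhead is below the minimum, so the minimum applies.
    if heap * overhead_fraction < min_overhead_mb:
        heap = container_mb - min_overhead_mb
    return max(heap, 0)


# 16 GB per host (from the hardware list), 4 GB reserved and
# 2 executors per host (both assumptions).
heap = executor_heap_mb(node_mem_mb=16 * 1024, reserved_mb=4 * 1024,
                        executors_per_node=2)
print(heap)
```

The result is only a first guess. The benchmark results are what should drive the actual tuning.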

Free /
Published under (CC) BY-NC-SA in categories technology