
Spark performance testing with the HiBench WordCount workload: run error

Background

  • Spark 2.3.1; the same applies to the Spark 2.2.x series
  • CentOS 7 x86_64, Java 1.8.0
  • HiBench master branch (7.0)

Steps

  1. Download and build HiBench (Maven 3.3.9):

    mvn -Dspark=2.2 -Dscala=2.11 clean package

  2. Configure each item as described on the official site (see SparkBench configuration).

  3. Run the data-generation script; the generated data scale is large:

    bin/workloads/micro/wordcount/prepare/prepare.sh

  4. Run the Spark wordcount workload:

    bin/workloads/micro/wordcount/spark/run.sh

Error

ERROR: Spark job com.intel.hibench.sparkbench.micro.ScalaWordCount failed to run successfully.

Error log

org.apache.spark.SparkException: Exception thrown in awaitResult:
    at ……
Caused by: java.io.IOException: Failed to send RPC 7038938719505164344 to /hostname:port: java.nio.channels.ClosedChannelException
    at ……
Caused by: java.nio.channels.ClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)
19/09/12 17:33:52 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
……
java.lang.IllegalStateException: Spark context stopped while waiting for backend
Exception in thread "main" java.lang.IllegalStateException: Spark context stopped while waiting for backend
	at org.apache.spark.scheduler.TaskSchedulerImpl.waitBackendReady(TaskSchedulerImpl.scala:669)
	……

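Analysis

The log contains many error messages, which makes the real fault hard to pin down. The one that stands out is Caused by: java.nio.channels.ClosedChannelException. Searching on that clue turns up two commonly suggested fixes (neither of which solved this case):

Option 1: increase the virtual memory

The virtual memory limit of a container is its allocated physical memory multiplied by yarn.nodemanager.vmem-pmem-ratio; for a container that receives the minimum allocation, that is yarn.scheduler.minimum-allocation-mb * yarn.nodemanager.vmem-pmem-ratio. If the container needs more virtual memory than this value, YARN reports "Killing container" and kills it.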
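As a rough worked example of the arithmetic above, the sketch below computes the virtual memory ceiling for a container from its physical memory allocation and the ratio. The function name vmem_limit_mb is hypothetical, and the inputs are simply the values from this article's yarn-site.xml example (2048 MB and 4.1), not measurements from a real cluster.

```python
def vmem_limit_mb(container_pmem_mb: float, vmem_pmem_ratio: float) -> float:
    """Virtual memory ceiling (MB) for a container of the given physical size."""
    return container_pmem_mb * vmem_pmem_ratio


# Values taken from this article's yarn-site.xml example (illustrative only).
limit = vmem_limit_mb(2048, 4.1)
print(f"vmem limit: {limit:.1f} MB")
```

With these inputs the ceiling comes out at about 8396.8 MB, so a 2048 MB container using more virtual memory than that would be a candidate for being killed.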

vim yarn-site.xml

<property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>8096</value>
        <description>Maximum memory available to each task, in MB (default 8192 MB)</description>
</property>
<property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>2048</value>
        <description>Minimum memory available to each task</description>
</property>
<property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>4.1</value>
</property>

If these settings are already reasonable (at or near their maximum values), however, this approach does not help.

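Option 2: disable the memory checks (not recommended)

This is a bit like burying your head in the sand. It also involves editing yarn-site.xml: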

<property> 
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
</property>

<property> 
    <name>yarn.nodemanager.vmem-check-enabled</name> 
    <value>false</value>
</property>

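These two parameters control whether a thread is started to check how much physical and virtual memory each task is using; a task that exceeds its allocation is killed outright. Both default to true. I tried this here and it had no effect: the error persisted.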

Solution

Key log

Note the message in the INFO part of the log:

 INFO ApplicationMaster: Final app status: FAILED, exitCode: 13, (reason: Uncaught exception: 
 org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, 
 requested resource type=[vcores] < 0 or greater than maximum allowed allocation. Requested resource=<memory:4505, vCores:4>,
 maximum allowed allocation=<memory:24576, vCores:3>, 
 please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, 
 which might be less than configured maximum allocation=<memory:24576, vCores:3>

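Note the Invalid resource request: the resource request itself is invalid. My environment runs on virtual machines with modest specs, and the number of virtual cores requested (4) exceeded the maximum that can be allocated (3), hence the error. The java.nio.channels.ClosedChannelException seen earlier is misleading and hides the actual cause.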
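Because this message is easy to miss among the other errors, a small script can pull the two resource descriptions out of the log and report which one is over the limit. This is only a sketch: the function names are hypothetical, and the regular expressions assume the message wording shown in the log excerpt above.

```python
import re

# Matches "<memory:4505, vCores:4>" style resource descriptions in the log.
_RESOURCE = r"<memory:(\d+), vCores:(\d+)>"


def parse_resource_request(log_text: str):
    """Return (requested, maximum) as dicts, or None if the message is absent."""
    requested = re.search(r"Requested resource=" + _RESOURCE, log_text)
    maximum = re.search(r"maximum allowed allocation=" + _RESOURCE, log_text)
    if not (requested and maximum):
        return None

    def to_dict(match):
        return {"memory": int(match.group(1)), "vcores": int(match.group(2))}

    return to_dict(requested), to_dict(maximum)


def exceeded_resources(log_text: str):
    """List the resource types whose request is above the allowed maximum."""
    parsed = parse_resource_request(log_text)
    if parsed is None:
        return []
    requested, maximum = parsed
    return [name for name in requested if requested[name] > maximum[name]]


# Sample text copied from the log excerpt in this article.
sample = (
    "requested resource type=[vcores] < 0 or greater than maximum allowed allocation. "
    "Requested resource=<memory:4505, vCores:4>, "
    "maximum allowed allocation=<memory:24576, vCores:3>"
)
print(exceeded_resources(sample))  # prints ['vcores']
```

For the log in this article only vcores is reported, which matches the diagnosis: memory (4505 MB) is well under the 24576 MB maximum, while 4 vcores is above the limit of 3.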

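The fix is to configure a valid resource request for this HiBench job. Edit spark.conf and lower the requested number of cores to 2 (the default is 4, while the maximum vcores per container on my machines is set to 3).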

vim /{HiBench-home}/conf/spark.conf

Adjust the following settings (as appropriate for your environment):

hibench.yarn.executor.num     4
hibench.yarn.executor.cores   2
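If the same adjustment has to be applied on several machines, it could be scripted. The sketch below lowers hibench.yarn.executor.cores in the text of spark.conf whenever it exceeds a given per-container maximum. The function name cap_executor_cores is hypothetical, and it assumes the whitespace-separated "key value" layout shown above. Note that it caps to the maximum itself, whereas the value chosen by hand in this article is 2.

```python
def cap_executor_cores(conf_text: str, max_vcores: int) -> str:
    """Return conf_text with hibench.yarn.executor.cores lowered to max_vcores if needed."""
    key = "hibench.yarn.executor.cores"
    out = []
    for line in conf_text.splitlines():
        parts = line.split()
        # Only touch a plain "key value" line with an integer value.
        if len(parts) == 2 and parts[0] == key and parts[1].isdigit():
            cores = min(int(parts[1]), max_vcores)
            line = f"{key}   {cores}"
        out.append(line)
    return "\n".join(out)


# Illustrative input using the default of 4 cores mentioned in this article.
original = "hibench.yarn.executor.num     4\nhibench.yarn.executor.cores   4\n"
print(cap_executor_cores(original, max_vcores=3))
```

Lines that do not match the key, including comments and the executor count, pass through unchanged.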

After saving, run the Spark wordcount workload again:

bin/workloads/micro/wordcount/spark/run.sh

start ScalaSparkWordcount bench
hdfs rm -r: …… -rm -r -skipTrash hdfs://hostname:8020/HiBench/xxx/Wordcount/Output
rm: `hdfs://hostname:8020/HiBench/xxx/Wordcount/Output': No such file or directory
hdfs du -s: ……
Export env: SPARKBENCH_PROPERTIES_FILES=……
Submit Spark job: /usr/hdp/xxx/spark2/bin/spark-submit  ……
19/09/12 18:00:31 INFO ShutdownHookManager: Deleting directory /tmp/spark-2bf5c456-70f1-4b7a-81c6-xxx
finish ScalaSparkWordcount bench

OK, the run succeeded.

View the report

cat hibench.report

Type         Date       Time     Input_data_size      Duration(s)          Throughput(bytes/s)  Throughput/node     
ScalaSparkWordcount 2019-09-11 17:00:03 3258327393           58.865               55352542             18450847            
ScalaSparkWordcount 2019-09-12 18:00:32 3258311659           76.810               42420409             14140136
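The throughput columns can be cross-checked from the other columns. The sketch below assumes throughput is the input size divided by the duration, truncated to an integer, and that Throughput/node divides that by the node count. Both formulas and the node count of 3 are inferred from the numbers in the report, not taken from HiBench documentation, and the function names are hypothetical.

```python
def throughput(input_bytes: int, duration_s: float) -> int:
    """Bytes processed per second, truncated to an integer."""
    return int(input_bytes / duration_s)


def throughput_per_node(input_bytes: int, duration_s: float, nodes: int) -> int:
    """Per-node share of the overall throughput, truncated to an integer."""
    return int(throughput(input_bytes, duration_s) / nodes)


# Rows copied from the report above; nodes=3 is an inferred assumption.
for size, duration in [(3258327393, 58.865), (3258311659, 76.810)]:
    print(throughput(size, duration), throughput_per_node(size, duration, nodes=3))
```

Under these assumptions the computed values match the two report rows: 55352542 and 18450847 for the first run, 42420409 and 14140136 for the second.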

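Errors in other Spark workloads can be handled the same way. If this helped, a like would be appreciated! Thanks, and feel free to leave a comment with any questions.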
