Cloudera CDH Spark: Adding New Library Dependencies & Proxy Settings - All About Free

Adding New Library Dependencies to Spark via Cloudera Manager

In Cloudera Manager, locate the Spark service, open its Configuration page, select the Advanced category in the filter, find the "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf" item, and add the following:

spark.jars.packages=com.databricks:spark-csv_2.10:1.4.0,org.mongodb.spark:mongo-spark-connector_2.10:1.0.0,org.postgresql:postgresql:9.4-1201-jdbc41

Here I added spark-csv, mongo-spark-connector, and the PostgreSQL JDBC driver.
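As a sketch (using the same coordinates as the safety-valve entry above): the value of spark.jars.packages is a comma-separated list of groupId:artifactId:version Maven coordinates. The same packages can also be pulled for a single session with spark-shell's --packages flag instead of the permanent config:

```shell
# Build the coordinate list piece by piece; entries are
# groupId:artifactId:version, joined by commas with no spaces.
PACKAGES='com.databricks:spark-csv_2.10:1.4.0'
PACKAGES="$PACKAGES,org.mongodb.spark:mongo-spark-connector_2.10:1.0.0"
PACKAGES="$PACKAGES,org.postgresql:postgresql:9.4-1201-jdbc41"

# Print the equivalent one-off command (run it directly on a gateway host):
echo "spark-shell --packages $PACKAGES"
```

--packages accepts the same comma-separated coordinate format as spark.jars.packages, so either route resolves the identical set of jars through Ivy.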

After redeploying the changed client configuration, start spark-shell (for example) in a terminal and you can watch the dependencies being resolved automatically:

free@Slave2:~$ spark-shell
Ivy Default Cache set to: /home/free/.ivy2/cache
The jars for the packages stored in: /home/free/.ivy2/jars
:: loading settings :: url = jar:file:/opt/cloudera/parcels/CDH-5.7.1-1.cdh5.7.1.p0.11/jars/spark-assembly-1.6.0-cdh5.7.1-hadoop2.6.0-cdh5.7.1.jar!/org/apache/ivy/core/settings/ivysettings.xml
com.databricks#spark-csv_2.10 added as a dependency
org.mongodb.spark#mongo-spark-connector_2.10 added as a dependency
org.postgresql#postgresql added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found com.databricks#spark-csv_2.10;1.4.0 in central
    found org.apache.commons#commons-csv;1.1 in central
    found com.univocity#univocity-parsers;1.5.1 in central
    found org.mongodb.spark#mongo-spark-connector_2.10;1.0.0 in central
    found org.mongodb#mongo-java-driver;3.2.2 in central
    found org.postgresql#postgresql;9.4-1201-jdbc41 in central
downloading https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.4.0/spark-csv_2.10-1.4.0.jar ...

Setting an HTTP/HTTPS Proxy for Spark via Cloudera Manager

Because our cluster sits on an internal LAN and reaches the internet only through an HTTP/HTTPS proxy, I first tried the following commands:

export http_proxy=<proxyHost>:<proxyPort>
export https_proxy=<proxyHost>:<proxyPort>
export JAVA_OPTS="-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort>"

None of them had any effect: the dependency download is performed by Ivy inside the driver JVM, and the JVM does not read the http_proxy/https_proxy environment variables, so the proxy has to be passed as JVM system properties. Setting spark.driver.extraJavaOptions turned out to work:

spark-shell --conf "spark.driver.extraJavaOptions=-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort> -Dhttps.proxyHost=<proxyHost> -Dhttps.proxyPort=<proxyPort>"
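As a minimal sketch with hypothetical proxy values (proxy.example.com:3128 stands in for <proxyHost>:<proxyPort>), the flag string can be assembled once and reused both on the command line and in the safety-valve entry:

```shell
# Hypothetical proxy endpoint; substitute your real host and port.
PROXY_HOST='proxy.example.com'
PROXY_PORT='3128'

# Java needs separate system properties per scheme (http and https).
JAVA_PROXY_OPTS="-Dhttp.proxyHost=$PROXY_HOST -Dhttp.proxyPort=$PROXY_PORT"
JAVA_PROXY_OPTS="$JAVA_PROXY_OPTS -Dhttps.proxyHost=$PROXY_HOST -Dhttps.proxyPort=$PROXY_PORT"

# Print the resulting spark-shell invocation (run it directly on a gateway host):
echo "spark-shell --conf \"spark.driver.extraJavaOptions=$JAVA_PROXY_OPTS\""
```

The quoting matters: the whole spark.driver.extraJavaOptions=... value must reach spark-shell as a single argument, which is why the --conf value is wrapped in double quotes.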

Alternatively, this can be made permanent through Cloudera Manager: locate the Spark service, open its Configuration page, select the Advanced category in the filter, find the "Spark Client Advanced Configuration Snippet (Safety Valve) for spark-conf/spark-defaults.conf" item, and add the following:

spark.driver.extraJavaOptions=-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort> -Dhttps.proxyHost=<proxyHost> -Dhttps.proxyPort=<proxyPort>
spark.executor.extraJavaOptions=-Dhttp.proxyHost=<proxyHost> -Dhttp.proxyPort=<proxyPort> -Dhttps.proxyHost=<proxyHost> -Dhttps.proxyPort=<proxyPort>

Then redeploy the client configuration and the settings take effect.

Free / Published under CC BY-NC-SA in category: technology