I am trying to read data from Elasticsearch into Spark:
conf = {"es.resource":"sflow_*/sflow","es.nodes":"ES01","es.query":'some query'}
rdd = sc.newAPIHadoopRDD("org.elasticsearch.hadoop.mr.EsInputFormat", "org.apache.hadoop.io.NullWritable", "org.elasticsearch.hadoop.mr.LinkedMapWritable", conf=conf)
rdd.take(2)
After calling rdd.take(2), the process hangs and emits the following warning:
16/03/14 20:52:07 WARN httpclient.SimpleHttpConnectionManager: SimpleHttpConnectionManager being used
incorrectly. Be sure that HttpMethod.releaseConnection() is always called and that only one thread and/or
method is using this connection manager at a time.
However, rdd.first() always returns a result successfully. Does anyone know why?
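One difference that may be relevant, though nothing in this post confirms it as the cause: in PySpark, first() is implemented as take(1), and take(n) reads a single partition first and only launches further jobs over more partitions if it still has fewer than n items. So take(2) can touch additional partitions (here, possibly more Elasticsearch shards behind the sflow_* wildcard) that first() never reads. The sketch below is a simplified pure-Python model of that incremental scan, not Spark's actual code; the function name simulate_take, the sample partitions, and the 4x growth factor are assumptions made for illustration.

```python
def simulate_take(partitions, n):
    """Simplified model of an incremental take(n); not Spark's real code."""
    items = []
    scanned = 0
    while len(items) < n and scanned < len(partitions):
        # First pass reads one partition; later passes read up to 4x as many
        # partitions as were already scanned (growth factor is an assumption).
        to_try = 1 if scanned == 0 else scanned * 4
        for part in partitions[scanned:scanned + to_try]:
            # Take only as many items as are still missing.
            items.extend(part[:n - len(items)])
        scanned = min(scanned + to_try, len(partitions))
    return items[:n], scanned


# Hypothetical layout: the first partition holds a single record.
partitions = [["a"], [], [], ["b"], ["c"]]

print(simulate_take(partitions, 1))  # one item needed: only partition 0 is read
print(simulate_take(partitions, 2))  # a second pass over more partitions is needed
```

If the first partition in the real job holds only one matching document, a check such as rdd.getNumPartitions() alongside the Spark UI job list would show whether take(2) is stalling in that second pass.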