GDAL 2.4.0: Reading TIFF file information from Hadoop 3.1.2 HDFS
GDAL 2.4.0 adds the following features:
Add /vsihdfs/ virtual file system handler for Hadoop File System (via libhdfs)
Add /vsiwebhdfs/ read-write virtual file system for Web Hadoop File System REST API
See the GDAL Virtual File Systems documentation for details on how they are invoked; the basic call pattern is sketched below.
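Both handlers are used by prefixing the dataset path. An illustrative sketch (host and path are placeholders here; the real commands appear in section 2 below):
gdalinfo /vsihdfs/hdfs://namenode:9000/path/to/file.tif
gdalinfo /vsiwebhdfs/http://namenode:9870/webhdfs/v1/path/to/file.tif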
1. Installing hadoop-3.1.2
(1) Environment
Oracle VM VirtualBox virtual machines
Master: CentOS7.6-1810_Hadoop_Master, Hadoop master node, IP 192.168.56.100; user xxxx/123456, root/123456;
Node1: CentOS7.6-1810_Hadoop_Node1, Hadoop worker node, IP 192.168.56.101; user xxxx/123456, root/123456.
(2) Download the installation packages
hadoop-3.1.2;
jdk-8u201-linux-x64;
(3) Passwordless SSH login
<1> CentOS does not enable passwordless SSH login by default. Uncomment the following two lines in /etc/ssh/sshd_config; do this on every server:
#RSAAuthentication yes: whether pure RSA public-key authentication is allowed; SSH-1 only; default is "yes".
#PubkeyAuthentication yes: whether public-key authentication is allowed; SSH-2 only; default is "yes".
Note: both options default to yes, so if the defaults have not been changed this step can be skipped.
<2> Run ssh-keygen -t rsa to generate the key pair, pressing Enter at every prompt (no passphrase); this creates the .ssh folder under /root. Do this on every server.
<3> Merge the public keys into the authorized_keys file: on the Master server, go to /root/.ssh and merge them with the following SSH commands;
cat id_rsa.pub>> authorized_keys
ssh root@192.168.56.100 cat ~/.ssh/id_rsa.pub>> authorized_keys
ssh root@192.168.56.101 cat ~/.ssh/id_rsa.pub>> authorized_keys
<4> Copy authorized_keys and known_hosts from the Master server to /root/.ssh on the Node1 server (note: the copied files must keep the same permissions as on the Master, otherwise passwordless login will not work; verified by testing).
Run the following (only the first scp still prompts for a password):
scp -r ~/.ssh/authorized_keys 192.168.56.101:~/.ssh/ # password required
scp -r ~/.ssh/known_hosts 192.168.56.101:~/.ssh/ # no password needed
<5> Done. ssh root@192.168.56.101 now works without entering a password. A quick check is shown below.
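A simple way to verify passwordless login from the Master (hostname is only an example; any remote command works):
ssh root@192.168.56.101 hostname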
(4) Install the JDK
<1> Download "jdk-8u201-linux-x64.tar.gz" and put it in /home/jdk/;
<2> Extract it: tar -zxvf jdk-8u201-linux-x64.tar.gz;
<3> Edit /etc/profile and add:
export JAVA_HOME=/home/jdk/jdk1.8.0_201
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
<4> Apply the settings: source /etc/profile;
<5> Run java -version to verify. Done.
(5) Install Hadoop
Extract it only on the Master server, then copy it to the Node1 server.
<1> Download "hadoop-3.1.2.tar.gz" and put it in /home/hadoop;
<2> Extract it: tar -zxvf hadoop-3.1.2.tar.gz;
<3> Under /home/hadoop, create the data directories tmp, hdfs, hdfs/data and hdfs/name (see the commands below).
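A minimal sketch of the commands, assuming Hadoop was extracted under /home/hadoop as above:
cd /home/hadoop
mkdir -p tmp hdfs/data hdfs/name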
(6) Hadoop cluster configuration (Hadoop Cluster Setup)
The default values of the configuration options can be found in the xxx-default.xml files under share/doc/hadoop.
The following configuration files live in /home/hadoop/hadoop-3.1.2/etc/hadoop.
<1>core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.56.100:9000</value><!-- NameNode URI, hdfs://host:port/ -->
</property>
<property>
<name>hadoop.tmp.dir</name><!-- Hadoop's default temporary directory. If a DataNode mysteriously fails to start after adding nodes or in similar situations, delete the tmp directory defined here on that node. If this directory is deleted on the NameNode machine, the NameNode format command must be run again. -->
<value>file:/home/hadoop/tmp</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131702</value>
</property>
</configuration>
<2>hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/home/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/home/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>192.168.56.100:9001</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
<3>mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.56.100:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>192.168.56.100:19888</value>
</property>
</configuration>
<4>yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>192.168.56.100:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>192.168.56.100:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>192.168.56.100:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>192.168.56.100:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>192.168.56.100:8088</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>20480</value><!-- 768 is too small here and causes problems; a larger value such as 20480 is needed. Reason still to be investigated. -->
</property>
<property>
<name>yarn.application.classpath</name>
<value>/home/hadoop/hadoop-3.1.2/etc/hadoop:/home/hadoop/hadoop-3.1.2/share/hadoop/common/lib/*:/home/hadoop/hadoop-3.1.2/share/hadoop/common/*:/home/hadoop/hadoop-3.1.2/share/hadoop/hdfs:/home/hadoop/hadoop-3.1.2/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-3.1.2/share/hadoop/hdfs/*:/home/hadoop/hadoop-3.1.2/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-3.1.2/share/hadoop/mapreduce/*:/home/hadoop/hadoop-3.1.2/share/hadoop/yarn:/home/hadoop/hadoop-3.1.2/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-3.1.2/share/hadoop/yarn/*</value>
</property>
</configuration>
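The long yarn.application.classpath value above does not have to be typed by hand; it can be produced with the hadoop classpath command (a sketch, using the install path from this setup):
/home/hadoop/hadoop-3.1.2/bin/hadoop classpath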
<5> hadoop-env.sh, yarn-env.sh
Set export JAVA_HOME=/home/jdk/jdk1.8.0_201
<6> Edit /home/hadoop/hadoop-3.1.2/etc/hadoop/workers
Remove the default localhost and add: 192.168.56.101
<7> Note the environment variable settings (HADOOP_HOME must be set before the CLASSPATH line that uses it):
export HADOOP_HOME=/home/hadoop/hadoop-3.1.2
export LD_LIBRARY_PATH=/home/hadoop/hadoop-3.1.2/lib/native:/home/jdk/jdk1.8.0_201/jre/lib/amd64/server:$LD_LIBRARY_PATH
Otherwise you will get: 2019-03-13 22:26:04,846 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath --glob):$CLASSPATH # without this, gdalinfo cannot recognize the file, most likely because the Hadoop jars are not on the classpath
export PATH=$PATH:$JAVA_HOME/bin:/home/hadoop/hadoop-3.1.2/bin:/home/hadoop/hadoop-3.1.2/sbin/:/home/gdalinstall/bin
If problems occur when starting Hadoop or when running gdalinfo against hdfs or webhdfs, make sure all the .jar packages are on the CLASSPATH; if the shortened (glob) paths do not work, list every jar with its full path, as in the sketch below.
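A sketch of building a CLASSPATH that lists every jar by its full path, for the case where the glob form is not expanded (assumes HADOOP_HOME is already set as above):
export CLASSPATH=$(find $HADOOP_HOME -name '*.jar' | tr '\n' ':')$CLASSPATH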
<8> Copy the configured Hadoop to the same location on the worker node
scp -r /home/hadoop 192.168.56.101:/home/
<9> Start Hadoop on the Master server
Initialize HDFS: hdfs namenode -format;
Start everything with sbin/start-all.sh, or start the pieces separately with start-dfs.sh and start-yarn.sh;
Stop with stop-all.sh;
Use jps to check the running processes; typical output is sketched below.
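Typical jps output after a successful start with this configuration (a sketch; process IDs omitted and will differ):
On the Master (192.168.56.100): NameNode, SecondaryNameNode, ResourceManager, Jps
On Node1 (192.168.56.101): DataNode, NodeManager, Jps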
Note:
Because this is a minimal CentOS installation, some additional packages may need to be installed, e.g. Apache Ant:
$tar -zxvf apache-ant-1.10.5-bin.tar.gz
$cd apache-ant-1.10.5
$export PATH=$PATH:/home/ant/apache-ant-1.10.5/bin
<10> Access test
Stop the firewall on both the master and the worker node: systemctl stop firewalld;
Open http://192.168.56.100:9870/ in a browser.
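WebHDFS itself (used later by /vsiwebhdfs/) can also be checked from the command line; a quick sanity check, assuming dfs.webhdfs.enabled=true as configured above:
curl "http://192.168.56.100:9870/webhdfs/v1/?op=LISTSTATUS"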
Test MapReduce (prepare the input first, as sketched below):
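The job below expects its input directory to exist in HDFS. A minimal sketch to prepare it (words.txt is a hypothetical local text file):
hdfs dfs -mkdir -p /Hadoop/Input
hdfs dfs -put words.txt /Hadoop/Input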
[root@hsmaster home]# hadoop jar /home/hadoop/hadoop-3.1.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.2.jar wordcount /Hadoop/Input /Hadoop/Output
2019-04-16 21:41:13,731 INFO client.RMProxy: Connecting to ResourceManager at /192.168.56.100:8032
2019-04-16 21:41:14,954 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1555422038152_0001
2019-04-16 21:41:16,121 INFO input.FileInputFormat: Total input files to process : 1
2019-04-16 21:41:16,753 INFO mapreduce.JobSubmitter: number of splits:1
2019-04-16 21:41:17,527 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1555422038152_0001
2019-04-16 21:41:17,533 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-04-16 21:41:17,907 INFO conf.Configuration: resource-types.xml not found
2019-04-16 21:41:17,908 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2019-04-16 21:41:18,628 INFO impl.YarnClientImpl: Submitted application application_1555422038152_0001
2019-04-16 21:41:18,774 INFO mapreduce.Job: The url to track the job: http://hsmaster:8088/proxy/application_1555422038152_0001/
2019-04-16 21:41:18,775 INFO mapreduce.Job: Running job: job_1555422038152_0001
2019-04-16 21:41:33,313 INFO mapreduce.Job: Job job_1555422038152_0001 running in uber mode : false
2019-04-16 21:41:33,315 INFO mapreduce.Job: map 0% reduce 0%
2019-04-16 21:41:42,944 INFO mapreduce.Job: map 100% reduce 0%
2019-04-16 21:41:51,155 INFO mapreduce.Job: map 100% reduce 100%
2019-04-16 21:41:53,231 INFO mapreduce.Job: Job job_1555422038152_0001 completed successfully
2019-04-16 21:41:53,435 INFO mapreduce.Job: Counters: 53
File System Counters
FILE: Number of bytes read=69
FILE: Number of bytes written=432893
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=158
HDFS: Number of bytes written=43
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=6093
Total time spent by all reduces in occupied slots (ms)=5795
Total time spent by all map tasks (ms)=6093
Total time spent by all reduce tasks (ms)=5795
Total vcore-milliseconds taken by all map tasks=6093
Total vcore-milliseconds taken by all reduce tasks=5795
Total megabyte-milliseconds taken by all map tasks=6239232
Total megabyte-milliseconds taken by all reduce tasks=5934080
Map-Reduce Framework
Map input records=4
Map output records=6
Map output bytes=63
Map output materialized bytes=69
Input split bytes=118
Combine input records=6
Combine output records=5
Reduce input groups=5
Reduce shuffle bytes=69
Reduce input records=5
Reduce output records=5
Spilled Records=10
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=252
CPU time spent (ms)=3000
Physical memory (bytes) snapshot=530427904
Virtual memory (bytes) snapshot=5562036224
Total committed heap usage (bytes)=406847488
Peak Map Physical memory (bytes)=305770496
Peak Map Virtual memory (bytes)=2778148864
Peak Reduce Physical memory (bytes)=224657408
Peak Reduce Virtual memory (bytes)=2783887360
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=40
File Output Format Counters
Bytes Written=43
View the result:
hdfs dfs -cat hdfs://192.168.56.100:9000/Hadoop/Output/part-r-00000
Hello 1
bigdata 1
hadoop 1
hello 2
spark 1
2. Installing GDAL
(1) Install cURL (needed for webhdfs access)
curl-7.64.1.tar.gz
$tar -zxvf curl-7.64.1.tar.gz
$cd curl-7.64.1
$./configure
$make
$make install
(2) Install gdal-2.4.0.tar.gz
$tar -zxvf gdal-2.4.0.tar.gz
$cd gdal-2.4.0
$./configure --prefix=/home/gdalinstall --with-java=/home/jdk/jdk1.8.0_201/ --with-hdfs=/home/hadoop/hadoop-3.1.2/ --with-jvm-lib=/home/jdk/jdk1.8.0_201/jre/lib/amd64/server/libjvm.so --with-jvm-lib-add-rpath
$make
$make install
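A quick check that the freshly built GDAL is usable and that the GeoTIFF driver is present (a sketch; version output will vary):
/home/gdalinstall/bin/gdalinfo --version
/home/gdalinstall/bin/gdalinfo --formats | grep -i gtiff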
(3) Read HDFS TIFF file information with gdalinfo
If GDAL was installed to a non-default prefix, add /home/gdalinstall/bin to PATH. The test image also has to be uploaded to HDFS first, as sketched below.
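A sketch of uploading the test image into HDFS (assuming the GeoTIFF file is in the current local directory):
hdfs dfs -mkdir -p /gdal_test
hdfs dfs -put GF1_PMS1_E81.9_N33.9_20161221_L1A0002059011-MSS1_oc.tiff /gdal_test/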
$gdalinfo /vsihdfs/hdfs://192.168.56.100:9000/gdal_test/GF1_PMS1_E81.9_N33.9_20161221_L1A0002059011-MSS1_oc.tiff
$gdalinfo /vsiwebhdfs/http://192.168.56.100:9870/webhdfs/v1/gdal_test/GF1_PMS1_E81.9_N33.9_20161221_L1A0002059011-MSS1_oc.tiff
[root@hsmaster cUrl]# gdalinfo /vsihdfs/hdfs://192.168.56.100:9000/gdal_test/GF1_PMS1_E81.9_N33.9_20161221_L1A0002059011-MSS1_oc.tiff
Driver: GTiff/GeoTIFF
Files: /vsihdfs/hdfs://192.168.56.100:9000/gdal_test/GF1_PMS1_E81.9_N33.9_20161221_L1A0002059011-MSS1_oc.tiff
Size is 5392, 5395
Coordinate System is:
PROJCS["WGS 84 / UTM zone 44N",
GEOGCS["WGS 84",
DATUM["WGS_1984",
SPHEROID["WGS 84",6378137,298.257223563,
AUTHORITY["EPSG","7030"]],
AUTHORITY["EPSG","6326"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4326"]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",81],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting",500000],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
AXIS["Easting",EAST],
AXIS["Northing",NORTH],
AUTHORITY["EPSG","32644"]]
Origin = (562777.776901091565378,3769000.594448732677847)
Pixel Size = (7.810000000000000,-7.810000000000000)
Metadata:
AREA_OR_POINT=Area
Image Structure Metadata:
INTERLEAVE=PIXEL
Corner Coordinates:
Upper Left ( 562777.777, 3769000.594)
Lower Left ( 562777.777, 3726865.644)
Upper Right ( 604889.297, 3769000.594)
Lower Right ( 604889.297, 3726865.644)
Center ( 583833.537, 3747933.119)
Band 1 Block=2048x2048 Type=UInt16, ColorInterp=Gray
Band 2 Block=2048x2048 Type=UInt16, ColorInterp=Undefined
Band 3 Block=2048x2048 Type=UInt16, ColorInterp=Undefined
Band 4 Block=2048x2048 Type=UInt16, ColorInterp=Undefined