Setting Up a Hadoop Cluster

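I have been working with a Hadoop cluster recently, so I set one up myself. This guide was put together by my girlfriend. It is fairly brief, but in my experience setting up Hadoop is not especially hard, so treat it as a reference only.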

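Verify the network

Run ifconfig on each machine to find the IP addresses of hadoop-master and hadoop-slave. Make sure the physical host and the virtual machines can ping each other, and that the virtual machines can ping one another. If a ping fails, turn off the firewall on the virtual machines.
Stop the firewall on a virtual machine:
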
service iptables stop

Disable the firewall permanently, so that it stays off after a reboot:

chkconfig iptables off

Run both commands, then check that the firewall is stopped:

service iptables status
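
The reachability check described above can be scripted. The following is a minimal sketch, assuming the hostnames hadoop-master and hadoop-slave resolve on the machine where it runs and that ping accepts the Linux flags -c and -W. The function name check_reachable and the PROBE variable are illustrative, not part of any Hadoop tooling.

```shell
#!/bin/sh
# Probe command (assumption: Linux ping flags, one packet, 2 second timeout).
# Override PROBE to substitute a different check.
PROBE="${PROBE:-ping -c 1 -W 2}"

# check_reachable HOST...: print one status line per host and
# return non-zero if any host did not answer.
check_reachable() {
    failed=0
    for host in "$@"; do
        if $PROBE "$host" >/dev/null 2>&1; then
            echo "ok   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Usage, run on every machine:
#   check_reachable hadoop-master hadoop-slave
```

A FAIL line for a host that is known to be up usually points back at the firewall or at name resolution.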

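Edit the hosts file

With the firewall off, add the IP address and hostname of the master and of the slave node to the hosts file (/etc/hosts) on both machines, so that the two machines can ping each other by hostname.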
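
As a rough illustration of this step, the sketch below appends a mapping to a hosts file only when the hostname is not already listed, so it can be rerun safely. The function name add_host_entry is hypothetical, and the addresses in the usage comment are placeholders; substitute the ones ifconfig reported.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME: append "IP NAME" to FILE unless NAME is
# already mapped on a non-comment line.
add_host_entry() {
    file=$1 ip=$2 name=$3
    # Look for NAME as a whole field after the address column.
    if awk -v n="$name" '
        !/^[[:space:]]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$file"; then
        echo "skip $name (already present)"
    else
        printf '%s\t%s\n' "$ip" "$name" >> "$file"
        echo "added $name"
    fi
}

# Usage as root on both machines (placeholder addresses):
#   add_host_entry /etc/hosts 192.168.1.10 hadoop-master
#   add_host_entry /etc/hosts 192.168.1.11 hadoop-slave
```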

Install the JDK

An earlier post on this blog describes how to set up the Java environment on CentOS; refer to it for the steps.

Configure passwordless SSH login

On the master, generate a key pair and set up passwordless SSH login to the hosts. Steps:

cd /root/ 
cd .ssh/

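If the .ssh directory does not exist, create it first by running mkdir .ssh in /root. Generate the key pair with ssh-keygen -t rsa and press Enter at every prompt to accept the defaults. The private key is saved as .ssh/id_rsa and the public key as .ssh/id_rsa.pub. Then authorize the public key, copy it to the slave, and log in:
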
cp id_rsa.pub authorized_keys
scp authorized_keys root@hadoop-slave:/root/.ssh
ssh hadoop-slave

Confirm that the master can log in to the slave without being asked for a password, then type exit to return.
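
The cp and scp commands above overwrite any authorized_keys file that already exists. When the slave may already hold other keys, appending is the safer choice. The following is a minimal sketch of that variant; install_pubkey is a hypothetical helper name and the /tmp path in the usage comment is illustrative. It also sets the 700 and 600 permissions, since sshd in its default strict mode ignores key files that other users can write to.

```shell
#!/bin/sh
# install_pubkey PUBKEY_FILE SSH_DIR: add the key to SSH_DIR/authorized_keys
# without discarding keys that are already there, and tighten permissions.
install_pubkey() {
    pubkey=$1 ssh_dir=$2
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$ssh_dir/authorized_keys"
    # Append only if this exact key line is not already listed.
    if ! grep -qxF "$(cat "$pubkey")" "$ssh_dir/authorized_keys"; then
        cat "$pubkey" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage on the slave, after copying id_rsa.pub over from the master:
#   install_pubkey /tmp/id_rsa.pub /root/.ssh
# Then, from the master, BatchMode makes ssh fail instead of prompting:
#   ssh -o BatchMode=yes root@hadoop-slave true && echo "passwordless login works"
```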

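Install Hadoop

Download the Hadoop tarball, extract it, and edit the configuration files. The files used here are listed at the end of this post.
Once everything is configured, verify that the setup works.
Format HDFS (the NameNode):
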
bin/hadoop namenode -format

Run every command from the Hadoop root directory.
Start Hadoop:

sbin/start-all.sh
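
One way to verify the start is to look at the Java processes with jps, which ships with the JDK. The sketch below reads jps output and prints whichever expected daemons are absent; missing_daemons is a hypothetical helper name. The daemon lists in the usage comments are what a Hadoop 2.x master and slave typically run, and are an assumption about this particular layout.

```shell
#!/bin/sh
# missing_daemons NAME...: read `jps` output ("PID ClassName" per line) on
# stdin and print every expected daemon name that does not appear in it.
missing_daemons() {
    running=$(awk '{ print $2 }')
    for daemon in "$@"; do
        # -x matches the whole line, so NameNode does not match SecondaryNameNode.
        echo "$running" | grep -qx "$daemon" || echo "$daemon"
    done
}

# Usage on the master:
#   jps | missing_daemons NameNode SecondaryNameNode ResourceManager
# Usage on the slave:
#   jps | missing_daemons DataNode NodeManager
```

Empty output means every listed daemon is running. For anything that is printed, the logs directory under the Hadoop root is the first place to look.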

Configuration files

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/root/hadoop/hadoop-2.7.1/hadoop_tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://hw004:9000</value>
</property>
</configuration>

hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>dfs.name.dir</name>
<value>/root/hadoop/hadoop-2.7.1/dfs/name</value>
<description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
</property>

<property>
<name>dfs.data.dir</name>
<value>/root/hadoop/hadoop-2.7.1/dfs/data</value>
<description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.federation.nameservices</name>
<value>hw004</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hw004</name>
<value>hw004:9000</value>
</property>
</configuration>

mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapred.job.tracker</name>
<value>hw004:9001</value>
<description>Host or IP and port of JobTracker.</description>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hw004:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hw004:19888</value>
</property>
<property>
<name>mapreduce.map.cpu.vcores</name>
<value>6</value>
<description></description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>6</value>
<description></description>
</property>
</configuration>

yarn-site.xml

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
<property>
<description>The address of the resource tracker interface.</description>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hw004:8025</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>hw004:8040</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hw004:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hw004:8032</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hw004:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>hw004:8141</value>
</property>
<property>
<description>The address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>hw004:8088</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/root/hadoop/hadoop_data/local</value>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/root/hadoop/hadoop_data/logs</value>
</property>
<property>
<name>yarn.app.mapreduce.am.staging-dir</name>
<value>/user</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>6</value>
<description></description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description></description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>6</value>
<description></description>
</property>
</configuration>
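
A site file that defines the same property name more than once is easy to overlook in a long listing, and it is then unclear which value was intended. The following is a small sketch that lists repeated names; duplicate_props is a hypothetical helper, and it is a plain text scan rather than an XML parser, so it also counts properties inside commented-out blocks.

```shell
#!/bin/sh
# duplicate_props FILE: print each <name> value that appears more than once.
duplicate_props() {
    grep -o '<name>[^<]*</name>' "$1" \
        | sed -e 's/<name>//' -e 's/<\/name>//' \
        | sort \
        | uniq -d
}

# Usage from the Hadoop root directory (Hadoop 2.x keeps its
# configuration under etc/hadoop):
#   duplicate_props etc/hadoop/yarn-site.xml
```

Empty output means every property name in the file is unique.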

masters

hw004

slaves

hw073
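
Every node needs the same configuration files. As a hedged sketch of one way to push them out, the function below reads the slaves file and prints one scp command per host without running anything, so the commands can be reviewed first. print_sync_commands is a hypothetical helper, and it assumes Hadoop is already unpacked at the same path on every slave and that root can log in over SSH.

```shell
#!/bin/sh
# print_sync_commands SLAVES_FILE HADOOP_DIR: print one scp command per host
# listed in SLAVES_FILE. Blank lines and # comment lines are skipped.
# Nothing is executed.
print_sync_commands() {
    slaves_file=$1 hadoop_dir=$2
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$slaves_file" |
    while read -r host; do
        echo "scp -r $hadoop_dir/etc/hadoop root@$host:$hadoop_dir/etc/"
    done
}

# Usage on the master, from the Hadoop root directory:
#   print_sync_commands etc/hadoop/slaves /root/hadoop/hadoop-2.7.1
# To run the printed commands after reviewing them, pipe the output to sh.
```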

Author: Qiu Qingyu
Copyright notice: Unless otherwise stated, all posts on this blog are licensed under CC BY-NC-SA 3.0 CN. Please credit the source when reposting.
Permalink: http://qiuqingyu.cn/2015/12/13/hadoop集群搭建/