Big Data Platform Environment Setup

[TOC]

Cluster Environment Setup

Set up the cluster environment on the three machines lab1, lab2, and lab3, including Hadoop, ZooKeeper, Kafka, and HBase.

0. Install the required dependencies

# Disable the firewall
systemctl stop firewalld && systemctl disable firewalld

# Disable SELinux
sed -i 's/enforcing/disabled/' /etc/selinux/config  # permanent
setenforce 0  # temporary
cat /etc/selinux/config

# Disable swap
swapoff -a  # temporary
sed -ri 's/.*swap.*/#&/' /etc/fstab    # permanent
free -m

# Set the hostname according to the cluster plan
hostnamectl set-hostname <hostname>

# Add the hosts entries on the master
cat >> /etc/hosts << EOF
<your lab1 IP> lab1
<your lab2 IP> lab2
<your lab3 IP> lab3
EOF
# Install the RPM packages (from the rpm folder)
rpm -ivh *.rpm  --force --nodeps
# Set the current time to Beijing time
# Check the current system time
date
# Set the current system time
date -s "2018-2-22 19:10:30"
# Check the hardware clock
hwclock --show
# Set the hardware clock
hwclock --set --date "2018-2-22 19:10:30"
# Sync the system time from the hardware clock
hwclock --hctosys
# Write the system time back to the hardware clock
hwclock -w

# Reboot
reboot now
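
After the machines come back up, a quick sanity check confirms that the settings above took effect (a minimal sketch using standard commands):

# Verify the base settings after the reboot
systemctl is-active firewalld   # expect "inactive" or "unknown"
getenforce                      # expect "Disabled"
free -m                         # the Swap line should show 0
hostname                        # should match the planned hostname (lab1/lab2/lab3)
ping -c 1 lab2                  # the /etc/hosts entries should resolve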

1. Java

  1. Upload the Java package to /opt/modules/ on lab1 and extract it
   tar -zxf jdk-8u301-linux-x64.tar.gz
   
   mv jdk1.8.0_301 java
  2. Add the following environment variables to /etc/profile
   export JAVA_HOME=/opt/modules/java
   export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar
   export PATH=$PATH:$JAVA_HOME/bin
   
   export HADOOP_HOME=/opt/modules/hadoop-3.1.3
   PATH=$PATH:$HOME/bin:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
   
   export HBASE_HOME=/opt/modules/hbase-2.2.6
   export PATH=$PATH:$HBASE_HOME/bin
   
   export KAFKA_HOME=/opt/modules/kafka_2.11-2.1.1
   export PATH=$PATH:$KAFKA_HOME/bin
  3. Apply the environment variables
   source /etc/profile
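
A quick way to confirm the JDK and environment variables are picked up (a minimal check; the Hadoop, HBase, and Kafka paths in /etc/profile only resolve once those packages are unpacked in the later steps):

   java -version        # should report version 1.8.0_301
   echo $JAVA_HOME      # should print /opt/modules/java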

2. Hadoop cluster setup

Hadoop cluster layout:

|      | lab1               | lab2                         | lab3                        |
| ---- | ------------------ | ---------------------------- | --------------------------- |
| HDFS | NameNode, DataNode | DataNode                     | SecondaryNameNode, DataNode |
| YARN | NodeManager        | ResourceManager, NodeManager | NodeManager                 |
  1. Upload the Hadoop package to /opt/modules/ on lab1 and extract it
   tar -zxf hadoop-3.1.3.tar.gz
  2. Go to /opt/modules/hadoop-3.1.3/etc/hadoop and configure core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml, as well as hadoop-env.sh and the workers file.
   cd etc/hadoop
   vi core-site.xml
   
   # Add the following configuration
   
   <!-- NameNode address -->
   <property>
   	<name>fs.defaultFS</name>
   	<value>hdfs://lab1:8020</value>
   </property>
   <!-- Hadoop data storage directory; be sure to change it to your own path -->
   <property>
   	<name>hadoop.tmp.dir</name>
   	<value>/opt/modules/hadoop-3.1.3/data</value>
   </property>
   <!-- Static user for HDFS web UI logins: root -->
   <property>
   	<name>hadoop.http.staticuser.user</name>
   	<value>root</value>
   </property>
   vi hdfs-site.xml
   
   # Add the following configuration
   
   <!-- NameNode web UI address -->
   <property>
   	<name>dfs.namenode.http-address</name>
   	<value>lab1:9870</value>
   </property>
   <!-- SecondaryNameNode web UI address -->
   <property>
   	<name>dfs.namenode.secondary.http-address</name>
   	<value>lab3:9868</value>
   </property>
   vi yarn-site.xml
   
   # Add the following configuration
   
     <!-- Use the MapReduce shuffle auxiliary service -->
     <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
     </property>
     <!-- ResourceManager address -->
     <property>
       <name>yarn.resourcemanager.hostname</name>
       <value>lab2</value>
     </property>
     <!-- Environment variables inherited by containers -->
     <property>
       <name>yarn.nodemanager.env-whitelist</name>
       <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
     </property>
   
   vi mapred-site.xml
   
   # Add the following configuration
   
   <!-- Run MapReduce jobs on YARN -->
   <property>
   	<name>mapreduce.framework.name</name>
   	<value>yarn</value>
   </property>
   vi hadoop-env.sh
   
   # Add the Java environment variable
   export JAVA_HOME=/opt/modules/java
   
   # Add the following settings
   export HDFS_NAMENODE_USER=root
   export HDFS_DATANODE_USER=root
   export HDFS_SECONDARYNAMENODE_USER=root
   export YARN_RESOURCEMANAGER_USER=root
   export YARN_NODEMANAGER_USER=root
   vi workers
   
   # Add the following hosts
   lab1
   lab2
   lab3
  3. Upload the Hadoop package to /opt/modules/ on lab2 and lab3 and extract it
   tar -zxf hadoop-3.1.3.tar.gz
  4. Replace the corresponding files on lab2 and lab3 with the files modified above (see the scp sketch at the end of this section)

  5. If this is the first time the cluster is started, format the NameNode on lab1

   hdfs namenode -format
  6. Go to /opt/modules/hadoop-3.1.3/sbin on lab1 and start HDFS
   ./start-dfs.sh

Check the processes with jps:

lab1

   DataNode
   NameNode

lab2

   DataNode

lab3

   SecondaryNameNode
   DataNode
  7. Go to /opt/modules/hadoop-3.1.3/sbin on lab2 and start YARN
   ./start-yarn.sh

lab1

   NodeManager

lab2

   ResourceManager
   NodeManager

lab3

   NodeManager
  8. Open http://lab1:9870 and http://lab2:8088 in a browser to view the HDFS NameNode and the YARN ResourceManager web UIs
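
The "replace the corresponding files" step above can be done with scp. This is only a sketch: it assumes the cluster is operated as root (as the other steps here do) and that SSH between the nodes is available.

   # Run on lab1: push the modified configuration files to lab2 and lab3
   scp /opt/modules/hadoop-3.1.3/etc/hadoop/{core-site.xml,hdfs-site.xml,yarn-site.xml,mapred-site.xml,hadoop-env.sh,workers} root@lab2:/opt/modules/hadoop-3.1.3/etc/hadoop/
   scp /opt/modules/hadoop-3.1.3/etc/hadoop/{core-site.xml,hdfs-site.xml,yarn-site.xml,mapred-site.xml,hadoop-env.sh,workers} root@lab3:/opt/modules/hadoop-3.1.3/etc/hadoop/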

3. ZooKeeper cluster setup

  1. Upload the ZooKeeper package to /opt/modules/ on lab1 and extract it
   tar -zxf zookeeper-3.4.14.tar.gz
  2. Go to /opt/modules/zookeeper-3.4.14/conf/ and edit zoo.cfg
   mv zoo_sample.cfg zoo.cfg
   
   vi zoo.cfg
   
   # Modify the following
   
   dataDir=/opt/modules/zookeeper-3.4.14/data/zData
   
   # Add the following
   
   server.1=lab1:2888:3888
   server.2=lab2:2888:3888
   server.3=lab3:2888:3888
  3. Create the data/zData directory under /opt/modules/zookeeper-3.4.14/ and write the myid file inside it
   mkdir -p /opt/modules/zookeeper-3.4.14/data/zData
   
   echo 1 > /opt/modules/zookeeper-3.4.14/data/zData/myid
  4. Upload the ZooKeeper package to /opt/modules/ on lab2 and lab3 and extract it
   tar -zxf zookeeper-3.4.14.tar.gz
  5. Replace the corresponding files on lab2 and lab3 with the files modified above

  6. Edit /opt/modules/zookeeper-3.4.14/data/zData/myid on lab2 and lab3, changing the 1 to 2 and 3 respectively

  7. Start ZooKeeper from /opt/modules/zookeeper-3.4.14/bin on all three machines

   ./zkServer.sh start

Check the status:

   ./zkServer.sh status
  8. Check the processes with jps

lab1:

   QuorumPeerMain

lab2:

   QuorumPeerMain

lab3:

   QuorumPeerMain
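
To confirm the ensemble is actually serving requests, the bundled CLI can be used from any node (a minimal check; lab1 is just one of the three servers you could point at):

   ./zkCli.sh -server lab1:2181
   # inside the shell:
   ls /
   quit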

4. Kafka cluster setup

  1. Upload the Kafka package to /opt/modules/ on lab1 and extract it
   tar -zxf kafka_2.11-2.1.1.tgz
  2. Go to /opt/modules/kafka_2.11-2.1.1/config/ and edit server.properties, producer.properties, and consumer.properties; then edit kafka-run-class.sh and kafka-server-start.sh (these two are in the bin directory)
   vi server.properties
   
   # Modify the following settings
   
   broker.id=0  (the ID of this broker)
   
   listeners=PLAINTEXT://lab1:9092  (the address of this broker)
   
   zookeeper.connect=lab1:2181,lab2:2181,lab3:2181
   
   # Add the following setting
   
   delete.topic.enable=true
   vi producer.properties
   
   # Modify the following setting
   
   bootstrap.servers=lab1:9092,lab2:9092,lab3:9092
   vi consumer.properties
   
   # Modify the following setting
   
   bootstrap.servers=lab1:9092,lab2:9092,lab3:9092
The next two files are in the bin directory: /opt/modules/kafka_2.11-2.1.1/bin
   vi kafka-run-class.sh
   
   # Modify KAFKA_JMX_OPTS as follows
   KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=<this server's IP address or hostname>"
   # Optional additions:
   # -Dsun.rmi.transport.tcp.responseTimeout=60000   (response timeout)
   # -Dcom.sun.management.jmxremote.local.only=false
   
   # JMX port to use
   if [  $JMX_PORT ]; then
      KAFKA_JMX_OPTS="$KAFKA_JMX_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
   fi
   vi kafka-server-start.sh
   
   # Add export JMX_PORT="9999"
   
   if [ "x$KAFKA_HEAP_OPTS" = "x" ]; then
       export KAFKA_HEAP_OPTS="-Xmx1G -Xms1G"
       export JMX_PORT="9999"
   fi
  3. Upload the Kafka package to /opt/modules/ on lab2 and lab3 and extract it
   tar -zxf kafka_2.11-2.1.1.tgz
  4. Replace the corresponding files on lab2 and lab3 with the files modified above

  5. Edit /opt/modules/kafka_2.11-2.1.1/config/server.properties on lab2 and lab3 as follows

lab2:

   vi server.properties
   
   # Modify the following settings
   
   broker.id=1  (the ID of this broker)
   
   listeners=PLAINTEXT://lab2:9092  (the address of this broker)
   advertised.listeners=PLAINTEXT://<this server's public IP>:9092
   
   zookeeper.connect=lab1:2181,lab2:2181,lab3:2181
   
   # Add the following setting
   
   delete.topic.enable=true

lab3:

   vi server.properties
   
   # Modify the following settings
   
   broker.id=2  (the ID of this broker)
   
   listeners=PLAINTEXT://lab3:9092  (the address of this broker)
   
   zookeeper.connect=lab1:2181,lab2:2181,lab3:2181
   
   # Add the following setting
   
   delete.topic.enable=true
  6. Start Kafka from /opt/modules/kafka_2.11-2.1.1/bin on all three machines
   kafka-server-start.sh -daemon ../config/server.properties
  7. Check the processes with jps

lab1:

   Kafka

lab2:

   Kafka

lab3:

   Kafka
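
A quick end-to-end smoke test (a sketch; the topic name demo_topic is only an example, and the Kafka 2.1.1 tools still manage topics through --zookeeper):

   # Create and inspect a test topic
   kafka-topics.sh --create --zookeeper lab1:2181,lab2:2181,lab3:2181 --replication-factor 3 --partitions 3 --topic demo_topic
   kafka-topics.sh --describe --zookeeper lab1:2181 --topic demo_topic
   # Produce in one terminal and consume in another
   kafka-console-producer.sh --broker-list lab1:9092,lab2:9092,lab3:9092 --topic demo_topic
   kafka-console-consumer.sh --bootstrap-server lab1:9092 --topic demo_topic --from-beginning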

5. HBase cluster setup

  1. Upload the HBase package to /opt/modules/ on lab1 and extract it
   tar -zxf hbase-2.2.6-bin.tar.gz
  2. Go to /opt/modules/hbase-2.2.6/conf and edit hbase-env.sh, hbase-site.xml, and the regionservers file
   vi hbase-env.sh
   
   # Modify the following settings
   
   export JAVA_HOME=/opt/modules/java
   export HBASE_MANAGES_ZK=false
   vi hbase-site.xml
   
   # Add the following configuration
   
   <property>
   <!-- HBase storage path on HDFS -->
   	<name>hbase.rootdir</name>
   	<value>hdfs://lab1:8020/hbase</value>
   	<!-- The port must match Hadoop's fs.defaultFS port -->
   </property>
   <property>
   <!-- Run HBase in fully distributed mode -->
   	<name>hbase.cluster.distributed</name>
   	<value>true</value>
   </property>
   <property>
   <!-- ZooKeeper quorum addresses, comma-separated -->
   	<name>hbase.zookeeper.quorum</name>
   	<value>lab1:2181,lab2:2181,lab3:2181</value>
   </property>
   <property>
   	<name>hbase.tmp.dir</name>
   	<value>file:/opt/modules/hbase-2.2.6/tmp</value>
   </property>
   vi regionservers
   
   # Add the following hosts
   
   lab1
   lab2
   lab3
  3. Start HBase from /opt/modules/hbase-2.2.6/bin on lab1
   ./start-hbase.sh
  4. Check the processes with jps

lab1:

   HMaster
   HRegionServer

lab2:

   HRegionServer

lab3:

   HRegionServer
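
A short smoke test in the HBase shell (a sketch; the table name smoke_test is only an example):

   hbase shell
   # inside the shell:
   create 'smoke_test', 'cf'
   put 'smoke_test', 'row1', 'cf:c1', 'hello'
   scan 'smoke_test'
   disable 'smoke_test'
   drop 'smoke_test'
   exit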

6. Solr installation and usage


6.1 Installing Solr (Solr must be installed on all three machines)

# 1. Put the package in /opt/modules
solr.zip
# 2. Unzip it
unzip solr.zip
# 3. Create the solr user
sudo useradd solr
echo solr | sudo passwd --stdin solr
# 4. Change the owner of the solr directory to the solr user
sudo chown -R solr:solr /opt/modules/solr
# 5. Set the following properties in /opt/modules/solr/bin/solr.in.sh
vim /opt/modules/solr/bin/solr.in.sh
ZK_HOST="lab1:2181,lab2:2181,lab3:2181"
SOLR_HOST="<this machine's IP>"

6.2 Starting the cluster

# 1. Start Solr
ssh lab1 sudo -i -u solr /opt/modules/solr/bin/solr start # use "stop" to shut it down
ssh lab2 sudo -i -u solr /opt/modules/solr/bin/solr start
ssh lab3 sudo -i -u solr /opt/modules/solr/bin/solr start
sudo -i -u solr /opt/modules/solr/bin/solr start -c -p 8983 -z 150.158.138.99:2181,49.235.67.21:2181,42.192.195.253:2181 -force
# Force start
./solr start -c -p 8983 -z lab1:2181,lab2:2181,lab3:2181 -force
# 2. Create the collections (running this on one node is enough)
sudo -i -u solr /opt/modules/solr/bin/solr create -c vertex_index -d /opt/modules/atlas-2.2.0/conf/solr -shards 3 -replicationFactor 1

sudo -i -u solr /opt/modules/solr/bin/solr create -c edge_index -d /opt/modules/atlas-2.2.0/conf/solr -shards 3 -replicationFactor 1

sudo -i -u solr /opt/modules/solr/bin/solr create -c fulltext_index -d /opt/modules/atlas-2.2.0/conf/solr -shards 3 -replicationFactor 1
# 3. Delete a collection
curl "http://127.0.0.1:8983/solr/admin/collections?action=DELETE&name=edge_index"

curl "http://127.0.0.1:8983/solr/admin/collections?action=DELETE&name=vertex_index"

curl "http://127.0.0.1:8983/solr/admin/collections?action=DELETE&name=fulltext_index"

7. Atlas installation (single node; lab2, the second VM, is used here)


7.1 Installing Atlas

# 1. Put the package in /opt/modules
atlas-2.2.0.zip
# 2. Unzip it
unzip atlas-2.2.0.zip
# 3. Edit the Atlas configuration file
vim /opt/modules/atlas-2.2.0/conf/atlas-application.properties
# Replace every occurrence of lab1, lab2, lab3 with your own IPs, and change this line
atlas.rest.address=http://lab2:21000  # use the address of whichever machine you chose
# 4. Remove the old commons-configuration jar shipped with HBase
rm -rf /opt/modules/hbase-2.2.6/lib/commons-configuration-1.6.jar
# 5. Move the newer version of the jar into place
mv commons-configuration-1.10.jar /opt/modules/hbase-2.2.6/lib/commons-configuration-1.10.jar

7.2 Integrating Atlas with HBase

# Edit hbase-site.xml and add the following property
<property>
<name>hbase.coprocessor.master.classes</name>
<value>org.apache.atlas.hbase.hook.HBaseAtlasCoprocessor</value>
</property>
# Copy the Atlas configuration file into HBase's conf directory
cp atlas-application.properties /opt/modules/hbase-2.2.6/conf/
# Link the Atlas HBase hook jars into HBase
ln -s <atlas package>/hook/hbase/* <hbase-home>/lib/
# Make sure atlas-env.sh points to the HBase conf directory
export HBASE_CONF_DIR=/opt/modules/hbase-2.2.6/conf

# Finally, run the hook import script
./import-hbase.sh

7.3 Integrating Atlas with Hive

# Edit hive-site.xml and add the following property
<property>
    <name>hive.exec.post.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
# Copy the Atlas configuration file into Hive's conf directory
cp atlas-application.properties /opt/modules/hive-3.1.2/conf
# Make sure atlas-env.sh points to the HBase conf directory
export HBASE_CONF_DIR=/opt/modules/hbase-2.2.6/conf
# Add 'export HIVE_AUX_JARS_PATH=<atlas package>/hook/hive' to hive-env.sh in your Hive configuration
export HIVE_AUX_JARS_PATH=/opt/modules/atlas-2.2.0/hook/hive
# Finally, run the hook import script
./import-hive.sh

8. Atlas data recovery

Atlas data recovery involves Solr and two HBase tables: apache_atlas_entity_audit and apache_atlas_janus.

8.1 Exporting and restoring the HBase table data

  1. Create a directory in HDFS
   hadoop fs -mkdir /tmp/atlas_data
  2. Export the two tables to HDFS
   hbase org.apache.hadoop.hbase.mapreduce.Export apache_atlas_entity_audit hdfs://lab1:8020/tmp/hbase/atlas_data/apache_atlas_entity_audit
   
   hbase org.apache.hadoop.hbase.mapreduce.Export apache_atlas_janus hdfs://lab1:8020/tmp/hbase/atlas_data/apache_atlas_janus
  3. Create a local directory atlas_data and download the exported files into it
   hadoop fs -get  /tmp/hbase/atlas_data/apache_atlas_entity_audit ./
   
   hadoop fs -get  /tmp/hbase/atlas_data/apache_atlas_janus ./
  4. Upload the two exported directories to HDFS on the target machine
   # Copy the two directories above onto lab2, then put them into HDFS
   hdfs dfs -put ./atlas_data/  hdfs://lab1:8020/tmp/
  5. Check the HBase table schemas and recreate the two tables on the target machine; note that TTL => 'FOREVER' must be removed from the original table definitions
   # 1. Create the apache_atlas_janus table
   create 'apache_atlas_janus', {NAME => 'e', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false',KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE',  MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}, {NAME => 'f', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'} ,{NAME => 'g', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false',KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'},{NAME => 'h', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false',KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'} ,{NAME => 'i', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}, {NAME => 'l', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}, {NAME => 'm', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false',KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false',CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}, {NAME => 's', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false',KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE',  MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', 
PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'GZ', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}, {NAME => 't', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'NONE', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
   
   # 2. Create the apache_atlas_entity_audit table
   create 'apache_atlas_entity_audit',  {NAME => 'dt', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false',NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => 'false', DATA_BLOCK_ENCODING => 'FAST_DIFF', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'false', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false',PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'GZ', BLOCKCACHE => 'true', BLOCKSIZE => '65536'}
  6. Import the exported data into the newly created HBase tables
   hbase org.apache.hadoop.hbase.mapreduce.Import apache_atlas_entity_audit hdfs://lab1:8020/tmp/atlas_data/apache_atlas_entity_audit
   
   hbase org.apache.hadoop.hbase.mapreduce.Import apache_atlas_janus hdfs://lab1:8020/tmp/atlas_data/apache_atlas_janus
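
After the import, row counts can be compared between the source and target clusters to confirm nothing was lost (a sketch; RowCounter is the standard HBase MapReduce row-counting job):

   hbase org.apache.hadoop.hbase.mapreduce.RowCounter apache_atlas_janus
   
   hbase org.apache.hadoop.hbase.mapreduce.RowCounter apache_atlas_entity_audit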

8.2 Restoring the Solr data (run on a single node; lab2, the Atlas VM above, is used here)

# 0. Prepare the backup directory
mkdir -p /opt/modules/solr/backup
# Move the backup files into this directory
mv ./* /opt/modules/solr/backup/
# Give the solr user ownership
chown -R solr:solr /opt/modules/solr/backup

# 1. Create the backups; all of the following commands are run as the solr user
su solr
curl 'http://127.0.0.1:8983/solr/fulltext_index/replication?command=backup&location=/opt/modules/solr/backup&name=fulltext_index.bak'

curl 'http://127.0.0.1:8983/solr/vertex_index/replication?command=backup&location=/opt/modules/solr/backup&name=vertex_index.bak'

curl 'http://127.0.0.1:8983/solr/edge_index/replication?command=backup&location=/opt/modules/solr/backup&name=edge_index.bak'

# 2. Restore the backups (copy them into /opt/modules/solr/backup first)
curl 'http://127.0.0.1:8983/solr/fulltext_index/replication?command=restore&location=/opt/modules/solr/backup&name=fulltext_index.bak'

curl 'http://127.0.0.1:8983/solr/vertex_index/replication?command=restore&location=/opt/modules/solr/backup&name=vertex_index.bak'

curl 'http://127.0.0.1:8983/solr/edge_index/replication?command=restore&location=/opt/modules/solr/backup&name=edge_index.bak'

# 3. Check the backup details
curl "http://localhost:8983/solr/fulltext_index/replication?command=details"

8.3 Starting Atlas

# Run this from the Atlas installation directory
bin/atlas_start.py
# Wait 10-20 minutes, then open
http://lab2:21000
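
Once the web UI responds, the REST API gives a quick health check (a sketch; it assumes the default admin/admin credentials have not been changed):

# Query the Atlas version over REST
curl -u admin:admin http://lab2:21000/api/atlas/admin/version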

9. MySQL installation

9.1 Removing the MariaDB and MySQL packages that ship with CentOS 7

# 1. Remove MariaDB
rpm -qa|grep mariadb
rpm -e --nodeps mariadb-libs-5.5.64-1.el7.x86_64
# 2. Remove MySQL
rpm -qa |grep -i mysql
yum remove mysql*
find / -name mysql # find the related directories
rm -rf <directories found above>  # remove the related directories
rm -rf /etc/my.cnf
rm -rf /var/log/mysqld.log

9.2 Uploading the package and installing MySQL

# 1. Unpack the bundle
tar -xvf mysql-5.7.35-1.el7.x86_64.rpm-bundle.tar
# 2. Install the RPMs
rpm -ivh *.rpm --nodeps --force

9.3 Configuring MySQL

# 1. Start the MySQL service
systemctl start mysqld && systemctl enable mysqld
# 2. Check the generated temporary password
cat /var/log/mysqld.log | grep password
2021-12-07T06:31:15.336280Z 1 [Note] A temporary password is generated for root@localhost: v;pW)YU;S9fr
2021-12-07T06:32:52.501914Z 0 [Note] Shutting down plugin 'sha256_password'
2021-12-07T06:32:52.501916Z 0 [Note] Shutting down plugin 'mysql_native_password'
2021-12-07T06:33:08.907588Z 2 [Note] Access denied for user 'root'@'localhost' (using password: NO)
# 3. Log in to the local MySQL server with that password (v;pW)YU;S9fr)
mysql -u root -p
# 4. Set the MySQL password
# Relax the password policy
set global validate_password_length=4;
set global validate_password_policy=0;
# Change the default password; replace the password at the end with your own
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'XDU520bdm';
flush privileges;
# 5. Allow remote connections
use mysql;
update user set user.Host='%' where user.User='root';
flush privileges;
select host,user from user;
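
The remote-access change can be verified from another node (a sketch; it assumes a mysql client is installed there and that MySQL runs on lab1 as listed in the appendix):

# From lab2 or lab3
mysql -h lab1 -P 3306 -u root -p -e "select host,user from mysql.user;"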

10. nginx installation

# 1. Unpack the dependency packages and install them
unzip package-cnetos7.6-7.zip
rpm -ivh *.rpm  --force --nodeps
# 2. Build and install nginx
tar -zxvf nginx
cd nginx
./configure
make -j4 && make install
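
A quick check that the build works (a sketch; it assumes the default /usr/local/nginx prefix that a bare ./configure uses):

/usr/local/nginx/sbin/nginx -t      # test the configuration
/usr/local/nginx/sbin/nginx         # start nginx
curl -I http://localhost            # expect an HTTP 200 response from the default page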

11. Problems encountered so far and their solutions

11.1 Atlas fails to start

(screenshot of the Atlas startup error)

# The error is probably caused by the JVM requesting too much memory
# Change the Linux settings to allow large memory allocations (overcommit)
# Free cached memory
echo 3 > /proc/sys/vm/drop_caches
# Fix:
echo 1 > /proc/sys/vm/overcommit_memory
# This only takes effect until the next reboot
# To make it permanent, edit /etc/sysctl.conf, add vm.overcommit_memory=1, and run sysctl -p (see the sketch below)
# This is only our workaround for the development environment; in production, tune the JVM parameters instead so that every process has enough memory to run
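
The permanent form of the fix described above, as a sketch:

# Persist the overcommit setting across reboots
echo "vm.overcommit_memory=1" >> /etc/sysctl.conf
sysctl -p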

Appendix

| Software  | Version    | Installed on     |
| --------- | ---------- | ---------------- |
| java      | 1.8        | lab1, lab2, lab3 |
| hadoop    | 3.1.3      | lab1, lab2, lab3 |
| zookeeper | 3.4.14     | lab1, lab2, lab3 |
| kafka     | 2.11-2.1.1 | lab1, lab2, lab3 |
| Hbase     | 2.2.6      | lab1, lab2, lab3 |
| solr      | 7.7.3      | lab1, lab2, lab3 |
| atlas     | 2.2.0      | lab2             |
| mysql     | 5.7        | lab1             |
| nginx     | 1.18.0     | lab2             |