[Stumbling into Big Data] [From Installation to Getting Started] Flume Deployment

Author: 元公子 · Date: 2020-01-28 (Tuesday) · Weather: same as usual

The less you know, the less you don't know.

Without likes from friends, there's no leveling up to beat the boss.

1. What Is It?

Flume is a distributed, highly available system for collecting, aggregating, and transporting large volumes of log data. It supports pluggable, customizable data senders in a logging pipeline (for example toward Kafka or HDFS), which makes data collection straightforward. Its core is the agent, a Java process that runs on the log collection node.

What Flume aims for is flexibility of data: diverse data sources and diverse data destinations.

At its core, Flume collects data from a data source (source) and then delivers it to a designated destination (sink). To make sure delivery succeeds, the data is buffered (channel) before being sent to the sink, and only after the data has actually reached the sink does Flume delete its buffered copy.

Flume uses a three-tier architecture of Agent, Collector, and Storage, and each tier can scale horizontally. The Master is the Flume cluster controller: it centrally manages and coordinates the configuration of Agents and Collectors, and more than one Master can be deployed.

Both Agents and Collectors are composed of a Source and a Sink: the Source is where data comes from, and the Sink is where it goes.

Flume uses two components, Master and Node; each Node is configured to act as either an Agent or a Collector.

An Agent sends data from the data source to a Collector; a Collector aggregates the data from multiple Agents and loads it into Storage (a sketch of how this tiering looks in Flume NG follows).
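
In Flume NG (the 1.9.0 release installed below), this Agent-to-Collector tiering is expressed by chaining agents: an upstream agent's Avro sink feeds a downstream agent's Avro source. A minimal sketch of the two configs, with agent names (`up`, `down`), the port, and all paths being illustrative assumptions of mine, not part of this article's setup:

# Upstream "agent" tier: tail an app log, forward events over Avro
up.sources = s1
up.channels = c1
up.sinks = k1
up.sources.s1.type = exec
up.sources.s1.command = tail -F /var/log/app.log
up.sources.s1.channels = c1
up.channels.c1.type = memory
up.sinks.k1.type = avro
up.sinks.k1.hostname = hadoop-master
up.sinks.k1.port = 4545
up.sinks.k1.channel = c1

# Downstream "collector" tier: receive Avro, roll events into local files
down.sources = s1
down.channels = c1
down.sinks = k1
down.sources.s1.type = avro
down.sources.s1.bind = 0.0.0.0
down.sources.s1.port = 4545
down.sources.s1.channels = c1
down.channels.c1.type = memory
down.sinks.k1.type = file_roll
down.sinks.k1.sink.directory = /tmp/flume-collected
down.sinks.k1.channel = c1

Each tier is started as its own agent, e.g. `flume-ng agent --name up ...` and `flume-ng agent --name down ...`.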

An Agent itself contains a Source, a Channel, and a Sink; together the three make up the Agent.

An Event wraps the data in transit and is Flume's basic unit of transmission; for a text file, an event is usually one line (one record). The event is also the basic unit of a transaction. Events flow from source to channel to sink; the event body is a byte array, and an event can also carry headers. An event is the smallest complete unit of data, arriving from an external source and bound for an external destination.

Throughout the transfer, what flows is the event, and the transactional guarantee is made at the event level. Flume supports chaining multiple agents in sequence, as well as fan-in and fan-out flows; a fan-out sketch follows.
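
As a concrete picture of fan-out, one source can replicate every event into two channels, each drained by its own sink. A minimal sketch (the agent name `fanout`, the port, and the logger/file_roll pairing are my own assumptions):

fanout.sources = r1
fanout.channels = c1 c2
fanout.sinks = k1 k2

# Replicate each event from one netcat source into both channels (fan-out)
fanout.sources.r1.type = netcat
fanout.sources.r1.bind = 0.0.0.0
fanout.sources.r1.port = 5140
fanout.sources.r1.selector.type = replicating
fanout.sources.r1.channels = c1 c2

fanout.channels.c1.type = memory
fanout.channels.c2.type = memory

# Each channel has its own sink: one logs events, the other writes files
# (create /tmp/flume-fanout before starting the agent)
fanout.sinks.k1.type = logger
fanout.sinks.k1.channel = c1
fanout.sinks.k2.type = file_roll
fanout.sinks.k2.sink.directory = /tmp/flume-fanout
fanout.sinks.k2.channel = c2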

2. Environment Preparation

  • The examples use CentOS 7, 64-bit
  • Java 1.8 or later (a quick check follows this list)
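
A quick sanity check of both prerequisites before proceeding, assuming `java` is already on the PATH:

[hadoop@hadoop-master /home/hadoop]$ cat /etc/redhat-release   # confirm CentOS 7
[hadoop@hadoop-master /home/hadoop]$ java -version             # must report 1.8 or later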

3. Download the Package

Official download page: https://flume.apache.org/download.html

Download the latest release: apache-flume-1.9.0-bin.tar.gz
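
For example, the 1.9.0 binary can be pulled straight from the Apache archive (the archive URL is an assumption on my part; any mirror listed on the download page works just as well):

[root@hadoop-master /soft]# wget https://archive.apache.org/dist/flume/1.9.0/apache-flume-1.9.0-bin.tar.gz
# Compute the checksum and compare it by eye against the .sha512 published on the download page
[root@hadoop-master /soft]# sha512sum apache-flume-1.9.0-bin.tar.gz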

4. Installation

Unpack into the installation directory

[root@hadoop-master /soft]# tar -xvzf apache-flume-1.9.0-bin.tar.gz
[root@hadoop-master /soft]# chown -R hadoop:hadoop apache-flume-1.9.0-bin
[root@hadoop-master /soft]# ln -s apache-flume-1.9.0-bin flume

Set the environment variables

[root@hadoop-master /soft]# vi /etc/profile
export FLUME_HOME=/soft/flume
export PATH=$PATH:$FLUME_HOME/bin
[root@hadoop-master /soft]# source /etc/profile
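
A quick check that the variables took effect (the output shown assumes the layout above):

[root@hadoop-master /soft]# echo $FLUME_HOME
/soft/flume
[root@hadoop-master /soft]# which flume-ng
/soft/flume/bin/flume-ng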

Edit the configuration file

[hadoop@hadoop-master /home/hadoop]$ cp /soft/flume/conf/flume-env.sh.template /soft/flume/conf/flume-env.sh
[hadoop@hadoop-master /home/hadoop]$ vi /soft/flume/conf/flume-env.sh
# export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export JAVA_HOME=/soft/jdk
Save and exit: press Esc, then type :wq

Start and test

[hadoop@hadoop-master /home/hadoop]$ flume-ng version
Flume 1.9.0
Source code repository: https://git-wip-us.apache.org/repos/asf/flume.git
Revision: d4fcab4f501d41597bc616921329a4339f73585e
Compiled by fszabo on Mon Dec 17 20:45:25 CET 2018
From source with checksum 35db629a3bda49d23e9b3690c80737f9
# Official demo example
[hadoop@hadoop-master /home/hadoop]$ mkdir /soft/flume/conf/example
[hadoop@hadoop-master /home/hadoop]$ vi /soft/flume/conf/example/simple.conf
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop-master
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
[hadoop@hadoop-master /home/hadoop]$ mkdir -p /home/hadoop/flume_test/example/logs
[hadoop@hadoop-master /home/hadoop]$ flume-ng agent --conf /soft/flume/conf --conf-file /soft/flume/conf/example/simple.conf --name a1 -Dflume.root.logger=INFO,console > /home/hadoop/flume_test/example/logs/flume-hdfs.log 2>&1 &

# Open a new terminal
[hadoop@hadoop-master /home/hadoop]$ telnet hadoop-master 44444
Trying 192.168.146.131...
Connected to hadoop-master.
Escape character is '^]'.
hello work
OK
# Watch the events being appended to the log
[hadoop@hadoop-master /home/hadoop]$ cat /home/hadoop/flume_test/example/logs/flume-hdfs.log
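
If telnet is not installed, `nc hadoop-master 44444` does the same job. Since the agent was started in the background, stop it when finished by killing the process that matches its config file (the same trick the init script in the next section uses):

# Find the backgrounded agent by its config path and stop it
[hadoop@hadoop-master /home/hadoop]$ kill $(ps aux | grep simple.conf | grep -v grep | awk '{print $2}')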

5. Start the Service at Boot

[hadoop@hadoop-master /home/hadoop]$ su - root
[root@hadoop-master /root]# vi /etc/init.d/flume-example
#!/bin/sh
# chkconfig: 345 85 15
# description: service for flume-example
# processname: flume-example

case "$1" in
        start)
                echo "Starting flume-example"
                su - hadoop -c 'flume-ng agent --conf /soft/flume/conf --conf-file /soft/flume/conf/example/simple.conf --name a1 -Dflume.root.logger=INFO,console > /home/hadoop/flume_test/example/logs/flume-hdfs.log 2>&1 &'
                echo "Flume-example started"
                ;;
        stop)
                echo "Stopping flume-example"
                PID_COUNT=$(ps aux | grep /soft/flume/conf/example/simple.conf | grep -v grep | wc -l)
                PID=$(ps aux | grep /soft/flume/conf/example/simple.conf | grep -v grep | awk '{print $2}')
                if [ $PID_COUNT -gt 0 ];then
                    echo "Try stop flume-example"
                    kill -9 $PID
                    echo "Kill flume-example SUCCESS!"
                else
                    echo "There is no flume-example!"
                fi
                ;;
        restart)
                echo "Restarting flume-example"
                $0 stop
                $0 start
                ;;
        status)
                PID_COUNT=$(ps aux | grep /soft/flume/conf/example/simple.conf | grep -v grep | wc -l)
                if [ $PID_COUNT -gt 0 ];then
                    echo "flume-example is running"
                else
                    echo "flume-example is stopped"
                fi
                ;;
        *)
                echo "Usage:$0 {start|stop|restart|status}"
                exit 1
esac
Save and exit: press Esc, then type :wq
[root@hadoop-master /root]# chmod 755 /etc/init.d/flume-example
[root@hadoop-master /root]# chkconfig --add flume-example
[root@hadoop-master /root]# chkconfig flume-example on
[root@hadoop-master /root]# service flume-example start

Other related commands:

  • Start: service flume-example start
  • Stop: service flume-example stop
  • Restart: service flume-example restart
  • Disable autostart: chkconfig flume-example off
  • Remove the service: chkconfig --del flume-example
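
CentOS 7 is systemd-based, so a native unit file is an alternative to the SysV script above; a minimal sketch (the unit file itself is my own assumption, reusing this article's paths, and is not part of the original setup):

[root@hadoop-master /root]# vi /etc/systemd/system/flume-example.service
[Unit]
Description=Flume example agent (simple.conf)
After=network.target

[Service]
Type=simple
User=hadoop
# flume-ng sources conf/flume-env.sh (which sets JAVA_HOME) when --conf is given
ExecStart=/soft/flume/bin/flume-ng agent --conf /soft/flume/conf --conf-file /soft/flume/conf/example/simple.conf --name a1 -Dflume.root.logger=INFO,console
Restart=on-failure

[Install]
WantedBy=multi-user.target
[root@hadoop-master /root]# systemctl daemon-reload
[root@hadoop-master /root]# systemctl enable flume-example
[root@hadoop-master /root]# systemctl start flume-example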
