Installation
- Download the Apache Flume binary (tar.gz) from http://flume.apache.org/download.html.
- tar -xzvf apache-flume-1.3.1-bin.tar.gz
- ln -s apache-flume-1.3.1-bin flume
export FLUME_HOME=/apps/apache-flume-1.3.1-bin
PATH=$PATH:$HOME/bin:${FLUME_HOME}/bin
export PATH
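Before moving on, it can help to confirm that FLUME_HOME really points at the unpacked directory and not at the tar.gz file. The following is a minimal sketch; `check_flume_home` is a hypothetical helper of my own, not something shipped with Flume, and it only assumes that the binary distribution contains `bin/flume-ng`.

```shell
#!/bin/sh
# Sketch: verify that a directory looks like an unpacked Flume install.
# check_flume_home is a hypothetical helper, not part of Flume itself.
check_flume_home() {
    # Use the first argument if given, otherwise fall back to $FLUME_HOME.
    home="${1:-${FLUME_HOME:-}}"
    if [ -z "$home" ]; then
        echo "FLUME_HOME is not set" >&2
        return 1
    fi
    # The launcher script must exist and be executable.
    if [ ! -x "$home/bin/flume-ng" ]; then
        echo "no executable flume-ng under $home/bin" >&2
        return 1
    fi
    echo "ok: $home/bin/flume-ng"
}
```

Running `check_flume_home` with no argument checks whatever FLUME_HOME is currently exported; passing a path such as `check_flume_home /apps/flume` checks the symlink instead.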
Configuration
- cd flume
- cp conf/flume-conf.properties.template conf/flume.conf
- vi conf/flume.conf
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'agent'
agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink

# For each one of the sources, the type is defined (source type)
agent.sources.seqGenSrc.type = seq

# The channel can be defined as follows (connects the source to the channel)
agent.sources.seqGenSrc.channels = memoryChannel

# Each sink's type must be defined (sink type)
agent.sinks.loggerSink.type = logger

# Specify the channel the sink should use (connects the sink to the channel)
agent.sinks.loggerSink.channel = memoryChannel

# Each channel's type is defined (channel type)
agent.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel (channel capacity)
agent.channels.memoryChannel.capacity = 100

Depending on how you configure the Source, Sink, and Channel in this file, you can build a variety of flows.
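Since every source and sink has to be connected to a channel that the agent declares, a typo in a channel name is an easy mistake to make. Here is a rough sketch of a check for that; `undeclared_channels` is a hypothetical helper, and it assumes the simple `key = value` layout used in this post (one property per line, channel names separated by spaces).

```shell
#!/bin/sh
# Sketch: print channels that a source or sink refers to but the agent
# never declares. Prints nothing when the wiring is consistent.
# undeclared_channels is a hypothetical helper, not part of Flume.
undeclared_channels() {
    conf="$1"; agent="$2"
    awk -v agent="$agent" '
        /^[ \t]*#/ { next }                       # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next                     # not a key = value line
            key = substr($0, 1, eq - 1)
            val = substr($0, eq + 1)
            gsub(/[ \t]/, "", key)                # keys contain no spaces
            gsub(/^[ \t]+|[ \t]+$/, "", val)      # trim the value
            if (key == (agent ".channels")) {
                n = split(val, names, /[ \t]+/)
                for (i = 1; i <= n; i++) declared[names[i]] = 1
            } else if (index(key, agent ".sources.") == 1 && key ~ /\.channels$/) {
                n = split(val, names, /[ \t]+/)
                for (i = 1; i <= n; i++) used[names[i]] = 1
            } else if (index(key, agent ".sinks.") == 1 && key ~ /\.channel$/) {
                used[val] = 1
            }
        }
        END { for (c in used) if (!(c in declared)) print c }
    ' "$conf"
}
```

For example, `undeclared_channels conf/flume.conf agent` should print nothing for the configuration above; any name it does print is referenced somewhere but missing from the `agent.channels` line.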
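Sample configuration
In the setup I built, Flume is installed on server 1, server 2, and server 3. Server 1 saves the data sent by servers 2 and 3 to files. Servers 2 and 3 read a log file and send its contents to server 1.
Server 1 flume.conf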
agent01.sources = avroGenSrc
agent01.channels = memoryChannel
agent01.sinks = fileSink

# For each one of the sources, the type is defined
agent01.sources.avroGenSrc.type = avro
# Bind to an address that servers 2 and 3 can reach;
# localhost would accept connections from this machine only
agent01.sources.avroGenSrc.bind = 0.0.0.0
agent01.sources.avroGenSrc.port = 3333

# The channel can be defined as follows.
agent01.sources.avroGenSrc.channels = memoryChannel

# Each sink's type must be defined
agent01.sinks.fileSink.type = file_roll
agent01.sinks.fileSink.sink.directory = /home/aaaaa/flume/data
agent01.sinks.fileSink.sink.rollInterval = 30
agent01.sinks.fileSink.sink.batchSize = 100

# Specify the channel the sink should use
agent01.sinks.fileSink.channel = memoryChannel

# Each channel's type is defined.
agent01.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent01.channels.memoryChannel.capacity = 100000
agent01.channels.memoryChannel.transactionCapacity = 10000
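One thing to keep in mind with the memory channel is that transactionCapacity must not be larger than capacity. The sketch below reads both values from a config file and compares them. `get_prop` and `check_channel_capacity` are hypothetical helpers, and the check only works when both keys are written out explicitly, because the script does not know Flume's defaults.

```shell
#!/bin/sh
# Sketch: check that a memory channel's transactionCapacity fits within
# its capacity. Both helpers are hypothetical, not part of Flume.

# Print the value of an exact property key (last occurrence wins).
get_prop() {
    awk -v key="$2" '
        /^[ \t]*#/ { next }                       # skip comment lines
        {
            eq = index($0, "=")
            if (eq == 0) next
            k = substr($0, 1, eq - 1)
            v = substr($0, eq + 1)
            gsub(/[ \t]/, "", k)
            gsub(/^[ \t]+|[ \t]+$/, "", v)
            if (k == key) { found = v; seen = 1 }
        }
        END { if (seen) print found }
    ' "$1"
}

# Succeeds only if both keys are present and transactionCapacity <= capacity.
check_channel_capacity() {
    conf="$1"; prefix="$2"
    cap=$(get_prop "$conf" "$prefix.capacity")
    tx=$(get_prop "$conf" "$prefix.transactionCapacity")
    [ -n "$cap" ] && [ -n "$tx" ] && [ "$tx" -le "$cap" ]
}
```

With the server 1 configuration above, `check_channel_capacity conf/flume.conf agent01.channels.memoryChannel` should succeed, since 10000 is below 100000.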
Server 2 and server 3 flume.conf
The configuration is identical on server 2 and server 3; only the agent name differs.
agent02.sources = execGenSrc
agent02.channels = memoryChannel
agent02.sinks = avroSink

# For each one of the sources, the type is defined
agent02.sources.execGenSrc.type = exec
agent02.sources.execGenSrc.command = tail -F /home/aaaaa/hadoop/logs/logsample.log
agent02.sources.execGenSrc.batchSize = 10

# The channel can be defined as follows.
agent02.sources.execGenSrc.channels = memoryChannel

# Each sink's type must be defined
agent02.sinks.avroSink.type = avro
agent02.sinks.avroSink.hostname = <address of the host the data should be sent to>
agent02.sinks.avroSink.port = 3333
agent02.sinks.avroSink.batch-size = 10

# Specify the channel the sink should use
agent02.sinks.avroSink.channel = memoryChannel

# Each channel's type is defined.
agent02.channels.memoryChannel.type = memory

# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent02.channels.memoryChannel.capacity = 100000
agent02.channels.memoryChannel.transactionCapacity = 10000
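Because the two configurations differ only in the agent name, the server 3 file can be generated from the server 2 file instead of being edited by hand. This is just a sketch: `rename_agent` is a hypothetical helper and the file names in the usage line are made up for illustration.

```shell
#!/bin/sh
# Sketch: derive another agent's config by renaming the agent prefix.
# rename_agent is a hypothetical helper, not part of Flume.
rename_agent() {
    from="$1"; to="$2"; src="$3"; dst="$4"
    # Rewrite only the agent name at the very start of a property key;
    # comment lines and property values are left untouched.
    sed "s/^${from}\./${to}./" "$src" > "$dst"
}
```

For example, `rename_agent agent02 agent03 flume-agent02.conf flume-agent03.conf` writes a copy in which every key starts with `agent03.`.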
Running
- ./bin/flume-ng agent --conf-file ./conf/flume.conf --name agent01
- ./bin/flume-ng agent --conf-file ./conf/flume.conf --name agent02
- ./bin/flume-ng agent --conf-file ./conf/flume.conf --name agent03
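You can now watch the data being transferred. One issue is that the exec source reportedly does some buffering when it reads the file. If you modify the log file, you can confirm that its contents are delivered to server 1 correctly.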
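To try this out, you can append a few lines to the tailed log file on server 2 or server 3 and then look at the output directory on server 1. The sketch below only does the appending; `append_test_lines` is a hypothetical helper. My guess, not something I have confirmed, is that the buffering is related to batchSize = 10 on the exec source, so it may be worth writing at least that many lines.

```shell
#!/bin/sh
# Sketch: append numbered test lines to the log that the exec source tails.
# append_test_lines is a hypothetical helper, not part of Flume.
append_test_lines() {
    log="$1"
    count="${2:-5}"        # number of lines to write, 5 by default
    i=1
    while [ "$i" -le "$count" ]; do
        echo "flume-test line $i" >> "$log"
        i=$((i + 1))
    done
}
```

For example, run `append_test_lines /home/aaaaa/hadoop/logs/logsample.log 20` on server 2, then check the files under /home/aaaaa/flume/data on server 1.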