24 Akka
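24.1 Akka Overview
Early versions of Spark implemented RPC on top of the Akka library (the dependency was removed in Spark 2.0). Akka is written in Scala and built on the Actor concurrency model.
Akka is highly reliable, performant, and scalable, and it makes distributed RPC straightforward to implement.
The Actor is the core concept in Akka: an object that encapsulates state and behavior. Actors communicate by exchanging messages, and each Actor has its own mailbox (MailBox).
Because an Actor processes its messages one at a time, Actors remove most explicit lock and thread management, which makes it much easier to write correct concurrent and parallel programs.
Akka has the following features:
1) It provides a high-level abstraction that simplifies programming for concurrent (Concurrency) and parallel (Parallelism) workloads;
2) It provides an asynchronous, non-blocking, high-performance, event-driven programming model;
3) Event handling is extremely lightweight (several million Actors per GB of heap memory).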
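24.2 Akka Components and Architecture
ActorSystem
In Akka, ActorSystem is a heavyweight structure.
The ActorSystem creates and manages the Actors it creates. Treat it as a singleton (one per JVM process is enough), whereas Actors can have many instances.
Actor
In Akka, Actors do the communicating. An Actor has several important lifecycle methods:
1) preStart(): runs after the Actor's constructor, when the Actor starts and before any message is processed. Unless the Actor is restarted it runs only once, much like setup() in MapReduce.
2) receive: defines the message handler. It takes effect after preStart() completes and is applied to every incoming message, much like map() in MapReduce.
Each actor object has a corresponding external reference (xxxRef, an ActorRef); other code communicates with the actor through this reference.
How Akka works
In the architecture:
The mailbox stores the messages an actor receives; the dispatcher takes messages from the mailbox and assigns a thread on which the actor runs its business logic.
The sender reference denotes the actor that sent the message currently being processed and is typically used to reply, e.g. sender() ! msg.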
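The mailbox and dispatcher roles described above can be illustrated without Akka. The following is a toy model in plain Scala, not Akka's actual implementation: ToyActor, ToyMailboxDemo, and the String-only message type are hypothetical names chosen for this sketch. It shows why the handler's state needs no lock: messages are queued, and at most one pool thread drains the queue at any moment.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, CountDownLatch, ExecutorService, Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Toy model for illustration only; this is NOT how Akka is implemented internally.
class ToyActor(handler: String => Unit, pool: ExecutorService) {
  private val mailbox = new ConcurrentLinkedQueue[String]() // the actor's inbox
  private val scheduled = new AtomicBoolean(false)          // true while a drain task is queued or running

  // Enqueue and return immediately; the sender never waits for processing.
  def !(msg: String): Unit = {
    mailbox.add(msg)
    schedule()
  }

  // "Dispatcher": hand the mailbox to a pool thread, at most one at a time.
  private def schedule(): Unit =
    if (scheduled.compareAndSet(false, true)) pool.execute(new Runnable {
      def run(): Unit =
        try {
          var msg = mailbox.poll()
          while (msg != null) { handler(msg); msg = mailbox.poll() }
        } finally {
          scheduled.set(false)
          if (!mailbox.isEmpty) schedule() // a message arrived after the last poll
        }
    })
}

object ToyMailboxDemo {
  // Sends n messages (n even) from two threads and returns how many the handler saw.
  def demo(n: Int): Int = {
    val pool = Executors.newFixedThreadPool(4)
    val done = new CountDownLatch(n)
    var count = 0 // plain var: only the handler touches it, one message at a time
    val actor = new ToyActor(_ => { count += 1; done.countDown() }, pool)
    val senders = (1 to 2).map(_ => new Thread(new Runnable {
      def run(): Unit = (1 to n / 2).foreach(i => actor ! s"msg-$i")
    }))
    senders.foreach(_.start())
    done.await(10, TimeUnit.SECONDS)
    pool.shutdown()
    count
  }
}
```

Calling ToyMailboxDemo.demo(1000) sends from two threads concurrently, yet the unsynchronized counter is updated safely because the handler never runs on two threads at once.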
24.3 Using Akka
Using Akka requires adding the following two dependencies to the pom:
<!-- Akka actor dependency -->
<dependency>
<groupId>com.typesafe.akka</groupId>
<artifactId>akka-actor_2.11</artifactId>
<version>2.4.17</version>
</dependency>
<!-- Actor communication between multiple processes -->
<dependency>
<groupId>com.typesafe.akka</groupId>
<artifactId>akka-remote_2.11</artifactId>
<version>2.4.17</version>
</dependency>
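24.3.1 Sending messages to itself (single process)
Steps:
1) Create the ActorSystem
2) Define the Actor implementation class that handles messages
class HelloAkkaActor extends Actor{
// receive messages
override def receive: Receive = {
// message-handling logic goes here
case msg => println(s"received: $msg")
}
}
3) Create the ActorRef for the target Actor
4) Send messages to the target Actor's ActorRef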
package day05
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
class HelloAkkaActor extends Actor{
override def preStart() ={
println("do preStart()")
}
override def receive: Receive = {
case "start" => println("actor receive==>start ....")
case "id01" => println("actor receive==>id01")
case "stop this" =>{
println("actor receive==> stop this, stop....")
// stop this actor only; the ActorSystem keeps running
context.stop(self)
}
case "stop all" =>{
println("actor receive==> stop all, stop....")
// shut down the whole ActorSystem
context.system.terminate()
}
}
}
object HelloAkka{
def main(args: Array[String]): Unit = {
// create the ActorSystem
val sys = ActorSystem("hello_sys")
// create the actor and get back its ActorRef
val helloRef: ActorRef = sys.actorOf(Props[HelloAkkaActor], "hello")
helloRef ! "start"
helloRef ! "id01"
// helloRef ! "stop this"
helloRef ! "stop all"
}
}
24.3.2 Sending messages to another actor on the same machine (other threads)
package day05
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
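class GirlActor extends Actor{
override def receive: Receive = {
case "you stepped on my foot" =>{
println("GirlActor receive=> you stepped on my foot; thinks for a moment, replies 'who is it'")
Thread.sleep(1000)
// reply to whoever sent the current message
sender() ! "who is it"
}
case "oh, it is you, beautiful" =>{
println("GirlActor receive=> oh, it is you, beautiful; replies 'what is it'")
sender() ! "what is it"
}
}
}
class BoyActor(girlRef:ActorRef) extends Actor{
override def receive: Receive = {
case "action" =>{
println("BoyActor receive==> action, dialog start ....")
girlRef ! "you stepped on my foot"
}
case "who is it" =>{
println("BoyActor receive==> who is it, send 'oh, it is you, beautiful'")
girlRef ! "oh, it is you, beautiful"
}
case "what is it" =>{
println("BoyActor receive==> what is it, send 'you stepped on my foot'")
// the dialog loops back to the beginning here
girlRef ! "you stepped on my foot"
}
}
}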
object DialogAkka {
def main(args: Array[String]): Unit = {
val sys = ActorSystem("dialog_sys")
// create the GirlActor and get back its ActorRef
val girlRef: ActorRef = sys.actorOf(Props[GirlActor], "girl")
// create the BoyActor and get back its ActorRef
val boyRef: ActorRef = sys.actorOf(Props[BoyActor](new BoyActor(girlRef)), "boy")
boyRef ! "action"
}
}
24.3.3 Sending messages to a different process
Settings required for remote actor references:
akka.actor.provider = "akka.remote.RemoteActorRefProvider"
akka.remote.netty.tcp.hostname = $host
akka.remote.netty.tcp.port = $port
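To make the relationship between these settings and an actor's remote address concrete, here is a small sketch in plain Scala (no Akka dependency). RemotePathSketch and its two helpers are hypothetical names for illustration; the address layout follows the akka.tcp://server_sys@127.0.0.1:8888/user/server form used later in this section.

```scala
object RemotePathSketch {
  // Address of a top-level actor created with system.actorOf(props, actorName)
  // in an ActorSystem named `system` that listens on host:port.
  def remoteActorPath(system: String, host: String, port: Int, actorName: String): String =
    s"akka.tcp://$system@$host:$port/user/$actorName"

  // Renders the three settings above as a config string for a given host and port.
  def remoteConfig(host: String, port: Int): String =
    s"""akka.actor.provider = "akka.remote.RemoteActorRefProvider"
       |akka.remote.netty.tcp.hostname = $host
       |akka.remote.netty.tcp.port = $port""".stripMargin
}
```

The hostname and port in the path must match what the remote ActorSystem was configured with, and the system name must match the name passed to ActorSystem(...).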
1) Create a Server side that replies to messages
server
package day05
import akka.actor.{Actor, ActorSystem, Props}
import com.typesafe.config.{Config, ConfigFactory}
class ServerActor extends Actor{
override def receive: Receive = {
case "start" => println("ServerActor receive==> start, start....")
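// "马上开始发送计算任务" = "about to start sending calculation tasks"; it must match the string sent by ClientActor
case "马上开始发送计算任务" =>{
println("ServerActor receive==> 马上开始发送计算任务")
}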
case Client2ServerMsg(num1, symbol, num2) =>{
println(s"ServerActor receive==> ${num1} ${symbol} ${num2}")
var errCode = "000000"
var errMsg = "success"
var result:Int = 0
symbol match {
case "+" => result = num1 + num2
case "-" => result = num1 - num2
case "*" => result = num1 * num2
case _ =>{
errCode = "err001"
errMsg = "this version only supports +, -, and *; try other operators after a version upgrade"
}
}
val msg = Server2ClientMsg(errCode, errMsg, result)
println(s"ServerActor send==>${msg}")
// send the result (errCode, errMsg, result) back to the sender
sender() ! msg
}
}
}
object ServerActor{
def main(args: Array[String]): Unit = {
val host:String = "127.0.0.1"
val port:Int = 8888
// parse the configuration parameters
val config:Config = ConfigFactory.parseString(
s"""
|akka.actor.provider = "akka.remote.RemoteActorRefProvider"
|akka.remote.netty.tcp.hostname = $host
|akka.remote.netty.tcp.port = $port
""".stripMargin
)
val sys = ActorSystem("server_sys", config)
// Akka address: akka.tcp://server_sys@127.0.0.1:8888/user/server
val serverRef = sys.actorOf(Props[ServerActor], "server")
serverRef ! "start"
}
}
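Result after startup: the server waits for the client to send messages so that the two can interact.
2) Create a Client side that sends messages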
client
package day05
import akka.actor.{Actor, ActorSelection, ActorSystem, Props}
import com.typesafe.config.{Config, ConfigFactory}
import scala.util.Random
class ClientActor(val serverHost:String, val serverPort:Int) extends Actor{
var serverRef: ActorSelection = _
override def preStart(): Unit = {
// look up ServerActor by its Akka address; actorSelection returns an ActorSelection and does not create the actor if it does not exist
serverRef = context.actorSelection(s"akka.tcp://server_sys@${serverHost}:${serverPort}/user/server")
}
override def receive: Receive = {
case "start" => {
println("ClientActor receive==> start, start....")
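// "马上开始发送计算任务" = "about to start sending calculation tasks"; it must match the case in ServerActor
println("ClientActor send==> 马上开始发送计算任务")
serverRef ! "马上开始发送计算任务"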
}
case SendClientMsg(data) =>{
println(s"ClientActor receive==>${data}")
val arr = data.split(" ")
if(arr.size != 3){
println(s"ClientActor receive data error, info: ${data}")
}else{
val num1 = arr(0).toInt
val num2 = arr(2).toInt
val symbol = arr(1)
val msg = Client2ServerMsg(num1, symbol, num2)
println(s"ClientActor send msg to Server==>${msg}")
serverRef ! msg
}
}
case msg:Server2ClientMsg =>{
println(s"ClientActor from Server receive==>${msg}")
}
}
}
object ClientActor{
def main(args: Array[String]): Unit = {
val serverHost:String = "127.0.0.1"
val serverPort:Int = 8888
val host:String = "127.0.0.1"
val port:Int = 8889
// parse the configuration parameters
val config:Config = ConfigFactory.parseString(
s"""
|akka.actor.provider = "akka.remote.RemoteActorRefProvider"
|akka.remote.netty.tcp.hostname = $host
|akka.remote.netty.tcp.port = $port
""".stripMargin
)
val sys = ActorSystem("client_sys", config)
// Akka address: akka.tcp://client_sys@127.0.0.1:8889/user/client
val clientRef = sys.actorOf(Props[ClientActor](new ClientActor(serverHost, serverPort)), "client")
clientRef ! "start"
// randomly generate calculations on numbers below 100 ("/" is not supported by ServerActor and yields err001)
val arr = Array[String]("+", "-", "*", "/")
val random = new Random()
while(true){
val num1 = random.nextInt(100)
val num2 = random.nextInt(100)
val symbol = arr(random.nextInt(arr.size))
clientRef ! SendClientMsg(s"$num1 ${symbol} $num2")
Thread.sleep(1000)
}
}
}
Case classes used to carry the messages
// message sent locally to ClientActor
case class SendClientMsg(data:String)
// message sent from ClientActor to ServerActor
case class Client2ServerMsg(num1:Int, symbol:String, num2:Int)
// calculation result returned from ServerActor to ClientActor
case class Server2ClientMsg(errCode:String, errMsg:String, result:Int)
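The parsing done in ClientActor and the calculation done in ServerActor can be exercised without starting any actors. The sketch below restates that logic as two pure functions, under a few assumptions: CalcSketch, parse, and compute are hypothetical names, the two case classes are local copies of the ones above, the error text is an English paraphrase, and parse additionally rejects non-numeric operands (the ClientActor above would throw on them).

```scala
import scala.util.Try

object CalcSketch {
  // Local copies of the message types above, so the sketch is self-contained.
  case class Client2ServerMsg(num1: Int, symbol: String, num2: Int)
  case class Server2ClientMsg(errCode: String, errMsg: String, result: Int)

  // Parses input such as "12 + 30"; None for a wrong token count or non-numeric operands.
  def parse(data: String): Option[Client2ServerMsg] =
    data.trim.split("\\s+") match {
      case Array(a, symbol, b) => Try(Client2ServerMsg(a.toInt, symbol, b.toInt)).toOption
      case _                   => None
    }

  // Same branches as ServerActor: +, -, * succeed, anything else yields err001.
  def compute(msg: Client2ServerMsg): Server2ClientMsg = {
    def ok(result: Int) = Server2ClientMsg("000000", "success", result)
    msg.symbol match {
      case "+"   => ok(msg.num1 + msg.num2)
      case "-"   => ok(msg.num1 - msg.num2)
      case "*"   => ok(msg.num1 * msg.num2)
      case other => Server2ClientMsg("err001", s"unsupported operator: $other", 0)
    }
  }
}
```

Keeping this logic in plain functions would let the actors shrink to message plumbing, and the arithmetic can then be unit-tested directly.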
Running result:
clientActor
serverActor
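24.4 Packaging and Deploying to the Cluster
1) Modify the code parameters: replace the hard-coded host and port values (such as host and serverHost) with the addresses used on the cluster
2) Add the packaging configuration to the pom, making sure the resource directory setting matches the actual project layout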
<build>
<resources>
<resource>
<directory>src/main/resources</directory>
</resource>
</resources>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<configuration>
<descriptors>
<descriptor>src/main/resources/assembly.xml</descriptor>
</descriptors>
<!--<archive>-->
<!--<manifest>-->
<!--<mainClass>${package.mainClass}</mainClass>-->
<!--</manifest>-->
<!--</archive>-->
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.12</version>
<configuration>
<skip>true</skip>
<forkMode>once</forkMode>
<excludes>
<exclude>**/**</exclude>
</excludes>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
</plugins>
</build>
3) Add assembly.xml
<assembly xmlns="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/plugins/maven-assembly-plugin/assembly/1.1.0 http://maven.apache.org/xsd/assembly-1.1.0.xsd">
<id>hainiu</id>
<formats>
<format>jar</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<fileSets>
<fileSet>
<directory>${project.build.directory}/classes</directory>
<outputDirectory>/</outputDirectory>
<excludes>
<exclude>*.xml</exclude>
<exclude>*.properties</exclude>
</excludes>
</fileSet>
</fileSets>
<dependencySets>
<dependencySet>
<outputDirectory>/</outputDirectory>
<useProjectArtifact>false</useProjectArtifact>
<unpack>true</unpack>
<scope>runtime</scope>
</dependencySet>
</dependencySets>
</assembly>
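Note:
If the schema URL is underlined in red in the IDE, press Alt+Enter and choose Ignored Schemas and DTDs. The warning does not affect packaging; Maven resolves the assembly plugin from its repository when the build runs.
4) Run clean, then rebuild the project
5) Run assembly:assembly to build the package
6) Merge the conflicting configuration files and put the result into the exported jar
akka-actor and akka-remote each ship their own reference.conf, and unpacking both into one jar keeps only one of them. Manually merge them into a single reference.conf and add it to the exported jar.
7) Start the ServerActor on the cluster and the ClientActor locally so that the client communicates with the ServerActor on the cluster; the server side and the client side each print their messages to their own console.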
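For the reference.conf merge step above, one possible approach is plain concatenation. This rests on an assumption about HOCON semantics: repeated object keys are merged and later scalar values override earlier ones, so the order of the inputs can matter. The sketch below is not part of the original workflow; MergeReferenceConf and the file paths passed to it are hypothetical, and extracting each reference.conf from its dependency jar is left out.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}

object MergeReferenceConf {
  // Concatenates the given reference.conf files, in order, into `target`.
  def merge(sources: Seq[Path], target: Path): Unit = {
    val merged = sources
      .map(p => new String(Files.readAllBytes(p), UTF_8).trim)
      .mkString("\n\n") // blank line between the parts for readability
    Files.write(target, (merged + "\n").getBytes(UTF_8))
  }
}
```

The resulting file would then be placed at the root of the exported jar under the name reference.conf.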
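24.5 Implementing WordCount with Akka (applying the distributed approach)
Define MapperActor
It emits (word, 1) pairs
It must also decide how the data is distributed across multiple reducers
Define ReducerActor
It receives the data sent by multiple MapperActors and aggregates it by word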
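The two mapper-side decisions described above, pre-aggregating counts (a combiner) and choosing a reducer for each word, can be sketched as pure functions before any actors are involved. ShuffleSketch, combine, and partition are hypothetical names for this illustration; the tab separator and the masked-hash formula are the ones used in the MapperActor code.

```scala
object ShuffleSketch {
  // Mapper-side combiner: turns the lines of one input file into (word -> count),
  // so each distinct word is sent to a reducer once instead of once per occurrence.
  def combine(lines: Seq[String]): Map[String, Int] =
    lines.flatMap(_.split("\t")).groupBy(identity).map { case (word, ws) => (word, ws.size) }

  // Picks the reducer index for a key: non-negative hash modulo the reducer count.
  def partition(key: String, reducerNum: Int): Int =
    (key.hashCode & Integer.MAX_VALUE) % reducerNum
}
```

Masking with Integer.MAX_VALUE clears the sign bit, which keeps the index non-negative even for negative hash codes; math.abs would not be safe here because math.abs(Int.MinValue) is still negative. Since the index depends only on the key, every mapper sends a given word to the same reducer.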
package day05
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import scala.collection.mutable
import scala.collection.mutable.ListBuffer
import scala.io.Source
// define MapperActor
class MapperActor(val rRefs:ListBuffer[ActorRef]) extends Actor{
override def receive: Receive = {
case path:String =>{
println(s"MapperActor ${self.path.name} receive==>${path}")
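// read all lines, closing the file afterwards
val source = Source.fromFile(path)
val list = try source.getLines().toList finally source.close()
// without a combiner
// val tuples = list.flatMap(_.split("\t")).map((_,1))
// with a combiner: pre-aggregate the counts inside the mapper
val tuples = list.flatMap(_.split("\t")).map((_,1)).groupBy(_._1).mapValues(_.size).toList
// send the pairs to the reducers one by one
// the target reducer is chosen by hash(key) % number of reducers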
for(t <- tuples){
val key = t._1
val value = t._2
val rIndex = (key.hashCode & Integer.MAX_VALUE) % rRefs.size
// send the message
rRefs(rIndex) ! MapperSendReducerMsg(key, value)
}
// this mapper has processed its data; tell every ReducerActor that it has sent everything
for(rRef <- rRefs){
rRef ! MapperSendReduerEndMsg(self.path.name)
}
}
}
}
// define ReducerActor
class ReducerActor(val mapMaxNum:Int) extends Actor{
// accumulates the word count result
val map = new mutable.HashMap[String,Int]
// names of the mappers that have finished sending
val set = new mutable.HashSet[String]
override def receive: Receive = {
case msg:MapperSendReducerMsg =>{
println(s"ReducerActor ${self.path.name} receive==>${msg}")
// add this value to the running total for the key (0 if the key is new)
map(msg.key) = map.getOrElse(msg.key, 0) + msg.value
// println(s"ReducerActor ${self.path.name} count==>${map}")
}
case MapperSendReduerEndMsg(mName) =>{
set.add(mName)
if(set.size == mapMaxNum){
// every mapper has sent all of its data to this reducer, so print the final result
println(s"ReducerActor ${self.path.name} count==>${map}")
}
}
}
}
case class MapperSendReducerMsg(key:String, value:Int)
case class MapperSendReduerEndMsg(mName:String)
object WordCountAkka {
def main(args: Array[String]): Unit = {
val files = Array("/tmp/scala/input/word1.txt",
"/tmp/scala/input/word2.txt",
"/tmp/scala/input/word3.txt",
"/tmp/scala/input/word4.txt")
val sys = ActorSystem("wordcount_sys")
// holds the reducer ActorRefs
val rRefs = new ListBuffer[ActorRef]
// number of reducers
val reducerNum = 5
val mapMaxNum = files.size
for(i <- 0 until reducerNum){
val ref = sys.actorOf(Props[ReducerActor](new ReducerActor(mapMaxNum)), s"r${i}")
rRefs += ref
}
// create one MapperActor per input file
for(i <- 0 until files.size){
val ref = sys.actorOf(Props[MapperActor](new MapperActor(rRefs)), s"m${i}")
ref ! files(i)
}
}
}
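The reducer's protocol (accumulate pairs, then report once every mapper has sent its end marker) can also be checked in isolation. The sketch below replays a sequence of messages through the same logic without actors; ReducerSketch, Pair, and End are hypothetical stand-ins for ReducerActor, MapperSendReducerMsg, and MapperSendReduerEndMsg. The end-marker approach relies on Akka delivering messages from one sender to one receiver in the order they were sent, so a mapper's end marker arrives after all of its pairs.

```scala
import scala.collection.mutable

object ReducerSketch {
  sealed trait Msg
  case class Pair(key: String, value: Int) extends Msg // stand-in for MapperSendReducerMsg
  case class End(mapperName: String) extends Msg       // stand-in for MapperSendReduerEndMsg

  // Replays messages in arrival order. Returns Some(counts) once all
  // mapMaxNum mappers have reported End, otherwise None (still waiting).
  def reduce(msgs: Seq[Msg], mapMaxNum: Int): Option[Map[String, Int]] = {
    val counts = mutable.HashMap.empty[String, Int]
    val finished = mutable.HashSet.empty[String]
    msgs.foreach {
      case Pair(k, v) => counts(k) = counts.getOrElse(k, 0) + v
      case End(name)  => finished += name
    }
    if (finished.size == mapMaxNum) Some(counts.toMap) else None
  }
}
```

Tracking mapper names in a set rather than a counter means a duplicated end marker from the same mapper would not trigger the final output early.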