当前位置：首页 > news >正文

【RocketMQ】源码详解：生产者启动与消息发送流程

news 2025/6/27 7:36:13

消息发送

生产者启动

入口 : org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#start(boolean)

生产者在调用send()方法发送消息之前,需要调用start进行启动, 生产者启动过程中会启动一些服务和线程

启动过程中会启动MQClientInstance, 这个实例是针对一个项目的全部生产者消费者, 而不是单个的生产者或消费者

MQClientInstance内部会启动一些服务和定时任务，如netty服务、内部生产者服务等

启动方法最后，则会发送心跳包给broker

生产者启动: org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#start(boolean)

public void start(final boolean startFactory) throws MQClientException {switch (this.serviceState) {case CREATE_JUST:this.serviceState = ServiceState.START_FAILED;// 检查配置,主要是生产者组名this.checkConfig();if (!this.defaultMQProducer.getProducerGroup().equals(MixAll.CLIENT_INNER_PRODUCER_GROUP)) {this.defaultMQProducer.changeInstanceNameToPID();}// 创建 MQClientInstance 实例, 消费者启动时也有这一步(对于每个客户端来说, 只有一个客户端实例(一个项目有多个生产者、消费者))this.mQClientFactory = MQClientManager.getInstance().getOrCreateMQClientInstance(this.defaultMQProducer, rpcHook);// 将当前生产者注册到MQClientInstance中的producerTableboolean registerOK = mQClientFactory.registerProducer(this.defaultMQProducer.getProducerGroup(), this);if (!registerOK) {this.serviceState = ServiceState.CREATE_JUST;throw new MQClientException();}// 自动创建topic的配置this.topicPublishInfoTable.put(this.defaultMQProducer.getCreateTopicKey(), new TopicPublishInfo());if (startFactory) {/** 启动 MQClientInstance* netty服务、各种定时任务、拉取消息服务、rebalanceService服务*/mQClientFactory.start();}log.info("the producer [{}] start OK);this.serviceState = ServiceState.RUNNING;break;// ...省略default:break;}// 发送心跳信息给所有broker。this.mQClientFactory.sendHeartbeatToAllBrokerWithLock();// 启动扫描 超时请求 的定时任务,this.startScheduledTask();}

客户端实例启动: org.apache.rocketmq.client.impl.factory.MQClientInstance#start

public void start() throws MQClientException {synchronized (this) {switch (this.serviceState) {case CREATE_JUST:this.serviceState = ServiceState.START_FAILED;// If not specified,looking address from name serverif (null == this.clientConfig.getNamesrvAddr()) {this.mQClientAPIImpl.fetchNameServerAddr();}// netty服务this.mQClientAPIImpl.start();// 启动各种定时任务this.startScheduledTask();// 拉取消息服务,针对消费者this.pullMessageService.start();// 重平衡服务,针对消费者this.rebalanceService.start();this.defaultMQProducer.getDefaultMQProducerImpl().start(false);log.info("the client factory [{}] start OK", this.clientId);this.serviceState = ServiceState.RUNNING;break;case START_FAILED:throw new MQClientException("The Factory object[" + this.getClientId() + "] has been created before, and failed.", null);default:break;}}
}

消息发送流程

入口: org.apache.rocketmq.client.impl.producer.DefaultMQProducerImpl#sendDefaultImpl

调用任一发送方法后，会一路调用到sendDefaultImpl方法

首先会检查消费者状态和消息的格式是否正确

之后会进入一个循环来发送消息，同步消息的循环次数为3次，即可以重试两次，其余消息只发送一次

在循环中，首先会按照轮询的方法选择一个queue进行发送，若发送出现异常则退出当前循环进入下一次循环（若开启故障延迟还会更新broker的故障表，设置隔离时间，隔离时间根据 MQFaultStrategy类中的latencyMax和notAvailableDuration数组进行判断，如其中超时在0.55s - 1s内则隔离30s）

在重新获取queue时，若开启故障延迟，在选择时则会选择【不在故障列表中，或者在故障列表但是时间已经过了其下一次可用的时间点的可用broker】，以实现高可用。若未开启故障延迟，则会传入上一次选择的broker，在这次选择时避开，选择方式也是轮询。

private SendResult sendDefaultImpl(Message msg,final CommunicationMode communicationMode,final SendCallback sendCallback,final long timeout
) throws MQClientException, RemotingException, MQBrokerException, InterruptedException {// 检查生产者状态this.makeSureStateOK();/*** 检查消息格式是否合法:* 1. msg是否为null* 2. topic 是否为空、长度是否大于127、字符串是否有非法字符、是否是系统topic（比如延时topic）* 3. 消息体 是否为空、大小是否大于4MB*/Validators.checkMessage(msg, this.defaultMQProducer);final long invokeID = random.nextLong();long beginTimestampFirst = System.currentTimeMillis();long beginTimestampPrev = beginTimestampFirst;long endTimestamp = beginTimestampFirst;// 获取topic路由信息(存在哪些broker上), 首先获取本地缓存的,若没有则获取nameServer的TopicPublishInfo topicPublishInfo = this.tryToFindTopicPublishInfo(msg.getTopic());if (topicPublishInfo != null && topicPublishInfo.ok()) {boolean callTimeout = false;MessageQueue mq = null;Exception exception = null;SendResult sendResult = null;// 计算最大发送次数，同步模式为3，即默认允许重试2次，可更改重试次数// 其他模式为1，即不允许重试，不可更改。int timesTotal = communicationMode == CommunicationMode.SYNC ? 1 + this.defaultMQProducer.getRetryTimesWhenSendFailed() : 1;int times = 0;String[] brokersSent = new String[timesTotal];for (; times < timesTotal; times++) {/*** 如果mq为空则说明第一次进入,则不存在lastBrokerName* 否则,说明为循环进入,则上一次发送失败,则获取上一次失败的brokerName*/String lastBrokerName = null == mq ? null : mq.getBrokerName();/*** 选择一个queue** selectOneMessageQueue方法内，可选故障转移为开启, 需要sendLatencyFaultEnable设置为true* 开启：*      对于请求响应较慢的broker，可以在一段时间内将其状态置为不可用(下方catch中有调用的updateFaultItem方法)*      消息队列选择时，会过滤掉mq认为不可用的broker，以此来避免不断向宕机的broker发送消息*      选取一个延迟较短的broker，实现消息发送高可用。* 不开启：*     则传入lastBrokerName,即不会再次选择上次发送失败的broker**/MessageQueue mqSelected = this.selectOneMessageQueue(topicPublishInfo, lastBrokerName);if (mqSelected != null) {mq = mqSelected;brokersSent[times] = mq.getBrokerName();try {beginTimestampPrev = System.currentTimeMillis();if (times > 0) {//Reset topic with namespace during resend.msg.setTopic(this.defaultMQProducer.withNamespace(msg.getTopic()));}long costTime = beginTimestampPrev - beginTimestampFirst;if (timeout < costTime) {callTimeout = true;break;}// 发送消息sendResult = this.sendKernelImpl(msg, mq, communicationMode, sendCallback, topicPublishInfo, timeout - costTime);endTimestamp = System.currentTimeMillis();// 这里调用并传入false，是为了在发送时间超过550ms时，把broker置为故障，// 隔离时间根据 MQFaultStrategy类中的latencyMax和notAvailableDuration数组进行判断，如其中超时在0.55s - 1s内则隔离30sthis.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, false);switch (communicationMode) {case ASYNC:return null;case ONEWAY:return null;case SYNC:if (sendResult.getSendStatus() != SendStatus.SEND_OK) {if (this.defaultMQProducer.isRetryAnotherBrokerWhenNotStoreOK()) {continue;}}return sendResult;default:break;}} catch (RemotingException e) {endTimestamp = System.currentTimeMillis();// 异常传入为true，表示隔离时间采用默认的30sthis.updateFaultItem(mq.getBrokerName(), endTimestamp - beginTimestampPrev, true);log.warn(String.format("sendKernelImpl exception, resend at once, InvokeID: %s, RT: %sms, Broker: %s", invokeID, endTimestamp - beginTimestampPrev, mq), e);log.warn(msg.toString());exception = e;continue;}// ... 省略代码

选择queue : org.apache.rocketmq.client.latency.MQFaultStrategy#selectOneMessageQueue

public MessageQueue selectOneMessageQueue(final TopicPublishInfo tpInfo, final String lastBrokerName) {// 判断是否启用故障延迟机制，默认不启用if (this.sendLatencyFaultEnable) {try {int index = tpInfo.getSendWhichQueue().incrementAndGet();for (int i = 0; i < tpInfo.getMessageQueueList().size(); i++) {int pos = Math.abs(index++) % tpInfo.getMessageQueueList().size();if (pos < 0)pos = 0;// 轮询获取到一个MessageQueue mq = tpInfo.getMessageQueueList().get(pos);// 如果该broker不在故障列表中，或者在故障列表但是时间已经过了其下一次可用的时间点，则为可用，直接返回if (latencyFaultTolerance.isAvailable(mq.getBrokerName()))return mq;}// 到这里说明全部不正常// 没有选出无故障的mq，那么从故障集合中随机选择一个final String notBestBroker = latencyFaultTolerance.pickOneAtLeast();// 如果写队列数大于0，那么选择该brokerint writeQueueNums = tpInfo.getQueueIdByBroker(notBestBroker);if (writeQueueNums > 0) {final MessageQueue mq = tpInfo.selectOneMessageQueue();if (notBestBroker != null) {mq.setBrokerName(notBestBroker);mq.setQueueId(tpInfo.getSendWhichQueue().incrementAndGet() % writeQueueNums);}return mq;} else {// 如果写队列数小于0，那么移除该brokerlatencyFaultTolerance.remove(notBestBroker);}} catch (Exception e) {log.error("Error occurred when selecting message queue", e);}// 上面都没有返回，则采用轮询的方式选择return tpInfo.selectOneMessageQueue();}// 默认不启用return tpInfo.selectOneMessageQueue(lastBrokerName);
}

更新延时表: org.apache.rocketmq.client.latency.MQFaultStrategy#updateFaultItem

判断延时时间: org.apache.rocketmq.client.latency.MQFaultStrategy#computeNotAvailableDuration

public void updateFaultItem(final String brokerName, final long currentLatency, boolean isolation) {if (this.sendLatencyFaultEnable) {// 若isolation为true则默认延时30s,否则调用方法根据超时时间来获取延时时间long duration = computeNotAvailableDuration(isolation ? 30000 : currentLatency);//更新故障记录表this.latencyFaultTolerance.updateFaultItem(brokerName, currentLatency, duration);}
}private long[] latencyMax = {50L, 100L, 550L, 1000L, 2000L, 3000L, 15000L};
private long[] notAvailableDuration = {0L, 0L, 30 * 1000L, 60 * 1000L, 120 * 1000L/* 2min */, 180000L/* 3min */, 600000L/* 10min */};private long computeNotAvailableDuration(final long currentLatency) {/*** 根据latencyMax和notAvailableDuration的下标一一对应,若超时时间大于等于notAvailableDuration,则延时latencyMax对应下标的时间* 小于0.55s : 0s* [0.55s,1s) : 30s* [1s,2s)    : 60s* ....省略*/for (int i = latencyMax.length - 1; i >= 0; i--) {if (currentLatency >= latencyMax[i])return this.notAvailableDuration[i];}return 0;
}