我有一个 Kafka 流应用程序,其中 kafka-streams 和 kafka-clients 均为 2.4.0
具有以下配置
properties.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, brokers);
properties.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE);
- 经纪人= ip1:端口1, ip2:端口2,ip3:端口3,
- 主题分区:3
- 主题复制:3
场景1:我只启动 2 个代理(流应用程序在代理 ip 设置中仍然包含代理的三个 ip),当我启动我的流应用程序时,会发生以下错误。
2020-02-13 13:28:19.711 WARN 18756 --- [-1-0_0-producer] org.apache.kafka.clients.NetworkClient : [Producer clientId=my-app1-a4c8867f-b914-49bb-bc58-203349700828-StreamThread-1-0_0-producer, transactionalId=my-app1-0_0] Connection to node -2 (/ip2:port2) could not be established. Broker may not be available.
1分钟后
org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app1-a4c8867f-b914-49bb-bc58-203349700828-StreamThread-1] Failed to rebalance.
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:852)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:743)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app1-a4c8867f-b914-49bb-bc58-203349700828-StreamThread-1] task [0_0] Failed to initialize task 0_0 due to timeout.
at org.apache.kafka.streams.processor.internals.StreamTask.initializeTransactions(StreamTask.java:966)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:254)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:176)
at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:355)
at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:313)
at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.createTasks(StreamThread.java:298)
at org.apache.kafka.streams.processor.internals.TaskManager.addNewActiveTasks(TaskManager.java:160)
at org.apache.kafka.streams.processor.internals.TaskManager.createTasks(TaskManager.java:120)
at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsAssigned(StreamsRebalanceListener.java:77)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:272)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:400)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:421)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:340)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:471)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1267)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1231)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:843)
... 3 common frames omitted
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId
我正在测试高可用性测试场景。我认为卡夫卡仍然应该作为复制正确地存在于两个代理中(我已经使用kafka GUI工具进行了检查)。
场景2:今天我注意到,当我只启动 2 个经纪人并给出这两个经纪人的 ip 时(即流应用程序只有两个工作经纪人的 ip)
2020-02-16 16:18:24.818 INFO 5741 --- [-StreamThread-1] o.a.k.c.c.internals.AbstractCoordinator : [Consumer clientId=my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1-consumer, groupId=my-app] Group coordinator ip2:port2 (id: 2147483644 rack: null) is unavailable or invalid, will attempt rediscovery
2020-02-16 16:18:24.818 ERROR 5741 --- [-StreamThread-1] o.a.k.s.p.internals.StreamThread : stream-thread [my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1] Encountered the following unexpected Kafka exception during processing, this usually indicate Streams internal errors:
org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1] Failed to rebalance.
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:852)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:743)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:698)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:671)
Caused by: org.apache.kafka.streams.errors.StreamsException: stream-thread [my-app-0a357371-525b-46cf-9fe1-34ee94fa4158-StreamThread-1] task [0_0] Failed to initialize task 0_0 due to timeout.
at org.apache.kafka.streams.processor.internals.StreamTask.initializeTransactions(StreamTask.java:966)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:254)
at org.apache.kafka.streams.processor.internals.StreamTask.<init>(StreamTask.java:176)
at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:355)
at org.apache.kafka.streams.processor.internals.StreamThread$TaskCreator.createTask(StreamThread.java:313)
at org.apache.kafka.streams.processor.internals.StreamThread$AbstractTaskCreator.createTasks(StreamThread.java:298)
at org.apache.kafka.streams.processor.internals.TaskManager.addNewActiveTasks(TaskManager.java:160)
at org.apache.kafka.streams.processor.internals.TaskManager.createTasks(TaskManager.java:120)
at org.apache.kafka.streams.processor.internals.StreamsRebalanceListener.onPartitionsAssigned(StreamsRebalanceListener.java:77)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:272)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:400)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:421)
at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:340)
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:471)
at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1267)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1231)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1211)
at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:843)
... 3 common frames omitted
Caused by: org.apache.kafka.common.errors.TimeoutException: Timeout expired after 60000milliseconds while awaiting InitProducerId
Note:如果我不设置,情况就不是这样EXACTLY_ONCE在属性中。他们按预期工作。
尝试增加重连并减少最大毫秒数,但没有帮助。
谁能解释我缺少什么?
当broker 1宕机时broker2的日志:
[2020-02-17 02:29:00,302] INFO [ReplicaFetcher replicaId=1, leaderId=3, fetcherId=0] Retrying leaderEpoch request for partition __consumer_offsets-36 as the leader reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
Kafka 日志中充满了上面的行。
现在一项重大观察:
当我关闭broker2(即broker 1和broker 3正在运行)时,我的流应用程序运行良好。仅当代理 1 关闭时,我的应用程序才会关闭。我猜测一些应该在所有代理之间分发的关键信息仅保存在代理 1 中。