
Kafka broker hangs every few days and recovers after kill -9 and a restart. Could anyone help me find the cause?

    Arrackisarookie · nicearrack · 34 days ago · 2110 views

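    My Kafka service hangs every few days. After a kill -9 and a restart it works normally again, then a few days later it hangs again.

    Could anyone help me work out the cause: is it a performance problem, a network problem, or something else?

    The version is Kafka 3.0.0 with the bundled ZooKeeper. The data volume is not small: each topic retains 5G, which covers a few hours of data at most.

    It uses the built-in SASL_PLAINTEXT authentication, on a single server with a single node.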

    broker.id=0
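    # Multiple listeners are mapped because clients connect across networks; xxx is the same IP in all three.
    listeners=LOCAL://xxx:9092,JTKG://xxx:9093,GDKJ://xxx:9094
    # In the advertised addresses, xxx, yyy, and zzz are different IPs.
    advertised.listeners=LOCAL://xxx:9092,JTKG://yyy:9093,GDKJ://zzz:29092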
    listener.security.protocol.map=LOCAL:SASL_PLAINTEXT,JTKG:SASL_PLAINTEXT,GDKJ:SASL_PLAINTEXT
    inter.broker.listener.name=LOCAL
    sasl.enabled.mechanisms=SCRAM-SHA-256
    sasl.mechanism.inter.broker.protocol=SCRAM-SHA-256
    authorizer.class.name=kafka.security.authorizer.AclAuthorizer
    super.users=User:admin
    num.network.threads=3
    num.io.threads=8
    socket.send.buffer.bytes=1024000
    socket.receive.buffer.bytes=1024000
    socket.request.max.bytes=104857600
    log.dirs=/home/kafka/kafka_2.12-3.0.0/kafka-logs
    num.partitions=1
    num.recovery.threads.per.data.dir=1
    offsets.topic.replication.factor=1
    transaction.state.log.replication.factor=1
    transaction.state.log.min.isr=1
    log.retention.hours=168
    log.retention.bytes=5368709120
    log.segment.bytes=1073741824
    log.retention.check.interval.ms=300000
    zookeeper.connect=localhost:2181
    zookeeper.connection.timeout.ms=60000
    group.initial.rebalance.delay.ms=0
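For readers skimming the byte-valued settings above, here is a minimal Python sketch that converts them to binary units and estimates how many full segments fit under the size limit. The helper names `human` and `segments_retained` are made up for illustration. It assumes `log.retention.bytes` is applied per partition and that retention removes whole segments, so the on-disk size can run somewhat over the limit; treat the segment count as a rough figure.

```python
# Byte-valued settings copied from the broker config above.
SIZES = {
    "socket.send.buffer.bytes": 1024000,
    "socket.receive.buffer.bytes": 1024000,
    "socket.request.max.bytes": 104857600,
    "log.retention.bytes": 5368709120,
    "log.segment.bytes": 1073741824,
}


def human(n: int) -> str:
    """Format a byte count using binary units, capped at GiB."""
    value = float(n)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if value < 1024 or unit == "GiB":
            return f"{value:g} {unit}"
        value /= 1024
    return f"{value:g} GiB"  # not reached; keeps the return type explicit


def segments_retained(retention_bytes: int, segment_bytes: int) -> int:
    """Rough number of full segments that fit under the size-based limit."""
    return retention_bytes // segment_bytes


if __name__ == "__main__":
    for key, size in SIZES.items():
        print(f"{key} = {human(size)}")
    count = segments_retained(SIZES["log.retention.bytes"], SIZES["log.segment.bytes"])
    print(f"about {count} full segments per partition")
```

With `num.partitions=1`, the per-partition limit and the "5G per topic" figure in the post describe the same thing.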
    

    Below is the log from the most recent incident:

    [2024-12-19 13:14:31,683] INFO [Controller id=0] Partitions undergoing preferred replica election:  (kafka.controller.KafkaController)
    [2024-12-19 13:14:31,683] INFO [Controller id=0] Partitions that completed preferred replica election:  (kafka.controller.KafkaController)
    [2024-12-19 13:14:31,692] INFO [Controller id=0] Skipping preferred replica election for partitions due to topic deletion:  (kafka.controller.KafkaController)
    [2024-12-19 13:14:31,692] INFO [Controller id=0] Resuming preferred replica election for partitions:  (kafka.controller.KafkaController)
    [2024-12-19 13:14:31,693] INFO [Controller id=0] Starting replica leader election (PREFERRED) for partitions  triggered by ZkTriggered (kafka.controller.KafkaController)
    [2024-12-19 13:14:31,823] INFO [Controller id=0] Starting the controller scheduler (kafka.controller.KafkaController)
    [2024-12-19 13:14:31,847] DEBUG [Controller id=0] Resigning (kafka.controller.KafkaController)
    [2024-12-19 13:14:31,847] DEBUG [Controller id=0] Unregister BrokerModifications handler for Set(0) (kafka.controller.KafkaController)
    [2024-12-19 13:14:31,859] INFO [PartitionStateMachine controllerId=0] Stopped partition state machine (kafka.controller.ZkPartitionStateMachine)
    [2024-12-19 13:14:31,859] INFO [ReplicaStateMachine controllerId=0] Stopped replica state machine (kafka.controller.ZkReplicaStateMachine)
    [2024-12-19 13:14:32,002] INFO [RequestSendThread controllerId=0] Shutting down (kafka.controller.RequestSendThread)
    [2024-12-19 13:14:32,051] WARN [RequestSendThread controllerId=0] Controller 0 epoch 60 fails to send request (xxx) to broker xxx:9092 (id: 0 rack: null). Reconnecting to broker. (kafka.controller.RequestSendThread)
    java.io.IOException: Client was shutdown before response was read
        at org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:109)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:252)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
    [2024-12-19 13:14:32,070] ERROR [RequestSendThread controllerId=0] Controller 0 fails to send a request to broker xxx:9092 (id: 0 rack: null) (kafka.controller.RequestSendThread)
    java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(AbstractQueuedSynchronizer.java:1326)
        at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:277)
        at kafka.utils.ShutdownableThread.pause(ShutdownableThread.scala:82)
        at kafka.controller.RequestSendThread.backoff$1(ControllerChannelManager.scala:233)
        at kafka.controller.RequestSendThread.doWork(ControllerChannelManager.scala:261)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
    [2024-12-19 13:14:32,071] INFO [RequestSendThread controllerId=0] Stopped (kafka.controller.RequestSendThread)
    [2024-12-19 13:14:32,076] INFO [RequestSendThread controllerId=0] Shutdown completed (kafka.controller.RequestSendThread)
    [2024-12-19 13:14:32,146] INFO [Controller id=0] Resigned (kafka.controller.KafkaController)
    [2024-12-19 13:14:32,316] DEBUG [Controller id=0] Broker 0 has been elected as the controller, so stopping the election process. (kafka.controller.KafkaController)
    [2024-12-19 13:14:51,582] INFO [ControllerEventThread controllerId=0] Shutting down (kafka.controller.ControllerEventManager$ControllerEventThread)
    [2024-12-19 13:14:51,582] INFO [ControllerEventThread controllerId=0] Stopped (kafka.controller.ControllerEventManager$ControllerEventThread)
    [2024-12-19 13:14:51,582] INFO [ControllerEventThread controllerId=0] Shutdown completed (kafka.controller.ControllerEventManager$ControllerEventThread)
    [2024-12-19 13:14:51,583] DEBUG [Controller id=0] Resigning (kafka.controller.KafkaController)
    [2024-12-19 13:14:51,583] DEBUG [Controller id=0] Unregister BrokerModifications handler for Set() (kafka.controller.KafkaController)
    [2024-12-19 13:14:51,583] INFO [PartitionStateMachine controllerId=0] Stopped partition state machine (kafka.controller.ZkPartitionStateMachine)
    [2024-12-19 13:14:51,583] INFO [ReplicaStateMachine controllerId=0] Stopped replica state machine (kafka.controller.ZkReplicaStateMachine)
    [2024-12-19 13:14:51,583] INFO [Controller id=0] Resigned (kafka.controller.KafkaController)
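One thing that stands out in the excerpt is the silence between 13:14:32,316 and 13:14:51,582. The following Python sketch shows one way to find such gaps automatically in a longer log. The function names `parse_ts` and `largest_gap` are hypothetical, and the sample lines are copied from the log above. It only measures the pause; it does not say what caused it.

```python
import re
from datetime import datetime

# Matches the leading "[YYYY-MM-DD HH:MM:SS,mmm]" timestamp of a Kafka log line.
TS = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]")


def parse_ts(line):
    """Return the line's timestamp, or None for lines without one (stack traces)."""
    m = TS.match(line.strip())
    if m is None:
        return None
    return datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")


def largest_gap(lines):
    """Return (gap, before, after) for the longest pause between consecutive timestamps."""
    stamps = [t for t in map(parse_ts, lines) if t is not None]
    gaps = [(b - a, a, b) for a, b in zip(stamps, stamps[1:])]
    return max(gaps) if gaps else None


# Three consecutive timestamped lines taken from the log above.
SAMPLE = [
    "[2024-12-19 13:14:32,146] INFO [Controller id=0] Resigned (kafka.controller.KafkaController)",
    "[2024-12-19 13:14:32,316] DEBUG [Controller id=0] Broker 0 has been elected as the controller, so stopping the election process. (kafka.controller.KafkaController)",
    "[2024-12-19 13:14:51,582] INFO [ControllerEventThread controllerId=0] Shutting down (kafka.controller.ControllerEventManager$ControllerEventThread)",
]

if __name__ == "__main__":
    gap, before, after = largest_gap(SAMPLE)
    print(f"longest pause: {gap.total_seconds():.3f}s between {before} and {after}")
```

Running the same function over the full controller and server logs for the hours before a hang might show whether similar pauses precede each incident.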
    
    5 replies · 2024-12-20 09:08:01 +08:00
    #1 · yoruoxx · 34 days ago
    I think you could ask Claude 😂
    #2 · fox2081 · 34 days ago
    Wow, we actually ended up with the same avatar 🙃
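    #3 · JavasBoy · 34 days ago
    How much memory does the machine have, and how much is allocated to the JVM? List those, along with system load monitoring: NIC traffic, CPU, IO, and IOPS. The system logs too.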
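As a starting point for collecting a few of the numbers asked for here, this is a minimal Python sketch using only the standard library. It assumes a Linux host (it reads `/proc/meminfo`), and the names `parse_meminfo` and `host_snapshot` are made up for illustration. It covers load average, available memory, and free disk space only; NIC traffic, IOPS, and JVM heap usage would need other tools.

```python
import os
import shutil


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; most values are in kB."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def host_snapshot(data_dir="/"):
    """Collect load average, available memory, and free disk space for data_dir."""
    snap = {}
    try:
        snap["loadavg"] = os.getloadavg()  # 1, 5, and 15 minute averages
    except (OSError, AttributeError):
        snap["loadavg"] = None
    usage = shutil.disk_usage(data_dir)
    snap["disk_free_gib"] = round(usage.free / 1024**3, 1)
    try:
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
        snap["mem_available_gib"] = round(mem["MemAvailable"] / 1024**2, 1)
    except (OSError, KeyError):
        snap["mem_available_gib"] = None  # not Linux, or field missing
    return snap


if __name__ == "__main__":
    print(host_snapshot())
```

Passing the `log.dirs` path from the config as `data_dir` would report free space on the volume Kafka writes to; logging a snapshot every minute gives something to compare against the time of the next hang.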
    #4 · rekulas · 33 days ago
    Going by my stereotype of Java: is there plenty of memory headroom?
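    #5 · Arrackisarookie (OP) · 33 days ago
    @yoruoxx The free version doesn't work very well 😣

    @fox2081 Hahaha, what a coincidence.

    @JavasBoy
    @rekulas
    Until now everything ran on the defaults: 4G for Kafka and 512M for ZooKeeper. Yesterday I raised the Kafka JVM's -Xmx to 20G and ZooKeeper's to 4G. These are the only two services on this server, and CPU has basically never gone above 40%.
    So is this kind of symptom usually caused by insufficient resources? It shouldn't be a problem with the service's own configuration, right? I asked Tongyi, and it said I might need to adjust `log.flush.interval.messages`, `log.flush.interval.ms`, and similar settings.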