Producer
Setup
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test-rep-one --partitions 6 --replication-factor 1
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
Single thread, no replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
Single-thread, async 3x replication
bin/kafka-topics.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --create --topic test --partitions 6 --replication-factor 3
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test6 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
Single-thread, sync 3x replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=-1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=64000
Three Producers, 3x async replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
Throughput Versus Stored Data
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
Effect of message size
for i in 10 100 1000 10000 100000;
do
echo ""
echo $i
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test $((1000*1024*1024/$i)) $i -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=128000
done;
Consumer
Consumer throughput
bin/kafka-consumer-perf-test.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --messages 50000000 --topic test --threads 1
3 Consumers
On three servers, run:
bin/kafka-consumer-perf-test.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --messages 50000000 --topic test --threads 1
End-to-end Latency
bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency esv4-hcl198.grid.linkedin.com:9092 esv4-hcl197.grid.linkedin.com:2181 test 5000
Producer and consumer
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
bin/kafka-consumer-perf-test.sh --zookeeper esv4-hcl197.grid.linkedin.com:2181 --messages 50000000 --topic test --threads 1
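The record counts in the "Effect of message size" loop come from dividing a fixed payload of 1000 MiB by the message size, so every run writes roughly the same number of bytes. A small sketch of that arithmetic (the function name `records_for_size` is only illustrative and is not part of the gist):

```shell
# Records per run when 1000 MiB of payload is split into messages of a given size.
# Shell integer division truncates, so the largest sizes fall slightly short of 1000 MiB.
records_for_size() {
  echo $((1000 * 1024 * 1024 / $1))
}

for size in 10 100 1000 10000 100000; do
  echo "$size bytes -> $(records_for_size "$size") records"
done
```

For 100-byte messages this gives 10485760 records, about a fifth of the 50000000 records used in the other producer tests.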
What is the default location of kafka-topics.sh?
Inside the bin folder.
Hi Jay,
Does the ProducerPerformance class still work? I tried to run the command:
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test7 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
and I got the following error:
Error: Could not find or load main class org.apache.kafka.client.tools.ProducerPerformance
What should I fix? Thank you!
Hey @yuhengd
From kafka base directory:
./gradlew jarAll -x signArchives -x test -x javadoc -x scaladoc
That should build kafka-tools, which is where ProducerPerformance.class gets loaded from.
On lines 9 and 14, it uses topics test6 and test7, which weren't created previously, so they probably won't have the expected replication factor.
Were the results in the article produced with that mistake in the commands?
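If the intent was for test7 to be unreplicated and test6 to be replicated three ways, as the section headings suggest, the topics could be created explicitly before the runs. This is a sketch under that assumption: `ZK` is a placeholder for your own ZooKeeper address, `create_topic` is an illustrative helper, and the commands are only printed unless `DRY_RUN=0` is set.

```shell
DRY_RUN="${DRY_RUN:-1}"
ZK="${ZK:-localhost:2181}"   # assumption: replace with your ZooKeeper host:port

# create_topic NAME REPLICATION_FACTOR
# Uses the same kafka-topics.sh flags as the Setup section of the gist.
create_topic() {
  set -- bin/kafka-topics.sh --zookeeper "$ZK" --create --topic "$1" \
    --partitions 6 --replication-factor "$2"
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"      # dry run: show the command without contacting a cluster
  else
    "$@"
  fi
}

create_topic test7 1   # assumed: "no replication" run
create_topic test6 3   # assumed: "async 3x replication" run
```

Newer Kafka releases expect `--bootstrap-server` instead of `--zookeeper` for kafka-topics.sh, so the flag may need adjusting for your version.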
Hi guys, I am wondering if you have any idea about the impact of batch.size on overall performance.
I am guessing that increasing it to the order of 1 MB would increase throughput?
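One way to answer this empirically is to sweep batch.size the same way the gist sweeps message size. The sketch below is an assumption-laden starting point, not a measured result: it uses the flag-style ProducerPerformance invocation shown elsewhere in this thread for 0.9 and later, a hypothetical local broker in `BOOTSTRAP`, and an arbitrary list of batch sizes (the three values used in the gist plus the 1 MB suggested above). It prints the commands unless `DRY_RUN=0` is set.

```shell
DRY_RUN="${DRY_RUN:-1}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"   # assumption: a broker you control

# Run (or print) one producer test per batch size, keeping everything else fixed.
sweep_batch_size() {
  for batch in 8196 64000 128000 1048576; do
    set -- bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance \
      --topic test --num-records 50000000 --record-size 100 --throughput -1 \
      --producer-props acks=1 "bootstrap.servers=$BOOTSTRAP" \
      buffer.memory=67108864 "batch.size=$batch"
    if [ "$DRY_RUN" = "1" ]; then
      echo "$*"
    else
      echo "batch.size=$batch"
      "$@"
    fi
  done
}

sweep_batch_size
```

Comparing the throughput reported by each run shows whether larger batches actually help on a given cluster.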
It would be nice if the setup explained how the topics it creates, test and test-rep-one, are used in the later tests. This is not obvious. Also, how were the batch sizes of 8K vs. 60K chosen? Finally, on line 21, how do we know 3 threads were used?
You can run the benchmark tests out of the box, without building first, in 0.9. Note the different package name for ProducerPerformance:
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic first --num-records 50000000 --record-size 100 --throughput -1 --producer-props acks=1 bootstrap.servers=localhost:9092 buffer.memory=67108864 batch.size=8196
@ssinganamalla could you please provide me with a script for a local-machine benchmark test?
My Kafka version is the latest.
Error: Could not find or load main class test7
Maybe the test file does not work with the latest version of Kafka.
Does the file need an update?
Can we pass a file containing my real data to the producer perf test instead of a message size?
Would be great to have an updated version of this for latest version of Kafka. I've got kafka_2.11-0.10.20.0, and bin/kafka-run-class.sh kafka.tools.TestEndToEndLatency
can't find the class. Is it renamed? Do I have to do something to get it? A previous comment mentioned running gradlew from the top directory, but that is not present either. I see the consumer and producer performance test scripts in the bin directory, but I want to run the end to end performance test and there doesn't seem to be a script for that.
After what seems like way too long I figured it out. Looks like it's actually named EndToEndLatency. The above command should be
bin/kafka-run-class.sh kafka.tools.EndToEndLatency
in the root of the project. Strangely, neither command worked for me in kafka 0.8.1. I kept getting the "Could not find or load main class" error.
Just in case anyone else is having trouble running the latency tests, try version 0.11.0.1 (kafka_2.11-0.11.0.1.tgz) and use class EndToEndLatency instead of TestEndToEndLatency.
Where do I find the source of the consumer test? In kafka-consumer-perf-test.sh the class is kafka.tools.ConsumerPerformance, but I can't find it in the sources, only as a compiled .class file. I would like to look at its internals but can't locate a .java file; it isn't in the same package as org.apache.kafka.tools.ProducerPerformance.
How can we change the number of producers?
"How can we change the number of producers?"
Just run it on three different machines.
Detail in this blog: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines
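To approximate several producers from a single host, the same test can be started several times in parallel. This is only a sketch: the blog post ran each producer on its own machine, so one host may bottleneck on its own CPU or network. `NUM_PRODUCERS`, `BOOTSTRAP`, and the log file names are illustrative assumptions, and the commands are printed unless `DRY_RUN=0` is set.

```shell
DRY_RUN="${DRY_RUN:-1}"
NUM_PRODUCERS="${NUM_PRODUCERS:-3}"
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"   # assumption: a broker you control

# Start NUM_PRODUCERS identical producer tests in the background and wait for all of them.
start_producers() {
  i=1
  while [ "$i" -le "$NUM_PRODUCERS" ]; do
    set -- bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance \
      --topic test --num-records 50000000 --record-size 100 --throughput -1 \
      --producer-props acks=1 "bootstrap.servers=$BOOTSTRAP" \
      buffer.memory=67108864 batch.size=8196
    if [ "$DRY_RUN" = "1" ]; then
      echo "producer $i: $*"
    else
      "$@" > "producer-$i.log" 2>&1 &   # one log file per producer
    fi
    i=$((i + 1))
  done
  wait   # returns immediately in dry-run mode, since nothing was backgrounded
}

start_producers
```

The aggregate throughput is the sum of the rates reported in the individual logs.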
@jkreps could you please update the commands for the latest Kafka release?
@nahdukesaba commands for kafka 1.0.0 https://gist.github.com/zodvik/b86757d45a95ed194fc9d87e507c1bcc
I am not able to connect to esv4-hcl197.grid.linkedin.com:2181, can someone please help me with that?
FATAL Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.I0Itec.zkclient.exception.ZkException: Unable to connect to esv4-hcl197.grid.linkedin.com:2181
What is the advantage of running ZooKeeper on a different machine?
I run ZooKeeper on the same machines as Kafka, although I'm afraid it's a risky decision.
I did this because it saves three machines.
Some key points:
1. Kafka's data transfer can saturate the NIC, which can make ZooKeeper connections time out.
2. If ZooKeeper shares a data disk with Kafka, ZooKeeper I/O will block while Kafka is busy reading and writing.
So when I run ZooKeeper on the same three machines as Kafka, I put the ZooKeeper data dir on an independent disk, such as the OS disk, which is usually an SSD.
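A quick way to check the second point is to see whether the two data directories resolve to the same mounted filesystem. The sketch below assumes hypothetical paths in `KAFKA_LOG_DIR` and `ZK_DATA_DIR` (substitute the `log.dirs` and `dataDir` values from your own configs); note that it compares mount points, so two partitions on one physical disk would still look separate.

```shell
# Print the mount point that holds the given path (POSIX df output, 6th column).
mount_of() {
  df -P "$1" | awk 'NR==2 {print $6}'
}

# Succeeds when both paths live on the same mounted filesystem.
same_filesystem() {
  [ "$(mount_of "$1")" = "$(mount_of "$2")" ]
}

KAFKA_LOG_DIR="${KAFKA_LOG_DIR:-/tmp}"   # hypothetical: your Kafka log.dirs
ZK_DATA_DIR="${ZK_DATA_DIR:-/tmp}"       # hypothetical: your ZooKeeper dataDir

if same_filesystem "$KAFKA_LOG_DIR" "$ZK_DATA_DIR"; then
  echo "warning: ZooKeeper and Kafka share a filesystem"
else
  echo "ok: ZooKeeper and Kafka use different filesystems"
fi
```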
For recent versions (tested with 2.3.0):
- clone kafka source code, then run
./gradlew jarAll -x signArchives -x test -x javadoc -x scaladoc
- run test
bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance --topic test-rep-one --num-records 50000000 --record-size 100 --throughput=-1 --producer.config ./test.conf
test.conf:
[root@dx-app2 kafka]# cat test.conf
# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
bootstrap.servers=localhost:9092
# the default batch size in bytes when batching multiple records sent to a partition
batch.size=8196
# the total bytes of memory the producer can use to buffer records waiting to be sent to the server
buffer.memory=67108864
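The two steps above can be scripted so the config file and the command stay in sync. A minimal sketch, assuming the same three properties and a local broker; `write_producer_conf` and the `CONF` path are illustrative names, and the final command is only printed so the script can be inspected before it is run for real.

```shell
CONF="${CONF:-./test.conf}"

# Write the producer properties shown in test.conf above to the given path.
write_producer_conf() {
  cat > "$1" <<'EOF'
bootstrap.servers=localhost:9092
batch.size=8196
buffer.memory=67108864
EOF
}

write_producer_conf "$CONF"

# Print the matching benchmark command; remove the leading "echo" to execute it.
echo bin/kafka-run-class.sh org.apache.kafka.tools.ProducerPerformance \
  --topic test-rep-one --num-records 50000000 --record-size 100 \
  --throughput=-1 --producer.config "$CONF"
```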
@daixiang0
Hi, I can't understand the first step. What do you mean by saying "clone kafka source code"?
I think the answer is simple, for example, run 'git clone ...location' command where ...location is Kafka source on github.
Is this test setup running zookeeper on 3 nodes or just 1? It's not clear from this file.
I believe the zookeeper switch on 'kafka-consumer-perf-test' has been deprecated for 'kafka-consumer-perf-test.bat'. (I am running the CLI remotely from the Kafka path.)
Running the command below from the CLI, I get this error:
kafka-consumer-perf-test --topic test --bootstrap-server test:XXXX --messages 10 --threads 1 --consumer.config C:*****\consumer.properties --group test --timeout 100000 --print-metrics
Exception in thread "main" java.util.IllegalFormatConversionException: f != java.lang.Integer
at java.util.Formatter$FormatSpecifier.failConversion(Formatter.java:4302)
at java.util.Formatter$FormatSpecifier.printFloat(Formatter.java:2806)
at java.util.Formatter$FormatSpecifier.print(Formatter.java:2753)
at java.util.Formatter.format(Formatter.java:2520)
at java.util.Formatter.format(Formatter.java:2455)
at java.lang.String.format(String.java:2940)
at scala.collection.immutable.StringLike.format(StringLike.scala:354)
at scala.collection.immutable.StringLike.format$(StringLike.scala:353)
at scala.collection.immutable.StringOps.format(StringOps.scala:33)
at kafka.utils.ToolsUtils$.$anonfun$printMetrics$3(ToolsUtils.scala:60)
at kafka.utils.ToolsUtils$.$anonfun$printMetrics$3$adapted(ToolsUtils.scala:58)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at kafka.utils.ToolsUtils$.printMetrics(ToolsUtils.scala:58)
at kafka.tools.ConsumerPerformance$.main(ConsumerPerformance.scala:82)
at kafka.tools.ConsumerPerformance.main(ConsumerPerformance.scala)
@daixiang0
Hi, I can't understand the first step. What do you mean by saying "clone kafka source code"?
I think cloning the source code is for building Kafka from source. If you've already installed Kafka, just run the shell script. :)
Hello
Single-thread, sync 3x replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=-1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=64000
Three Producers, 3x async replication
bin/kafka-run-class.sh org.apache.kafka.clients.tools.ProducerPerformance test 50000000 100 -1 acks=1 bootstrap.servers=esv4-hcl198.grid.linkedin.com:9092 buffer.memory=67108864 batch.size=8196
Sorry, but what is the difference between sync and async replication? I was thinking sync mode corresponds to acks=1 and async corresponds to acks=all, but the command lines suggest something different.
Thanks
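In the gist's commands the labels map the other way around: "async" is acks=1, where the leader acknowledges a write without waiting for the followers, and "sync" is acks=-1 (equivalent to acks=all), where the leader waits for the in-sync replicas before acknowledging. A small sketch of that mapping; `acks_for` is an illustrative helper, not a Kafka tool:

```shell
# acks_for MODE: the acks value the gist pairs with each replication mode.
acks_for() {
  case "$1" in
    async) printf '%s\n' 1 ;;    # leader acknowledges after its own write
    sync)  printf '%s\n' -1 ;;   # leader waits for the in-sync replicas (same as acks=all)
    *)     echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

for mode in async sync; do
  echo "$mode replication -> acks=$(acks_for "$mode")"
done
```

The two quoted commands also differ in batch.size (8196 vs 64000), so their results are not a pure comparison of the acks setting.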
On trunk (commit a30491ac5d3fc5967e8c4b16c68bdbfc312748f5) the consumer test is broken. The consumer shows zero stats, and on the brokers we get the following error: