Benchmark runs with hey against some demo code on Google Cloud:
$ hey -n 16 -c 2 -m POST -H "Content-Type: application/json" -D /tmp/data.json "http://127.0.0.1:8000/v1/chat/completions"
Summary:
Total: 103.4511 secs
Slowest: 12.9607 secs
Fastest: 12.8936 secs
Average: 12.9302 secs
Requests/sec: 0.1547
Total data: 17568 bytes
Size/request: 1098 bytes
Response time histogram:
12.894 [1] |■■■■■■■
12.900 [0] |
12.907 [6] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
12.914 [0] |
12.920 [0] |
12.927 [1] |■■■■■■■
12.934 [0] |
12.941 [0] |
12.947 [2] |■■■■■■■■■■■■■
12.954 [1] |■■■■■■■
12.961 [5] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
Latency distribution:
10% in 12.9032 secs
25% in 12.9044 secs
50% in 12.9462 secs
75% in 12.9573 secs
90% in 12.9607 secs
0% in 0.0000 secs
0% in 0.0000 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 12.8936 secs, 12.9607 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0001 secs, 0.0001 secs, 0.0002 secs
resp wait: 12.9299 secs, 12.8934 secs, 12.9605 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0001 secs
Status code distribution:
[200] 16 responses
$ hey -n 27 -c 3 -m POST -H "Content-Type: application/json" -D /tmp/data.json "http://127.0.0.1:8000/v1/chat/completions"
Summary:
Total: 164.0015 secs
Slowest: 18.0583 secs
Fastest: 17.1194 secs
Average: 17.3301 secs
Requests/sec: 0.1646
Total data: 19764 bytes
Size/request: 1098 bytes
Response time histogram:
17.119 [1] |■■■■
17.213 [9] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
17.307 [5] |■■■■■■■■■■■■■■■■■■■■■■
17.401 [0] |
17.495 [0] |
17.589 [0] |
17.683 [0] |
17.777 [0] |
17.870 [0] |
17.964 [0] |
18.058 [3] |■■■■■■■■■■■■■
Latency distribution:
10% in 17.1498 secs
25% in 17.1555 secs
50% in 17.2097 secs
75% in 17.2749 secs
90% in 18.0583 secs
0% in 0.0000 secs
0% in 0.0000 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 17.1194 secs, 18.0583 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0001 secs, 0.0001 secs, 0.0001 secs
resp wait: 17.3299 secs, 17.1191 secs, 18.0581 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0001 secs
Status code distribution:
[200] 18 responses
Error distribution:
[9] Post "http://127.0.0.1:8000/v1/chat/completions": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
$ hey -n 27 -c 3 -t 60 -m POST -H "Content-Type: application/json" -D /tmp/data.json "http://127.0.0.1:8000/v1/chat/completions"
Summary:
Total: 157.3466 secs
Slowest: 18.1024 secs
Fastest: 17.1515 secs
Average: 17.4807 secs
Requests/sec: 0.1716
Total data: 29646 bytes
Size/request: 1098 bytes
Response time histogram:
17.152 [1] |■■■
17.247 [12] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
17.342 [5] |■■■■■■■■■■■■■■■■■
17.437 [0] |
17.532 [0] |
17.627 [0] |
17.722 [0] |
17.817 [0] |
17.912 [0] |
18.007 [4] |■■■■■■■■■■■■■
18.102 [5] |■■■■■■■■■■■■■■■■■
Latency distribution:
10% in 17.1527 secs
25% in 17.2091 secs
50% in 17.2691 secs
75% in 17.9939 secs
90% in 18.0804 secs
95% in 18.1024 secs
0% in 0.0000 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 17.1515 secs, 18.1024 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0001 secs, 0.0001 secs, 0.0002 secs
resp wait: 17.4805 secs, 17.1514 secs, 18.1022 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0001 secs
Status code distribution:
[200] 27 responses
$ hey -n 32 -c 4 -t 60 -m POST -H "Content-Type: application/json" -D /tmp/data.json "http://127.0.0.1:8000/v1/chat/completions"
Summary:
Total: 185.8593 secs
Slowest: 29.0606 secs
Fastest: 22.3235 secs
Average: 23.2285 secs
Requests/sec: 0.1722
Total data: 35088 bytes
Size/request: 1096 bytes
Response time histogram:
22.323 [1] |■
22.997 [27] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
23.671 [0] |
24.345 [0] |
25.018 [0] |
25.692 [0] |
26.366 [0] |
27.039 [0] |
27.713 [0] |
28.387 [0] |
29.061 [4] |■■■■■■
Latency distribution:
10% in 22.3366 secs
25% in 22.3433 secs
50% in 22.4335 secs
75% in 22.4992 secs
90% in 28.9879 secs
95% in 29.0606 secs
0% in 0.0000 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0000 secs, 22.3235 secs, 29.0606 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0001 secs, 0.0001 secs, 0.0002 secs
resp wait: 23.2283 secs, 22.3233 secs, 29.0601 secs
resp read: 0.0000 secs, 0.0000 secs, 0.0001 secs
Status code distribution:
[200] 32 responses
$ hey -n 36 -c 6 -t 60 -m POST -H "Content-Type: application/json" -D /tmp/data.json "http://127.0.0.1:8000/v1/chat/completions"
Summary:
Total: 216.0321 secs
Slowest: 59.2439 secs
Fastest: 19.8983 secs
Average: 34.5948 secs
Requests/sec: 0.1666
Total data: 38414 bytes
Size/request: 1097 bytes
Response time histogram:
19.898 [1] |■■
23.833 [0] |
27.767 [2] |■■■■
31.702 [5] |■■■■■■■■■■■
35.637 [18] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
39.571 [3] |■■■■■■■
43.506 [3] |■■■■■■■
47.440 [1] |■■
51.375 [1] |■■
55.309 [0] |
59.244 [1] |■■
Latency distribution:
10% in 31.6910 secs
25% in 31.7065 secs
50% in 34.3068 secs
75% in 36.9943 secs
90% in 43.5902 secs
95% in 59.2439 secs
0% in 0.0000 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0001 secs, 19.8983 secs, 59.2439 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0001 secs, 0.0001 secs, 0.0002 secs
resp wait: 34.5945 secs, 19.8982 secs, 59.2430 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0001 secs
Status code distribution:
[200] 35 responses
Error distribution:
[1] Post "http://127.0.0.1:8000/v1/chat/completions": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
$ hey -n 30 -c 5 -t 60 -m POST -H "Content-Type: application/json" -D /tmp/data.json "http://127.0.0.1:8000/v1/chat/completions"
Summary:
Total: 166.4694 secs
Slowest: 44.6345 secs
Fastest: 11.7664 secs
Average: 27.3764 secs
Requests/sec: 0.1802
Total data: 32933 bytes
Size/request: 1097 bytes
Response time histogram:
11.766 [1] |■■
15.053 [0] |
18.340 [0] |
21.627 [1] |■■
24.914 [2] |■■■■
28.200 [18] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
31.487 [5] |■■■■■■■■■■■
34.774 [1] |■■
38.061 [1] |■■
41.348 [0] |
44.634 [1] |■■
Latency distribution:
10% in 22.6455 secs
25% in 26.8570 secs
50% in 26.8627 secs
75% in 29.4758 secs
90% in 32.3267 secs
95% in 44.6345 secs
0% in 0.0000 secs
Details (average, fastest, slowest):
DNS+dialup: 0.0001 secs, 11.7664 secs, 44.6345 secs
DNS-lookup: 0.0000 secs, 0.0000 secs, 0.0000 secs
req write: 0.0001 secs, 0.0001 secs, 0.0002 secs
resp wait: 27.3761 secs, 11.7663 secs, 44.6340 secs
resp read: 0.0001 secs, 0.0000 secs, 0.0001 secs
Status code distribution:
[200] 30 responses
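
As a rough cross-check of the runs above, the sketch below compares the Requests/sec that hey reports with the value Little's law would predict (concurrency divided by average latency). The numbers are copied from the summaries above; the function name `estimated_rps` is only illustrative, and the first `-c 3` run is left out because a third of its requests hit the client timeout.

```python
# (concurrency, average latency in secs, reported requests/sec),
# copied by hand from the hey summaries above.
runs = [
    (2, 12.9302, 0.1547),
    (3, 17.4807, 0.1716),  # the -t 60 run, all 27 requests succeeded
    (4, 23.2285, 0.1722),
    (5, 27.3764, 0.1802),
    (6, 34.5948, 0.1666),  # one request hit the 60 s timeout
]


def estimated_rps(concurrency, avg_latency_secs):
    """Little's law for a closed loop: throughput ~= in-flight requests / average latency."""
    return concurrency / avg_latency_secs


rows = [(c, avg, rps, estimated_rps(c, avg)) for c, avg, rps in runs]

for c, avg, rps, est in rows:
    print(f"c={c}  avg={avg:7.4f}s  reported={rps:.4f} req/s  estimated={est:.4f} req/s")
```

In these runs the estimate stays within about 0.01 req/s of the reported figure, and throughput sits between roughly 0.15 and 0.18 req/s at every concurrency level while average latency climbs from about 13 s to about 35 s. That pattern would be consistent with the server being saturated already at `-c 2`, though the transcript alone does not establish the cause.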