Colleagues wrote an API gateway service and asked me to run concurrency and stability tests against it. For load testing, the tools that first come to mind are ab and wrk. Apache's ab is a bit unsatisfying: although it is also epoll-based, it is single-threaded and cannot saturate the CPU. wrk is a good tool, built on the ae event loop borrowed from Redis, with a multi-threaded mode and Lua scripting. But when the test logic gets complicated, Lua becomes awkward, especially once third-party modules are involved. As a gopher with two or three years of experience, it was natural to write the stress-test scripts in Go.
While load testing we ran into a Go performance problem: whether it was the HTTP stress-test client or the API server, CPU utilization would not go up. No matter how many goroutines you spawn, the CPU never gets saturated. top showed plenty of idle time on every core, soft interrupts were fine, the kernel log had no errors, the TCP accept and SYN queues showed nothing abnormal, and network bandwidth was not the bottleneck.
Note: with 5,000 goroutines or with 10,000 goroutines, CPU behavior under the HTTP stress-test scenario was the same.
How did we analyze the problem?
When we turned off the server's API-forwarding function and kept only the plain web handler, a wrk stress test could fill the server's CPU. The stress-test client issues HTTP requests, and the API gateway also forwards HTTP requests, so the two share one thing. Could the bottleneck be in Go's net/http client?
Below is our analysis with go tool pprof. It showed that net/http's Transport takes a surprising amount of time. Transport is just net/http's connection pool, so it should be fast. Two methods were relatively expensive: tryPutIdleConn, which returns a connection to the pool, and roundTrip, which obtains one. Let's walk through the net/http Transport source.
First, the data structure of the net/http Transport connection pool. The most striking thing about it is how many locks there are.
Next, let's see how net/http gets a usable connection from the pool. The entry point is the RoundTrip method.
Transport first calls getConn to obtain a connection, then calls persistConn's roundTrip method, which selects across various channels.
Finally, the tryPutIdleConn path that returns a connection. When a request completes, various conditions decide whether the connection is pushed back into the idle pool or closed outright.
Why doesn't CPU utilization go up?
In the system-call statistics (for example from strace -c -f -p <pid>), we found unusually high counts of futex and pselect6. futex is the system call behind locks; pselect6 is used here for high-precision sleeps, down to microseconds or nanoseconds. And no matter how precise the sleep, it still parks the thread.
Reading the net/http Transport source, we found shared channels and mutexes everywhere, and a channel itself contains a lock internally. I have written before about the problems caused by lock contention in Go: on one side it produces a flood of syscalls, on the other it leaves the CPU unsaturated and utilization low.
Why isn't the CPU saturated? A sleeping thread cannot burn CPU, and nothing in this path triggers handoffp, so the runtime will not add threads; the existing threads are sitting in the pselect6 system call. The relevant logic lives in the runtime source (the futex-based lock in runtime/lock_futex.go and the usleep implementation).
Note: a friend asked me: when the runtime sleeps, why doesn't sysmon call retake()? The sysmon code preempts only when a P has been running for more than 10ms, and only then does handoffp and startm happen. A futexsleep here lasts only microseconds, so it never triggers preemptive scheduling; when the lock cannot be acquired after several spins in the for loop, it yields the thread (osyield).
How do we fix net/http not saturating the CPU?
The root cause is lock contention, so how do we reduce lock contention in net/http? Simple: open several net/http Transport connection pools instead of one, and round-robin requests across them. Do not put a lock on the round-robin itself!!! Adding a lock there just creates new contention. The same tuning idea applies to both the stress-test client and the API gateway.
So what is the downside of opening multiple Transport connection pools? The connection count rises noticeably. Also, during the warm-up period there are fresh connections and three-way handshakes, so early requests are slightly slower; after that everything is fine. The extra HTTP connections also participate in TCP keepalive probing, but that traffic stays in the kernel, and the application layer does not need to care.
Looking at the client's CPU with multiple Transports, utilization clearly comes up. In addition, QPS throughput reached about 60,000.
Looking again at the go pprof CPU profile, the time spent in Transport shrank a lot, both for roundTrip getting a connection and for tryPutIdleConn returning one.
In the flame graph, things other than net/http Transport now take a visible share of the time, for example readLoop and writeLoop. Reading the source of those two methods, channels fly everywhere, and for now I have no optimization for them; they are the core read/write loops of net/http, and that CPU cost is acceptable. There is also the cost of io/ioutil.ReadAll: ReadAll keeps calling makeSlice to grow its buffer, which also adds GC pressure. A sync.Pool buffer pool can be added later.
How do you analyze performance bottlenecks in a Go service?
Use pprof's flame graph and CPU time statistics to find the suspects, then go straight to the source of the implicated library. When you see a lot of futex and pselect6, consider whether there is lock contention.
To sum up:
This is the third time I have hit a problem where Go's lock scheduling kept CPU utilization down. At first I thought it was a bottleneck in Go's goroutine scheduler. Last year, when I wrote a CDN service gateway, I ran into a similarly strange problem: CPU utilization would not rise, but sys time in top was high. strace sampling showed a high number of futex calls, and the cause turned out to be contention on a map lock; switching to a sharded (segmented) map lock solved it.
Friends interested in Golang can join the QQ group: 278517979!
Original post: xiaorui.cc