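Follow one convention: if a goroutine is responsible for starting another goroutine, it is also responsible for making sure that goroutine can be stopped.
Sending without a receiver: normally the sender sends and the receiver receives, and nothing goes wrong. But if the receiver fails or gives up early, the sender blocks forever and the goroutine leaks.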
func leakOfMemory() {
	errChan := make(chan error) // a. unbuffered channel
	go func() {
		time.Sleep(2 * time.Second)
		errChan <- errors.New("chan error") // b. blocks forever once nobody receives
		fmt.Println("finish ending ")
	}()
	select {
	case <-time.After(time.Second):
		fmt.Println("timeout") // c.
	case err := <-errChan: // d.
		fmt.Println("err:", err)
	}
	fmt.Println("leakOfMemory exit")
}

func TestLeakOfMemory(t *testing.T) {
	leakOfMemory()
	time.Sleep(3 * time.Second)
	fmt.Println("main exit...")
	fmt.Println("NumGoroutine:", runtime.NumGoroutine())
}
Output of the code above:
=== RUN TestLeakOfMemory
timeout
leakOfMemory exit
main exit...
NumGoroutine: 3
--- PASS: TestLeakOfMemory (4.00s)
PASS
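There were only two goroutines at the start, so why are there three after the run?
When leakOfMemory reaches the select, nothing has been sent to errChan yet, so the receive at d cannot proceed. After 1s the timeout case fires, c prints the timeout message, and leakOfMemory returns. One second later the goroutine tries to send a value into the channel at b, but the function that would have received it has already returned. errChan now has no receiver, so the send at b blocks forever; the goroutine never exits and is leaked. If a program has a lot of code like this, the leaked goroutines can eventually cause an OOM.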
Consider the following code:
func leakOfMemory_1(nums ...int) {
	out := make(chan int)
	// sender
	go func() {
		defer close(out)
		for _, n := range nums { // c.
			out <- n // blocks forever once the receiver has returned
			time.Sleep(time.Second)
		}
	}()
	// receiver
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		defer cancel()
		for n := range out { // b.
			if ctx.Err() != nil { // a.
				fmt.Println("ctx timeout ")
				return
			}
			fmt.Println(n)
		}
	}()
}

func TestLeakOfMemory(t *testing.T) {
	fmt.Println("NumGoroutine:", runtime.NumGoroutine())
	leakOfMemory_1(1, 2, 3, 4, 5, 6, 7)
	time.Sleep(3 * time.Second)
	fmt.Println("main exit...")
	fmt.Println("NumGoroutine:", runtime.NumGoroutine())
}
Output of the code above:
=== RUN TestLeakOfMemory
NumGoroutine: 2
1
2
ctx timeout
main exit...
NumGoroutine: 3
--- PASS: TestLeakOfMemory (3.00s)
PASS
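In theory there are only 2 goroutines at the start, yet 3 remain after the run, which means at least one goroutine inside leakOfMemory_1 has not exited. Once the context has timed out, the check at a makes the receiver return, so the range loop at b ends and nothing receives from the channel any more. The next send to out inside the loop at c therefore blocks forever, the sender goroutine never exits, and it is leaked.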
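How can the goroutine leak described above be fixed?
Add a notification channel (done) that tells the sender to stop, which prevents the leak.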
func leakOfMemory_2(done chan struct{}, nums ...int) {
	out := make(chan int)
	// sender
	go func() {
		defer close(out)
		for _, n := range nums {
			select {
			case out <- n:
			case <-done: // told to stop: return instead of blocking on out
				return
			}
			time.Sleep(time.Second)
		}
	}()
	// receiver
	go func() {
		ctx, cancel := context.WithTimeout(context.Background(), time.Second)
		defer cancel()
		for n := range out {
			if ctx.Err() != nil {
				fmt.Println("ctx timeout ")
				return
			}
			fmt.Println(n)
		}
	}()
}

func TestLeakOfMemory(t *testing.T) {
	fmt.Println("NumGoroutine:", runtime.NumGoroutine())
	done := make(chan struct{})
	defer close(done)
	leakOfMemory_2(done, 1, 2, 3, 4, 5, 6, 7)
	time.Sleep(3 * time.Second)
	done <- struct{}{} // wake the blocked sender so it can return
	fmt.Println("main exit...")
	fmt.Println("NumGoroutine:", runtime.NumGoroutine())
}
Output of the code:
=== RUN TestLeakOfMemory
NumGoroutine: 2
1
2
ctx timeout
main exit...
NumGoroutine: 2
--- PASS: TestLeakOfMemory (3.00s)
PASS
There are 2 goroutines at the start and still 2 when the program ends, so no goroutine has leaked.
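A map is a reference type. Go passes function arguments by value, but the copy of a map value still points to the same underlying data as m, because what gets copied is the reference. When several goroutines read and write such a shared variable concurrently without synchronization, they race and the shared data gets corrupted.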
func TestConcurrencyMap(t *testing.T) {
	m := make(map[int]int)
	go func() {
		for {
			m[3] = 3
		}
	}()
	go func() {
		for {
			m[2] = 2
		}
	}()
	//select {}
	time.Sleep(10 * time.Second)
}
Output of the code above:
=== RUN TestConcurrencyMap
fatal error: concurrent map writes
goroutine 5 [running]:
runtime.throw({0x1121440?, 0x0?})
/go/go1.18.8/src/runtime/panic.go:992 +0x71 fp=0xc000049f78 sp=0xc000049f48 pc=0x10333b1
...
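To analyze the leak with pprof, first add the following code before running the program. Note the blank import of net/http/pprof and the HTTP server started at the end of the test: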
import (
	"context"
	"errors"
	"fmt"
	"log"
	"net/http"
	_ "net/http/pprof"
	"runtime"
	"testing"
	"time"
)

func TestLeakOfMemory(t *testing.T) {
	//leakOfMemory()
	fmt.Println("NumGoroutine:", runtime.NumGoroutine())
	for i := 0; i < 1000; i++ {
		go leakOfMemory_1(1, 2, 3, 4, 5, 6, 7)
	}
	//done := make(chan struct{})
	//defer close(done)
	//leakOfMemory_2(done, 1, 2, 3, 4, 5, 6, 7)
	time.Sleep(3 * time.Second)
	//done <- struct{}{}
	fmt.Println("main exit...")
	fmt.Println("NumGoroutine:", runtime.NumGoroutine())
	log.Println(http.ListenAndServe("localhost:6060", nil))
}
After running the code above, open http://localhost:6060/debug/pprof/goroutine?debug=1 in a browser. It shows the goroutine profile as a plain text page.
That page has no graphical view. What can be done about it?
Install graphviz, then run the following commands in a terminal:
brew install graphviz # install graphviz; this only needs to be done once
go tool pprof -http=":8081" "http://localhost:6060/debug/pprof/goroutine?debug=1"
Then open http://localhost:8081/ui/ in a browser, which shows the following graph:
[Figure: image.png]
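One function, //GoProject/main/concurrency/channel.leakOfMemory_1.func1, accounts for almost all of the goroutines in this profile. What is this function doing?
Use the following command to find out: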
go tool pprof http://localhost:6060/debug/pprof/goroutine
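Flame graph analysis (what the columns mean):
Total: the total number of samples (for example, 100). In a goroutine profile each sample is one goroutine.
Flat: the number of samples in which the function is at the top of the stack. In a CPU profile, being at the top of the stack means the function itself is using the CPU.
Flat%: Flat / Total.
Sum%: the running total of Flat% for this row and all rows above it. For example, a Sum% of 32.4% on the third row means the first three functions together account for 32.4% of all samples.
Cum: the number of samples in which the function appears anywhere on the stack, not only at the top as with Flat. It counts samples, not function calls.
Cum%: Cum / Total.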
Enter the interactive console and type top:
Type: goroutine
Time: Feb 5, 2024 at 10:02am (CST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 1003, 99.90% of 1004 total
Dropped 35 nodes (cum <= 5)
flat flat% sum% cum cum%
1003 99.90% 99.90% 1003 99.90% runtime.gopark
0 0% 99.90% 1000 99.60% //GoProject/main/concurrency/channel.leakOfMemory_1.func1
0 0% 99.90% 1000 99.60% runtime.chansend
0 0% 99.90% 1000 99.60% runtime.chansend1
(pprof)
Here the count for runtime.gopark can be read as the number of parked (suspended) goroutines. A large number of goroutines are parked in runtime.gopark.
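As a quick check of how the columns relate, the percentages in the top output above can be recomputed from the raw counts. The helper name percent is hypothetical, and rounding to two decimals is simply chosen to match the figures shown.

```go
package main

import "fmt"

// percent formats part/total as a percentage with two decimals.
// (Hypothetical helper for illustration.)
func percent(part, total int) string {
	return fmt.Sprintf("%.2f%%", float64(part)/float64(total)*100)
}

func main() {
	const total = 1004 // "of 1004 total" in the top output

	// runtime.gopark is at the top of 1003 stacks, so flat = cum = 1003.
	fmt.Println("runtime.gopark flat%:", percent(1003, total))

	// leakOfMemory_1.func1 is never at the top of a stack (flat = 0),
	// but it appears somewhere in 1000 stacks (cum = 1000).
	fmt.Println("leakOfMemory_1.func1 flat%:", percent(0, total))
	fmt.Println("leakOfMemory_1.func1 cum%:", percent(1000, total))
}
```

The results, 99.90%, 0.00% and 99.60%, agree with the flat% and cum% columns, apart from pprof showing a zero percentage as 0%.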
Then type traces runtime.gopark:
(pprof) traces runtime.gopark
Type: goroutine
Time: Feb 5, 2024 at 10:02am (CST)
-----------+-------------------------------------------------------
1000 runtime.gopark
runtime.chansend
runtime.chansend1
//GoProject/main/concurrency/channel.leakOfMemory_1.func1
-----------+-------------------------------------------------------
1 runtime.gopark
runtime.chanrecv
runtime.chanrecv1
testing.(*T).Run
testing.runTests.func1
testing.tRunner
testing.runTests
testing.(*M).Run
main.main
runtime.main
-----------+-------------------------------------------------------
1 runtime.gopark
runtime.netpollblock
internal/poll.runtime_pollWait
internal/poll.(*pollDesc).wait
internal/poll.(*pollDesc).waitRead (inline)
internal/poll.(*FD).Read
net.(*netFD).Read
net.(*conn).Read
net/http.(*connReader).backgroundRead
-----------+-------------------------------------------------------
1 runtime.gopark
runtime.netpollblock
internal/poll.runtime_pollWait
internal/poll.(*pollDesc).wait
internal/poll.(*pollDesc).waitRead (inline)
internal/poll.(*FD).Accept
net.(*netFD).accept
net.(*TCPListener).accept
net.(*TCPListener).Accept
net/http.(*Server).Serve
net/http.(*Server).ListenAndServe
net/http.ListenAndServe (inline)
//GoProject/main/concurrency/channel.TestLeakOfMemory
testing.tRunner
-----------+-------------------------------------------------------
(pprof)
This shows that 1000 goroutines have leaked.
The call stack then gives the call chain:
channel.leakOfMemory_1.func1->runtime.chansend1->runtime.chansend->runtime.gopark
runtime.chansend1 is a blocking call, and the goroutine ends up parked by runtime.gopark, which is what causes the leak.
Then type list followed by the function name, //GoProject/main/concurrency/channel.leakOfMemory_1.func1, to see the annotated source:
(pprof) list //GoProject/main/concurrency/channel.leakOfMemory_1.func1
Total: 1004
ROUTINE ======================== //GoProject/main/concurrency/channel.leakOfMemory_1.func1 in /Users/bytedance/go/src///GoProject/main/concurrency/channel/channel_test.go
0 1000 (flat, cum) 99.60% of Total
. . 62: out := make(chan int)
. . 63: // sender
. . 64: go func() {
. . 65: defer close(out)
. . 66: for _, n := range nums {
. 1000 67: out <- n
. . 68: time.Sleep(time.Second)
. . 69: }
. . 70: }()
. . 71:
. . 72: // receiver
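The listing shows that an unbuffered channel is used. As analyzed above, once there is no receiver, the sender blocks on the write to out at line 67, the goroutine cannot exit, and so it leaks.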
go tool pprof http://localhost:6060/debug/pprof/heap
Then type top:
(pprof) top
Showing nodes accounting for 6662.08kB, 86.68% of 7686.14kB total
Showing top 10 nodes out of 24
flat flat% sum% cum cum%
5125.63kB 66.69% 66.69% 5125.63kB 66.69% runtime.allocm
1024.41kB 13.33% 80.01% 1024.41kB 13.33% runtime.malg
512.05kB 6.66% 86.68% 512.05kB 6.66% internal/poll.runtime_Semacquire
0 0% 86.68% 512.05kB 6.66% GoProject/main/concurrency/channel.leakOfMemory_1.func2
0 0% 86.68% 512.05kB 6.66% fmt.Fprintln
0 0% 86.68% 512.05kB 6.66% fmt.Println (inline)
0 0% 86.68% 512.05kB 6.66% internal/poll.(*FD).Write
0 0% 86.68% 512.05kB 6.66% internal/poll.(*FD).writeLock (inline)
0 0% 86.68% 512.05kB 6.66% internal/poll.(*fdMutex).rwlock
0 0% 86.68% 512.05kB 6.66% os.(*File).Write
(pprof)
The numbers are small, nowhere near the level of a leak that makes memory keep growing.