Go Study (9): Using go tool pprof with runtime/pprof

Test environment: CentOS 7, go1.9, go1.6, gvm v1.0.22

Go ships two packages for profiling code performance:

  • net/http/pprof
  • runtime/pprof

Both packages can profile code performance; the difference is that net/http/pprof exposes the data over an HTTP endpoint, while internally it still wraps runtime/pprof.
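For comparison, here is a minimal sketch of the HTTP variant (the address localhost:6060 is just a common convention, not required by the package). The rest of this article focuses on runtime/pprof.

package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // imported for its side effect: registers /debug/pprof/ handlers on the default mux
)

func main() {
	// profiles are then served at http://localhost:6060/debug/pprof/
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}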

1. CPU Profiling

Go provides the pprof package (runtime/pprof) for emitting runtime profiling data, which can then be consumed by the pprof tool (or go tool pprof, a variant of pprof). The package is typically used like this:

// define the cpuprofile flag
var cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file")
func main() {
	flag.Parse()
	// if -cpuprofile was set on the command line
	if *cpuprofile != "" {
		// create the profile file with the name given on the command line
		f, err := os.Create(*cpuprofile)
		if err != nil {
			log.Fatal(err)
		}
		// start CPU profiling
		pprof.StartCPUProfile(f)
		defer pprof.StopCPUProfile()
	}
	...

Assuming we have added the code above to a program called mytest, we can run it and generate a profile file:

./mytest -cpuprofile=mytest.prof

This produces the profile file mytest.prof. With the profile file in hand, we can analyze it using go tool pprof:

go tool pprof mytest mytest.prof

The most important command in pprof is topN, which shows the top N samples in the profile. In the output below, the columns are: samples spent in the function itself (flat), that figure as a percentage, the running sum of those percentages, then samples spent in the function plus its callees (cumulative) and their percentage.

(pprof) top10
Total: 2525 samples
	298  11.8%  11.8%	 345  13.7% runtime.mapaccess1_fast64
	268  10.6%  22.4%	2124  84.1% main.FindLoops
	251   9.9%  32.4%	 451  17.9% scanblock
	178   7.0%  39.4%	 351  13.9% hash_insert
	131   5.2%  44.6%	 158   6.3% sweepspan
	119   4.7%  49.3%	 350  13.9% main.DFS
	 96   3.8%  53.1%	  98   3.9% flushptrbuf
	 95   3.8%  56.9%	  95   3.8% runtime.aeshash64
	 95   3.8%  60.6%	 101   4.0% runtime.settype_flush
	 88   3.5%  64.1%	 988  39.1% runtime.mallocgc

2. A usage example of runtime/pprof

For this example we use a recursively implemented Fibonacci sequence to exercise the profiler. The code is as follows:

package main

import (
    "flag"
    "fmt"
    "log"
    "os"
    "runtime/pprof"
)

var (
    // name of the profile output file, taken from the command line
    cpuprofile = flag.String("cpuprofile", "", "write cpu profile to file.")
)

func main() {
    log.Println("begin")
    flag.Parse()

    if *cpuprofile != "" {
        f, err := os.Create(*cpuprofile)
        if err != nil {
            log.Fatal(err)
        }

        pprof.StartCPUProfile(f)
        defer pprof.StopCPUProfile()
    }

    for i := 0; i < 30; i++ {
        nums := fibonacci(i)
        fmt.Println(nums)
    }
}

// fibonacci computes the Fibonacci sequence recursively
func fibonacci(num int) int {
    if num < 2 {
        return 1
    }
    return fibonacci(num-1) + fibonacci(num-2)
}

Create a Go file named testfibonacci.go, paste in the code above, and build it with go build testfibonacci.go.

Run the program with the --cpuprofile flag, e.g. ./testfibonacci --cpuprofile=fibonacci.prof

The CPU profile collected while the program runs is then recorded into the named .prof file (fibonacci.prof here).
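As an aside, runtime/pprof is not limited to CPU profiles; it can also snapshot the heap. Here is a minimal sketch of how the program above could be extended (the -memprofile flag and its placement at the end of main are my assumptions, not part of the original example):

// defined alongside cpuprofile:
var memprofile = flag.String("memprofile", "", "write memory profile to file.")

// at the end of main, after the fibonacci loop has run:
if *memprofile != "" {
    f, err := os.Create(*memprofile)
    if err != nil {
        log.Fatal(err)
    }
    // write a snapshot of the currently live heap allocations
    pprof.WriteHeapProfile(f)
    f.Close()
}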

The next step is to render the CPU profile into a performance analysis graph (this requires graphviz to be installed).
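On the CentOS 7 machine from the test environment above, graphviz can usually be installed straight from the system repositories (assuming the package is named graphviz, as it is on stock CentOS 7):

yum install -y graphviz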

Viewing the prof file:

Use go tool pprof (application) (application's prof file), i.e.: go tool pprof testfibonacci fibonacci.prof

This enters the pprof interactive shell, where the top command shows the sample data and the web command generates an SVG call-graph file:
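Inside the shell, the session amounts to the following (no sample figures shown here, since they depend on your machine and run):

(pprof) top
(pprof) web

Alternatively, assuming your toolchain's pprof accepts output-format flags such as -svg (long supported by go tool pprof), the SVG can be produced in one shot without the interactive shell:

go tool pprof -svg testfibonacci fibonacci.prof > fibonacci.svg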


References:

  • http://blog.csdn.net/zhonglinzhang/article/details/71191650
  • http://www.cnblogs.com/yjf512/archive/2012/12/27/2835331.html
  • https://studygolang.com/articles/3176

