A Brief Introduction to simpleperf and Flame Graph

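In real projects, simpleperf can be used to analyze camera GPU memory leaks and to break down GPU memory usage. Camera GPU leaks are generally hard to pin down. At the code level, GPU memory is allocated and consumed across a large amount of logic in the camera app, camera provider, and camera server, as well as in the GPU driver's memory management. However, if we hook the kernel-side endpoints where memory is ultimately allocated and freed, and dump the call stack of every alloc and every free from the bottom up, comparing the two lets us find allocations that were never freed, which are the leak points. Once the leak point is found, the fix is usually not far away.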
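The alloc/free comparison described above can be sketched in a few lines of Python. This is a minimal sketch, not the tooling used later in this post: it assumes every event has already been reduced to a hypothetical (kind, gpuaddr, size, where) tuple in time order, that a buffer can be keyed by its GPU address (the kgsl memory tracepoints are assumed to expose such a field), and that where names the calling library. find_unfreed is a hypothetical helper.

```python
def find_unfreed(events):
    """Return (gpuaddr, size, where) for allocations never matched by a free.

    events: iterable of (kind, gpuaddr, size, where) tuples in time order,
    kind being "alloc" or "free" (a hypothetical, pre-parsed record layout).
    """
    live = {}  # gpuaddr -> (size, where) for buffers still outstanding
    for kind, gpuaddr, size, where in events:
        if kind == "alloc":
            live[gpuaddr] = (size, where)
        elif kind == "free":
            # A free whose alloc predates the recording is simply ignored
            live.pop(gpuaddr, None)
    return [(addr, size, where) for addr, (size, where) in live.items()]


trace = [
    ("alloc", 0x1000, 4096, "libfoo.so"),
    ("alloc", 0x2000, 8192, "libbar.so"),
    ("free", 0x1000, 4096, "libfoo.so"),
]
for addr, size, where in find_unfreed(trace):
    print("leak candidate: gpuaddr=0x%x size=%d from %s" % (addr, size, where))
# leak candidate: gpuaddr=0x2000 size=8192 from libbar.so
```

Because frees of buffers allocated before recording began are ignored, recording should start before the camera scenario under test.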

Introduction to simpleperf

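To do this, we need a tool that can trace the memory-related tracepoints defined in the GPU driver kgsl, namely kgsl_mem_alloc, kgsl_mem_free, and kgsl_mem_map, and dump every call that hits these three interfaces. Many tools can do that, but far fewer can also capture the user-space backtrace. Because kgsl already provides static tracepoints, and because the tool is easy to use, simpleperf was chosen here.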

simpleperf is a simplified Android version of the Linux perf tool. It is now part of the Android source tree; the official Google code path is: https://cs.android.com/android/platform/superproject/+/master:system/extras/simpleperf/

Simpleperf is a command-line tool whose toolset has a client side and a host side.

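- Client side: an executable that runs on the Android device and collects performance data; run simpleperf record or simpleperf report directly in an adb shell terminal.

- Host side: runs on the development machine and analyzes and visualizes the data; for example, the report_html.py script produces a visual report.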

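Simpleperf also ships other scripts:

Scripts for recording events (recording works with either the scripts or the executable): app_profiler.py, run_simpleperf_without_usb_connection.py

Scripts for reporting: report.py, report_html.py

Scripts for parsing profiling data: simpleperf_report_lib.py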
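simpleperf_report_lib.py exposes perf.data to Python through its ReportLib class. The sketch below assumes the ReportLib methods that simpleperf's own report_sample.py relies on (SetRecordFile, GetNextSample, GetEventOfCurrentSample, GetTracingDataOfCurrentSample, Close); availability can differ between simpleperf versions, so check the copy you have. read_samples and count_events are hypothetical helper names, and the import is deferred so the counting part runs without simpleperf installed.

```python
from collections import Counter


def read_samples(record_file):
    """Yield (event_name, tracing_data) pairs from a perf.data file.

    simpleperf_report_lib is imported lazily: it lives in simpleperf's
    scripts/ directory, which must be on sys.path when this is called.
    """
    from simpleperf_report_lib import ReportLib

    lib = ReportLib()
    lib.SetRecordFile(record_file)
    try:
        while True:
            sample = lib.GetNextSample()
            if sample is None:  # no more samples in the file
                break
            event = lib.GetEventOfCurrentSample()
            # Tracepoint fields such as size/tgid; None for non-tracepoint events
            tracing = lib.GetTracingDataOfCurrentSample() or {}
            yield event.name, tracing
    finally:
        lib.Close()


def count_events(samples):
    """Count samples per (event name, tgid)."""
    counts = Counter()
    for event_name, tracing in samples:
        counts[(event_name, tracing.get("tgid"))] += 1
    return counts
```

Typical use on the host is count_events(read_samples("perf.data")).most_common(), which shows at a glance which processes generate the most kgsl events.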

Using Simpleperf

Push the simpleperf executable to the phone:

adb push ~/system/extras/simpleperf/scripts/bin/android/arm64/simpleperf /data/local/tmp

adb shell chmod 777 /data/local/tmp/simpleperf

Start the program under test on the phone and use ps to find its process ID:

adb shell ps -A | grep "<package name of the app to analyze>"

For example:

adb shell ps -A|grep camera
- The camera provider PID found this way is 1273.
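Picking the PID out of the ps output can also be scripted. A minimal sketch, assuming toybox-style ps -A output with a header row, the PID in the second column (the same column the demo script's awk '{print $2}' relies on), and the process name in the last column. find_pids is a hypothetical helper, and the sample output is made up for illustration.

```python
def find_pids(ps_output, pattern):
    """Return the PIDs of processes whose name (last column) contains pattern."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        cols = line.split()
        if len(cols) >= 2 and pattern in cols[-1]:
            pids.append(int(cols[1]))
    return pids


# Made-up "adb shell ps -A" output, for illustration only
SAMPLE = """USER           PID  PPID     VSZ    RSS WCHAN  ADDR S NAME
cameraserver  1273     1  123456  4567 0         0 S vendor.camera.provider-service
u0_a100      11555   700  654321  9876 0         0 S com.android.camera
"""

print(find_pids(SAMPLE, "camera.provider"))  # [1273]
```

This is a plain substring match rather than a regex; if more than one PID comes back, narrow the pattern, just as the demo script below refuses to continue when it finds more than one match.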

Record the profiling data to perf.data:

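# -p <process ID>

# --duration <recording time, in seconds>

# -o <output file name>

# --call-graph dwarf works for both 32-bit and 64-bit processes; --call-graph fp is cheaper but is generally only reliable on 64-bit (arm64)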

adb shell simpleperf record -p 11555 --duration 10 -o /sdcard/perf.data --call-graph fp
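The options above can also be assembled programmatically. A minimal sketch, assuming the device ABI string comes from adb shell getprop ro.product.cpu.abi and applying the rule of thumb from the comment above (fp on arm64, dwarf otherwise). build_record_cmd is a hypothetical helper, and its default binary path assumes simpleperf was pushed to /data/local/tmp as shown earlier.

```python
def build_record_cmd(pid, duration_s, out_path, abi,
                     binary="/data/local/tmp/simpleperf"):
    """Build an "adb shell simpleperf record" argument list."""
    # Rule of thumb: frame-pointer unwinding on arm64, DWARF unwinding elsewhere
    call_graph = "fp" if abi == "arm64-v8a" else "dwarf"
    return ["adb", "shell", binary, "record",
            "-p", str(pid),
            "--duration", str(duration_s),
            "-o", out_path,
            "--call-graph", call_graph]


print(" ".join(build_record_cmd(11555, 10, "/sdcard/perf.data", "arm64-v8a")))
# adb shell /data/local/tmp/simpleperf record -p 11555 --duration 10 -o /sdcard/perf.data --call-graph fp
```

Returning a list rather than a string lets it be passed straight to subprocess.run(cmd, check=True) without shell quoting issues.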

Report the results: convert perf.data to text

adb shell simpleperf report -i /sdcard/perf.data -o /sdcard/perf.txt

Pull the files from the phone to a local directory:

adb pull /sdcard/perf.txt ./
adb pull /sdcard/perf.data ./

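Convert perf.data to an HTML file with the report_html.py script

perf.txt is a plain-text report (perf.data itself is a binary file); once the call stacks are deeply nested, the text is very hard to read.

The report.py script can show the results visually: python report.py -g --gui opens the call graph in a GUI window.

It can also be converted to an HTML file as follows:

python3 ~/system/extras/simpleperf/scripts/report_html.py -i perf.data -o simpleperf.html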

Download FlameGraph

https://github.com/brendangregg/FlameGraph.git

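# Create the working directory simpleperf-demo, clone FlameGraph into it, and copy simpleperf (from the NDK) into it
mkdir -p /home/mi/simpleperf-demo && cd /home/mi/simpleperf-demo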
git clone https://github.com/brendangregg/FlameGraph.git
chmod 777 FlameGraph/flamegraph.pl
chmod 777 FlameGraph/stackcollapse-perf.pl
cp -r /home/hqb/Android/Sdk/ndk/21.3.6528147/simpleperf /home/mi/simpleperf-demo

Generate the flame graph

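cd /home/mi/simpleperf-demo
# report_sample.py reads ./perf.data by default, so copy the perf.data pulled from the phone here first
python ./simpleperf/report_sample.py > out.perf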
./FlameGraph/stackcollapse-perf.pl out.perf > out.folded
./FlameGraph/flamegraph.pl out.folded > out.svg
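The stackcollapse-perf.pl step turns each sampled call stack into one line of the "folded" format that flamegraph.pl consumes: frames joined by ";" from root to leaf, followed by a sample count. A minimal Python sketch of that transformation, assuming the stacks are already parsed into lists ordered leaf first (as perf-style reports print them); fold_stacks is a hypothetical name, and it ignores the many input quirks the Perl script handles.

```python
from collections import Counter


def fold_stacks(stacks):
    """Collapse leaf-first call stacks into folded "root;...;leaf count" lines."""
    # Reverse each stack so the root comes first, then count identical stacks
    counts = Counter(";".join(reversed(stack)) for stack in stacks)
    return ["%s %d" % (key, n) for key, n in sorted(counts.items())]


for line in fold_stacks([["c", "b", "a"], ["c", "b", "a"], ["d", "a"]]):
    print(line)
# a;b;c 2
# a;d 1
```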

Flame Graph

Open the generated HTML file (simpleperf.html in the command above) in Google Chrome to see the CPU cycles used by each thread together with a flame graph.

Overview

Official site: Flame Graphs


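A flame graph is an SVG image generated from stack samples that visualizes CPU call stacks; opened in a browser, it is interactive.

Each box in a flame graph is a function. The width of a box reflects how long it ran (more precisely, how many samples contain it), so wider functions ran longer. Each level up is one call deeper, and the topmost functions are leaf functions.

The y-axis shows the call stack, one function per level. The deeper the stack, the taller the flame; the top is the function that was executing, and everything below it is its callers.

The x-axis shows sample counts: the wider a function is on the x-axis, the more often it was sampled, that is, the longer it ran.

Note that the x-axis does not represent time: all call stacks are merged and then sorted alphabetically.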
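The alphabetical merge can be made concrete. A minimal sketch, assuming folded input lines as produced by stackcollapse-perf.pl: identical frames under the same parent are merged, siblings are laid out alphabetically, and each box's width equals its total sample count, so a box's x position says nothing about when it ran. layout is a hypothetical helper, not part of FlameGraph.

```python
def layout(folded):
    """Compute (depth, x_start, width, name) boxes from folded stack lines."""
    tree = {}  # name -> [sample count, children dict]
    for line in folded:
        stack, count = line.rsplit(" ", 1)  # frame names may contain spaces
        node = tree
        for frame in stack.split(";"):
            entry = node.setdefault(frame, [0, {}])
            entry[0] += int(count)
            node = entry[1]

    boxes = []

    def walk(node, depth, x):
        for name in sorted(node):  # alphabetical, not chronological
            width, children = node[name]
            boxes.append((depth, x, width, name))
            walk(children, depth + 1, x)  # children start at the parent's x
            x += width

    walk(tree, 0, 0)
    return boxes


for box in layout(["main;work;b 3", "main;work;a 1", "main;idle 2"]):
    print(box)
# (0, 0, 6, 'main')
# (1, 0, 2, 'idle')
# (1, 2, 4, 'work')
# (2, 2, 1, 'a')
# (2, 3, 3, 'b')
```

Although the "b" samples appear first in the input, "a" is drawn to the left of "b", which is exactly why the x-axis cannot be read as time.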

When reading a flame graph, look for the widest functions along the top edge. Any "plateau" indicates a function that may have a performance problem.
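A plateau is a function that is the top (leaf) frame in many samples, in other words one with many self samples. A minimal sketch that ranks functions by self samples from folded lines; it is an assumption-level helper (self_samples is a hypothetical name), and it sums a function across all call paths, whereas the flame graph draws a separate plateau for each distinct path.

```python
from collections import Counter


def self_samples(folded):
    """Rank leaf functions by self samples: (name, count, percent of total)."""
    self_counts = Counter()
    total = 0
    for line in folded:
        stack, count = line.rsplit(" ", 1)
        n = int(count)
        self_counts[stack.split(";")[-1]] += n  # last frame is the leaf
        total += n
    return [(name, n, 100.0 * n / total) for name, n in self_counts.most_common()]


print(self_samples(["main;work;b 3", "main;work;a 1", "main;idle 2"])[0])
# ('b', 3, 50.0)
```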

Colors carry no special meaning; since a flame graph shows how busy the CPU is, warm colors are usually chosen.

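3.2-How to Read a Flame Graph

3.2.1-Mouse Hover

Every level of the flame is labeled with a function name. Hovering over a box shows the full function name, the number of samples that hit it, and its percentage of all samples.

3.2.2-Click to Zoom

Clicking a level zooms the flame graph horizontally so that the level fills the full width and shows more detail.

A "Reset Zoom" link appears in the upper-left corner at the same time; clicking it restores the original view.

3.2.3-Search

Press Ctrl + F, or click the semi-transparent Search button in the upper-right corner, to open a search box. Enter a keyword or a regular expression, and all matching function names are highlighted.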
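Besides highlighting, the SVG produced by flamegraph.pl also reports what share of the samples matched a search. A minimal sketch of that calculation on folded lines, assuming a Python regular expression stands in for the browser's one; each stack is counted once even if several of its frames match, so nested matches are not double counted. match_percent is a hypothetical helper.

```python
import re


def match_percent(folded, pattern):
    """Percentage of samples whose stack contains a frame matching pattern."""
    regex = re.compile(pattern)
    total = matched = 0
    for line in folded:
        stack, count = line.rsplit(" ", 1)
        n = int(count)
        total += n
        if any(regex.search(frame) for frame in stack.split(";")):
            matched += n
    return 100.0 * matched / total if total else 0.0


print(round(match_percent(["main;work;b 3", "main;work;a 1", "main;idle 2"], "^work$"), 1))
# 66.7
```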

Demo Reference

perf_kgsl_get.sh

#!/bin/bash


perfdata="/data/local/tmp/perf.data"
ultis_dir="../utils/simpleperf/"

function initialize()
{
    adb root  >> /dev/null 2>&1
    adb remount  >> /dev/null 2>&1
    # Set show_func_detail=true to print per-function allocation and release details
    show_func_detail=false   #true
    #get pid
    pid=$(adb shell ps | grep camera.provider | awk '{print $2}')
    #pid=$(adb shell ps | grep com.android.camera | awk '{print $2}')
    len=$(echo "$pid" | wc -l)
    if [[ $len -gt 1 ]];then
        echo "ERROR:pid is Greater than one!!!"
        exit 1
    fi
    product=`adb shell getprop ro.product.name`
    local_path=${product}_$(date +%F_%H%M%S)
    mkdir $local_path
    adb shell rm $perfdata >> /dev/null 2>&1
}

function get_perfdata()
{
    echo "Begin to catch perf data..."
    if [ -z "$1" ]; then
        # Other kgsl events that can help with debugging: kgsl:kgsl_context_create,kgsl:kgsl_context_destroy,kgsl:kgsl_pool_free_page,kgsl:kgsl_pool_get_page,kgsl:kgsl_pool_add_page
        # Record the kgsl memory tracepoints system-wide (-a) with DWARF call stacks until stopped with Ctrl+C
        adb shell "simpleperf record -e kgsl:kgsl_mem_alloc,kgsl:kgsl_mem_free,kgsl:kgsl_mem_map -a --call-graph dwarf -o $perfdata" &
    else
        # Same recording, but stop automatically after $1 seconds
        adb shell "simpleperf record -e kgsl:kgsl_mem_alloc,kgsl:kgsl_mem_free,kgsl:kgsl_mem_map -a --call-graph dwarf --duration $1 -o $perfdata" &
    fi
}

# Parse the GPU (kgsl) results out of perf.data
function get_fastrpc_result()
{
    echo ""
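    # Stop the background "adb shell simpleperf record" started by get_perfdata
    ps -aux | grep -Pai "simpleperf[ ].+" | awk '{print $2}' | xargs kill -9
    # Writing perf.data takes a while, so wait until the file is complete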
    while :
    do
      # Check for the completion marker: the perf.data magic PERFILE2 followed by the header-size byte 'h'
      adb shell cat $perfdata | head -n 1 |  grep -Pia "PERFILE2h"  >> /dev/null 2>&1
      if [ $? -eq 0 ];
      then
        break
      fi
      sleep 1
      adb shell ls -al '/data/local/tmp/' | grep -Pia "perf.data" >> /dev/null 2>&1
      if [ $? -eq 1 ];
      then
        echo "------------------------------------"
        echo "Do not generate perf.data, exit!!!"
        echo "------------------------------------"
        exit 1
      fi
    done

    adb shell "simpleperf report -i $perfdata" > $local_path/perf.txt
    adb shell "simpleperf report -i $perfdata -g --full-callgraph" > $local_path/perf_callgraph.txt

    adb pull $perfdata $local_path

    python3 ${ultis_dir}simpleperf/scripts/report_sample.py -i $local_path/perf.data  --show_tracing_data > $local_path/perf_trace_report.txt
    if [ "$show_func_detail" == "true" ]; then
        python2 gpu_get_result_detail.py $local_path/perf_trace_report.txt gpu >$local_path/perf_trace_result.txt
    else
        python2 gpu_get_result.py $local_path/perf_trace_report.txt gpu >$local_path/perf_trace_result.txt
    fi
    echo "Output file is $local_path/perf_trace_result.txt"
    exit 0
}


trap 'get_fastrpc_result' INT

function main()
{
    initialize
    get_perfdata $1
    # Wait here; pressing Ctrl+C fires the INT trap, which collects and parses the results
    while :
    do
        sleep 10
    done
    get_fastrpc_result
}

main $1
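The wait loop in get_fastrpc_result treats the string PERFILE2h at the start of perf.data as a completion marker. In the perf file format, the file starts with the 8-byte magic PERFILE2 followed by a 64-bit little-endian header size, so the 'h' (0x68) the script greps for is the low byte of that size. A minimal Python sketch of the same check for a pulled file, assuming, as the script does, that simpleperf fills in the header only when it finishes writing; header_complete and perf_data_complete are hypothetical helpers.

```python
import struct

PERF_MAGIC = b"PERFILE2"


def header_complete(head):
    """Check the first 16 bytes of a perf.data file for a finalized header."""
    if len(head) < 16 or head[:8] != PERF_MAGIC:
        return False
    (header_size,) = struct.unpack("<Q", head[8:16])  # 64-bit little-endian
    return header_size > 0


def perf_data_complete(path):
    """Read the start of a local perf.data file and apply header_complete."""
    with open(path, "rb") as f:
        return header_complete(f.read(16))
```

perf_data_complete("perf.data") works on the host after adb pull; the shell script performs the equivalent check on the device with cat and grep.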

gpu_get_result.py

# -*- coding: utf-8 -*-
# Python 2 script: summarizes kgsl alloc/map/free events from
# "report_sample.py --show_tracing_data" output per process, thread, and library.
import sys
import copy

#Total dict structure
# total_dict = {pid:list(total_info_dict,thread_dict,func_dict),}

# stats_dict = {'alloc': 0, 'map': 0, 'free': 0, 'diff': 0, 'max': 0, 'alloc_time': 0, 'map_time': 0, 'free_time': 0}
# thread_dict = {thread_tid:thread_stats_dict}
# func_dict = {func:func_stats_dict}

list_pid = []
list_tid = []
list_func_so = []

gpu_commonlib_list = ['libc.so', 'libgsl.so', 'libCB.so', 'libOpenCL.so']
func_flag = 0
so_flag = 0
gpu = False

def stats_calc(stats_dict, entry_size, entry_flag):
    # Accumulate one event: '+' = alloc, '++' = map, anything else = free; track the peak diff
    if entry_flag == '+':
        stats_dict['alloc'] += entry_size
        stats_dict['alloc_time'] += 1
        stats_dict['diff'] += entry_size
    elif entry_flag == '++':
        stats_dict['map'] += entry_size
        stats_dict['map_time'] += 1
        stats_dict['diff'] += entry_size
    else:
        stats_dict['free'] += entry_size
        stats_dict['free_time'] += 1
        stats_dict['diff'] -= entry_size
    if stats_dict['diff'] > stats_dict['max']:
        stats_dict['max'] = stats_dict['diff']
    return

if len(sys.argv) < 3:
    print "Usage: python2 gpu_get_result.py <perf_trace_report.txt> gpu"
    sys.exit()

f=open(sys.argv[1], "r")
if sys.argv[2] == "gpu":
    event_type = ("kgsl:kgsl_mem_free", "kgsl:kgsl_mem_alloc", "kgsl:kgsl_mem_map")
    so = []
    entry_done = 0
    entry_tgid = 0
    entry_func_name = "other"
    entry_usage = "other"

    init_stats_dict = {'alloc': 0, 'map': 0, 'free': 0, 'diff': 0, 'max': 0, 'alloc_time': 0, 'map_time': 0, 'free_time': 0}
    init_dict_list = [init_stats_dict, {}, {}]
    total_dict = {}

    for line in f:
        # entry start
        if any(event in line for event in event_type):
            tid_name = line.split('\t')[0].strip()
            tid = line.split('\t')[1].strip().split()[0]
            entry_tid_id = tid_name + '_' + tid
            gpu = True

            #if ("kgsl:kgsl_mem_alloc" in line or "kgsl:kgsl_mem_map" in line):   # Start of each event
            if ("kgsl:kgsl_mem_alloc" in line ):
                entry_flag = "+"
            elif ("kgsl:kgsl_mem_map" in line):
                entry_flag = "++"
            else:
                entry_flag = "-"
            list_tid.append(tid_name)
            continue
        if gpu == True:
            if so_flag == 0 and ("tracing data:" not in line):
                fields = line.split()
                # Call-chain lines look like "<addr> <symbol> (<dso>)"; unresolved symbols appear as "libxxx.so[+offset]"
                if len(fields) > 1 and fields[1].startswith("lib"):
                    so.append(fields[1].split('[')[0])   # drop the address offset, such as [+8596c]
                continue
            so_flag = 1
            for so_name in so:
                if any(gpu_commonlib in so_name for gpu_commonlib in gpu_commonlib_list) == False:
                    entry_func_name = so_name
                    func_flag = 1
                    break
            if (func_flag == 0):
                entry_func_name = "other"

            if ("size" == line.split(':')[0].strip()):
                entry_size = int(line.split(':')[1].strip())
                continue
            if ("tgid" == line.split(':')[0].strip()) :
                entry_tgid = line.split(':')[1].strip()
                continue
            if ("usage" == line.split(':')[0].strip()):
                entry_usage = line.split(':')[1].strip()
                entry_done = 1

            if entry_done == 1:
                # One complete event has been parsed: reset the per-event state
                entry_done = 0
                gpu = False
                so_flag = 0
                func_flag = 0
                so = []

                # Per-process (tgid) overview
                if not total_dict.has_key(entry_tgid):
                    total_dict[entry_tgid] = copy.deepcopy(init_dict_list)
                    list_pid.append(entry_tgid)
                dict_list = total_dict[entry_tgid]
                stats_calc(dict_list[0], entry_size, entry_flag)

                # Per-thread statistics
                thread_dict = dict_list[1]
                if not thread_dict.has_key(entry_tid_id):
                    thread_dict[entry_tid_id] = copy.deepcopy(init_stats_dict)
                stats_calc(thread_dict[entry_tid_id], entry_size, entry_flag)

                # Per-library statistics (first call-chain library not in gpu_commonlib_list)
                func_dict = dict_list[2]
                if not func_dict.has_key(entry_func_name):
                    func_dict[entry_func_name] = copy.deepcopy(init_stats_dict)
                    list_func_so.append(entry_func_name)
                stats_calc(func_dict[entry_func_name], entry_size, entry_flag)

print "All pid: ", list(set(list_pid))
print "All tid: ", list(set(list_tid))
print "All func: ", list(set(list_func_so))
print "======================================================================================================="
print "Memory cost and revert distribution:"
print "(diff means gpu memory usage, if the diff still exists after completing the case and closing the camera, it is likely to be a gpu leak)"
for key in total_dict:
    print "pid:",key
    stats_dict = {}
    stats_dict=total_dict[key]
    # output the pid kgsl total message
    for message in stats_dict[0]:
        # Byte totals are shown in MB; the *_time fields are event counts, not sizes
        if message in ("alloc", "map", "max", "free", "diff"):
            print('%10s:%12s MB' % (message, stats_dict[0][message]/1024./1024))
        else:
            print('%10s:%12s' % (message, stats_dict[0][message]))
    print "*****tid message*****"
    for message in stats_dict[1]:
        tid_dict = stats_dict[1][message]
        print('%25s' % message),
        for tismessage in tid_dict:
            if tismessage in ("alloc", "map", "max", "free", "diff"):
                print('%10s:%12s MB' % (tismessage, tid_dict[tismessage]/1024./1024)),
            else:
                print('%10s:%5s' % (tismessage, tid_dict[tismessage])),
        print ""
    print "*****func message*****"
    for message in stats_dict[2]:
        func_dict = stats_dict[2][message]
        print('%25s' % message),
        for funmessage in func_dict:
            if funmessage in ("alloc", "map", "max", "free", "diff"):
                print('%10s:%12s MB' % (funmessage, func_dict[funmessage]/1024./1024)),
            else:
                print('%10s:%5s' % (funmessage, func_dict[funmessage])),
        print ""
    print "======================================================================================================="
    # print (total_dict)
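For readers on Python 3, the core of the "diff" column can be sketched much more compactly. This is a minimal sketch, not a port of gpu_get_result.py: it assumes the same input the script parses (an event header line naming the kgsl event, then "key : value" tracing-data lines containing size and tgid), counts map like alloc as the script does, and reports only net bytes per tgid, without the per-thread and per-library breakdown. net_usage_by_tgid is a hypothetical name.

```python
from collections import defaultdict

# +1 adds to GPU usage, -1 releases it (map is counted like alloc, as in gpu_get_result.py)
EVENT_SIGN = {"kgsl:kgsl_mem_alloc": 1, "kgsl:kgsl_mem_map": 1, "kgsl:kgsl_mem_free": -1}


def net_usage_by_tgid(report_text):
    """Net bytes (alloc + map - free) per tgid from report_sample.py tracing output."""
    net = defaultdict(int)
    sign = None   # sign of the event currently being parsed, None between events
    fields = {}
    for line in report_text.splitlines():
        event = next((e for e in EVENT_SIGN if e in line), None)
        if event is not None:  # header line of a new sample
            sign, fields = EVENT_SIGN[event], {}
            continue
        if sign is None or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
        if "size" in fields and "tgid" in fields:
            net[fields["tgid"]] += sign * int(fields["size"])
            sign = None  # ignore the rest of this sample
    return dict(net)
```

A non-zero value for the camera processes after the case has finished and the camera is closed points at the same likely leak that the script's diff column flags.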