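In our project, the following scenario triggers an ANR in SurfaceView: the main Activity creates and uses a SurfaceView, and the user repeatedly enters a child Activity B, returns to the main Activity, and enters Activity B again. After enough of these cycles, an ANR occurs.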

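Reading the SurfaceView source revealed a pitfall. Many people actually use SurfaceView incorrectly, and the only reason they have not hit this ANR is luck.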

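1. How to find the ANR log

Once the ANR appeared, the first thing I wanted was the ANR log. It can be fetched with the following command:

adb pull /data/anr/traces.txt

This downloads the ANR log to your computer.

2. Analyzing the ANR log

Open the ANR log and look at the stack trace of the main thread: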

"main" prio=5 tid=1 Waiting
| group="main" sCount=1 dsCount=0 obj=0x75372268 self=0xaabd3ce0
| sysTid=3980 nice=-1 cgrp=top_visible sched=0/0 handle=0xf7762b50
| state=S schedstat=( 22407689743 17485653812 44070 ) utm=1882 stm=358 core=5 HZ=100
| stack=0xff307000-0xff309000 stackSize=8MB
| held mutexes=
at java.lang.Object.wait!(Native method)
- waiting on <0x0488bcf7> (a java.lang.Object)
at java.lang.Thread.parkFor$(Thread.java:1235)
- locked <0x0488bcf7> (a java.lang.Object)
at sun.misc.Unsafe.park(Unsafe.java:299)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:810)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:843)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1172)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:181)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:257)
at android.view.SurfaceView.updateWindow(SurfaceView.java:517)
at android.view.SurfaceView.onWindowVisibilityChanged(SurfaceView.java:246)
at android.view.View.dispatchWindowVisibilityChanged(View.java:9737)
at android.view.ViewGroup.dispatchWindowVisibilityChanged(ViewGroup.java:1309)
at android.view.ViewGroup.dispatchWindowVisibilityChanged(ViewGroup.java:1309)
at android.view.ViewGroup.dispatchWindowVisibilityChanged(ViewGroup.java:1309)
... repeated 10 times
at android.view.ViewRootImpl.performTraversals(ViewRootImpl.java:1415)
at android.view.ViewRootImpl.doTraversal(ViewRootImpl.java:1139)
at android.view.ViewRootImpl$TraversalRunnable.run(ViewRootImpl.java:6238)
at android.view.Choreographer$CallbackRecord.run(Choreographer.java:884)
at android.view.Choreographer.doCallbacks(Choreographer.java:696)
at android.view.Choreographer.doFrame(Choreographer.java:631)
at android.view.Choreographer$FrameDisplayEventReceiver.run(Choreographer.java:870)
at android.os.Handler.handleCallback(Handler.java:743)
at android.os.Handler.dispatchMessage(Handler.java:95)
at android.os.Looper.loop(Looper.java:150)
at android.app.ActivityThread.main(ActivityThread.java:5621)
at java.lang.reflect.Method.invoke!(Native method)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:794)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:684)

The most direct reading of the log is that the main thread is stuck waiting inside SurfaceView:
at java.lang.Object.wait!(Native method)
at android.view.SurfaceView.updateWindow(SurfaceView.java:517)
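
A real traces.txt usually contains many threads, so it can help to pull out only the main thread's block before reading it. The following is a minimal sketch in plain Java, assuming each thread block starts with the quoted thread name and ends at the next blank line. The class name TraceFilter and the method threadBlock are hypothetical helpers made up for this illustration, not part of any Android tool.

```java
import java.util.ArrayList;
import java.util.List;

public class TraceFilter {
    /**
     * Returns the lines of the first block whose header starts with the
     * quoted thread name, up to (not including) the next blank line.
     * Assumption: thread blocks in the dump are separated by blank lines.
     */
    static List<String> threadBlock(String traces, String threadName) {
        List<String> block = new ArrayList<>();
        String header = "\"" + threadName + "\"";
        boolean inBlock = false;
        for (String line : traces.split("\r?\n")) {
            if (!inBlock) {
                if (line.startsWith(header)) {
                    inBlock = true;
                    block.add(line);
                }
            } else {
                if (line.trim().isEmpty()) {
                    break; // a blank line ends the current thread block
                }
                block.add(line);
            }
        }
        return block;
    }
}
```

Calling TraceFilter.threadBlock(text, "main") on the contents of traces.txt would return just the lines of the main thread's block, which is where the SurfaceView.updateWindow frame shows up.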

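3. Analyzing the SurfaceView source

The log shows that the ANR comes from SurfaceView. The main screen uses a SurfaceView, and the main thread blocks indefinitely on ReentrantLock.lock() inside SurfaceView.updateWindow, which triggers the ANR.

Opening the SurfaceView source confirms that updateWindow does call mSurfaceLock.lock().

mSurfaceLock is defined like this: final ReentrantLock mSurfaceLock = new ReentrantLock();

Somewhere the lock must have been taken without a matching unlock, so a later call to lock can never acquire it. That brought Canvas to mind: the canvas is locked too, and the developer is responsible for unlocking it promptly.

The code that draws on the canvas looked fine, and calling unlock in finally is the correct pattern: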

   

    Canvas canvas = mHolder.lockCanvas();
    if (canvas != null) {
        try {
            for (Heart heart : mHeartArray) {
                canvas.drawBitmap(heart.bitmap, null, heart.dst, mPaint);
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            mHolder.unlockCanvasAndPost(canvas);
        }
    }



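I repeatedly switched the Activity between foreground and background, because the SurfaceView's surface is destroyed when it becomes invisible and recreated when it becomes visible again. Eventually the ANR reproduced, along with this exception:

System.err: java.lang.IllegalStateException: Surface has already been released.

So I started going through the source in detail, beginning with the implementation of unlockCanvasAndPost, since the unlock was the most likely point of failure.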

   

    // SurfaceHolder implementation inside SurfaceView
    @Override
    public void unlockCanvasAndPost(Canvas canvas) {
        mSurface.unlockCanvasAndPost(canvas);
        mSurfaceLock.unlock();
    }

    // Surface class
    public void unlockCanvasAndPost(Canvas canvas) {
        synchronized (mLock) {
            checkNotReleasedLocked();
            //...
        }
    }



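    // This is where the exception is thrown. If it is thrown here, mSurfaceLock.unlock() is never executed,
    // so the next attempt to lock blocks forever and ends in an ANR.
    // The exception is thrown when mNativeObject == 0, so the next question is when mNativeObject gets set to 0.

    private void checkNotReleasedLocked() {
        if (mNativeObject == 0) {
            throw new IllegalStateException("Surface has already been released.");
        }
    }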



    // This is the method that can set mNativeObject to 0. Next, find out who calls it.

    private void setNativeObjectLocked(long ptr) {
        //...
        mNativeObject = ptr;
        //...
    }



    // A search shows that setNativeObjectLocked(0) is called here.

    @Deprecated
    public void transferFrom(Surface other) {
        if (other != this) {
            //...
            other.setNativeObjectLocked(0);
            //...
        }
    }



    // SurfaceView calls transferFrom from updateWindow (abridged)

    /** @hide */
    protected void updateWindow(boolean force, boolean redrawNeeded) {
        mSurfaceLock.lock();
        try {
            // ...
        } finally {
            mSurfaceLock.unlock();
        }
        try {
            // ...
            if (mSurfaceCreated && (surfaceChanged || (!visible && visibleChanged))) {
                mSurfaceCreated = false;
                if (mSurface.isValid()) {
                    callbacks = getSurfaceCallbacks();
                    for (SurfaceHolder.Callback c : callbacks) {
                        c.surfaceDestroyed(mSurfaceHolder);
                    }
                }
            }
            mSurface.transferFrom(mNewSurface);
            // ...
        } finally {
            // ...
        }
    }



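The SurfaceView lifecycle callbacks are:

surfaceCreated: called when the view goes from invisible to visible

surfaceChanged: called when the surface's format or size changes

surfaceDestroyed: called when the view goes from visible to invisible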


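Following the reproduction steps: tapping the settings button opens the child page, the main screen becomes invisible, the SurfaceView's surface is destroyed, and surfaceDestroyed is called.

As the code above shows, surfaceDestroyed is invoked first and then mSurface.transferFrom(mNewSurface) runs, which leaves mNativeObject at 0.

If unlockCanvasAndPost happens to be called right at that moment, it throws, mSurfaceLock.unlock() is skipped, and the next time the surface is created the main thread hits the ANR.

The cause of the ANR, in short: the surface was destroyed between lockCanvas and unlockCanvasAndPost, so the unlock failed and the lock was never released, which left the main thread blocked on it forever.

To summarize how this ANR unfolds:

Step 1: mHolder.lockCanvas() runs and acquires the lock.

Step 2: The surface happens to be destroyed at this point. surfaceDestroyed runs and mNativeObject is set to 0.

Step 3: unlockCanvasAndPost is called, but because mNativeObject is 0 it throws, and the lock is never released.

Step 4: The surface is recreated and updateWindow tries to lock. Since the previous lock was never released, the main thread waits forever.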
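
The four steps above can be reproduced outside Android with a small plain-Java simulation. This is only a sketch under stated assumptions: SurfaceLockDemo, its surfaceLock field, and its nativeObject field are stand-ins invented for this illustration (they mimic mSurfaceLock and mNativeObject, they are not the framework classes), and for brevity the render thread itself zeroes nativeObject, whereas in the real flow the main thread does that through transferFrom.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class SurfaceLockDemo {
    // Stand-in for SurfaceView.mSurfaceLock.
    private final ReentrantLock surfaceLock = new ReentrantLock();
    // Stand-in for Surface.mNativeObject; 0 means "released".
    private volatile long nativeObject = 1;

    void lockCanvas() {
        surfaceLock.lock();
    }

    void unlockCanvasAndPost() {
        // Mirrors checkNotReleasedLocked(): throws before the lock is released.
        if (nativeObject == 0) {
            throw new IllegalStateException("Surface has already been released.");
        }
        surfaceLock.unlock();
    }

    /** Returns true if the calling thread could take the lock after the failed unlock. */
    static boolean simulate() {
        SurfaceLockDemo demo = new SurfaceLockDemo();
        Thread render = new Thread(() -> {
            demo.lockCanvas();              // step 1: lock acquired
            demo.nativeObject = 0;          // step 2: surface released mid-draw (simplified)
            try {
                demo.unlockCanvasAndPost(); // step 3: throws, unlock is skipped
            } catch (IllegalStateException e) {
                // Swallowed, like the catch block around the drawing code.
            }
        });
        render.start();
        try {
            render.join();
            // Step 4: the "main thread" tries to lock. A timeout is used here
            // so the demo returns instead of hanging like the real ANR.
            boolean acquired = demo.surfaceLock.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                demo.surfaceLock.unlock();
            }
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("main thread acquired lock: " + simulate());
    }
}
```

In this simulation simulate() returns false: a ReentrantLock stays held even after the thread that locked it has finished, which matches the endless wait seen in updateWindow.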



The fix has two parts.

1. Wrap the canvas drawing in a synchronized block so that the whole lock, draw, and unlock sequence runs as one unit:

    synchronized (this) {
        if (mDrawFlag) {
            Canvas canvas = mHolder.lockCanvas();
            if (canvas != null) {
                try {
                    for (Heart heart : mHeartArray) {
                        canvas.drawBitmap(heart.bitmap, null, heart.dst, mPaint);
                    }
                } catch (Exception e) {
                    e.printStackTrace();
                } finally {
                    try {
                        mHolder.unlockCanvasAndPost(canvas);
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }
    }

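2. Take the same lock in the surfaceDestroyed callback. Because updateWindow invokes surfaceDestroyed before mSurface.transferFrom(mNewSurface), the callback blocks until any drawing pass in progress has finished, so mNativeObject cannot be set to 0 between lockCanvas and unlockCanvasAndPost. Both synchronized (this) blocks must lock on the same object for this to work.

    @Override
    public void surfaceDestroyed(SurfaceHolder holder) {
        synchronized (this) {
            mDrawFlag = false;
        }
    }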
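
To see why sharing one monitor between the drawing pass and surfaceDestroyed is enough, here is a plain-Java sketch of the same coordination without any Android classes. Everything in it is hypothetical and exists only for illustration: DrawGate, drawFrame, the surfaceAlive flag (a stand-in for mNativeObject != 0), and the 100 ms sleep that simulates drawing work.

```java
import java.util.concurrent.CountDownLatch;

public class DrawGate {
    private boolean drawFlag = true;              // mirrors mDrawFlag
    private volatile boolean surfaceAlive = true; // stand-in for mNativeObject != 0

    /** One drawing pass; lock, draw, and unlock all happen under the monitor. */
    public boolean drawFrame(Runnable drawing) {
        synchronized (this) {
            if (!drawFlag) {
                return false;      // surface is going away, skip the frame
            }
            drawing.run();         // lockCanvas() and the draw calls would go here
            if (!surfaceAlive) {   // what unlockCanvasAndPost() would check
                throw new IllegalStateException("Surface has already been released.");
            }
            return true;
        }
    }

    /** Mirrors surfaceDestroyed followed by the framework releasing the surface. */
    public void surfaceDestroyed() {
        synchronized (this) {      // waits for a drawing pass in progress
            drawFlag = false;
        }
        surfaceAlive = false;      // simulated transferFrom, after the callback returns
    }

    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Returns {first frame drawn, frame after destroy drawn}. */
    static boolean[] demo() {
        DrawGate gate = new DrawGate();
        CountDownLatch insideDraw = new CountDownLatch(1);
        boolean[] result = new boolean[2];
        Thread render = new Thread(() -> {
            result[0] = gate.drawFrame(() -> {
                insideDraw.countDown(); // signal that drawing has started
                sleepQuietly(100);      // simulated drawing work
            });
        });
        render.start();
        try {
            insideDraw.await();
            gate.surfaceDestroyed();    // blocks until the frame above completes
            render.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        result[1] = gate.drawFrame(() -> { });
        return result;
    }
}
```

In this sketch the frame that was already in progress finishes without an exception, because surfaceDestroyed cannot proceed until the monitor is free, and the frame requested after the destroy is skipped. The trade-off is that the main thread waits for the current drawing pass, so each pass should stay short.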
