Integrating YOLOv5 on Android with TFLite

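The TorchScript approach from the previous article gave much worse detection results on a real phone, so I tried two other approaches. The second one still has problems, so I will not write it up yet. This article covers the third: zldrobit maintains a fork with a TFLite branch at https://github.com/zldrobit/yolov5.git. The instructions there are fairly detailed; the steps are as follows.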

1. Clone the repository and check out the branch:

git clone https://github.com/zldrobit/yolov5.git
cd yolov5
git checkout tf-android

2. Install the required environment:

pip install -r requirements.txt
pip install tensorflow==2.4.1

3. Convert the weights file:

#Convert weights to TensorFlow SavedModel, GraphDef and fp16 TFLite model, and verify them with
PYTHONPATH=. python models/tf.py --weights weights/yolov5s.pt --cfg models/yolov5s.yaml --img 320
python3 detect.py --weights weights/yolov5s.pb --img 320
python3 detect.py --weights weights/yolov5s_saved_model/ --img 320

#Convert weights to int8 TFLite model, and verify it with (Post-Training Quantization needs train or val images from COCO 2017 dataset)
PYTHONPATH=. python3  models/tf.py --weights weights/yolov5s.pt --cfg models/yolov5s.yaml --img 320 --tfl-int8 --source /data/dataset/coco/coco2017/train2017 --ncalib 100
python3 detect.py --weights weights/yolov5s-int8.tflite --img 320 --tfl-int8

#Convert weights to TensorFlow SavedModel and GraphDef integrated with NMS, and verify them with
PYTHONPATH=. python3  models/tf.py --img 320 --weights weights/yolov5s.pt --cfg models/yolov5s.yaml --tf-nms
python3 detect.py --img 320 --weights weights/yolov5s.pb --no-tf-nms
python3 detect.py --img 320 --weights weights/yolov5s_saved_model --no-tf-nms

I used the following conversion command:

python models/tf.py --weights best.pt --cfg models/yolov5s.yaml --img 640

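Note: the latest YOLOv5 already ships its own tf.py, but its arguments have changed and I could not convert the file with it. I have not yet compared the code in detail.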

Output:

(E:\anaconda_dirs\venvs\yolov5_latest) C:\Users\obaby>cd /d F:\Pycharm_Projects\yolov5_zldrobit

(E:\anaconda_dirs\venvs\yolov5_latest) F:\Pycharm_Projects\yolov5_zldrobit>tf_640_fp16.bat

(E:\anaconda_dirs\venvs\yolov5_latest) F:\Pycharm_Projects\yolov5_zldrobit>python models/tf.py --weights  best.pt --cfg models/yolov5s.yaml --img 640
2021-09-29 20:53:02.769490: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
Namespace(batch_size=1, cfg='models/yolov5s.yaml', dynamic_batch_size=False, img_size=[640, 640], iou_thres=0.5, ncalib=100, score_thres=0.4, source='../data/coco128.yaml', tf_nms=False, tf_raw_resize=False, tfl_int8=False, topk_all=100, topk_per_class=100, weights='best.pt')
E:\anaconda_dirs\venvs\yolov5_latest\lib\site-packages\torch\nn\functional.py:718: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  ..\c10/core/TensorImpl.h:1156.)
  return torch.max_pool2d(input, kernel_size, stride, padding, dilation, ceil_mode)

Starting TensorFlow saved_model export with TensorFlow 2.4.1...
Overriding models/yolov5s.yaml nc=80 with nc=1
2021-09-29 20:53:22.340330: I tensorflow/compiler/jit/xla_cpu_device.cc:41] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-29 20:53:22.341670: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library nvcuda.dll
2021-09-29 20:53:22.362073: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-09-29 20:53:22.362352: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-09-29 20:53:22.494935: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-09-29 20:53:22.495010: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-09-29 20:53:22.544367: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-09-29 20:53:22.574051: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-09-29 20:53:22.575034: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-09-29 20:53:22.624204: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-09-29 20:53:22.625221: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-09-29 20:53:22.625267: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-09-29 20:53:22.626462: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-09-29 20:53:22.627722: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-29 20:53:22.627762: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]
2021-09-29 20:53:22.627960: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
input_1 (InputLayer)            [(1, 640, 640, 3)]   0
__________________________________________________________________________________________________
tf__focus (tf_Focus)            (1, 320, 320, 32)    3584        input_1[0][0]
__________________________________________________________________________________________________
tf__conv_1 (tf_Conv)            (1, 160, 160, 64)    18688       tf__focus[0][0]
__________________________________________________________________________________________________
tf__c3 (tf_C3)                  (1, 160, 160, 64)    19200       tf__conv_1[0][0]
__________________________________________________________________________________________________
tf__conv_7 (tf_Conv)            (1, 80, 80, 128)     74240       tf__c3[0][0]
__________________________________________________________________________________________________
tf__c3_1 (tf_C3)                (1, 80, 80, 128)     158208      tf__conv_7[0][0]
__________________________________________________________________________________________________
tf__conv_17 (tf_Conv)           (1, 40, 40, 256)     295936      tf__c3_1[0][0]
__________________________________________________________________________________________________
tf__c3_2 (tf_C3)                (1, 40, 40, 256)     627712      tf__conv_17[0][0]
__________________________________________________________________________________________________
tf__conv_27 (tf_Conv)           (1, 20, 20, 512)     1181696     tf__c3_2[0][0]
__________________________________________________________________________________________________
tf_spp (tf_SPP)                 (1, 20, 20, 512)     658432      tf__conv_27[0][0]
__________________________________________________________________________________________________
tf__c3_3 (tf_C3)                (1, 20, 20, 512)     1185792     tf_spp[0][0]
__________________________________________________________________________________________________
tf__conv_35 (tf_Conv)           (1, 20, 20, 256)     132096      tf__c3_3[0][0]
__________________________________________________________________________________________________
tf__upsample (tf_Upsample)      (1, 40, 40, 256)     0           tf__conv_35[0][0]
__________________________________________________________________________________________________
tf__concat (tf_Concat)          (1, 40, 40, 512)     0           tf__upsample[0][0]
                                                                 tf__c3_2[0][0]
__________________________________________________________________________________________________
tf__c3_4 (tf_C3)                (1, 40, 40, 256)     363520      tf__concat[0][0]
__________________________________________________________________________________________________
tf__conv_41 (tf_Conv)           (1, 40, 40, 128)     33280       tf__c3_4[0][0]
__________________________________________________________________________________________________
tf__upsample_1 (tf_Upsample)    (1, 80, 80, 128)     0           tf__conv_41[0][0]
__________________________________________________________________________________________________
tf__concat_1 (tf_Concat)        (1, 80, 80, 256)     0           tf__upsample_1[0][0]
                                                                 tf__c3_1[0][0]
__________________________________________________________________________________________________
tf__c3_5 (tf_C3)                (1, 80, 80, 128)     91648       tf__concat_1[0][0]
__________________________________________________________________________________________________
tf__conv_47 (tf_Conv)           (1, 40, 40, 128)     147968      tf__c3_5[0][0]
__________________________________________________________________________________________________
tf__concat_2 (tf_Concat)        (1, 40, 40, 256)     0           tf__conv_47[0][0]
                                                                 tf__conv_41[0][0]
__________________________________________________________________________________________________
tf__c3_6 (tf_C3)                (1, 40, 40, 256)     297984      tf__concat_2[0][0]
__________________________________________________________________________________________________
tf__conv_53 (tf_Conv)           (1, 20, 20, 256)     590848      tf__c3_6[0][0]
__________________________________________________________________________________________________
tf__concat_3 (tf_Concat)        (1, 20, 20, 512)     0           tf__conv_53[0][0]
                                                                 tf__conv_35[0][0]
__________________________________________________________________________________________________
tf__c3_7 (tf_C3)                (1, 20, 20, 512)     1185792     tf__concat_3[0][0]
__________________________________________________________________________________________________
tf__detect (tf_Detect)          ((1, 25200, 6), [(1, 16182       tf__c3_5[0][0]
                                                                 tf__c3_6[0][0]
                                                                 tf__c3_7[0][0]
==================================================================================================
Total params: 7,082,806
Trainable params: 7,063,542
Non-trainable params: 19,264
__________________________________________________________________________________________________
2021-09-29 20:53:30.545120: W tensorflow/python/util/util.cc:348] Sets are not currently considered sequences, but this may change in the future, so consider avoiding using them.
WARNING:absl:Found untraced functions such as tf__conv_layer_call_fn, tf__conv_layer_call_and_return_conditional_losses, tf_bn_1_layer_call_fn, tf_bn_1_layer_call_and_return_conditional_losses, tf__conv_2_layer_call_fn while saving (showing 5 of 845). These functions will not be directly callable after loading.
WARNING:absl:Found untraced functions such as tf__conv_layer_call_fn, tf__conv_layer_call_and_return_conditional_losses, tf_bn_1_layer_call_fn, tf_bn_1_layer_call_and_return_conditional_losses, tf__conv_2_layer_call_fn while saving (showing 5 of 845). These functions will not be directly callable after loading.
TensorFlow saved_model export success, saved as best_saved_model

Starting TensorFlow GraphDef export with TensorFlow 2.4.1...
2021-09-29 20:53:59.141105: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2021-09-29 20:53:59.141295: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2021-09-29 20:53:59.144017: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-09-29 20:53:59.144075: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-09-29 20:53:59.144245: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-09-29 20:53:59.144426: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-09-29 20:53:59.144600: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-09-29 20:53:59.144768: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-09-29 20:53:59.145910: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-09-29 20:53:59.145948: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-09-29 20:53:59.146945: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-09-29 20:53:59.146982: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-09-29 20:53:59.229585: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-29 20:53:59.229661: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2021-09-29 20:53:59.230189: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2021-09-29 20:53:59.230371: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-29 20:53:59.253957: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:928] Optimization results for grappler item: graph_to_optimize
  function_optimizer: function_optimizer did nothing. time = 0.179ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.

TensorFlow GraphDef export success, saved as best.pb

Starting TFLite export with TensorFlow 2.4.1...
WARNING:absl:Found untraced functions such as tf__conv_layer_call_fn, tf__conv_layer_call_and_return_conditional_losses, tf_bn_1_layer_call_fn, tf_bn_1_layer_call_and_return_conditional_losses, tf__conv_2_layer_call_fn while saving (showing 5 of 845). These functions will not be directly callable after loading.
WARNING:absl:Found untraced functions such as tf__conv_layer_call_fn, tf__conv_layer_call_and_return_conditional_losses, tf_bn_1_layer_call_fn, tf_bn_1_layer_call_and_return_conditional_losses, tf__conv_2_layer_call_fn while saving (showing 5 of 845). These functions will not be directly callable after loading.
2021-09-29 20:54:34.699132: I tensorflow/core/grappler/devices.cc:69] Number of eligible GPUs (core count >= 8, compute capability >= 0.0): 1
2021-09-29 20:54:34.699328: I tensorflow/core/grappler/clusters/single_machine.cc:356] Starting new session
2021-09-29 20:54:34.700680: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-09-29 20:54:34.700738: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-09-29 20:54:34.700915: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-09-29 20:54:34.701091: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-09-29 20:54:34.701269: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-09-29 20:54:34.701452: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-09-29 20:54:34.702670: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-09-29 20:54:34.702710: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-09-29 20:54:34.703696: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-09-29 20:54:34.703745: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-09-29 20:54:34.703958: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-29 20:54:34.704120: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2021-09-29 20:54:34.704284: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2021-09-29 20:54:34.704456: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set
2021-09-29 20:54:34.723002: I tensorflow/core/grappler/optimizers/meta_optimizer.cc:928] Optimization results for grappler item: graph_to_optimize
  function_optimizer: function_optimizer did nothing. time = 0.001ms.
  function_optimizer: function_optimizer did nothing. time = 0ms.

2021-09-29 20:54:36.226574: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:316] Ignored output_format.
2021-09-29 20:54:36.226705: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:319] Ignored drop_control_dependency.
2021-09-29 20:54:36.334634: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1720] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 computeCapability: 8.6
coreClock: 1.815GHz coreCount: 68 deviceMemorySize: 10.00GiB deviceMemoryBandwidth: 707.88GiB/s
2021-09-29 20:54:36.334766: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cudart64_110.dll
2021-09-29 20:54:36.335168: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublas64_11.dll
2021-09-29 20:54:36.335458: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cublasLt64_11.dll
2021-09-29 20:54:36.335723: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cufft64_10.dll
2021-09-29 20:54:36.336020: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library curand64_10.dll
2021-09-29 20:54:36.337043: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cusolver64_10.dll'; dlerror: cusolver64_10.dll not found
2021-09-29 20:54:36.337085: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library cusparse64_11.dll
2021-09-29 20:54:36.337790: W tensorflow/stream_executor/platform/default/dso_loader.cc:60] Could not load dynamic library 'cudnn64_8.dll'; dlerror: cudnn64_8.dll not found
2021-09-29 20:54:36.337822: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1757] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
2021-09-29 20:54:36.337873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1261] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-09-29 20:54:36.337897: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1267]      0
2021-09-29 20:54:36.338060: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1280] 0:   N
2021-09-29 20:54:36.338224: I tensorflow/compiler/jit/xla_gpu_device.cc:99] Not creating XLA devices, tf_xla_enable_xla_devices not set

TFLite export success, saved as best-fp16.tflite

The weights file ends up converted to best-fp16.tflite.
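
The model summary in the log above reports the Detect output as (1, 25200, 6). As a sanity check, that shape can be reproduced by hand. The sketch below assumes the YOLOv5s defaults of three detection scales with strides 8, 16 and 32 and three anchors per grid cell; `OutputShape`, `numBoxes` and `valuesPerBox` are hypothetical names used only for illustration and are not part of the repository.

```java
// Sketch: reproduce the (1, 25200, 6) output shape printed in the export log.
// Assumption: YOLOv5s defaults, i.e. strides 8/16/32 and 3 anchors per grid cell.
final class OutputShape {
    static final int[] STRIDES = {8, 16, 32};
    static final int ANCHORS_PER_CELL = 3;

    /** Number of candidate boxes for a square input of the given size. */
    static int numBoxes(int inputSize) {
        int total = 0;
        for (int stride : STRIDES) {
            int grid = inputSize / stride; // 640 -> 80, 40, 20
            total += grid * grid * ANCHORS_PER_CELL;
        }
        return total;
    }

    /** Values per box: x, y, w, h, objectness, then one score per class. */
    static int valuesPerBox(int numClasses) {
        return 5 + numClasses;
    }

    public static void main(String[] args) {
        // The log shows nc=1 and --img 640.
        System.out.println(numBoxes(640) + " x " + valuesPerBox(1)); // prints "25200 x 6"
    }
}
```

If the Android side is configured with a different input size than the one used at export time, the expected number of boxes no longer matches the tensor the model produces, which is why inputSize has to follow the --img value.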

4. Put the tflite file in the project's assets directory.

Edit coco.txt so that it lists the classes of your own model.
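
A minimal sketch of how a label file such as coco.txt is typically consumed, assuming one class name per line (the usual layout in TFLite Android samples; check the project's own loader before relying on it). `LabelFile` and `parse` are hypothetical names, and the sample content is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: parse label-file content, assuming one class name per line.
final class LabelFile {
    /** Returns the non-blank lines of the text, trimmed, in file order. */
    static List<String> parse(String text) {
        List<String> labels = new ArrayList<>();
        for (String line : text.split("\\R")) { // \R matches any line break
            String name = line.trim();
            if (!name.isEmpty()) {
                labels.add(name);
            }
        }
        return labels;
    }

    public static void main(String[] args) {
        // Hypothetical single-class file, matching nc=1 in the export log.
        List<String> labels = parse("my_object\n");
        System.out.println(labels.size() + " class(es): " + labels);
    }
}
```

The number of names should equal the class count of the exported model (nc=1 in the log above); the index of each line is what the detector maps class ids to.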

DetectorFactory.java sets inputSize (and the related parameters) according to the model file name:

if (modelFilename.equals("yolov5s.tflite")) {
            labelFilename = "file:///android_asset/coco.txt";
            isQuantized = false;
            inputSize = 640;
            output_width = new int[]{80, 40, 20};
            masks = new int[][]{{0, 1, 2}, {3, 4, 5}, {6, 7, 8}};
            anchors = new int[]{
                    10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
            };
        }
        else if (modelFilename.equals("yolov5s-fp16.tflite")) {
            labelFilename = "file:///android_asset/coco.txt";
            isQuantized = false;
            inputSize = 640; // originally 320; changed to 640 because I exported the model with --img 640
            output_width = new int[]{80, 40, 20}; // 640 / {8, 16, 32}, matching the 80/40/20 grids in the export log
            masks = new int[][]{{0, 1, 2}, {3, 4, 5}, {6, 7, 8}};
            anchors = new int[]{
                    10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
            };
        }
        else if (modelFilename.equals("yolov5s-int8.tflite")) {
            labelFilename = "file:///android_asset/coco.txt";
            isQuantized = true;
            inputSize = 320;
            output_width = new int[]{40, 20, 10};
            masks = new int[][]{{0, 1, 2}, {3, 4, 5}, {6, 7, 8}};
            anchors = new int[]{
                    10,13, 16,30, 33,23, 30,61, 62,45, 59,119, 116,90, 156,198, 373,326
            };
        }
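
The branches above hard-code output_width next to inputSize: the first 640 branch uses {80, 40, 20} and the 320 int8 branch uses {40, 20, 10}. Both are simply inputSize divided by the strides 8, 16 and 32, so the two values can be kept in sync by deriving one from the other. The following is only a sketch under that assumption; `GridWidths` and `forInputSize` are hypothetical names and do not exist in the project.

```java
import java.util.Arrays;

// Sketch: derive the per-scale grid widths from the input size
// instead of hard-coding them in each branch.
// Assumption: YOLOv5s strides of 8, 16 and 32.
final class GridWidths {
    static final int[] STRIDES = {8, 16, 32};

    static int[] forInputSize(int inputSize) {
        if (inputSize <= 0 || inputSize % 32 != 0) {
            throw new IllegalArgumentException(
                    "inputSize must be a positive multiple of 32: " + inputSize);
        }
        int[] widths = new int[STRIDES.length];
        for (int i = 0; i < STRIDES.length; i++) {
            widths[i] = inputSize / STRIDES[i];
        }
        return widths;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(forInputSize(640))); // [80, 40, 20]
        System.out.println(Arrays.toString(forInputSize(320))); // [40, 20, 10]
    }
}
```

With a helper like this, changing the export resolution only requires updating inputSize in one place.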

After that, the project can be built and run.