Home-Journal Online-2023 No.5

Image recognition of leaf pests and diseases based on improved ShuffleNet V2 in litchi

Online: 2023/7/11 9:07:27
Author: XIE Jiaxing, CHEN Binhan, PENG Jiajun, HE Peihua, JING Tingwei, SUN Daozong, GAO Peng, WANG Weixing, ZHENG Daide, LI Jun
Keywords: Litchi; Leaf spot symptom; Image recognition; ShuffleNet V2 model; Model parameter
DOI: 10.13925/j.cnki.gsxb.20220597

Abstract:
【Objectives】Litchi suffers from many kinds of pests and diseases, and controlling them takes considerable effort and funding to ensure normal growth. Identifying litchi leaf pests and diseases is therefore an urgent problem. This study explored ways to identify them in a timely and accurate manner so that prevention and control measures can be taken promptly. Images of common litchi leaf pests and diseases were taken as the research objects, and the ShuffleNet V2 model was improved to address the difficulty of accurate identification caused by lesions that are widely distributed across the leaf and vary in size.
【Methods】First, images of five types of litchi leaf pests and diseases (algal leaf spot, Aceria litchi, sooty mold, anthracnose and Dasineura sp.) were collected as the data set for the model test. To improve the robustness of the model, the original data were augmented by flipping, cropping, adding noise and changing contrast to obtain a richer data set. Second, hybrid dilated convolution was used in the feature extraction module of the network to obtain a larger receptive field, avoiding the information loss that occurs during downsampling and eliminating the local information loss caused by stacking ordinary dilated convolutions. The larger the receptive field, the larger the region of the original image it corresponds to, and hence the more global, higher-level semantic features it contains. In image classification, the regions of interest are often distributed over multiple areas of the image, so more global and higher-level feature information is needed to identify the target.
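The receptive-field argument above can be checked with a small calculation. The sketch below is illustrative only and is not the authors' implementation: it assumes 3×3 kernels with stride 1, uses the dilation rates [1, 2, 3] mentioned in the conclusion as the hybrid case, and uses a hypothetical stack with a constant rate of 2 as the "ordinary" case. It works along one spatial axis and counts which input positions can actually reach a single output position.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (pixels along one axis) of stacked stride-1 dilated convolutions."""
    rf = 1
    for d in dilations:
        rf += (kernel_size - 1) * d
    return rf


def covered_offsets(kernel_size, dilations):
    """Input offsets (along one axis) that contribute to one output position."""
    half = kernel_size // 2
    taps = range(-half, half + 1)
    offsets = {0}
    for d in dilations:
        # Each layer lets every reachable offset shift by a multiple of its dilation rate.
        offsets = {o + t * d for o in offsets for t in taps}
    return offsets


def coverage_ratio(kernel_size, dilations):
    """Fraction of the receptive field that is actually sampled (1.0 means no holes)."""
    rf = receptive_field(kernel_size, dilations)
    return len(covered_offsets(kernel_size, dilations)) / rf


if __name__ == "__main__":
    for name, rates in [("plain", [1, 1, 1]), ("constant rate", [2, 2, 2]), ("hybrid", [1, 2, 3])]:
        print(name, rates, receptive_field(3, rates), round(coverage_ratio(3, rates), 2))
```

Under these assumptions, the constant-rate stack and the hybrid stack span the same receptive field, but the constant-rate stack samples only every second position inside it (the gridding effect), whereas the hybrid rates leave no holes.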
Third, an attention mechanism helps the network aggregate the feature information of the target to be recognized and reduces the influence of irrelevant background. A lightweight channel attention mechanism, ECA (Efficient Channel Attention), was embedded in the model to better capture the interdependence between feature maps. Because hybrid dilated (atrous) convolution and ECA attention improve information extraction, the model does not need a deeper structure or more channels to reach the final classification result. Therefore, to make the model lightweight, unnecessary layers and channels were removed: the numbers of modules in stage2, stage3 and stage4 were adjusted to 3, 6 and 3, and the numbers of channels were changed to 16, 24, 48, 96 and 192, reducing the model's parameters and computation.
【Results】The improved model achieved a recognition accuracy of 99.04% on images of five types of litchi pests and diseases (algal leaf spot, Aceria litchi, sooty mold, anthracnose and Dasineura sp.), which was 2.55% higher than that of the original ShuffleNet V2 network. Traditional convolution reduces the size of the feature map during downsampling, so information is lost as the resolution decreases. For the same feature map size, dilated convolution obtains a larger receptive field and thus denser feature information, and it recognizes small targets better. However, ordinary dilated convolution suffers from a gridding effect, which causes part of the information to be lost when layers with the same dilation rate are stacked. The hybrid dilated convolution used in this study avoids the partial information loss during downsampling that occurs with traditional convolution or ordinary dilated convolution, and its accuracy was higher than the 95.53% obtained with ordinary dilated convolution and the 96.61% obtained with the original network. Compared with the SE (Squeeze-and-Excitation) and CBAM (Convolutional Block Attention Module) attention modules, the ECA attention module adopts a local cross-channel interaction strategy without dimensionality reduction, adaptively selects the 1D convolution kernel size, and aggregates cross-channel information through a 1D convolution layer to obtain more accurate attention information; it achieved better accuracy with fewer parameters. Compared with the classic networks AlexNet, ResNet-18, DenseNet and MobileNet V2, the improved model had clear advantages in accuracy, parameter count and computation, achieving the highest accuracy while keeping the parameter count and computation low. The improved model has only 0.059 × 10^[…] parameters, which is 4.92% of the original model, and requires only 0.183 × 10^[…] floating point operations.
【Conclusion】The results show that using atrous convolution with interleaved dilation rates of [1, 2, 3] in feature extraction maintains high resolution and prevents information loss during downsampling, while embedding the ECA attention module strengthens key information. More useful information is therefore obtained, giving a better classification result. Moreover, the numbers of network layers and channels can be further reduced without sacrificing recognition performance, which better balances three indicators: parameter count, accuracy and computation. The improved model greatly reduces the number of parameters and the amount of computation while raising performance, which favors deployment on resource-constrained embedded devices such as mobile terminals and helps to realize real-time, accurate identification of crop pests and diseases.
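The ECA channel attention described in the abstract can be outlined in a few lines. This is a minimal sketch in plain Python, not the authors' code. It assumes the kernel-size rule from the ECA-Net paper with its default constants (gamma = 2, b = 1), which the abstract does not state. The function names and the 1D kernel weights are hypothetical; in a real model the weights are learned.

```python
import math


def eca_kernel_size(channels, gamma=2, b=1):
    """Adaptive 1D kernel size: nearest odd number derived from log2(channels).

    gamma and b are the ECA-Net defaults, assumed here for illustration.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca_attention(feature_maps, weights):
    """Apply ECA-style channel attention.

    feature_maps: list of C channels, each an H x W nested list of floats.
    weights: odd-length 1D kernel shared by all channels (hypothetical values).
    """
    pad = len(weights) // 2
    # 1) Global average pooling: one descriptor per channel, no dimensionality reduction.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # 2) 1D convolution across neighbouring channels (zero padding at both ends).
    padded = [0.0] * pad + pooled + [0.0] * pad
    scales = []
    for c in range(len(pooled)):
        z = sum(w * padded[c + j] for j, w in enumerate(weights))
        # 3) Sigmoid turns the response into a per-channel weight in (0, 1).
        scales.append(1.0 / (1.0 + math.exp(-z)))
    # 4) Rescale every channel by its attention weight.
    return [[[v * s for v in row] for row in ch] for ch, s in zip(feature_maps, scales)]


if __name__ == "__main__":
    # Channel widths taken from the abstract; the resulting kernel sizes follow the assumed rule.
    for c in (16, 24, 48, 96, 192):
        print(c, eca_kernel_size(c))
```

Because the only learnable part is one short 1D kernel per attention block, the module adds very few parameters, which is consistent with the abstract's emphasis on keeping the model lightweight.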