- Author: SI Naijin, CAI Jianrong, WANG Chen, ZHU Wenhui, PAN Bingke, SHI Zhiguo
- Keywords: Citrus; Fruit expansion period; Rotational imaging; Fruit yield estimation; YOLOv8n-SC; DeepSORT
- DOI: 10.13925/j.cnki.gsxb.20250487
- Received date:
- Accepted date:
- Online date:
【Objective】Accurate citrus yield estimation is a critical component of precision agricultural management, essential for optimizing production and ensuring economic benefits for growers. Traditional manual sampling is labor-intensive, time-consuming, and prone to subjective error, and it fails to meet the demands of modern horticulture. Deep learning-based object detection has shown promise for fruit recognition in static images, but in real-world orchards, fruit occlusion, varying illumination, and limited viewing perspectives lead to counting inaccuracies. Most existing video-based methods rely on linear, inter-row scanning, which suits trellis-trained orchards but is ineffective for citrus trees, which typically grow as standalone trees with three-dimensional crowns. To address these limitations, this study aimed to develop and validate a robust, automated yield estimation system for individual citrus trees during the fruit expansion period. The primary objective was to establish an end-to-end pipeline that combines a novel data acquisition strategy (360° rotational video capture) with an improved object detection model (YOLOv8n-SC) and a multi-object tracking algorithm (DeepSORT). The study also systematically investigated how lighting conditions (sunny, overcast, and artificial light) and camera perspectives (high-angle, straight-angle, and low-angle) affect the accuracy and stability of the yield estimation system.【Methods】The methodology comprised data acquisition, model development, and a multi-stage experimental evaluation. For data acquisition, a handheld imaging system captured 360° rotational video streams around 20 individual citrus trees in a high-density orchard under three lighting conditions and three vertical camera angles, yielding 180 video clips (20 trees × 3 lighting conditions × 3 angles).
A comprehensive dataset of 9000 images was created by extracting frames, which were then annotated and partitioned into training (80%) and testing (20%) sets. For model development, the baseline YOLOv8n was enhanced to create the YOLOv8n-SC model through three key improvements: adding a fourth, high-resolution (160 × 160) detection head to improve sensitivity to small targets; embedding the Convolutional Block Attention Module (CBAM) in the backbone network to better distinguish fruits from leaves; and replacing the original CIoU loss function with the Wise-IoU (WIoU) v3 loss function to accelerate convergence and improve bounding box precision. For tracking and counting, the DeepSORT algorithm processed the detection outputs from the video stream. Its workflow involves a Kalman filter for motion prediction, a cascaded matching strategy that combines motion and appearance features for reliable identity association, and a track management system that assigns unique IDs. The final fruit count was taken as the maximum unique ID assigned during the video sequence.【Results】The proposed YOLOv8n-SC model significantly outperformed other contemporary models. On the comprehensive dataset, it achieved a recall of 97.6% and a mean Average Precision at an IoU threshold of 0.5 (mAP50) of 93.7%, improvements of 2.0 and 3.5 percentage points, respectively, over the baseline YOLOv8n. It also showed clear advantages over Faster R-CNN and the lightweight YOLOv10n. A detailed ablation study confirmed that each of the three modifications (the additional detection head, the CBAM module, and the WIoU loss function) contributed positively to the model's overall performance. Lighting proved to be a critical factor: the artificial light source yielded the best detection results, with an mAP50 of 98.2%, markedly superior to overcast (96.1%) and sunny (83.5%) conditions.
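The counting rule stated in the Methods, where the final count is read off the track IDs issued during the video, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The input format (one list of integer track IDs per frame) and the function names `count_by_max_id` and `count_distinct_ids` are assumptions made for illustration, as is the premise that the tracker issues IDs sequentially starting at 1.

```python
from typing import Iterable, List


def count_by_max_id(frames: Iterable[List[int]]) -> int:
    """Return the largest track ID seen in any frame (0 if no tracks).

    Assumes IDs are positive integers issued sequentially from 1, so the
    largest ID equals the number of tracks created during the video.
    """
    max_id = 0
    for ids in frames:
        for track_id in ids:
            max_id = max(max_id, track_id)
    return max_id


def count_distinct_ids(frames: Iterable[List[int]]) -> int:
    """Return the number of distinct track IDs that appear in the output."""
    seen = set()
    for ids in frames:
        seen.update(ids)
    return len(seen)


# Hypothetical tracker output for three frames (not data from the study).
example_frames = [[1, 2], [1, 2, 3], [2, 3, 5]]
max_id_count = count_by_max_id(example_frames)
distinct_count = count_distinct_ids(example_frames)
```

In this made-up example the two counts differ because ID 4 was issued but never reported, which can happen when a tentative track is dropped before it is confirmed. Comparing the two values is one simple way to see how much the maximum-ID rule is affected by such tracks.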
The camera angle also had a significant impact: the straight-angle perspective provided the most balanced and comprehensive view, achieving the highest Multi-Object Tracking Accuracy (MOTA) of 90.8% and a high Multi-Object Tracking Precision (MOTP) of 94.0%. Among the tracking algorithms compared, the combination of YOLOv8n-SC with DeepSORT provided the most accurate counts. For the optimal configuration (straight-angle video under artificial light), the system achieved an Average Counting Precision (ACP) of 89.46%, with a Root Mean Square Error (RMSE) of 10.83 and a Mean Absolute Error (MAE) of 9.98. This accuracy was superior to that achieved when pairing YOLOv8n-SC with either the SORT algorithm (ACP of 84.81%) or the Tracktor algorithm (ACP of 86.37%).【Conclusion】This study demonstrates the viability of an automated citrus yield estimation system based on rotational video streams and deep learning. The YOLOv8n-SC object detector developed here proves highly effective for identifying green citrus fruits in complex orchard environments. The complete pipeline, which integrates this detector with the DeepSORT tracking algorithm, provides reliable fruit counts for individual trees. Key findings indicate that data acquisition conditions are paramount: a stable, artificial light source is optimal for detection, and a straight-angle camera trajectory is superior for comprehensive tracking. The optimized system achieved a final counting accuracy (ACP) of 89.46%, providing a strong technical foundation and a practical reference for implementing precision yield forecasting in modern citrus cultivation. This research makes two primary contributions to the field. First, it introduces a novel data acquisition paradigm, 360° rotational video capture, tailored specifically to fruit trees with complex 3D crown structures, offering a more comprehensive alternative to conventional linear scanning methods.
Second, it presents a purpose-built detector, YOLOv8n-SC, whose targeted enhancements are systematically validated to address the specific challenges of detecting small, low-contrast fruits in visually cluttered agricultural settings.
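The counting metrics reported in the Results (ACP, RMSE, and MAE) can be sketched in Python as follows. The abstract does not give the formula for ACP, so the definition used here, the mean over trees of 1 − |predicted − actual| / actual expressed as a percentage, is an assumption. The function name `counting_metrics` and the sample counts are hypothetical and are not data from the study.

```python
import math
from typing import Dict, Sequence


def counting_metrics(predicted: Sequence[float],
                     actual: Sequence[float]) -> Dict[str, float]:
    """Compute per-tree counting metrics from predicted and manual counts.

    ACP is computed under an assumed definition:
    100 * mean(1 - |predicted - actual| / actual).
    """
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a <= 0 for a in actual):
        raise ValueError("actual counts must be positive")

    n = len(actual)
    errors = [p - a for p, a in zip(predicted, actual)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    acp = 100.0 * sum(1.0 - abs(e) / a for e, a in zip(errors, actual)) / n
    return {"acp": acp, "rmse": rmse, "mae": mae}


# Illustrative counts for two trees (made up for this sketch).
metrics = counting_metrics(predicted=[90, 110], actual=[100, 100])
```

With these made-up counts, each tree is off by 10 fruits out of 100, so MAE and RMSE are both 10 and ACP is about 90%. RMSE exceeds MAE only when the errors differ in size across trees, which is why the two are usually reported together.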