Abstract: 【Objective】Soluble solids content (SSC) is a key quality indicator for ‘Miliang No. 1’ kiwifruit: it reflects the concentration of soluble sugars and therefore the sweetness and maturity of the fruit. Accurate and timely SSC assessment matters for both consumer satisfaction and market pricing. Traditional methods such as refractometry and liquid chromatography are accurate but time-consuming, costly, and destructive, which makes them unsuitable for large-scale or real-time monitoring. This study therefore aims to develop a non-destructive SSC prediction model based on hyperspectral imaging that integrates multiple preprocessing methods, feature extraction algorithms, and machine learning models. The goal is to improve the robustness and generalization of SSC prediction by optimizing the entire prediction pipeline rather than individual steps such as preprocessing or feature extraction, which have been the primary focus of many previous studies. 【Methods】A total of 150 ‘Miliang No. 1’ kiwifruit samples were randomly divided into a training set of 120 samples and a test set of 30 samples. Hyperspectral images were captured with a Rikola portable hyperspectral imager covering the spectral range of 500 nm to 900 nm at a wavelength interval of 2 nm, resulting in 194 spectral bands. Imaging was carried out in a dark box under controlled laboratory conditions to ensure data consistency and minimize external interference. After imaging, SSC was measured with an ATAGO PAL-BX/ACID 8 refractometer. Three SSC measurements were taken for each sample, and their arithmetic mean was used as the actual SSC value. To improve the quality of the spectral data, various preprocessing methods were applied. 
Four methods were used to improve data consistency and suppress noise: multiplicative scatter correction (MSC), Savitzky-Golay smoothing (SG), SG combined with MSC (SG-MSC), and SG combined with standard normal variate (SG-SNV). The optimal preprocessing method was selected according to the performance of a partial least squares regression (PLSR) model, and MSC was identified as the most effective at reducing noise and correcting baseline drift. On the MSC-corrected spectra, feature extraction was performed using competitive adaptive reweighted sampling (CARS), the successive projections algorithm (SPA), and random frog (RF) to identify the spectral bands most relevant to SSC. The extracted bands were then used as inputs to four machine learning models: PLSR, support vector regression (SVR), random forest regression (RFR), and a backpropagation neural network (BPNN). The relationships between the spectral data and the measured SSC values were modeled, and the predictive performance of the models was compared. For the best-performing model, particle swarm optimization (PSO) was then introduced to tune the model parameters, with the aim of improving both prediction accuracy and generalization ability. 【Results】Of the four preprocessing methods, MSC was the most effective at removing noise and baseline drift, after which the spectral curves largely overlapped. The MSC-CARS-PLSR, MSC-SPA-PLSR, and MSC-RF-PLSR models all outperformed the full-band PLSR model: the coefficient of determination (R²) increased by 0.01 to 0.092, while the RMSEC decreased by 0.0383 to 0.1341. All three feature extraction methods reduced interfering variables and improved the predictive power of the models. 
Most of the feature bands selected by the three methods were concentrated in the 750 nm to 900 nm range, indicating that this is the most sensitive interval for predicting kiwifruit SSC. After feature extraction, the four machine learning models were evaluated, and the MSC-CARS-SVR model showed the best predictive performance. After PSO parameter optimization, the comparison showed that the MSC-CARS-PSO-SVR model gave the best predictions, with coefficients of determination of 0.949 for the calibration set (R²c) and 0.913 for the prediction set (R²p), a root mean square error of calibration (RMSEC) of 0.3412, and a root mean square error of prediction (RMSEP) of 0.3649. These results indicate that the SVR model, especially when optimized with PSO, handles complex, high-dimensional, small-sample data effectively, which makes it well suited to predicting SSC and other quality metrics in kiwifruit. In contrast, the BPNN model performed worst: the CARS-BPNN model reached a test-set R² of only 0.633, with an RMSEP of 1.2308. This suggests that the dataset used in this experiment may be poorly suited to neural network models: its size or complexity may be insufficient to prevent overfitting, which in turn limits prediction performance and the accuracy of the results. 【Conclusion】The results demonstrate that the MSC-CARS-PSO-SVR model is highly effective at predicting internal quality indicators of kiwifruit, particularly SSC, and provides a scientific basis for non-destructive quality inspection of agricultural products. By combining data preprocessing, feature extraction, and machine learning with hyperspectral imaging, the study presents a rapid, non-destructive method for SSC detection in kiwifruit. 
The findings offer valuable technical support for intelligent fruit quality monitoring, grading, and sorting systems, and have the potential to be applied across a wide range of fruit and agricultural products in related industries.
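As an illustration of the preprocessing and evaluation steps named in the abstract, the following is a minimal sketch of MSC and of the R² and RMSE metrics, not the authors' implementation. It assumes the common formulation of MSC in which each spectrum is regressed on the mean spectrum and then corrected by the fitted intercept and slope. The function names (`msc`, `rmse`, `r2`), the synthetic spectra, and the gain and offset ranges are all hypothetical; only the 194-band, 500 nm to 900 nm layout is taken from the text.

```python
import numpy as np


def msc(spectra, reference=None):
    """Multiplicative scatter correction (assumed mean-spectrum variant).

    spectra: array of shape (n_samples, n_bands).
    reference: optional reference spectrum; defaults to the mean spectrum.
    """
    spectra = np.asarray(spectra, dtype=float)
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # Least-squares fit: spectrum ≈ intercept + slope * reference.
        slope, intercept = np.polyfit(reference, spectrum, deg=1)
        # Remove the additive offset and the multiplicative scatter term.
        corrected[i] = (spectrum - intercept) / slope
    return corrected


def rmse(y_true, y_pred):
    """Root mean square error (RMSEC on calibration data, RMSEP on test data)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic demo data (hypothetical): one smooth base spectrum distorted by a
# random gain and offset per sample, mimicking scatter effects.
rng = np.random.default_rng(0)
wavelengths = np.linspace(500, 900, 194)
base = 0.5 + 0.3 * np.sin((wavelengths - 500) / 400 * np.pi)
gains = rng.uniform(0.8, 1.2, size=(5, 1))
offsets = rng.uniform(-0.1, 0.1, size=(5, 1))
raw = offsets + gains * base

corrected = msc(raw)

# The spread across samples should shrink sharply after correction.
print("max spread before MSC:", np.ptp(raw, axis=0).max())
print("max spread after MSC: ", np.ptp(corrected, axis=0).max())
```

Because the synthetic spectra differ only by gain and offset, MSC collapses them onto the mean spectrum, which mirrors the overlap of spectral curves described in the Results. Real spectra also carry chemical variation, so the corrected curves would not coincide exactly.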