Journal of Chinese Agricultural Mechanization

2021, Vol. 42, Issue 8: 196-202. DOI: 10.13733/j.jcam.issn.2095-5553.2021.08.26

 Analysis of hyperspectral characteristics of fluecured tobacco oil based on improved RF feature selection strategy 

Ye Lei, Wei Kesu, Li Delun, Zhang Fugui, Wu Xuemei

Online: 2021-08-15; Published: 2021-08-15

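Affiliations: College of Mechanical Engineering, Guizhou University; Guizhou Academy of Tobacco Science
Funding:
    Guizhou Provincial Science and Technology Plan Project (Qiankehe Platform Talent [2019] No. 5616)
    Guizhou Province Colleges and Universities Engineering Research Center Construction Project (Qianjiaohe KY [2017] 015)
    Science and Technology Project of China National Tobacco Corporation Guizhou Provincial Company (Zhongyanqianke 2021XM01)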

Abstract: To address the feature selection problem in models that predict the oil characteristics of flue-cured tobacco, an improved RF (random forest) feature selection strategy was proposed. First, the RF-Score of each feature was calculated with the RF feature selection algorithm, and the features were added to the feature subset one at a time in descending order of RF-Score. If the classification accuracy of the classifier improved, the feature was retained; if the accuracy did not improve or decreased, the feature was removed. The results show that when the standard RF feature selection algorithm was used to screen the hyperspectral features of flue-cured tobacco, with the 176 hyperspectral features fed into the SVM (support vector machine) classifier in descending order of Gini coefficient, the first 64 hyperspectral band features gave the best classifier performance: a feature-subset dimension of 64 and a classification accuracy of 93.33%. When the improved RF feature selection strategy was used to screen the same 176 hyperspectral band features, only six bands (371.08 nm, 716.71 nm, 378.31 nm, 487.77 nm, 484.09 nm and 535.85 nm) were needed for the best SVM classifier performance, with a classification accuracy of 95% and a feature-subset dimension of 6. This indicates that the improved RF feature selection strategy can reduce data dimensionality and feature-set redundancy while maintaining classifier performance. Compared with using the full set of hyperspectral bands, the improved RF feature selection algorithm reduced the number of features by 170 and raised the classification accuracy by 3.33 percentage points; compared with the standard RF feature selection algorithm, it reduced the number of features by 58 and raised the classification accuracy by 1.67 percentage points.
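The two selection procedures compared in the abstract (a top-k sweep over the RF ranking, and the improved keep-only-if-accuracy-improves pass) can be sketched in plain Python as below. This is a minimal sketch under stated assumptions, not the authors' code: the `importances` list stands in for each band's RF-Score (Gini importance), and `evaluate` is a hypothetical caller-supplied function that returns the classifier's accuracy on a feature subset (an SVM in the paper). The sketch also assumes that the empty subset has accuracy negative infinity, so the top-ranked band is always kept, and that a tie counts as "no improvement". `toy_accuracy` is a made-up scoring function used only to show the behaviour.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical evaluator: maps a feature subset (band indices) to a
# classification accuracy, e.g. cross-validated SVM accuracy on those bands.
Evaluator = Callable[[List[int]], float]


def rank_by_importance(importances: Sequence[float]) -> List[int]:
    """Return feature indices sorted by descending importance score."""
    return sorted(range(len(importances)), key=lambda i: importances[i], reverse=True)


def best_top_k(ranking: Sequence[int], evaluate: Evaluator) -> Tuple[List[int], float]:
    """Standard RF selection: feed the top-1, top-2, ... ranked features to the
    classifier and keep the prefix with the highest accuracy."""
    best_subset: List[int] = []
    best_acc = float("-inf")
    for k in range(1, len(ranking) + 1):
        subset = list(ranking[:k])
        acc = evaluate(subset)
        if acc > best_acc:  # strict '>' keeps the shortest prefix on ties
            best_subset, best_acc = subset, acc
    return best_subset, best_acc


def improved_forward_selection(ranking: Sequence[int], evaluate: Evaluator) -> Tuple[List[int], float]:
    """Improved strategy: add features in ranked order and keep each one only
    if accuracy strictly improves; otherwise drop it and try the next."""
    selected: List[int] = []
    best_acc = float("-inf")  # assumption: the empty subset scores -inf
    for feature in ranking:
        candidate = selected + [feature]
        acc = evaluate(candidate)
        if acc > best_acc:
            selected, best_acc = candidate, acc
    return selected, best_acc


def toy_accuracy(subset: List[int]) -> float:
    """Hypothetical accuracy: bands 0, 2 and 3 carry signal, and every band
    costs 0.01 (a crude stand-in for redundancy)."""
    gain = {0: 0.20, 2: 0.15, 3: 0.10}
    return 0.5 + sum(gain.get(f, 0.0) for f in subset) - 0.01 * len(subset)


if __name__ == "__main__":
    ranking = rank_by_importance([0.40, 0.10, 0.30, 0.05, 0.15])
    print(ranking)  # [0, 2, 4, 1, 3]
    print(best_top_k(ranking, toy_accuracy))  # all 5 bands, accuracy about 0.90
    print(improved_forward_selection(ranking, toy_accuracy))  # bands [0, 2, 3], accuracy about 0.92
```

On this toy input the top-k sweep has to accept the redundant bands 4 and 1 to reach band 3, while the greedy pass skips them. That mirrors, on a small scale, the smaller subset reported above. In practice the importances could come from scikit-learn's `RandomForestClassifier.feature_importances_` (impurity-based importance), and `evaluate` could return `cross_val_score(SVC(), X[:, subset], y).mean()`. These are assumptions, since the abstract does not state the implementation or validation scheme. Both procedures need one classifier evaluation per band (176 here).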

Key words: improved RF algorithm, feature selection, fluecured tobacco, oil characteristics, hyperspectral
