Home-Journal Online-2023 No.5

Population and genetic analysis of Phyllanthus emblica by SNP and InDel markers

Online: 2023/7/11 8:47:31
Author: PU Tianlei, JIN Jie, HE Lu, QU Wenlin, LIAO Chengfei, YUAN Jianmin, LUO Huiying, ZHAO Qiongling
Keywords: Phyllanthus emblica; SNP; InDel; Population structure; Genetic diversity
DOI: 10.13925/j.cnki.gsxb.20220474

Abstract:【Objective】The genetic background of 112 wild Phyllanthus emblica germplasms collected from different origins was analyzed with SNP and InDel molecular markers developed by ddRADseq, a high-throughput sequencing technology. The population genetic structure and genetic diversity of the P. emblica germplasm resources were assessed to provide a theoretical basis for the systematic classification and innovative utilization of P. emblica genetic resources.【Methods】Leaves of 112 P. emblica germplasms from different origins were collected and preserved. Among them, 3 accessions were introduced from Fujian, 9 accessions from Guangxi, 11 accessions from Yunnan. Genomic DNA was extracted from the leaves by an improved CTAB method, and its purity and concentration were tested. ddRADseq was used to perform high-throughput reduced-representation genome sequencing of the 112 P. emblica germplasm resources. The raw data were filtered with Cutadapt and Trimmomatic to obtain high-quality sequencing data. The MUNEAK software was used to develop polymorphic markers. Based on the obtained SNP and InDel markers, STRUCTURE was used to analyze the population structure and calculate ΔK, from which the most reasonable number of groups and the assignment of each sample were determined. Principal component analysis of the samples was carried out with Plink, and the scatter plot was drawn in R. MEGA 7 was used for phylogenetic analysis and to calculate the genetic distances among the tested materials, and iTOL was then used to draw the phylogenetic tree. PowerMarker was used to evaluate the genetic diversity of the P. emblica population.
The expected heterozygosity (He), observed heterozygosity (Ho), polymorphism information content (PIC) and population differentiation index (Fst) of the population were calculated.【Results】Sequencing libraries were constructed for the P. emblica samples and sequenced, yielding 64.70 Gb of data and 233 960 457 reads in total, with an average Q30 of 97.96% and an average GC content of 37.30%. The quality of the sequencing data met expectations and was suitable for subsequent analysis. After quality control, a total of 8934 SNP and InDel markers were obtained from the sequenced samples. Population structure analysis divided the P. emblica germplasms into two groups, indicating that the 112 germplasms from different regions might derive from two ancestors. One group, shown in red, contained 88 samples, which came from the Yunnan area. The other group, shown in blue, contained 23 samples introduced from Fujian, Guangxi and Guangdong. Some germplasms had mixed genetic backgrounds, indicating that gene exchange had occurred between germplasms during germplasm selection and breeding. The grouping was closely related to the geographical origins of P. emblica and was consistent with the results of the principal component analysis and the phylogenetic analysis. The three population structure analysis methods used in this paper complemented each other, indicating that the division of the population structure of P. emblica was reliable. The genetic distances between the P. emblica germplasms ranged from 0.027 to 0.459, with an average of 0.248. In the genetic diversity analysis, He ranged from 0.094 to 0.267, with an average of 0.17; Ho ranged from 0.071 to 0.184, with an average of 0.113; and PIC ranged from 0.095 to 0.218, with an average of 0.143. He, Ho and PIC were highest for the Yunnan germplasm (0.267, 0.184 and 0.218, respectively), exceeding those of the Fujian, Guangxi and Guangdong germplasms. The germplasms from the Yunnan region therefore showed the most genetic variation. Fst among the P. emblica populations ranged from 0.080 to 0.266, indicating a medium to high degree of genetic differentiation. The Fst between the GJ germplasms and the FJ germplasms was the largest, at 0.266, indicating strong genetic differentiation and a distant genetic relationship between them.【Conclusion】The SNP and InDel molecular markers obtained by ddRADseq sequencing could effectively resolve the population structure and genetic diversity of the 112 wild P. emblica germplasms introduced from Yunnan, Fujian, Guangxi and Guangdong. The results provide data support for the identification, evaluation, systematic classification and genetic diversity research of P. emblica germplasm resources. They also provide a reference for the subsequent mining of elite genes and for the introduction and innovative utilization of elite germplasms.
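
The diversity statistics named in the abstract (He, Ho and PIC) can be illustrated for a single marker with a minimal sketch. It assumes the common textbook definitions: He = 1 - sum(p_i^2), Ho = the proportion of heterozygous individuals, and PIC as defined by Botstein et al. The function name `locus_stats` and the toy genotypes are hypothetical and are not data from the study; whether PowerMarker applies sample-size corrections or handles missing calls differently is not covered here.

```python
from collections import Counter
from itertools import combinations


def locus_stats(genotypes):
    """Return (He, Ho, PIC) for one locus.

    genotypes: list of (allele, allele) tuples, one per diploid individual.
    Missing calls are assumed to have been dropped beforehand.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("no genotypes supplied")

    # Allele frequencies from the 2n sampled gene copies.
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = [c / (2 * n) for c in allele_counts.values()]

    # Expected heterozygosity: 1 - sum of squared allele frequencies.
    he = 1.0 - sum(p * p for p in freqs)

    # Observed heterozygosity: share of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n

    # PIC: He minus sum over allele pairs of 2 * p_i^2 * p_j^2.
    pic = he - sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return he, ho, pic


# Hypothetical toy data: four individuals genotyped at one biallelic SNP.
toy = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "A")]
he, ho, pic = locus_stats(toy)
print(f"He={he:.3f} Ho={ho:.3f} PIC={pic:.3f}")
```

Population-level values such as those reported above would typically be obtained by averaging these per-locus statistics over all markers within each population.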