- Author: CHEN Zhe, HU Fuchu, WANG Xianghe, FAN Hongyan, ZHANG Zhili
- Keywords: Ananas comosus; Genome sequencing data; Codon usage bias; GC; ENC; RSCU; Optimal codons;
- DOI: 10.13925/j.cnki.gsxb.20160375
Abstract: 【Objective】Codon usage bias refers to differences in the frequency with which synonymous codons occur in coding DNA. A codon is a series of three nucleotides (a triplet) that encodes a specific amino acid residue in a polypeptide chain or signals the termination of translation (stop codons). Over the course of evolution, each species has developed its own codon usage pattern. Pineapple [Ananas comosus (L.) Merr.] is a nutrient-dense fruit with strong consumer demand and high commercial value. However, little is known about codon usage in pineapple. The aim of the present study was to investigate codon usage patterns in the genome sequencing data of pineapple, in order to provide guidance for genetic transformation, new gene discovery, regulation of functional gene expression, prediction of protein structure and function, comparative genomics with other species, and molecular breeding in pineapple.【Methods】Data were obtained from the JGI database. We analyzed the 30 663 genes in the pineapple genome sequencing data using Perl scripts, SPSS and bioinformatics software, and calculated GC content, the effective number of codons (ENC), relative synonymous codon usage (RSCU) and double-codon usage. The RSCU value of a codon is its observed frequency relative to the frequency expected if all synonymous codons for the same amino acid were used equally. In the absence of codon usage preference, the RSCU of each synonymous codon is 1. When the RSCU of a codon was over 1, the codon was defined as a high-frequency codon, indicating that it was used more often than its synonymous alternatives and that the genes showed a preference for it. The ENC value describes the degree to which codon usage deviates from random selection and reflects the strength of preference among synonymous codons within a codon family. The smaller the ENC value, the higher the expression level of the corresponding endogenous gene was taken to be.
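The RSCU definition given in the Methods can be illustrated with a minimal Python sketch. It is not the authors' Perl script: the function name `rscu`, the use of the standard genetic code, and the choice to skip stop codons and single-codon amino acids (Met, Trp) are assumptions made for illustration only.

```python
from collections import Counter, defaultdict

# Standard genetic code, built in TCAG order (assumed; the abstract does not name a table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds_sequences):
    """Return {codon: RSCU} pooled over an iterable of in-frame coding sequences.

    RSCU = observed count of a codon divided by the count expected if all
    synonymous codons of that amino acid were used equally.
    """
    counts = Counter()
    for seq in cds_sequences:
        seq = seq.upper()
        # Walk the sequence codon by codon, ignoring a trailing partial codon.
        for pos in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[pos:pos + 3]
            if codon in CODON_TABLE:  # skips codons with ambiguous bases
                counts[codon] += 1

    # Group codons into synonymous families by amino acid.
    families = defaultdict(list)
    for codon, amino_acid in CODON_TABLE.items():
        families[amino_acid].append(codon)

    values = {}
    for amino_acid, codons in families.items():
        if amino_acid == "*" or len(codons) == 1:
            continue  # stop codons and Met/Trp carry no synonymous choice
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the input
        for codon in codons:
            values[codon] = counts[codon] * len(codons) / total
    return values


if __name__ == "__main__":
    # Toy input: phenylalanine encoded once by TTT and twice by TTC.
    for codon, value in sorted(rscu(["TTTTTCTTC"]).items()):
        print(codon, round(value, 3))
```

In this toy input TTC has an RSCU above 1 and would count as a high-frequency codon under the threshold stated above, while TTT falls below 1.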
Genes were ranked by ENC, and RSCU values were calculated separately for the high- and low-expression gene groups. If the RSCU difference between the high- and low-expression genes was over 0.08, the corresponding codon was designated a high-expression superior codon. A codon that was both a high-frequency codon and a high-expression superior codon was defined as an optimal codon. The pineapple genes were imported into CUSP software to obtain codon usage frequencies. Genome data for Carica papaya, Glycine max, Arabidopsis thaliana, Ricinus communis, Prunus mume, Prunus persica, Cucumis sativus, Cajanus cajan, Oryza sativa, Brassica rapa, Citrus sinensis, Brachypodium distachyon, Populus trichocarpa, Theobroma cacao, Vitis vinifera, Sorghum bicolor and Zea mays were retrieved from the JGI database, and the codon usage frequencies of pineapple were compared with those of these species. If the ratio of the frequencies between two species was in the range 0.5-2.0, the codon preferences of the two species were considered relatively close.【Results】The GC content of pineapple genes was 52.09% and the GC content at the third codon position was 55.41%, indicating no obvious codon usage bias (CUB) in the GC3s content (the GC content at the third nucleotide of synonymous codons) of pineapple genes. The ENC across all genes was 58.41, and the majority of ENC values were over 35, indicating that the CUB of pineapple genes was weak.
In addition, 34 codons had an RSCU over 1 and were defined as high-frequency codons (CTC, TTG, CTT, AGG, CGC, AGA, CGG, TCC, TCT, AGC, TCG, GTG, GTT, GTC, GGC, GGG, ACC, ACT, CCT, CCG, CCC, ATC, ATT, GCC, GCG, TGC, AAG, GAG, TTC, GAT, TAC, CAG, CAC, AAT). Only 9 of them ended in A or T, whereas 25 ended in G or C, indicating that pineapple codons preferentially end in C or G. Furthermore, 31 high-expression superior codons were obtained, and on this basis 13 optimal codons were identified: AGG, AGA, TCT, CTT, TTG, GTT, CCT, ACT, ATT, GAT, AAT, TTT and TAT. We also analyzed the usage of codon pairs (double codons) for the 20 amino acids. Comparison with 17 other species showed that the codon usage patterns of monocotyledonous plant genes differed greatly from those of dicotyledonous plant genes, and that pineapple was closer to the dicotyledonous plants.【Conclusion】Thirteen optimal codons were identified through the analysis of codon usage bias in Ananas comosus, which provides a basis for gene optimization and for the prediction of genes of unknown function in pineapple.
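The selection rules described in the Methods (RSCU over 1, an RSCU difference over 0.08 between expression groups, and a frequency comparison bounded by 0.5 and 2.0) can be sketched in Python as below. This is an illustrative sketch rather than the authors' pipeline. The function names and all numeric inputs are hypothetical, the difference is assumed to be taken as high-expression minus low-expression, and the 0.5-2.0 range is assumed to apply to the ratio of the two species' codon frequencies.

```python
def optimal_codons(rscu_all, rscu_high, rscu_low, delta=0.08):
    """Codons that are both high-frequency and high-expression superior.

    rscu_all  : RSCU over all genes (high-frequency if > 1)
    rscu_high : RSCU in the high-expression (low ENC) gene group
    rscu_low  : RSCU in the low-expression (high ENC) gene group
    delta     : minimum RSCU difference, 0.08 as stated in the abstract
    """
    high_frequency = {codon for codon, value in rscu_all.items() if value > 1}
    # Assumption: the difference is taken as high-expression minus low-expression.
    high_expression = {
        codon
        for codon in rscu_high
        if codon in rscu_low and rscu_high[codon] - rscu_low[codon] > delta
    }
    return sorted(high_frequency & high_expression)


def similar_usage(freq_a, freq_b, low=0.5, high=2.0):
    """Flag, per codon, whether the frequency ratio of two species lies in [low, high].

    Assumption: the 0.5-2.0 range is read as a ratio of frequencies.
    Codons missing or zero in freq_b are skipped to avoid division by zero.
    """
    return {
        codon: low <= freq_a[codon] / freq_b[codon] <= high
        for codon in freq_a
        if freq_b.get(codon)
    }


if __name__ == "__main__":
    # Hypothetical toy values, not data from the study.
    rscu_all = {"AGG": 1.3, "CGC": 1.1, "CGA": 0.6}
    rscu_high = {"AGG": 1.5, "CGC": 1.0, "CGA": 0.9}
    rscu_low = {"AGG": 1.2, "CGC": 1.05, "CGA": 0.5}
    print(optimal_codons(rscu_all, rscu_high, rscu_low))

    species_a = {"GCC": 30.0, "GCA": 10.0}  # hypothetical frequencies
    species_b = {"GCC": 20.0, "GCA": 25.0}
    print(similar_usage(species_a, species_b))
```

Taking the intersection of the two sets mirrors the stated rule that a codon must meet both criteria to count as optimal; a codon that is only frequent, or only enriched in the high-expression group, is excluded.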