Prediction of novel miRNAs and associated target genes in Glycine max

Trupti Joshi1, Zhe Yan2, Marc Libault2, Dong-Hoon Jeong3, Sunhee Park3, Pamela J. Green3, D Janine Sherrier3, Andrew Farmer4, Greg May4, Blake C. Meyers3, Dong Xu1, Gary Stacey2

1Digital Biology Laboratory, Computer Science Department and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.

2Division of Plant Sciences, National Center for Soybean Biotechnology, C.S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA.

3Department of Plant and Soil Sciences and Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711, USA.

4National Center for Genome Resources, 2935 Rodeo Park Drive East, Santa Fe, NM 87505, USA.



Small non-coding RNAs (21 to 24 nucleotides) regulate a number of developmental processes in plants and animals by silencing genes through multiple mechanisms. Among these, the most conserved classes are microRNAs (miRNAs) and small interfering RNAs (siRNAs), both of which are produced by RNase III-like enzymes called Dicers. Many plant miRNAs play critical roles in nutrient homeostasis, development, and responses to abiotic stress and pathogens. Currently, only 70 miRNAs have been identified in soybean.


We used Illumina's SBS sequencing technology to generate high-quality small RNA (sRNA) data from four soybean (Glycine max) tissues (root, seed, flower, and nodule) in order to expand the collection of known soybean miRNAs. We developed a bioinformatics pipeline using in-house scripts and publicly available structure prediction tools to differentiate the authentic mature miRNA sequences from other sRNAs and short RNA fragments represented in the public sequencing data.
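As an illustration of the kind of preprocessing such a pipeline typically starts with, the sketch below collapses identical reads and keeps only those in the 21 to 24 nucleotide range mentioned above. It is not the authors' in-house script: the function names `collapse_reads` and `reads_per_million`, the restriction to unambiguous A/C/G/T bases, and the per-million normalization are all assumptions made for this example.

```python
from collections import Counter

# Size range of small RNAs given in the text; everything else here is assumed.
MIN_LEN, MAX_LEN = 21, 24


def collapse_reads(reads, min_len=MIN_LEN, max_len=MAX_LEN):
    """Count identical reads whose length lies in [min_len, max_len].

    Reads containing characters other than A, C, G, T are discarded
    (an assumption of this sketch, not a step stated in the abstract).
    """
    counts = Counter()
    for read in reads:
        seq = read.strip().upper()
        if min_len <= len(seq) <= max_len and set(seq) <= set("ACGT"):
            counts[seq] += 1
    return counts


def reads_per_million(counts, total_reads):
    """Scale raw counts to reads per million so libraries can be compared."""
    if total_reads == 0:
        return {}
    return {seq: n * 1_000_000 / total_reads for seq, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical toy library: three copies of one 21-nt read,
    # one read that is too short, and one that is too long.
    candidate = "ACGTA" * 4 + "C"          # 21 nt
    library = [candidate] * 3 + ["ACGT" * 3, "ACGTA" * 5]
    counts = collapse_reads(library)
    print(counts)
    print(reads_per_million(counts, total_reads=len(library)))
```

Running one such pass per tissue would give a normalized abundance table that later steps (precursor structure prediction, target prediction) could draw on.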


The combined sequencing and bioinformatics analyses identified 129 miRNAs based on hairpin secondary structure features in the predicted precursors. Of these, 42 matched known miRNAs in soybean or other species, and 87 were novel. We also predicted the putative target genes of all identified miRNAs with computational methods and verified the predicted cleavage sites in vivo for a subset of these targets using the 5' RACE method. Finally, we studied the relationship between the abundance of each miRNA and that of its respective target genes by comparison with Solexa cDNA sequencing data.
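One simple way to look at the relationship between miRNA abundance and target gene abundance across the four tissues is a correlation coefficient. The abstract does not state which measure was used, so the Pearson correlation below is an assumption, and the abundance values are hypothetical numbers chosen only to show the calculation. A value near -1 would be consistent with a miRNA repressing its target.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical normalized abundances in root, seed, flower, nodule.
    mirna_abundance = [120.0, 15.0, 60.0, 300.0]
    target_abundance = [40.0, 95.0, 70.0, 10.0]
    r = pearson(mirna_abundance, target_abundance)
    print(f"Pearson r across four tissues: {r:.2f}")
```

With only four tissues, any such coefficient is descriptive at best; it flags candidate miRNA/target pairs for closer inspection rather than establishing regulation.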


Our study significantly increased the number of miRNAs known to be expressed in soybean. The bioinformatics analysis provided insight into regulatory patterns between the miRNAs and the expression of their predicted target genes. We also deposited the data in a soybean genome browser based on the UCSC Genome Browser architecture. Using the browser, we annotated the soybean genome with miRNA sequences from the four tissues and with cDNA sequencing data. Overlaying these two datasets in the browser allows researchers to analyze miRNA expression levels relative to those of the associated target genes.

Soybean Genome Browser

The browser can be accessed using the "Browser" link under "Updated Sessions".


TJ and DX were supported by the United Soybean Board. ZY, ML and GS were supported by a grant from the National Science Foundation (Plant Genome Research Program, #DBI-0421620). Work on legume small RNAs in the labs of PJG, DJS and BCM was supported by USDA award 2006-03567. We would also like to thank Robin Kramer for setting up and maintaining the genome browser locally, where the data of this study were deposited.