Glycine max

Overview
Genus: Glycine
Species: max
Common Name: Soybean
Abbreviation: G.max

The soybean (US) or soya bean (UK) (Glycine max) is a species of legume native to East Asia, widely grown for its edible bean, which has numerous uses. The UN Food and Agriculture Organization (FAO) classes the plant as an oilseed rather than a pulse.

Fat-free (defatted) soybean meal is a significant and cheap source of protein for animal feeds and many prepackaged meals; soy vegetable oil is another product of processing the soybean crop. For example, soybean products such as textured vegetable protein (TVP) are ingredients in many meat and dairy analogues. Soybeans produce significantly more protein per acre than most other uses of land.

Traditional non-fermented food uses of soybeans include soy milk, from which tofu and tofu skin are made. Fermented foods include soy sauce, fermented bean paste, natto, and tempeh, among others. The oil is used in many industrial applications. The main producers of soy are the United States (35%), Brazil (27%), Argentina (19%), China (6%) and India (4%). The beans contain significant amounts of phytic acid, alpha-linolenic acid, and isoflavones.

Sequence & Variant Data
The following sequence and variant data are currently present:
Feature Type    Count
contig          137,174
Projects
2011
Preparation of EST data: Sequences were extracted from dbEST and subjected to quality control screening (vector, E. coli, polyA, T, or CT removal, minimum length = 100 bp, < 3% N).

Preparation of transcript (ET) database: All sequences from the appropriate divisions of GenBank (including RefSeq) were extracted. Non-coding sequences were discarded, and cDNAs and coding sequences from genomic entries were saved. Sequences and related information (e.g. PubMed links) are stored in the qcGene database.

Assembly: Cleaned EST sequences and non-redundant transcript (ET) sequences were combined and assembled into contigs using the Paracel Transcript Assembler Program. Each contig is assigned a TC (Tentative Consensus) number. TCs are consensus sequences based on two or more ESTs (and possibly an ET) that overlap for at least 40 bases with at least 94% sequence identity. These strict criteria help minimize the creation of chimeric contigs. TCs may comprise ESTs derived from different tissues. Best hits for TCs were assigned by searching the TC set against a non-redundant amino acid database (nraa) using BLAT. The top five hits by score were selected and displayed for each TC.

Caveats: TCs are only as good as the ESTs underlying them; there may be unspliced or chimeric ESTs and thus TCs. There is still redundancy in the TC set because sequences must match end to end and at a certain percent identity to be combined. Directionality of the TCs should not be assumed. Not all TCs contain protein-coding regions.
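
The numeric thresholds in the project description can be illustrated with a short Python sketch. This is not the pipeline that produced the data. The function names `passes_qc` and `overlap_qualifies` are hypothetical, the vector, E. coli, polyA, T, and CT trimming steps are not shown, and the overlap check assumes the two sequences have already been aligned into equal-length, gap-free strings.

```python
# Thresholds taken from the project description above.
MIN_LENGTH = 100        # minimum EST length in bp
MAX_N_FRACTION = 0.03   # ESTs must contain less than 3% N
MIN_OVERLAP = 40        # minimum overlap in bases for a TC
MIN_IDENTITY = 0.94     # minimum sequence identity within the overlap


def passes_qc(seq: str) -> bool:
    """Return True if a trimmed EST meets the length and N-content thresholds.

    Hypothetical helper: contaminant and tail removal are assumed to have
    been done before this check.
    """
    seq = seq.upper()
    if len(seq) < MIN_LENGTH:
        return False
    return seq.count("N") / len(seq) < MAX_N_FRACTION


def overlap_qualifies(aligned_a: str, aligned_b: str) -> bool:
    """Return True if an aligned overlap meets the length and identity criteria.

    Hypothetical helper: both arguments are the overlapping regions only,
    already aligned, with equal length and no gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned overlap regions must have equal length")
    if len(aligned_a) < MIN_OVERLAP:
        return False
    matches = sum(a == b for a, b in zip(aligned_a.upper(), aligned_b.upper()))
    return matches / len(aligned_a) >= MIN_IDENTITY
```

For instance, `passes_qc` rejects a 96 bp read regardless of its base composition, and `overlap_qualifies` rejects a 50-base overlap with four mismatches because its identity is 92%.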
Varieties