GPG2 Crop Registries

From ICISWiki

Assigning values to ICIS records for GPG2 crop registries

Dr. Ruaraidh Hamilton, IRRI

This document provides some guidance on creating correct data values for the registry, extracted and edited from http://cropwiki.irri.org/icis/index.php/TDM_Genealogy_Management_System_5.4

The crop registry task in ICIS-speak

The task of cross-referencing accessions in the crop registries is essentially the task of identifying "derivative neighbourhoods" (http://cropwiki.irri.org/icis/index.php/GRIMS_glossary#Derivative_neighbourhood). A derivative neighbourhood comprises all samples of germplasm derived from one original sample by methods that seek either to maintain its genetic integrity (maintenance methods) or to select a subset of genotypes from a variable original (derivative methods), i.e. all the normal activities in collecting, maintaining and distributing accessions.

ICIS provides for tracking such neighbourhoods using two pointers for each GID or germplasm sample:

  • GPID2 points back to the immediate predecessor of the GID. In the case of the GID of an accession held by a genebank, its GPID2 points to the GID of the sample held by the donor. All MCPD data describing the donor are logically connected to the GID pointed to by GPID2.
  • GPID1 points back to the original sample at the root of the derivative neighbourhood.
    • In the case of the GID of an accession originally collected from a farmer’s field or market place, its GPID1 points to the GID of the original collected sample. All MCPD data describing the collecting location are logically connected to the GID pointed to by GPID1.
    • In the case of the GID of an accession derived from a bred line, its GPID1 points to the GID of the cross from which it was selected. All MCPD data describing the breeding history are logically connected to the GID pointed to by GPID1.
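
As an illustration of how the two pointers work together, the following is a minimal sketch in Python rather than ICIS code. The `Germplasm` class, the function names and the three GIDs are hypothetical; only the meaning of GPID1 and GPID2 is taken from the description above.

```python
from dataclasses import dataclass


@dataclass
class Germplasm:
    gid: int
    gpid1: int = 0  # root of the derivative neighbourhood (0 = unknown)
    gpid2: int = 0  # immediate source of this sample (0 = unknown)


def donation_chain(records, gid):
    """Follow GPID2 from gid back towards the root and return the GIDs visited."""
    chain = []
    seen = set()
    current = gid
    # Stop at an unknown source (0), a missing record, or a loop in the data.
    while current != 0 and current in records and current not in seen:
        seen.add(current)
        chain.append(current)
        current = records[current].gpid2
    return chain


def neighbourhood(records, root_gid):
    """Return the root GID plus every GID whose GPID1 points at it."""
    members = {g.gid for g in records.values() if g.gpid1 == root_gid}
    members.add(root_gid)
    return sorted(members)


# Hypothetical data: collected sample 100 -> donor sample 200 -> accession 300.
records = {
    100: Germplasm(100),                        # original collected sample
    200: Germplasm(200, gpid1=100, gpid2=100),  # sample held by the donor
    300: Germplasm(300, gpid1=100, gpid2=200),  # genebank accession
}

print(donation_chain(records, 300))
print(neighbourhood(records, 100))
```

Here the chain for accession 300 runs back through the donor sample to the collected sample, while the neighbourhood of sample 100 contains all three GIDs.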

Thus the crop registry task essentially involves ensuring that

  • ICIS contains GIDs representing all relevant samples – representing not only the accessions themselves but also relevant samples in the history of the accessions.
  • The GIDs are correctly linked through GPID1 and GPID2 to document the history of donation and selection.
  • All other passport data are correctly associated with the appropriate GIDs.
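
The first two requirements lend themselves to a mechanical check. The sketch below is an assumption about how such a check might look, not an existing ICIS tool: `check_links` is a hypothetical helper that takes a mapping of GID to (GPID1, GPID2) and reports pointers to GIDs that have no record.

```python
def check_links(germplasm):
    """germplasm maps GID -> (GPID1, GPID2). Return a list of dangling pointers."""
    problems = []
    for gid, (gpid1, gpid2) in sorted(germplasm.items()):
        for field, target in (("GPID1", gpid1), ("GPID2", gpid2)):
            # 0 means "unknown" and is allowed; anything else must exist.
            if target != 0 and target not in germplasm:
                problems.append(
                    f"GID {gid}: {field}={target} has no GERMPLSM record"
                )
    return problems


# Hypothetical registry extract in which the collected sample (100) is missing.
extract = {
    200: (100, 100),
    300: (100, 200),
}

for problem in check_links(extract):
    print(problem)
```

A check like this only finds missing records. Whether an existing pointer leads to the right sample still has to be judged from the passport data.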

New records or old?

IRIS already contains data on many of the accessions being collated for the GPG2 rice registry. The registry should use (and correct) the existing GIDs where possible, and will need to create new GIDs for samples that are not already in IRIS.

A genebank accession correctly entered into IRIS is recognized by having

  • one of its NAMES records with
    • NTYPE=1 (accession) and
    • NSTAT=8 (preferred ID)
  • the corresponding GERMPLSM record with
    • GLOCN=the location of the genebank (9016=IRRI GRC; 9001=WARDA; 9003=CIAT; 9017=IRRI PBGB/INGER; 22001=National Small Grains Collection; no location is yet defined for the Dale Bumpers National Rice Research Center, which hosts the Genetic Stocks Oryza (GSOR) collection, so a new location needs to be defined) and
    • GRPLCE=0 (not a deleted or replaced GID)

All IRGC accessions are in IRIS with data as described above. Their GIDs are documented in IRGCIS or may be retrieved directly from IRIS by:

select GID
from [names] inner join germplsm on names.gid = germplsm.gid
where ntype=1 and nstat=8 and glocn=9016 and grplce=0;
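
To try the same selection outside IRIS, the following sketch runs an equivalent query against a small in-memory SQLite database from Python. The table layout is cut down to the columns the query needs, every row is invented for illustration, and `accession_gids` is a hypothetical helper name.

```python
import sqlite3

# Tiny in-memory stand-in for NAMES and GERMPLSM; all rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE germplsm (gid INTEGER PRIMARY KEY, glocn INTEGER, grplce INTEGER);
    CREATE TABLE names (nid INTEGER PRIMARY KEY, gid INTEGER,
                        ntype INTEGER, nstat INTEGER, nval TEXT);
    INSERT INTO germplsm VALUES (1, 9016, 0), (2, 9016, 2), (3, 9001, 0);
    INSERT INTO names VALUES
        (1, 1, 1, 8, 'ACC-A'),    -- accession ID, preferred ID: qualifies
        (2, 1, 99, 1, 'NAME-A'),  -- some other name type (99 is made up)
        (3, 2, 1, 8, 'ACC-B'),    -- GRPLCE = own GID, i.e. deleted
        (4, 3, 1, 8, 'ACC-C');    -- held at a different location
""")


def accession_gids(conn, glocn):
    """GIDs with an accession name (NTYPE=1, NSTAT=8) at the given GLOCN."""
    rows = conn.execute(
        "SELECT names.gid FROM names "
        "INNER JOIN germplsm ON names.gid = germplsm.gid "
        "WHERE ntype = 1 AND nstat = 8 AND glocn = ? AND grplce = 0 "
        "ORDER BY names.gid",
        (glocn,),
    ).fetchall()
    return [gid for (gid,) in rows]


print(accession_gids(conn, 9016))
```

Passing the location as a parameter makes it easy to run the same check for another genebank's GLOCN.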

In the case of CIAT, a special name type (NTYPE=1019) has been created to indicate a CIAT genebank accession. This is not good practice, and we will change CIAT data to conform to other genebank accessions, but in the meantime CIAT GIDs can be retrieved from IRIS by

select GID
from [names] inner join germplsm on names.gid = germplsm.gid
where ntype=1016 and grplce=0;

In many other cases uniform standards have not been rigorously followed when creating GIDs for accessions. NTYPE=1 has sometimes been used for the donor’s accession ID on derived samples held in other genebanks; NSTAT=8 has sometimes not been used to mark the accession ID as the preferred ID; and GLOCN is commonly set to the location of the GID from which the accession originated rather than the location of the genebank holding the accession. In these cases the GIDs may be better recognized by their accession ID prefix, e.g.

select GID
from [names] inner join germplsm on names.gid = germplsm.gid
where ntype=1 and nval like "pi*" and grplce=0;

although even in this case the identification will not be perfect because of the incorrect use of NTYPE. A case-by-case manual inspection is often needed to decide whether or not the GID represents an accession.
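
The query above uses the `*` wildcard of Microsoft Access; in standard SQL the equivalent pattern is written with `%`. The sketch below, again using an in-memory SQLite database with invented rows, lists candidate GIDs by prefix and marks those whose matching name does not carry NSTAT=8, as a starting point for the manual inspection. `candidates_by_prefix` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE germplsm (gid INTEGER PRIMARY KEY, grplce INTEGER);
    CREATE TABLE names (gid INTEGER, ntype INTEGER, nstat INTEGER, nval TEXT);
    -- Invented rows for illustration only
    INSERT INTO germplsm VALUES (10, 0), (11, 0), (12, 0);
    INSERT INTO names VALUES
        (10, 1, 8, 'PI 000001'),
        (11, 1, 1, 'PI 000002'),
        (12, 1, 8, 'XX 000003');
""")


def candidates_by_prefix(conn, prefix):
    """Return (GID, has_preferred_id_state) for NTYPE=1 names with the prefix."""
    rows = conn.execute(
        "SELECT names.gid, nstat FROM names "
        "INNER JOIN germplsm ON names.gid = germplsm.gid "
        "WHERE ntype = 1 AND nval LIKE ? AND grplce = 0 "
        "ORDER BY names.gid",
        (prefix + "%",),  # '%' here plays the role of '*' in the Access query
    ).fetchall()
    return [(gid, nstat == 8) for gid, nstat in rows]


print(candidates_by_prefix(conn, "pi"))
```

SQLite's `LIKE` is case-insensitive for ASCII letters by default, so the lower-case prefix matches the upper-case names. A prefix that itself contains `%` or `_` would need escaping, which this sketch does not handle.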

Note: more generally, ICIS contains no means of identifying what a GID represents. A suggestion has been made to specify this using a new attribute, but we haven’t started that process yet. In the meantime, a manual inspection of data is needed for every GID to decide what it represents.

GERMPLASM TABLE (GERMPLSM)

Main details of germplasm genesis and origin

Field Description Type Size (bytes)
GID Germplasm IDentifier
Unique germplasm identifier. New GIDs created in your own local database must be negative numbers, unique within that database. Links to GID in NAMES and ATRIBUTS. Positive GIDs will be created automatically in the central database when your local database is uploaded to central.
There must be a separate GID representing each identifiable sample of an accession and its ancestry. A separate GID is needed for each cross, each new selection, and each sample under different management. Typically the minimum for cases with complete information is:
* one GID to represent the accession itself
* one GID to represent the sample held by the donor (= the GPID2 of the accession’s GID)
* one GID to represent the original material from which the accession was derived – either a collected sample or a cross (= the GPID1 of the accession’s GID)
Number (Long) 4
GNPGS Number of Parental GIDs
=-1 for all accessions unless there is no info at all on their origin
=0 for GIDs with unknown origin/parents. This typically includes (a) GIDs of original collected samples and (b) accessions or other samples with zero passport data, not even country of origin
>1 for the original cross from which bred accessions were derived
Number (Integer) 2
METHN Germplasm Creation Method number
Number that identifies the method of genesis for the germplasm. Details of the method are in the METHODS table. Links to MID in METHODS.
Distinguish three basic groups of methods:
* Generative (MTYPE="GEN") methods seek to generate greater genetic diversity in the offspring, typically by crossing two or more parents
* Derivative (MTYPE="DER") methods seek to reduce genetic diversity in the offspring, typically by selecting specific genetic subsets
* Maintenance (MTYPE="MAN") methods seek to maintain the same genetic composition in the offspring as in the parents
Number (Integer) 2
GPID1 Parental GID 1
If GNPGS=-1 then GPID1 equals the GID of most recent cross or collected sample in the ancestry of the GID. This ID defines a group of germplasm in which all members are derived from the same generic parent. GPID1=0 if unknown.
Number (Long) 4
GPID2 Parental GID 2
If GNPGS=-1 then GPID2 equals the GID of the immediate source from which the current germplasm is derived. GPID2=0 if unknown.
Number (Long) 4
MGID Management GID
=0 for GIDs in crop registries.
> 0 if the current germplasm is a regenerated sample of an accession in a genebank. In this case MGID = GID of the original accession.
Number (Long) 4
GERMUID User ID
Your user ID
Number (Integer) 2
LGID Local GID
LGID contains the original, negative GID created in the Local GMS (needed to keep track of the original data because, when the data in local are uploaded to central, negative GIDs are replaced with positive GIDs).
Number (Long) 4
GLOCN Germplasm Location Number
Identifier of the location where the germplasm was created as a distinct unit of management with a new GID. Links to LOCN in LOCATION table. GLOCN = MISSING or 0 if location is unknown. For crop registries this typically means:
• For samples of germplasm newly collected from a farm store, market place etc, this is the location from which the sample was collected.
• For accessions and other samples (crosses, breeding lines) produced or held ex situ, this is the location of the genebank, breeder or other organization holding the material.
Number (Long) 4
GDATE Germplasm Creation Date
Date on which the germplasm was created as a distinct unit of management with a new GID. (YYYYMMDD). GDATE = 0 if unknown. For crop registries, typically:
• For samples of germplasm newly collected from a farm store, market place etc, this is the date of collection.
• For accessions and other samples transferred into an ex situ collection, this is the date the receiving organization acquired the germplasm and so started managing the germplasm.
• For crosses, breeding lines and other samples created ex situ, this is the date the seed were harvested.
Number (Long) 4
GREF Germplasm Reference
A number that identifies the bibliographic reference from which the germplasm data were retrieved. GREF is missing if the source is unpublished or unknown.
Number (Long) 4
GRPLCE Germplasm replacement
Records deletion or replacement for the current GERMPLASM record. 0 for unchanged, own GID for deleted, and replacement GID for replaced
Number (Long) 4
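
Several of these fields encode their meaning in special values. The following sketch shows one way to decode GNPGS, GRPLCE and GDATE in Python. The function names are hypothetical, the wording of the returned labels is ours, and the date parser assumes a complete YYYYMMDD value.

```python
from datetime import date


def origin_kind(gnpgs):
    """Interpret GNPGS as described in the GERMPLSM field list."""
    if gnpgs == -1:
        return "derived or maintained from a known source"
    if gnpgs == 0:
        return "unknown origin"
    if gnpgs > 0:
        # The field list gives >1 for crosses; this sketch treats any
        # positive parent count as a cross.
        return f"cross with {gnpgs} parental GIDs"
    raise ValueError(f"unexpected GNPGS value: {gnpgs}")


def replacement_status(gid, grplce):
    """Interpret GRPLCE: 0 = unchanged, own GID = deleted, other = replaced."""
    if grplce == 0:
        return "unchanged"
    if grplce == gid:
        return "deleted"
    return f"replaced by GID {grplce}"


def parse_gdate(gdate):
    """Convert a YYYYMMDD integer to a date; 0 (unknown) becomes None."""
    if gdate == 0:
        return None
    year, rest = divmod(gdate, 10000)
    month, day = divmod(rest, 100)
    return date(year, month, day)  # raises ValueError for incomplete dates


print(origin_kind(-1))
print(replacement_status(42, 42))
print(parse_gdate(19950312))
```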

NAME DATA

Germplasm collects a multitude of labels during the development and release process. These are all tracked through the NAMES table of GMS, i.e. one record in GERMPLSM may have any number (≥1) of corresponding names in the NAMES table.

The relationship between germplasm samples and names is actually many:many - each sample may have many names, and each name may be used for many samples. However, in ICIS the relationship is handled as 1:many, allowing duplicate name records. Each name record applies to only one GID. If several GIDs share the same name, a duplicate name record is created for each GID. It is up to the user to ensure that all relevant fields of each duplicate name record are correctly duplicated.
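
A minimal sketch of what "a duplicate name record for each GID" means in practice is given below. The dictionary layout, the `duplicate_name` helper and all values are hypothetical; the only points taken from the text are that each record carries exactly one GID and that the other fields are copied unchanged.

```python
def duplicate_name(template, gids, first_local_nid=-1):
    """Create one NAMES record per GID from a template name record."""
    records = []
    nid = first_local_nid
    for gid in gids:
        # Copy every field of the template, then set the per-record keys.
        record = dict(template, NID=nid, GID=gid)
        records.append(record)
        nid -= 1  # keep NIDs negative and unique, as for a local database
    return records


# Hypothetical shared variety name applied to three local GIDs.
template = {"NTYPE": 99, "NSTAT": 1, "NVAL": "Example variety",
            "NLOCN": 0, "NDATE": 0}
for record in duplicate_name(template, [-1, -2, -3]):
    print(record)
```

Generating the duplicates from a single template is one way to make sure the shared fields stay identical across all copies.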

Names are classified by name types (NTYPE) which are defined in the UDFLDS table and by name states (NSTAT) which are not defined within IRIS. Some rules for name states:

  • Each GID must have one preferred name (NSTAT=1). This is typically a variety name or other common name, and can be shared with many GIDs.
  • Each GID may also have one preferred ID (NSTAT=8). This is a name that uniquely identifies the GID, distinguishing it from all other GIDs even if they share the same preferred name.
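
These two rules can be checked automatically. The sketch below assumes the name records have been reduced to (GID, NSTAT) pairs; `name_state_problems` is a hypothetical helper, not part of ICIS.

```python
from collections import Counter


def name_state_problems(names):
    """names is an iterable of (GID, NSTAT) pairs. Return {GID: [issues]}."""
    preferred_names = Counter()
    preferred_ids = Counter()
    gids = set()
    for gid, nstat in names:
        gids.add(gid)
        if nstat == 1:
            preferred_names[gid] += 1
        elif nstat == 8:
            preferred_ids[gid] += 1

    problems = {}
    for gid in sorted(gids):
        issues = []
        if preferred_names[gid] != 1:
            issues.append(f"{preferred_names[gid]} preferred names (expected 1)")
        if preferred_ids[gid] > 1:
            issues.append(f"{preferred_ids[gid]} preferred IDs (expected at most 1)")
        if issues:
            problems[gid] = issues
    return problems


# Invented example: GID 2 lacks a preferred name, GID 3 has two.
sample = [(1, 1), (1, 8), (2, 8), (3, 1), (3, 1)]
print(name_state_problems(sample))
```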

The Names Table (NAMES)

Stores all germplasm names, abbreviations and naming details.

Name Description Type Size (bytes)
NID Name ID
Unique internal identifier for the name. Create negative NIDs in your local database; these will be replaced with automatically assigned positive NIDs when the data are uploaded to central.
Number (Long) 4
GID Germplasm ID
Unique internal ID of the germplasm given this name. GID in GERMPLSM table.
Number (Long) 4
NTYPE Name Type
Number that identifies the type of name or abbreviation. FLDNO in UDFLDS table
Number (Integer) 2
NSTAT Name State
Number indicating the status of the name:
1=Preferred name;
8=Preferred ID;
2=Preferred abbreviation
3=Chinese-GBK (GD) DBCS names,
4=Chinese Big 5,
5=Japanese
6=Korean
10=other UNICODE names
9=the name is marked as deleted
Number (Integer) 2
NUID Name User ID
ID of the IRIS user who entered the name (i.e. not the person who originally named the germplasm). UID in the USERS table
Number (Integer) 2
NVAL Name Value
The name assigned to the germplasm.
Text 255
NLOCN Name location
The location where the name was first assigned to the maintenance neighbourhood of which this GID is a member. LOCN in LOCATION table. NLOCN = 0 if the location is unknown.
• If the name was newly created for this GID, then NLOCN=GLOCN
• If the name was inherited from the source GID (implying a maintenance method of germplasm creation), then NLOCN is inherited from the same name of the source GID. For example, if a GRC accession was donated to IRRI by the USDA, the PI accession number attached to the IRRI accession has USDA as its location, not GRC.
Number (Long) 4
NDATE Name date
Date on which the name was first assigned to the maintenance neighbourhood of which this GID is a member. (YYYYMMDD). NDATE = 0 if the date is unknown
• If the name was newly created for this GID, then typically NDATE=GDATE, although it may be a little different; for example, GRC accessions are assigned their IRGC numbers only after passing a series of tests, so the IRGC number has an NDATE that is later than the GDATE (the date of acquisition) of the sample.
• If the name was inherited from the source GID (implying a maintenance method of germplasm creation), then NDATE is inherited from the same name of the source GID. For example, if a GRC accession was donated to IRRI by the USDA, the PI accession number attached to the IRRI accession has the date USDA originally assigned the PI number, not the date GRC attached that PI number to its accession.
Number (Long) 4
NREF Name reference
A number that identifies a bibliographic reference for the name. NREF is missing if the source is unpublished or unknown.
Number (Long) 4
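
The inheritance rule for NLOCN and NDATE can be sketched as follows. This is an illustration under assumptions: `name_location_and_date` is a hypothetical helper, names are matched on NVAL only, and the names, locations and dates in the example are invented.

```python
def name_location_and_date(nval, glocn, gdate, source_names=()):
    """Return (NLOCN, NDATE) for a name being attached to a new GID.

    source_names holds the name records of the source GID as dictionaries
    with NVAL, NLOCN and NDATE keys.
    """
    for source in source_names:
        if source["NVAL"] == nval:
            # Inherited name: keep where and when it was first assigned.
            return source["NLOCN"], source["NDATE"]
    # Newly created name: default to the GID's own location and date.
    return glocn, gdate


# Invented example: the donor's name is inherited, the new accession ID is not.
donor_names = [{"NVAL": "DONOR 0001", "NLOCN": 22001, "NDATE": 19600101}]
print(name_location_and_date("DONOR 0001", 9016, 19750601, donor_names))
print(name_location_and_date("NEW 0001", 9016, 19750601, donor_names))
```

For a newly created name the returned date is only a default; as noted for NDATE, the actual date of assignment may be later than GDATE.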

Methods

Name Types