Title: | Genomic Prediction of Hybrid Performance |
---|---|
Description: | Performs genomic prediction of hybrid performance using eight GS methods including GBLUP, BayesB, RKHS, PLS, LASSO, Elastic net, LightGBM and XGBoost. It also provides fast cross-validation and mating design scheme for training population (Xu S et al (2016) <doi:10.1111/tpj.13242>; Xu S (2017) <doi:10.1534/g3.116.038059>). |
Authors: | Yang Xu, Guangning Yu, Yanru Cui, Shizhong Xu, Chenwu Xu |
Maintainer: | Yang Xu <[email protected]> |
License: | GPL-3 |
Version: | 2.1.1 |
Built: | 2025-02-18 03:55:36 UTC |
Source: | https://github.com/cran/predhy |
Convert genotypes in HapMap format or in numeric format for the predhy package.
convertgen( input_geno, type = c("hmp1", "hmp2", "num"), missingrate = 0.2, maf = 0.05, impute = TRUE )
input_geno |
genotype in HapMap format or in numeric format. The names of individuals should be provided. Missing (NA) values are allowed. |
type |
the type of genotype. There are three options: "hmp1" for genotypes in HapMap format with single bit, "hmp2" for genotypes in HapMap format with double bit, and "num" for genotypes in numeric format. |
missingrate |
maximum missing rate allowed for each SNP, default is 0.2. |
maf |
minor allele frequency threshold for each SNP, default is 0.05. |
impute |
logical. If TRUE, missing genotypes are imputed. Default is TRUE. |
A matrix of genotypes in numeric format, coded as 1, 0, -1 for AA, Aa, aa. Each row represents an individual and each column represents a marker. The rownames of the matrix are the names of individuals.
## load genotype in HapMap format with double bit
data(input_geno)
## convert genotype for predhy package
inbred_gen <- convertgen(input_geno, type = "hmp2")
## load genotype in numeric format
data(input_geno1)
head(input_geno1)
## convert genotype for predhy package
inbred_gen1 <- convertgen(input_geno1, type = "num")
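The missingrate, maf and impute arguments describe a filter-then-impute step. The base-R sketch below illustrates that idea on a numeric matrix coded 1, 0, -1. The helper filter_impute() is hypothetical and is not part of predhy, and imputing with the rounded marker mean is an assumption made for illustration, not necessarily what convertgen does.

```r
# Hypothetical helper, not part of predhy: filter and impute a numeric
# genotype matrix (rows = individuals, columns = markers, coded 1/0/-1).
filter_impute <- function(gen, missingrate = 0.2, maf = 0.05) {
  # proportion of missing calls per marker
  miss <- colMeans(is.na(gen))
  # frequency of allele A under the 1/0/-1 coding: (mean + 1) / 2
  p <- (colMeans(gen, na.rm = TRUE) + 1) / 2
  minor <- pmin(p, 1 - p)
  keep <- miss <= missingrate & minor >= maf
  out <- gen[, keep, drop = FALSE]
  # assumption: impute each missing call with the rounded marker mean
  for (j in seq_len(ncol(out))) {
    na <- is.na(out[, j])
    if (any(na)) out[na, j] <- round(mean(out[, j], na.rm = TRUE))
  }
  out
}

# deterministic toy data: 20 lines, 6 markers
gen <- matrix(rep(c(-1, 0, 1), length.out = 120), 20, 6,
              dimnames = list(paste0("line", 1:20), paste0("snp", 1:6)))
gen[1:10, 1] <- NA   # 50% missing: exceeds missingrate, dropped
gen[, 2] <- 1        # monomorphic: MAF = 0, dropped
gen[3, 3] <- NA      # a single missing call: kept and imputed
clean <- filter_impute(gen)
```

Markers snp1 and snp2 fail the filters, so clean keeps the remaining four markers with no missing values.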
Generate a mating design for a subset of crosses based on a balanced random partial rectangle cross-design (BRPRCD) (Xu et al. 2016).
crodesign(d, male_name, female_name, seed = 123)
d |
an integer; a fraction 1/d of all potential crosses is evaluated in the field. |
male_name |
a character vector of the names of male parents. |
female_name |
a character vector of the names of female parents. |
seed |
the random seed, default is 123. |
A data frame of mating design result with three columns. The first column is "crossID", the second column is the "male_Name" and the third column is the "female_Name".
Xu S, Xu Y, Gong L and Zhang Q. (2016) Metabolomic prediction of yield in hybrid rice. Plant J. 88, 219-227.
## generate a mating design with 100 male parents and 150 female parents,
## with a fraction 1/d = 1/50 of the crosses to be evaluated in the field.
## The total number of potential crosses is 100*150 = 15000.
## The number of crosses to be field evaluated is 15000*(1/50) = 300.
male_name <- paste("m", 1:100, sep = "")
female_name <- paste("f", 1:150, sep = "")
design <- crodesign(d = 50, male_name, female_name)
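The arithmetic behind d can be checked without the package. The sketch below enumerates every potential cross and draws a fraction 1/d of them. It uses plain random sampling, which is an assumption made for illustration: unlike the BRPRCD used by crodesign, it does not balance how often each parent is used.

```r
male_name <- paste0("m", 1:100)
female_name <- paste0("f", 1:150)
d <- 50

# all potential crosses between the two parent sets
all_crosses <- expand.grid(male_Name = male_name, female_Name = female_name,
                           stringsAsFactors = FALSE)
n_total <- nrow(all_crosses)   # 100 * 150 = 15000
n_field <- n_total / d         # 15000 / 50 = 300

# assumption: simple random sampling, NOT the balanced design of crodesign
set.seed(123)
picked <- all_crosses[sample(n_total, n_field), ]
picked <- data.frame(crossID = seq_len(n_field), picked, row.names = NULL)
```

The resulting data frame mirrors the three-column layout described above (crossID, male_Name, female_Name).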
The cv function evaluates trait predictability based on eight GS methods via k-fold cross validation. The trait predictability is defined as the squared Pearson correlation coefficient between the observed and the predicted trait values.
cv( fix = NULL, gena, gend = NULL, parent_phe = NULL, hybrid_phe, method = "GBLUP", drawplot = TRUE, nfold = 5, nTimes = 1, seed = 1234, CPU = 1 )
fix |
a design matrix of the fixed effects. |
gena |
a matrix (n x m) of additive genotypes for the training population. |
gend |
a matrix (n x m) of dominance genotypes for the training population. Default is NULL. |
parent_phe |
a matrix of phenotypic values of the parents. The names of parent_phe must match the rownames of inbred_gen. Default is NULL. |
hybrid_phe |
a data frame with three columns. The first column and the second column are the names of male and female parents of the corresponding hybrids, respectively; the third column is the phenotypic values of hybrids. The names of male and female parents must match the rownames of inbred_gen. Missing (NA) values are not allowed. |
method |
eight GS methods including "GBLUP", "BayesB", "RKHS", "PLS", "LASSO", "EN", "XGBoost","LightGBM". Users may select one of these methods or all of them simultaneously with "ALL". Default is "GBLUP". |
drawplot |
when method = "ALL", users may set this to TRUE to draw a barplot comparing the eight GS methods. Default is TRUE. |
nfold |
the number of folds. Default is 5. |
nTimes |
the number of independent replicates for the cross-validation. Default is 1. |
seed |
the random seed. Default is 1234. |
CPU |
the number of CPU cores to use. Default is 1. |
Trait predictability
## load example data from predhy package
data(hybrid_phe)
data(input_geno)
## convert original genotype
inbred_gen <- convertgen(input_geno, type = "hmp2")
## infer the additive and dominance genotypes of hybrids
gena <- infergen(inbred_gen, hybrid_phe)$add
gend <- infergen(inbred_gen, hybrid_phe)$dom
## additive model
R2 <- cv(fix = NULL, gena, gend = NULL, parent_phe = NULL, hybrid_phe, method = "GBLUP")
## additive-dominance model
R2 <- cv(fix = NULL, gena, gend, parent_phe = NULL, hybrid_phe, method = "GBLUP")
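To make the definition of trait predictability concrete, the following base-R sketch runs a 5-fold cross-validation on simulated data and reports the squared Pearson correlation between observed and predicted values. The predictor is a ridge regression with a fixed penalty, chosen only as a simple stand-in; it is an assumption for illustration and not one of the package's eight methods as implemented.

```r
set.seed(1234)
n <- 120; m <- 300
Z <- matrix(sample(c(-1, 0, 1), n * m, replace = TRUE), n, m)  # simulated genotypes
effects <- rnorm(m, sd = 0.2)
y <- drop(Z %*% effects) + rnorm(n)                            # simulated phenotype

# stand-in predictor (assumption): ridge regression with a fixed penalty
ridge_predict <- function(Ztr, ytr, Zte, lambda = 100) {
  mu <- mean(ytr)
  # dual form: solve an n_train x n_train system instead of m x m
  alpha <- solve(tcrossprod(Ztr) + lambda * diag(nrow(Ztr)), ytr - mu)
  mu + drop(Zte %*% crossprod(Ztr, alpha))
}

nfold <- 5
fold <- sample(rep(seq_len(nfold), length.out = n))  # random fold labels
pred <- numeric(n)
for (k in seq_len(nfold)) {
  test <- fold == k
  # fit on the other folds, predict the held-out fold
  pred[test] <- ridge_predict(Z[!test, , drop = FALSE], y[!test],
                              Z[test, , drop = FALSE])
}

r2 <- cor(y, pred)^2   # squared correlation of observed vs. predicted
```

Repeating the loop with different fold assignments corresponds to the role of nTimes: each replicate gives a slightly different value because the partition is random.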
The HAT method is a fast alternative to ordinary cross-validation. It is highly recommended for large datasets (Xu 2017).
cv_fast(fix = NULL, y, kk, nfold = 5, seed = 123)
fix |
a design matrix of the fixed effects. If not passed, a vector of ones is added for the intercept. |
y |
a vector of the phenotypic values. |
kk |
a list of one or multiple kinship matrices. |
nfold |
the number of folds, default is 5. For the HAT method, nfold can be set to the sample size (leave-one-out CV) to avoid variation caused by random partitioning of the samples, but it is not recommended for |
seed |
the random seed, default is 123. |
Trait predictability
Xu S. (2017) Predicted residual error sum of squares of mixed models: an application for genomic prediction. G3 (Bethesda) 7, 895-909.
## load example data from predhy package
data(hybrid_phe)
data(input_geno)
## convert original genotype
inbred_gen <- convertgen(input_geno, type = "hmp2")
## infer the additive and dominance genotypes of hybrids
gena <- infergen(inbred_gen, hybrid_phe)$add
gend <- infergen(inbred_gen, hybrid_phe)$dom
## calculate the additive and dominance kinship matrix
ka <- kin(gena)
kd <- kin(gend)
## for the additive model
predictability <- cv_fast(y = hybrid_phe[,3], kk = list(ka))
## for the additive-dominance model
predictability <- cv_fast(y = hybrid_phe[,3], kk = list(ka,kd))
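The general idea behind a hat-matrix shortcut can be shown in a few lines of base R. For a linear smoother with a fixed penalty, the leave-one-out residual equals the ordinary residual divided by one minus the corresponding diagonal element of the hat matrix, so no refitting is needed. The sketch below uses a ridge smoother on simulated data as an assumed stand-in and checks the shortcut against brute-force refitting; it is not the package's code, and how cv_fast handles variance components and multi-sample folds is not covered here.

```r
set.seed(123)
n <- 40; m <- 15
X <- matrix(rnorm(n * m), n, m)
y <- drop(X %*% rnorm(m)) + rnorm(n)
lambda <- 5   # fixed penalty (assumption)

# hat matrix of the ridge smoother: fitted values are H %*% y
H <- X %*% solve(crossprod(X) + lambda * diag(m), t(X))
res <- y - drop(H %*% y)

# leave-one-out residuals obtained from the single full-data fit
loo_hat <- res / (1 - diag(H))

# brute force for comparison: refit n times, leaving one sample out each time
loo_brute <- vapply(seq_len(n), function(i) {
  b <- solve(crossprod(X[-i, ]) + lambda * diag(m), crossprod(X[-i, ], y[-i]))
  y[i] - sum(X[i, ] * b)
}, numeric(1))

press <- sum(loo_hat^2)                   # predicted residual error sum of squares
predictability <- cor(y, y - loo_hat)^2   # observed vs. leave-one-out predictions
```

The two residual vectors agree to numerical precision, which is why the shortcut scales well: one fit replaces n refits.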
This dataset contains phenotypic data of 410 hybrids for grain yield in maize.
hybrid_phe
A data frame with 410 rows and 3 variables:
M
The names of male parents.
F
The names of female parents.
GY
The grain yield of hybrids.
Infer additive and dominance genotypes of hybrids based on their parental genotypes.
infergen(inbred_gen, hybrid_phe)
inbred_gen |
a matrix for genotypes of parental lines in numeric format, coded as 1, 0 and -1. The row.names of inbred_gen must be provided. It can be obtained from the original genotype using |
hybrid_phe |
a data frame with three columns. The first column and the second column are the names of male and female parents of the corresponding hybrids, respectively; the third column is the phenotypic values of hybrids. The names of male and female parents must match the rownames of inbred_gen. Missing (NA) values are not allowed. |
A list with the following information is returned:
$add additive genotypes of hybrids
$dom dominance genotypes of hybrids
## load example data from predhy package
data(hybrid_phe)
head(hybrid_phe)
data(input_geno)
## convert original genotype
inbred_gen <- convertgen(input_geno, type = "hmp2")
gena <- infergen(inbred_gen, hybrid_phe)$add
gend <- infergen(inbred_gen, hybrid_phe)$dom
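A toy example may help show what inferring hybrid genotypes from parents means. The sketch below assumes fully homozygous parental lines coded 1 or -1, takes the additive genotype of a hybrid as the average of its two parents, and codes dominance as 1 where the hybrid is heterozygous and 0 otherwise. Both the toy data and the dominance coding are assumptions for illustration; infergen may use a different coding.

```r
# toy parental genotypes (assumption: fully homozygous lines coded 1 / -1)
inbred <- rbind(P1 = c( 1,  1, -1, -1),
                P2 = c( 1, -1, -1,  1),
                P3 = c(-1, -1,  1,  1))
colnames(inbred) <- paste0("snp", 1:4)

# two hybrids: P1 x P2 and P1 x P3
crosses <- data.frame(M = c("P1", "P1"), F = c("P2", "P3"))

male <- inbred[crosses$M, , drop = FALSE]
female <- inbred[crosses$F, , drop = FALSE]

add <- (male + female) / 2   # 1 = AA, 0 = Aa, -1 = aa in the hybrid
dom <- (add == 0) * 1        # 1 at heterozygous hybrid loci, 0 otherwise
rownames(add) <- rownames(dom) <- paste(crosses$M, crosses$F, sep = "/")
```

P1 and P3 differ at every marker, so their hybrid is heterozygous everywhere: its additive row is all zeros and its dominance row is all ones.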
Genotypic data of 348 maize inbred lines in Hapmap format with double bit.
input_geno
A data frame with 4979 rows and 359 columns.
Genotypic data of 50 rice inbred lines with 1000 SNPs.
input_geno1
A data frame with 1000 rows and 50 variables.
Calculate the additive and dominance kinship matrix.
kin(gen)
gen |
a matrix for genotypes, coded as 1, 0, -1 for AA, Aa, aa. Each row represents an individual and each column represents a marker. |
A kinship matrix.
## random population with 100 lines and 1000 markers
gen <- matrix(rep(0,100*1000),100,1000)
gen <- apply(gen,2,function(x){x <- sample(c(-1,0,1), 100, replace = TRUE)})
## generate 100*100 kinship matrix
k <- kin(gen)
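For readers who want to see what a marker-based kinship matrix looks like numerically, here is a base-R sketch. It assumes one common convention, the cross-product of the genotype matrix scaled so that the average diagonal element equals 1. This normalization is an assumption and may differ from the one used inside kin.

```r
set.seed(1)
gen <- matrix(sample(c(-1, 0, 1), 100 * 1000, replace = TRUE), 100, 1000)

# assumption: K = Z Z' scaled so that the mean of the diagonal is 1
k_sketch <- tcrossprod(gen)
k_sketch <- k_sketch / mean(diag(k_sketch))

dim(k_sketch)   # one row and one column per line
```

The result is a symmetric n x n matrix whose off-diagonal entries measure marker similarity between pairs of lines.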
Solve a linear mixed model using restricted maximum likelihood (REML). Multiple variance components can be estimated.
mixed(fix = NULL, y, kk)
fix |
a design matrix of the fixed effects. If not passed, a vector of ones is added for the intercept. |
y |
a vector of the phenotypic values. |
kk |
a list of one or multiple kinship matrices. |
A list with the following information is returned:
$v_i the inverse of the phenotypic variance-covariance matrix
$var estimated variance components of genetic effects
$ve estimated residual variance
$beta estimated fixed effects
Xu S, Zhu D and Zhang Q. (2014) Predicting hybrid performance in rice using genomic best linear unbiased prediction. Proc. Natl. Acad. Sci. USA 111, 12456-12461.
## load example data from predhy package
data(hybrid_phe)
data(input_geno)
## convert original genotype
inbred_gen <- convertgen(input_geno, type = "hmp2")
## infer the additive and dominance genotypes of hybrids
gena <- infergen(inbred_gen, hybrid_phe)$add
gend <- infergen(inbred_gen, hybrid_phe)$dom
## calculate the additive and dominance kinship matrix
ka <- kin(gena)
kd <- kin(gend)
## for the additive model
parm <- mixed(y = hybrid_phe[,3], kk = list(ka))
## for the additive-dominance model
parm <- mixed(y = hybrid_phe[,3], kk = list(ka, kd))
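As a rough illustration of REML for the simplest case, the sketch below estimates one genetic variance and one residual variance by profiling the restricted likelihood over the variance ratio, using an eigen-decomposition of the kinship matrix. The function reml_one_kinship() is hypothetical and handles a single kinship matrix only; it is not the algorithm used by mixed, which supports multiple variance components.

```r
# Hypothetical helper, not part of predhy: REML for y = X b + u + e,
# u ~ N(0, K * var_g), e ~ N(0, I * var_e), with a single kinship matrix K.
reml_one_kinship <- function(y, K, X = matrix(1, length(y), 1)) {
  n <- length(y); q <- ncol(X)
  eg <- eigen(K, symmetric = TRUE)
  d <- pmax(eg$values, 0)           # guard against tiny negative eigenvalues
  ys <- crossprod(eg$vectors, y)    # rotate data so the covariance is diagonal
  Xs <- crossprod(eg$vectors, X)
  fit <- function(lambda) {         # lambda = var_g / var_e
    w <- 1 / (lambda * d + 1)
    XtWX <- crossprod(Xs, Xs * w)
    beta <- solve(XtWX, crossprod(Xs, ys * w))
    r <- ys - Xs %*% beta
    rss <- sum(w * r^2)
    # profiled restricted log-likelihood, up to an additive constant
    loglik <- -0.5 * (sum(log(lambda * d + 1)) +
                      as.numeric(determinant(XtWX)$modulus) +
                      (n - q) * log(rss))
    list(loglik = loglik, beta = drop(beta), ve = rss / (n - q))
  }
  # one-dimensional search on the log scale keeps lambda positive
  opt <- optimize(function(l) fit(exp(l))$loglik, interval = c(-10, 10),
                  maximum = TRUE)
  lambda <- exp(opt$maximum)
  out <- fit(lambda)
  list(beta = out$beta, var = lambda * out$ve, ve = out$ve)
}

# simulated data with genetic variance 2 and residual variance 1
set.seed(1)
n <- 150; m <- 500
Z <- matrix(sample(c(-1, 0, 1), n * m, replace = TRUE), n, m)
K <- tcrossprod(Z); K <- K / mean(diag(K))
u <- drop(Z %*% rnorm(m)); u <- u / sd(u) * sqrt(2)
y <- 10 + u + rnorm(n)
est <- reml_one_kinship(y, K)
```

The returned names beta, var and ve are chosen to echo the list elements described above; with several kinship matrices a multi-dimensional optimizer would be needed instead of the one-dimensional search.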
Predict all potential crosses of a given set of parents using a subset of crosses as the training sample.
predhy.predict( inbred_gen, hybrid_phe, parent_phe = NULL, method = "GBLUP", model = "A", select = "top", number = "100" )
inbred_gen |
a matrix for genotypes of parental lines in numeric format, coded as 1, 0 and -1. The row.names of inbred_gen must be provided. It can be obtained from the original genotype using |
hybrid_phe |
a data frame with three columns. The first column and the second column are the names of male and female parents of the corresponding hybrids, respectively; the third column is the phenotypic values of hybrids. The names of male and female parents must match the rownames of inbred_gen. Missing (NA) values are not allowed. |
parent_phe |
a matrix of phenotypic values of the parents. The names of parent_phe must match the rownames of inbred_gen. Default is NULL. |
method |
eight GS methods including "GBLUP", "BayesB", "RKHS", "PLS", "LASSO", "EN", "XGBoost", "LightGBM". Users may select one of these methods. Default is "GBLUP". |
model |
the prediction model. There are four options: model = "A" for the additive model, model = "AD" for the additive-dominance model, model = "A-P" for the additive-phenotypic model, and model = "AD-P" for the additive-dominance-phenotypic model. Default is model = "A". |
select |
the selection of hybrids based on the prediction results. There are three options: select = "all", which selects all potential crosses. select = "top", which selects the top n crosses. select = "bottom", which selects the bottom n crosses. The n is determined by the param number. |
number |
the number of selected top or bottom hybrids, only when select = "top" or select = "bottom". |
a data frame of prediction results with two columns. The first column denotes the names of male and female parents of the predicted hybrids, and the second column denotes the phenotypic values of the predicted hybrids.
## load example data from predhy package
data(hybrid_phe)
data(input_geno)
## convert original genotype
inbred_gen <- convertgen(input_geno, type = "hmp2")
pred <- predhy.predict(inbred_gen, hybrid_phe, method = "LASSO", model = "A", select = "top", number = "100")
pred <- predhy.predict(inbred_gen, hybrid_phe, method = "LASSO", model = "AD", select = "all")
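The select and number arguments amount to ranking predicted crosses and keeping a slice. The base-R sketch below shows that logic on placeholder predictions. It assumes that every unordered pair of parents counts as a potential cross; the helper select_crosses() and the random predicted values are hypothetical and stand in for real model output.

```r
parents <- paste0("P", 1:5)

# assumption: every unordered pair of parents is a potential cross
pairs <- t(combn(parents, 2))

set.seed(1)
pred <- data.frame(cross = paste(pairs[, 1], pairs[, 2], sep = "/"),
                   value = rnorm(nrow(pairs)))   # placeholder predictions

# hypothetical helper mirroring select = "all" / "top" / "bottom"
select_crosses <- function(pred, select = c("top", "bottom", "all"), number = 3) {
  select <- match.arg(select)
  if (select == "all") return(pred)
  ord <- order(pred$value, decreasing = (select == "top"))
  pred[head(ord, number), , drop = FALSE]
}

top3 <- select_crosses(pred, "top", 3)
bottom3 <- select_crosses(pred, "bottom", 3)
```

With five parents there are choose(5, 2) = 10 potential crosses; top3 holds the three largest predicted values and bottom3 the three smallest.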
Predict all potential crosses of a given set of parents using a subset of crosses as the training sample.
predhy.predict_NCII( inbred_gen, hybrid_phe, parent_phe = NULL, male_name = hybrid_phe[, 1], female_name = hybrid_phe[, 2], method = "GBLUP", model = "A", select = "top", number = "100" )
inbred_gen |
a matrix for genotypes of parental lines in numeric format, coded as 1, 0 and -1. The row.names of inbred_gen must be provided. It can be obtained from the original genotype using |
hybrid_phe |
a data frame with three columns. The first column and the second column are the names of male and female parents of the corresponding hybrids, respectively; the third column is the phenotypic values of hybrids. The names of male and female parents must match the rownames of inbred_gen. Missing (NA) values are not allowed. |
parent_phe |
a matrix of phenotypic values of the parents. The names of parent_phe must match the rownames of inbred_gen. Default is NULL. |
male_name |
a vector of the names of male parents. |
female_name |
a vector of the names of female parents. |
method |
eight GS methods including "GBLUP", "BayesB", "RKHS", "PLS", "LASSO", "EN", "XGBoost", "LightGBM". Users may select one of these methods. Default is "GBLUP". |
model |
the prediction model. There are four options: model = "A" for the additive model, model = "AD" for the additive-dominance model, model = "A-P" for the additive-phenotypic model, and model = "AD-P" for the additive-dominance-phenotypic model. Default is model = "A". |
select |
the selection of hybrids based on the prediction results. There are three options: select = "all", which selects all potential crosses. select = "top", which selects the top n crosses. select = "bottom", which selects the bottom n crosses. The n is determined by the param number. |
number |
the number of selected top or bottom hybrids, only when select = "top" or select = "bottom". |
a data frame of prediction results with two columns. The first column denotes the names of male and female parents of the predicted hybrids, and the second column denotes the phenotypic values of the predicted hybrids.
## load example data from predhy package
data(hybrid_phe)
data(input_geno)
## convert original genotype
inbred_gen <- convertgen(input_geno, type = "hmp2")
pred <- predhy.predict_NCII(inbred_gen, hybrid_phe, method = "LASSO", model = "A")
pred <- predhy.predict_NCII(inbred_gen, hybrid_phe, method = "LASSO", model = "AD", select = "all")
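In a North Carolina II (NCII) layout, every male parent is crossed with every female parent, so the set of potential crosses is the full male-by-female grid. The sketch below builds that grid from the male_name and female_name vectors and flags which crosses were already observed. The tiny data frame hybrid_toy, including its GY values, is made up for illustration only.

```r
# made-up training crosses (illustrative values, not real data)
hybrid_toy <- data.frame(M = c("m1", "m1", "m2", "m3"),
                         F = c("f1", "f2", "f2", "f1"),
                         GY = c(7.1, 6.4, 8.0, 7.5))

# same construction as the function defaults: names taken from columns 1 and 2
male_name <- unique(hybrid_toy[, 1])
female_name <- unique(hybrid_toy[, 2])

# full male x female grid of potential crosses
grid <- expand.grid(M = male_name, F = female_name, stringsAsFactors = FALSE)

# crosses that already have a phenotype vs. those that would be predicted
observed <- paste(grid$M, grid$F) %in% paste(hybrid_toy$M, hybrid_toy$F)
untested <- grid[!observed, ]
```

Three males and two females give six potential crosses; four are in the training set, leaving m2 x f1 and m3 x f2 as untested candidates.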