An application of change-point recursive models to the relationship between litter size and number of stillborns in pigs

Please download to get full document.

View again

of 12
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Document Description
An application of change-point recursive models to the relationship between litter size and number of stillborns in pigs
Document Share
Document Tags
Document Transcript
  See discussions, stats, and author profiles for this publication at: An application of change-point recursive modelsto the relationship between litter size andnumber of stillborns in pigs  Article   in  Journal of Animal Science · November 2010 DOI: 10.2527/jas.2009-2557 · Source: PubMed CITATIONS 6 READS 31 4 authors: Noelia IbanezThe Roslin Institute 77   PUBLICATIONS   552   CITATIONS   SEE PROFILE E. MaturanaCentro Nacional de Investigaciones Oncológicas 32   PUBLICATIONS   264   CITATIONS   SEE PROFILE JL NogueraIRTA Institute of Agrifood Research and Techno… 172   PUBLICATIONS   2,357   CITATIONS   SEE PROFILE Luis VaronaUniversity of Zaragoza 209   PUBLICATIONS   2,791   CITATIONS   SEE PROFILE All content following this page was uploaded by Noelia Ibanez on 18 December 2016. The user has requested enhancement of the downloaded file. All in-text references underlined in blue are added to the srcinal documentand are linked to publications on ResearchGate, letting you access and read them immediately.  ABSTRACT:  We developed and implemented change-point recursive models and compared them with a linear recursive model and a standard mixed model (SMM), in the scope of the relationship between litter size (LS) and number of stillborns (NSB) in pigs. The proposed approach allows us to estimate the point of change in multiple-segment modeling of a nonlinear relationship between phenotypes. We applied the pro-cedure to a data set provided by a commercial Large White selection nucleus. The data file consisted of LS and NSB records of 4,462 parities. The results of the analysis clearly identified the location of the change points between different structural regression coeffi-cients. The magnitude of these coefficients increased with LS, indicating an increasing incidence of LS on the NSB ratio. However, posterior distributions of correla-tions were similar across subpopulations (defined by the change points on LS), except for those between re-siduals. The heritability estimates of NSB did not pres-ent differences between recursive models. Nevertheless, these heritabilities were greater than those obtained for SMM (0.05) with a posterior probability of 85%. These results suggest a nonlinear relationship between LS and NSB, which supports the adequacy of a change-point recursive model for its analysis. Furthermore, the re-sults from model comparisons support the use of recur-sive models. However, the adequacy of the different re-cursive models depended on the criteria used: the linear recursive model was preferred on account of its small-est deviance value, whereas nonlinear recursive models provided a better fit and predictive ability based on the cross-validation approach. Key words: Bayesian analysis, change point, litter size, piglet mortality, recursive model ©2010 American Society of Animal Science. All rights reserved. J. Anim. Sci. 2010. 88:3493–3503 doi:10.2527/jas.2009-2557 INTRODUCTION Structural equation models, such as recursive models, among others, were first introduced to describe causal relationships between phenotypes (Wright, 1934). Their emergence in a quantitative genetics scenario (Gianola and Sorensen, 2004) motivated several studies that ana-lyzed recursive relationships (e.g., de los Campos et al., 2006a,b; López de Maturana et al., 2007; Varona et al.,  2007). All these studies modeled linear relationships be-tween phenotypes; however, these relationships might not be linear. In a recent study, López de Maturana et al. (2009) applied structural equation models to ana-lyze the nonlinear relationships between 3 calving traits (gestation length, calving difficulty, and stillbirth) by implementing a segmented recursive regression. Al-though this approach was found useful to detect hetero-geneity of residual variance, contemporary group, and genetic correlations, it lacked flexibility because fixed predetermined change points were assumed.Using change-point techniques (Carlin et al., 1992; Chib, 1998) can provide flexibility to the analysis be-cause these methods allow us to estimate the location of the change point in a multiple-segment modeling of nonlinear relationships between phenotypes. In this study, we implemented a change-point technique to an-alyze the nonlinear recursive relationship between litter size ( LS ) and the number of stillborns ( NSB ). Because genetic improvement of the reproductive performance of sows seeks to maximize the number of live piglets per litter, LS and NSB have become 2 of the most im-portant traits in pig breeding programs. Current breed-ing programs have mainly focused on selecting for LS to achieve this goal. However, as selection experiments have shown, selection for LS increases the NSB (see review by Blasco et al., 1995). Thus, to improve the number of piglets born alive, we need more knowledge on the relationship between these 2 traits. An application of change-point recursive models to the relationship between litter size and number of stillborns in pigs 1 N. Ibáñez-Escriche,* 2  E. López de Maturana,† J. L. Noguera,* and L. Varona‡ *Genètica i Millora Animal, IRTA-Lleida, 25198 Lleida, Spain; †Dpto. de Mejora Genética Animal, INIA, 28040 Madrid, Spain; and ‡Unidad de Genética Cuantitativa y Mejora Animal, Universidad de Zaragoza, 50013 Zaragoza, Spain 1 Financial support was provided by the IRTA, Lleida, Spain (grant 0502-21102). 2 Corresponding author: Noelia.ibanez@irta.esReceived October 7, 2009.Accepted July 12, 2010. 3493  In summary, the aims of this study were 1) to imple-ment several change-point recursive models that differ in the number of change points and 2) to compare these models with a linear recursive model and a standard mixed model ( SMM ) to investigate the relationship between LS and NSB in the Large White pig breed. MATERIALS AND METHODS Animals were managed under standard intensive con-ditions; in all cases, reproduction was carried out by AI. Protocols were approved by the Ethical and Animal Care Committee at IRTA. Data  The data set included LS and NSB records of 4,513 parities from 1,074 sows belonging to 1 Large White purebred pig line selected for LS. Data were recorded between 1999 and 2006 at a nucleus pig farm owned by a commercial breeding company registered in the Span-ish Pig Data Bank (BDporc, Lleida, Spain; Sows were kept under commercial conditions, and farrowings occurred in individual crates at standard confinement facilities. The registration pro-tocol for farrowings in the nucleus farm included, ac-cording to current procedures, litter characteristics and individual data [i.e., a description of farrowing (identi-fication of sow, total number of born piglets and total number of piglets born alive, parity number, and date of farrowing) and the information about each individ-ual piglet, which included state at birth (alive or dead within the first 12 h after farrowing), sex, and birth date]. The NSB was measured as the total number of piglets dead at farrowing or within the first 12 h af-ter farrowing (early neonatal mortality). In a previous study using the same data set, Ibáñez-Escriche et al. (2009) found very low piglet heritability for farrowing mortality; thus, for this study we considered NSB as a maternal trait.After an exploratory analysis, outliers, defined as data points with a value greater or less than 4 SD of the mean, were removed from the database. Parities with all piglets stillborn were considered as abortions and also removed from the database. The final data set consisted on LS and NSB records of 4,462 parities from 1,070 sows. The pedigree dated back at least 3 genera-tions, and no selection for survival traits as farrowing mortality was performed. The pedigree file contained a total of 1,534 individuals. Figure 1 shows the relation-ship between the average LS and the average NSB after editions. Descriptive statistics can be found in Ibáñez-Escriche et al. (2009). Statistical Analysis According to Figure 1, the LS phenotype indicates a nonlinear recursive effect on the NSB phenotype. Therefore, 4 recursive models with different number of change points and a SMM were proposed to analyze the relationship between LS and NSB. Recursive model 1 assumes a linear recursive relationship between traits and uses a unique structural coefficient matrix across data. Models 2, 3, and 4 were recursive with 2, 3, and 4 change points, respectively. These models postulate heterogeneous structural coefficients (Wu et al., 2007) across subgroups of LS defined by the change points.Assuming a joint multivariate normal distribution for LS and NSB, the data of an individual i   belonging to subpopulation k   and model  j (  j = 1, …,   4) were mod-eled as  L kj i i j i j i j j  y X b Z a Wp e = + + +  ,  [1]where  y t b a p RX b Z a WpR i kj kj j j j j kj i j i j i j kj j  N  | , , , , ,~ [ ( ),' LLL L 0110 -- + + -- 1 kj   ].  [2]The latter expression can also be written as  y t b a p RX b Z a Wp i kj kj j j j j i kj i j i j i j  y  | , , , , ,exp[ ( )] ' LLL 01 2 µ -- + + -- 11011 kj kj tkj k m i n  I  R  L -= æèçççççççöø÷÷÷÷÷÷÷÷ åÕ  ( ). q  [3]In Eq. [1], [2], and [3], L kj   is the matrix of structural coefficients corresponding to subpopulation k   of model  j   for LS, and it takes the form Figure 1.  Relationship between litter size and number of still-borns. Ibáñez-Escriche et al. 3494    L kj NSB LS  kj  =-æèçççççöø÷÷÷÷÷÷ ¬ 1 01 l  ,  [4]where λ  is the structural coefficient, which in the case of SMM is equal to zero, t kj   is the vector of change points, and subpopulation k   for model  j   is defined by the records of LS between the change points t  k   and t  k  +1  for k   = 1, …,  m  ; here t  1  is equal to the smallest record of LS, and t  m  +1  is the largest record of LS. The m   index specifies the total number of subpopulations for each model, and it takes the values 1, 2, 3, and 4 for models 1, 2, 3, and 4, respectively. The variable I tkj  ( θ ) is the change-point indicator for model  j  , where I tkj  ( θ ) = 0 for y t  i kj  LS  <  and I tkj  ( θ ) = 1 for t y t  kj i kj  LS  < < + 1 . In Eq. [1] above, y i   is a 2 × 1 vector containing the LS and NSB for the i  th individual; b  j   is a vector in-cluding systematic effects on the 2 traits for model  j  , year-season (31 levels), and order of parity (6 levels); a  j   is the vector of direct additive genetic effects for model  j  ; p  j   is the vector of permanent effects for model  j  ; e  j   is the vector of the residuals for model  j  ; and X i  , Z i  , and W i   are known incidence matrices. Year-season was defined as 3-mo intervals between March 1999 and December 2006.The distributions assumed a priori for the additive effects a , permanent effects p , and residual effects e , for the 2 traits, were multivariate normal distributions, with null mean vector and covariance matrixes G A P I R I 0 0 0 Ä  ,  ⊗ p e  , and ,  respectively, where A  is the relationship matrix involving 1,530 animals, and I p  and I e   are the identity matrices of order equal to the num-ber of litters and records, respectively.  GP 02202 =éëêêêêùûúúúú= s ss ss s a a a a p p LS LS NSB NSB LS NSB LS LS N  ,,, , S SB NSB LS NSB LS LS NSB NSB LS  p pe e e  s ss ss ,,, 202 éëêêêêùûúúúú= , and R ss e  NSB  2 éëêêêêùûúúúú .  In recursive models, R 0  was assumed to be diagonal to achieve identifiability (Varona et al., 2007).The following uniform bounded priors were assigned to b  j  , λ kj  , and the elements of matrices G 0 , P 0 , and R 0 :p( b  j  )~U[−100, 100],p(λ kj  )~U[−20, 20],  p , p U 0, 100 and p U s ss s s aLS aNSB aLS NSB aLS aNS  2 22 2 ( ) ( )( )  - ∼∼ [ ] ,  B B aLS aNSB  , , s s 2 2 éëêêùûúú   p , p U 0, 100 and p U s ss s s pLS pNSB pLS NSB pLS pNS  2 22 2 ( ) ( )( )  - ∼∼ [ ] ,  B B pLS pNSB  , , s s 2 2 éëêêùûúú   p , p U 0, 100 s s eLS eNSB  2 2 ( ) ( )  ∼  [ ].  The same effects and prior distributions as assumed in the recursive models, with the exception of R 0 , were adopted in the standard mixed model ( SMM ). The residual (co)variance matrix R 0  in SMM was assumed an inverted Wishart distribution with 3 df, so that it reduces to improper uniform prior distributions.Bayesian inference via Markov chain Monte Carlo ( McMC ) methods was used to analyze the data. The fully conditional distributions were multivariate normal for b  j  , λ kj  , a  j  , and p  j   and inverted Wishart for the (co) variance matrices G 0  and P 0 . However, the fully con-ditional distribution for R 0  was inverted scaled χ  2  in recursive models and inverted Wishart in SMM. The prior assumed for t kj   was improper and its conditional distribution did not have a closed form. Marginal pos-terior distributions of all unknowns were estimated us-ing the Gibbs sampling algorithm (e.g., Geman and Ge-man, 1984), except for t kj   where a Metropolis-Hasting algorithm was used (Hastings, 1970; see Appendix for more details). After exploratory analyses, we used a single chain with a total of 500,000 samples for each analysis, after a burn-in period of 50,000. Convergence was tested separately for all dispersion parameters us-ing the Raftery and Lewis (1992) algorithm and the Z criterion of Geweke (1992). Effective sample size was evaluated using the method of Geyer (1992), and Mon-te Carlo sampling errors were computed using the time-series procedures described by Geyer (1992).Parameters in the recursive models are not directly comparable with those in a SMM. Gianola and So-rensen (2004) showed that to compare the covariance matrices ( G 0 , P 0 , R 0 ) with those obtained from SMM, these have to be adjusted by the matrix of structural coefficients ( Λ k ). In our case,G k * = Λ k−1 G 0 Λ k−1 ′ P k * = Λ k−1 P 0 Λ k−1 ′  R k * = Λ k−1 R 0 Λ k−1 ′ . [5] Model Comparison  Three approaches were performed to validate the models under different criteria. The approaches are shown below. Deviance Information Criterion. Spiegelhalter et al. (2002) proposed this approach as a measure of model comparison. The deviance information criterion ( DIC ) compares the global quality of 2 or more mod- Change-point recursive models for litter traits  3495  els, accounting for model complexity. For a particular model M  , the DIC is defined as  DIC D D  M  = - 2 ( ) q  .  [6]The term  D p y p M E D  M M M y M  M  =- = ò  2 [log ( | )] ( | , ) [ ( )] | q q q q q y  d  is the posterior expectation of the deviance D  M  ( ), q  and D p M M  ( ) log ( | ) q q = - 2  y  is the deviance evaluated at the posterior mean of the parameter vector q M  .  Ex-pression Eq. [6] is the result of combining both terms, where D   is a measure of model fit and D D  M  −  ( ) q  is related to the effective number of parameters. Models with smaller DIC exhibit a better global fit after ac-counting for model complexity. Differences in DIC of more than 7 are considered as important by Spiegelhal-ter et al. (2002). Leave-One-Out Cross-Validation. This approach, based on posterior predictive distributions, was pro-posed by Gelfand et al. (1992) as a means of checking the fit of the model and model choice. The approach involves constructing the posterior predictive density for a datum y  i   conditional on model M and on the data vector, but leaving out this particular datum, y - i   (Sorensen and Gianola, 2002):  p y M p y M p M  i i i i i  yy y -- - ( )  = ( )  ( ) ò  ,, , , . q q q d  [7]The values of the posterior predictive density p y M  i i  y - ( ) ,  are also known as conditional predictive ordinates ( CPO i  ) and can be used as a measure of ev-ery data point fitted by the model (Ibañez-Escriche et al., 2009). Small CPO i   indicates data poorly fitted by the model. A plot of the CPO i   for different models dis-closes which model does better and which points are poorly fitted under the different models.The sum of the CPO i   logarithms of the data (Geis-ser, 1993),  log log ( | , ) CPO p y y M  ModelM i i i n  = éëê ùûú -= å 1  and the Bayesian square standardized residual (Cog-don, 2001),  D y E y y y y  i i i i i i n  221 = -éëêêêêùûúúúú --= å       ( | )var( | ),  [8]are diagnostic measures to evaluate the overall predic-tive ability of the model. Models having a greater CPO or a smaller D 2  would be viewed as having a better predictive ability. The calculations involved in Eq. [7] and [8] are analytically intractable, but we estimated them using the McMC approach described by Gelfand (1996). K-Fold Cross-Validation. This method was used to evaluate the models based on their ability to predict future data. The entire data set was partitioned into 5 disjoint subsets, each with approximately 20% of the records, by taking random samples of data points. The cross-validation procedure used 4 of the 5 subsets for fitting and prediction (training set), and the remaining subset was used to test predictive ability (testing set). Two different criteria were used to compare the pre-dictive ability of the models. Mean Squared Error. The mean squared error ( MSE ) was computed as  MSE n n y y  iter data i n  j n  iter data  = - = = å å 1 1 1 12 ( ) ,   where y   and y    correspond to the observed and pre-dicted observations, respectively; n  data   is the number of data points in a testing subset; and n  iter   is the number of iterations of the McMC chain. Models having the smallest MSE were regarded as those with the best pre-dictive ability. Pearson Correlation. The Pearson correlation ( PC ) between observed and predicted observations was calculated as  rs s y y iter y y i n  n y y  iter  , cov( , ),   = = å 1 1  where cov( , ) y y    is the estimate of the covariance be-tween observed and predicted NSB records; s y   and s y    are the estimates of SD of observed and predicted re-cords; and, as above, n  iter   is the number of iterations of the McMC chain. The model providing the greatest correlation was considered as the one with the best pre-dictive ability of yet to be observed records. RESULTS Inferences of Model Parameters Posterior distributions for all unknown parameters in the 5 models exhibited a Monte Carlo error of less than 0.0005, and the Geweke test did not detect any lack of convergence.Table 1 presents the posterior means and highest 95% posterior density intervals ( HPD95% ) of change points (t 2  − t k  ) and structural coefficients (λ kj  ) of mod- Ibáñez-Escriche et al. 3496
Similar documents
View more...
Search Related
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks