See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/45440472
An application of changepoint recursive modelsto the relationship between litter size andnumber of stillborns in pigs
Article
in
Journal of Animal Science · November 2010
DOI: 10.2527/jas.20092557 · Source: PubMed
CITATIONS
6
READS
31
4 authors:
Noelia IbanezThe Roslin Institute
77
PUBLICATIONS
552
CITATIONS
SEE PROFILE
E. MaturanaCentro Nacional de Investigaciones Oncológicas
32
PUBLICATIONS
264
CITATIONS
SEE PROFILE
JL NogueraIRTA Institute of Agrifood Research and Techno…
172
PUBLICATIONS
2,357
CITATIONS
SEE PROFILE
Luis VaronaUniversity of Zaragoza
209
PUBLICATIONS
2,791
CITATIONS
SEE PROFILE
All content following this page was uploaded by Noelia Ibanez on 18 December 2016.
The user has requested enhancement of the downloaded file. All intext references underlined in blue are added to the srcinal documentand are linked to publications on ResearchGate, letting you access and read them immediately.
ABSTRACT:
We developed and implemented changepoint recursive models and compared them with a linear recursive model and a standard mixed model (SMM), in the scope of the relationship between litter size (LS) and number of stillborns (NSB) in pigs. The proposed approach allows us to estimate the point of change in multiplesegment modeling of a nonlinear relationship between phenotypes. We applied the procedure to a data set provided by a commercial Large White selection nucleus. The data file consisted of LS and NSB records of 4,462 parities. The results of the analysis clearly identified the location of the change points between different structural regression coefficients. The magnitude of these coefficients increased with LS, indicating an increasing incidence of LS on the NSB ratio. However, posterior distributions of correlations were similar across subpopulations (defined by the change points on LS), except for those between residuals. The heritability estimates of NSB did not present differences between recursive models. Nevertheless, these heritabilities were greater than those obtained for SMM (0.05) with a posterior probability of 85%. These results suggest a nonlinear relationship between LS and NSB, which supports the adequacy of a changepoint recursive model for its analysis. Furthermore, the results from model comparisons support the use of recursive models. However, the adequacy of the different recursive models depended on the criteria used: the linear recursive model was preferred on account of its smallest deviance value, whereas nonlinear recursive models provided a better fit and predictive ability based on the crossvalidation approach.
Key words:
Bayesian analysis, change point, litter size, piglet mortality, recursive model
©2010 American Society of Animal Science. All rights reserved.
J. Anim. Sci. 2010. 88:3493–3503 doi:10.2527/jas.20092557
INTRODUCTION
Structural equation models, such as recursive models, among others, were first introduced to describe causal relationships between phenotypes (Wright, 1934). Their emergence in a quantitative genetics scenario (Gianola and Sorensen, 2004) motivated several studies that analyzed recursive relationships (e.g., de los Campos et al., 2006a,b; López de Maturana et al., 2007; Varona et al.,
2007). All these studies modeled linear relationships between phenotypes; however, these relationships might not be linear. In a recent study, López de Maturana et al. (2009) applied structural equation models to analyze the nonlinear relationships between 3 calving traits (gestation length, calving difficulty, and stillbirth) by implementing a segmented recursive regression. Although this approach was found useful to detect heterogeneity of residual variance, contemporary group, and genetic correlations, it lacked flexibility because fixed predetermined change points were assumed.Using changepoint techniques (Carlin et al., 1992; Chib, 1998) can provide flexibility to the analysis because these methods allow us to estimate the location of the change point in a multiplesegment modeling of nonlinear relationships between phenotypes. In this study, we implemented a changepoint technique to analyze the nonlinear recursive relationship between litter size (
LS
) and the number of stillborns (
NSB
). Because genetic improvement of the reproductive performance of sows seeks to maximize the number of live piglets per litter, LS and NSB have become 2 of the most important traits in pig breeding programs. Current breeding programs have mainly focused on selecting for LS to achieve this goal. However, as selection experiments have shown, selection for LS increases the NSB (see review by Blasco et al., 1995). Thus, to improve the number of piglets born alive, we need more knowledge on the relationship between these 2 traits.
An application of changepoint recursive models to the relationship between litter size and number of stillborns in pigs
1
N. IbáñezEscriche,*
2
E. López de Maturana,† J. L. Noguera,* and L. Varona‡
*Genètica i Millora Animal, IRTALleida, 25198 Lleida, Spain; †Dpto. de Mejora Genética Animal, INIA, 28040 Madrid, Spain; and ‡Unidad de Genética Cuantitativa y Mejora Animal, Universidad de Zaragoza, 50013 Zaragoza, Spain
1
Financial support was provided by the IRTA, Lleida, Spain (grant 050221102).
2
Corresponding author: Noelia.ibanez@irta.esReceived October 7, 2009.Accepted July 12, 2010.
3493
In summary, the aims of this study were 1) to implement several changepoint recursive models that differ in the number of change points and 2) to compare these models with a linear recursive model and a standard mixed model (
SMM
) to investigate the relationship between LS and NSB in the Large White pig breed.
MATERIALS AND METHODS
Animals were managed under standard intensive conditions; in all cases, reproduction was carried out by AI. Protocols were approved by the Ethical and Animal Care Committee at IRTA.
Data
The data set included LS and NSB records of 4,513 parities from 1,074 sows belonging to 1 Large White purebred pig line selected for LS. Data were recorded between 1999 and 2006 at a nucleus pig farm owned by a commercial breeding company registered in the Spanish Pig Data Bank (BDporc, Lleida, Spain; http://www.bdporc.irta.es). Sows were kept under commercial conditions, and farrowings occurred in individual crates at standard confinement facilities. The registration protocol for farrowings in the nucleus farm included, according to current procedures, litter characteristics and individual data [i.e., a description of farrowing (identification of sow, total number of born piglets and total number of piglets born alive, parity number, and date of farrowing) and the information about each individual piglet, which included state at birth (alive or dead within the first 12 h after farrowing), sex, and birth date]. The NSB was measured as the total number of piglets dead at farrowing or within the first 12 h after farrowing (early neonatal mortality). In a previous study using the same data set, IbáñezEscriche et al. (2009) found very low piglet heritability for farrowing mortality; thus, for this study we considered NSB as a maternal trait.After an exploratory analysis, outliers, defined as data points with a value greater or less than 4 SD of the mean, were removed from the database. Parities with all piglets stillborn were considered as abortions and also removed from the database. The final data set consisted on LS and NSB records of 4,462 parities from 1,070 sows. The pedigree dated back at least 3 generations, and no selection for survival traits as farrowing mortality was performed. The pedigree file contained a total of 1,534 individuals. Figure 1 shows the relationship between the average LS and the average NSB after editions. Descriptive statistics can be found in IbáñezEscriche et al. (2009).
Statistical Analysis
According to Figure 1, the LS phenotype indicates a nonlinear recursive effect on the NSB phenotype. Therefore, 4 recursive models with different number of change points and a SMM were proposed to analyze the relationship between LS and NSB. Recursive model 1 assumes a linear recursive relationship between traits and uses a unique structural coefficient matrix across data. Models 2, 3, and 4 were recursive with 2, 3, and 4 change points, respectively. These models postulate heterogeneous structural coefficients (Wu et al., 2007) across subgroups of LS defined by the change points.Assuming a joint multivariate normal distribution for LS and NSB, the data of an individual
i
belonging to subpopulation
k
and model
j
(
j
= 1, …,
4) were modeled as
L
kj i i j i j i j j
y X b Z a Wp e
= + + +
,
[1]where
y t b a p RX b Z a WpR
i kj kj j j j j kj i j i j i j kj j
N
 , , , , ,~ [ ( ),'
LLL L
0110

+ +

1
kj
].
[2]The latter expression can also be written as
y t b a p RX b Z a Wp
i kj kj j j j j i kj i j i j i j
y
 , , , , ,exp[ ( )]
'
LLL
01 2
µ  + +

11011
kj kj tkj k m i n
I
R
L
=
æèçççççççöø÷÷÷÷÷÷÷÷
åÕ
( ).
q
[3]In Eq. [1], [2], and [3],
L
kj
is the matrix of structural coefficients corresponding to subpopulation
k
of model
j
for LS, and it takes the form
Figure 1.
Relationship between litter size and number of stillborns.
IbáñezEscriche et al.
3494
L
kj NSB LS
kj
=æèçççççöø÷÷÷÷÷÷
¬
1 01
l
,
[4]where
λ
is the structural coefficient, which in the case of SMM is equal to zero,
t
kj
is the vector of change points, and subpopulation
k
for model
j
is defined by the records of LS between the change points
t
k
and
t
k
+1
for
k
= 1, …,
m
; here
t
1
is equal to the smallest record of LS, and
t
m
+1
is the largest record of LS. The
m
index specifies the total number of subpopulations for each model, and it takes the values 1, 2, 3, and 4 for models 1, 2, 3, and 4, respectively. The variable I
tkj
(
θ
) is the changepoint indicator for model
j
, where I
tkj
(
θ
) = 0 for
y t
i kj
LS
<
and I
tkj
(
θ
) = 1 for
t y t
kj i kj
LS
< <
+
1
.
In Eq. [1] above,
y
i
is a 2 × 1 vector containing the LS and NSB for the
i
th individual;
b
j
is a vector including systematic effects on the 2 traits for model
j
, yearseason (31 levels), and order of parity (6 levels);
a
j
is the vector of direct additive genetic effects for model
j
;
p
j
is the vector of permanent effects for model
j
;
e
j
is the vector of the residuals for model
j
; and
X
i
,
Z
i
, and
W
i
are known incidence matrices. Yearseason was defined as 3mo intervals between March 1999 and December 2006.The distributions assumed a priori for the additive effects
a
, permanent effects
p
, and residual effects
e
, for the 2 traits, were multivariate normal distributions, with null mean vector and covariance matrixes
G A P I R I
0 0 0
Ä
,
⊗
p e
, and ,
respectively, where
A
is the relationship matrix involving 1,530 animals, and
I
p
and
I
e
are the identity matrices of order equal to the number of litters and records, respectively.
GP
02202
=éëêêêêùûúúúú=
s ss ss s
a a a a p p
LS LS NSB NSB LS NSB LS LS N
,,,
,
S SB NSB LS NSB LS LS NSB NSB LS
p pe e e
s ss ss
,,,
202
éëêêêêùûúúúú=
, and
R
ss
e
NSB
2
éëêêêêùûúúúú
.
In recursive models,
R
0
was assumed to be diagonal to achieve identifiability (Varona et al., 2007).The following uniform bounded priors were assigned to
b
j
, λ
kj
, and the elements of matrices
G
0
,
P
0
, and
R
0
:p(
b
j
)~U[−100, 100],p(λ
kj
)~U[−20, 20],
p , p U 0, 100 and p U
s ss s s
aLS aNSB aLS NSB aLS aNS
2 22 2
( ) ( )( )

∼∼
[ ]
,
B B aLS aNSB
, ,
s s
2 2
éëêêùûúú
p , p U 0, 100 and p U
s ss s s
pLS pNSB pLS NSB pLS pNS
2 22 2
( ) ( )( )

∼∼
[ ]
,
B B pLS pNSB
, ,
s s
2 2
éëêêùûúú
p , p U 0, 100
s s
eLS eNSB
2 2
( ) ( )
∼
[ ].
The same effects and prior distributions as assumed in the recursive models, with the exception of
R
0
, were adopted in the standard mixed model (
SMM
). The residual (co)variance matrix
R
0
in SMM was assumed an inverted Wishart distribution with 3 df, so that it reduces to improper uniform prior distributions.Bayesian inference via Markov chain Monte Carlo (
McMC
) methods was used to analyze the data. The fully conditional distributions were multivariate normal for
b
j
, λ
kj
,
a
j
, and
p
j
and inverted Wishart for the (co) variance matrices
G
0
and
P
0
. However, the fully conditional distribution for
R
0
was inverted scaled χ
2
in recursive models and inverted Wishart in SMM. The prior assumed for
t
kj
was improper and its conditional distribution did not have a closed form. Marginal posterior distributions of all unknowns were estimated using the Gibbs sampling algorithm (e.g., Geman and Geman, 1984), except for t
kj
where a MetropolisHasting algorithm was used (Hastings, 1970; see Appendix for more details). After exploratory analyses, we used a single chain with a total of 500,000 samples for each analysis, after a burnin period of 50,000. Convergence was tested separately for all dispersion parameters using the Raftery and Lewis (1992) algorithm and the Z criterion of Geweke (1992). Effective sample size was evaluated using the method of Geyer (1992), and Monte Carlo sampling errors were computed using the timeseries procedures described by Geyer (1992).Parameters in the recursive models are not directly comparable with those in a SMM. Gianola and Sorensen (2004) showed that to compare the covariance matrices (
G
0
,
P
0
,
R
0
) with those obtained from SMM, these have to be adjusted by the matrix of structural coefficients (
Λ
k
). In our case,G
k
* =
Λ
k−1
G
0
Λ
k−1
′
P
k
* =
Λ
k−1
P
0
Λ
k−1
′
R
k
* =
Λ
k−1
R
0
Λ
k−1
′
. [5]
Model Comparison
Three approaches were performed to validate the models under different criteria. The approaches are shown below.
Deviance Information Criterion.
Spiegelhalter et al. (2002) proposed this approach as a measure of model comparison. The deviance information criterion (
DIC
) compares the global quality of 2 or more mod
Changepoint recursive models for litter traits
3495
els, accounting for model complexity. For a particular model
M
, the DIC is defined as
DIC D D
M
= 
2 ( )
q
.
[6]The term
D p y p M E D
M M M y M
M
= =
ò
2 [log (  )] (  , ) [ ( )]

q q q q
q
y
d
is the posterior expectation of the deviance
D
M
( ),
q
and
D p
M M
( ) log (  )
q q
= 
2
y
is the deviance evaluated at the posterior mean of the parameter vector
q
M
.
Expression Eq. [6] is the result of combining both terms, where
D
is a measure of model fit and
D D
M
−
( )
q
is related to the effective number of parameters. Models with smaller DIC exhibit a better global fit after accounting for model complexity. Differences in DIC of more than 7 are considered as important by Spiegelhalter et al. (2002).
LeaveOneOut CrossValidation.
This approach, based on posterior predictive distributions, was proposed by Gelfand et al. (1992) as a means of checking the fit of the model and model choice. The approach involves constructing the posterior predictive density for a datum
y
i
conditional on model M and on the data vector, but leaving out this particular datum,
y

i
(Sorensen and Gianola, 2002):
p y M p y M p M
i i i i i
yy y
 
( )
=
( )
( )
ò
,, , , .
q q q
d
[7]The values of the posterior predictive density
p y M
i i
y

( )
,
are also known as conditional predictive ordinates (
CPO
i
) and can be used as a measure of every data point fitted by the model (IbañezEscriche et al., 2009). Small CPO
i
indicates data poorly fitted by the model. A plot of the CPO
i
for different models discloses which model does better and which points are poorly fitted under the different models.The sum of the CPO
i
logarithms of the data (Geisser, 1993),
log log (  , )
CPO p y y M
ModelM i i i n
= éëê ùûú
=
å
1
and the Bayesian square standardized residual (Cogdon, 2001),
D y E y y y y
i i i i i i n
221
= éëêêêêùûúúúú
=
å
(  )var(  ),
[8]are diagnostic measures to evaluate the overall predictive ability of the model. Models having a greater CPO or a smaller D
2
would be viewed as having a better predictive ability. The calculations involved in Eq. [7] and [8] are analytically intractable, but we estimated them using the McMC approach described by Gelfand (1996).
KFold CrossValidation.
This method was used to evaluate the models based on their ability to predict future data. The entire data set was partitioned into 5 disjoint subsets, each with approximately 20% of the records, by taking random samples of data points. The crossvalidation procedure used 4 of the 5 subsets for fitting and prediction (training set), and the remaining subset was used to test predictive ability (testing set). Two different criteria were used to compare the predictive ability of the models.
Mean Squared Error.
The mean squared error (
MSE
) was computed as
MSE n n y y
iter data i n j n
iter data
= 
= =
å å
1 1
1 12
( ) ,
where
y
and
y
correspond to the observed and predicted observations, respectively;
n
data
is the number of data points in a testing subset; and
n
iter
is the number of iterations of the McMC chain. Models having the smallest MSE were regarded as those with the best predictive ability.
Pearson Correlation.
The Pearson correlation (
PC
) between observed and predicted observations was calculated as
rs s
y y iter y y i n
n y y
iter
,
cov( , ),
=
=
å
1
1
where
cov( , )
y y
is the estimate of the covariance between observed and predicted NSB records;
s
y
and
s
y
are the estimates of SD of observed and predicted records; and, as above,
n
iter
is the number of iterations of the McMC chain. The model providing the greatest correlation was considered as the one with the best predictive ability of yet to be observed records.
RESULTS
Inferences of Model Parameters
Posterior distributions for all unknown parameters in the 5 models exhibited a Monte Carlo error of less than 0.0005, and the Geweke test did not detect any lack of convergence.Table 1 presents the posterior means and highest 95% posterior density intervals (
HPD95%
) of change points (t
2
− t
k
) and structural coefficients (λ
kj
) of mod
IbáñezEscriche et al.
3496