A DISCRETECONTINUOUS BAYESIAN MODEL FOR EMISSION TOMOGRAPHY
Mame Diarra Fall
†
⋆
, ´ Eric Barat
⋆
, Claude Comtat
‡
, Thomas Dautremer
⋆
, Thierry Montagu
⋆
and Ali MohammadDjafari
††
Laboratoire des Signaux et Syst`emes, (UMR8506: CNRS–SUPELEC–Univ ParisSud), 91192 GifsurYvette, FRANCE.
⋆
Laboratoire de Mod´elisation, Simulation et Syst`emes (LM2S), CEASaclay, 91191 GifsurYvette, FRANCE
‡
Service Hospitalier Fr´ed´eric Joliot, CEA/DSV/I2BM, 91401 Orsay, FRANCE
ABSTRACT
In this contribution, we propose a discretecontinuous reconstruction method for Positron Emission Tomography (PET).The goal is to reconstruct a continuous radiotracer activitydistribution from a ﬁnite set of measurements (namely, thediscrete projections of detected random emissions). Our approach can be viewed as an indirect density estimation problem,
i.e
, the problem of recovering a probability density function based on indirect observations. We cast the reconstruction problem in a
Bayesian nonparametric estimation
framework where regularization of the illposed inverse problem isachieved by putting a prior on the investigated radiotracer activity distribution.We proposea hierarchicalmodeland useitfor theMCMCschemes to generate samples from the posterior activity distribution and compute its functionals (mean, standard deviationetc.). Results will illustrate the performances of the proposedmethod and we compare our approach to another Bayesianmethod, the maximum a posteriori estimation (MAP), whichis based on a fully discretediscrete problem formulation.
Index Terms
—
Bayesian nonparametrics, indirect densityestimation, MCMC sampling, Positron Emission Tomography
1. INTRODUCTION
An interesting feature of Emission computed Tomographyis in producing 3D images that give information about themetabolic activity of an organ (the brain, for example). InPositron Emission Tomography (PET), the subject is injectedwith a molecule labelled with a positron emitter radionuclide.The emitted positrons travel a small distance before annihilating with an electron. This annihilation generates twogamma photons that ﬂyoff at the speed of light in oppositedirections. These photons are detected by rings of detectorsaround the subject. If two photons are detected within a shorttime interval, a coincidence event is recorded revealing thata positron emission occurred in the virtual line joining bothdetectors (called lineofresponse (LOR)). Tomographic algorithms are used to reconstruct 3D images of the radioactivitydistribution from the collected data sets.Conventional image reconstruction algorithms in Emission Tomography are roughly divided into two categories: analytical and statistical methods. Analytical methods like ﬁltered backprojection algorithmsare based on the directinversion of the Xray transform [1] and address the reconstructionproblem (at least theoretically) in continuous data and imagespaces. They provide fast reconstructions, however they arebasedonoversimpliﬁedmodelsofthephysicalprocessesanddo not account for the stochastic nature of positron emissionand photon detection. As a consequence, reconstructed images suffer from a lot of artifacts and signiﬁcant noise. Toovercome these deﬁciencies, statistical methods have beenproposed to model more realistically the process of acquiring data by taking into account the random nature of the phenomenon. They are based on optimization methods, typically maximum likelihood approaches (ML) ([2], [3]). They
usually provide reconstructed images with a higher signaltonoise ratio than analytical methods but nevertheless have highnoise due to illconditioned nature of the problem. This canbe solved by adding a regularization term in the form of a priori density for the image using Bayesian approaches.WhenBayesian methods are used in Emission Tomography, theyare based on the maximum a posteriori approach (MAP) ([4],[5]). However, all these statistical methods require a discreteformulation of the radiotracer distribution in the space overwhich the reconstruction is needed, which is not natural because emission distribution is continuous.The approach presented in this paper is a statistical onebut is fundamentally different from those mentioned above.The continuous radioactivity distribution is directly reconstructed from the discrete data. Thus, this can be viewed asa third category in image reconstruction algorithms for Emission Tomography. We consider the radiotracer activity distribution as being a probability density on
R
3
and infer on it.This approach is called nonparametric because the parameter of interest, which is a probability distribution, is inﬁnitedimensional and it is Bayesian since we put a prior on thisdistribution and infer on its posterior. So, the advantage of this approach is in offering a Bayesian regularization contextfor nonparametric estimation. In addition, we have accessto the entire distribution of the posterior activity distribution,not only a point estimate like in ML and MAP approaches.Therefore, any posterior uncertainty (like variance, highestposterior density regions etc.) can be estimated, which is wellsuited for quantitative imaging in “small samples” (referredto as “low doses” in PET).The Bayesian nonparametric approach was ﬁrst used forthe twodimensional (2D) PET reconstruction in [6]. In thispaper, we adopt the same approach for the three dimensionalcase. The 3D case is, however, more complicated since itrequires to deal with truncated data due to the limited ﬁeldofview of the system (FOV). The sampling scheme used inthis work is different from that in [6] and is not based on anytruncation of the inﬁnite dimensional distribution.The rest of this paper is structured as follows. In section 2,we describe the general problem formulation in the Bayesiannonparametric context. In section 3, the nonparametric priormodel for the unknown parameter of interest is given. Our hi
2011 18th IEEE International Conference on Image Processing
9781457713033/11/$26.00 ©2011 IEEE1401
erarchical model for emission tomographic data is presentedin section 4. Afterwards, the derived MCMC sampling is outlined in section 5. Results illustrating the quality of the proposed algorithm compared to MAPEM are shown in section6 and we conclude the paper in section 7.
2. PROBLEM STATEMENT IN THE BAYESIANNONPARAMETRIC FRAMEWORK
In this section, we develop the statiscal framework for theBayesian nonparametric reconstruction applied to PET data.We are interested in reconstructing the 3D image of theradiotracer activity distribution from measured data. We assume that data are stored in the socalled
listmode
formatsuch that the coordinates of the detectors receiving two coincident photons are observed (the approach is also applicableto
sinogram mode
, see [6]). We denote by the term “LOR”the virtual line that connects two detectors in coincidence.A LOR
l
is parametrized by
(
s
l
,̟
l
,
r
l
,ϕ
l
)
where
s
l
is thedistance between the
z
axis (ring center) and the LOR in atransaxial plane
(
xy
)
; the LOR and the
x
axis deﬁne the azimuthal angle
̟
l
;
r
l
is the axial coordinate of the LOR (if the coordinates of the two detector crystals in coincidence are
(
z
1
,z
2
)
,
r
l
= (
z
1
+
z
2
)
/
2)
. Finally, the LOR forms the angle
ϕ
l
with the transaxial plane. We map LOR parameters tothe index
l
such that the random variable
y
i
is the index of the LOR corresponding to the
i
th
observed coincidence. Theinverse problem can be formulated as follows in the Bayesiannonparametric context [7]
G
∼G
F
(
·
) =
X
P
(
·
x
)
G
(d
x
)
y
i
iid
∼
F,
for
i
= 1
,...,n
(1)where
X ⊂
R
3
is the object space (
i.e
, the space over whichreconstructioniscarriedout);
G
standsforthespatialdistribution of recorded events we want to estimate from the observed
F
distributed data sets
Y
= (
y
1
,...,y
n
)
. In Bayesian nonparametrics, we must put a prior over the unknown distribution
G
. This is given by the prior law
G
of
G
. The relation between the data
y
i
and the emissions locations
x
i
is embodiedin a probability distribution
P
(
·
x
)
called projection distribution,
i.e
, the probability for detecting photons pair in a LOR
l
given that the corresponding positron emission occurred in
x
.In the formulation (1) of the inverse problem,
G
(
·
)
is thespatial distribution for
recorded events
. Nevertheless, dataare truncated in the 3D case due to the ﬁnite axial extent of the PET scanner and detector efﬁciencies. Let introduce therandom variable
y
∗
:=
y >
0
to denote a recorded data. So,
G
(d
x
) =
G
∗
(d
x

y
∗
)
, where
G
∗
(d
x

y
∗
)
is the distribution of
x
conditionedonthefactthattheeventhasbeendetected. Thedistribution
G
†
(
·
)
for whole events (not only the recorded)can be expressed using a derivation of Bayes’ rule,
G
†
(d
x
) =
G
∗
(d
x

y
∗
)
Pr
(
y
∗
)
Pr
(
y
∗

x
)
(2)where for all
x
∈ X
, Pr
(
y
∗

x
) =
Ll
=1
Pr
(
y
=
l

x
)
isthe probability for recording an emission located in
x
bythe system accounting for system’s geometry and physical properties. Since
G
†
(d
x
)
is a normalized version of
G
∗
(d
x

y
∗
)
/
Pr
(
y
∗

x
)
, posterior inference on
G
∗
(d
x

y
∗
)
issufﬁcient to estimate the posterior of
G
†
(d
x
)
as soon asPr
(
y
∗

x
)
>
0
,
for all
x
∈ X
. For sake of simplicity in thenotations, in the rest of this paper we omit the conditioning since we deal with the recorded events in the inference.We use
G
(d
x
)
to stand for
G
∗
(d
x

y
∗
)
, the distribution of arecorded event.We model the detection probability using Bayes’ theorem:Pr
(
y
=
l

x
) =
f
geom
(
x

y
=
l
)
Pr
(
y
=
l
)
Ll
=0
f
geom
(
x

y
=
l
)
Pr
(
y
=
l
)
(3)where
y
= 0
stands for an unobserved event,
f
geom
(
x

y
=
l
)
is the geometric probability density that a photons pair detected in a LOR
l
srcinated from a spatial location
x
. Thisdistribution accounts for the fact that the probability to detect in coincidence emissions from inside the system’s FOVis positiondependent. Indeed, if we consider perfect detectors, the probability to record true events is merely geometric.It is determined by computing for each location
x
, the solidangle subtended from this point to the faces of the detectors.Other effects such as noncollinearity of photons and positronrange due to the small distance the emitted positron travelsbefore annihilating with an electron can also be included in
f
geom
.The probability for observing an event in a LOR
l
is modeledas follows, for any
l >
0
Pr
(
y
=
l
)
∝
n
(
l
)
a
(
l
)
.
(4)For each detector pair
l
,
n
(
l
)
is a normalizing factor standingfor its detection efﬁciency and is obtained from calibrationprocedures;
a
(
l
)
is the attenuation factor and can be estimatedby a preliminary transmission scan of the object. In absenceof attenuation and imperfections in detectors, Pr
(
y
=
l
) =1
/L
, where
L
is the total number of possible LORs.
3. NONPARAMETRIC PRIOR FOR THE RANDOMACTIVITY DISTRIBUTION
The formulation (1) requires to put a prior distribution
G
overthe random activity distribution
G
of recorded events. Wechoose the prior on the density of
G
(denoted
f
G
), as being aDirichlet process mixture (DPM). We refer to [7] for more details and [6] for a short introduction to this process and its application to 2D PET reconstruction. We brieﬂy recall that thisis obtained by ﬁrst generating a random discrete distributionfrom a Dirichlet process (DP) with parameters
α >
0
whichtunes the prior number of components and
G
0
a probabilitydistribution of components locations. Second, the realization
H
is smoothed out with a parametric kernel. We choose thiskernel as being a multivariate normal distribution (with density
f
N
, whose parameters are the mean
m
and the covariance matrix
Σ
), and
G
0
the NormalInverse Wishart model(
NIW
ρ,n
0
,
µ
0
,
Σ
0
).
H
∼
DP
(
α,G
0
)
⇒
H
(
·
) =
∞
k
=1
w
k
δ
θ
k
(
·
)
f
G
(
x
) =
f
N
(
x

θ
)
H
(
d
θ
) =
∞
k
=1
w
k
f
N
(
x

m
k
,
Σ
k
)
(5)where
2011 18th IEEE International Conference on Image Processing
1402
•
the sequence of weights
w
= (
w
1
,w
2
,...
)
is constructed as follows: for all
i
,
V
i
∼
Beta
(1
,α
)
and
w
1
=
V
1
and for all
k
≥
2
,
w
k
=
V
k
k
−
1
j
=1
(1
−
V
j
)
.ThisiscalledGEMdistribution, written
w
∼
GEM
(
α
)
,
•
θ
k
= (
m
k
,
Σ
k
)
∼
G
0
In the following section, we introduce our hierarchicalmodel for PET data.
4. THE BAYESIAN HIERARCHICAL MODEL
Let
X
= (
x
1
,...,
x
n
)
be
n
emissions locations in the domain of interest. In emission imaging systems, these locations are never observed directly, only their projections
y
=(
y
1
,...,y
n
)
are seen. Then our problem is an indirect density estimation since observations are not distributed according to the distribution of interest
G
but instead according to
F
, which is related to
G
via the integral in (1). We introduceemissions locations as hidden variables to complete the data.We also introduce allocation variables
c
= (
c
1
, c
2
,..., c
n
)
such that
c
i
=
k
if
θ
i
= (
m
,
Σ
)
i
=
θ
k
. Then the jointdistribution of all variables in the model can be written as
f
(
y
,
X
,
c
,
Θ
,
w
) =
P
(
y

X
)
×
f
(
X

c
,
Θ
)
×
P
(
c

w
)
×
f
(
w
)
×
f
(
Θ
)
(6)with distributions given by the following generative hierarchical model
y
i

x
i
ind
∼ P
(
y
i

x
i
)
x
i

c
i
,
Θ
ind
∼ N
(
x
i

θ
c
i
)
c
i

w
iid
∼
∞
k
=1
w
k
δ
k
(
·
)
w
∼
GEM
(
α
)
θ
k
iid
∼NIW
ρ,n
0
,
µ
0
,
Σ
0
(7)
5. MCMC SAMPLING
Having our hierarchical model (7), we can obtain samplesfrom the posterior distribution of the random activity
G
(
·
)

y
.For this purpose, we need to get draws from the posterior lawsof model variables. Since these posteriors have intractableforms, we opted for a MCMC sampling to generate thesedraws. The algorithm will update each block of variables inturn according to its conditional posterior as followsProposal of annihilations locations:
X

w
,
Θ
,
y
Allocation of locations to components:
c

w
,
Θ
,
X
Updating of the components parameters:
Θ

c
,
X
Updating of the components weights:
w

c
.
(8)By choosing conjugate priors, these conditional posteriors aredirectlygeneratedbyaGibbssampler. However, thesamplingof
X

w
,
Θ
,
y
is more complex since the distribution has noknown form. We use the MetropolisHastings algorithm witha proposal distribution being the product of the Dirichlet process mixture and a normal distribution extended in each observed LOR direction. The result is a reweighted mixture of Gaussian distributions in the LOR.The major problem in the sampling is to tackle the inﬁnite number of dimensions that appears in (7), without anyhard truncation. We developed an algorithm belonging to theclass of slice sampling methods [8]. The general idea of theslice sampling strategy is to introduce latent auxiliary variables that allow to work with a conditionally ﬁnite number of components at each iteration of the sampler.
6. SIMULATION RESULTS
We applied our algorithm to PET simulated data and compared its performances to those obtained with a Bayesianvoxelsbased algorithm using the maximum a posteriori approach (MAP) with a Gibbs distribution as prior.We randomly generated decays from a 3D voxelized brainphantom by a Monte Carlo technique such that
n
= 10
7
events were recorded. We have not included scattered andrandom coincidences in the model and detection probabilitieswere assumed purely geometrical.We give now the implementation details for the testedmethods.
Proposed method
We choose the parameters of the DPM as follows:
α
= 500
,
Σ
0
= 6
.
25
×
I
3
, n
0
= 4
. We ﬁrst computed
5000
iterationsfor the burnin period followed by
10000
for computing functionals of the posterior spatial emission distribution. After theburnin, each draw
(
X
(
t
)
,
c
(
t
)
,
w
(
t
)
,
Θ
(
t
)
)
obtained at iteration
t
is used to construct a sample of the random probabilitydensity of recorded events,
f
G
(
t
)
(
x
)
≈
K
∗
k
=1
w
(
t
)
k
f
N
x

θ
(
t
)
k
(9)where
K
∗
is the number of components required by the slicesampler. Using equations (2), (3) and (4), the probability density of all events is given by
f
G
†
(
t
)
(
x
)
∝
f
G
(
t
)
(
x
)
Ll
=1
n
(
l
)
a
(
l
)
f
geom
(
x

y
=
l
)
.
(10)We choose as estimator for the density the conditional expectation of
G
†
,
E
(
f
G
†

y
)
≈
1
N
N
t
=1
f
G
†
(
t
)
with
N
being the number of iterations after the burn in.
MAP method
The Gibbs prior is used to express spatial constraints suchthat neighboring voxels must have the same intensity valuewhile allowing curt changes at tissues boundaries. The potential function used was the
log cosh
function suggested by[4]. The parameters of the Gibbs prior were chosen such thatto minimize the mean squared error (MSE) w.r.t to the brainphantom. We used a 5voxels neighboorhood, the weights are
2011 18th IEEE International Conference on Image Processing
1403
a) b)c) d)
Fig. 1
. Reconstruction results: a) 3D brain fantom; b) proposed estimator; c) MAPEM estimator; d) proposed estimator standard deviation.the inverse of the L2distance between the two voxels considered. The algorithm used to maximize the posterior distribution is a modiﬁed version of the Expectation Maximizationalgorithm.
Results
Results obtained by the two algorithms are depicted in[Fig.1]. Image a) is the 3D brain phantom used to generatethe data, image b) the conditional mean of our method andimage c) the MAPEM reconstruction. Since any posteriorfunctional is available, we display in image d) the conditionalstandard deviation w.r.t. the phantom. Credible intervalsare also available for each region of interest (ﬁgures notshown). It is substantially notable on 3D curves [Fig.1] thatour method provides smooth isosurfaces while not removingboundaries and edges in the image. In contrast, the MAPEMapproach exhibits very noisy images. This underlines that theproposed method is suitable to improve signaltonoise ratioand image resolution. It is worth noting that in our results,the discretization of the spatial distribution is only for visualization and was chosen
a posteriori
to
256
×
256
×
128
corresponding to the phantom size. The same discretizationwas used for MAPEM.
7. CONCLUSION
We have presented an indirectdensity estimation model forimage reconstruction in emission tomography. The inferencescheme was done through a speciﬁc MCMC method to dealwith the inﬁniteness of the distribution. The structure of thesampler allows its easy parallelization. Simulation results onPET data have provided good quality images in the context of a relatively low number of events (
10
millions) and our reconstructions are by far better than those obtained by the MAPEM method. However, we did not account for background effects (randoms and scattered for example) in our simulations.Future works will be devoted to include these effects.The major drawback of our approach is its computationaldemands, although it was implemented on a parallel computer(
≈
one day on a high performance computer). To circumvent the time due to the MCMC sampling, an alternative isthe variational Bayesian method which approximates the posterior distributions analytically [9]. Another interesting wayto speed up our algorithm can be achieved by implementing itonGraphicalprocessor units(GPU)hardware. Sincewework directly with functionals belonging to
R
3
, this is appealing forthese architectures.
8. REFERENCES
[1] F. Natterer,
The Mathematics of Computerized Tomography
, Society for Industrial and Applied Mathematics,1986.[2] L.A. Shepp and Y. Vardi, “Maximum likelihood reconstruction in positron emission tomography,”
IEEE Trans. Med. Imag.
, vol. 1, pp. 113–122, 1982.[3] K. Lange and R. Carson, “EM reconstruction algorithmfor emission and transmission tomography,”
J. Comp. Assist. Tomo.
, vol. 8, pp. 306–316, 1984.[4] P. J. Green, “Bayesian reconstructions from emission tomography data using a modiﬁed EM algorithm,”
IEEE Trans. Med. Imaging
, vol. 9, no. 1, pp. 84–93, 1990.[5] S. Geman and D. E. McClure, “Statistical methods fortomographic image reconstruction,” in
Proceedings of the 46th Session of the International Statiscal Institute
,Bulletin of the ISI, Ed., 1987, vol. 52, pp. 5–21.[6] ´E. Barat, C. Comtat, T. Dautremer, T. Montagu, andR. Tr´ebossen, “A Bayesian nonparametric approach forPET reconstruction,” in
IEEE Med. Imaging Conf. Rec., NSS/MIC ’07.
, 2007, vol. 6, pp. 4155–4162.[7] N.L. Hjort, C. Holmes, P. M¨uller, and S.G. Walker,
Bayesian Nonparametrics
, Cambridge University Press,April 2010.[8] S.G. Walker, “Sampling the Dirichlet mixture model withslices,”
Comm. Statist.
, vol. 36, pp. 45–54, 2007.[9] D.M. Blei and M.I. Jordan, “Variational inference fordirichlet process mixtures,”
Bayesian Analysis
, vol. 1,no. 1, pp. 121–144, 2006.
2011 18th IEEE International Conference on Image Processing
1404