Gene expression analysis by signature pyrosequencing

Charlotta Agaton, Per Unneberg, Maria Sievertzon, Anders Holmberg, Maria Ehn, Magnus Larsson, Jacob Odeberg, Mathias Uhlén, Joakim Lundeberg*

Department of Biotechnology, KTH, Royal Institute of Technology SCFAB, Roslagstullsbacken 21, 106 91 Stockholm, Sweden

Received 9 October 2001; received in revised form 12 February 2002; accepted 5 March 2002
Received by T. Sekiya

Abstract

We describe a novel method for transcript profiling based on high-throughput parallel sequencing of signature tags using a non-gel-based microtiter plate format. The method relies on the identification of cDNA clones by pyrosequencing of the region corresponding to the 3′-end of the mRNA preceding the poly(A) tail. Simultaneously, the method can be used for gene discovery, since tags corresponding to unknown genes can be further characterized by extended sequencing. The protocol was validated using a model system for human atherosclerosis. Two 3′-tagged cDNA libraries, representing macrophages and foam cells, which are key components in the development of atherosclerotic plaques, were constructed using a solid phase approach. The libraries were analyzed by pyrosequencing, giving on average 25 bases. As a control, conventional expressed sequence tag (EST) sequencing using slab gel electrophoresis was performed. Homology searches were used to identify the genes corresponding to each tag. Comparisons with EST sequencing showed identical, unique matches in the majority of cases when the pyrosignature was at least 18 bases. A visualization tool was developed to facilitate differential analysis using a virtual chip format. The analysis resulted in identification of genes with possible relevance for development of atherosclerosis. The use of the method for automated massive parallel signature sequencing is discussed. © 2002 Elsevier Science B.V. All rights reserved.
Keywords: 3′-tagged cDNA library; Virtual chip; Atherosclerosis; DNA sequencing

Abbreviations: BLAST, basic local alignment search tool; cDNA, DNA complementary to RNA; dNTP, deoxyribonucleoside triphosphate; LDL, low-density lipoprotein; LPS, lipopolysaccharide; mRNA, messenger RNA; PCR, polymerase chain reaction; PMA, phorbol 12-myristate-13 acetate; EST, expressed sequence tag; SAGE, serial analysis of gene expression; MPSS, massively parallel signature sequencing

* Corresponding author. Tel.: +46-8-553-78327; fax: +46-8-553-78481. E-mail address: joakim.lundeberg@biotech.kth.se (J. Lundeberg).

Gene 289 (2002) 31–39. 0378-1119/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0378-1119(02)00548-6. www.elsevier.com/locate/gene

1. Introduction

A key to the understanding of the genes discovered through the whole genome sequencing projects is the availability of methods for efficient and quantitative analysis of expression patterns in cells, tissues and organs. Several powerful techniques for such analysis have been developed either by specific hybridization to microarrays with oligonucleotides or cDNA probes (Lockhart et al., 1996; Schena et al., 1995; van Hal et al., 2000) or by counting signatures of DNA fragments based on cDNA libraries. The former methods have the advantage of high-throughput generation of data with relatively high dynamic range, but cannot be used for gene discovery and rely on the generation of prefabricated probes to allow the analysis. Furthermore, variability problems relating to probe hybridization differences and cross-reactivity may arise, demanding several replicates to achieve reliable data. In contrast, the latter methods allow for identification of differentially expressed genes not present in the current databases. In addition, these methods provide a digital representation of abundance and the ability to identify genes of very low expression levels. However, to allow for precision and accuracy, a large number of clones should be analyzed. This constitutes a significant problem with regard to scale and cost and limits the use of such approaches.

Methods have therefore been sought that allow parallel analysis of many clones. Serial analysis of gene expression (SAGE) (Velculescu et al., 1995) and MPSS (massively parallel signature sequencing) (Brenner et al., 2000) are two recent examples. In the former, type IIS recognition enzymes are used in the generation of concatemers of short tags of cDNA, which are cloned and sequenced. In this way, the electrophoresis step is fully utilized and 20–30 gene tags per lane may be identified instead of one. The method relies on the assumption that short gene specific nucleotide sequence tags (ten base pairs) contain sufficient information to uniquely identify a transcript through database searches. Several improvements have been made to speed up and simplify the process of SAGE. These include SAGE-variants such as TALEST (tandem arrayed ligation of ESTs) (Spinella et al., 1999), MicroSAGE for expression studies in small tissue samples (Datson et al., 1999), modifications for enhanced concatemer cloning (Powell, 1998) and RAST-PCR (rapid analysis of unknown SAGE-tags) (van den Berg et al., 1999). For MPSS, the gene expression analysis is performed by repetitive enzyme cleavages, but instead of cloning concatemers, analysis is performed by signature sequencing on microbead arrays using a step-wise cleavage of the immobilized template, yielding 16–20 base pair signature tags. However, both these methods involve elaborate schemes involving type II recognition enzyme cleavages and complex data collections.

In order to address some of these problems, we describe a method for tag pyrosequencing (Ronaghi et al., 1998) in a simple microtiter plate format, avoiding both physical separation of fragments and handling of individual microbeads.
The length of the signature sequences described here is well above the ten bases achieved for SAGE and is in most cases more than the 16–20 bases obtained with MPSS. Our approach was evaluated in an atherosclerotic model and a visualization software tool has been developed to allow automated identification of the genes corresponding to the signatures.

2. Materials and methods

2.1. Generation of cDNA libraries

THP-1 cells (ATCC TIB-202) (Auwerx, 1991) were seeded at a density of 14 × 10^6 in 7.5 ml RPMI 1640, supplemented with 5 × 10^-5 M 2-mercaptoethanol (100 μg/ml), streptomycin (100 U) and 5% fetal calf serum (FCS Hyclone). To establish a macrophage phenotype, the THP-1 cells were treated with 0.2 μM phorbol 12-myristate-13 acetate (PMA) (Sigma) for 24 h. Following PMA treatment, 50 μg/ml oxidized LDL (LDL oxidized by exposure to copper for 24 h) was added to the cells for 24 h to promote the development of foam cells. Total RNA was prepared from cells and mRNA was isolated from the total RNA using oligo(dT) paramagnetic beads (Dynal AS, Oslo, Norway). cDNA synthesis was performed essentially according to Gubler (1988) using a biotinylated oligo(dT)-primer containing a restriction site for NotI (5′-biotin-GAGGTG CCA ACC GCG GCC GC(T)15-3′). The double-stranded cDNA (100 ng from each library) was digested with DpnII (NEB) and the 3′-end fragments were immobilized onto streptavidin-coated paramagnetic M-280 Dynabeads (Dynal) at room temperature overnight. After magnetic separation and washing with Binding/Washing buffer (2 M NaCl, 0.1% Tween in 1× TE (10 mM Tris–HCl, 1 mM EDTA, pH 7.5), pH 7.7), the immobilized cDNA was digested with NotI (37 °C, 3 h), resulting in the release of 3′ tags. Following phenol/chloroform extraction and ethanol precipitation, the tags were cloned into a BamHI-, NotI-digested and alkaline phosphatase-treated pRIT28 vector (Hultman et al., 1991).

2.2. Preparation of sequencing template

Randomly chosen bacterial clones were added to microtiter plates containing 100 μl of 0.1× TE (1 mM Tris–HCl, 0.1 mM EDTA, pH 7.5). The bacteria were lysed by briefly heating to 100 °C for 30 s. Templates for sequencing were obtained by amplification using vector primers RIT28 (5′-AAAGGGGGATGTGCTGCAAGGCG-3′) and MALO2 (5′-Biotin-CCGCGCGTTGGCCGATTCATTAA-3′). The MALO2 primer was biotinylated to allow immobilization onto 100 μg streptavidin-coated magnetic beads (M-280 Streptavidin Dynabeads, Dynal AS, Norway) (Sterky et al., 1998) in 30 μl BW-buffer at 43 °C for 15 min. Single-stranded DNA was obtained by incubating the beads with the immobilized PCR product in 20 μl 0.1 M NaOH for 5 min. The immobilized strand was resuspended in annealing buffer (100 mM Tris-acetate (pH 7.75), 20 mM MgAc2) containing 5 pmol sequencing primer (5′-CTAGGAGATCTCAGCTGG-3′) in a total volume of 10 μl. The amplicons were checked by agarose gel electrophoresis prior to their preparation for sequencing.

All steps were automated and performed in a 96-well format using robotics (Magnetic BioSolutions AB, Sweden) with the procedure taking 40 min for 96 samples. The robotic workstation consisted of a 12-tip pipette head, a Peltier heating/cooling position for a microtiter plate and positions for reagents, tips and waste. The beads were selectively captured inside the tips by a magnet, which facilitated washing and exchange of buffers. Primer annealing was performed by incubation at 94 °C for 1 min with subsequent cooling to room temperature. Thirty microliters of H2O and 0.5 μg SSB (Amersham Pharmacia Biotech) were added to the single stranded DNA template before sequencing.

2.3. Pyrosequencing

Real-time pyrosequencing (Ronaghi et al., 1998) was performed at 28 °C in a total volume of 50 μl in an automated 96-well PyroSequencer using PSQ SNP 96 enzymes and substrates (Pyrosequencing AB, Uppsala, Sweden) with cyclic dispersion of the nucleotides. Base calling of the pyrograms was performed manually.

2.4. EST sequencing

Conventional Sanger DNA sequencing was performed using the BigDye Terminator Cycle Sequencing kit (Perkin Elmer Applied Biosystems, Norwalk, CT, USA) according to the manufacturer's instructions. Sequencing products were automatically cleaned using biotin-streptavidin chemistry (Dynapure, Dynal AS, Norway) on an Attractor 1200 biomagnetic workstation and loaded onto an ABI 377 DNA sequencer (Perkin Elmer Applied Biosystems, Foster City, CA, USA). Sequences were manually edited using the PREGAP program in the STADEN package (Staden, 1996).

2.5. Data analysis

The sequences obtained from the LDL library and the PMA library were pooled into a large data set, thereby producing one data set for pyrosequencing and one for EST sequencing. These data sets will be referred to as the 'pyro' data set and the 'EST' data set, respectively, in the following sections. Both data sets were aligned against UniGene (UniGene 116). Sequence alignment was performed with BLAST version 2.0.10.

The data produced from the alignments were processed in the following way. Firstly, the EST signatures that had alignments with an E value < 10^-30 were extracted along with the UniGene identifier for the best alignment. This reduced the original EST data set to 924 signatures. Secondly, the corresponding pyrosignatures were extracted along with their UniGene identifiers. For each signature, the identifier(s) were compared. The comparison was grouped into two distinct cases. (1) There was one unique UniGene identifier for both the EST signature and the pyrosignature, and these identifiers were identical. (2) There was one unique UniGene identifier for the EST signature, but several for the pyrosignature, or ≥1 UniGene identifier(s) for both the EST and the pyrosignature. Histogram plots over the length distributions were made (Fig. 3).

The pyrosignatures were also aligned against the Refseq (February 1, 2002) database. We chose to extract signatures giving a unique best alignment, with the additional criterion that more than 90% of the bases in the query sequence had to be identical to the subject sequence. We extracted the Refseq identifier for the best alignment, thus providing each pyrosignature with a gene identifier. By counting the number of times a Refseq identifier appeared for each library (LDL and PMA), a frequency table could be obtained for the two libraries (see Table 1).

2.6. Visualization of transcript profiles

Visualization of the homology searches and functional classification was performed using in-house software (Larsson et al., 2000). In brief, the obtained sequences were compared to nucleotide sequences included in the UniGene (UniGene 123) and EGAD databases and results were visualized on a virtual chip enabling rapid pair-wise comparison of data sets.

3. Results

3.1. Experimental design

The strategy for gene expression profiling using short tag cDNA libraries and pyrosequencing is outlined in Fig. 1. Two representative cDNA libraries were constructed by mRNA purification, first and second strand cDNA synthesis, fragmentation and cloning of the resultant 3′-cDNA ends. The fragmentation was performed by restriction cleavage using DpnII, recognizing the 4 bp sequence GATC. Solid phase purification of the 3′-cDNA ends was achieved by the use of a biotin-labeled oligo(dT)-primer employed in the first-strand cDNA synthesis and streptavidin-coated magnetic beads. Non-labeled upstream cDNA fragments were removed and the 3′-ends could be released from the solid support by digestion with NotI.
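The way the DpnII cleavage and solid-phase capture single out one fragment per transcript can be illustrated in silico. The following is a minimal sketch and not part of the published protocol: the function name `three_prime_fragment` and the example sequence are hypothetical, and the input is assumed to be a cDNA written in mRNA (sense) orientation that ends with its poly(A) stretch.

```python
DPNII_SITE = "GATC"  # DpnII recognition sequence, as stated in the text


def three_prime_fragment(cdna_sense, site=DPNII_SITE):
    """Return the fragment from the 3'-most DpnII site to the 3' end.

    cdna_sense: cDNA in mRNA (sense) orientation, 5'->3', ending with
    the poly(A) stretch. Hypothetical helper, for illustration only.
    Returns None when the sequence contains no site.
    """
    seq = cdna_sense.upper()
    cut = seq.rfind(site)  # index of the last (3'-most) GATC
    if cut == -1:
        return None
    # DpnII cleaves immediately 5' of the G, so the retained
    # fragment starts with GATC.
    return seq[cut:]


# Made-up sequence with two GATC sites; only the 3'-most one matters.
example = "ATGGATCCCTTGATCGGTACC" + "A" * 8
print(three_prime_fragment(example))
```

In this sketch, everything upstream of the last GATC corresponds to the non-labeled fragments that are washed away, and a sequence without any GATC site returns no fragment at all.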
The NotI restriction site was introduced through the oligo(dT)-primer in the cDNA synthesis. After directional cloning of the 3′-cDNA fragments into a restriction enzyme digested plasmid, the inserts were sequenced by both pyrosequencing (see Section 3.2) and conventional Sanger DNA sequencing. The signature sequences were used for homology searches to identify the corresponding genes. Relative abundance of each gene product was calculated and visualized using a dedicated software package.

Fig. 1. Schematic overview of the experimental approach.

Table 1
Frequency table for the two libraries (for complete list see http://biobase.biotech.kth.se/pyrotag)

Most frequently found transcripts in LDL library

Acc. no. | Gene name | Abundance (%) | Average read length (bp)
NM_000146 | Ferritin, light polypeptide | 4.4 | 25.6
NM_006367 | Adenylyl cyclase-associated protein | 2.12 | 28.2
NM_021029 | Ribosomal protein L44 | 1.31 | 33.1
NM_000661 | Ribosomal protein L9 | 0.98 | 31.2
NM_000986 | Ribosomal protein L24 | 0.82 | 31.6
NM_000998 | Ribosomal protein L37a | 0.82 | 24.0
NM_001030 | Ribosomal protein S27 | 0.82 | 28.5
NM_022551 | Ribosomal protein S18 | 0.82 | 25.4
NM_000100 | Cystatin B | 0.65 | 30.0
NM_000978 | Ribosomal protein L23 | 0.65 | 23.6
NM_001000 | Ribosomal protein L39 | 0.65 | 19.5
NM_001015 | Ribosomal protein S11 | 0.65 | 26.3
NM_001023 | Ribosomal protein S20 | 0.65 | 32.6
NM_001402 | Eukaryotic translation elongation factor 1 (EEF1A1) | 0.65 | 34.6
NM_001403 | Eukaryotic translation elongation factor 1 (EEF1A1L14) | 0.65 | 21.5
NM_001428 | Enolase 1 | 0.65 | 25.3
NM_001444 | Fatty acid binding protein 5 | 0.65 | 30.3
NM_002032 | Ferritin, heavy polypeptide 1 | 0.65 | 30.0
NM_002295 | Laminin receptor 1 | 0.65 | 24.0
NM_002510 | Glycoprotein (transmembrane) nmb (GPNMB) | 0.65 | 30.2
NM_003792 | Endothelial differentiation-related factor 1 | 0.65 | 30.8
NM_021104 | Ribosomal protein L41 | 0.65 | 25.2
NM_033491 | Cell division cycle 2-like 1 | 0.65 | 36.0
NM_033625 | Ribosomal protein L34 | 0.65 | 31.5

Most frequently found transcripts in PMA library

Acc. no. | Gene name | Abundance (%) | Average read length (bp)
NM_006367 | Adenylyl cyclase-associated protein | 1.78 | 28.2
NM_000146 | Ferritin, light polypeptide | 1.29 | 25.6
NM_000978 | Ribosomal protein L23 | 1.29 | 23.6
NM_001015 | Ribosomal protein S11 | 1.29 | 26.3
NM_001402 | Eukaryotic translation elongation factor 1 (EEF1A1) | 1.29 | 34.6
NM_000581 | Glutathione peroxidase 1 | 0.97 | 31.8
NM_002038 | Interferon, alpha-inducible protein | 0.97 | 26.9
NM_001022 | Ribosomal protein S19 | 0.81 | 36.5
NM_001276 | Chitinase 3-like 1 | 0.81 | 22.6
NM_001642 | Amyloid beta (A4) precursor-like protein 2 | 0.81 | 23.7
NM_004559 | Nuclease sensitive element binding protein | 0.81 | 28.7
NM_000661 | Ribosomal protein L9 | 0.65 | 31.2
NM_000994 | Ribosomal protein L32 | 0.65 | 31.2
NM_000998 | Ribosomal protein L37a | 0.65 | 21.2
NM_001021 | Ribosomal protein S17 | 0.65 | 37.6
NM_001645 | Apolipoprotein C-I | 0.65 | 27.2
NM_012268 | Similar to vaccinia virus HindIII K4L ORF | 0.65 | 26.8
NM_021104 | Ribosomal protein L41 | 0.65 | 28
NM_033625 | Ribosomal protein L34 | 0.65 | 31.5
NM_001752 | Catalase | 0.49 | 21.7
NM_001908 | Cathepsin B | 0.49 | 30
NM_002664 | Pleckstrin | 0.49 | 21.7
NM_004343 | Calreticulin | 0.49 | 25
NM_002796 | Proteasome subunit | 0.49 | 32.7

3.2. Tag pyrosequencing of cDNA libraries

To allow pyrosequencing of the cDNA tags, the inserts were amplified from bacterial lysates. The downstream vector-primer was biotinylated to enable immobilization of the PCR products onto streptavidin-coated magnetic beads. By alkali elution of the non-biotinylated strand, single-strand templates were obtained. All samples were automatically prepared by a magnetic workstation. The pyrosequencing primer was designed to anneal one base from the insert sequence in the 5′ to 3′ direction of the cloned cDNA. Microtiter plates with sequencing templates were analyzed by pyrosequencing as outlined in Fig. 2A. Nucleotides were added in an iterative manner and incorporation was monitored in real time by an enzyme cascade resulting in a light signal.
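The iterative nucleotide addition can be made concrete with a small, idealized simulation. This is a sketch under stated assumptions, not the instrument software: the function name `simulate_pyrogram`, the dispensation order A, C, G, T and the noise-free integer signal are all illustrative choices that the text does not specify.

```python
from itertools import cycle, islice


def simulate_pyrogram(synthesized, order="ACGT", max_dispensations=200):
    """Idealized pyrogram: one (nucleotide, signal) pair per dispensation.

    synthesized: the bases added to the primer, in order of incorporation.
    The signal is the number of bases incorporated in that dispensation
    (0 when the dispensed nucleotide does not match); noise and enzyme
    kinetics are ignored. The dispensation order is an assumption.
    """
    seq = synthesized.upper()
    peaks = []
    pos = 0
    for nuc in islice(cycle(order), max_dispensations):
        if pos >= len(seq):
            break  # whole sequence has been synthesized
        run = 0
        # A homopolymer stretch is incorporated in a single dispensation.
        while pos < len(seq) and seq[pos] == nuc:
            run += 1
            pos += 1
        peaks.append((nuc, run))
    return peaks


for nuc, signal in simulate_pyrogram("GGATC"):
    print(nuc, signal)
```

In this toy run the two consecutive G bases are incorporated in one dispensation and give a signal of 2, while dispensations that do not match the next template position give 0.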
In this report we have used the

Fig. 2. The pyrosequencing principle. (A) The enzyme cascade is initiated by a DNA-polymerase that releases pyrophosphate (PPi) during incorporation of nucleotides. Sulfurylase converts PPi into ATP which firefly luciferase uses as a substrate in the production of a quantitative light signal, measured by a CCD-camera. Incorporation of two nucleotides compared to one yields double the light signal in a stoichiometric manner. The enzyme apyrase digests and removes the nonincorporated nucleotides albeit with slower kinetics as compared to the polymerase. (B) Pyrogram of the 3′-tag shown in A. (C) Corresponding Sanger sequencing chromatogram.
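The stoichiometric relationship in the caption, where two incorporated nucleotides give twice the light of one, is what makes it possible to read a sequence from peak heights. Base calling in this study was performed manually, so the sketch below is only an illustration of the principle: the function name `call_bases_from_signals`, the normalization to a single-base peak height and the example intensities are hypothetical.

```python
def call_bases_from_signals(dispensations, signals, single_peak=1.0):
    """Convert peak heights into a base sequence.

    dispensations: nucleotides in the order they were dispensed.
    signals: measured light intensity for each dispensation.
    single_peak: intensity of a one-base incorporation (assumed known).
    Each signal is rounded to the nearest whole number of bases.
    """
    called = []
    for nuc, signal in zip(dispensations, signals):
        n_bases = max(round(signal / single_peak), 0)
        called.append(nuc * n_bases)
    return "".join(called)


# Made-up intensities: background near 0, single peaks near 1, a double near 2.
order = "ACGTACGTAC"
heights = [0.05, 0.0, 2.1, 0.1, 0.9, 0.02, 0.0, 1.1, 0.08, 0.95]
print(call_bases_from_signals(order, heights))
```

Simple rounding of this kind becomes unreliable for long homopolymer stretches, where the relative difference between n and n + 1 incorporations shrinks.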