Applied Microbial Genomics for Plant Disease

This project successfully integrated microbial genome sequencing and genome analysis into a plant disease microbiology course called Plant Pathology (PPATH) 417, Phytobacteriology.

Abstract

A genome sequence for a super-virulent strain of the plant pathogen Erwinia amylovora, which causes fire blight disease on apples and pears, was generated with the help of Student Technology Fee funds. Students were challenged to compare this genome sequence to already available E. amylovora genome sequences for less-virulent strains and to try to determine which DNA changes might be responsible for the high level of disease caused by the super-virulent strain. Students accomplished this in stages. First, they sifted through the ~2,000 predicted DNA sequence differences between the super-virulent strain and the less-virulent strains and determined which changes were artifacts of the high-throughput genome sequencing process. About 90% of the predicted changes were eliminated in this way. Then the students looked through the remaining DNA sequence differences and determined which changes might affect protein structures and biological processes in the bacterium. In particular, they focused on those changes that might affect the activity of genes that are known to be involved in the fire blight disease process. These activities occurred both during and outside of class.

Detailed Description

I. Specific Activities


A. Sequencing of the super-virulent E. amylovora strain 06P1 genome.

Following receipt of Student Technology Fee funds, genomic DNA of E. amylovora strain 06P1 was isolated in October 2011. The DNA was sent to the Penn State Genomics Core Facility, and next-generation PGM Ion Torrent sequencing was performed to obtain about 98% of the genome sequence with high fidelity. In early December 2011, the genome sequence was assembled and aligned with other published E. amylovora genome sequences. A comprehensive list of over 2,000 predicted DNA sequence differences (DNA polymorphisms) between 06P1 and less-virulent E. amylovora strains was generated in mid-December 2011. This list of DNA polymorphisms was generated by Tim McNellis and McNellis lab personnel using NextGene software available at the Penn State Genomics Core Facility.

B. Data examination for robust polymorphisms

In January 2012, students enrolled in PPATH 417 Phytobacteriology were given the list of about 2,000 DNA polymorphisms in E. amylovora strain 06P1 and were asked to determine which of the predicted changes were robustly supported by the raw genome sequence data. In this activity, students had to judge how strongly the raw data supported each predicted polymorphism. This was done by visually assessing how frequently the polymorphism was detected during the genome sequencing. The PGM Ion Torrent technology provides a huge amount of DNA sequence information, including from 10 to 200 assessments of each base position of the genome. However, the data can be noisy. In determining how robust a polymorphism was, that is, how likely it was to be a true polymorphism and not a result of sequencing noise, students took into account how frequently the polymorphic base was identified among the ~100 times it was assayed, among other data factors. Students were guided by the professor in these assessments, but they had to come up with their own parameters for assessing DNA polymorphism data robustness. There was no definite right answer, and this exercise helped students understand that in research, sometimes there is no obvious perfect answer or approach, and you have to proceed logically with the best available procedure. Initially, students were surprised by this, even frustrated, but they soon got the hang of it. Over the course of about 4 weeks, the students sifted through all of the DNA polymorphisms and produced a list of “high-confidence” polymorphisms in the super-virulent 06P1 E. amylovora strain. They organized themselves into small groups to divide up the work and devised a double-check system to verify each other’s determinations. This activity fostered a positive, collaborative dynamic among the students in the course. In the end, about 200 polymorphisms with robust genome data support were identified.
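The students made these calls by eye, using criteria they devised themselves, so the following is only an illustrative sketch of how such a judgment could be written down as a rule. The `Candidate` record, the `is_robust` function, and both threshold values are assumptions introduced for this example; they are not the parameters the class actually used. The sketch relies on the idea that, in a single-chromosome bacterial genome, a true polymorphism should appear in most of the reads covering that position.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One predicted polymorphism (hypothetical record layout)."""
    position: int   # 1-based genome coordinate
    ref_base: str   # base in the reference strain
    alt_base: str   # polymorphic base predicted in 06P1
    depth: int      # number of reads covering this position
    alt_reads: int  # number of those reads showing the polymorphic base


def is_robust(c, min_depth=20, min_alt_fraction=0.8):
    """Return True if the candidate passes both illustrative thresholds.

    min_depth and min_alt_fraction are assumed values for demonstration,
    not the criteria chosen by the PPATH 417 students.
    """
    if c.depth < min_depth:
        return False  # too few reads to judge either way
    return c.alt_reads / c.depth >= min_alt_fraction


# Made-up example data, not real 06P1 results.
candidates = [
    Candidate(1200, "A", "G", depth=100, alt_reads=96),  # well supported
    Candidate(5400, "C", "T", depth=100, alt_reads=12),  # likely noise
    Candidate(9100, "G", "A", depth=8, alt_reads=8),     # too little coverage
]

robust = [c for c in candidates if is_robust(c)]
print([c.position for c in robust])  # [1200]
```

Changing the two thresholds changes which candidates survive, which mirrors the point of the exercise: there is no single correct cutoff, only a defensible one.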


C. Predictions of which polymorphisms might contribute to the super-virulence of E. amylovora strain 06P1.

Students then looked through the list of robust polymorphisms for ones that would be predicted to have biological effects. The major factor in making this decision was whether the polymorphisms affected predicted protein sequences and structures encoded by E. amylovora genes. This required using the published E. amylovora genome sequence available online through GenBank. This process took about 3 weeks of work, including time spent during course lab sessions. Students brought their laptop computers or other wireless devices into the lab, and we used the Buckhout Lab wireless system so that everyone could connect to GenBank, compare results, and work together. In particular, students decided they wanted to look at polymorphisms affecting the structures of proteins known to be involved in the fire blight disease process. This whittled down the set of polymorphisms to about 10 that students thought could potentially be changing or boosting the virulence of 06P1 relative to other sequenced E. amylovora strains.
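The students worked through GenBank records by hand, but the core question, whether a single-base change alters the encoded protein, can be illustrated with a short sketch. The function name `classify_snp`, the toy coding sequence, and the three-way labels are assumptions made for this example. The codon table is the standard genetic code, whose amino acid assignments are the same as those in the bacterial translation table.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_snp(cds, offset, alt_base):
    """Classify a single-base change within a coding sequence.

    cds      -- coding sequence on the coding strand, starting at the start codon
    offset   -- 0-based position of the changed base within cds
    alt_base -- the polymorphic base
    """
    pos_in_codon = offset % 3
    codon_start = offset - pos_in_codon
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:pos_in_codon] + alt_base + ref_codon[pos_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"  # protein sequence unchanged
    if alt_aa == "*":
        return "nonsense"    # introduces a premature stop codon
    return "missense"        # amino acid changes


# Toy sequence (not an E. amylovora gene): ATG GCT GAA TAA = Met Ala Glu stop
toy_cds = "ATGGCTGAATAA"
print(classify_snp(toy_cds, 5, "C"))  # synonymous (GCT -> GCC, both Ala)
print(classify_snp(toy_cds, 3, "A"))  # missense   (GCT -> ACT, Ala -> Thr)
print(classify_snp(toy_cds, 6, "T"))  # nonsense   (GAA -> TAA, Glu -> stop)
```

This sketch handles only single-base substitutions inside a gene. Insertions, deletions, changes to an existing stop codon, and changes in promoter or other non-coding regions would need separate treatment.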

D. Relating the genome sequence analysis to other activities in PPATH 417

Another activity in PPATH 417 during spring 2012 was a genetic screen, carried out over the first half of the semester, for mutants of strain 06P1 with a loss or reduction of virulence. The class found over a dozen mutants, and the genes affected in these mutants were determined outside of class by Dr. McNellis’ group. Students had to locate the exact position of the mutations in the E. amylovora genome using online analysis tools available through GenBank. On their own, students decided to look for polymorphisms in some of the genes they had identified in the mutant screen. They found polymorphisms, but none that were expected to affect protein structure or gene function. Nevertheless, this was an example of students synthesizing tools and information and developing their own approaches using the 06P1 genome sequence that they were analyzing.
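The cross-check between the two activities amounts to asking which polymorphisms fall inside the genes found in the mutant screen. A minimal sketch of that lookup is shown here, assuming gene boundaries are available as start and end coordinates. The gene names, coordinates, and polymorphism positions are all hypothetical placeholders, not data from the course.

```python
# Hypothetical gene intervals (1-based, inclusive start and end coordinates).
screen_genes = {
    "geneA": (1000, 1900),
    "geneB": (2500, 3400),
}

# Hypothetical positions of high-confidence polymorphisms.
polymorphism_positions = [1500, 2200, 3400]


def genes_hit(position, genes):
    """Return the names of genes whose interval contains the position."""
    return [name for name, (start, end) in genes.items()
            if start <= position <= end]


hits = {pos: genes_hit(pos, screen_genes) for pos in polymorphism_positions}
print(hits)  # {1500: ['geneA'], 2200: [], 3400: ['geneB']}
```

A polymorphism that lands inside a gene still has to be checked for its effect on the protein, which is why the students found polymorphisms in these genes but none expected to change gene function.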


II. Educational Outcomes

The integration of genomics approaches into PPATH 417 was very successful. Students achieved several major learning milestones and objectives.

  • They learned how to approach and process the huge amounts of data produced by a state-of-the-art next-generation genome sequencing method.
  • They learned how to assess data quality from a next-generation DNA sequencing platform.
  • They learned how to derive biological meaning and functional predictions from whole-genome data for a microbe.
  • They devised their own ways to work together as a group to reach these objectives.

III. Repeated Use

The genome sequence of 06P1 that was made possible through this Student Technology Fee program will be used for future offerings of PPATH 417. The activity of sifting through the raw data, assessing biological function, and working as a team will be repeated by the next set of students. Since there is no clear right answer, students will be able to repeat the exercise in their own way each time PPATH 417 is taught. Thus, the Student Technology Fee Program has created a resource that will enrich PPATH 417 in subsequent years, in addition to enriching PPATH 417 during the spring of 2012.