A phylogeny built from one or a few genes can give a misleading picture of the relationships among species. A more accurate reconstruction uses many nuclear genes, but sequencing them is not cost-effective with traditional methods based on PCR and Sanger sequencing. Likewise, sequencing full genomes with high-throughput methods is not yet feasible for systematics in non-model organisms.
One compromise is to bias high-throughput sequencing libraries so that they contain a reduced representation of the genome. This technique, known as targeted sequencing, is a cost-effective way to sample hundreds of loci from dozens of samples simultaneously. There are several targeted-sequencing techniques, including Anchored Phylogenetics (also known as Ultra-Conserved Elements) and RAD-cap (the capture of restriction-digest associated elements). For plant phylogenetics, HybSeq, the targeting of exons and flanking intron regions, has proven highly effective. In our lab we focus on three aspects of HybSeq targeted sequencing: probe design, the use of herbarium specimens, and data analysis pipelines.
After sequencing hundreds of genes from dozens of individuals, the challenge is to turn the sequencing reads into data files ready for phylogenetic analysis. We designed HybPiper to process reads efficiently in three stages: read sorting, contig assembly, and exon extraction. We also designed scripts for extracting intron sequences, detecting paralogous sequences, calculating efficiency statistics, and visualizing data.
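To make the read-sorting stage and an efficiency statistic concrete, here is a toy sketch in Python. It is not HybPiper's code. Shared k-mer counting stands in for whatever alignment step a real pipeline would use, and the function names (`sort_reads`, `on_target_fraction`), the k-mer size, and the example sequences are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sort_reads(reads, targets, k=8):
    """Assign each read to the target gene that shares the most k-mers with it.

    reads:   dict mapping read name -> sequence
    targets: dict mapping gene name -> reference sequence
    Returns (bins, unassigned), where bins maps gene -> list of read names
    and unassigned lists reads that share no k-mer with any target.
    """
    target_kmers = {gene: kmers(seq, k) for gene, seq in targets.items()}
    bins = defaultdict(list)
    unassigned = []
    for name, seq in reads.items():
        read_kmers = kmers(seq, k)
        best_gene, best_hits = None, 0
        for gene, gene_kmers in target_kmers.items():
            hits = len(read_kmers & gene_kmers)
            if hits > best_hits:  # ties keep the first gene seen
                best_gene, best_hits = gene, hits
        if best_gene is None:
            unassigned.append(name)
        else:
            bins[best_gene].append(name)
    return dict(bins), unassigned


def on_target_fraction(bins, unassigned):
    """Fraction of all reads that were assigned to some target gene."""
    assigned = sum(len(names) for names in bins.values())
    total = assigned + len(unassigned)
    return assigned / total if total else 0.0


# Hypothetical example data: two short made-up "genes" and three reads.
example_targets = {
    "geneA": "ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG",
    "geneB": "ATGAAACGCATTAGCACCACCATTACCACCACCATCACC",
}
example_reads = {
    "read1": example_targets["geneA"][5:25],   # drawn from geneA
    "read2": example_targets["geneB"][10:30],  # drawn from geneB
    "read3": "TTTTTTTTTTTTTTTTTTTT",            # off-target read
}

example_bins, example_unassigned = sort_reads(example_reads, example_targets)
print(example_bins, example_unassigned)
print(on_target_fraction(example_bins, example_unassigned))
```

Each per-gene bin of reads would then be passed on to contig assembly and exon extraction. The on-target fraction is one simple example of the kind of efficiency statistic mentioned above.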
Our future directions include incorporating allelic information into phylogenetic analysis, correcting errors in contig assembly, and improving the accuracy of assembly from herbarium specimens.