Tumor samples are heterogeneous. They consist of different subclones that
are characterized by differences in DNA nucleotide sequences and copy numbers on
multiple loci. Heterogeneity can be measured through the identification of the
subclonal copy number and sequence at a selected set of loci. Because the
accurate identification of variant allele fractions depends strongly on a
precise determination of copy numbers, we develop a Bayesian feature allocation
model for jointly calling subclonal copy numbers and the corresponding allele
sequences for the same loci. The proposed method utilizes three random matrices,
L, Z and w, to represent subclonal copy numbers (L), numbers of subclonal
variant alleles (Z) and cellular fractions of subclones in samples (w),
respectively. The unknown number of
subclones implies a random number of columns for these matrices. We use
next-generation sequencing data to estimate the subclonal structures through
inference on these three matrices. Using simulation studies and a real data
analysis, we demonstrate how posterior inference on the subclonal structure is
enhanced with the joint modeling of both structure and sequencing variants on
subclonal genomes. Software is available at http://compgenome.org/BayClone2.
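
As a rough illustration of why copy numbers and variant alleles are called jointly, the sketch below combines toy versions of the three matrices into an expected variant allele fraction per locus and sample. The mixing rule used here (subclone-weighted variant copies divided by subclone-weighted total copies) is an assumption made for illustration and is not stated in this abstract; the full model may include further terms such as a background subclone or sample-specific scaling. All values and the helper names mix and expected_vaf are hypothetical.

```python
# Hypothetical toy data: 4 loci (rows), 2 subclones (columns), 3 samples.
# All numbers are invented for illustration only.

# L[s][c]: copy number of subclone c at locus s
L = [[2, 3], [2, 2], [1, 2], [2, 4]]
# Z[s][c]: number of those copies carrying the variant allele (0 <= Z <= L)
Z = [[0, 1], [1, 2], [0, 0], [2, 2]]
# w[t][c]: cellular fraction of subclone c in sample t (each row sums to 1)
w = [[0.7, 0.3], [0.5, 0.5], [0.1, 0.9]]


def mix(matrix, weights):
    """Return result[s][t] = sum over c of weights[t][c] * matrix[s][c]."""
    return [
        [sum(wc * mc for wc, mc in zip(sample, locus)) for sample in weights]
        for locus in matrix
    ]


def expected_vaf(L, Z, w):
    """Assumed mixing rule: weighted variant copies / weighted total copies."""
    mean_copies = mix(L, w)
    mean_variant = mix(Z, w)
    return [
        [v / m if m > 0 else 0.0 for v, m in zip(v_row, m_row)]
        for v_row, m_row in zip(mean_variant, mean_copies)
    ]


vaf = expected_vaf(L, Z, w)
for s, row in enumerate(vaf):
    print(f"locus {s}:", [round(p, 3) for p in row])
```

In this toy setting, changing an entry of L while holding Z and w fixed changes the expected fraction at that locus, which is the dependence on copy number that motivates joint inference.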