Numerous studies covering aspects of SARS-CoV-2 data analysis are being
published on a daily basis, including a regularly updated phylogeny on
nextstrain.org. Here, we review the difficulties of
inferring reliable phylogenies, using as an example a data snapshot comprising
all virus sequences available from gisaid.org on May 5, 2020. We find that it
is difficult to infer a reliable phylogeny on these data due to the large number
of sequences in conjunction with the low number of mutations. We further find
that rooting the inferred phylogeny with some degree of confidence either via
the bat and pangolin outgroups or by applying novel computational methods to the
ingroup phylogeny does not appear to be possible. Finally, an automatic
classification of the current sequences into sub-classes based on statistical
criteria is also not possible, as the sequences are too closely related. We
conclude that, although the application of phylogenetic methods to disentangle
the evolution and spread of COVID-19 provides some insight, results of
phylogenetic analyses, in particular those conducted under the default settings
of current phylogenetic inference tools, as well as downstream analyses on the
inferred phylogenies, should be considered and interpreted with extreme
caution.