Predicting crystal structure has always been a challenging problem
for the physical sciences. Recently, computational methods have been built
to predict crystal structure with success, but they have been limited in
scope and by computational time. In this paper, we review computational
methods such as density functional theory and machine learning methods
used to predict crystal structure. We also explored the trade-off between
breadth and accuracy in building a model to predict across any crystal structure
using machine learning. We extracted 24 913 unique chemical
formulas existing between 290 and 310 K from the Pearson Crystal Database.
Among these 24 913 formulas, there are 10 711 unique
crystal structures, referred to as entry prototypes. Common entry prototypes
might have hundreds of chemical compositions, while the vast majority
of entry prototypes are represented by fewer than ten unique compositions.
To include all data in our predictions, entry prototypes that lacked
a minimum number of representatives were relabeled as “Other”.
By setting the minimum number to 150, 100, 70, 40, 20, and 10,
we explored how limiting class sizes affected performance. For each
minimum number, we reorganized the data and evaluated three classification
performance metrics: accuracy, precision, and recall. Accuracy ranged
from 97 ± 2 to 85 ± 2%, average precision ranged from 86 ± 2
to 79 ± 2%, and average recall ranged from 73 ± 2 to 54 ± 2%
as the minimum number of class representatives decreased from 150 to 10.
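
The relabeling and scoring steps described above can be illustrated with a minimal sketch. It is not the code used in this work: the function names `relabel_rare` and `macro_metrics`, the toy prototype labels, and the reading of "average" precision and recall as an unweighted (macro) mean over classes are all assumptions made for illustration.

```python
from collections import Counter


def relabel_rare(labels, min_count, other="Other"):
    """Relabel every class with fewer than `min_count` members as `other`."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else other for lab in labels]


def macro_metrics(y_true, y_pred):
    """Return (accuracy, macro precision, macro recall).

    Assumption: "average" means the unweighted mean over all classes.
    A class with no predicted (or no true) members contributes 0.
    """
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls = [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        n_pred = sum(p == c for p in y_pred)           # predicted as c
        n_true = sum(t == c for t in y_true)           # actually c
        precisions.append(tp / n_pred if n_pred else 0.0)
        recalls.append(tp / n_true if n_true else 0.0)

    n = len(classes)
    return accuracy, sum(precisions) / n, sum(recalls) / n


# Toy data with hypothetical prototype labels "A", "B", "C".
prototypes = ["A", "A", "A", "B", "B", "C"]
for min_count in (3, 2):
    print(min_count, relabel_rare(prototypes, min_count))

# Hypothetical true and predicted labels after relabeling.
y_true = ["A", "A", "B", "Other"]
y_pred = ["A", "B", "B", "Other"]
print(macro_metrics(y_true, y_pred))
```

Raising `min_count` moves more compositions into "Other", leaving fewer but better-populated classes; this is the breadth-versus-accuracy trade-off that the reported metrics quantify.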