Identification of malware is a critical problem in computer security. Many signature-identification, behavior-recognition, and reputation-based tools are available for host-based detection. However, systems today hold so many files that checking all of them is time-consuming, and better methods are needed to suggest which files have the highest priority for checking in partial scans. This work developed and tested local contextual clues to malware in file-system metadata on an international corpus of 248 million files on 3961 drives. Five methods found 398,949 hash values of malware in this corpus, and three methods chose 3,681,211 hash values of non-malware for comparison. Malware identification rates were compared for the fifteen combinations of methods and were cross-correlated for different types of drives and file types. Results showed that different malware identification methods flag significantly different sets of files. Then the strength of particular local clues in file metadata (directory and file names, sizes, times, and hash values) was assessed, and results were compared for the fifteen combinations. Some classic clues (e.g., rare file extensions and deletion status) were confirmed and others (e.g., double extensions and occurrence in the operating system) were not. With this data, a program was implemented to estimate the likelihood that a given file is malware based solely on its metadata context. On three random subsets of our corpus, our methods gave 51 times better precision (the fraction of files identified as malware that actually are malware) with 70% better recall (the fraction of malware detected) than the approach of inspecting executables alone. They also ran significantly faster than signature checking and can be used before other kinds of malware analysis.
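
The two evaluation measures defined above can be illustrated with a minimal Python sketch. It is not the program used in this work: the function name `precision_recall` and the toy file names are hypothetical and chosen only for illustration, and the sketch assumes that files are compared as simple set members (for example, by hash value).

```python
def precision_recall(flagged, malware):
    """Return (precision, recall) for files flagged as malware.

    flagged: identifiers of files the method identified as malware
    malware: identifiers of files known to be malware
    Both arguments are hypothetical inputs used only for illustration.
    """
    flagged = set(flagged)
    malware = set(malware)
    true_positives = len(flagged & malware)
    # Precision: fraction of flagged files that are malware.
    precision = true_positives / len(flagged) if flagged else 0.0
    # Recall: fraction of all malware that was flagged.
    recall = true_positives / len(malware) if malware else 0.0
    return precision, recall


# Toy data, not drawn from the corpus described above.
flagged = {"a.exe", "b.dll", "c.tmp", "d.scr"}
malware = {"b.dll", "d.scr", "e.js"}
precision, recall = precision_recall(flagged, malware)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case, two of the four flagged files are malware, and two of the three malware files are flagged. A method that flags fewer non-malware files raises precision, and one that misses fewer malware files raises recall.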