With the proliferation of news documents on the Internet, online news reading has become an important means of acquiring information in people’s daily lives. There has, however, been increasing concern about the growing spread of misinformation. As a complement to news text, associated photos give readers additional information that helps them find what they need. Geographic content is crucial for contextualising the vast amount of news published worldwide. It also plays an important role in news recommendation, where it helps to meet user needs. Existing approaches to geolocation estimation are primarily based on either text or photos, treated as separate tasks. However, news photos can lack geographical cues, and text can mention multiple locations. It is therefore challenging to recognise the focus location of a news story from only one modality. We introduce novel datasets for multimodal geolocation estimation of news documents. We evaluate current methods on these benchmark datasets and propose new methods for news geolocalisation using textual and visual content. In addition, we introduce GeoWINE, a news retrieval system based on the geographic content of news photos, to emphasise the importance of geolocation estimation in the news domain.