Picture description tasks are used to detect cognitive decline associated with Alzheimer's disease (AD). Recent years have seen work on automatic AD detection in picture descriptions based on acoustic and word-based analysis of the speech. These methods have shown some success but lack the ability to capture higher-level effects of cognitive decline on the patient's language. In this paper, we propose a novel model that encompasses both the hierarchical and sequential structure of the description and detects its informative units via an attention mechanism. Automatic speech recognition (ASR) and punctuation restoration are used to transcribe and segment the data. Using the DementiaBank database of people with AD as well as healthy controls (HC), we obtain F-scores of 84.43% and 74.37% with manual and automatic transcripts, respectively. We further explore the effect of additional training data (a total of 33 descriptions collected using a 'digital doctor') and increase the F-score with ASR transcripts to 76.09%. This outperforms baseline models, including a bidirectional LSTM and a bidirectional hierarchical neural network without an attention mechanism, demonstrating that hierarchical models with an attention mechanism improve AD/HC discrimination performance.
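To make the architecture concrete, the sketch below shows one plausible form of a bidirectional hierarchical attention classifier of the kind described: a word-level BiLSTM with attention summarizes each utterance, and an utterance-level BiLSTM with attention summarizes the whole description for AD/HC classification. This is not the authors' implementation; it is a minimal sketch assuming PyTorch, and all class names, layer sizes, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch of a bidirectional hierarchical attention classifier.
# NOT the authors' code: layer sizes, names, and vocabulary are assumptions.
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Additive attention that pools a sequence of hidden states into one vector."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Linear(dim, 1, bias=False)

    def forward(self, h):                                  # h: (batch, seq, dim)
        scores = torch.tanh(self.proj(h))
        weights = torch.softmax(self.context(scores), dim=1)  # weight per position
        return (weights * h).sum(dim=1)                    # (batch, dim)

class HierarchicalAttentionClassifier(nn.Module):
    """Word-level BiLSTM + attention, then utterance-level BiLSTM + attention."""
    def __init__(self, vocab_size, emb_dim=100, hid_dim=64, n_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.word_rnn = nn.LSTM(emb_dim, hid_dim, bidirectional=True, batch_first=True)
        self.word_attn = Attention(2 * hid_dim)
        self.utt_rnn = nn.LSTM(2 * hid_dim, hid_dim, bidirectional=True, batch_first=True)
        self.utt_attn = Attention(2 * hid_dim)
        self.classify = nn.Linear(2 * hid_dim, n_classes)

    def forward(self, tokens):                 # tokens: (batch, n_utts, n_words)
        b, u, w = tokens.shape
        words = self.embed(tokens.view(b * u, w))          # embed every word
        word_h, _ = self.word_rnn(words)
        utt_vecs = self.word_attn(word_h).view(b, u, -1)   # one vector per utterance
        utt_h, _ = self.utt_rnn(utt_vecs)
        doc_vec = self.utt_attn(utt_h)                     # one vector per description
        return self.classify(doc_vec)                      # AD vs. HC logits

# Example: 4 descriptions, each segmented (e.g. by punctuation restoration)
# into 6 utterances of 12 tokens.
model = HierarchicalAttentionClassifier(vocab_size=5000)
logits = model(torch.randint(1, 5000, (4, 6, 12)))
print(logits.shape)  # torch.Size([4, 2])
```

The attention weights at both levels are what allow the model to highlight the informative units of a description, in contrast to a flat bidirectional LSTM baseline, which pools the token sequence without any utterance-level structure.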