Simultaneous reconstruction of activity and attenuation using the maximum-likelihood reconstruction of activity and attenuation (MLAA) augmented by time-of-flight information is a promising method for PET attenuation correction. However, it still suffers from several problems, including crosstalk artifacts, slow convergence speed, and noisy attenuation maps (μ-maps). In this work, we developed deep convolutional neural networks (CNNs) to overcome these MLAA limitations, and we verified their feasibility using a clinical brain PET dataset. We applied the proposed method to one of the most challenging PET cases for simultaneous image reconstruction (F-fluorinated--3-fluoropropyl-2-β-carboxymethoxy-3-β-(4-iodophenyl)nortropane [F-FP-CIT] PET scans with highly specific binding to striatum of the brain). Three different CNN architectures (convolutional autoencoder [CAE], Unet, and Hybrid of CAE) were designed and trained to learn a CT-derived μ-map (μ-CT) from the MLAA-generated activity distribution and μ-map (μ-MLAA). The PET/CT data of 40 patients with suspected Parkinson disease were used for 5-fold cross-validation. For the training of CNNs, 800,000 transverse PET and CT slices augmented from 32 patient datasets were used. The similarity to μ-CT of the CNN-generated μ-maps (μ-CAE, μ-Unet, and μ-Hybrid) and μ-MLAA was compared using Dice similarity coefficients. In addition, we compared the activity concentration of specific (striatum) and nonspecific (cerebellum and occipital cortex) binding regions and the binding ratios in the striatum in the PET activity images reconstructed using those μ-maps. The CNNs generated less noisy and more uniform μ-maps than the original μ-MLAA. Moreover, the air cavities and bones were better resolved in the proposed CNN outputs. In addition, the proposed deep learning approach was useful for mitigating the crosstalk problem in the MLAA reconstruction. The Hybrid network of CAE and Unet yielded the most similar μ-maps to μ-CT (Dice similarity coefficient in the whole head = 0.79 in the bone and 0.72 in air cavities), resulting in only about a 5% error in activity and binding ratio quantification. The proposed deep learning approach is promising for accurate attenuation correction of activity distribution in time-of-flight PET systems.