Deep learning has increasingly been at the center of efforts to leverage the clinical time series recorded longitudinally in Electronic Health Records (EHRs) to forecast patient survival and vital-sign deterioration. However, the irregular recording frequency and noisiness of these data hinder the adoption of recently proposed benchmarks. Recurrent Neural Networks (RNNs), especially Long Short-Term Memory networks (LSTMs), have achieved strong results in recent studies, but they are hard to train and interpret and fail to properly capture long-term dependencies. Moreover, their inherently sequential processing rules out parallelization, which is particularly limiting for long clinical time series. The Transformer architecture, originally proposed for Natural Language Processing (NLP) tasks, has recently achieved state-of-the-art results. Hence, to address the drawbacks of RNNs, we propose a clinical time-series Multi-Head Transformer (MHT), a Transformer-based model that forecasts a patient's future time-series variables from their vital signs. To demonstrate the generality of the model, we apply the same architecture to other critical tasks describing an Intensive Care Unit (ICU) patient's progression and the associated risks: the remaining Length of Stay (LoS), in-hospital mortality, and 24-hour mortality. Our model achieves an Area Under the Receiver Operating Characteristic curve (AUC-ROC) of 0.98 and an Area Under the Precision-Recall curve (AUC-PR) of 0.424 for vital-sign time-series prediction, and an AUC-ROC of 0.875 for mortality prediction. The model performs well on frequently recorded variables such as Heart Rate (HR), but performs only on par with its LSTM counterparts on intermittently recorded variables such as White Blood Cell count (WBC).
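As a rough illustration of the multi-head self-attention mechanism at the core of such a forecasting model, the sketch below applies one attention layer to a window of vital-sign readings and emits a next-step prediction. This is a minimal NumPy sketch under assumed shapes and randomly initialized weights, not the authors' implementation; all names (`Wq`, `W_out`, the 48-step window, 8 vitals, 4 heads) are hypothetical.

```python
# Hypothetical sketch: single-layer multi-head self-attention over a
# window of vital-sign readings, producing a next-step forecast.
# Weights are random stand-ins for parameters a framework would learn.
import numpy as np

rng = np.random.default_rng(0)

T, D, H = 48, 8, 4     # time steps, number of vitals, attention heads
d_h = D // H           # per-head dimension

x = rng.standard_normal((T, D))  # one patient's vitals window

# Projection weights (learned in practice, random here).
Wq, Wk, Wv, Wo = (rng.standard_normal((D, D)) * 0.1 for _ in range(4))
W_out = rng.standard_normal((D, D)) * 0.1  # forecasting head

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x):
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    # Split each projection into H heads: (H, T, d_h).
    split = lambda m: m.reshape(T, H, d_h).transpose(1, 0, 2)
    q, k, v = split(q), split(k), split(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_h)  # (H, T, T)
    ctx = softmax(scores) @ v                         # (H, T, d_h)
    ctx = ctx.transpose(1, 0, 2).reshape(T, D)        # concatenate heads
    return ctx @ Wo

h = multi_head_self_attention(x)
forecast = h[-1] @ W_out  # predicted vitals vector for step T + 1
print(forecast.shape)     # (8,)
```

Because attention scores every pair of time steps at once, the whole window is processed in parallel, in contrast to the step-by-step recurrence of an LSTM.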