Objective: To develop a system that accepts a sentence spoken in two to four languages and converts it into text in a target language, termed the Cross Language Speech Identification and Text Translation (CLSITT) System.
Methods: A combinatorial model consisting of a Hidden Markov Model, Artificial Neural Networks, Deep Neural Networks and a Gaussian Mixture Model is utilized for direct and indirect speech mapping. A training dataset of a thousand phonemes for each of the Hindi, Telugu, English and Kannada languages was built, initially for the bank and hospital domains; later, the grammatical phonemes of each language were added, and wave files of cross-lingual spoken sentences were recorded. Building this dataset from scratch took six months, as no cross-lingual vocal dataset was available. The Hindi-language dataset Shabdanjali was also consulted. The basic parameters considered in creating the structured dataset are loudness, pause, pitch, tone, noise cancellation, sampling frequency, threshold, etc.
Findings: Comparative analyses of the various techniques, target languages and features are tabulated. The research idea emerged from a comparative analysis of monolingual systems, which revealed a gap in cross-lingual speech-to-text translation. The architecture can be extended in future to other regional languages of India.
Novelty: A new benchmark cross-language dataset was created. This work presents the CLSITT tool for transforming public speeches spoken in multiple languages into a selected target language; the tool is useful for regional news editors, rural and agricultural activities, medical applications, defence, and so on.
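The abstract describes a pipeline in which the language of each spoken segment is first identified and the segment is then mapped to the target language. As a purely illustrative sketch (not the authors' implementation, which uses HMM/GMM acoustic models over phonemes), the word-level flow can be imagined as lexicon-based language identification followed by dictionary lookup; all lexicon entries below are hypothetical toy data:

```python
# Illustrative sketch only: identify each word's source language from a
# (hypothetical) lexicon, then map it to a target language (English here).
# A real system would classify phoneme sequences with HMM/GMM models.

LEXICON = {
    "hindi":   {"paani": "water", "ghar": "house"},
    "telugu":  {"illu": "house", "neeru": "water"},
    "english": {"water": "water", "house": "house"},
}

def identify_language(word):
    """Return the first language whose lexicon contains the word, else None."""
    for lang, vocab in LEXICON.items():
        if word in vocab:
            return lang
    return None

def translate_to_english(sentence):
    """Map each word of a mixed-language sentence to the target language."""
    out = []
    for word in sentence.lower().split():
        lang = identify_language(word)
        # Unknown words are passed through unchanged.
        out.append(LEXICON[lang][word] if lang else word)
    return " ".join(out)

print(translate_to_english("paani illu"))  # -> "water house"
```

In the actual CLSITT system, the per-word language decision would come from acoustic-model likelihoods rather than a lexicon lookup, but the overall identify-then-translate structure is the same.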