In recent study efforts, the importance of text identification and recognition in images of natural scenes has been stressed more and more. Natural scene text contains an enormous amount of useful semantic data that can be applied in a variety of vision-related applications. The detection of shape-robust text confronts two major challenges: 1. A large number of traditional quadrangular bounding box-based detectors failed to identify text with irregular forms, making it difficult to include such text within perfect rectangles.2. Pixel-wise segmentation-based detectors sometimes struggle to identify closely positioned text examples from one another. Understanding the surroundings and extracting information from images of natural scenes depends heavily on the ability to detect and recognise text. Scene text can be aligned in a variety of ways, including vertical, curved, random, and horizontal alignments. This paper has created a novel method, the Transformation Scaling Extention Algorithm (TSEA), for text detection using a mask-scoring R-ConvNN (Region Convolutional Neural Network). This method works exceptionally well at accurately identifying text that is curved and text that has multiple orientations inside real-world input images. This study incorporates a mask-scoring R-ConvNN network framework to enhance the model's ability to score masks correctly for the observed occurrences. By providing more weight to accurate mask predictions, our scoring system eliminates inconsistencies between mask quality and score and enhances the effectiveness of instance segmentation. This paper also incorporates a Pyramid-based Text Proposal Network (PBTPN) and a Transformation Component Network (TCN) to enhance the feature extraction capabilities of the mask-scoring R-ConvNN for text identification and segmentation with the TSEA. Studies show that Pyramid Networks are especially effective in reducing false alarms caused by images with backgrounds that mimic text. On benchmark datasets ICDAR 2015, SCUT-CTW1500 containing multi-oriented and curved text, this method outperforms existing methods by conducting extensive testing across several scales and utilizing a single model. This study expands the field of vision-oriented applications by highlighting the growing significance of effectively locating and detecting text in natural situations.