Recent research on deep learning has been applied to a diversity of fields. In particular, numerous studies have been conducted on self-driving vehicles using end-to-end approaches based on images captured by a single camera. End-to-end controls learn the output vectors of output devices directly from the input vectors of available input devices. In other words, an end-to-end approach learns not by analyzing the meaning of input vectors, but by extracting optimal output vectors based on input vectors. Generally, when end-to-end control is applied to self-driving vehicles, the steering wheel and pedals are controlled autonomously by learning from the images captured by a camera. However, high-resolution images captured from a car cannot be directly used as inputs to Convolutional Neural Networks (CNNs) owing to memory limitations; the image size needs to be efficiently reduced. Therefore, it is necessary to extract features from captured images automatically and to generate input images by merging the parts of the images that contain the extracted features. This paper proposes a learning method for end-to-end control that generates input images for CNNs by extracting road parts from input images, identifying the edges of the extracted road parts, and merging the parts of the images that contain the detected edges. In addition, a CNN model for end-to-end control is introduced. Experiments involving the Open Racing Car Simulator (TORCS), a sustainable computing environment for cars, confirmed the effectiveness of the proposed method for self-driving by comparing the accumulated difference in the angle of the steering wheel in the images generated by it with those of resized images containing the entire captured area and cropped images containing only a part of the captured area. The results showed that the proposed method reduced the accumulated difference by 0.839% and 0.850% compared to those yielded by the resized images and cropped images, respectively.