Per-pixel (or single-instance) classification schemes, which have proven very useful in thematic classification, have been shown to be inadequate for analyzing very high resolution remote sensing imagery. The main problem is that the pixel size (less than a meter) is too small compared to the typical object size (hundreds of meters), so a single pixel contains too little contextual information to accurately distinguish complex settlement types. One way to alleviate this problem is to consider a larger window or patch/segment consisting of a group of adjacent pixels, which offers better spatial context than a single pixel. Unfortunately, this makes per-pixel classification schemes ineffective. In this work, we look at a new class of machine learning approaches, called multi-instance learning, in which a class label is assigned to a bag (all pixels in a window or segment) rather than to individual instances (pixels). We applied this multi-instance learning approach to identify two important urban patterns, namely formal and informal settlements. Experimental evaluation shows that multi-instance learning outperforms several well-known single-instance classification schemes.
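To make the bag representation concrete, the following is a minimal sketch of how an image might be grouped into bags of pixels, one bag per window. It is an illustration only, not the procedure used in this work: the function name `image_to_bags`, the use of non-overlapping square windows, and the toy image are all assumptions introduced here.

```python
import numpy as np


def image_to_bags(image, window):
    """Group an (H, W, C) image into non-overlapping window x window bags.

    Hypothetical helper for illustration. Returns an array of shape
    (n_bags, window * window, C): each bag holds the feature vectors
    (band values) of all pixels in one window. Edge rows and columns
    that do not fill a whole window are dropped.
    """
    h, w, c = image.shape
    rows, cols = h // window, w // window
    trimmed = image[: rows * window, : cols * window]
    # (rows, window, cols, window, C) -> (rows, cols, window, window, C)
    blocks = trimmed.reshape(rows, window, cols, window, c).swapaxes(1, 2)
    # Flatten each window into a bag of window * window pixel instances.
    return blocks.reshape(rows * cols, window * window, c)


# Toy 4 x 6 image with 3 bands (assumed values, not real imagery).
image = np.arange(4 * 6 * 3).reshape(4, 6, 3)
bags = image_to_bags(image, window=2)

# Single-instance view: one label per pixel.
n_pixel_labels = image.shape[0] * image.shape[1]
# Multi-instance view: one label per bag (window).
n_bag_labels = bags.shape[0]
```

In this toy case the single-instance setting would need 24 pixel labels, while the multi-instance setting needs only 6 bag labels, each describing a window of 4 pixels together with its spatial context.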