The human mind represents objects from the physical world in an abstract, high-dimensional object space, with a finite number of orthogonal axes encoding critical object features. Previous fMRI studies have shown that the middle fusiform sulcus in the ventral temporal cortex separates the map of real-world small objects from that of large objects. Here we used deep convolutional neural networks (DCNNs) to ask whether objects' real-world size forms an axis of object space, based on three criteria (sensitivity, independence, and necessity) that are impractical to examine together with traditional approaches. A principal component analysis (PCA) on features extracted by the DCNNs showed that objects' real-world size was encoded by an independent component, and removing this component significantly impaired the DCNNs' object recognition performance. By manipulating stimuli, we found that the shape and texture of objects, rather than retinal size, co-occurrence, or task demands, accounted for the representation of real-world size in the DCNNs. A follow-up fMRI experiment further demonstrated that humans rely on shape, but not texture, to infer the real-world size of objects. In short, combining computational modeling with empirical human experiments, our study provided the first evidence that objects' real-world size serves as an axis of object space, and devised a novel paradigm for future exploration of the structure of object space.
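
To make the sensitivity and necessity tests concrete, the sketch below illustrates the general logic rather than the authors' actual pipeline: run a PCA over DCNN features, search for the component whose scores track real-world size, then ablate that component and re-test recognition. All variable names, the layer dimensionality, and the logistic-regression readout are illustrative assumptions; the random placeholders stand in for real DCNN activations, size labels, and object categories.

```python
# Minimal sketch of the sensitivity/necessity logic described above;
# not the authors' code. Real DCNN activations and labels would
# replace the random placeholders below.
import numpy as np
from scipy.stats import pearsonr
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_objects, n_units = 500, 4096                    # hypothetical layer size
features = rng.normal(size=(n_objects, n_units))  # stand-in for DCNN features
log_size = rng.normal(size=n_objects)             # stand-in for log real-world size
labels = rng.integers(0, 10, size=n_objects)      # stand-in for object categories

# Sensitivity: PCA the features and find the component whose scores
# correlate most strongly with real-world size. Independence follows
# from the orthogonality of principal components.
pca = PCA(n_components=50)
scores = pca.fit_transform(features)              # (n_objects, 50)
corrs = [abs(pearsonr(scores[:, k], log_size)[0]) for k in range(scores.shape[1])]
size_pc = int(np.argmax(corrs))

# Necessity: zero the size component, reconstruct the feature matrix,
# and compare recognition accuracy before vs. after the ablation.
ablated_scores = scores.copy()
ablated_scores[:, size_pc] = 0.0
ablated_features = ablated_scores @ pca.components_ + pca.mean_

def recognition_accuracy(X, y):
    """Held-out accuracy of a simple linear readout (illustrative)."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

print("intact: ", recognition_accuracy(features, labels))
print("ablated:", recognition_accuracy(ablated_features, labels))
```

With real activations, a selective drop in accuracy after ablating the size component, relative to ablating a matched non-size component, would support the necessity criterion.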