Good indexing of learning objects is the best way to guarantee their reuse in distance learning. Each content item needs a machine-understandable description that includes both technological and pedagogical information, states the requirements and limits for its correct use, and supports search and delivery. These descriptions are stored as metadata, in standards-based data structures. Filling in the metadata is a tedious and time-consuming activity, but it is very important because, in learner-centered processes, it can influence the choice of the best material to deliver. This paper describes a possible methodological approach to automating this activity by extracting metadata directly from the files that make up the learning object itself. The literature offers many methods that automatically characterize the technological aspects of content (format, dimensions, HW and SW requirements, etc.), but very few of them provide information about its pedagogical features (educational style, semantic density, difficulty, time to learn, interactivity level, etc.). The proposed approach tries to combine information theory, learning models, statistical analysis and ad hoc heuristics to extract a wide set of metadata fields. The results of a first experiment are particularly encouraging and suggest that this approach could enrich content management systems and, in particular, e-learning platforms that must manage large content repositories and a huge number of users with varied personal characteristics, interaction devices and goals, as in MOOCs.