TITLE RUNNING HEAD: QMR representation of asphaltenes and MD simulation of aggregation.
CORRESPONDING AUTHOR FOOTNOTE: EBoek@slb.com; Tel +44 1223 325222

ABSTRACT
We have developed a computer algorithm that generates Quantitative Molecular Representations (QMR) of asphaltenes from experimental data. First, we generate molecular representations using a Monte Carlo method: aromatic and aliphatic building blocks from an extensive set are sampled randomly from the corresponding distributions and then linked together by a connection algorithm. The building blocks can be taken from a pre-defined inventory or generated at run time. Manually pre-fabricated blocks give the model flexibility, while automatically generated blocks allow us to build large aromatic sheets. Both archipelago and peri-condensed structures can be generated. Then, we use a non-linear optimisation procedure to select a small subset of molecules that best matches the experimental data, which consist of the Molecular Weight (MW), elemental analysis and NMR spectroscopy (both ¹H and ¹³C). The method is validated on a number of single model compounds and then applied to a real asphaltene data set from the literature, with different MW values as input. Two values were examined in detail, MW = 750 and MW = 4190, representing the peri-condensed and archipelago structures respectively. For each MW, we generated 10 sets of 5000 samples, optimised each set with respect to the experimental objective function, and averaged the resulting objective function over all runs. The averaged objective function is significantly smaller for MW = 750 than for MW = 4190, indicating that the lower MW gives the better match with the experimental data.
As an example, one of the optimised QMR asphaltene structures was then used as input for Molecular Dynamics (MD) simulations to study the formation of nano-aggregates.
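The two-stage procedure described above (Monte Carlo generation of molecules from building blocks, then selection of a small subset against experimental targets, averaged over repeated runs) can be illustrated with a small Python sketch. It is not the authors' algorithm. The building-block inventory, the sampling weights, the linking rule (each link removes one H from each partner), the equimolar mixing rule, the relative squared-error objective, the choice of observables (number-average MW, H/C ratio, S wt% and ¹³C aromaticity f_a, with ¹H NMR omitted) and the greedy forward selection used in place of a non-linear optimiser are all assumptions. Every function name is hypothetical, and the target values are placeholders rather than the literature data set.

```python
import random
import statistics
from dataclasses import dataclass

# HYPOTHETICAL building-block inventory: a tiny stand-in for a real block set.
# Atom counts are those of the isolated parent molecule.
@dataclass(frozen=True)
class Block:
    name: str
    c_ar: int  # aromatic carbons
    c_al: int  # aliphatic carbons
    h: int
    s: int = 0

AROMATIC = [Block("benzene", 6, 0, 6), Block("naphthalene", 10, 0, 8),
            Block("pyrene", 16, 0, 10), Block("thiophene", 4, 0, 4, s=1)]
AROMATIC_WEIGHTS = [0.3, 0.3, 0.2, 0.2]  # assumed sampling distribution
ALIPHATIC = [Block("methane", 0, 1, 4), Block("hexane", 0, 6, 14)]
ALIPHATIC_WEIGHTS = [0.6, 0.4]           # assumed sampling distribution

MASS = {"C": 12.011, "H": 1.008, "S": 32.06}

@dataclass(frozen=True)
class Molecule:
    c_ar: int
    c_al: int
    h: int
    s: int

    @property
    def mw(self) -> float:
        return (MASS["C"] * (self.c_ar + self.c_al)
                + MASS["H"] * self.h + MASS["S"] * self.s)

def sample_molecule(rng: random.Random) -> Molecule:
    """Monte Carlo step: draw 1-3 aromatic and 1-4 aliphatic blocks and link them.

    Only atom counts are tracked; the bond topology is not built. A connected
    molecule of n blocks has n - 1 links, and each link is assumed to remove
    one H from each of the two blocks it joins.
    """
    blocks = rng.choices(AROMATIC, AROMATIC_WEIGHTS, k=rng.randint(1, 3))
    blocks += rng.choices(ALIPHATIC, ALIPHATIC_WEIGHTS, k=rng.randint(1, 4))
    links = len(blocks) - 1
    return Molecule(c_ar=sum(b.c_ar for b in blocks),
                    c_al=sum(b.c_al for b in blocks),
                    h=sum(b.h for b in blocks) - 2 * links,
                    s=sum(b.s for b in blocks))

def mixture_properties(mols: list[Molecule]) -> dict[str, float]:
    """Observables of an equimolar mixture (assumed mixing rule)."""
    c_ar = sum(m.c_ar for m in mols)
    c_tot = c_ar + sum(m.c_al for m in mols)
    total_mass = sum(m.mw for m in mols)
    return {
        "MW": total_mass / len(mols),           # number-average MW
        "H/C": sum(m.h for m in mols) / c_tot,  # atomic ratio (elemental analysis)
        "S_wt%": 100.0 * MASS["S"] * sum(m.s for m in mols) / total_mass,
        "f_a": c_ar / c_tot,                    # aromatic C fraction (13C NMR)
    }

def objective(mols: list[Molecule], target: dict[str, float]) -> float:
    """Sum of squared relative deviations from the (non-zero) targets."""
    props = mixture_properties(mols)
    return sum(((props[key] - value) / value) ** 2 for key, value in target.items())

def select_subset(pool, target, k=5):
    """Greedy forward selection: a simple stand-in for a non-linear optimiser."""
    chosen: list[Molecule] = []
    remaining = list(pool)
    for _ in range(k):
        best = min(remaining, key=lambda m: objective(chosen + [m], target))
        chosen.append(best)
        remaining.remove(best)
    return chosen, objective(chosen, target)

def average_objective(target, n_runs=10, n_samples=5000, k=5):
    """Repeat generation + selection with different seeds; average the scores."""
    scores = []
    for seed in range(n_runs):
        rng = random.Random(seed)
        pool = [sample_molecule(rng) for _ in range(n_samples)]
        scores.append(select_subset(pool, target, k)[1])
    return statistics.mean(scores)

# Placeholder targets for illustration only, NOT the literature data set.
TARGET = {"MW": 750.0, "H/C": 1.1, "S_wt%": 3.0, "f_a": 0.5}

if __name__ == "__main__":
    print(f"mean objective over 10 runs: {average_objective(TARGET):.4g}")
```

Calling `average_objective` with two different MW targets and comparing the means mirrors, in miniature, the MW = 750 versus MW = 4190 comparison. With this toy inventory, however, no sampled molecule exceeds roughly 940 g/mol, so a high-MW target could not be fitted without a larger block set or more blocks per molecule.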