Background
Responsiveness of physicians refers to the social actions by which physicians meet the legitimate expectations of service seekers. Because no scale existed to measure it, this study aimed to develop one for measuring the responsiveness of physicians in rural Bangladesh using structured observation.

Methods
Data were collected in the Khulna division of Bangladesh through structured observation of 393 patient consultations with physicians. The structured observation tool consisted of 64 items, each with four Likert-type response categories anchored to a defined scenario. Inter-rater reliability was assessed by having the same three raters observe 30 consultations. Data were analyzed by exploratory factor analysis (EFA), followed by assessment of internal consistency using the ordinal alpha coefficient, inter-rater reliability using the intra-class correlation coefficient (ICC), concurrent validity by correlating the responsiveness score with waiting time, and known-group validity by comparing public- and private-sector physicians.

Results
After removing items with more than 50% missing values, 45 items were considered for EFA. Parallel analysis suggested a 5-factor model. Nine items were removed owing to communality < 0.50, loading < 0.32 in the un-rotated matrix, and loading < 0.30 on any factor in the rotated matrix. Because the 34 items (i.e., the number of items remaining after removing nine items by EFA) loaded neatly onto five factors, explained 61.38% of the common variance, and demonstrated high internal consistency with a coefficient of 0.91, they were adopted as the Responsiveness of Physicians Scale (ROP-Scale). The five factors were named 1) Friendliness, 2) Respecting, 3) Informing and guiding, 4) Gaining trust, and 5) Financial sensitivity. Inter-rater reliability was high, with an ICC of 0.64 for an individual rater's reliability and 0.84 for the average reliability score.
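As a rough consistency check on the two ICC values reported above, the standard Spearman-Brown relation links single-rater reliability to the reliability of the mean of several raters. The sketch below assumes the average figure refers to the mean of the three raters described in the Methods. The abstract does not state which ICC form was used, and the function name `spearman_brown` is illustrative only.

```python
def spearman_brown(single_rater_icc: float, n_raters: int) -> float:
    """Reliability of the mean of n_raters ratings, given single-rater reliability."""
    return n_raters * single_rater_icc / (1 + (n_raters - 1) * single_rater_icc)


# Assumption: the reported average reliability is the mean of the three raters.
average_icc = spearman_brown(0.64, 3)
print(round(average_icc, 2))  # prints 0.84
```

Under these assumptions, a single-rater ICC of 0.64 implies an average-rater ICC of about 0.84, which agrees with the reported figure.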
The positive correlation with waiting time (0.51) and the 0.18-point higher score of private-sector physicians indicate concurrent and known-group validity, respectively.

Conclusions
The ROP-Scale consists of 34 items grouped under five factors. It can be applied with confidence in comparable settings, as it demonstrated high internal consistency and inter-rater reliability. More research is needed to test the scale in other settings and with other types of providers.

Electronic supplementary material
The online version of this article (10.1186/s12913-017-2722-1) contains supplementary material, which is available to authorized users.