2017
DOI: 10.31234/osf.io/mwd5g
Preprint

Estimating the Reliability of Emotion Measures over Very Short Intervals: The Utility of Within-Session Retest Correlations

Abstract: Short measures are commonly used when conducting research involving emotions. However, obtaining appropriate estimates of reliability for short measures has traditionally been problematic and is a recurring concern in emotion research. To address this issue, we compare the within-session test-retest and factor analysis methods for estimating the reliability of items in the PANAS-X. Results indicate that within-session test-retest (r_tt) estimates outperform the factor analysis method by demonstrating stronger rela…
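The within-session retest correlation described in the abstract is simple to compute. Below is a minimal sketch, not the authors' code: it assumes a hypothetical wide-format data file in which each PANAS-X item was administered twice within the same session, stored under assumed column names "<item>_t1" and "<item>_t2".

# Sketch only: item-level within-session retest correlation (r_tt).
# File name and column naming scheme are assumptions, not from the paper.
import pandas as pd

def within_session_retest(df: pd.DataFrame, items: list[str]) -> pd.Series:
    # Pearson correlation between the first and second administration of each item
    return pd.Series(
        {item: df[f"{item}_t1"].corr(df[f"{item}_t2"]) for item in items},
        name="r_tt",
    )

# Example usage with hypothetical item names:
# df = pd.read_csv("panasx_within_session.csv")
# print(within_session_retest(df, ["afraid", "excited", "upset"]))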

Cited by 3 publications (5 citation statements)
References 10 publications
“…Conduct retest analyses of items with adequate qualitative and quantitative properties. Test-retest correlations over short spans are particularly good indicators of item quality: for an item to provide reliable and useful information, raters first have to answer it consistently in the short run -- that is, they have to be able to agree with themselves on the content of the item. The retest interval can be a couple of months (Watson, 2004), a couple of weeks (Mõttus, Sinick, et al., 2019; Soto & John, 2017), a couple of days (Wood et al., 2010), or even a couple of minutes (Lowman et al., 2018; Wood et al., 2018). What makes these estimates so valuable is that they are particularly good predictors of many standard indicators of item validity simultaneously, such as self-other agreement correlations and stability correlations over longer time periods (McCrae, Kurtz, Yamagata, & Terracciano, 2011; Henry & Mõttus, 2020), while also being estimable for single items.…”
Section: Step 2 Programmatic Evaluation and Documentation of Item Characteristics (mentioning)
confidence: 99%
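The claim that short-interval retest correlations predict other validity indicators can be examined at the item level. The snippet below is an illustration only, not an analysis from any cited paper; the file and column names ("r_tt", "self_other_agreement") are assumptions.

# Illustration only: do items with higher short-interval r_tt also show
# higher self-other agreement? File and column names are hypothetical.
import pandas as pd

item_stats = pd.read_csv("item_level_stats.csv")  # hypothetical item-level table
print(item_stats["r_tt"].corr(item_stats["self_other_agreement"]))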
“…Some may think that item-level findings are notoriously unreliable. But as was discussed before, items often have retest reliabilities of .65 or higher (Lowman, Wood, Armstrong, Harms, & Watson, 2018; Mõttus et al., 2019; Wood, Nye, & Saucier, 2010; Henry & Mõttus, 2020), which may be higher than many intuitively expect. Higher-than-assumed single-item reliability is also consistent with findings that items out-predict scales for outcomes and other variables (Achaa-Amankwaa, Olaru, & Schroeders, 2020; Elleman, McDougald, Condon, & Revelle, 2020; Mõttus & Rozgonjuk, 2019; Seeboth & Mõttus, 2018; Vainik, Mõttus, Allik, Esko, & Realo, 2015).…”
Section: Descriptive Personality Science (mentioning)
confidence: 96%
“…Retest correlations over shorter testing intervals can be higher still (Lowman, Wood, Armstrong, Harms, & Watson, 2018) and may provide even more accurate reliability estimates.…”
Section: Descriptive Personality Science (mentioning)
confidence: 99%
“…The r_tt does not rely on the assumption that all items measure nothing but a single unidimensional trait, and it is less distorted by state-like artifacts. Indeed, unlike internal consistency, scales' r_tt-s track their validities (Henry et al., 2022; McCrae, 2011), making it the preferred method of estimating reliability (Lowman et al., 2018; McCrae, 2015; Revelle & Condon, 2019). Besides, it can be calculated for individual test items, allowing researchers to select the most reliable items into their scales.…”
Section: Reliability in Personality Measurements (mentioning)
confidence: 99%
“…Besides, it can be calculated for individual test items, allowing researchers to select the most reliable items into their scales. Also, corrections of correlations between scale scores for measurement error that use internal consistencies often result in correlations above 1.00, whereas using r_tt rarely results in such off-limit correlations (Lowman et al., 2018).…”
Section: Reliability in Personality Measurements (mentioning)
confidence: 99%
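The point about off-limit corrections follows directly from the correction-for-attenuation formula, r_corrected = r_xy / sqrt(r_xx * r_yy). The numbers below are hypothetical and only illustrate how a denominator built from low internal consistencies can push the corrected correlation above 1.00, while typically higher retest reliabilities keep it within bounds.

# Hypothetical numbers, not results from the paper.
from math import sqrt

r_xy = 0.62                    # observed correlation between two scale scores (assumed)
alpha_x, alpha_y = 0.55, 0.60  # internal consistencies (assumed)
rtt_x, rtt_y = 0.80, 0.78      # retest reliabilities (assumed)

print(r_xy / sqrt(alpha_x * alpha_y))  # ~1.08: an "off-limit" corrected correlation
print(r_xy / sqrt(rtt_x * rtt_y))      # ~0.78: stays within bounds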