2020
DOI: 10.1007/s42421-020-00013-0

Mitigating Bias in Big Data for Transportation

Abstract: Emerging big data resources and practices provide opportunities to improve transportation safety planning and outcomes. However, researchers and practitioners recognise that big data from mobile phones, social media, and on-board vehicle systems include biases in representation and accuracy, related to transportation safety statistics. This study examines both the sources of bias and approaches to mitigate them through a review of published studies and interviews with experts. Coding of qualitative data enable…

Cited by 27 publications (12 citation statements)
References 64 publications
“…Biases in MDD-derived transportation metrics can be reduced or mitigated by combining measured trip patterns with ancillary data sources, including roadway counts, census data, and surveys, as well as by applying local knowledge and planning judgment in evaluating the resulting data products (12). The effects of differential sampling rates across demographic or user groups may affect the accuracy of O-D data but would not be anticipated to affect congestion and travel time metrics, as traffic conditions encountered by smartphone owners in the same place and time would not be expected to vary between visitors of different demographic groups.…”
Section: Literature Review
Citation type: mentioning
confidence: 99%
“…Previous research has indicated the biases of emerging data, which in turn threaten the outcome legitimacy of these data. In particular, Strava is associated with several biases: (i) demographic bias towards young and white males [74]; (ii) social desirability bias, where the recorded trips may over-reflect trips with a sense of achievement and overlook mundane journeys [13]; and (iii) self-selection bias that arises when the participants can include or exclude themselves from the sample [129]. BSSs provide data on all their users, eliminating the potential of social desirability and self-selection biases.…”
Section: Biases
Citation type: mentioning
confidence: 99%
“…The advent and ubiquity of information and communication technology, including smartphones and wearable devices, has allowed for emerging AT data (hereinafter emerging data) ventures. These datasets are considered Big Data, characterized by the three v's: volume (very large), variety (highly complex) and velocity (high growth rate), making them unmanageable through traditional methods [13]. Such unprecedented data also, however, provide new opportunities and challenges to aid the transport paradigm shift toward AT.…”
Section: Introduction
Citation type: mentioning
confidence: 99%
“…For example, there are legitimate concerns about bias of the Strava sample, which is disproportionally used by middle-aged, white men (Garber, Watkins, & Kramer, 2019), that have led to development of many studies of how best to use Strava (Mcnair & Arnold, 2016). As with many big data analyses, it is critical that when using Strava data for research and practice applications that expert opinion, local knowledge, and appropriate goals and metrics are also considered (Griffin, Mulhall, Simek, & Riggs, 2020). Additionally, methods to correct the bias in Strava data have been developed (Roy et al, 2019) to map all ridership.…”
Section: GPS/Accelerometry
Citation type: mentioning
confidence: 99%