Emerging Autonomous Vehicles (AVs) offer great potential for exploiting data-driven techniques to enable adaptive and personalized Human-Vehicle Interaction. However, the lack of rich, high-quality data limits the opportunities to explore the design space of data-driven techniques and to validate the effectiveness of concrete mechanisms. Our goal is to initiate efforts to deliver a building block for exploring data-driven Human-Vehicle Interaction designs. To this end, we present the BROOK dataset, a multi-modal dataset with facial video records. We first outline our rationale for building the BROOK dataset. Then, we describe how we built the current version of the dataset through a year-long study, and give an overview of the dataset. Next, we present three example studies using BROOK to demonstrate its applicability. We also identify key lessons learned from building the BROOK dataset, and discuss how it can foster a large number of follow-up studies.
INTRODUCTION

Recent advances in data-driven techniques (e.g., Neural Networks) create extensive opportunities to enable adaptive and personalized Human-Vehicle Interaction (HVI). More interestingly, the emerging trend of Autonomous Vehicles relaxes the conventional burdens of driving, so in-vehicle drivers and passengers can obtain better user experiences through more complex HVI. Incorporating data-driven approaches can greatly improve user experiences while driving. For instance, unobtrusive monitoring of multi-modal statuses (i.e., taking alternative data sources as input rather than directly attaching biosensors) is a more user-friendly data source for personalized interaction between drivers and vehicles. Therefore, combining data-driven techniques with HVI is promising for the near future, and relevant datasets are becoming essential and in high demand.

Though several datasets for HVI already exist, they have four major limitations. First, each existing dataset is designed for a specific type of study (e.g., Stress Detection [20], Driving Workload [40], etc.), which naturally limits its potential because of the narrowly selected types of data streams. Second, existing datasets only collect information