In future intelligent transportation systems, networked vehicles coordinate with each other to achieve safe operation under the assumption that communication among vehicles and infrastructure is reliable. Traditional methods usually treat the design of control systems and communication networks separately. However, control and communication are tightly coupled, since the motion of the vehicles affects the overall communication quality. Hence, we are motivated to study the co-design of both control and communication systems. In particular, we propose a control-theoretic framework for distributed motion planning of multi-agent systems that satisfies complex, high-level spatial and temporal specifications while simultaneously accounting for communication quality. To this end, the desired motion specifications and communication performance requirements are formulated as signal temporal logic (STL) and spatial-temporal logic (SpaTeL) formulas, respectively. These specifications are encoded as mixed-integer linear programming (MILP) constraints on the system and environment state variables, and control strategies satisfying both the STL and SpaTeL specifications are then generated for each agent within a distributed model predictive control (MPC) framework. The effectiveness of the proposed framework is validated through a simulation of distributed communication-aware motion planning for multi-agent systems.
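To make the MILP encoding idea concrete, the following is a minimal sketch, not the paper's actual formulation: a single-integrator agent x_{t+1} = x_t + u_t must satisfy the STL formula "eventually within [0, H] reach x >= X_GOAL", encoded with a binary satisfaction variable per time step and a big-M implication. The dynamics, horizon, thresholds, and the PuLP/CBC solver choice are all illustrative assumptions.

```python
# Sketch only: big-M MILP encoding of an STL "eventually" predicate for one agent.
import pulp

H = 10          # planning horizon (assumed)
X0 = 0.0        # initial state (assumed)
X_GOAL = 5.0    # threshold in the predicate x_t >= X_GOAL (assumed)
U_MAX = 1.0     # input bound (assumed)
M = 100.0       # big-M constant, larger than any reachable |x_t - X_GOAL|

prob = pulp.LpProblem("stl_motion_planning_sketch", pulp.LpMinimize)

# Decision variables: states, inputs, |u| surrogates, and STL indicator binaries.
x = [pulp.LpVariable(f"x_{t}") for t in range(H + 1)]
u = [pulp.LpVariable(f"u_{t}", lowBound=-U_MAX, upBound=U_MAX) for t in range(H)]
v = [pulp.LpVariable(f"v_{t}", lowBound=0) for t in range(H)]        # v_t >= |u_t|
z = [pulp.LpVariable(f"z_{t}", cat=pulp.LpBinary) for t in range(H + 1)]

# Objective: minimize total control effort (L1 norm of the input sequence).
prob += pulp.lpSum(v)

# Dynamics and |u| linearization.
prob += x[0] == X0
for t in range(H):
    prob += x[t + 1] == x[t] + u[t]
    prob += v[t] >= u[t]
    prob += v[t] >= -u[t]

# Predicate encoding: z_t = 1 forces x_t >= X_GOAL (big-M implication).
for t in range(H + 1):
    prob += x[t] - X_GOAL >= -M * (1 - z[t])

# "Eventually" = disjunction over the horizon: at least one z_t must be 1.
prob += pulp.lpSum(z) >= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("status:", pulp.LpStatus[prob.status])
print("trajectory:", [round(xi.value(), 2) for xi in x])
```

In a receding-horizon (MPC) setting, a program of this form would be re-solved at each step with the measured state as X0 and only the first input applied; the distributed, multi-agent, and SpaTeL aspects of the paper are not captured by this toy example.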
Automatic recognition of overlapped speech remains a highly challenging task to date. Motivated by the bimodal nature of human speech perception, this paper investigates the use of audio-visual technologies for overlapped speech recognition. Three issues associated with the construction of audio-visual speech recognition (AVSR) systems are addressed. First, the basic architecture designs of AVSR systems, i.e., end-to-end and hybrid, are investigated. Second, purposefully designed modality fusion gates are used to robustly integrate the audio and visual features. Third, in contrast to a traditional pipelined architecture containing explicit speech separation and recognition components, a streamlined and integrated AVSR system optimized consistently using the lattice-free MMI (LF-MMI) discriminative criterion is also proposed. The proposed LF-MMI time-delay neural network (TDNN) system establishes the state of the art on the LRS2 dataset. Experiments on overlapped speech simulated from the LRS2 dataset suggest that the proposed AVSR system outperforms the audio-only baseline LF-MMI DNN system by up to 29.98% absolute word error rate (WER) reduction, and produces recognition performance comparable to that of a more complex pipelined system. A consistent performance improvement of 4.89% absolute WER reduction over the baseline AVSR system using feature fusion is also obtained.
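As a rough illustration of the gated modality fusion idea mentioned above, the sketch below shows one common form of such a gate in PyTorch: a sigmoid gate, computed from both streams, interpolates per dimension between projected audio and visual features. The feature dimensions, module names, and gating form are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch only: a learned audio-visual fusion gate for frame-level features.
import torch
import torch.nn as nn


class AVFusionGate(nn.Module):
    """Fuse audio and visual frame-level features into a single vector per frame."""

    def __init__(self, audio_dim: int = 80, visual_dim: int = 512, fused_dim: int = 256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, fused_dim)
        self.visual_proj = nn.Linear(visual_dim, fused_dim)
        # The gate sees both modalities and outputs per-dimension weights in (0, 1).
        self.gate = nn.Linear(2 * fused_dim, fused_dim)

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        a = self.audio_proj(audio)      # (batch, frames, fused_dim)
        v = self.visual_proj(visual)    # (batch, frames, fused_dim)
        g = torch.sigmoid(self.gate(torch.cat([a, v], dim=-1)))
        # g -> 1 favours the audio stream, g -> 0 favours the visual stream,
        # letting the model lean on lip features when the audio is overlapped.
        return g * a + (1.0 - g) * v


# Toy usage on random features: 2 utterances, 100 frames each.
fusion = AVFusionGate()
fused = fusion(torch.randn(2, 100, 80), torch.randn(2, 100, 512))
print(fused.shape)  # torch.Size([2, 100, 256])
```

In a hybrid or end-to-end AVSR system, the fused features would then feed the acoustic model (e.g., the TDNN trained with LF-MMI); that downstream part is omitted here.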