With the growing number of Software-Defined Networking (SDN) deployments, the data centers of large service providers are becoming increasingly agile in terms of network efficiency and flexibility. While SDN is a clear trend in modern data center design, its implications for effective and efficient network management have not yet been fully explored or exploited. Since most modern Internet traffic consists of multimedia services and media-rich content sharing, the quality of multimedia communications is a central concern for many companies and research groups. Because SDN-enabled switches inherently maintain per-flow statistics (packets and bytes transmitted or lost), they can be used to monitor the key statistics of multimedia communications, allowing the provider to act when the network fails to deliver the required service quality. The internal packet processing of an SDN switch enables the SDN controller to retrieve the statistics of a particular packet flow using OpenFlow PacketIn and Multipart messages. If properly preprocessed, this information can be used to estimate a higher-layer interpretation of link quality, which makes it possible to relate the provided quality of service (QoS) to the quality of experience (QoE) perceived by the user. This article describes an experimental setup for estimating the quality of speech communication from the information provided by the SDN controller. To improve accuracy, latency characteristics are added by injecting dummy packets into the packet stream and/or by analyzing RTCP packets.
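A speech quality model needs two network impairments per call: packet loss and delay. The sketch below shows one way to derive each, under stated assumptions. The round-trip time follows the RTCP rule from RFC 3550 (RTT = A − LSR − DLSR, all in the 16.16 "middle 32 bits" NTP format), and one-way delay is approximated as RTT/2, which assumes a symmetric path. The loss helper `flow_loss_ratio` is hypothetical: it assumes the controller polls the packet counter of the same flow entry on an ingress and an egress switch and compares the deltas between two polls. It is an illustration, not the authors' implementation.

```python
from typing import Optional

NTP_UNIX_OFFSET = 2208988800  # seconds between the NTP epoch (1900) and the Unix epoch (1970)


def ntp_compact(unix_time: float) -> int:
    """Return the middle 32 bits of the 64-bit NTP timestamp (16.16 fixed point),
    the format RTCP uses for the LSR and DLSR fields."""
    ntp = unix_time + NTP_UNIX_OFFSET
    seconds = int(ntp)
    fraction = int((ntp - seconds) * 65536)  # keep 16 fractional bits
    return ((seconds & 0xFFFF) << 16) | (fraction & 0xFFFF)


def rtcp_rtt(arrival_compact: int, lsr: int, dlsr: int) -> Optional[float]:
    """Round-trip time in seconds from an RTCP receiver report block (RFC 3550, 6.4.1).

    arrival_compact: arrival time A of the report, in compact NTP format
    lsr:  'last SR timestamp' field of the report block
    dlsr: 'delay since last SR' field, in units of 1/65536 s
    """
    if lsr == 0:
        return None  # LSR = 0 means no sender report has been received yet
    rtt = (arrival_compact - lsr - dlsr) & 0xFFFFFFFF  # modular arithmetic handles wraparound
    return rtt / 65536.0


def flow_loss_ratio(ingress_delta: int, egress_delta: int) -> float:
    """Hypothetical helper: fraction of a flow's packets lost between two switches.

    Both arguments are differences of the flow entry's packet counter between
    two consecutive statistics polls; counters polled at slightly different
    times can skew the result, so negative loss is clamped to zero.
    """
    if ingress_delta <= 0:
        return 0.0
    lost = max(ingress_delta - egress_delta, 0)
    return lost / ingress_delta


# Worked example from RFC 3550: A = 0xb710:8000, LSR = 0xb705:2000, DLSR = 0x0005:4000
rtt = rtcp_rtt(0xB7108000, 0xB7052000, 0x00054000)
print(rtt)                            # 6.125 s round trip
print(rtt / 2)                        # one-way delay estimate, assuming a symmetric path
print(flow_loss_ratio(1000, 990))     # 0.01
```

Dividing the RTT by two is the simplest choice; dummy-packet injection through the controller can measure path latency directly when the path is asymmetric.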
The experimental results show that this approach computes the statistics of each individual RTP stream. This yields a method for dynamic measurement of speech quality: when the quality of a call degrades, the network can respond quickly by changing the routing of that individual call at the network level. To improve the accuracy of call quality measurement, a convolutional neural network (CNN) was also implemented. The model builds on two standard approaches to speech quality measurement: PESQ and the E-model. Unlike PESQ/POLQA, however, the CNN-based model can take delay into account, and unlike the E-model, it achieves much higher accuracy.
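For reference, the E-model baseline that the CNN is compared against maps delay and loss to a transmission rating R and then to an estimated MOS. The sketch below is a simplified version, not the authors' CNN. It uses the default-parameter rating 93.2 and the ITU-T G.107 formulas for the effective equipment impairment Ie,eff and the R-to-MOS mapping. The delay impairment Id uses the widely used piecewise-linear approximation by Cole and Rosenbluth instead of the full G.107 Id computation. The defaults Ie = 0 and Bpl = 25.1 are the values commonly cited for G.711 with packet loss concealment and are assumptions here.

```python
def delay_impairment(one_way_delay_ms: float) -> float:
    """Simplified delay impairment Id (Cole-Rosenbluth approximation, not full G.107)."""
    d = one_way_delay_ms
    return 0.024 * d + (0.11 * (d - 177.3) if d > 177.3 else 0.0)


def effective_equipment_impairment(loss_percent: float, ie: float = 0.0,
                                   bpl: float = 25.1, burst_r: float = 1.0) -> float:
    """G.107 Ie,eff = Ie + (95 - Ie) * Ppl / (Ppl / BurstR + Bpl); Ppl in percent.
    burst_r = 1.0 models random (non-bursty) packet loss."""
    return ie + (95.0 - ie) * loss_percent / (loss_percent / burst_r + bpl)


def r_factor(one_way_delay_ms: float, loss_percent: float, ie: float = 0.0,
             bpl: float = 25.1, burst_r: float = 1.0) -> float:
    """Transmission rating R with all other E-model parameters at their defaults."""
    return (93.2
            - delay_impairment(one_way_delay_ms)
            - effective_equipment_impairment(loss_percent, ie, bpl, burst_r))


def r_to_mos(r: float) -> float:
    """G.107 mapping from the R-factor to an estimated MOS (conversational quality)."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Example: 100 ms one-way delay and 1 % random loss on a G.711 call with PLC
r = r_factor(100.0, 1.0)
print(round(r, 2), round(r_to_mos(r), 2))
```

The per-call inputs would come from the SDN measurements (loss from flow counters, delay from RTCP or injected probes). The E-model's fixed, additive impairment terms are one reason a learned model can outperform it.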