Currently, with the rapid growth of data volumes in network traffic classification, selecting traffic features efficiently has become a major challenge. Although a number of traditional feature selection methods based on the Hadoop-MapReduce framework have been proposed, their execution time remains unsatisfactory because of the numerous iterative computations involved in processing. To address this issue, this paper proposes an efficient feature selection method for network traffic based on a newer parallel computing framework, Spark. In our approach, the complete feature set is first preprocessed using the Fisher score, and a sequential forward search strategy is then employed to search over feature subsets. The optimal feature subset is selected through iterative computation on the Spark framework. The implementation demonstrates that, while maintaining classification accuracy, our method reduces the time cost of modeling and classification and significantly improves the execution efficiency of feature selection.
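The two-stage pipeline described above (Fisher-score prefiltering, then sequential forward search) can be sketched on a single machine as follows. This is a minimal sketch under stated assumptions, not the authors' Spark implementation: it uses the common Fisher score definition F_j = Σ_k n_k (μ_kj − μ_j)² / Σ_k n_k σ_kj², and the function names and the `evaluate` callback are hypothetical. In practice `evaluate` would train and test a classifier on the candidate subset; the per-candidate evaluations within one forward step are independent, which is the kind of work a framework such as Spark can distribute.

```python
from statistics import mean, pvariance
from typing import Callable, List, Sequence, Tuple


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    classes = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        overall = mean(row[j] for row in X)
        between = 0.0
        within = 0.0
        for c in classes:
            vals = [row[j] for row, label in zip(X, y) if label == c]
            between += len(vals) * (mean(vals) - overall) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating if classes differ, else useless.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_by_fisher(scores: Sequence[float], k: int) -> List[int]:
    """Prefilter step: indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def sequential_forward_search(
    candidates: Sequence[int],
    evaluate: Callable[[List[int]], float],  # hypothetical: e.g. classifier accuracy
    max_size: int,
) -> Tuple[List[int], float]:
    """Greedily add the feature that most improves `evaluate`; stop when nothing helps."""
    selected: List[int] = []
    remaining = list(candidates)
    best_score = float("-inf")
    while remaining and len(selected) < max_size:
        # Each candidate evaluation is independent (the step a cluster could parallelize).
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, feat = max(scored, key=lambda t: t[0])
        if score <= best_score:
            break
        selected.append(feat)
        remaining.remove(feat)
        best_score = score
    return selected, best_score
```

Prefiltering with the cheap, per-feature Fisher score shrinks the candidate pool before the more expensive subset search, so the forward search evaluates fewer subsets per step.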
Edge storage, as a supplement to cloud storage, reduces latency by providing services in a timely and efficient manner near the data source. In a collaborative edge storage datacenter network (CESN), services are provided not only by the edge storage datacenter (ESDC) closest to the user; multiple ESDCs work together to provide better service. In this collaborative mechanism, different application session requests create large, persistent multicast flows with diverse performance requirements. Existing multicast scheduling methods, such as unicast shortest path (USP) and static single tree (SST), do not consider flow characteristics or performance requirements. In this paper, we first model the multicast flow scheduling problem in a CESN, based on different types of flows with diverse network requirements. We then design a multicast flow scheduling method based on multiple-attribute decision-making and a genetic algorithm (MDGA). MDGA selects appropriate multicast routing paths for flows in a CESN by considering the requested flow types and the network status. The experimental results show that MDGA balances network loads and reduces the average transmission delay of high-priority flows more effectively than USP and SST.
INDEX TERMS: collaborative edge storage, datacenter network, multicast flow, multiple-attribute decision-making, genetic algorithm.
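To illustrate what a multiple-attribute decision-making step over candidate routing paths can look like, the sketch below ranks paths with TOPSIS, one standard MADM technique. This is an assumption for illustration only: the abstract does not say which MADM method MDGA uses, the genetic-algorithm stage is omitted entirely, and the attributes (`delay_ms`, `avail_bw_mbps`, `hops`) and weights are hypothetical. The idea is that a high-priority flow could be given a larger delay weight than a bulk flow.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class PathMetrics:
    name: str
    delay_ms: float        # cost attribute (lower is better)
    avail_bw_mbps: float   # benefit attribute (higher is better)
    hops: int              # cost attribute (lower is better)


def topsis_rank(
    paths: Sequence[PathMetrics],
    weights: Sequence[float],   # hypothetical per-attribute weights, e.g. set by flow type
    benefit: Sequence[bool],    # True if larger values of that attribute are better
) -> List[Tuple[float, str]]:
    """Rank paths by TOPSIS closeness coefficient (1.0 = ideal, 0.0 = anti-ideal)."""
    rows = [[p.delay_ms, p.avail_bw_mbps, float(p.hops)] for p in paths]
    m = len(rows[0])
    # Vector-normalize each column, then apply the weights.
    norms = [math.sqrt(sum(r[j] ** 2 for r in rows)) or 1.0 for j in range(m)]
    v = [[weights[j] * r[j] / norms[j] for j in range(m)] for r in rows]
    cols = list(zip(*v))
    ideal = [max(c) if benefit[j] else min(c) for j, c in enumerate(cols)]
    anti = [min(c) if benefit[j] else max(c) for j, c in enumerate(cols)]
    ranked = []
    for p, r in zip(paths, v):
        d_plus = math.dist(r, ideal)   # distance to the best possible path
        d_minus = math.dist(r, anti)   # distance to the worst possible path
        total = d_plus + d_minus
        ranked.append((d_minus / total if total > 0 else 0.0, p.name))
    return sorted(ranked, key=lambda t: t[0], reverse=True)


# Usage: a delay-sensitive flow weighting delay most heavily (hypothetical numbers).
paths = [
    PathMetrics("A", 10.0, 100.0, 3),
    PathMetrics("B", 30.0, 40.0, 5),
    PathMetrics("C", 15.0, 60.0, 4),
]
ranking = topsis_rank(paths, weights=(0.5, 0.3, 0.2), benefit=(False, True, False))
```

In a hybrid scheme, scores like these could serve as a fitness signal or seed population for a genetic search over multicast trees, but how MDGA combines the two is not specified here.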
Erasure coding has been widely deployed in today's data centers because it significantly reduces extra storage costs while providing high storage reliability. However, erasure coding introduces additional network traffic and computational overhead during data updates. How to improve efficiency and mitigate system imbalance during the update process remains a challenging problem. Most existing update schemes for erasure codes focus only on the single-stripe update scenario and ignore the heterogeneity of node and network status, so they cannot adequately address the low update efficiency and load imbalance caused by multistripe concurrent updates. To solve this problem, this paper proposes a Load-Aware Multistripe concurrent Update (LAMU) scheme for erasure-coded storage systems. LAMU uses a Software-Defined Networking (SDN) mechanism to measure node loads and network status in real time. It selects nonduplicated nodes with better performance, in terms of CPU utilization, remaining memory, and I/O load, as the computing nodes for multiple update stripes. A multiattribute decision-making method is then used to schedule the network traffic generated during the update process. This mechanism improves the transmission efficiency of update traffic and allows LAMU to adapt to multistripe concurrent update scenarios in heterogeneous network environments. Finally, we designed a prototype system for multistripe concurrent updates. Extensive experimental results show that LAMU improves update efficiency and provides better system load balancing.
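The node-selection idea (assigning each concurrent update stripe its own, distinct, lightly loaded computing node) can be sketched as a weighted score followed by a one-to-one assignment. This is a minimal sketch under stated assumptions, not LAMU's algorithm: the load metrics are passed in directly rather than measured via SDN, and the `NodeLoad` fields, the weights `(0.4, 0.3, 0.3)`, and the greedy ordering are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple


@dataclass(frozen=True)
class NodeLoad:
    node_id: str
    cpu_util: float     # fraction in [0, 1]; lower is better
    free_mem_gb: float  # remaining memory; higher is better
    io_load: float      # fraction in [0, 1]; lower is better


def node_score(n: NodeLoad, max_mem: float,
               w: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Hypothetical weighted score combining the three load attributes (higher is better)."""
    mem = n.free_mem_gb / max_mem if max_mem > 0 else 0.0
    return w[0] * (1.0 - n.cpu_util) + w[1] * mem + w[2] * (1.0 - n.io_load)


def assign_compute_nodes(stripes: Sequence[str],
                         nodes: Sequence[NodeLoad]) -> Dict[str, str]:
    """Give each stripe a distinct (nonduplicated) node, best-scoring nodes first."""
    if len(stripes) > len(nodes):
        raise ValueError("not enough nodes for nonduplicated assignment")
    max_mem = max(n.free_mem_gb for n in nodes)
    ranked = sorted(nodes, key=lambda n: node_score(n, max_mem), reverse=True)
    # zip pairs each stripe with a different node, so no node computes two stripes.
    return {s: n.node_id for s, n in zip(stripes, ranked)}
```

Keeping the computing nodes distinct spreads the parity-computation work of concurrent stripes across the cluster instead of piling it onto the single least-loaded node.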