Despite the tremendous growth in cellular data network usage driven by the popularity of smartphones, the network infrastructure of cellular carriers remains poorly understood. Understanding infrastructure characteristics such as network topology, routing design, address allocation, and DNS service configuration is essential for predicting, diagnosing, and improving cellular network services, as well as for delivering content to the growing population of mobile wireless users. In this work, we propose a novel approach for discovering cellular infrastructure by intelligently combining several data sources: server logs from a popular location search application, active measurement results collected from smartphone users, DNS request logs from a DNS authoritative server, and publicly available routing updates. We perform the first comprehensive analysis characterizing the cellular data network infrastructure of four major cellular carriers within the U.S. Among other previously little-known results, we conclude that the current routing of cellular data traffic is quite restricted: it must traverse a small number (4-6) of infrastructure locations (GGSNs), in sharp contrast to wireline Internet traffic. We demonstrate how such findings have direct implications for important decisions such as mobile content placement and content server selection. We observe that although the local DNS server is only a coarse-grained approximation of the user's network location, for some carriers, choosing content servers based on the local DNS server is accurate enough due to the restricted routing in cellular networks. Placing content servers close to GGSNs can potentially reduce end-to-end latency by more than 50%, excluding the variability of the air interface.
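To make the content-placement implication concrete, the following toy sketch (not from the paper; the resolver addresses, GGSN coordinates, and server names are all hypothetical) picks the content server closest to the GGSN location inferred from a cellular client's local DNS resolver. It exploits the abstract's observation that each carrier's traffic exits through only a handful of GGSN locations, which makes even a coarse resolver-to-GGSN mapping useful for server selection.

```python
# Illustrative sketch, not the paper's code. All addresses, coordinates,
# and server names below are hypothetical.
from math import radians, sin, cos, asin, sqrt

# Hypothetical mapping from a carrier's local DNS resolver to the GGSN
# location its subscribers' traffic exits from (the study observes only
# 4-6 such locations per U.S. carrier).
RESOLVER_TO_GGSN = {
    "172.26.38.1": (41.88, -87.63),   # e.g., a GGSN near Chicago
    "172.26.94.1": (37.77, -122.42),  # e.g., a GGSN near San Francisco
}

# Candidate content server locations (lat, lon), also hypothetical.
SERVERS = {
    "chi-edge": (41.85, -87.65),
    "sfo-edge": (37.62, -122.38),
    "nyc-edge": (40.71, -74.01),
}

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points, in km."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def pick_server(resolver_ip):
    """Pick the content server closest to the GGSN behind a resolver.

    With only 4-6 GGSN locations per carrier, the local DNS resolver is
    often a good-enough proxy for where traffic enters the wired Internet.
    """
    ggsn = RESOLVER_TO_GGSN[resolver_ip]
    return min(SERVERS, key=lambda name: haversine_km(SERVERS[name], ggsn))

print(pick_server("172.26.38.1"))  # -> chi-edge
```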
With lower latency and higher bandwidth than its 3G predecessors, the latest cellular technology, 4G LTE, has been attracting many new users. However, the interactions among applications, the network transport protocol, and the radio layer remain unexplored. In this work, we conduct an in-depth study of these interactions and their impact on performance, using a combination of active and passive measurements. We observe that LTE has significantly shorter state promotion delays and lower RTTs than 3G networks. We discover various inefficiencies in TCP over LTE, such as undesired slow start. We further develop a novel and lightweight passive bandwidth estimation technique for LTE networks. Using this tool, we discover that many TCP connections significantly under-utilize the available bandwidth: on average, the bandwidth actually used is less than 50% of what is available. This makes data downloads take longer and incur additional energy overhead. We find that the under-utilization can be caused by both application behavior and TCP parameter settings. In particular, 52.6% of all downlink TCP flows are throttled by a limited TCP receive window, and the data transfer patterns of some popular applications are both energy- and network-unfriendly. These findings highlight the need to develop transport protocol mechanisms and applications that are more LTE-friendly.
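To illustrate how receive-window throttling can be spotted in passive traces, the sketch below uses a simplified heuristic (not the paper's estimation tool; the thresholds are illustrative assumptions): a flow is flagged as receive-window-limited when the sender's unacknowledged bytes repeatedly fill the advertised window, which caps throughput at roughly rwnd/RTT regardless of the bandwidth the LTE link could deliver.

```python
# Simplified heuristic sketch, not the paper's tool. Thresholds are
# illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Sample:
    bytes_in_flight: int  # unacknowledged bytes observed at this instant
    rwnd: int             # receiver's advertised window (after scaling)

def is_rwnd_limited(samples, fill_thresh=0.95, frac_thresh=0.5):
    """Flag a flow as receive-window limited if, in at least
    `frac_thresh` of its samples, bytes in flight reach `fill_thresh`
    of the advertised window (throughput then caps at ~rwnd/RTT)."""
    if not samples:
        return False
    limited = sum(s.bytes_in_flight >= fill_thresh * s.rwnd for s in samples)
    return limited / len(samples) >= frac_thresh

# Example: a flow whose in-flight data keeps bumping into a 64 KB window.
trace = [Sample(64_000, 65_535), Sample(63_800, 65_535), Sample(30_000, 65_535)]
print(is_rwnd_limited(trace))  # -> True (2 of 3 samples are window-limited)
```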
Spam filtering has traditionally relied on extracting spam signatures via supervised learning, i.e., using emails manually labeled as spam or ham. Such supervised learning is labor-intensive and costly and, more importantly, cannot adapt to new spamming behavior quickly enough. The fundamental reason a labeled training corpus is needed is that the learning, e.g., the process of extracting signatures, is carried out by examining individual emails. In this paper, we study the feasibility of unsupervised-learning-based spam filtering that can more effectively identify new spamming behavior. Our study is motivated by three key observations about today's Internet spam: (1) the vast majority of emails are spam, (2) a spam email should always belong to some campaign, and (3) spam from the same campaign is generated from a template that obfuscates some parts of the spam, e.g., sensitive terms, leaving other parts unchanged. We present the design of an online, unsupervised spam learning and detection scheme. The key component of our scheme is a novel text-mining-based campaign identification framework that clusters spam into campaigns and extracts the invariant textual fragments from spam as campaign signatures. While the individual terms in the invariant fragments can also appear in ham, the key insight behind our unsupervised scheme is that our learning algorithm is effective at extracting co-occurrences of terms that are generated by campaign templates and rarely appear in ham. Using large traces containing about 2 million emails from three sources, we show that our unsupervised scheme alone achieves a false negative ratio of 3.5% and a false positive ratio of at most 0.4%. These detection accuracies are comparable to those of de facto supervised-learning-based filtering systems such as SpamAssassin (SA), suggesting that unsupervised spam filtering holds high promise for battling today's Internet spam.
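The following toy example, a deliberately simplified stand-in for the paper's text-mining framework, illustrates the invariant-fragment intuition: given spam already clustered into one campaign, terms that survive an intersection across all messages come from the shared template, while per-message obfuscations (e.g., "ch3ap", "0nline") drop out.

```python
# Toy sketch of the campaign-signature idea, not the paper's algorithm.
import re

def terms(msg):
    """Lowercase alphabetic tokens of a message, as a set."""
    return set(re.findall(r"[a-z]+", msg.lower()))

def campaign_signature(campaign_msgs):
    """Invariant terms = intersection of term sets across the campaign."""
    return set.intersection(*(terms(m) for m in campaign_msgs))

campaign = [
    "Buy ch3ap meds online now, visit shop-a.example",
    "Buy cheap m3ds online today, visit shop-b.example",
    "Buy cheap meds 0nline now, visit shop-c.example",
]
print(campaign_signature(campaign))
# -> {'buy', 'visit', 'shop', 'example'}: the template's invariant part;
#    obfuscated variants ('ch3ap', 'm3ds', '0nline') do not survive.
```

The real scheme must of course also perform the clustering itself and weigh term co-occurrences against their frequency in ham, which this toy intersection omits.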
Load balancers choose among load-balanced paths to distribute traffic as if it made no difference which path is used. This work shows that the latency difference between load-balanced paths (called latency imbalance), previously deemed insignificant, is now prevalent from the perspective of the cloud and affects various latency-sensitive applications. We present the first large-scale measurement study of latency imbalance from a cloud-centric view. Using public clouds around the globe, we measure latency imbalance both between data centers (DCs) in the cloud and from the cloud to the public Internet. Our key findings are that 1) Amazon's and Alibaba's clouds together have a latency difference between load-balanced paths larger than 20 ms to 21% of public IPv4 addresses; 2) Google's cloud achieves lower latency imbalance than other clouds by using its own well-balanced private WAN to carry traffic close to the destinations; and 3) latency imbalance is also prevalent between DCs in the cloud, where 8 pairs of DCs are found to have load-balanced paths with a latency difference larger than 40 ms. We further evaluate the impact of latency imbalance on three applications (NTP, delay-based geolocation, and VoIP) and propose potential solutions to improve application performance. Our experiments show that all three applications can benefit from taking latency imbalance into account; in particular, the accuracy of delay-based geolocation can be greatly improved simply by changing how ping measures the minimum path latency.
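The sketch below shows one way an end host might observe latency imbalance (an assumption-laden approximation, not the paper's methodology): because load balancers typically hash the 5-tuple, pinning different TCP source ports can place probes on different load-balanced paths. Reporting the minimum RTT per source port, rather than one minimum across all probes, is the spirit of the per-path measurement that the ping change alludes to. RTT is approximated here by TCP handshake time, and the port range and probe counts are arbitrary choices.

```python
# Illustrative approximation, not the paper's methodology. Assumes the
# load balancers on the path hash the TCP/IP 5-tuple.
import socket, struct, time

def connect_rtt(dst, port=80, src_port=0, timeout=2.0):
    """Approximate one RTT as the TCP three-way-handshake time."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(timeout)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # RST on close (no TIME_WAIT), so the same source port is reusable.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))
    s.bind(("", src_port))  # pinning the source port pins the 5-tuple
    t0 = time.monotonic()
    try:
        s.connect((dst, port))
        return time.monotonic() - t0
    finally:
        s.close()

def latency_imbalance(dst, src_ports=range(40000, 40008), probes=3):
    """Minimum RTT per source port (~ per load-balanced path) and the
    spread between the fastest and slowest path."""
    per_path = {sp: min(connect_rtt(dst, src_port=sp) for _ in range(probes))
                for sp in src_ports}
    return max(per_path.values()) - min(per_path.values()), per_path

# spread, paths = latency_imbalance("example.com")  # requires network access
```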