Understanding Internet access trends at a global scale, i.e., what people do on the Internet, is a challenging problem that is typically addressed by analyzing network traces. However, obtaining such traces presents its own set of challenges, owing either to privacy concerns or to other operational difficulties. The key hypothesis of our work is that most of the information needed to profile Internet endpoints is already available around us: on the web. In this paper, we introduce a novel approach for profiling and classifying endpoints. We implement and deploy a Google-based profiling tool, which accurately characterizes endpoint behavior by collecting and strategically combining information freely available on the web. Our 'unconstrained endpoint profiling' approach shows remarkable advances in the following scenarios: (i) even when no packet traces are available, it can accurately predict application and protocol usage trends at arbitrary networks; (ii) when network traces are available, it dramatically outperforms state-of-the-art classification tools; (iii) when sampled flow-level traces are available, it retains high classification capability where other schemes fail. Using this approach, we perform unconstrained endpoint profiling at a global scale, for clients in four world regions (Asia, South America, North America, and Europe). We provide a first-of-its-kind endpoint analysis that reveals fascinating similarities and differences among these regions.
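The abstract does not say how the tool turns web information into endpoint labels. As a minimal sketch, assuming the web hits for one IP address are available as plain-text snippets and that a simple keyword vote is used (both assumptions for illustration, not the authors' method), an endpoint profile could be tallied as follows. The keyword table and `profile_endpoint` are hypothetical.

```python
from collections import Counter

# Hypothetical keyword -> application tag table (an assumption, not the paper's rules).
KEYWORDS = {
    "torrent": "p2p",
    "tracker": "p2p",
    "mail server": "mail",
    "smtp": "mail",
    "proxy": "proxy",
    "counter-strike": "gaming",
    "forum": "web-browsing",
}


def profile_endpoint(snippets: list[str]) -> Counter:
    """Count application tags suggested by web snippets that mention one IP address."""
    tags = Counter()
    for snippet in snippets:
        text = snippet.lower()
        # Each keyword found in a snippet casts one vote for its tag.
        for keyword, tag in KEYWORDS.items():
            if keyword in text:
                tags[tag] += 1
    return tags


if __name__ == "__main__":
    hits = [
        "192.0.2.7 listed as tracker peer for ubuntu.iso torrent",
        "Banned users: 192.0.2.7 posted spam on forum",
    ]
    print(profile_endpoint(hits).most_common())  # [('p2p', 2), ('web-browsing', 1)]
```

A keyword vote keeps the sketch transparent; a real profiler would presumably weight sources differently and combine hits across an address range.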
Abstract: Smartphones have changed the way people communicate. Most prominently, using commonplace mobile device features (e.g., high-resolution cameras), users have started producing and uploading large amounts of content, and this volume is growing at an exponential pace. In the absence of viable technical solutions, some cellular network providers are considering charging special usage fees to address the problem. Our contributions are twofold. First, we find that the user-generated content problem is a user-behavioral problem. By analyzing user mobility and data logs of close to 2 million users of a cellular network, we find that (i) users upload content from a small number of locations, typically corresponding to their home or work locations; (ii) because these locations differ across users, the problem appears ubiquitous, since user-generated content uploads grow exponentially at most locations. However, we also find that (iii) there is a significant lag between content generation and upload times. For example, 55% of the content uploaded via mobile phones is at least 1 day old. Second, based on these insights, we propose a new cellular network architecture. Our approach proposes capacity upgrades at a select number of locations called Drop Zones. Although not originally popular for uploads, Drop Zones fall seamlessly within the natural movement patterns of a large number of users. They are therefore better suited for uploading larger quantities of content in a postponed manner. We design infrastructure placement algorithms and demonstrate that by upgrading infrastructure at only 963 base stations across the entire United States, it is possible to deliver 50% of total content via the Drop Zones.
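The abstract does not describe its placement algorithms. One plausible framing, assumed here purely for illustration, is greedy maximum coverage: a user's postponable upload volume counts as deliverable once the user passes through any upgraded base station, and stations are added one at a time by marginal gain until a target share of content is covered. The function name, inputs, and toy data are hypothetical.

```python
def place_drop_zones(
    user_paths: dict[str, set[str]],
    user_volume: dict[str, int],
    target_fraction: float,
) -> tuple[list[str], float]:
    """Greedy max-coverage sketch (an assumption, not the paper's algorithm):
    upgrade base stations until the content of users who pass through at least
    one upgraded station reaches target_fraction of all content."""
    total = sum(user_volume.values())
    if total == 0:
        return [], 0.0
    candidates = set().union(*user_paths.values())
    covered_users: set[str] = set()
    covered_volume = 0
    chosen: list[str] = []

    def gain(station: str) -> int:
        # Postponable volume of not-yet-covered users whose movement includes station.
        return sum(
            user_volume[u]
            for u, path in user_paths.items()
            if station in path and u not in covered_users
        )

    while covered_volume < target_fraction * total and candidates:
        # sorted() makes tie-breaking deterministic (lowest station id wins).
        best = max(sorted(candidates), key=gain)
        if gain(best) == 0:
            break  # no remaining station adds coverage
        chosen.append(best)
        candidates.remove(best)
        for u, path in user_paths.items():
            if best in path and u not in covered_users:
                covered_users.add(u)
                covered_volume += user_volume[u]
    return chosen, covered_volume / total


if __name__ == "__main__":
    paths = {"alice": {"A", "B"}, "bob": {"B", "C"}, "carol": {"C"}, "dave": {"D"}}
    volume = {"alice": 40, "bob": 30, "carol": 20, "dave": 10}
    print(place_drop_zones(paths, volume, 0.5))  # (['B'], 0.7)
```

In the toy data, station B is not any single user's only location, but it lies on two users' paths, so upgrading it alone already covers 70% of the content. This mirrors the abstract's point that Drop Zones need not be popular upload spots as long as many users pass through them.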
Abstract: Human communication has been transformed by the advent of smartphones. Using commonplace mobile device features, users have started uploading ever-increasing amounts of content. This growth in demand will overwhelm network capacity and limit providers' ability to deliver the quality of service their users demand. In the absence of technical solutions, cellular network providers are considering changing billing plans to address this. Our contributions are twofold. First, by analyzing user content upload behavior, we find that the user-generated content problem is a user-behavioral problem. In particular, by analyzing user mobility and data logs of 2 million users of one of the largest US cellular providers, we find that (i) users upload content from a small number of locations; (ii) because these locations differ across users, the problem appears ubiquitous. However, we find that (iii) there is a significant lag between content generation and upload times, and (iv) it is consistently the same users who delay their uploads. Second, we propose a cellular network architecture. Our approach proposes capacity upgrades at a select number of locations called Drop Zones. Although not originally popular for uploads, Drop Zones fall seamlessly within the natural movement patterns of a large number of users. They are therefore suited for uploading larger quantities of content in a postponed manner. We design infrastructure placement algorithms and demonstrate that by upgrading infrastructure at only 963 base stations across the entire United States, it is possible to deliver 50% of content via Drop Zones.