This paper investigates the structure and dynamics of the Web 2.0 software ecosystem by analyzing empirical data on web service APIs and mashups. Using network analysis tools to visualize the growth of the ecosystem from December 2005 to 2007, we find that the APIs are organized into three tiers, and that mashups are often formed by combining APIs across tiers. Plotting the cumulative distribution of mashups to APIs reveals a power-law relationship, although the tail is short compared to previously reported distributions of book and movie sales. While this finding highlights the dominant role played by the most popular APIs in the mashup ecosystem, additional evidence reveals the importance of less popular APIs in weaving the ecosystem's rich network structure.
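As a rough illustration of the cumulative-distribution analysis described above, the sketch below computes an empirical complementary cumulative distribution of mashups per API and estimates its log-log slope. The function names and the toy counts are hypothetical and not taken from the paper, and an ordinary least-squares fit on log-log axes is only a crude stand-in for whatever fitting procedure the authors used.

```python
import math

def ccdf(counts):
    """Return (x, P(X >= x)) pairs for each distinct value x in counts."""
    n = len(counts)
    xs = sorted(set(counts))
    return [(x, sum(1 for c in counts if c >= x) / n) for x in xs]

def loglog_slope(points):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x, _ in points]
    ly = [math.log(y) for _, y in points]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den

# Hypothetical toy data: number of mashups that use each of eight APIs.
mashups_per_api = [1, 1, 1, 1, 2, 2, 4, 8]
points = ccdf(mashups_per_api)
slope = loglog_slope(points)
```

A roughly straight line on log-log axes (a stable slope) is consistent with a power law, but it does not establish one on its own.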
and YU SHIWEN, Peking University

The number of neighbors, k, is the most important parameter in a text categorization system based on the k-nearest neighbor algorithm (kNN). To classify a new document, the k nearest documents in the training set are determined first. The categories for this document are then predicted from the category distribution among those k nearest neighbors. In general, the class distribution in a training set is uneven; some classes have more samples than others. The system's performance is very sensitive to the choice of k, and a fixed k value is very likely to bias predictions toward large categories and to make incomplete use of the information in the training set. To deal with these problems, this article proposes an improved kNN strategy that uses different numbers of nearest neighbors for different categories instead of a fixed number across all categories. More samples (nearest neighbors) are used to decide whether a test document should be classified in a category that has more samples in the training set; the number of nearest neighbors selected for each category adapts to that category's sample size in the training set. Experiments on two different datasets show that our methods are less sensitive to the parameter k than the traditional ones, and can properly classify documents belonging to smaller classes with a large k. The strategy is especially applicable and promising for cases where estimating the parameter k via cross-validation is not possible and the class distribution of a training set is skewed.
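The idea of a per-category neighbor count can be sketched as follows. This is a minimal illustration, not the authors' formulation: the scaling rule (k shrunk in proportion to a category's size relative to the largest category, with a floor of 1), the similarity-weighted score, and the names `adaptive_k` and `classify` are all assumptions made for the example.

```python
import math

def adaptive_k(k, class_counts):
    """Assumed rule: scale k by each class's size relative to the largest class."""
    n_max = max(class_counts.values())
    return {c: max(1, math.ceil(k * n / n_max)) for c, n in class_counts.items()}

def classify(neighbors, k, class_counts):
    """Score each class using only its own top-k_c neighbors.

    neighbors: list of (similarity, label) pairs for training documents.
    Returns the best-scoring label and the per-class scores.
    """
    # Rank all candidates by similarity and keep the k nearest.
    ranked = sorted(neighbors, key=lambda p: p[0], reverse=True)[:k]
    per_class_k = adaptive_k(k, class_counts)
    scores = {}
    for label, k_c in per_class_k.items():
        top = ranked[:k_c]  # smaller classes look at fewer neighbors
        total = sum(sim for sim, _ in top)
        hits = sum(sim for sim, lab in top if lab == label)
        scores[label] = hits / total if total > 0 else 0.0
    return max(scores, key=scores.get), scores

# Hypothetical example: a skewed training set and one test document whose
# single nearest neighbor belongs to the small class.
class_counts = {"big": 100, "small": 10}
neighbors = [(0.9, "small")] + [(0.5, "big")] * 9
label, scores = classify(neighbors, k=10, class_counts=class_counts)
```

With a fixed k of 10, a majority vote over these neighbors would pick the large class; with the per-category counts assumed here, the small class is judged only on its nearest neighbor and wins.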
A tornado climatology for China was derived from a recently completed data set with details on 4763 tornadoes in the period 1948–2012. The tornadoes were rated on the Fujita scale, and design basis tornado wind speeds were estimated. An estimated 108 (±44) tornadoes occur annually in China, among which 0–4 exceed F3 on the Fujita scale. The three strongest tornadoes in the data set were rated F4. Judging from the frequency distribution of ratings, tornadoes in China could be mostly of the non‐supercell type. Averaged over the whole country, most tornadoes occur in summer (June–August, about 64%), and the peak month is July (31%). Geographically, most tornadoes occur in eastern China, with the highest frequency in the coastal provinces. Tornado occurrence in eastern China follows a southern type and a northern type: the former peaks in spring, while the latter peaks in summer. Two belts of design basis tornado wind speeds higher than 70 m s⁻¹ were identified (at a probability of 10⁻⁷ per year). One extends approximately along the coastline of China from south to north, while the other crosses this belt from east to west, approximately upstream of the Yangtze River. The Yangtze River Delta and Pearl River Delta have design basis tornado wind speeds higher than 90 m s⁻¹, with maxima of 105 and 95 m s⁻¹, respectively. However, when converted to the Enhanced Fujita scale, the wind speeds are much lower, with corresponding maxima of 82 and 76 m s⁻¹, respectively.