In this thesis, we propose new algorithms, methods, and datasets that can be used to classify, mine information from, and rank web domains and similar text-based resources. Motivated by our joint work with INCIBE, we focus our efforts on detecting web resources whose content could indicate illegal activities. Most of these textual web pages are hosted on darknets, and for that reason we centered our analysis on The Onion Router (Tor) Darknet, based on the common belief that this network hosts plenty of criminal activities. Additionally, we addressed the same problem in Online Notepad Services (ONS), in particular the Pastebin service. Several of the contributions that we present here are already incorporated into tools developed by INCIBE that help Spanish Law Enforcement Agencies (LEAs) monitor the contents of the Tor Darknet.

Our work relies on the application of machine learning, both classical and deep, using supervised learning most of the time. This approach required the creation of different datasets. The first of them, named Darknet Usage Text Addresses (DUTA), contained 6,831 labeled samples distributed over 26 classes; we subsequently extended it to 10,367 samples, naming it DUTA-10K.

Using DUTA, we evaluated the combination of two text representation techniques with three well-known classifiers to categorize Tor domains. The combination of a TF-IDF word representation with Logistic Regression achieved a 93.7% macro F1 score on a subset of DUTA in which eight categories of illegal activities were selected. To classify Pastebin contents, we used Active Learning to select and label only the most informative samples, thereby reducing the cost of building a labeled dataset. Our design requires three cascaded classifiers, the last of which determines whether a sample belongs to one of six categories related to criminal activities, obtaining an average class recall of 95.24% in the binary setting and 80.33% in the multiclass setting.

To enrich the information that we provide to LEAs, we first developed a semi-automatic algorithm to identify emerging products in Tor marketplaces. Using Graph Theory, we build a Products Correlations Graph (PCG), in which the nodes are the markets' products and the edges reflect the simultaneous offering of two products in the same market. Our algorithm decomposes the PCG using the k-shell algorithm and analyzes the connectivity of the products in the core shell. We applied this method to drug Hidden Services (HS) in DUTA, finding that MDMA and Ecstasy were the most emerging drug products during the analyzed period. Second, we used Named Entity Recognition (NER) to recognize rare and emerging named entities in noisy user-generated text. We overcome the need for gazetteers when incorporating external resources into neural network architectures, presenting a novel feature that we named Local Distance Neighbor (LDN), and in this way obtaining state-of-the-art F1 scores on three categories of the W-NUT-2017 dataset: Group, Person, and Product. Furthermore, we present an application of NER...
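
As an illustration of the TF-IDF plus Logistic Regression setup mentioned above, the following minimal sketch shows how such a text-classification pipeline can be assembled with scikit-learn. The example texts and category labels are toy placeholders, not the actual DUTA data or class names.

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# Toy stand-in for labeled Tor-domain texts (the real data would be DUTA).
texts = ["bulk pills shipped worldwide", "mdma and ecstasy listings",
         "fake passports and id cards", "counterfeit banknotes for sale"]
labels = ["drugs", "drugs", "counterfeiting", "counterfeiting"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),               # word-level TF-IDF representation
    ("lr", LogisticRegression(max_iter=1000)),  # logistic-regression classifier
])
clf.fit(texts, labels)

# Macro F1 averages the per-class F1 scores, as in the 93.7% figure reported above
# (here computed on the toy training data only, so it is not meaningful).
print("macro F1:", f1_score(labels, clf.predict(texts), average="macro"))
```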
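
The Active Learning idea of labeling only the most informative samples can be sketched with pool-based uncertainty sampling, shown below. This is a generic illustration under that assumption, not necessarily the exact query strategy or cascade design used for Pastebin; the texts and labels are hypothetical.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

labeled_texts = ["stolen card dumps for sale", "my holiday photo album"]
labeled_y = [1, 0]                    # toy labels: 1 = suspicious, 0 = benign
pool_texts = ["fresh cvv available now", "football match highlights",
              "account credentials leaked", "recipe for chocolate cake"]

vec = TfidfVectorizer().fit(labeled_texts + pool_texts)
clf = LogisticRegression(max_iter=1000).fit(vec.transform(labeled_texts), labeled_y)

# Uncertainty sampling: query the pool sample whose predicted probability is
# closest to 0.5, i.e. the one the current model is least certain about,
# and send it to a human annotator before retraining.
proba = clf.predict_proba(vec.transform(pool_texts))[:, 1]
query_idx = int(np.argmin(np.abs(proba - 0.5)))
print("most informative sample to label next:", pool_texts[query_idx])
```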
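
Finally, the construction of the Products Correlations Graph and its k-shell decomposition can be sketched with NetworkX as follows. The market-to-product listings are illustrative placeholders, and selecting the maximal shell is only one simple reading of "analyzing the connectivity of the products in the core shell".

```python
from itertools import combinations
import networkx as nx

# Hypothetical mapping: marketplace -> products offered there.
markets = {
    "market_a": ["mdma", "ecstasy", "cannabis"],
    "market_b": ["mdma", "ecstasy", "lsd"],
    "market_c": ["cannabis", "lsd", "mdma"],
}

G = nx.Graph()
for products in markets.values():
    for p, q in combinations(set(products), 2):
        # An edge means the two products are offered simultaneously in a market;
        # the weight counts in how many markets the pair co-occurs.
        if G.has_edge(p, q):
            G[p][q]["weight"] += 1
        else:
            G.add_edge(p, q, weight=1)

# k-shell decomposition: products in the maximal (core) shell are the most
# densely interconnected and thus candidates for emerging products.
core_numbers = nx.core_number(G)
k_max = max(core_numbers.values())
core_shell = [n for n, k in core_numbers.items() if k == k_max]
print("core-shell products:", sorted(core_shell))
```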