Detecting malicious behavior, that is, labeling each host or domain as malicious or benign, is complicated by imbalanced label distributions and by the limited ground truth available for training supervised models or building rules. To tackle these challenges, we propose a novel framework that simultaneously learns cost-sensitive models for network hosts and external domains, based on a bipartite connectivity graph constructed between them. The framework explicitly incorporates behavioral features of the hosts, computed from network data, as well as lexical and reputational features computed for the external domains. Specifically, we model the predicted labels, measure misclassification errors by the Hamming distance between the predicted and true labels, assign different costs to different misclassification types (i.e., false negatives and false positives), and constrain connected nodes to share the same label with high probability. The framework is then formulated as an optimization problem that minimizes the total cost, that is, the misclassification errors weighted by their corresponding misclassification costs. Because the Hamming distance function is non-differentiable, we introduce a continuous loss function that approximates it with a performance guarantee. We develop an effective algorithm with good convergence properties based on stochastic gradient descent. Experimental results on both synthetic data and a real network dataset collected from an enterprise demonstrate the effectiveness of the proposed framework.
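To make the formulation concrete, the following is a minimal sketch of the general idea, not the method proposed here. It assumes a cost-weighted logistic loss as the continuous stand-in for the Hamming distance (the abstract does not specify the actual surrogate), a squared-difference penalty that encourages connected host-domain pairs to agree, and plain stochastic gradient descent over sampled edges. All function names, parameters, and the label encoding (1 = malicious, 0 = benign, -1 = unlabeled) are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost_sensitive_grad(w, x, y, c_fn, c_fp):
    """Gradient of a cost-weighted logistic loss for one labeled node.

    Assumed encoding: y = 1 is malicious, y = 0 is benign. An error on a
    malicious node is a false negative (cost c_fn); an error on a benign
    node is a false positive (cost c_fp).
    """
    p = sigmoid(x @ w)
    cost = c_fn if y == 1 else c_fp
    return cost * (p - y) * x

def train(Xh, yh, Xd, yd, edges, c_fn=5.0, c_fp=1.0,
          lam=0.1, lr=0.05, epochs=200, seed=0):
    """Jointly fit a host model (wh) and a domain model (wd) with SGD.

    Xh, Xd : feature matrices for hosts and domains.
    yh, yd : labels in {1, 0, -1}; -1 marks an unlabeled node (assumption).
    edges  : list of (host_index, domain_index) pairs of the bipartite graph.
    lam    : weight of the penalty that pulls connected nodes together.
    """
    rng = np.random.default_rng(seed)
    wh = np.zeros(Xh.shape[1])
    wd = np.zeros(Xd.shape[1])
    for _ in range(epochs):
        # One stochastic step per edge, visited in random order.
        for k in rng.permutation(len(edges)):
            i, j = edges[k]
            ph = sigmoid(Xh[i] @ wh)
            pd = sigmoid(Xd[j] @ wd)
            gh = np.zeros_like(wh)
            gd = np.zeros_like(wd)
            # Supervised, cost-sensitive term (labeled nodes only).
            if yh[i] >= 0:
                gh += cost_sensitive_grad(wh, Xh[i], yh[i], c_fn, c_fp)
            if yd[j] >= 0:
                gd += cost_sensitive_grad(wd, Xd[j], yd[j], c_fn, c_fp)
            # Graph term: gradient of lam * (ph - pd)^2 for this edge.
            diff = ph - pd
            gh += 2.0 * lam * diff * ph * (1.0 - ph) * Xh[i]
            gd -= 2.0 * lam * diff * pd * (1.0 - pd) * Xd[j]
            wh -= lr * gh
            wd -= lr * gd
    return wh, wd
```

Setting `c_fn` larger than `c_fp` penalizes missed malicious nodes more heavily, which is one common way to counter label imbalance. Because steps are taken per edge, a labeled node with many neighbors contributes its supervised gradient more often; a fuller implementation would likely normalize for node degree.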