Due to the hardness of solving the minimum spanning tree (MST) problem in stochastic environments, the stochastic MST (SMST) problem has not received the attention it merits, specifically when the probability distribution function (PDF) of the edge weight is not a priori known. In this paper, we first propose a learning automata-based sampling algorithm (Algorithm 1) to solve the MST problem in stochastic graphs where the PDF of the edge weight is assumed to be unknown. At each stage of the proposed algorithm, a set of learning automata is randomly activated and determines the graph edges that must be sampled in that stage. As the proposed algorithm proceeds, the sampling process focuses on the spanning tree with the minimum expected weight. Therefore, the proposed sampling method is capable of decreasing the rate of unnecessary samplings and shortening the time required for finding the SMST. The convergence of this algorithm is theoretically proved and it is shown that by a proper choice of the learning rate the spanning tree with the minimum expected weight can be found with a probability close enough to unity. Numerical results show that Algorithm 1 outperforms the standard sampling method. Selecting a proper learning rate is the most challenging issue in learning automata theory by which a good trade off can be achieved between the cost and efficiency of algorithm. To improve the efficiency (i.e., the convergence speed and convergence rate) of Algorithm 1, we also propose four methods to adjust the learning rate in Algorithm 1 and the resultant algorithms are called as Algorithm 2 through Algorithm 5. In these algorithms, the probabilistic distribution parameters of the edge weight are taken into consideration for adjusting the learning rate. Simulation experiments show the superiority of Algorithm 5 over the others. To show the efficiency of Algorithm 5, its results are compared with those of the multiple edge sensitivity method (MESM). The obtained results show that Algorithm 5 performs better than MESM both in terms of the running time and sampling rate.