Many-core architectures are becoming an attractive design choice for high-end embedded systems. They raise several important design issues, however, and load balancing is one of them. In this work, we adopt diffusive load balancing, which enables automatic load distribution in many-core systems. We modify the existing scheme by incorporating the concept of simulated annealing and apply the modified scheme to a many-core architecture. As an experiment, we map a synthetic application with 30 threads onto a many-core architecture with 21 cores and 4 memory tiles. The experiment shows that the modified scheme yields better results than the existing approaches.
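
To make the idea concrete, the following is a minimal sketch of how diffusive load balancing might be combined with a simulated-annealing acceptance rule. It is not the scheme evaluated in this work: the ring topology, the sum-of-squared-loads cost, the geometric cooling schedule, and all names (`ring_neighbors`, `anneal_diffuse`, `t0`, `cooling`) are assumptions chosen for illustration, and the memory tiles are not modeled. Only the thread and core counts (30 and 21) are taken from the experiment described above.

```python
import math
import random


def ring_neighbors(n):
    """Hypothetical topology: each core is linked to its two ring neighbors."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def anneal_diffuse(load, neighbors, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Move single threads between neighboring cores.

    A move that does not increase the sum of squared loads is always
    accepted (the diffusive part). A move that increases it is accepted
    with probability exp(-delta / temp), where temp decays geometrically
    (the simulated-annealing part). All parameters are illustrative.
    """
    rng = random.Random(seed)
    load = list(load)
    temp = t0
    for _ in range(steps):
        src = rng.randrange(len(load))
        if load[src] > 0:
            dst = rng.choice(neighbors[src])
            # Change in sum of squared loads if one thread moves src -> dst.
            delta = 2 * (load[dst] - load[src] + 1)
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                load[src] -= 1
                load[dst] += 1
        temp *= cooling
    return load


if __name__ == "__main__":
    # 30 threads, all initially placed on core 0 of a 21-core ring.
    start = [30] + [0] * 20
    print(anneal_diffuse(start, ring_neighbors(21)))
```

In this sketch, occasional cost-increasing moves early on let threads pass through cores that are already loaded, while the falling temperature makes the process behave like plain neighbor-to-neighbor diffusion toward the end.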