Large language models (LLMs) have demonstrated great capabilities in various natural language understanding and generation tasks. Platforms such as Hugging Face facilitate access to and utilization of pre-trained LLMs by different entities, ranging from computer science researchers to users with little machine learning background. These entities can further improve the performance of LLMs on their specific downstream tasks by fine-tuning. When several entities have similar tasks of interest but cannot share their local data directly due to privacy concerns and regulations, federated learning (FL) is a mainstream solution for leveraging the data of different entities. Beyond avoiding direct data sharing, FL can also provide rigorous data privacy protection, model intellectual property protection, and model customization when composed with other techniques. However, fine-tuning LLMs in federated settings still lacks adequate support from existing FL frameworks, because it must optimize the consumption of significant communication and computational resources, handle diverse data preparation for different tasks, and meet distinct information protection demands. This paper first discusses these challenges of federated LLM fine-tuning in detail, and then introduces our package FederatedScope-LLM (FS-LLM) as a main contribution, which consists of the following components: (1) we build a complete end-to-end benchmarking pipeline that automates dataset preprocessing, federated fine-tuning execution or simulation, and performance evaluation of federated LLM fine-tuning for different capability demonstration purposes; (2) we provide comprehensive, off-the-shelf implementations of federated parameter-efficient fine-tuning (PEFT) algorithms and versatile programming interfaces for future extension, enhancing the capabilities of LLMs in FL scenarios with low communication and computation costs, even without access to the full model (e.g., closed-source LLMs); (3) we adopt several accelerating operators and resource-efficient operators for fine-tuning LLMs with limited resources, along with flexible, pluggable sub-routines for interdisciplinary study (e.g., LLMs in personalized FL). We conduct extensive and reproducible experiments to validate the effectiveness of FS-LLM and benchmark advanced LLMs with state-of-the-art parameter-efficient fine-tuning algorithms in a federated setting, which also yields valuable insights into federated LLM fine-tuning for the research community. To facilitate further research and adoption, we release FS-LLM at https://github.com/alibaba/FederatedScope/tree/llm.
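
The following is a minimal sketch, not the FS-LLM API, of the communication pattern underlying federated parameter-efficient fine-tuning as described above: each client trains only small low-rank (LoRA-style) adapters on top of a frozen base model, and the server averages just those adapter tensors, so the full LLM weights never leave the clients and communication stays cheap. All names here (`LoRALinear`, `adapter_state`, `fedavg_adapters`) are illustrative, not identifiers from the released package.

```python
# Sketch of adapter-only federated aggregation (plain PyTorch, uniform FedAvg).
import copy
import torch
import torch.nn as nn


class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank adapter."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():        # freeze the pre-trained weights
            p.requires_grad = False
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_a.T @ self.lora_b.T) * self.scaling


def adapter_state(model: nn.Module) -> dict:
    """Extract only the adapter tensors for communication with the server."""
    return {k: v.detach().clone()
            for k, v in model.state_dict().items() if "lora_" in k}


def fedavg_adapters(client_states: list) -> dict:
    """Server-side FedAvg over adapter tensors only (uniform client weights)."""
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = torch.stack([s[key] for s in client_states]).mean(dim=0)
    return avg


if __name__ == "__main__":
    # Toy round: two clients share a frozen base layer and exchange adapters only.
    base = nn.Linear(16, 16)
    clients = [LoRALinear(copy.deepcopy(base)) for _ in range(2)]
    global_adapter = fedavg_adapters([adapter_state(c) for c in clients])
    for c in clients:                            # broadcast the aggregated adapter
        c.load_state_dict(global_adapter, strict=False)
```

Because only the adapter tensors are serialized and aggregated, this pattern also accommodates settings where clients cannot access or transmit the full (possibly closed-source) model weights.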