Direct preference optimization (DPO) is a successful fine-tuning strategy for aligning large language models with human preferences without the need to train a reward model or employ reinforcement learning. DPO, as originally formulated, relies on binary preference data and fine-tunes a language model to increase the likelihood of a preferred response over a dispreferred response. However, not all preference pairs are equal: while in some cases the preferred response is only slightly better than the dispreferred response, there can be a stronger preference for one response when, for example, the other response includes harmful or toxic content. In this paper, we propose a generalization of DPO, termed DPO with an offset (ODPO), that does not treat every preference pair equally during fine-tuning. Intuitively, ODPO requires the difference between the likelihood of the preferred and dispreferred response to be greater than an offset value. The offset is determined based on the extent to which one response is preferred over another. Our experiments on various tasks suggest that ODPO significantly outperforms DPO in aligning language models, especially when the number of preference pairs is limited.
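The offset idea described above can be sketched for a single preference pair. This is a minimal illustration, not the authors' implementation. It assumes the standard DPO objective, in which the quantity compared is the scaled log-likelihood ratio of each response against a frozen reference model, and it assumes the offset is subtracted from that margin inside the log-sigmoid. The function names, argument names, and the default `beta` are hypothetical. How the offset is derived from the strength of the preference is not specified in the abstract, so it is taken here as an input.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def odpo_loss(
    policy_logp_preferred: float,
    policy_logp_dispreferred: float,
    ref_logp_preferred: float,
    ref_logp_dispreferred: float,
    beta: float = 0.1,    # assumed scaling factor, as in standard DPO
    offset: float = 0.0,  # assumed non-negative; larger for stronger preferences
) -> float:
    """Loss for one preference pair; offset=0.0 reduces to the plain DPO loss."""
    # Implicit reward of each response: scaled log-ratio against the reference model.
    reward_preferred = beta * (policy_logp_preferred - ref_logp_preferred)
    reward_dispreferred = beta * (policy_logp_dispreferred - ref_logp_dispreferred)
    margin = reward_preferred - reward_dispreferred
    # The loss is small only when the margin exceeds the offset.
    return -log_sigmoid(margin - offset)


# Same pair, with and without an offset: the offset makes the loss stricter.
plain = odpo_loss(-1.0, -2.0, -1.5, -1.5, offset=0.0)
strict = odpo_loss(-1.0, -2.0, -1.5, -1.5, offset=1.0)
```

With `offset=0.0` the sketch treats every pair identically, as DPO does. A positive offset keeps pushing the model until the preferred response leads by at least that amount.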
Rare earth elements (REEs) are crucial to many modern products used in both civilian and defense applications. Currently, a reliable supply of these elements is uncertain, with the clear majority of REE production and refining occurring in China. Furthermore, ore deposits with commercially attractive concentrations of REEs are uncommon in the United States. As a result, the identification of a domestic supply of these technology metals is essential not only for manufacturing consumer merchandise but also for national security. Recently, one promising source of REEs has been identified: coal and coal byproducts. One of these byproducts is acid mine drainage (AMD), the most prevalent water quality impediment in the Appalachian coal mining region. This research found that AMD concentrates REEs through an autogenous process in which sulfide material in an oxidizing environment lowers the water pH. This acidic water in turn leaches metals, including REEs, from the surrounding geologic strata. Accordingly, this degraded water holds potential value as an REE source. Furthermore, identification of this environmental burden as a reliable supply of REEs could incentivize additional treatment efforts, while providing an additional revenue stream to those responsible for mitigating this substantial source of water pollution. However, current scientific literature lacks systematic studies that describe the content, distribution, and processing amenability of this resource. Therefore, this research details a study that: (1) characterized the REEs contained in AMD and its byproducts; (2) classified the REEs inherent to AMD and identified the size of the resource; (3) designed a process to recover REEs from AMD byproducts; and (4) demonstrated the feasibility of the beneficiation process by generating a concentrated REE product from AMD.
This was accomplished by conducting a broad sampling campaign in which 185 raw AMD and 623 AMD precipitate (AMDp) samples were collected across the Northern and Central Appalachian coal basins. Next, a series of laboratory experiments was conducted to determine a hydrometallurgical processing route to recover the REEs from AMDp. The results of the laboratory-scale studies were used to design a bench-scale plant capable of producing a concentrated REE product. Finally, an acid leaching and solvent extraction demonstration plant was constructed and operated, producing a rare earth oxide product with a purity greater than 60%.

ACKNOWLEDGMENTS

I would first like to express my sincere gratitude to my committee chair, Dr. Aaron Noble, for his expert guidance, understanding, support, and encouragement. Dr. Noble's charisma, wisdom, and astute analytical reasoning have guided me throughout graduate school. I will be eternally grateful for everything you have taught me.