Hyeontaek Lim scite author profile

Large language models have been shown to achieve remarkable performance across a variety of natural language tasks using few-shot learning, which drastically reduces the number of task-specific training examples needed to adapt the model to a particular application. To further our understanding of the impact of scale on few-shot learning, we trained a 540-billion parameter, densely activated, Transformer language model, which we call Pathways Language Model (PaLM).We trained PaLM on 6144 TPU v4 chips using Pathways, a new ML system which enables highly efficient training across multiple TPU Pods. We demonstrate continued benefits of scaling by achieving state-ofthe-art few-shot learning results on hundreds of language understanding and generation benchmarks. On a number of these tasks, PaLM 540B achieves breakthrough performance, outperforming the finetuned stateof-the-art on a suite of multi-step reasoning tasks, and outperforming average human performance on the recently released BIG-bench benchmark. A significant number of BIG-bench tasks showed discontinuous improvements from model scale, meaning that performance steeply increased as we scaled to our largest model. PaLM also has strong capabilities in multilingual tasks and source code generation, which we demonstrate on a wide array of benchmarks. We additionally provide a comprehensive analysis on bias and toxicity, and study the extent of training data memorization with respect to model scale. Finally, we discuss the ethical considerations related to large language models and discuss potential mitigation strategies. * Equal Contribution. Author contributions and ordering details are listed in Appendix A.

show abstract

Task-aware virtual machine scheduling for I/O performance.

Kim

Lim

Jeong

et al. 2009

166

View full text Add to dashboard Cite

The use of virtualization is progressively accommodating diverse and unpredictable workloads as being adopted in virtual desktop and cloud computing environments. Since a virtual machine monitor lacks knowledge of each virtual machine, the unpredictableness of workloads makes resource allocation difficult. Particularly, virtual machine scheduling has a critical impact on I/O performance in cases where the virtual machine monitor is agnostic about the internal workloads of virtual machines. This paper presents a task-aware virtual machine scheduling mechanism based on inference techniques using gray-box knowledge. The proposed mechanism infers the I/O-boundness of guest-level tasks and correlates incoming events with I/O-bound tasks. With this information, we introduce partial boosting, which is a priority boosting mechanism with tasklevel granularity, so that an I/O-bound task is selectively scheduled to handle its incoming events promptly. Our technique focuses on improving the performance of I/O-bound tasks within heterogeneous workloads by lightweight mechanisms with complete CPU fairness among virtual machines. All implementation is confined to the virtualization layer based on the Xen virtual machine monitor and the credit scheduler. We evaluate our prototype in terms of I/O performance and CPU fairness over synthetic mixed workloads and realistic applications.

show abstract

Scalable, high performance ethernet forwarding with CuckooSwitch

et al. 2013

View full text Add to dashboard Cite

Several emerging network trends and new architectural ideas are placing increasing demand on forwarding table sizes. From massivescale datacenter networks running millions of virtual machines to flow-based software-defined networking, many intriguing design options require FIBs that can scale well beyond the thousands or tens of thousands possible using today's commodity switching chips. This paper presents CUCKOOSWITCH, a software-based Ethernet switch design built around a memory-efficient, high-performance, and highly-concurrent hash table for compact and fast FIB lookup. We show that CUCKOOSWITCH can process 92.22 million minimumsized packets per second on a commodity server equipped with eight 10 Gbps Ethernet interfaces while maintaining a forwarding table of one billion forwarding entries. This rate is the maximum packets per second achievable across the underlying hardware's PCI buses.

show abstract

Building a Bw-Tree Takes More Than Just Buzz Words

Wang

Pavlo

Lim

et al. 2018

View full text Add to dashboard Cite

Architecting to achieve a billion requests per second throughput on a single key-value store server platform

Lim

Lee

et al. 2015

View full text Add to dashboard Cite

Distributed in-memory key-value stores (KVSs), such as memcached, have become a critical data serving layer in modern Internet-oriented datacenter infrastructure. Their performance and efficiency directly affect the QoS of web services and the efficiency of datacenters. Traditionally, these systems have had significant overheads from inefficient network processing, OS kernel involvement, and concurrency control. Two recent research thrusts have focused upon improving key-value performance. Hardware-centric research has started to explore specialized platforms including FPGAs for KVSs; results demonstrated an order of magnitude increase in throughput and energy efficiency over stock memcached. Software-centric research revisited the KVS application to address fundamental software bottlenecks and to exploit the full potential of modern commodity hardware; these efforts too showed orders of magnitude improvement over stock memcached.We aim at architecting high performance and efficient KVS platforms, and start with a rigorous architectural characterization across system stacks over a collection of representative KVS implementations. Our detailed full-system characterization not only identifies the critical hardware/software ingredients for high-performance KVS systems, but also leads to guided optimizations atop a recent design to achieve a recordsetting throughput of 120 million requests per second (MRPS) on a single commodity server. Our implementation delivers 9.2X the performance (RPS) and 2.8X the system energy efficiency (RPS/watt) of the best-published FPGA-based claims. We craft a set of design principles for future platform architectures, and via detailed simulations demonstrate the capability of achieving a billion RPS with a single server constructed following our principles.

show abstract

scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.

Contact Info

customersupport@researchsolutions.com

10624 S. Eastern Ave., Ste. A-614

Henderson, NV 89052, USA

This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

Blog Terms and Conditions API Terms Privacy Policy Contact Cookie Preferences Do Not Sell or Share My Personal Information

Made with 💙 for researchers

Part of the Research Solutions Family.

Hyeontaek Lim

PaLM: Scaling Language Modeling with Pathways

Task-aware virtual machine scheduling for I/O performance.

Scalable, high performance ethernet forwarding with CuckooSwitch

Building a Bw-Tree Takes More Than Just Buzz Words

Architecting to achieve a billion requests per second throughput on a single key-value store server platform

Contact Info

Product

Resources

About