This report deals with the four main constraints on the scalability of programs parallelized using loop-level parallelism: (1) the available parallelism in the algorithm, (2) the availability and scalability of appropriate hardware (including the operating system and the compilers), (3) limitations in the design of the hardware, and (4) the cost of getting into and out of a parallel section of code. Examining these constraints leads to two important discussions: (1) the theoretical limitations on the scalability of shared memory codes and (2) the role that the choice of hardware and usage policies plays in determining the performance of a shared memory code. These discussions include examples from the author's own work in porting the implicit computational fluid dynamics code F3D from the Cray C90 to a variety of shared memory platforms.

Acknowledgments

The author thanks Marek Behr, formerly of the U.S. Army High Performance Computing Research Center (AHPCRC), for sharing his results, and the many colleagues who worked on these research projects over the years and helped collect these data and prepare this report. The author would also like to thank the employees of Business Plus, especially Claudia Coleman and Maria Brady, who assisted in the preparation and editing of this report. Special thanks go to Tom Kendall, Denice Brown, and the entire systems staff at the ARL-MSRC for their support of the various projects for which these runs were originally done.