The popularity of Partitioned Global Address Space (PGAS) languages has increased in recent years thanks to their high programmability and their performance, which stems from efficient exploitation of data locality, especially on hierarchical architectures such as multicore clusters. This paper describes UPCBLAS, a parallel numerical library for dense matrix computations written in the PGAS language Unified Parallel C (UPC). The UPCBLAS routines are built on top of sequential BLAS functions and exploit the particularities of the PGAS paradigm, taking data locality into account to achieve good performance. Furthermore, the routines implement other optimization techniques, several of which automatically take into account the hardware characteristics of the underlying system on which they are executed. The library has been experimentally evaluated on a multicore supercomputer and compared with a message-passing-based parallel numerical library, demonstrating good scalability and efficiency.