“…PSSA executions took place on the 8 worker nodes. These offer a total of 32 CPU cores, all of them specified in the base MPI hostfile supplied to coPSSA, which are used four at a time to service the PSSA execution requests of each coPSSA task; thus, each PSSA execution always consumed 4 cores: 1 core for the master process and 3 cores for slave processes. To fully exploit the 3 slave cores, the number of subdomains processed by PSSA was set to be no fewer than 3, and as close to 3 as possible; for 2-dimensional problems, like the ones we tested, this is achieved with a granularity g = 0.5, which generates 4 subdomains [15]. The PSSA variant used was always HoD, since it is the fastest and uses a fixed number of subdomains (4, in our evaluation scenario).…”
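The core and subdomain arithmetic in the passage can be checked with a small sketch. It assumes, as a hypothetical model not confirmed by the quoted text, that a granularity g = 1/k splits each axis of a d-dimensional domain into k parts, giving k^d subdomains; this model reproduces the stated case (g = 0.5 in 2D yields 4 subdomains). The function names are illustrative only and are not part of coPSSA or PSSA.

```python
# Figures stated in the quoted evaluation setup.
TOTAL_CORES = 32                      # cores across the 8 worker nodes
CORES_PER_PSSA = 4                    # 1 master + 3 slaves per PSSA execution
SLAVES_PER_PSSA = CORES_PER_PSSA - 1  # 3 slave cores
DIMENSIONS = 2                        # the tested problems are 2-dimensional


def max_concurrent_pssa(total_cores: int, cores_per_run: int) -> int:
    """Number of PSSA executions that fit on the available cores at once."""
    return total_cores // cores_per_run


def subdomains(granularity: float, dims: int) -> int:
    """ASSUMPTION (hypothetical model): each axis is split into 1/g equal parts,
    so the domain yields (1/g) ** dims subdomains."""
    parts_per_axis = round(1 / granularity)
    return parts_per_axis ** dims


def pick_granularity(min_subdomains: int, dims: int) -> float:
    """Coarsest g = 1/k (k a positive integer) whose subdomain count is at least
    min_subdomains, i.e. the count closest to min_subdomains from above.
    Restricting g to 1/k is an assumption of this sketch."""
    k = 1
    while k ** dims < min_subdomains:
        k += 1
    return 1 / k


g = pick_granularity(SLAVES_PER_PSSA, DIMENSIONS)
print(max_concurrent_pssa(TOTAL_CORES, CORES_PER_PSSA))  # 8
print(g, subdomains(g, DIMENSIONS))                       # 0.5 4
```

Under this model, up to 8 PSSA executions can run concurrently on the 32 cores, and g = 0.5 is the coarsest granularity giving at least 3 subdomains in 2D, matching the 4 subdomains reported for the HoD variant.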