Shaping codes are used to encode information for use on channels with cost constraints. Applications include data transmission with a power constraint and, more recently, data storage on flash memories with a constraint on memory cell wear. In the latter application, system requirements often impose a rate constraint. In this paper, we study rate-constrained fixed-to-variable length shaping codes for noiseless, memoryless costly channels and general i.i.d. sources. The analysis relies on the theory of word-valued sources. We establish a relationship between the code expansion factor and minimum average symbol cost. We then determine the expansion factor that minimizes the average cost per source symbol (total cost), corresponding to a conventional optimal source code with cost. An equivalence is established between codes minimizing average symbol cost and codes minimizing total cost, and a separation theorem is proved, showing that optimal shaping can be achieved by a concatenation of optimal compression and optimal shaping for a uniform i.i.d. source. Shaping codes often incorporate, either explicitly or implicitly, some form of non-equiprobable signaling. We use our results to further explore the connections between shaping codes and codes that map a sequence of i.i.d. source symbols into an output sequence of symbols that are approximately independent and distributed according to a specified target distribution, such as distribution matching (DM) codes. Optimal DM codes are characterized in terms of a new performance measuregeneralized expansion factor (GEF) -motivated by the costly channel perspective. The GEF is used to study DM codes that minimize informational divergence and normalized informational divergence.