We investigate numerically transient linear growth of three-dimensional perturbations in a stratified shear layer to determine which perturbations optimize the growth of the total kinetic and potential energy over a range of finite target time intervals. The stratified shear layer has an initial parallel hyperbolic tangent velocity distribution with Reynolds number $Re = U_0 h/\nu = 1000$ and Prandtl number $\nu/\kappa = 1$, where $\nu$ is the kinematic viscosity of the fluid and $\kappa$ is the diffusivity of the density. The initial stable buoyancy distribution has constant buoyancy frequency $N_0$, and we consider a range of flows with different bulk Richardson number $Ri_b = N_0^2 h^2/U_0^2$, which also corresponds to the minimum gradient Richardson number $Ri_g(z) = N_0^2/(dU/dz)^2$ at the midpoint of the shear layer. For short target times, the optimal perturbations are inherently three-dimensional, while for sufficiently long target times and small $Ri_b$ the optimal perturbations are closely related to the normal-mode 'Kelvin-Helmholtz' (KH) instability, consistent with analogous calculations in an unstratified mixing layer recently reported by Arratia et al. (J. Fluid Mech., vol. 717, 2013, pp. 90-133). However, we demonstrate that non-trivial transient growth occurs even when the Richardson number is sufficiently high to stabilize all normal-mode instabilities, with the optimal perturbation exciting internal waves at some distance from the midpoint of the shear layer.
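
The statement that the bulk Richardson number equals the minimum gradient Richardson number can be checked with a short numerical sketch. It assumes the hyperbolic tangent profile takes the common form U(z) = U_0 tanh(z/h), which the text above does not write out explicitly, and it uses an illustrative value Ri_b = 0.2 that is not taken from the study. The function name and grid are likewise hypothetical.

```python
import math


def gradient_richardson(z, N0, U0, h):
    """Ri_g(z) = N0^2 / (dU/dz)^2 for the ASSUMED profile U(z) = U0*tanh(z/h)."""
    dUdz = (U0 / h) / math.cosh(z / h) ** 2  # derivative of U0*tanh(z/h)
    return N0 ** 2 / dUdz ** 2


# Illustrative parameters (assumptions, not values from the study).
U0, h = 1.0, 1.0
Ri_b = 0.2
N0 = math.sqrt(Ri_b) * U0 / h  # chosen so that N0^2 h^2 / U0^2 = Ri_b

# Uniform grid on [-5h, 5h] that contains z = 0 exactly.
zs = [(i - 500) / 100 * h for i in range(1001)]
Ri_g = [gradient_richardson(z, N0, U0, h) for z in zs]

# Locate the minimum of Ri_g on the grid.
i_min = min(range(len(zs)), key=lambda i: Ri_g[i])
z_min, Ri_g_min = zs[i_min], Ri_g[i_min]

print(f"min Ri_g = {Ri_g_min:.6f} at z = {z_min:.2f}; Ri_b = {Ri_b:.6f}")
```

Under this assumed profile the shear is largest at z = 0, where dU/dz = U_0/h, so Ri_g(0) = N_0^2 h^2/U_0^2 = Ri_b, and Ri_g grows as cosh^4(z/h) away from the midpoint.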