We have developed streaming implementations of two numerical linear algebra operations that further exploit the block decomposition strategies these operations commonly use to achieve high performance. The implementations formulate the algorithms as data flow graphs and use coarse-grained parallelism to (1) emit each block of the result matrix as soon as it becomes available and (2) compute multiple blocks in parallel. This streaming design benefits data flow graphs composed of multiple linear algebra operations because it removes synchronization points between successive operations: a result block from one operation can be consumed immediately by its successor operations, without waiting for the first operation's full result. Early comparisons with OpenBLAS functions on CPUs show comparable performance for large dense matrices, and the first result block arrives in up to 50x less time than the full result takes to compute. More thorough studies could show the impact of such implementations on the performance of systems that chain multiple linear algebra operations.
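
To make the streaming idea concrete, the following is a minimal Python sketch, not the implementation described here. It assumes a blocked matrix product as the first operation and an element-wise scaling as a hypothetical successor operation; all function names (`matmul_block`, `stream_matmul`, `stream_scale`, `assemble`) and the block size parameter `bs` are illustrative. Blocks are computed by a thread pool and yielded as soon as each one finishes, so the successor can start before the full product exists. In standard CPython builds the global interpreter lock limits thread parallelism for pure-Python arithmetic, so the sketch shows the structure rather than the performance.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def matmul_block(A, B, i0, j0, bs):
    """Compute the block of C = A @ B whose top-left corner is (i0, j0)."""
    inner = len(B)
    rows = range(i0, min(i0 + bs, len(A)))
    cols = range(j0, min(j0 + bs, len(B[0])))
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in cols]
            for i in rows]


def stream_matmul(A, B, bs, workers=4):
    """Yield (i0, j0, block) for C = A @ B in completion order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One task per result block (coarse-grained parallelism).
        futures = {
            pool.submit(matmul_block, A, B, i0, j0, bs): (i0, j0)
            for i0 in range(0, len(A), bs)
            for j0 in range(0, len(B[0]), bs)
        }
        # Emit each block as soon as it is available, not in index order.
        for fut in as_completed(futures):
            i0, j0 = futures[fut]
            yield i0, j0, fut.result()


def stream_scale(blocks, alpha):
    """Hypothetical successor operation: scale each block on arrival."""
    for i0, j0, block in blocks:
        yield i0, j0, [[alpha * x for x in row] for row in block]


def assemble(blocks, n_rows, n_cols):
    """Collect streamed blocks into a full matrix (the only full barrier)."""
    C = [[0] * n_cols for _ in range(n_rows)]
    for i0, j0, block in blocks:
        for di, row in enumerate(block):
            C[i0 + di][j0:j0 + len(row)] = row
    return C


if __name__ == "__main__":
    A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    B = [[1, 0, 2], [0, 1, 0], [3, 0, 1]]
    # Chain the two operations without materializing A @ B in between.
    result = assemble(stream_scale(stream_matmul(A, B, bs=2), 2), 3, 3)
    for row in result:
        print(row)
```

Because both stages are generators, the only point at which the whole matrix must be present is the final `assemble` call; the scaling stage processes a block whenever the product stage emits one.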