This study presents a comprehensive workflow for developing and deploying Multi-Layer Perceptron (MLP)-based soft sensors on embedded FPGAs, addressing diverse deployment objectives. The proposed workflow extends our prior research by introducing greater model adaptability: it supports configurations spanning layer counts, neuron counts, and quantization bitwidths to accommodate the constraints and capabilities of different FPGA platforms. The workflow incorporates our custom-developed, open-source toolchain, ElasticAI.Creator, which facilitates quantization-aware training, integer-only inference, automated accelerator generation from VHDL templates, and synthesis with performance estimation. A case study on fluid flow estimation was conducted on two FPGA platforms, the AMD Spartan-7 XC7S15 and the Lattice iCE40UP5K. For precision-focused and latency-sensitive deployments, a six-layer, 60-neuron MLP accelerator quantized to 8 bits on the XC7S15 achieved an MSE of 56.56, an MAPE of 1.61%, and an inference latency of 23.87 μs. Moreover, for low-power and energy-constrained deployments, a five-layer, 30-neuron MLP accelerator quantized to 8 bits on the iCE40UP5K achieved an inference latency of 83.37 μs, a power consumption of 2.06 mW, and an energy consumption of only 0.172 μJ per inference. These results confirm the workflow’s ability to identify FPGA accelerators tailored to specific deployment requirements, achieving a balanced trade-off among precision, inference latency, and energy efficiency.
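To make the quantization-aware training step concrete, the sketch below shows a minimal fake-quantization scheme in plain PyTorch for a small MLP. It is an illustrative assumption, not ElasticAI.Creator's actual API: the bitwidth, scale derivation, layer sizes, and input dimensionality are placeholders chosen to echo the five-layer, 30-neuron configuration described above.

```python
# Minimal sketch of 8-bit quantization-aware training for a small MLP.
# Bitwidth, helper names, layer sizes, and data are illustrative
# assumptions; this is plain PyTorch, not ElasticAI.Creator's API.
import torch
import torch.nn as nn

BITS = 8
QMAX = 2 ** (BITS - 1) - 1  # symmetric signed range: [-127, 127] for 8 bits


def fake_quantize(x: torch.Tensor) -> torch.Tensor:
    """Simulate integer quantization in the forward pass while letting
    gradients flow unchanged (straight-through estimator)."""
    scale = x.detach().abs().max().clamp(min=1e-8) / QMAX
    q = torch.clamp(torch.round(x / scale), -QMAX, QMAX)
    # Forward value equals the dequantized tensor; backward is identity.
    return x + (q * scale - x).detach()


class QuantLinear(nn.Linear):
    """Linear layer whose activations and weights are fake-quantized."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return nn.functional.linear(
            fake_quantize(x), fake_quantize(self.weight), self.bias
        )


# A five-layer MLP with 30 neurons per hidden layer, loosely analogous
# to the iCE40UP5K deployment discussed above.
model = nn.Sequential(
    QuantLinear(1, 30), nn.ReLU(),
    QuantLinear(30, 30), nn.ReLU(),
    QuantLinear(30, 30), nn.ReLU(),
    QuantLinear(30, 30), nn.ReLU(),
    QuantLinear(30, 1),
)

# Ordinary training loop: quantization error is present in every forward
# pass, so the weights learned here survive 8-bit integer-only inference.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 1)  # placeholder flow-sensor inputs
y = torch.randn(64, 1)  # placeholder flow-rate targets
for _ in range(10):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```

In the actual workflow, the trained integer model would additionally be translated into a hardware accelerator via the toolchain's VHDL templates; the sketch covers only the training-side idea.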