The previous chapters have looked at methods for designing optimal filters based on mathematical models of the signal and noise processes. The modelling is done either through statistical properties such as autocovariance functions or through Z-transform models. The Z-transform approach leads to spectral factorization and Diophantine equations as part of the solution, and these can be computationally demanding for real-time applications. Moreover, the Diophantine approach yields pole-zero filters, which can have stability problems in finite-precision arithmetic, especially if the process models are inaccurate. The FIR Wiener filter is not troubled by stability issues, but it does require autocovariance information for the signal plus noise and for the noise on its own.
To a certain extent the FIR approach is already a kind of adaptive filter, since the noise covariances can be estimated from direct measurements. This does of course require a means of deciding which part of the data is noise alone and which is signal plus noise, and when the signal is a speech waveform this in turn calls for voice-activity detectors (VADs). Other approaches that have been tried in the past, such as spectral subtraction, could also be described as a form of adaptive filter. These approaches are ad hoc in nature, however, and are more of an afterthought to the main theory of Wiener filtering. They are not designed to be adaptive from the ground up; instead, once the filtering theory is complete, thought is given to how the solution can be made adaptive or "self-tuning".
The first attempts at adaptive filtering were analogue. After a digital theory was developed, computer technology lagged behind the theory, and so hardware versions of adaptive filters had to be built to achieve the required speed [1]. Although attempts were made at IIR adaptive filters [2], the FIR method appears to have withstood the test of time.
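As a reminder of what the FIR Wiener filter needs in practice, the following is a minimal sketch rather than a definitive implementation. It assumes additive noise that is uncorrelated with the signal, so that the signal autocovariance can be taken as the difference between the signal-plus-noise and noise-only autocovariances, and it assumes a noise-only stretch of data is available (for speech, the part a VAD would flag). The function names `autocov` and `fir_wiener`, the sinusoidal test signal and the choice of 16 weights are illustrative assumptions, not taken from the text.

```python
import numpy as np


def autocov(x, n_lags):
    """Biased sample autocovariance of x for lags 0 .. n_lags-1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[: n - k], x[k:]) / n for k in range(n_lags)])


def fir_wiener(noisy, noise_only, n_taps):
    """Solve R_yy h = r_ss for the FIR weights h (hypothetical helper).

    Assumes y = s + v with v uncorrelated with s, so r_ss = r_yy - r_vv.
    """
    r_yy = autocov(noisy, n_taps)        # signal plus noise
    r_vv = autocov(noise_only, n_taps)   # noise on its own
    r_ss = r_yy - r_vv                   # estimated signal autocovariance
    lags = np.abs(np.subtract.outer(np.arange(n_taps), np.arange(n_taps)))
    R_yy = r_yy[lags]                    # symmetric Toeplitz matrix
    return np.linalg.solve(R_yy, r_ss)


# Synthetic illustration: a slow sinusoid buried in white noise.
rng = np.random.default_rng(0)
n = 20000
signal = np.sin(2 * np.pi * 0.01 * np.arange(n))
noisy = signal + rng.normal(scale=1.0, size=n)
noise_only = rng.normal(scale=1.0, size=n)   # separate noise-only record

h = fir_wiener(noisy, noise_only, n_taps=16)
estimate = np.convolve(noisy, h)[:n]         # causal FIR filtering

mse_before = np.mean((noisy - signal) ** 2)
mse_after = np.mean((estimate - signal) ** 2)
print(f"MSE before filtering: {mse_before:.3f}, after: {mse_after:.3f}")
```

Re-estimating the two autocovariances block by block and re-solving for `h` would give the "self-tuning" behaviour described above, at the cost of a matrix solve per block.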
Adaptive FIR filters were implemented in CMOS and hard-wired in the early days of the theory. The earliest application was an adaptive equalizer for communication channels [3]. It used a small number of FIR weights (as the coefficients are commonly known in this context) and optimized them by steepest descent. Only later did the inventor become aware [4] of the more general work