A Hardware Architecture for Efficient Implementation of Real-Time Weighted Median Filters

 

By John Wiseman

 

 

 

Introduction

Weighted median filters have been shown to be effective in the removal of impulsive noise from a wide variety of video images.  Median filters in general, however, are computationally expensive to implement because of the fact that a running sort of the incoming data must be performed.  Weighted median filters, although they remove noise more effectively than standard median filters, are even more difficult to implement because the traditional software-based approach has been to increase the amount of sorting that is done, proportional to the weighting values used.  Because of the computational difficulties associated with the use of the algorithm, the use of weighted median filters in video processing applications has mostly been limited to non-real-time software implementations.  This paper describes a fairly simple solution to the problem, and allows a high-speed, low-complexity hardware solution to be built for real-time applications.

The technique described in this paper is unique because it uses three additional information fields that are appended to the incoming data words as they are being processed.  These three fields, the weight value, the new bit, and the age value are updated appropriately by associated control logic, and then stored within the array.  Advantages of this technique are that a minimal amount of extra storage is used and the control logic necessary to update the three extra fields is of low complexity.  Because of this, relatively inexpensive real-time hardware implementations may be accomplished by using devices such as Field Programmable Gate Array (FPGA) chips.

Technical Description

The standard two-dimensional median filter is good at removing impulse noise from video images, but the price that is paid is that the image tends to get “softer”, as there is a low-pass filtering effect.  This effect may be minimized by the use of a weighted two-dimensional median filter, whose weights may be either permanently fixed or adaptively modified depending on the particular criteria being satisfied.  This description does not go into details on methods for the adaptation process itself, as that is a totally independent problem from that of the weighted median filter calculations themselves.  The mechanism described is also independent of the width of the data values to be filtered, the NxN size of the data array, and the actual weight values themselves.  For the sake of simplicity, examples will be given detailing the use of 8-bit data values, a 3x3 size, and a fixed array of weights (1,3,1,3,5,3,1,3,1 in linear order).  These values are the ones that were used in software simulations and VHDL hardware descriptions.

The basic updating scheme for the 3x3 matrix is shown in Figure 1.  It should be noted that this method is not limited to the 3x3 example given here, and it is straightforward to extend it to any arbitrarily sized filter.  It is simpler to think in single-dimensional terms, so the data values are linearized to appear to be a 9x1 filter block.  The data is stored in an array of N2 + 1 register or memory locations.  In the example case, 32 + 1 = 10, so there are 10 registers allocated for data storage.  

 

In the first part of the example, the nine 8-bit values are pre-sorted so that the current median filter output will be the sample in the exact middle of the register array, or the output of register #(N2 – 1)/2, which in this example is register #4.  A new data value is about to be read in during the next cycle.

The new data value is compared to all of the values currently stored in the array, and is placed in the correct location immediately.  Lower values are shifted to the right, and the lowest value “overflows” into the extra register at the end of the array (register #9).

In the second half of the cycle, the data tags are read to determine which data value to remove from the array, and all data values to the right of it are now shifted left, stopping at the point where the oldest previous value was removed from.  After these operations are completed, the median value is again the value in the middle of the array, or the one located in register #4 of this example.

The implementation of the weighted median filter is similar to that of the 3x3 median filter, but more complicated because of the weighting operations that must take place.  As the three new data values are input into the sorting array, they must be assigned a weight depending on what position they have in the original data matrix.  The first data value to be input is assigned a weight of “1”, the second is assigned a weight of “3”, and the third is assigned a weight of “1”.  This weighting is now maintained and updated within the sorting array registers, so another three bits must be assigned to the register length to accommodate the highest possible weight value of 5.  Because of this, the actual width of this bit field in an implementation must be wide enough to accommodate a binary representation of the highest weight value that will be used.  A final single bit is also associated with each register within the array, and that bit is used to tell the internal logic which three weight values have just been entered as new values, and which six remaining weights need to be updated reflective of their new positions in the data matrix.  As this value merely tells the internal logic which data values are new and which ones are not, the width of this field does not have to change.  Because of these extra bits associated with weight maintenance, each register within the sorting array must be at least 16 bits wide in this 3x3 weighted median filter implementation, and possibly more depending on the other factors described previously.  See Figure 2 for more information.  

 

A fourth clocking cycle is added compared to the 3x3 median filter implementation to allow for weight updates to occur within the sorted array.  Because of this, the total time necessary for an output is eight clock cycles.  The use of eight clock cycles is somewhat arbitrary, as the example implementation is done using general-purpose registers constructed of flip-flops.  A memory-based storage array could also be designed that would result in the use of less clock cycles for a particular output.  Note that the basic sorting operation of the register array is accomplished in the same manner as in Figure 1, but the middle register value is not automatically output as the answer.  Further processing via the data value weights is necessary, and there are basically two methods for doing this.  There methods are shown in Figure 3.  After analyzing the hardware designs for both approaches, the second method was chosen for the final implementation, as it is considerably simpler.  

 

RTL VHDL code was written to compare implementation complexities between a standard 3x3 median filter, a 3x3 multi-stage median filter, and a 3x3 weighted median filter.  Examples of this code may be found at the following pages -

3x3 median filter VHDL

3x3 multi-stage median filter VHDL

3x3 weighted median filter VHDL

Conclusion

The architecture described in this paper is very useful for the implementation of real-time impulse noise filters for video images, without unduly disturbing the basic image quality.  This particular type of noise is extremely detrimental to the performance of video compression schemes such as MPEG, so it is desirable to implement the filter in an efficient manner.  The use of a hardware architecture to implement the filter is preferred over a typical software approach as the effective throughput is considerably higher for this type of non-linear filter structure.