A Hardware Architecture for Efficient Implementation of Real-Time Weighted Median Filters
Weighted median filters have been shown to be effective in the removal of impulsive noise from a wide variety of video images. Median filters in general, however, are computationally expensive to implement because of the fact that a running sort of the incoming data must be performed. Weighted median filters, although they remove noise more effectively than standard median filters, are even more difficult to implement because the traditional software-based approach has been to increase the amount of sorting that is done, proportional to the weighting values used. Because of the computational difficulties associated with the use of the algorithm, the use of weighted median filters in video processing applications has mostly been limited to non-real-time software implementations. This paper describes a fairly simple solution to the problem, and allows a high-speed, low-complexity hardware solution to be built for real-time applications.
The technique described in this paper is unique because it uses three additional information fields that are appended to the incoming data words as they are being processed. These three fields, the weight value, the new bit, and the age value are updated appropriately by associated control logic, and then stored within the array. Advantages of this technique are that a minimal amount of extra storage is used and the control logic necessary to update the three extra fields is of low complexity. Because of this, relatively inexpensive real-time hardware implementations may be accomplished by using devices such as Field Programmable Gate Array (FPGA) chips.
The standard two-dimensional median filter is good at
removing impulse noise from video images, but the price that is paid is that
the image tends to get “softer”, as there is a low-pass filtering effect.
This effect may be minimized by the use of a weighted two-dimensional
median filter, whose weights may be either permanently fixed or adaptively
modified depending on the particular criteria being satisfied.
This description does not go into details on methods for the adaptation
process itself, as that is a totally independent problem from that of the
weighted median filter calculations themselves.
The mechanism described is also independent of the width of the data
values to be filtered, the NxN size of the data array, and the actual weight
values themselves. For the sake
of simplicity, examples will be given detailing the use of 8-bit data values,
a 3x3 size, and a fixed array of weights (1,3,1,3,5,3,1,3,1 in linear order).
These values are the ones that were used in software simulations and
VHDL hardware descriptions.
The basic updating scheme for the 3x3 matrix is shown
in Figure 1. It should be noted
that this method is not limited to the 3x3 example given here, and it is
straightforward to extend it to any arbitrarily sized filter.
It is simpler to think in single-dimensional terms, so the data values
are linearized to appear to be a 9x1 filter block.
The data is stored in an array of N2 + 1 register or memory
locations. In the example case, 32 + 1 = 10, so there are 10
registers allocated for data storage.
In the first part of the example, the nine 8-bit
values are pre-sorted so that the current median filter output will be the
sample in the exact middle of the register array, or the output of register
#(N2 – 1)/2, which in this example is register #4.
A new data value is about to be read in during the next cycle.
The new data value is compared to all of the values
currently stored in the array, and is placed in the correct location
immediately. Lower values are
shifted to the right, and the lowest value “overflows” into the extra
register at the end of the array (register #9).
In the second half of the cycle, the data tags are
read to determine which data value to remove from the array, and all data
values to the right of it are now shifted left, stopping at the point where
the oldest previous value was removed from.
After these operations are completed, the median value is again the
value in the middle of the array, or the one located in register #4 of this
example.
The implementation of the weighted median filter is
similar to that of the 3x3 median filter, but more complicated because of the
weighting operations that must take place.
As the three new data values are input into the sorting array, they
must be assigned a weight depending on what position they have in the original
data matrix. The first data value
to be input is assigned a weight of “1”, the second is assigned a weight
of “3”, and the third is assigned a weight of “1”.
This weighting is now maintained and updated within the sorting array
registers, so another three bits must be assigned to the register length to
accommodate the highest possible weight value of 5. Because of this, the actual width of this bit field in an
implementation must be wide enough to accommodate a binary representation of
the highest weight value that will be used.
A final single bit is also associated with each register within the
array, and that bit is used to tell the internal logic which three weight
values have just been entered as new values, and which six remaining weights
need to be updated reflective of their new positions in the data matrix.
As this value merely tells the internal logic which data values are new
and which ones are not, the width of this field does not have to change.
Because of these extra bits associated with weight maintenance, each
register within the sorting array must be at least 16 bits wide in this 3x3
weighted median filter implementation, and possibly more depending on the
other factors described previously. See
Figure 2 for more information.
A fourth clocking cycle is added compared to the 3x3
median filter implementation to allow for weight updates to occur within the
sorted array. Because of this,
the total time necessary for an output is eight clock cycles.
The use of eight clock cycles is somewhat arbitrary, as the example
implementation is done using general-purpose registers constructed of
flip-flops. A memory-based
storage array could also be designed that would result in the use of less
clock cycles for a particular output. Note
that the basic sorting operation of the register array is accomplished in the
same manner as in Figure 1, but the middle register value is not automatically
output as the answer. Further
processing via the data value weights is necessary, and there are basically
two methods for doing this. There
methods are shown in Figure 3. After
analyzing the hardware designs for both approaches, the second method was
chosen for the final implementation, as it is considerably simpler.
RTL VHDL code was written to compare implementation complexities between a standard 3x3 median filter, a 3x3 multi-stage median filter, and a 3x3 weighted median filter. Examples of this code may be found at the following pages -
3x3
multi-stage median filter VHDL
3x3
weighted median filter VHDL
The architecture described in this paper is very useful for the implementation of real-time impulse noise filters for video images, without unduly disturbing the basic image quality. This particular type of noise is extremely detrimental to the performance of video compression schemes such as MPEG, so it is desirable to implement the filter in an efficient manner. The use of a hardware architecture to implement the filter is preferred over a typical software approach as the effective throughput is considerably higher for this type of non-linear filter structure.