This is rateconv.txt in view mode; [Download] [Up]
Parameter design for RATECONV
=============================
Copyright (c) 1992, 1995 by Markus Mummert
Contents
--------
1. Formulas
2. Quick design strategies
3. Introduction to flexible paramter design
4. Design trade-offs
5. Design process step-by-step
6. Examples
7. Comments to experts
8. References
(Implementation notes see source rateconv.c)
1. Formulas
-----------
(1) out(n/fsout) = SUM in(m/fsin)*g((n*d/u-m)*fsin)/fsin
over all m
(2) g(t) = 2*fgK*sinc(pi*2*fgK*t) * exp(-pi*(2*fgG*t)**2)
(3) fsout = fsin * u / d
(4) fSTB = min{fsin,fsout} / 2
(5a) aPAB = -20*log10(0.5*(1-erf(sqrt(pi)*xPAB/2)))dB
(5b) aSTB = -20*log10(0.5*(1-erf(sqrt(pi)*xSTB/2)))dB
(6) fgG = (fSTB - fPAB) / (xSTB - xPAB)
(7) fgK = fPAB - fgG * xPAB
(8) SN = 20*log10(y)
(9) Ty = sqrt(log(y)/pi) / fgG
(10) L = [fsin * Ty] + 1
in(t) input function sampled at t = m/fsin
out(t) output function sampled at t = n/fsout
g(t) interpolation function
fgG key frequency of gaussian time window in g(t)
fgK cutoff frequency of sinc-filter in g(t)
fsin input sample rate (frequency)
fsout output sample rate (frequency)
u integer upsampling factor
d integer downsampling factor
fPAB upper edge frequency of passband
aPAB maximum loss in passband
xPAB fPAB-fgK frequency distance normalized to fgG
fSTB lower edge frequency of stopband
aSTB minimum loss in stopband
xPAB fSTB-fgK frequency distance normalized to fgG
y number of significant sample quantize steps
SN signal-to-noise ratio in dB correponding to y
Ty time width of g(t) with respect to y
L length of FIR-filter realization
pi 3.14156..
[] integer operator
min{} minimum operator
**2 to the power of 2
erf() gaussian error function as defined by UN*X erf(3M)
log() natural log
log10() decadic log
2. Quick design strategies
--------------------------
I To design a rate conversion that is - no matter what the
execution speed penalty might be - on the safe side
even for audio-freaks, we assume SN=96dB, aSTB=96dB
aPAB=1dB at fPAB=fSTB*9/10. (The following design has
already been implemented in the frontend `rcv')
Find a pair of integer values u <= 1024, d in (3)
to match the originally desired fsout with a deviation
of 0.1percent at maximum.
if (u/d >= 1) { /* upsampling */
fgG = fsin/2 * (1 - 9/10) / 4.311
= fsin * 0.0116
fgK = fsin/2 * 9/10 + fgG * 0.981
= fsin * 0.461
L = [1.877 * fsin / fgG] + 1
= 162
} else { /* downsampling */
fgG = u/d * fsin * (1 - 9/10) / 4.311
= u/d * fsin * 0.0116
fgK = u/d * fsin/2 * 9/10 + fgG * 0.981
= u/d * fsin * 0.461
L = [162 * d/u]
}
Ok, you now you've got: fsin, fgK, fgG, L, u, d
That's all you need to make RATECONV happy.
II To fully exploit your CPU-speed for on-the-fly conversion
to/from a AD/DA-device, select a device rate close above
the rate you want to convert from/to. Then determine the
maximum L for a given fsin, d and u, by using fantasy
values for fgK, fgG. If this Lmax for continuous stream
processing is bigger than L of strategy I, continue
there. Else prescribe an SN, maybe 72dB or 64dB, and
use (8,9,10) or TABLE_B below to find fgG. Now only fgK
is left to be specified. You could find fgK by trial and
error, a good heuristic is to have fgK = fSTB - 2.5*fgG
or even 2.15 instead of 2.5. First you could check with a
speech signal for sanity, or what ever material you
commonly work with. If you're happy with the result - fine.
If you're not sure, see 5.h) below. If you really want to
know what you are doing, read on.
3. Introduction to flexible paramter design
-------------------------------------------
_ RATECONV uses an interpolation function g(t) to reconstruct
the bandlimited input function represented by its samples.
This reconstruction will only be evaluated at time
instances corresponding to the new sampling rate (1). The
new rate can take on ratios of the old rate, that defined
by integer values for nominator and denominator in (3).
- The ideal interpolation function is the sinc-function,
which has an infinite length in time domain and cannot
be realized in a practical system. Therefore, in (2) it is
windowed by gaussian-function to rapidly push the skirts
of g(t) below a certain level fast enough, where it can be
chopped off safely. At this point, the level of the gaussian
window below is treated as the minimum realizable signal-to-noise
ratio (S/N) in the output signal which is a safe heuristic.
Thus a prescription for a tolerated signal-to-noise ratio
relates to certain a lenght of g(t) (8-10).
- To specify the parameters of the original g(t) in (2) a
frequency domain approach is taken. The task of interpolation
in the time domain corresponds to lowpass filtering in the
frequency domain. The desired function g(t) now happens to be
the impulse response (IR) of this lowpass. A suitable lowpass
should should suppress unwanted mirror frequencies as muchs
as possible that are found above the minimum-half of the
input and output sampling frequencies (4). This is because
of the sampling theorem, see refs. [1] and [3] at the end.
The presence of mirror frequencies in the baseband below
is called aliasing. It can't be avoided to some extent.
On the other hand, the lowpass should affect baseband
frequencies as little as possible. This leads to a set of
stop- and passband- prescriptions in frequency domain of
g(t) (3-7).
- If the output sampling rate is lower than input sampling
rate, even an ideal interpolating lowpass has to cut some
baseband frequencies used by an input signal. If the output
sampling rate is higher, a portion of higher frequencies
remains unused. This is inevitable.
4. Design trade-offs
--------------------
- In frequency domain, the sinc-function in (2) corresponds
to an ideal lowpass with cutoff frequency fgK. The sharp
cutoff is smoothed by a gaussian-window with key
frequency fgG, corresponding to the gaussian-window in the
time domain. This involves an inevitable but controllable
amount of aliasing. It is adressed as mirror frequency
suppression. Because of the good properties of the gaussian
window it will increase rapidly starting from an amount
of aSTB at fSTB down to lower frequencies of the baseband.
Keeping it low already at fSTB competes with the extent
of g(t) in time.
- The smoothing of the cutoff of the ideal lowpass also
means loosing some bandwith in the baseband. The upper
corner of the baseband is defined by the loss aSTB at
a frequency fPSB, fPSB < fSTB. Shifting it towards fSTB
by insisting on a low aSTB again results in a longer extent
of g(t) in time.
- Disregarding efficiency a sharp passband-to-stopband
transition seems to be attractive as the loss of bandwith
is small. This is ONLY the case if the transition
lies outside the audible frequency range. But if you have
to have a filter cut-off e.g. at around 4kHz you don't
want a sharp transition since it induces `ringing', i.e.
you can hear the cut-off frequency in the output signal.
This is because a sharp-edge in the frequency domain
corresponds to a long impuse response in the time domain.
A short-time frequency-analyzer like the ear can detect
this (and finds it annoying).
- On the the other hand one could argue that preserving as much
bandwith as possible is a better choice, leaving the task
of psychoacoustic smoothing to later stages of processing.
E.g. it could be viewed as the task of the final analog
reconstruction filter.
- The time limited version of g(t) is realized as FIR-filter
of length L. For computational efficiency, time extent
should be short. The more the old and new rates differ,
the less efficient is the conversion process. In this case,
multiple application of RATECONV scales better (multistage
design is beyond the scope of this text, see reference).
- (8-10) rely on the ASSUMPTION that the cutoff level of the
gaussian window below its maximum reflects the realizable
S/N. This heuristic is on the safe side but often leads to
values of L around a THIRD larger than necessary.
- Time limitation competes with aliasing cumulatively
described by a certain amount of S/N in the output signal.
Chopping off g(t) to early will result in additional
smoothing of the frequency characteristics of g(t)
with a sinc-function in the frequency domain. This is
MUCH worse than smoothing with the gaussian-window and
induces a severe amount of aliasing easily. That's why
time limitation is identified directly with a certain
amount of noise in the output signal, as opposed to
mirror frequency suppression described above.
- It is not possible to have a the input signal use the full
(16-bit) dynamic range and to avoid overload distortion
in the output signal in all cases - if gain has to be one.
Because of Gibb's phenomenon the interpolation process
can overshoot on the input range to an amount of 18percent.
To be one the safe side use a default gain of 0.8.
5. Design process step-by-step
------------------------------
a) First, the desired rate conversion ratio is used to find the
smallest possible integer values for u and d (3). The value
of u is limited by coefficient set memory size in RATECONV.
As errors of the new rate below 0.1percent should be
psychoacoustically irrelvant, this is no true limitation.
b) Choose passband and stopband parameters fPAB, aPAB and
fSTB, aSTB, where fSTB is bound to the minimum-half of the
input and output frequencies by (4).
aSTB represents the minimum suppression of aliasing
frequencies from above fSTB. In most audio applications,
aSTB can be about 20dB lower than the intented signal-to-
noise ratio for the following reasons: In frequency domain,
suppression will increase rapidly above fSTB, having more
effect on mirror frequencies that correspond to lower baseband
frequencies; but most audio spectra gradually decrease with
increasing frequency, especially the critical tonal
components.
c) Find values of x corresponding to a using (5a,5b). (5a,5b)
become inaccurate, if not fgK >> fgG (s.below)). Since
inversion of the formula is difficult, use following table to
interpolate:
TABLE_A
a = 1dB 3dB 6dB 30dB 36dB 42dB 48dB 54dB 60dB 66dB 72dB 96dB
x = -.981-.435 .0 1.48 1.71 1.86 2.12 2.30 2.47 2.63 2.78 3.33
d) Determine the key frequencies fgK and fgG using (6) and (7)
e) Choose minimum signal-to-noise ratio SN. Below noise level,
SEVERE aliasing occurs due to time limitation of g(t).
This is a CRITICAL parameter.
f) Find y followed by Ty using (8) and (9), and finally the
FIR-filter lenght L using (10). The following table can be
used to resolve the dependencies of L, SN, fgG and fsin in
common cases:
TABLE_B
fgG/fsin = .100 .075 .050 .040 .030 .025 .020 .015 .010 .005
L = 10 SN = 22 12 . . . . . . . .
20 98 55 25 16 . . . (SN < 10dB)
30 . 129 57 37 21 14 . . . .
40 . . 104 66 37 26 17 . . .
50 . . 164 105 59 41 26 15 . .
60 . . . 152 85 59 38 21 . .
70 . . . . 117 81 52 29 13 .
80 . . . . 153 106 68 38 17 .
90 . . . . 194 135 86 49 22 .
100 . . . . . 167 107 60 27 .
120 . . . . . . 154 87 39 10
140 . . . . . . . 119 53 13
160 . . . . . . . 155 69 17
180 . . . . . . . 197 87 22
200 . . (SN > 200dB) . . . 108 27
250 . . . . . . . . 169 42
300 . . . . . . . . . 61
350 . . . . . . . . . 83
400 . . . . . . . . . 109dB
g) Repeat from b), if L is too big to perform rate conversion
fast enough, relaxing design prescriptions in b) and e).
h) Finally, a good testing method is to use a slow sweeping
sine wave passing between fsin/2 and 0. Listening to the
converted signal reveals mirror frequencies moving the
opposite direction, if mirror frequency suppression is
too weak and/or if SN is too low.
6. Examples
-----------
DAT-to-CD:
fsin = 48kHz*
fsout = 44.1kHz
(3) -> u = 147*
(3) -> d = 160*
(3) -> realized fsout = 44.1kHz (exact)
fPAB = 20kHz
aPAB = 1dB
(4) -> fSTB = 22.05kHz
aSTB = 96dB
(5a) -> xPAB = -0.981
(5b) -> xSTB = 3.33
(6) -> fgG = 476Hz*
(7) -> fgK = 20467Hz*
SN = 96
(8) -> y = 63096
(9) -> Ty = 3.94ms
(10) -> L = 190*
*arguments for RATECONV: 48000 20467 476 190 147 160
CD-to-DAT:
fsin = 44.1kHz
fsout = 48kHz
fPAB = 20kHz
aPAB = 1dB
aSTB = 96dB
use the same calculation as in example 1 until it gets to L
(10) -> L = 174*
*arguments for RATECONV: 44100 20467 476 174 160 147
Double rate:
fsin = 1*
fsout = 2
(3) -> u = 2*
(3) -> d = 1*
fPAB = 0.45
aPAB = 1(3)dB
(4) -> fSTB = 0.5
aSTB = 72(66)dB
(5a) -> xPAB = -0.981(0.435)
(5b) -> xSTB = 2.78(2.63)
(6) -> fgG = 0.0133(0.0163)*
(7) -> fgK = 0.463(0.457)*
SN = 72(66)dB
(8) -> y = 3981(1995)
(9) -> Ty = 122(96)
(10) -> L = 122(96)
*arguments for RATECONV: 1 0.45 0.0133 122 2 1
(1 0.45 0.0163 96 2 1)
Halve rate:
fsin = 1*
fsout = 0.5
(3) -> u = 1*
(3) -> d = 2*
fPAB = 0.225
aPAB = 1dB
(4) -> fSTB = 0.25
aSTB = 72dB
(5a) -> xPAB = -0.981
(5b) -> xSTB = 2.78
(6) -> fgG = 0.00665*
(7) -> fgK = 0.2315*
SN = 72dB
(8) -> y = 3981
(9) -> Ty = 245
(10) -> L = 245
*arguments for RATECONV: 1 0.225 0.00665 245 1 2
12.8kHz to 11.05kHz:
fsin = 12.8kHz*
fsout = 11.05kHz
(3) -> u = 25*
(3) -> d = 29*
(3) -> realized fsout = 11.03kHz
fPAB = 5000Hz
aPAB = 3dB
(4) -> fSTB = 5.525kHz
aSTB = 48dB
(5a) -> xPAB = -0.435
(5b) -> xSTB = 2.12
(6) -> fgG = 205Hz*
(7) -> fgK = 5089Hz*
SN = 66
(8) -> y = 2048
(9) -> Ty = 7.60ms
(10) -> L = 98*
*arguments for RATECONV: 12800 5089 205 98 25 29
7. Comments to experts
----------------------
- I have yet no proof for the ASSUMPTION that the time-cutoff
level of the gaussian window governs S/N. Experiments have
shown that this heuristic is on the safe side meaning that
L is longer than necessary. One could argue that the absolute
height of the envelope of the sampled version gs(n) =
g(n/fsin)/fsin of g(t) should be taken. The envelope for
|2*fgK*n/fsin| >= 1 is, using sinc(x) = sin(x)/x,
e{gs(n)} = e{2*fgK*sinc(pi*2*fgK*n/fsin)/fsin} *
e{exp(-pi*(2*fgG*n/fsin)**2)}
= 1/(pi * n) * exp(-pi*(2*fgG*n/fsin)**2).
We want to have e{gs([L/2]+1)} <= 1/y for a given y specified
by (8). Obviously the effect of the fraction 1/(pi * n) is
not present in (9). Taking it into account leads to values
of L being around a third smaller. However, experiments prove
these values to realize an S/N often smaller than intended.
This needs clarification.
- Deduction of the frequency response G(w) of g(t):
g(t), G(w) form a Fourier-correspondence
g(t) = 2*fgK*sinc(pi*2*fgK*t) * exp(-pi*(2*fgG*t)**2),
G(w) = (s(w + 2*pi*fgK) - s(w - 2*pi*fgK))
(convolution along w)
exp(-pi*(w/4*pi*fgG)**2)/(2*fgG),
where s(x) = {1 if x > 1; 0 if x < 0; 0.5 if x = 0}.
Evalution of the convolution with on of the s(...)-terms
corresponds to integration of the exp(...)-term over w and a
w-shift of +/-2*pi*fgK, yielding
G(w) = 0.5*(1+erf(sqrt(pi)/2 * (w/(2*pi) + fgK)/fgG)) -
0.5*(1+erf(sqrt(pi)/2 * (w/(2*pi) - fgK)/fgG))
Since the first term amounts to 1, if w > 0 and fgK >> fgG,
we obtain in sane cases the sufficiently close approximation
G(w) = 0.5*(1-erf(sqrt(pi)*x/2),
relied on by (5-7), where x = (w/(2*pi) - fgK)/fgG, w=2*pi*f.
- The approach to use a time limited, gaussian-windowed
sinc-function possibly doesn't lead to a very short FIR-lenght
as more elaborate decimation/interpolation-filter designs.
But I presume that, largely, it's only off by a factor of two.
- Of course all frequencies can be normalized by fsin. Only
ratios of frequencies to sampling frequencies are
signififcant. This goes especially for the specifications
of the lowpass design. Therefore, explicit specification of
fsin could be avoided. But for psychoacoustic reasons the
design enforces the notion of absolute frequencies
(See trade-off on steepness).
8. References
-------------
Specifically:
[1] Crochiere, R. E., Rabiner, L. R.: "Multirate Digital Signal
Processing", Prentice-Hall, Englewood Cliffs, New Jersey, 1983
Generally:
[2] Zwicker, E., Fastl, H.: "Psychoacoustics - Facts and Models",
Springer-Verlag, Berlin, Heidelberg, New-York, Tokyo, 1990
[3] Books on Time discrete signal processing, systems theory ...
like those from Oppenheim and Schafer (check with your
librarian...)
===EOT===$Id: rateconv.txt,v 1.5 1995/12/09 02:16:21 mummert Exp mummert $
These are the contents of the former NiCE NeXT User Group NeXTSTEP/OpenStep software archive, currently hosted by Netfuture.ch.