rateconv.txt

This is rateconv.txt in view mode; [Download] [Up]
    Parameter design for RATECONV
    =============================

    Copyright (c) 1992, 1995 by Markus Mummert


    Contents
    --------

    	1. Formulas
    	2. Quick design strategies
    	3. Introduction to flexible paramter design
    	4. Design trade-offs
    	5. Design process step-by-step
    	6. Examples
    	7. Comments to experts
 	8. References

      	(Implementation notes see source rateconv.c)


    1. Formulas
    -----------

    	(1)	out(n/fsout) = SUM in(m/fsin)*g((n*d/u-m)*fsin)/fsin
    			    over all m

    	(2)	g(t) = 2*fgK*sinc(pi*2*fgK*t) * exp(-pi*(2*fgG*t)**2)
    	(3)	fsout = fsin * u / d
    	(4)	fSTB = min{fsin,fsout} / 2
    	(5a)	aPAB = -20*log10(0.5*(1-erf(sqrt(pi)*xPAB/2)))dB
    	(5b)	aSTB = -20*log10(0.5*(1-erf(sqrt(pi)*xSTB/2)))dB
    	(6)	fgG = (fSTB - fPAB) / (xSTB - xPAB)
    	(7)	fgK = fPAB - fgG * xPAB
    	(8)	SN = 20*log10(y)
    	(9)	Ty = sqrt(log(y)/pi) / fgG
    	(10)	L = [fsin * Ty] + 1

    	in(t)	input function sampled at t = m/fsin
    	out(t)	output function sampled at t = n/fsout
    	g(t)	interpolation function
    	fgG	key frequency of gaussian time window in g(t)
    	fgK	cutoff frequency of sinc-filter in g(t)
    	fsin	input sample rate (frequency)
    	fsout	output sample rate (frequency)
    	u	integer upsampling factor
    	d	integer downsampling factor
    	fPAB	upper edge frequency of passband
    	aPAB	maximum loss in passband
    	xPAB	fPAB-fgK frequency distance normalized to fgG
    	fSTB	lower edge frequency of stopband
    	aSTB	minimum loss in stopband
    	xPAB	fSTB-fgK frequency distance normalized to fgG
    	y	number of significant sample quantize steps
    	SN	signal-to-noise ratio in dB correponding to y
    	Ty	time width of g(t) with respect to y
    	L	length of FIR-filter realization
    	pi	3.14156..
    	[]	integer operator
    	min{}	minimum operator
    	**2	to the power of 2
    	erf()	gaussian error function as defined by UN*X erf(3M)
    	log()	natural log
    	log10()	decadic log


    2. Quick design strategies
    --------------------------

    I	To design a rate conversion that is - no matter what the
    	execution speed penalty might be - on the safe side
    	even for audio-freaks, we assume SN=96dB, aSTB=96dB
    	aPAB=1dB at fPAB=fSTB*9/10. (The following design has
        already been implemented in the frontend `rcv')

    	Find a pair of integer values u <= 1024, d in (3)
    	to match the originally desired fsout with a deviation
    	of 0.1percent at maximum.

    	if (u/d >= 1) {	/* upsampling */
    		fgG = fsin/2 * (1 - 9/10) / 4.311
    		    = fsin * 0.0116
    		fgK = fsin/2 * 9/10 + fgG * 0.981
    		    = fsin * 0.461
    		L   = [1.877 * fsin / fgG] + 1
    		    = 162
    	} else {	/* downsampling */
    		fgG = u/d * fsin * (1 - 9/10) / 4.311
    		    = u/d * fsin * 0.0116
    		fgK = u/d * fsin/2 * 9/10 + fgG * 0.981
    		    = u/d * fsin * 0.461
    		L   = [162 * d/u]
    	}

    	Ok, you now you've got:  fsin, fgK, fgG, L, u, d
    	That's all you need to make RATECONV happy.

    II	To fully exploit your CPU-speed for on-the-fly conversion
    	to/from a AD/DA-device, select a device rate close above
	the rate you want to convert from/to. Then determine the
    	maximum L for a given fsin, d and u, by using fantasy
    	values for fgK, fgG. If this Lmax for continuous stream
    	processing is bigger than L of strategy I, continue
    	there. Else prescribe an SN, maybe 72dB or 64dB, and
    	use (8,9,10) or TABLE_B below to find fgG. Now only fgK
    	is left to be specified. You could find fgK by trial and
    	error, a good heuristic is to have fgK = fSTB - 2.5*fgG
    	or even 2.15 instead of 2.5. First you could check with a
    	speech signal for sanity, or what ever material you
    	commonly work with. If you're happy with the result - fine.
    	If you're not sure, see 5.h) below. If you really want to 
    	know what you are doing, read on.


    3. Introduction to flexible paramter design
    -------------------------------------------

    _	RATECONV uses an interpolation function g(t) to reconstruct
    	the bandlimited input function represented by its samples. 
    	This reconstruction will only be evaluated at time
    	instances corresponding to the new sampling rate (1). The
    	new rate can take on ratios of the old rate, that defined 
    	by integer values for nominator and denominator in (3).

    -	The ideal interpolation function is the sinc-function,
    	which has an infinite length in time domain and cannot
    	be realized in a practical system. Therefore, in (2) it is 
    	windowed by gaussian-function to rapidly push the skirts
    	of g(t) below a certain level fast enough, where it can be
    	chopped off safely. At this point, the level of the gaussian
	window below is treated as the minimum realizable signal-to-noise
    	ratio (S/N) in the output signal which is a safe heuristic.
	Thus a prescription for a tolerated signal-to-noise ratio
	relates to certain a lenght of g(t) (8-10).

    -	To specify the parameters of the original g(t) in (2) a
    	frequency domain approach is taken. The task of interpolation
    	in the time domain corresponds to lowpass filtering in the
    	frequency domain. The desired function g(t) now happens to be
    	the impulse response (IR) of this lowpass. A suitable lowpass
    	should should suppress unwanted mirror frequencies as muchs
    	as possible that are found above the minimum-half of the
    	input and output sampling frequencies (4). This is because
    	of the sampling theorem, see refs. [1] and [3] at the end.
    	The presence of mirror frequencies in the baseband below
    	is called aliasing. It can't be avoided to some extent.
    	On the other hand, the lowpass should affect baseband
    	frequencies as little as possible. This leads to a set of
    	stop- and passband- prescriptions in frequency domain of
    	g(t) (3-7).

    -	If the output sampling rate is lower than input sampling
    	rate, even an ideal interpolating lowpass has to cut some
    	baseband frequencies used by an input signal. If the output
    	sampling rate is higher, a portion of higher frequencies
    	remains unused. This is inevitable.


    4. Design trade-offs
    --------------------

    -	In frequency domain, the sinc-function in (2) corresponds
    	to an ideal lowpass with cutoff frequency fgK. The sharp
    	cutoff is smoothed by a gaussian-window with key
    	frequency fgG, corresponding to the gaussian-window in the
    	time domain. This involves an inevitable but controllable
    	amount of aliasing. It is adressed as mirror frequency
    	suppression. Because of the good properties of the gaussian
    	window it will increase rapidly starting from an amount
    	of aSTB at fSTB down to lower frequencies of the baseband.
    	Keeping it low already at fSTB competes with the extent
    	of g(t) in time.

    -	The smoothing of the cutoff of the ideal lowpass also
    	means loosing some bandwith in the baseband. The upper
    	corner of the baseband is defined by the loss aSTB at
    	a frequency fPSB, fPSB < fSTB. Shifting it towards fSTB
    	by insisting on a low aSTB again results in a longer extent
    	of g(t) in time.

    -	Disregarding efficiency a sharp passband-to-stopband
    	transition seems to be attractive as the loss of bandwith 
    	is small. This is ONLY the case if the transition
    	lies outside the audible frequency range. But if you have
    	to have a filter cut-off e.g. at around 4kHz you don't
    	want a sharp transition since it induces `ringing', i.e.
    	you can hear the cut-off frequency in the output signal.
    	This is because a sharp-edge in the frequency domain
    	corresponds to a long impuse response in the time domain. 
    	A short-time frequency-analyzer like the ear can detect
    	this (and finds it annoying).

    -	On the the other hand one could argue that preserving as much 
    	bandwith as possible is a better choice, leaving the task
    	of psychoacoustic smoothing to later stages of processing.
    	E.g. it could be viewed as the task of the final analog
    	reconstruction filter. 

    -	The time limited version of g(t) is realized as FIR-filter
    	of length L. For computational efficiency, time extent
    	should be short. The more the old and new rates differ,
	the less efficient is the conversion process. In this case,
	multiple application of RATECONV scales better (multistage
	design is beyond the scope of this text, see reference).

    -	(8-10) rely on the ASSUMPTION that the cutoff level of the
	gaussian window below its maximum reflects the realizable
	S/N. This heuristic is on the safe side but often leads to
	values of L around a THIRD larger than necessary.

    -   Time limitation competes with aliasing cumulatively
    	described by a certain amount of S/N in the output signal.
    	Chopping off g(t) to early will result in additional
    	smoothing of the frequency characteristics of g(t) 
    	with a sinc-function in the frequency domain. This is
    	MUCH worse than smoothing with the gaussian-window and
    	induces a severe amount of aliasing easily. That's why
    	time limitation is identified directly with a certain
    	amount of noise in the output signal, as opposed to
    	mirror frequency suppression described above.

    -	It is not possible to have a the input signal use the full
    	(16-bit) dynamic range and to avoid overload distortion
    	in the output signal in all cases - if gain has to be one.
    	Because of Gibb's phenomenon the interpolation process
    	can overshoot on the input range to an amount of 18percent.
    	To be one the safe side use a default gain of 0.8.


    5. Design process step-by-step
    ------------------------------

    a)	First, the desired rate conversion ratio is used to find the
	smallest possible integer values for u and d (3). The value
    	of u is limited by coefficient set memory size in RATECONV.
    	As errors of the new rate below 0.1percent should be
    	psychoacoustically irrelvant, this is no true limitation.

    b)	Choose passband and stopband parameters fPAB, aPAB and
    	fSTB, aSTB, where fSTB is bound to the minimum-half of the
    	input and output frequencies by (4).

    	aSTB represents the minimum suppression of aliasing
    	frequencies from above fSTB. In most audio applications,
    	aSTB can be about 20dB lower than the intented signal-to-
    	noise ratio for the following reasons: In frequency domain,
    	suppression will increase rapidly above fSTB, having more
    	effect on mirror frequencies that correspond to lower baseband
    	frequencies; but most audio spectra gradually decrease with
    	increasing frequency, especially the critical tonal
    	components.

    c)	Find values of x corresponding to a using (5a,5b). (5a,5b)
    	become inaccurate, if not fgK >> fgG (s.below)). Since
    	inversion of the formula is difficult, use following table to
    	interpolate:
				TABLE_A

        a =   1dB  3dB 6dB 30dB 36dB 42dB 48dB 54dB 60dB 66dB 72dB 96dB
        x = -.981-.435  .0 1.48 1.71 1.86 2.12 2.30 2.47 2.63 2.78 3.33

    d)	Determine the key frequencies fgK and fgG using (6) and (7)

    e)	Choose minimum signal-to-noise ratio SN. Below noise level,
    	SEVERE aliasing occurs due to time limitation of g(t).
    	This is a CRITICAL parameter.

    f)	Find y followed by Ty using (8) and (9), and finally the
    	FIR-filter lenght L using (10). The following table can be
	used to resolve the dependencies of L, SN, fgG and fsin in
	common cases:
				TABLE_B
        
        fgG/fsin = .100 .075 .050 .040 .030 .025 .020 .015 .010 .005
        
        L = 10  SN = 22   12    .    .    .    .    .    .    .    . 
            20       98   55   25   16    .    .    .    (SN < 10dB)  
            30        .  129   57   37   21   14    .    .    .    . 
            40        .    .  104   66   37   26   17    .    .    . 
            50        .    .  164  105   59   41   26   15    .    . 
            60        .    .    .  152   85   59   38   21    .    . 
            70        .    .    .    .  117   81   52   29   13    . 
            80        .    .    .    .  153  106   68   38   17    . 
            90        .    .    .    .  194  135   86   49   22    . 
           100        .    .    .    .    .  167  107   60   27    . 
           120        .    .    .    .    .    .  154   87   39   10 
           140        .    .    .    .    .    .    .  119   53   13 
           160        .    .    .    .    .    .    .  155   69   17 
           180        .    .    .    .    .    .    .  197   87   22 
           200        .    .    (SN > 200dB)   .    .    .  108   27 
           250        .    .    .    .    .    .    .    .  169   42 
           300        .    .    .    .    .    .    .    .    .   61 
           350        .    .    .    .    .    .    .    .    .   83 
           400        .    .    .    .    .    .    .    .    .  109dB 


    g)	Repeat from b), if L is too big to perform rate conversion
    	fast enough, relaxing design prescriptions in b) and e).

    h)	Finally, a good testing method is to use a slow sweeping
    	sine wave passing between fsin/2 and 0. Listening to the
    	converted signal reveals mirror frequencies moving the
    	opposite direction, if mirror frequency suppression is
    	too weak and/or if SN is too low.


    6. Examples
    -----------

        DAT-to-CD:

    		fsin = 48kHz*
     		fsout = 44.1kHz
    		(3) -> u = 147*
    		(3) -> d = 160*
    		(3) -> realized fsout = 44.1kHz (exact)
    		fPAB = 20kHz
    		aPAB = 1dB
    		(4) -> fSTB = 22.05kHz
    		aSTB = 96dB
    		(5a) -> xPAB = -0.981
    		(5b) -> xSTB =  3.33
    		(6) -> fgG = 476Hz*
    		(7) -> fgK = 20467Hz*
    		SN = 96 
    		(8) -> y = 63096
    		(9) -> Ty = 3.94ms
    		(10) -> L = 190*

    	    *arguments for RATECONV: 48000 20467 476 190 147 160

        CD-to-DAT:

    		fsin = 44.1kHz
     		fsout = 48kHz
    		fPAB = 20kHz
    		aPAB = 1dB
    		aSTB = 96dB

    	    use the same calculation as in example 1 until it gets to L

    		(10) -> L = 174*

    	    *arguments for RATECONV: 44100 20467 476 174 160 147

        Double rate:

    		fsin = 1*
     		fsout = 2
    		(3) -> u = 2*
    		(3) -> d = 1*
    		fPAB = 0.45
    		aPAB = 1(3)dB
    		(4) -> fSTB = 0.5
    		aSTB = 72(66)dB
    		(5a) -> xPAB = -0.981(0.435)
    		(5b) -> xSTB =  2.78(2.63)
    		(6) -> fgG = 0.0133(0.0163)*
    		(7) -> fgK = 0.463(0.457)*
    		SN = 72(66)dB 
    		(8) -> y = 3981(1995)
    		(9) -> Ty = 122(96)
    		(10) -> L = 122(96)
 
    	    *arguments for RATECONV: 1 0.45 0.0133 122 2 1
    			            (1 0.45 0.0163 96 2 1)

        Halve rate:

    		fsin = 1*
     		fsout = 0.5
    		(3) -> u = 1*
    		(3) -> d = 2*
    		fPAB = 0.225
    		aPAB = 1dB
    		(4) -> fSTB = 0.25
    		aSTB = 72dB
    		(5a) -> xPAB = -0.981
    		(5b) -> xSTB =  2.78
    		(6) -> fgG = 0.00665*
    		(7) -> fgK = 0.2315*
    		SN = 72dB 
    		(8) -> y = 3981
    		(9) -> Ty = 245
    		(10) -> L = 245

    	    *arguments for RATECONV: 1 0.225 0.00665 245 1 2

        12.8kHz to 11.05kHz:

    		fsin = 12.8kHz*
     		fsout = 11.05kHz
    		(3) -> u = 25*
    		(3) -> d = 29*
    		(3) -> realized fsout = 11.03kHz
    		fPAB = 5000Hz
    		aPAB = 3dB
    		(4) -> fSTB = 5.525kHz
    		aSTB = 48dB
    		(5a) -> xPAB = -0.435
    		(5b) -> xSTB =  2.12
    		(6) -> fgG = 205Hz*
    		(7) -> fgK = 5089Hz*
    		SN = 66
    		(8) -> y = 2048
    		(9) -> Ty = 7.60ms
    		(10) -> L = 98*

    	    *arguments for RATECONV: 12800 5089 205 98 25 29


    7. Comments to experts
    ----------------------

    -	I have yet no proof for the ASSUMPTION that the time-cutoff
	level of the gaussian window governs S/N. Experiments have
	shown that this heuristic is on the safe side meaning that
	L is longer than necessary. One could argue that the absolute
        height of the envelope of the sampled version gs(n) =
	g(n/fsin)/fsin of g(t) should be taken. The envelope for
	|2*fgK*n/fsin| >= 1 is, using sinc(x) = sin(x)/x, 

    	e{gs(n)} = e{2*fgK*sinc(pi*2*fgK*n/fsin)/fsin} *
    				e{exp(-pi*(2*fgG*n/fsin)**2)}
    		 = 1/(pi * n) *  exp(-pi*(2*fgG*n/fsin)**2).

    	We want to have e{gs([L/2]+1)} <= 1/y for a given y specified
	by (8). Obviously the effect of the fraction 1/(pi * n) is
	not present in (9). Taking it into account leads to values
	of L being around a third smaller. However, experiments prove
	these values to realize an S/N often smaller than intended.
	This needs clarification.

    -	Deduction of the frequency response G(w) of g(t):
    	g(t), G(w) form a Fourier-correspondence

    	    g(t) = 2*fgK*sinc(pi*2*fgK*t) * exp(-pi*(2*fgG*t)**2),
    	    G(w) = (s(w + 2*pi*fgK) - s(w - 2*pi*fgK))
    	             (convolution along w)
    	                    exp(-pi*(w/4*pi*fgG)**2)/(2*fgG),

    	where s(x) = {1 if x > 1; 0 if x < 0; 0.5 if x = 0}.
    	Evalution of the convolution with on of the s(...)-terms
    	corresponds to integration of the exp(...)-term over w and a
    	w-shift of +/-2*pi*fgK, yielding

    	    G(w) = 0.5*(1+erf(sqrt(pi)/2 * (w/(2*pi) + fgK)/fgG)) -
    	           0.5*(1+erf(sqrt(pi)/2 * (w/(2*pi) - fgK)/fgG))

    	Since the first term amounts to 1, if w > 0 and fgK >> fgG,
    	we obtain in sane cases the sufficiently close approximation

    	    G(w) = 0.5*(1-erf(sqrt(pi)*x/2),

    	relied on by (5-7), where x = (w/(2*pi) - fgK)/fgG, w=2*pi*f.

    -	The approach to use a time limited, gaussian-windowed
	sinc-function possibly doesn't lead to a very short FIR-lenght
	as more elaborate decimation/interpolation-filter designs.
	But I presume that, largely, it's only off by a factor of two.

    -	Of course all frequencies can be normalized by fsin. Only
    	ratios of frequencies to sampling frequencies are
    	signififcant. This goes especially for the specifications
    	of the lowpass design. Therefore, explicit specification of
    	fsin could be avoided. But for psychoacoustic reasons the
    	design enforces the notion of absolute frequencies
    	(See trade-off on steepness).


    8. References
    -------------

    Specifically:

    [1] Crochiere, R. E., Rabiner, L. R.: "Multirate Digital Signal
        Processing", Prentice-Hall, Englewood Cliffs, New Jersey, 1983

    Generally:

    [2] Zwicker, E., Fastl, H.: "Psychoacoustics - Facts and Models",
        Springer-Verlag, Berlin, Heidelberg, New-York, Tokyo, 1990

    [3] Books on Time discrete signal processing, systems theory ...
        like those from Oppenheim and Schafer (check with your
        librarian...)


    ===EOT===$Id: rateconv.txt,v 1.5 1995/12/09 02:16:21 mummert Exp mummert $
These are the contents of the former NiCE NeXT User Group NeXTSTEP/OpenStep software archive, currently hosted by Netfuture.ch.