/*------------- Telecommunications & Signal Processing Lab ------------- McGill University Routine: void RSresamp (AFILE *AFpI, AFILE *AFpO, double Sratio, long int Nout, double toffs, const struct Fpoly_T *PF) Purpose: Resample data from an audio file Description: This routine resamples data from one audio file and writes it to another audio file. Parameters: -> AFILE *AFpI Audio file pointer for the input audio file -> AFILE *AFpO Audio file pointer for the output audio file -> double Sratio Ratio of output sampling rate to input sampling rate -> long int Nout Number of output samples to be calculated -> double toffs Time offset into the input data. This is the fractional sample value corresponding to the first filter coefficient. -> const struct Ffilt_T *Filt Filter coefficient structure Author / revision: P. Kabal Copyright (C) 1996 $Revision: 1.9 $ $Date: 1996/07/25 01:41:59 $ ----------------------------------------------------------------------*/ static char rcsid[] = "$Id: RSresamp.c 1.9 1996/07/25 AFsp-V2R1 $"; #include #include #include "ResampAudio.h" #define MINV(a, b) (((a) < (b)) ? (a) : (b)) #define MAXBUF 8192 void RSresamp (AFpI, AFpO, Sratio, Nout, toffs, PF) AFILE *AFpI; AFILE *AFpO; double Sratio; long int Nout; double toffs; const struct Fpoly_T *PF; { int lmem, NbufI, NbufO, Nx, Ny, Nxm; long int k, ls, ls0, ld; double Ns, Ds, ds, dsr; double di; float *x, *y; float buf[MAXBUF]; /* General interpolation: This routine implements both conventional interpolation and general interpolation. In the general case, an interpolating filter is used to generate two points which are close to, and bracket the desired output point position. The process can be viewed as follows. |-----| |------| |-----| |-------| |-----| ---->| ^ N |---->| H(z) |---->| D/C |---->| inter |--->| C/D |----> |-----| |------| |-----| |-------| |-----| x(n) xi(k) yi(k) yc(t) y(t) y(m) sampling h(n) D to C linear sample rate 1:N filter interpolation As an example consider changing the sampling rate from 8000 Hz to 44100 Hz using an interpolation filter with N=10. (1) Upsample x(n) by inserting N-1 zeros between each sample to form xi(k). The signal xi(k) has a sampling rate of 80 kHz, but with a repeating spectrum every 8 kHz. (2) Filter with H(z). This passes frequencies below 4000 Hz and suppresses frequencies above 4000 Hz. The resulting signal yi(k) has a repeating spectrum every 80 kHz. The images at 8, 16, ..., 72 kHz have been suppressed. (3) Form impulses with the discrete-to-continuous converter. The resulting continuous-time signal yc(t) has a repeating spectrum every 80 kHz. (4) Linearly interpolate between the digital sample values. This process can be modelled as filtering with a continuous-time filter with a triangular impulse response. The linear interpolating filter has a (sin(x)/x)^2 frequency response with zeros at multiples of 80 kHz. The zero response points are coincident with the repetitions of the input spectrum of yc(t). The output signal y(t) will have energy mainly around zero. (5) Resample the continuous-time output at 44100 Hz. We can move the D/C block ahead of the filter H(z) if we replace it with a transveral filter h(t). Then we can form a composite continuous-time filter g(t) by convolving the linear interpolation filter with h(t). The impulse response of the resulting filter is that of h(n) with linear interpolation filling the space between the sample values. The frequency response of g(t) can be obtained by multiplying H(w) by the (sin(x)/x)^2 response of the linear interpolator. H(w) has a repeating spectrum every 80 kHz. The first zero of the linear interpolating filter response occurs at 80 kHz. This tends to suppress the image of the response at 80 kHz. This suppression is incomplete however since the image extends 4 kHz on each side of 80 kHz. The worst suppression is for a frequency 4 kHz below 80 kHz. In addition, the desired frequencies surrounding zero are subject to a slight lowpass effect, this effect being worst at 4 kHz. Consider different interpolation ratios N. For large N, the images are confined to a relatively small distance from the zero of the linear interpolating filter and so are well suppressed. We can measure the suppression at the bottom of the first image. The suppression of other image frequencies will be better. The following table shows the results. N attenuation freq fs-fs/(2N) 6 42 dB 10 51 dB 20 64 dB 24 67 dB Note that if we use sample-and-hold instead of linear interpolation (i.e. use the value of the closest bracketing sample), the attenuation will be half of these dB values. */ /* Buffering: For the k'th output sample, the filter coefficient h[0] is aligned with the input array at time t(k), t(k) = t(0) + k * dt (samples). The data is processed frame by frame. Consider frame j, k[j] - first output sample in frame j, N[j] - number of output samples in frame j. Then the first time point in a frame is ts[j] = t(k[j]) = t(0) + k[j] * dt. The time interval between input samples will, if possible, be expressed as a ratio of integers, dt = 1 / Sratio = Ds / Ns. If dt cannot be expressed as a ratio of integers of reasonable size, we set Ds = 1 and set Ns to be a floating point value. If dt is rational, this procedure will prevent round-off error from accumulating. For frame zero, the first sample occurs at time t[0]. Represent t[0] as an integer part and a fractional part, ts[0] = t(0) = floor(t(0)) + frac(t(0)) = ls[0] + (ds[0] + dsr) / Ns where ds[0] = floor(Ns*frac(t(0))), dsr = Ns*frac(t(0)) - ds[0] and frac(x) = x - floor(x). Assume that Ns and Ds are integers. Then given a representation for ts[j], ts[j] = ls[j] + (ds[j] + dsr) / Ns, the representation for ts[j+1] is ts[j+1] = ts[j] + Ds*N[j] / Ns = ls[j] + (ds[j] + dsr + Ds*N[j]) / Ns = ls[j+1] + (ds[j+1] + dsr) / Ns, where ls[j+1] = ls[j] + floor((ds[j] + Ds*N[j]) / Ns) ds[j+1] = mod(ds[j] + Ds*N[j], Ns) If Ns and Ds are integers, ds[j] is an integer and the fractional part dsr remains constant. Errors do not accumulate in the computation of the time values. If Ns and Ds are not integers, we set dsr=0 and let ds[j] be non-integer. Filtering: Each output sample is created by filtering with an interpolation filter (interpolation factor Ir). The interpolation filter is aligned on a position which is a multiple of 1/Ir. The filter is used twice, once to the left of the desired position and once to the right of the desired position. The input signal x(.) is the data in the file, with x(0) corresponding to the first data value in the file. Conceptually a new increased rate signal, xi(.), is formed from the input signal x(.) by inserting Ir-1 zeros between each element of x(.), xi(l*Ir) = x(l), with other elements of xi(.) being zero. Indexing: - m is an offset into xi(.), referring to sample xi(m). - l is an offset into x(), referring to sample x(l). - m and l are related by m=l*Ir+mr where l=floor(m/Ir), mr=m-Ir*l, 0 <= mr < Ir. Each output point is formed as the dot product of the filter vector {h[0],...,h[Ncof-1]} with the data elements {xi(m),...,xi(m-Ncof+1)}. Because of the zeros in xi(.), the dot product can use the vectors {h[mr],h[mr+Ir],...} and {x(l),x(l-1),...}. An output sample will be calculated at position Ir*t in the signal xi(.). This position will be bracketed by ml(t) and mu(t) where ml(t) and mu(t) are integer values. ml(t) = floor(Ir*t) <= Ir*t < ml(t) + 1 = mu(t) The value ml(t) can be expressed as ml(t) = Ir*ll + mlr, where ll = floor(t), mlr = floor(Ir*t) - Ir*ll. The value mlr is the filter offset for the convolution; filter element h[mlr] is aligned with input sample x(ll). Similarly mu(t) can be expressed as mu(t) = Ir*lu + mur. This decomposition can be obtained from that for ml(t) by setting mur equal to mlr+1, modulo Ir. Then lu is set to ll, but incremented if mlr wraps around. Buffer sizes: The input buffer must contain lmem = Ncmax-1 past values for the filter memory. The first output sample in a frame corresponds to time ts, ml(ts) = Ir*lsl + mslr . The last sample in a frame corresponds to time tf = ts + (N-1)*dt, ml(tf) = Ir*lfl + mflr . The value mu(tf) may involve access to data element lfl+1 (if mflr is equal to Ir-1). The number of input samples needed in a frame with Ny output samples is then bounded as follows, Nx <= lfl+1 - lsl + 1 = ls + floor((ds + dsr + Ds*(Ny-1)) / Ns) + 1 - ls + 1 = floor((ds + Ds*(Ny-1)) / Ns) + 2 <= Ds*(Ny-1) / Ns + 3 = (Ny-1) / Sratio + 3 . Let the buffer size be MAXBUF, then we can choose MAXBUF as MAXBUF >= (Ny-1)/Sratio + 3 + lmem + Ny >= Nx + lmem + Ny Then we have a limit on the size of Ny Ny <= (Sratio * (MAXBUF - 3 - lmem) + 1) / (1 + Sratio) */ /* Split up the buffer */ lmem = PF->Ncmax - 1; NbufO = (int) ((Sratio * (MAXBUF - 3 - lmem) + 1) / (1.0 + Sratio)); y = buf; x = buf + NbufO; NbufI = MAXBUF - NbufO; if (NbufO + NbufI > MAXBUF) UThalt ("Error: Invalid buffer size"); /* Express the sampling rate ratio in rational form Sratio = Ns / Ds The maximum value for Ds is determined by the need to avoid overflow of the calculation of the quantity dsr + ds + Ds*Ny, dsr + ds + Ds*Ny < Ns + Ds*Ny <= LMAX. Using Sratio = Ns/Ds, this can be expressed as Ds <= LMAX / (Sratio + Ny) . */ RSratio (Sratio, PF->Ir, &Ns, &Ds, LONG_MAX, (long int) (LONG_MAX / (Sratio + NbufO))); /* Express the time offset in the form, toffs = ls + (dsr + ds) / Ns, where ls is an integer value, ds is less than Ns, 0 <= ds < Ns dsr is a fractional value (0 <= dsr < 1) */ RSexpTime (toffs, Ns, &ls, &ds, &dsr); ls0 = ls - lmem; /* First sample in the buffer */ RSrefresh (AFpI, 0L, x, 0); /* initialize the buffer */ k = 0; while (k < Nout) { Ny = (int) MINV (NbufO, Nout - k); Nx = (int) ((Ds * (Ny-1)) / Ns) + 3; Nxm = Nx + lmem; if (Nx > NbufI - lmem) UThalt ("%s: Error, Insufficient array space", PROGRAM); /* Refresh the input buffer */ RSrefresh (AFpI, ls0, x, Nxm); /* Generate the output samples */ RSinterp (x, Nxm, y, Ny, dsr, ds, Ds, Ns, PF); AFwriteData (AFpO, y, Ny); /* Update the sample pointers */ k += Ny; di = ds + Ds*Ny; ld = floor (di / Ns); ls0 += ld; ds = di - Ns*ld; } return; }