ResampAudio

Routine

ResampAudio [options] AFileIn AFileOut

Purpose

Resample data from an audio file

Description

This program resamples data from an audio file. This process involves interpolating between the samples in the original file to create a new sequence of samples with a new spacing (sampling rate).

The process used for interpolation depends on the ratio of the output sampling rate to the input sampling rate.

1: If the output sampling rate over the input sampling rate is expressible as a ratio of small integers, the sample rate change process is done using a conventional interpolation filter designed for the interpolation factor (numerator of the sampling rate ratio) followed by subsampling by the subsampling factor (denominator of the sampling rate ratio).
2: For the general case, an interpolating filter is designed using an interpolation factor of 24. For each output sample, the interpolating filter is used to create two samples that bracket the desired sampling point. Linear interpolation is used between these values to generate the output value.
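The general case can be sketched in a few lines. This is an illustrative sketch only, not the program's code: the function names are hypothetical, the filter h is assumed to be odd-length and symmetric (delay (N-1)/2) with a passband gain equal to the interpolation factor, and samples outside the input are taken as zero.

```python
import math

def filtered_value(x, h, ir, k):
    """Value at filter-rate index k of x upsampled by ir and filtered by h."""
    delay = (len(h) - 1) // 2            # assumes an odd-length symmetric filter
    top = k + delay                      # index into h that lines up with x[0]
    m_lo = max(0, -((len(h) - 1 - top) // ir))   # ceil((top - (N-1)) / ir)
    m_hi = min(len(x) - 1, top // ir)
    # Only about 1/ir of the coefficients contribute to each value
    return sum(x[m] * h[top - m * ir] for m in range(m_lo, m_hi + 1))

def resample_general(x, h, ir, ratio, nout, offs=0.0):
    """Resample x by ratio = fso/fsi using two bracketing filtered values."""
    out = []
    for n in range(nout):
        pos = (offs + n / ratio) * ir    # output position on the filter-rate grid
        k = math.floor(pos)
        frac = pos - k
        lo = filtered_value(x, h, ir, k)
        hi = filtered_value(x, h, ir, k + 1)
        out.append((1.0 - frac) * lo + frac * hi)   # linear interpolation
    return out

# Toy check: a triangular filter (ir = 4) reproduces a ramp exactly
ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
tri = [0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
print(resample_general(ramp, tri, 4, 1.5, 7))
```

The triangular filter is used only to keep the check small. The program itself designs a Kaiser windowed lowpass filter for this step.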

The default interpolation filter is a linear phase FIR filter designed by applying a Kaiser window to an ideal lowpass filter response. The filter is characterized by a cutoff frequency, a window shape parameter, and the number of coefficients. The window shape parameter (alpha) controls the passband ripple and the stopband attenuation. For a fixed number of coefficients, decreasing ripple and increasing attenuation (larger alpha) come at the expense of a wider transition width.
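A Kaiser windowed lowpass design of this kind can be sketched as follows. This is a minimal illustration under stated assumptions, not the program's design routine: the function names are hypothetical, the cutoff is normalized to the filter sampling rate (0 to 0.5), and alpha is assumed to scale the argument of the modified Bessel function as in the usual definition of the Kaiser window.

```python
import math

def bessel_i0(x):
    """Zeroth-order modified Bessel function, summed from its power series."""
    term = 1.0
    total = 1.0
    k = 1
    while term > 1e-12 * total:
        term *= (x / (2.0 * k)) ** 2
        total += term
        k += 1
    return total

def kaiser_lowpass(ncof, cutoff, alpha, gain=1.0):
    """Linear phase FIR lowpass: ideal response times a Kaiser window.

    cutoff is normalized to the filter sampling rate (assumption).
    """
    mid = (ncof - 1) / 2.0
    h = []
    for n in range(ncof):
        t = n - mid
        if t == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * t) / (math.pi * t)
        r = t / mid if mid else 0.0
        w = bessel_i0(alpha * math.sqrt(max(0.0, 1.0 - r * r))) / bessel_i0(alpha)
        h.append(gain * ideal * w)
    return h

# 4 kHz cutoff at an 80 kHz filter rate, 80 dB design, gain equal to Ir = 10
h = kaiser_lowpass(681, 4000.0 / 80000.0, 7.865, gain=10.0)
print(len(h), sum(h))
```

The sum of the coefficients is the gain at zero frequency, which should be very close to the requested passband gain.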

The cutoff of the default interpolation filter depends on the input and output sampling rates. Let fsi be the sampling rate of the input signal and fso be the sampling rate of the output signal.

1: fso > fsi. The cutoff of the interpolation filter is set to fsi/2.
2: fso < fsi. The cutoff of the interpolation filter is set to fso/2.

The default design aims for an 80 dB stopband attenuation and a transition width which is 15% of the cutoff frequency. The attenuation directly determines alpha. The value of alpha together with the transition width determines the number of filter coefficients.
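The default cutoff rule and the resulting band edges can be written out as a short sketch. The function names are hypothetical. The case fso = fsi is not covered by the rules above; the sketch simply uses the smaller of the two rates, which gives fsi/2 in that case.

```python
def default_cutoff_hz(fsi, fso):
    """Cutoff in Hz: half of the lower of the two sampling rates."""
    return min(fsi, fso) / 2.0

def band_edges_hz(cutoff_hz, trans_fraction=0.15):
    """Passband and stopband edges for a transition band centered on the cutoff."""
    half = 0.5 * trans_fraction * cutoff_hz
    return cutoff_hz - half, cutoff_hz + half

fc = default_cutoff_hz(8000.0, 44100.0)
print(fc, band_edges_hz(fc))
```

For an 8000 Hz input that is being increased in rate, this gives a 4000 Hz cutoff with the transition band running from about 3700 Hz to about 4300 Hz.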

The parameters of the interpolating filter can also be set by the user. The design parameters are the interpolation factor, the filter cutoff frequency, the Kaiser window parameter, and the number of filter coefficients. The following table shows the effect of changing the Kaiser window parameter.

   stopband   alpha  transition   passband
  attenuation         width D      ripple
     30 dB    2.210     1.536    +/- 0.270 dB
     40 dB    3.384     2.228    +/- 0.0864 dB
     50 dB    4.538     2.926    +/- 0.0274 dB
     60 dB    5.658     3.621    +/- 0.00868 dB
     70 dB    6.764     4.317    +/- 0.00275 dB
     80 dB    7.865     5.015    +/- 0.00089 dB
     90 dB    8.960     5.712    +/- 0.00027 dB
    100 dB   10.056     6.408    +/- 0.00009 dB
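The tabulated values of alpha and D are close to those given by Kaiser's empirical design formulas. The sketch below evaluates those formulas for comparison. It is an approximation offered for orientation: whether the program uses these formulas or a more exact calculation is not stated here, and the function names are hypothetical.

```python
def kaiser_alpha(atten_db):
    """Kaiser's empirical window parameter for a given stopband attenuation."""
    a = atten_db
    if a > 50.0:
        return 0.1102 * (a - 8.7)
    if a >= 21.0:
        return 0.5842 * (a - 21.0) ** 0.4 + 0.07886 * (a - 21.0)
    return 0.0   # rectangular window

def kaiser_d(atten_db):
    """Kaiser's empirical transition width parameter D = (N-1) dF."""
    return (atten_db - 7.95) / 14.36

for atten in (30, 40, 50, 60, 70, 80, 90, 100):
    print(atten, round(kaiser_alpha(atten), 3), round(kaiser_d(atten), 3))
```

The formula values for D agree with the table to within about 0.005. The values of alpha agree to within about 0.01 above 30 dB and to within about 0.1 at 30 dB.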

The transition width parameter is D = (N-1) dF, where N is the number of filter coefficients and dF is the transition width normalized by the filter sampling rate.

Consider interpolating from 8 kHz to 44.1 kHz. The filter will be designed for a sampling rate of 80 kHz. The cutoff of the filter will be 4 kHz. The stopband attenuation is to be 80 dB. The attenuation requirement gives alpha = 7.865. The parameter D corresponding to this value of alpha is 5.015. A transition width which is 15% of the cutoff corresponds to a width of 600 Hz. This is a normalized transition width of dF = 600/80000 = 0.0075. Solving for N and rounding up gives 670 coefficients. It is common to choose N to be odd, and furthermore for N to be of the form 2*Ir*M+1, where Ir is the interpolation factor (here 10). Such a time response has M sidelobes on either side of the reference point. In this example, we can choose M = 34, giving N = 681 coefficients.
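The arithmetic of this example can be checked with a short sketch. The function names are hypothetical. Rounding up to the next whole coefficient, and then to the next value of the form 2*Ir*M+1, is assumed.

```python
import math

def min_coefficients(d, trans_width_hz, filter_rate_hz):
    """Smallest N satisfying D = (N-1) dF for the given transition width."""
    df = trans_width_hz / filter_rate_hz     # normalized transition width
    return math.ceil(d / df + 1)

def round_to_sidelobe_form(n, ir):
    """Smallest N' >= n of the form 2*ir*M + 1; returns (M, N')."""
    m = math.ceil((n - 1) / (2 * ir))
    return m, 2 * ir * m + 1

n = min_coefficients(5.015, 0.15 * 4000.0, 80000.0)
m, n_odd = round_to_sidelobe_form(n, 10)
print(n, m, n_odd)
```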

If we designate the interpolation factor for the interpolation filter as Ir, about 1/Ir of the coefficients are used to calculate each output sample. The number of coefficients needed for a given value of alpha and a given transition width is proportional to Ir. Increasing Ir improves the accuracy of the linear interpolation step and increases the total number of filter coefficients, but does not increase the computational effort for the filtering operation.

For the transition width expressed as a percentage of the cutoff frequency, the number of coefficients needed to calculate each output sample is approximately 2D/P where P is the fractional bandwidth (e.g. 0.15 for a 15% transition width). The number of coefficients (rounded up) used to calculate each interpolated point is shown in the following table.

   stopband   alpha  transition  no. coeffs per output
  attenuation         width D    15% trans. 25% trans.
     30 dB    2.210     1.536       22         14
     40 dB    3.384     2.228       31         19
     50 dB    4.538     2.926       41         25
     60 dB    5.658     3.621       50         30
     70 dB    6.764     4.317       59         36
     80 dB    7.865     5.015       68         42
     90 dB    8.960     5.712       78         47
    100 dB   10.056     6.408       87         53
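The table entries can be reproduced from the 2D/P estimate. In the sketch below the function names are hypothetical. Taking the ceiling of 2D/P and adding one matches every entry in the table; that rule is inferred from the tabulated numbers and is not stated in the text.

```python
import math

KAISER_D = {30: 1.536, 40: 2.228, 50: 2.926, 60: 3.621,
            70: 4.317, 80: 5.015, 90: 5.712, 100: 6.408}

def coeffs_per_output(d, p):
    """Approximate number of coefficients used per output sample (2D/P)."""
    return 2.0 * d / p

def tabulated_count(d, p):
    """Rounded count; the ceil-plus-one rule is an inference from the table."""
    return math.ceil(coeffs_per_output(d, p)) + 1

for atten in sorted(KAISER_D):
    d = KAISER_D[atten]
    print(atten, tabulated_count(d, 0.15), tabulated_count(d, 0.25))
```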

On a medium speed workstation (Sun Sparc 4), with the default filter (15% transition width), this program generates about 60,000 output samples per CPU second for ordinary interpolation and about half that number for the general case.

The accuracy of the sample rate change operation depends on the frequency content of the input signal. Consider changing the sampling rate for a speech file with an 8000 Hz sampling rate. The default filter uses a cutoff frequency of 4000 Hz with a transition width of 600 Hz. The filter passband extends to 3700 Hz and the stopband starts at 4300 Hz. The interpolation will be imperfect in that (1) high frequencies falling in the lower part of the transition band will be attenuated and (2) aliased frequencies falling in the upper part of the transition band will be only partially attenuated. If the input signal has little energy above 3700 Hz, then the error due to both factors will be small. Tests on speech files indicate that the signal-to-distortion ratios (SDR) after interpolation (say from 8000 Hz to 8001 Hz) range from 46 to 77 dB. The poorest SDR occurs for signals that have significant energy above 3700 Hz. For a fixed stopband attenuation, the SDR can be improved by increasing the number of filter coefficients to effect a decrease in the transition band width.

The interpolation filter can also be read in as a filter file. For such a filter, the filter interpolation factor must be specified.

The output sample positions are determined by the output sampling rate and a sample offset parameter. The sample offset determines the position of the first output sample relative to the input samples. The default is that the first output sample coincides with the first input sample. The number of samples in the output file can also be specified. The default is to make the time corresponding to the end of the output (rounded to the nearest sample) be the same as the time corresponding to the end of the input.
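The placement of the output samples can be sketched as follows. The function names are hypothetical, and the default sample count is only one plausible reading of the rule above: it assumes that the "end" of each file is the time of its last sample and that halves round upward. The program's exact rule may differ.

```python
import math
from fractions import Fraction

def output_times(fsi, fso, nout, offs=0):
    """Output sample times, in units of input samples."""
    step = Fraction(fsi) / Fraction(fso)    # input samples per output sample
    return [offs + n * step for n in range(nout)]

def default_nout(nin, fsi, fso, offs=0):
    """Assumed default: last output sample nearest to the last input sample."""
    span = (nin - 1 - offs) * fso / fsi     # output intervals up to the end
    return math.floor(span + 0.5) + 1

print(output_times(8000, 16000, 3))
print(default_nout(100, 8000, 16000))
```

Exact rational arithmetic is used for the sample times so that the positions do not drift over long files.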

Options

The command line specifies options and the input and output file names. Note that several of the parameters can be specified as a single number or as a ratio (pair of numbers separated by a "/").

-s SFREQ, --srate=SFREQ

Sampling frequency for the output file.

-i SRATIO, --interpolate=SRATIO

Ratio of the output sampling rate to the input sampling rate. This argument is specified as a single number or as a ratio of the form N/D, where each of N and D can be floating point values. This option is an alternate means to specify the output sampling rate.
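A value given either as a single number or as a ratio N/D can be interpreted as in the sketch below. This only illustrates the notation. The function names are hypothetical and the program's own parsing (error handling, permitted white space) is not described here.

```python
def parse_ratio(text):
    """Interpret 'N' or 'N/D' as a floating point value."""
    num, sep, den = text.partition("/")
    value = float(num)
    if sep:
        value /= float(den)
    return value

def output_rate(fsi, sratio_text):
    """Output sampling rate implied by an interpolation ratio."""
    return fsi * parse_ratio(sratio_text)

print(parse_ratio("6"), parse_ratio("44.1/8"), parse_ratio("-1/8"))
print(output_rate(8000.0, "6"))
```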

-a OFFS, --alignment=OFFS

Time offset of the first output sample relative to the input data. The units are samples of the input data. This value can be specified as a single number or a ratio.

-n NOUT, --number_samples=NOUT

Number of output samples to be calculated.

-f FPARMS, --filter_spec=FPARMS

Filter parameters. The filter parameters are given as keyword values. There are two cases: the filter coefficients are supplied in a file or the filter is calculated as a Kaiser windowed lowpass filter.

  Filter file:
    file="file_name"  Input filter file name.  If specified, the filter
                      coefficients are read from the named file.
    ratio=Ir          Filter interpolation factor
    delay=Del         Filter delay in units of filter samples.  For
                      symmetrical filters, the default is (N-1)/2, where
                      N is the number of coefficients.  The delay can be
                      specified as a single number or as a ratio.  The
                      filter delay must be supplied for non-symmetrical
                      filters.
  Windowed lowpass:
   ratio=Ir           Filter interpolation factor.  The default depends
                      on the ratio of output sampling frequency to
                      input sampling frequency.  This parameter can be
                      specified as a single number or as a ratio.
   cutoff=Fc          Filter cutoff in normalized frequency relative to
                      the filter interpolation factor (0 to Ir/2).  This
                      value can be specified as a single number or as a
                      ratio.  The default cutoff frequency is determined
                      from the input and output sampling rates.  For
                      an increase in sampling rate, it is set to 0.5.
                      For a decrease in sampling rate it is set to
                      0.5*fso/fsi.
   atten=A            Filter stopband attenuation in dB.  The attenuation
                      must be at least 21 dB.  The default is 80.  The
                      attenuation is an alternate way to specify the
                      Kaiser window parameter alpha.
   alpha=a            Kaiser window parameter.  Zero corresponds to a
                      rectangular window (stopband attenuation 21 dB).
                      The default is 7.865 corresponding to a stopband
                      attenuation of 80 dB.
   N=Ncof             Number of filter coefficients.  The default is
                      to choose the number of coefficients to give a
                      transition band which is 15% of the cutoff
                      frequency.
   span=Wspan         Window span.  The default window span is equal to
                      the number of filter coefficients minus one.
   offset=Woffs       Window offset in units of filter samples.  This is
                      the offset of the first filter sample from the
                      beginning of the window.  The default is a
                      fractional value determined from the fractional
                      part of the input sample offset value.
   gain=g             Passband gain.  The default gain is equal to
                      the filter interpolation factor.  This choice
                      reproduces signals within the passband with the
                      correct amplitude.
   write="file_name"  Output filter file name.  If specified, the filter
                      coefficients are written to the named file.

-D DFORMAT, --data_format=DFORMAT

Data format for the output file.

  "mu-law8"   - 8-bit mu-law data
  "A-law8"    - 8-bit A-law data
  "unsigned8" - offset-binary 8-bit integer data
  "integer8"  - two's-complement 8-bit integer data
  "integer16" - two's-complement 16-bit integer data
  "float32"   - 32-bit IEEE floating-point data
  "text"      - text data

The data formats available depend on the output file type.

AFsp (Sun) audio files:

  mu-law, A-law, 8-bit integer, 16-bit integer, float

RIFF WAVE files:

  mu-law, A-law, offset-binary 8-bit integer, 16-bit integer

AIFF-C audio files:

  mu-law, A-law, 8-bit integer, 16-bit integer

Headerless files:

  all data formats

-F FTYPE, --file_type=FTYPE

File type, default "AFsp".

  "AFsp", "Sun" or "sun" - AFsp (Sun) audio file
  "WAVE" or "wave"       - RIFF WAVE file
  "AIFF-C"               - AIFF-C audio file
  "raw" or "raw_native"  - Headerless file (native byte order)
  "raw_swap"             - Headerless file (byte swapped)
  "raw_big-endian"       - Headerless file (big-endian byte order)
  "raw_little-endian"    - Headerless file (little-endian byte order)

-P PARMS, --parameters=PARMS

Parameters to be used for headerless input files. See the description of the environment variable RAWAUDIOFILE below for the format of the parameter specification.

-I INFO, --info=INFO

Header information string.

-h, --help

Print a list of options and exit.

-v, --version

Print the version number and exit.

For AFsp output files, the audio file header contains an information string.

  Standard Header Information:
    date:1994/01/25 19:19:39 UTC    date
    user:kabal@aldebaran            user
    program:ResampAudio             program name

This information can be changed with the header information string which is specified as one of the command line options. Structured information records should adhere to the above format with a named field terminated by a colon, followed by numeric data or text. Comments can follow as unstructured information. For the purpose of this program, records are terminated by newline characters. However in the header itself, the newline characters are replaced by nulls. To place a newline character into the header, escape the newline character by preceding it with a '\' character. If the first character of the user supplied header information string is a newline character, the header information string is appended to the standard header information. If not, the user supplied header information string replaces the standard header information.
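The append or replace rule and the treatment of newline characters can be sketched as follows. This is an illustration of the rules as described, not the program's code. The function names are hypothetical, and leaving the standard information unchanged when no string is supplied is an assumption.

```python
def build_header_info(standard, user=None):
    """Combine the standard information with a user supplied string."""
    if not user:
        return standard              # assumption: nothing supplied, keep standard
    if user.startswith("\n"):
        return standard + user       # leading newline: append to the standard info
    return user                      # otherwise: replace the standard info

def encode_records(text):
    """Replace record-terminating newlines by nulls; keep escaped newlines."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text) and text[i + 1] == "\n":
            out.append("\n")         # escaped newline stays in the header
            i += 2
        elif text[i] == "\n":
            out.append("\0")         # record terminator becomes a null
            i += 1
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

standard = "user:kabal@aldebaran\nprogram:ResampAudio"
print(repr(encode_records(build_header_info(standard, "\nnote:resampled"))))
```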

Examples

1: File copy. Copy audio file abc.au to new.au.

  ResampAudio -i 1 abc.au new.au

2: Delay the input signal. The output samples are delayed by 1/8 sample from the input samples.

  ResampAudio -i 1 -a -1/8 abc.au new.au

3: Change the sampling rate to 8001 Hz.

  ResampAudio -s 8001 abc.au new.au

4: Change the sampling rate by an integral value (e.g. 8000 to 48000 Hz).

  ResampAudio -i 6 abc.au new.au

Environment variables

RAWAUDIOFILE:

This environment variable defines the data format for headerless or non-standard input audio files. The string consists of a list of parameters separated by commas. The form of the list is

  "Format, Start, Sfreq, Swapb, Nchan, ScaleF"

Format: File data format

The lowercase versions of these format specifiers cause a headerless file to be accepted only after checking for standard file headers; the uppercase versions cause a headerless file to be accepted without checking the file header.

 "undefined"                - Headerless files will be rejected
 "mu-law8" or "MU-LAW8"     - 8-bit mu-law data
 "A-law8" or "A-LAW8"       - 8-bit A-law data
 "unsigned8" or "UNSIGNED8" - offset-binary 8-bit integer data
 "integer8" or "INTEGER8"   - two's-complement 8-bit integer data
 "integer16" or "INTEGER16" - two's-complement 16-bit integer data
 "float32" or "FLOAT32"     - 32-bit floating-point data
 "text" or "TEXT"           - text data

Start: byte offset to the start of data (integer value)

Sfreq: sampling frequency in Hz (floating point number)

Swapb: Data byte swap parameter

 "native" - no byte swapping
 "little-endian" - file data is in little-endian byte order
 "big-endian" - file data is in big-endian byte order
 "swap" - swap the data bytes as the data is read

Nchan: number of channels

The data consists of interleaved samples from Nchan channels

ScaleF: Scale factor

Scale factor applied to the data from the file
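The parameter list described above can be split into its fields as in the sketch below. The names are hypothetical and this is not the library's parser. The defaults used are the ones this section gives for the audio file parameters, and letting missing or empty fields keep their defaults is an assumption.

```python
RAW_DEFAULTS = {"format": "undefined", "start": 0, "sfreq": 8000.0,
                "swapb": "native", "nchan": 1, "scalef": 1.0}

_CONVERT = {"format": str, "start": int, "sfreq": float,
            "swapb": str, "nchan": int, "scalef": float}

_ORDER = ["format", "start", "sfreq", "swapb", "nchan", "scalef"]

def parse_raw_params(spec):
    """Parse 'Format, Start, Sfreq, Swapb, Nchan, ScaleF' into a dictionary."""
    params = dict(RAW_DEFAULTS)
    for key, field in zip(_ORDER, spec.split(",")):
        field = field.strip()
        if field:                    # assumption: empty fields keep the default
            params[key] = _CONVERT[key](field)
    return params

print(parse_raw_params("integer16, 0, 16000., little-endian, 2, 1.0"))
```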

The default values for the audio file parameters correspond to the following string.

    "undefined, 0, 8000., native, 1, 1.0"

AUDIOPATH:

This environment variable specifies a list of directories to be searched when opening the input audio files. Directories in the list are separated by colons (semicolons for MS-DOS).

Author / version

P. Kabal / v1r1 1996/08/13