CompAudio

Routine

CompAudio [options] AFileA [AFileB]

Purpose

Compare audio files, printing statistics

Description

This program gathers and prints statistics for one or two input audio files. The signal-to-noise ratio (SNR) of the second file relative to the first file is printed. For this calculation, the first audio file is used as the reference signal. The "noise" is the difference between sample values in the files. This program can also be invoked with just one file name. In that case, only the statistics for that file are printed.

Multi-channel audio files are treated as if they were single channel files with the effective sampling frequency increased by a factor equal to the number of channels.

For each file, the following statistical quantities are calculated and printed.

Mean:: Xm = SUM x(i) / N
Standard deviation:: sd = sqrt [ (SUM x(i)^2 - N*Xm^2) / (N-1) ]
Max value:: Xmax = max (x(i))
Min value:: Xmin = min (x(i))
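
As an illustration only, the four quantities above can be sketched in Python. The function name basic_stats is hypothetical and not part of CompAudio; the standard deviation is computed from deviations about the mean with the N-1 divisor, which is assumed to be what the program reports.

```python
import math

def basic_stats(x):
    """Return (mean, standard deviation, max, min) for a sequence of samples."""
    n = len(x)
    mean = sum(x) / n
    # Sample standard deviation (N-1 divisor), computed from deviations
    # about the mean; zero is returned for a single sample.
    if n > 1:
        sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    else:
        sd = 0.0
    return mean, sd, max(x), min(x)

mean, sd, xmax, xmin = basic_stats([1.0, 2.0, 3.0, 4.0])
print(mean, sd, xmax, xmin)
```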

For data which is restricted to the range [-32768,+32767], two additional counts (if nonzero) are reported.

Number of Overloads:: Count of samples equal to -32768 or +32767, along with the number of runs of such values. For 16-bit data from a saturating A/D converter, the presence of such values is an indication of a clipped signal.
Number of Anomalous Transitions:: Dividing the 16-bit data range into 2 positive regions and 2 negative regions, an anomalous transition is a transition from a sample value in the most positive region directly to a sample value in the most negative region or vice-versa. A large number of such transitions is an indication of wrapped values or byte-swapped data.
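
The two counts can be sketched in Python as follows. This is not the CompAudio implementation: the function names are hypothetical, a run is assumed to be any stretch of consecutive overloaded samples, and the region boundary of 16384 (splitting each half of the 16-bit range in two) is an assumption, since the text does not give the boundaries.

```python
def count_overloads(x, lo=-32768, hi=32767):
    """Return (number of overloaded samples, number of runs of overloads)."""
    count = 0
    runs = 0
    in_run = False
    for v in x:
        if v <= lo or v >= hi:
            count += 1
            if not in_run:
                runs += 1      # a new run starts here
            in_run = True
        else:
            in_run = False
    return count, runs

def count_anomalous(x, threshold=16384):
    """Count direct jumps between the most positive and most negative regions.

    threshold is an assumed region boundary, not a value taken from CompAudio.
    """
    count = 0
    for prev, cur in zip(x, x[1:]):
        up_to_down = prev >= threshold and cur < -threshold
        down_to_up = prev < -threshold and cur >= threshold
        if up_to_down or down_to_up:
            count += 1
    return count

print(count_overloads([0, 32767, 32767, 100, -32768, 5]))
print(count_anomalous([20000, -20000, 0, -30000, 30000]))
```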

An optional delay range can be specified when comparing files. The samples in file B are delayed relative to those in file A by each of the delay values in the delay range. For each delay, the SNR with optimized gain factor (see below) is calculated. For the delay giving the largest SNR, the full set of file comparison values is reported.
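
A rough Python sketch of such a delay search is given here. It rests on several assumptions that are not stated in this description: a delay d is taken to mean that xa(i) is compared with xb(i-d), samples outside either file are taken as zero, and the delays are ranked by the squared correlation coefficient r^2 (equivalent to ranking by the gain-optimized SNR, which increases with r^2). The name best_delay is hypothetical.

```python
def best_delay(xa, xb, dl, du):
    """Return (delay, r2) for the delay in dl..du with the largest r^2."""
    def sample(x, i):
        # Assumption: samples outside the file are treated as zero.
        return x[i] if 0 <= i < len(x) else 0.0

    best = None
    for d in range(dl, du + 1):
        # Cover every index where xa(i) or the delayed xb(i-d) can be nonzero.
        idx = range(min(0, d), max(len(xa), len(xb) + d))
        sab = sum(sample(xa, i) * sample(xb, i - d) for i in idx)
        saa = sum(sample(xa, i) ** 2 for i in idx)
        sbb = sum(sample(xb, i - d) ** 2 for i in idx)
        r2 = sab * sab / (saa * sbb) if saa > 0 and sbb > 0 else 0.0
        if best is None or r2 > best[1]:
            best = (d, r2)
    return best

# File B leads file A by two samples, so delaying B by 2 aligns the files.
print(best_delay([0, 0, 1, 2, 3, 0, 0], [1, 2, 3, 0, 0, 0, 0], -3, 3))
```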

Conventional SNR:

                        SUM xa(i)^2
  SNR = --------------------------------------------- .
        SUM xa(i)^2 - 2 SUM xa(i)*xb(i) + SUM xb(i)^2

The corresponding value in dB is printed.
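
For illustration, the conventional SNR in dB can be sketched in Python, with file A as the reference and the noise taken as the sample-by-sample difference between the files. The function name snr_db and the handling of zero signal or zero noise energy are choices made for this sketch, not documented behaviour of CompAudio.

```python
import math

def snr_db(xa, xb):
    """Conventional SNR in dB, with xa as the reference signal."""
    signal = sum(a * a for a in xa)
    # Noise energy: sum of squared differences between the two files.
    noise = sum((a - b) ** 2 for a, b in zip(xa, xb))
    if noise == 0:
        return math.inf       # identical files
    if signal == 0:
        return -math.inf      # silent reference
    return 10.0 * math.log10(signal / noise)

print(snr_db([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]))
```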

SNR with optimized gain factor:

  SNR = 1 / (1 - r^2) ,

where r is the (normalized) correlation coefficient,

                 SUM xa(i)*xb(i)
  r = -------------------------------------- .
      sqrt [ (SUM xa(i)^2) * (SUM xb(i)^2) ]

The SNR value in dB is printed. This SNR calculation corresponds to using an optimized gain factor Sf for file B,

       SUM xa(i)*xb(i)
  Sf = --------------- .
        SUM xb(i)^2
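
The gain-optimized SNR and the gain factor Sf can be sketched together in Python, since both depend on the same three sums. The function name is hypothetical, and raising an error for an all-zero file is a choice made for this sketch only.

```python
import math

def gain_optimized_snr(xa, xb):
    """Return (SNR in dB with optimized gain, gain factor Sf)."""
    sab = sum(a * b for a, b in zip(xa, xb))
    saa = sum(a * a for a in xa)
    sbb = sum(b * b for b in xb)
    if saa == 0 or sbb == 0:
        raise ValueError("correlation is undefined for an all-zero file")
    sf = sab / sbb                     # gain applied to file B
    r2 = sab * sab / (saa * sbb)       # squared correlation coefficient
    if r2 >= 1.0:
        return math.inf, sf            # File A = Sf * File B
    return 10.0 * math.log10(1.0 / (1.0 - r2)), sf

print(gain_optimized_snr([1.0, 0.0], [1.0, 1.0]))
print(gain_optimized_snr([2.0, 4.0, 6.0], [1.0, 2.0, 3.0]))
```

In the second call file A is exactly twice file B, so the SNR is infinite and only the gain factor is meaningful.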

Segmental SNR:

This is the average of SNR values calculated for segments of data. By default, the segment length corresponds to 16 ms (128 samples at a sampling rate of 8000 Hz). However, if the sampling rate is such that the segment length is less than 64 samples or more than 1024 samples, the segment length is set to 256 samples. For each segment, the SNR is calculated as

                            SUM xa(i)^2
  SS(k) = log10 (1 + --------------------------) .
                     0.01 + SUM [xa(i)-xb(i)]^2

The term 0.01 in the denominator prevents a divide by zero. This value is appropriate for data with values significantly larger than 0.01. The additive unity term discounts segments with SNRs less than unity. The final average segmental SNR is calculated as

  SSNR = 10 * log10 ( 10^[SUM SS(k) / N] - 1 ) dB.

Here N is the number of segments. The subtraction of the unity term tends to compensate for the unity term in SS(k).
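
The segmental SNR calculation can be sketched in Python as shown here. The function names are hypothetical. Two details are assumptions of this sketch rather than documented behaviour: the 16 ms segment length is rounded to the nearest sample, and a trailing partial segment is ignored.

```python
import math

def segment_length(sfreq):
    """Default segment length in samples for a sampling rate in Hz."""
    n = round(0.016 * sfreq)           # 16 ms, rounding is an assumption
    return n if 64 <= n <= 1024 else 256

def segmental_snr_db(xa, xb, seglen):
    """Average segmental SNR in dB over full segments of seglen samples."""
    n = min(len(xa), len(xb))
    ss = []
    # Assumption: a trailing partial segment is dropped.
    for start in range(0, n - seglen + 1, seglen):
        a = xa[start:start + seglen]
        b = xb[start:start + seglen]
        signal = sum(v * v for v in a)
        noise = sum((u - v) ** 2 for u, v in zip(a, b))
        ss.append(math.log10(1.0 + signal / (0.01 + noise)))
    if not ss:
        raise ValueError("need at least one full segment")
    value = 10.0 ** (sum(ss) / len(ss)) - 1.0
    return 10.0 * math.log10(value) if value > 0 else -math.inf

seglen = segment_length(8000.0)
print(seglen, segmental_snr_db([1.0] * 256, [1.0] * 256, seglen))
```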

If any of these SNR values is infinite, only the optimal gain factor is printed as part of the message (Sf is the optimized gain factor),

  "File A = Sf * File B".

Options

The command line specifies options and file names.

-d DL:DU, --delay=DL:DU: Specify a delay range. Each delay in the delay range represents a delay of file B relative to file A. The default range is 0:0.
-s SAMP, --segment=SAMP: Segment length (in samples) to be used for calculating the segmental signal-to-noise ratio. The default is a length corresponding to 16 ms.
-P PARMS, --parameters=PARMS: Parameters to be used for headerless input files. This option may be given more than once. Each invocation applies to the files that follow the option. See the description of the environment variable RAWAUDIOFILE below for the format of the parameter specification.
-h, --help: Print a list of options and exit.
-v, --version: Print the version number and exit.

Environment variables

RAWAUDIOFILE:

This environment variable defines the data format for headerless or non-standard input audio files. The string consists of a list of parameters separated by commas. The form of the list is

  "Format, Start, Sfreq, Swapb, Nchan, ScaleF"

Format: File data format

The lowercase versions of these format specifiers cause a headerless file to be accepted only after checking for standard file headers; the uppercase versions cause a headerless file to be accepted without checking the file header.

 "undefined"                - Headerless files will be rejected
 "mu-law8" or "MU-LAW8"     - 8-bit mu-law data
 "A-law8" or "A-LAW8"       - 8-bit A-law data
 "unsigned8" or "UNSIGNED8" - offset-binary 8-bit integer data
 "integer8" or "INTEGER8"   - two's-complement 8-bit integer data
 "integer16" or "INTEGER16" - two's-complement 16-bit integer data
 "float32" or "FLOAT32"     - 32-bit floating-point data
 "text" or "TEXT"           - text data

Start: byte offset to the start of data (integer value)

Sfreq: sampling frequency in Hz (floating point number)

Swapb: Data byte swap parameter

 "native" - no byte swapping
 "little-endian" - file data is in little-endian byte order
 "big-endian" - file data is in big-endian byte order
 "swap" - swap the data bytes as the data is read

Nchan: number of channels

The data consists of interleaved samples from Nchan channels.

ScaleF: Scale factor

Scale factor applied to the data from the file.
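
As a sketch only, a parameter list of this form could be split into its six fields in Python as shown here. The function name parse_raw_params is hypothetical, and the treatment of empty or missing fields (falling back to the documented default values) is an assumption of this sketch; the actual parsing rules of the audio file routines may differ.

```python
# Documented default values, in the order of the parameter list.
DEFAULTS = {"Format": "undefined", "Start": 0, "Sfreq": 8000.0,
            "Swapb": "native", "Nchan": 1, "ScaleF": 1.0}
CONVERT = {"Format": str, "Start": int, "Sfreq": float,
           "Swapb": str, "Nchan": int, "ScaleF": float}

def parse_raw_params(spec):
    """Parse "Format, Start, Sfreq, Swapb, Nchan, ScaleF" into a dict."""
    fields = [f.strip() for f in spec.split(",")]
    if len(fields) > len(DEFAULTS):
        raise ValueError("too many parameters")
    params = dict(DEFAULTS)
    for name, field in zip(DEFAULTS, fields):
        if field:                      # assumption: empty field keeps default
            params[name] = CONVERT[name](field)
    return params

print(parse_raw_params("integer16, 44, 16000, little-endian, 2, 1.0"))
```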

The default values for the audio file parameters correspond to the following string.

    "undefined, 0, 8000., native, 1, 1.0"

AUDIOPATH:

This environment variable specifies a list of directories to be searched when opening the input audio files. Directories in the list are separated by colons (semicolons for MS-DOS).

Author / version

P. Kabal / v1r11 1996/08/12