ftp.nice.ch/pub/next/unix/audio/sms.N.bs.tar.gz#/sms/docs/sms.rtf


file: sms/README

version of 5/7/95

-Overview of subdirectories


bin          - directory where the binaries will be installed.
docs         - various documents.
examples     - examples of how to do the analysis and synthesis of different sounds.
hybridMk     - sound hybridization program based on the short-time Fourier transform.
library      - library of common routines used in the sms programs.
smsAnal      - computes a .sms file from a .snd file.
smsClean     - program to clean .sms files.
smsEditor    - NeXTSTEP application for displaying .sms files.
smsMk        - creates a .snd file from .score and .sms files.
smsMod       - modifies the amplitude of the stochastic component in a .sms file.
smsPrint     - prints the contents of a .sms file.
smsResample  - supports frame rate decimation in a .sms file.
smsSynth     - synthesizes a sound file from a .sms file using the IFFT.
tools        - several small programs:
                 calcCorr:     compute the correlation of a sound
                 printWindow:  print the windows used in smsAnal
                 smsToLisp:    convert sms output to Lisp syntax
                 smsToML:      convert sms output to MATLAB format
                 smsUnDb:      convert magnitudes in the sms file from dB to linear
                 sndReverse:   reverse a sound
                 smsSynthDet:  synthesize the deterministic part of sms using table lookup oscillators


-Standard way of using the sms programs

.snd   -------->   .sms  ------------------> .snd
        smsAnal            smsMk, smsSynth

[the .sms file can be viewed with smsEditor.app and smsPrint]

-How to install

	Go to the main directory and run "make install" in the Terminal. The binaries will be installed in the bin subdirectory.


-Description of the SMS file format

The SMS file consists of a header of variable length followed by a set of records, all of the same size.

The header is defined by the following structure:

typedef struct 
{
	/* fixed part */
	int iSmsMagic;           /* magic number for SMS data file */
	int iHeadBSize;          /* size in bytes of header */
	int nRecords;            /* number of data records */
	int iRecordBSize;        /* size in bytes of data record */
	int iFormat;             /* type of data format */
	int iFrameRate;          /* rate in Hz of data records */
	int iStochasticType;     /* representation of stochastic coefficients */
	int nTrajectories;       /* number of trajectories in each record */
	int nStochasticCoeff;    /* number of stochastic coefficients in each record */
	float fAmplitude;        /* average amplitude of represented sound */
	float fFrequency;        /* average fundamental frequency */
	float fOriginalSRate;    /* sampling rate of original sound */
	int iBegSteadyState;     /* record number of beginning of steady state */
	int iEndSteadyState;     /* record number of end of steady state */
	float fResidualPerc;     /* percentage of the residual with respect to the original */
	int nLoopRecords;        /* number of loop records specified */
	int nSpecEnvelopePoints; /* number of breakpoints in spectral envelope */
	int nTextCharacters;     /* number of text characters */
	/* variable part */
	int *pILoopRecords;        /* array of record numbers of loop points */
	float *pFSpectralEnvelope; /* spectral envelope of partials */
	char *pChTextCharacters;   /* textual information relating to the sound */
	char *pChDataRecords;   /* pointer to data records */
} SMSHeader;

The header has two parts, one of fixed length and another of variable length. The actual length of the variable part is specified in the fixed part, by the counts nLoopRecords, nSpecEnvelopePoints and nTextCharacters.
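The relationship between the fixed counts and the variable part can be illustrated with a small size calculation. This is only a sketch: the helper smsHeaderBytes is hypothetical, and it assumes that the loop records, spectral envelope and text follow the 18 fixed fields back to back, with no padding and no pointer values stored in the file. smsIO.c may lay the header out differently.

```c
#include <stddef.h>

/* The fixed part of SMSHeader has 14 int fields and 4 float fields. */
#define SMS_FIXED_INTS   14
#define SMS_FIXED_FLOATS 4

/* Hypothetical helper: expected header size in bytes, assuming the loop
 * record numbers (int), the spectral envelope (float) and the text
 * characters directly follow the fixed fields, with no padding and no
 * pointer values stored in the file. */
size_t smsHeaderBytes(int nLoopRecords, int nSpecEnvelopePoints,
                      int nTextCharacters)
{
    size_t fixedBytes = SMS_FIXED_INTS * sizeof(int)
                      + SMS_FIXED_FLOATS * sizeof(float);
    size_t variableBytes = (size_t)nLoopRecords * sizeof(int)
                         + (size_t)nSpecEnvelopePoints * sizeof(float)
                         + (size_t)nTextCharacters * sizeof(char);
    return fixedBytes + variableBytes;
}
```

Comparing the result with iHeadBSize would be one way to check whether these layout assumptions hold for a given file.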

The file sms.h has the macros that define some of the header parameters, such as the magic number of the different data formats. The file smsIO.c in the library has the functions that read and write this header structure.

After the header the file has the actual SMS data as a set of records. Each record includes the deterministic and stochastic representation of a given frame. The function setSmsRecord puts the data of a record into the following structure:

typedef struct 
{
	float *pFFreqTraj;       /* frequency of sinusoids */
	float *pFMagTraj;        /* magnitude of sinusoids */
	float *pFPhaTraj;        /* phase of sinusoids */
	int nTraj;               /* number of sinusoids */
	float *pFStocGain;       /* gain of stochastic component */
	float *pFStocCoeff;      /* filter coefficients for stochastic component */
	int nCoeff;              /* number of filter coefficients */
} SMS_DATA;

This is the structure generated by the analysis program and used by the synthesis program to generate a sound. It assumes equally spaced data, that is, records at a constant frame rate. The data stored in the file is in a more compact form. The file does not need to store the number of trajectories and the number of coefficients in every record, since they are the same for every record in the file. Nor does it need to store the pointers to the data, since the arrays are stored in order. Thus, when the file is used in a program, the pointers of the SMS_DATA structure are set to the appropriate places in the data record being used.
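Setting the SMS_DATA pointers into a compact record can be sketched as follows. The function pointIntoRecord is hypothetical and is not the library's setSmsRecord; the order of the arrays inside a record (frequencies, magnitudes, phases only for the formats with phase, one stochastic gain, then the stochastic coefficients) is an assumption made for illustration.

```c
#include <stddef.h>

typedef struct
{
    float *pFFreqTraj;   /* frequency of sinusoids */
    float *pFMagTraj;    /* magnitude of sinusoids */
    float *pFPhaTraj;    /* phase of sinusoids (NULL if not stored) */
    int nTraj;           /* number of sinusoids */
    float *pFStocGain;   /* gain of stochastic component */
    float *pFStocCoeff;  /* filter coefficients for stochastic component */
    int nCoeff;          /* number of filter coefficients */
} SMS_DATA;

/* Hypothetical sketch, not the library's setSmsRecord: point an SMS_DATA
 * at one record stored as consecutive floats in the assumed order
 * freq[nTraj], mag[nTraj], phase[nTraj] (only if hasPhase),
 * stochastic gain[1], stochastic coefficients[nCoeff].
 * Returns the number of floats the record occupies. */
size_t pointIntoRecord(float *pRecord, int nTraj, int nCoeff, int hasPhase,
                       SMS_DATA *pData)
{
    float *p = pRecord;

    pData->nTraj = nTraj;
    pData->nCoeff = nCoeff;
    pData->pFFreqTraj = p;
    p += nTraj;
    pData->pFMagTraj = p;
    p += nTraj;
    if (hasPhase) {
        pData->pFPhaTraj = p;
        p += nTraj;
    } else {
        pData->pFPhaTraj = NULL;
    }
    pData->pFStocGain = p;
    p += 1;
    pData->pFStocCoeff = p;
    p += nCoeff;
    return (size_t)(p - pRecord);
}
```

Under these assumptions, iRecordBSize would equal the returned count times sizeof(float), and moving to record n is just an offset of n records from the start of the data.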

file: sms/smsAnal/README

Command line 

smsAnal [-d debugMode][-f format][-q soundType][-x analysisDirection][-s windowSize][-i windowType][-r frameRate][-j highestFreq][-k minPeakMag][-y refHarmonic][-u defaultFund][-l lowestFund][-h highestFund][-m minRefHarmMag][-z refHarmMagDiffFromMax][-n nGuides][-p nTrajectories][-v freqDeviation][-t peakContToGuide][-o fundContToGuide][-g cleanTraj][-a minTrajLength][-b maxSleepingTime][-e stochasticType][-c nStocCoeff] <inputSoundFile> <outputSmsFile>

Description of parameters

-d debugMode (default 0) [0,1,2,3,4,5,6,7,8,9,10,11,12]
0 no debug, 1 debug initialization functions, 2 debug peak detection function, 3 debug harmonic detection function, 4 debug peak continuation function, 5 debug clean trajectories function, 6 debug sine synthesis function, 7 debug stochastic analysis function, 8 debug stochastic synthesis function, 9 debug top level analysis function, 10 debug everything, 11 write residual into a file (residual.snd), 12 write original, synthesis and residual to a text file (debug.txt).

-f format (default 1) [1,2,3,4]
format of the representation: 1 harmonic, 2 inharmonic, 3 harmonic with phase, 4 inharmonic with phase.

-q soundType (default 0) [0,1]
type of sound to be analyzed. 0: sound phrase, 1: single note. Setting this to 1 is useful for single stable notes: the default fundamental (-u) is then used as the reference fundamental and practically no pitch detection is done.

-x analysisDirection (default 0) [0,1]
direction of the analysis. 0: direct, 1: reverse. Reverse is very useful for percussive sounds or sounds with a noisy attack.

STFT parameters

-s windowSize (default 3.5) [3 <-> 7]
number of periods of fundamental frequency to use in the analysis window. The actual window size in seconds will be this value divided by the fundamental frequency found at every given moment.

-i windowType (default 1) [0,1,2,3,4]
type of analysis window to use. 0: Hamming, 1: Blackman-Harris 62 dB, 2: Blackman-Harris 70 dB, 3: Blackman-Harris 74 dB, 4: Blackman-Harris 92 dB.

-r frameRate (default 400) [50 <-> 600]
number of analysis windows per second (Hz). This will determine the hop size of the analysis window. If given as a negative number this value will be the overlap factor, and the frame rate will be calculated from that.
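The way -s and -r determine the window and hop sizes can be sketched as below. The window length follows the rule given for -s and the hop size is simply the sampling rate divided by the frame rate. For a negative frameRate, the sketch assumes the overlap factor means window length divided by hop size; that definition, and the names AnalysisSizes and analysisSizes, are hypothetical.

```c
/* Sizes derived from the -s and -r options (hypothetical type). */
typedef struct {
    int windowSamples;   /* analysis window length in samples */
    int hopSamples;      /* hop size in samples */
    double frameRate;    /* effective frame rate in Hz */
} AnalysisSizes;

/* Sketch under these assumptions:
 *  - window length in seconds = windowSize / fundamental
 *  - hop size in samples = samplingRate / frameRate
 *  - a negative frameRate is an overlap factor, overlap = window / hop,
 *    so frameRate = overlap / windowSeconds.
 * All values are positive, so adding 0.5 before truncating rounds. */
AnalysisSizes analysisSizes(double windowSize, double fundamental,
                            double frameRate, double samplingRate)
{
    AnalysisSizes s;
    double windowSeconds = windowSize / fundamental;

    if (frameRate < 0) {
        double overlap = -frameRate;
        frameRate = overlap / windowSeconds;
    }
    s.frameRate = frameRate;
    s.windowSamples = (int)(windowSeconds * samplingRate + 0.5);
    s.hopSamples = (int)(samplingRate / frameRate + 0.5);
    return s;
}
```

For example, a 4-period window at a 100 Hz fundamental and 44100 Hz is 1764 samples long; at the default 400 Hz frame rate the hop is about 110 samples, so successive windows overlap heavily.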

Peak detection parameters

-j highestFreq (default 12000) [20 <-> 22050]
highest frequency in Hz of the peaks to be detected; no partials above this frequency will be detected. It is never higher than half the sampling rate.

-k minPeakMag (default 0) [0 <-> 20]
minimum magnitude in dB of a peak. Peaks softer than this value have no chance of being considered part of the deterministic component, that is, of the partials. This value should not be smaller than 0, since 0 is the noise threshold used in the analysis.

Harmonic detection parameters

-y refHarmonic (default 1) [1, 2, 3 ....]
number of the harmonic used as reference; 1 is the fundamental. Some sounds, such as many piano sounds, have a very soft fundamental. In these cases it helps to find the fundamental frequency by looking for a harmonic other than the actual fundamental.

-m minRefHarmMag (default 30) [5 <-> 60]
minimum magnitude in dB of the harmonic used for reference in the harmonic detection process. 

-z refHarmMagDiffFromMax (default 30) [5 <-> 60]
maximum dB difference between the harmonic used for reference and the maximum peak.

-u defaultFund (default 100) [20 <-> 5000]
default fundamental frequency in Hz. This frequency sets the analysis window size when no fundamental has been found. Normally it is convenient to give the fundamental frequency at the beginning of the sound, so that the analysis starts with a good guess. For inharmonic sounds this value sets the window size for the whole sound. If defaultFund is higher than highestFund it is set to highestFund, and if it is lower than lowestFund it is set to lowestFund.
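The clamping rule and the fallback to the default fundamental can be written out as two small functions. Both names are hypothetical, and signalling "no fundamental found" with a detected value of 0 or less is an assumption of this sketch.

```c
/* Clamp the default fundamental into [lowestFund, highestFund],
 * following the rule described for -u. */
double clampDefaultFund(double defaultFund, double lowestFund,
                        double highestFund)
{
    if (defaultFund > highestFund)
        return highestFund;
    if (defaultFund < lowestFund)
        return lowestFund;
    return defaultFund;
}

/* Window length in seconds (windowSize periods of the fundamental).
 * Uses the detected fundamental when one was found (assumed > 0)
 * and the default fundamental otherwise. */
double windowSeconds(double windowSize, double detectedFund,
                     double defaultFund)
{
    double fund = detectedFund > 0 ? detectedFund : defaultFund;
    return windowSize / fund;
}
```

With the defaults (-u 100, -s 3.5), a frame without a detected fundamental would use a window of 35 ms.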

-l lowestFund (default 50) [20 <-> 5000]
lowest fundamental frequency in Hz to be searched for. Only used in harmonic sounds. In the case of inharmonic sounds this value is used as the lowest frequency to track.

-h highestFund (default 1000) [20 <-> 5000]
highest fundamental frequency in Hz to be searched for. Only used in harmonic sounds.

Peak continuation parameters

-n nGuides (default 100) [1 <-> 500]
number of guides to be used in analysis. These guides will be used to track the partials in the sound and are the ones that will be subtracted from the original sound. The number of output trajectories is defined by the parameter nTrajectories.

-p nTrajectories (default 60) [1 <-> 500]
maximum number of trajectories, or partials, to be found. This will be the output number of trajectories. 

-v freqDeviation (default .45) [.1 <-> .5]
maximum deviation permitted from the "guide frequency" to the continuation peak of the guide. For harmonic sounds the deviation in Hz is this value times the fundamental frequency; for inharmonic sounds it is this value times the guide frequency.
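The deviation rule translates directly into code. The helper names maxDeviationHz and peakContinuesGuide are hypothetical, and the sketch only covers the distance test, not how smsAnal chooses among several candidate peaks.

```c
/* Maximum allowed deviation in Hz between a guide and its continuation
 * peak: freqDeviation times the fundamental for harmonic sounds,
 * freqDeviation times the guide frequency for inharmonic sounds. */
double maxDeviationHz(double freqDeviation, int isHarmonic,
                      double fundamental, double guideFreq)
{
    return isHarmonic ? freqDeviation * fundamental
                      : freqDeviation * guideFreq;
}

/* Returns 1 if peakFreq lies close enough to guideFreq to continue it. */
int peakContinuesGuide(double peakFreq, double guideFreq, double maxDev)
{
    double diff = peakFreq - guideFreq;
    if (diff < 0)
        diff = -diff;
    return diff <= maxDev;
}
```

With the default .45 and a 200 Hz fundamental, a guide at 1000 Hz accepts peaks within 90 Hz; for an inharmonic sound the same guide would accept peaks within 450 Hz.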

-t peakContToGuide (default .4) [0 <-> 1]
contribution of the frequency of the previous peak of a given trajectory to the current guide frequency value. If the value is 1, it means that the previous peak will completely define the guide value, the possible current fundamental will not be used to set the guide's frequency. If the value is 0, the previous guide will not be used at all.

-o fundContToGuide (default .5) [0 <-> 1]
contribution of the fundamental frequency of the current frame to the current guide frequency. This is only relevant in harmonic sounds.
 
Trajectory cleaning parameters

-g cleanTraj (default 1) [0,1]
whether or not to clean the deterministic data after analysis. 0 no cleaning, 1 cleaning. This cleaning process gets rid of short trajectories that may not be part of a stable partial of the sound and also fills gaps in stable partials.

-a minTrajLength (default .1) [0 <-> 10]
minimum length of the trajectories in seconds. Trajectories shorter than this value will be deleted if the cleanTraj flag (-g) has been set.

-b maxSleepingTime (default .1) [0 <-> .5]
maximum sleeping time in seconds for a given trajectory. Interruptions shorter than this value are treated as gaps in the trajectory and, if the cleanTraj flag (-g) has been set, these gaps are filled by interpolating between the values at their boundaries.
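The two cleaning rules can be sketched on a single trajectory. This is an illustration, not smsAnal's clean-trajectories function: the per-frame array representation, the use of 0 for a sleeping frame, linear interpolation of the frequency only, and the names cleanTrajectory, fillGaps and deleteShortRuns are all assumptions.

```c
/* Hypothetical sketch of trajectory cleaning for one trajectory.
 * freq[i] is the trajectory frequency at frame i; 0 means the trajectory
 * is sleeping (no peak) in that frame. */

/* Fill interior runs of sleeping frames that last less than maxGapFrames
 * by linear interpolation between the frames on either side. */
static void fillGaps(float *freq, int nFrames, double maxGapFrames)
{
    int i = 0;
    while (i < nFrames) {
        if (freq[i] == 0 && i > 0 && freq[i - 1] != 0) {
            int start = i;                  /* first sleeping frame */
            while (i < nFrames && freq[i] == 0)
                i++;
            int len = i - start;            /* length of the gap */
            if (i < nFrames && len < maxGapFrames) {
                float a = freq[start - 1], b = freq[i];
                for (int k = 0; k < len; k++)
                    freq[start + k] = a + (b - a) * (k + 1) / (len + 1);
            }
        } else {
            i++;
        }
    }
}

/* Silence runs of active frames shorter than minLenFrames. */
static void deleteShortRuns(float *freq, int nFrames, double minLenFrames)
{
    int i = 0;
    while (i < nFrames) {
        if (freq[i] != 0) {
            int start = i;
            while (i < nFrames && freq[i] != 0)
                i++;
            if (i - start < minLenFrames)
                for (int k = start; k < i; k++)
                    freq[k] = 0;
        } else {
            i++;
        }
    }
}

/* Times in seconds are converted to frames with the frame rate. */
void cleanTrajectory(float *freq, int nFrames, double frameRate,
                     double minTrajLength, double maxSleepingTime)
{
    fillGaps(freq, nFrames, maxSleepingTime * frameRate);
    deleteShortRuns(freq, nFrames, minTrajLength * frameRate);
}
```

Filling gaps before deleting short runs means that two short pieces separated by a brief gap can join into one trajectory long enough to survive.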

Stochastic analysis parameters

-e stochasticType (default 2) [1,2,3]
type of stochastic representation: 1 IIR filter, 2 line segments on the magnitude spectrum, 3 no stochastic analysis. The first time an analysis is done it is useful to set this to 3: this lets you check whether the analysis was done well, and the computation time is much shorter.

-c nStocCoeff (default 16) [4 <-> 64 for stochasticType 2, 4 <-> 20 for stochasticType 1]
number of filter coefficients for the stochastic representation. When the stochastic type is set to 2 (line segments on the magnitude spectrum), this number corresponds to the number of breakpoints of the line-segment approximation.
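A line-segment approximation of a magnitude spectrum can be sketched as follows. This sketch splits the spectrum into nCoeff equal sections, keeps the maximum of each section as a breakpoint, and rebuilds an envelope by linear interpolation between section centres. The section maximum, the interpolation scheme and the names approxEnvelope and rebuildEnvelope are assumptions, not a description of the smsAnal code.

```c
/* Hypothetical sketch: one breakpoint per equal-width section of the
 * magnitude spectrum, taken as the section maximum.
 * Assumes 2 <= nCoeff <= nBins. */
void approxEnvelope(const float *mag, int nBins, float *coeff, int nCoeff)
{
    for (int c = 0; c < nCoeff; c++) {
        int lo = c * nBins / nCoeff;
        int hi = (c + 1) * nBins / nCoeff;   /* exclusive */
        float max = mag[lo];
        for (int i = lo + 1; i < hi; i++)
            if (mag[i] > max)
                max = mag[i];
        coeff[c] = max;
    }
}

/* Rebuild nBins values by linear interpolation between the breakpoints,
 * placing breakpoint c at the centre of section c and holding the end
 * values flat beyond the first and last centres. */
void rebuildEnvelope(const float *coeff, int nCoeff, float *env, int nBins)
{
    float step = (float)nBins / nCoeff;      /* bins per section */
    for (int i = 0; i < nBins; i++) {
        /* position of bin i's centre, in sections from the first centre */
        float pos = (i + 0.5f) / step - 0.5f;
        if (pos <= 0) {
            env[i] = coeff[0];
        } else if (pos >= nCoeff - 1) {
            env[i] = coeff[nCoeff - 1];
        } else {
            int c = (int)pos;
            float frac = pos - c;
            env[i] = coeff[c] + (coeff[c + 1] - coeff[c]) * frac;
        }
    }
}
```

Larger values of -c keep more spectral detail in the residual envelope at the cost of more data per record.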





file: sms/smsMk/README

smsMk is a program that accepts a MusicKit scorefile and outputs a sound file.

Usage: smsMk <input score file> <output snd file>

smsMk recognizes the following parameters inside the scorefile:

-smsFile        
	 The sms file to be synthesized, specified as a string.

-timeOffset    
	Time offset into the sms file to begin reading. E.g. a time offset of 1 means begin 1 second into the file. 

-dur           
	Duration of the file to use. E.g. a dur of .5 means to use only .5 seconds of the file. Note that dur is computed before any time stretch is applied. (This parameter is specified as a number in parentheses after the part name.)

-amp           
	Scaling of the overall amplitude. A value of 1 does not modify the original amplitude, a value of 2 generates a sound with twice the amplitude of the original sound.

-ampEnv       
	Envelope for the parameter amp. The x values cover the whole sound regardless of the largest x value given. The y values are scaled by the amp parameter. The resulting amplitude is the original amplitude multiplied by amp and by ampEnv.
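The amp/ampEnv rule can be sketched with a small breakpoint-envelope evaluator. The Breakpoint type and the functions envelopeAt and scaledAmplitude are hypothetical; the sketch stretches the envelope's own x range over the whole sound (t from 0 to 1), as described above, and interpolates linearly between breakpoints, which is an assumption about how smsMk evaluates envelopes.

```c
/* One (x, y) breakpoint of an envelope; x values must be increasing. */
typedef struct { double x, y; } Breakpoint;

/* Evaluate the envelope at relative position t in [0,1] of the sound.
 * The envelope's x range is mapped onto the whole sound, so only the
 * relative spacing of the x values matters.
 * Assumes n >= 2 and env[0].x < env[n-1].x. */
double envelopeAt(const Breakpoint *env, int n, double t)
{
    double x = env[0].x + t * (env[n - 1].x - env[0].x);
    if (x <= env[0].x)
        return env[0].y;
    for (int i = 1; i < n; i++) {
        if (x <= env[i].x) {
            double frac = (x - env[i - 1].x) / (env[i].x - env[i - 1].x);
            return env[i - 1].y + frac * (env[i].y - env[i - 1].y);
        }
    }
    return env[n - 1].y;
}

/* Resulting amplitude = original amplitude * amp * ampEnv(t). */
double scaledAmplitude(double original, double amp,
                       const Breakpoint *ampEnv, int n, double t)
{
    return original * amp * envelopeAt(ampEnv, n, t);
}
```

An envelope given as (0,0) (5,1) (10,1) and one given as (0,0) (0.5,1) (1,1) therefore produce the same result.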

-ampDet       
	The amplitude scaling to be applied to the deterministic component.

-ampDetEnv     
	The amplitude envelope for ampDet. 

-ampPartialsEnv 
	Envelope that determines which partials are affected by the amplitude change. The envelope covers all the partials regardless of its length. A value of 0 means that the given partial is not affected, a value of 1 means that it is completely affected, and a value in between means that it is affected only by that fraction.

-ampStoc	     
	The amplitude scaling applied to the stochastic component.

-ampStocEnv   
	The amplitude envelope for ampStoc.	

-ampStocCoeffEnv
	 Envelope to determine which coefficients are affected by the amplitude change. This can only be used when the analysis was done with line-segment approximation on the magnitude spectrum of the residual.		

-freq1         
	 Frequency multiplier when the envelope has value y = 1.  A value of 1 leaves the original frequency as it was, a value of 2 transposes the frequency up an octave. This modification does not change the duration of the original sound.

-freq0
	Frequency multiplier when the envelope has value y = 0. Same behaviour as freq1.

-freqEnv       
	The envelope that moves the frequency multiplier between freq0 (y = 0) and freq1 (y = 1).
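Since freq0 and freq1 are the multipliers at envelope values y = 0 and y = 1, the multiplier for an intermediate y can be sketched as a linear interpolation applied to every partial frequency. The names freqMultiplier and transposePartials are hypothetical, and the linear interpolation is an assumption based on the param0/param1 convention described in the notes on envelopes.

```c
/* Frequency multiplier for envelope value y: freq0 at y = 0,
 * freq1 at y = 1, linearly interpolated in between. */
double freqMultiplier(double freq0, double freq1, double y)
{
    return freq0 + y * (freq1 - freq0);
}

/* Apply the multiplier to all partial frequencies of one frame.
 * A multiplier of 2 transposes everything up an octave. */
void transposePartials(float *freqs, int nTraj, double multiplier)
{
    for (int i = 0; i < nTraj; i++)
        freqs[i] = (float)(freqs[i] * multiplier);
}
```

With freq0 = 1 and freq1 = 2, an envelope that rises from 0 to 1 produces a glide up one octave over the sound without changing its duration.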

-freqPartialsEnv
	Envelope that determines which partials are affected by the frequency change. It works the same way as ampPartialsEnv.

-timeStretch    
	Time stretch factor applied to the sms data. A value of .5 shortens the sound to half its duration, and a value of 2 stretches the duration to twice its original value. This modification does not affect the frequency of the sound.
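For a constant timeStretch, the output duration and the mapping from output time back to a frame of the sms data can be sketched as below. The names stretchedDuration and sourceFrame are hypothetical; the sketch rounds down to the nearest frame instead of interpolating between frames, and it takes frameRate to be the analysis frame rate stored in the file (iFrameRate).

```c
/* Output duration: dur is measured before stretching (see -dur). */
double stretchedDuration(double dur, double timeStretch)
{
    return dur * timeStretch;
}

/* Frame of the sms data read at output time outTime (seconds), starting
 * timeOffset seconds into the file. Rounds down to the nearest frame. */
int sourceFrame(double outTime, double timeStretch, double timeOffset,
                double frameRate)
{
    double srcTime = timeOffset + outTime / timeStretch;
    return (int)(srcTime * frameRate);
}
```

Because only the position in the data changes while each frame keeps its own frequencies, stretching time leaves the pitch unchanged.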

-timeStretchEnv
	Envelope for the timeStretch parameter. The final duration of the sound will be the result of both timeStretch and timeStretchEnv.

-freqStretch	  
	Modification of the distribution of the harmonics of the original sound by stretching or compressing them. A value of 1 leaves the partials in their original places. A value of 2 progressively stretches all the harmonics so that the fundamental stays in place, but going up the harmonics are transposed higher and higher; the last harmonic is transposed an octave above its original frequency.
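One way to realize this progressive stretching is a per-harmonic multiplier that goes from 1 for the fundamental to freqStretch for the last harmonic. The linear growth of the multiplier with harmonic number is an assumption (smsMk could, for instance, grow it geometrically instead), and stretchHarmonics is a hypothetical name.

```c
/* Hypothetical sketch of freqStretch: harmonic 1 keeps its frequency,
 * the last of nHarm harmonics is multiplied by freqStretch, and the
 * multiplier grows linearly with the harmonic index in between. */
void stretchHarmonics(float *freqs, int nHarm, double freqStretch)
{
    if (nHarm < 2)
        return;
    for (int i = 0; i < nHarm; i++) {
        /* 0 for the fundamental, 1 for the last harmonic */
        double pos = (double)i / (nHarm - 1);
        freqs[i] = (float)(freqs[i] * (1.0 + pos * (freqStretch - 1.0)));
    }
}
```

For three harmonics at 100, 200 and 300 Hz and freqStretch = 2, this gives 100, 300 and 600 Hz: the fundamental stays put and the top harmonic moves up an octave.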

-freqStretchEnv
	Envelope for the freqStretch parameter.	

-maxTraj        
	Maximum number of trajectories to synthesize. Since the computation time is directly related to the number of partials synthesized, a small value can be used while experimenting and the highest value for the final version.

-smsFileHyb     
	The sms file to be used to hybridize with the smsFile.

-timeOffsetHyb 
	Time offset into the smsFileHyb file.

-durHyb         
	Duration of the smsFileHyb file to use.

-freqHybFactor  
	Contributing factor of the frequency of the smsFileHyb file to the resulting sound. When the value is negative (between 0 and -1), only the fundamental of the first sound is used in the hybridization. So, when the value is -1, the resulting sound has the frequency evolution of the fundamental of the first sound, but the frequency relations of the second sound.

-freqHybEnv    
	Envelope for freqHybFactor.

-ampDetHybFactor
	Contributing factor of the deterministic magnitude of the smsFileHyb file to the resulting sound. When the value is negative (between 0 and -1), only the magnitude of the fundamental of the first sound is used in the hybridization. So, when the value is -1, the resulting sound has the magnitude evolution of the fundamental of the first sound, but the magnitude relations of the second sound.

-ampDetHybEnv   
	Envelope for ampDetHybFactor.

-ampStocHybFactor
	Contributing factor of the stochastic magnitude of the smsFileHyb file into the resulting sound.

-ampStocHybEnv	 
	 Envelope for ampStocHybFactor.

-timeStretchHyb
	 Time stretch factor applied to the smsFileHyb data. A value of .5 shortens the sound to half its duration, and a value of 2 stretches the duration to twice its original value. This modification does not affect the frequency of the sound.

-timeStretchHybEnv 
	Envelope for the timeStretchHyb parameter. The final duration of the sound will be the result of both timeStretchHyb and timeStretchHybEnv.


There are two general parameters that can be specified in the info statement of the scorefile:

-samplingRate  
	Sampling rate of the output file, it should be 22050 or 44100.

-synthesisRate  
	Rate used in the synthesis. It makes no sense to set it higher than the analysis rate, and changing it does not affect the computation time much. Its main purpose is to handle different analysis rates for the two sms files used in a single note.




file: sms/hybridMk/README


hybridMk is a program that accepts a MusicKit scorefile and outputs a sound file.

Usage: hybridMk <input score file> <output snd file>

hybridMk recognizes the following parameters inside the scorefile:

-sndFile1
	Sound file used as excitation (sound1).

-frameRate (default 100)
	Analysis rate for the STFT of sound1, in frames per second.

-overlapping1 (default 4)
	Amount of overlapping of the analysis window used for sound1. The actual length of the window in seconds is overlapping1 / frameRate.

-overlapping2 (default 4)
	Amount of overlapping of the analysis window used for sound2.

-timeOffset1 (default 0)
	Time offset in seconds for sound1.

-dur1
	Duration in seconds taken from sndFile1. If the value is smaller than 1, the whole sound is used. (This value goes in parentheses in the scorefile.)

-sndFile2
	Sound file used as the hybridizing sound (sound2).

-timeOffset2 (default 0)
	Time offset in sec. for sndFile2.

-dur2 (default total-duration)
	Duration in sec. taken from sndFile2.

-nCoefficients0 (default maximum possible)
	Number of line segments used to approximate the spectrum of sound2 at point 0 of nCoefficientsEnv.

-nCoefficients1 (default maximum possible) 
	Number of line segments used to approximate the spectrum of sound2 at point 1 of nCoefficientsEnv.

-nCoefficientsEnv
	Function to interpolate between nCoefficients0 and nCoefficients1.

-magBalance0 (default 0)
	Magnitude balance between the two sound files at value 0 of magBalanceEnv.

-magBalance1 (default 0)
 	Magnitude balance between the two sound files at value 1 of magBalanceEnv.

-magBalanceEnv (default 1)
	Function to interpolate between magBalance0 and magBalance1. When only magnitude hybridization is desired, that is, when no compression parameters are given, the process is done in the time domain and the computation is much faster.

-gain0 (default 0)
	Multiplicative gain for excitation at value 0 of gainEnv. 

-gain1 (default 1) 
	Multiplicative gain for excitation at value 1 of gainEnv. 

-gainEnv (default 1)
	Function to interpolate between gain0 and gain1.

-timeStretch0 (default 0)
	Time stretch value for sound2. Since the two input sounds can have different durations, an implicit stretching or compression is applied to sound2 so that both sounds have the same number of frames. Therefore the analysis frame rate used for sound2 differs from the one used for sound1. When the time-stretch parameter is set, however, it modifies this implicit stretching. timeStretch0 is the time stretch at point 0 of timeStretchEnv.
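The implicit stretching amounts to choosing a frame rate for sound2 that yields as many frames as sound1. The sketch below shows only that implicit part; how timeStretch0, timeStretch1 and timeStretchEnv then modify it is not shown, and the names framesForSound1 and frameRateForSound2 are hypothetical.

```c
/* Number of analysis frames of sound1: dur1 seconds at frameRate. */
int framesForSound1(double dur1, double frameRate)
{
    return (int)(dur1 * frameRate);
}

/* Frame rate for sound2 so that its dur2 seconds produce the same number
 * of frames as sound1: dur2 * rate2 = dur1 * frameRate. */
double frameRateForSound2(double dur1, double dur2, double frameRate)
{
    return frameRate * dur1 / dur2;
}
```

For example, if sound1 lasts 2 seconds at 100 frames per second (200 frames) and sound2 lasts 1 second, sound2 is analyzed at 200 frames per second, so a shorter sound2 is implicitly stretched to the length of sound1.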

-timeStretch1 (default 1)
	Time stretch of the hybridizing sound at point 1 of timeStretchEnv.

-timeStretchEnv (default 1)
	function to interpolate between timeStretch0 and timeStretch1.

-smoothOrder (default 0)
	order of the smoothing filter to use on the spectral envelope.

-compression0Env (default 0)
Before multiplying the magnitude spectra of the two sounds, they are normalized and compressed by applying a compression-envelope as a way to control the relative contribution of each one in the final output. This envelope can have any value between 0 and 2. A value of 0 compresses the corresponding spectrum magnitude point of sound2 and does not modify the spectral value of sound1, therefore the only contributing spectral value is the one of sound2. A value of 2 produces the opposite, and a value of 1 leaves the two spectral magnitude values as they are. compression0Env is the envelope used when compressionIntEnv is at value 0.
	
-compression1Env (default 1)
	 envelope used when compressionIntEnv is at value 1.
	
-compressionIntEnv (default 1)
	Interpolation envelope between compression0Env and compression1Env. When the value is 0 the envelope used is compression0Env, and when the value is 1 the envelope used is compression1Env. Values in between interpolate between the two envelopes.

Notes on Envelopes:
Envelopes are formed by (x,y) pairs, where x is the relative time and y the relative amplitude.
The x range (the difference between the last (maximum) and first (minimum) x values of the envelope) is mapped onto the total length of the resulting output sound.
The y range can also be compressed or expanded with the pair of parameters associated with the envelope (param0, param1), which give the values corresponding to y = 0 and y = 1 respectively.

These are the contents of the former NiCE NeXT User Group NeXTSTEP/OpenStep software archive, currently hosted by Netfuture.ch.