07_DMA_To_DSP.rtf

Written by J. Laroche at the Center for Music Experiment at UCSD, San Diego, California. December 1990.

Using Host to DSP DMA.

It is now possible to use DMA to pass data from the host to the DSP. The DMA transfer rate is about 2 Mbytes per second, which means that about 20 mono 16-bit 44.1 kHz channels can be processed simultaneously using the DMA protocol. DMA should therefore be used whenever a fast transfer rate is necessary, as is the case with sound files.
The document 04_Through_DSP gives an example of how to play a sound through the DSP, sending the samples to the DSP in a non-DMA stream and getting them back to the DACs using DMA. This works with most sounds, except when the sound is stereo and sampled at 44100 Hz: the driver then cannot send enough data to keep up with the DACs. This is the kind of case where you need DMA.

The C program is given below. 
It can also be found in Examples/06_DMA_To_DSP.


// ------------------------------- Beginning of program.

#import <sound/sound.h>
#import <sound/sounddriver.h>
#import <mach.h>
#import <stdio.h>
#import <stdlib.h>		// exit(), malloc()


#define Error(A,B) if((A)) {fprintf(stderr,"%s %s\n",B, SNDSoundError((A)));\
mach_error(B,(A)); }

#define DMASIZE 4096

static int done;

static void write_started(void *arg, int tag)
{
    fprintf(stderr,"Starting playing... %d \n",tag);
}

static void write_completed(void *arg, int tag)
{
    fprintf(stderr,"Playing done... %d\n",tag);
    done = 1;
}


static void over_run(void *arg, int tag)
{
    fprintf(stderr,"Under or Over run... %d\n",tag);
}


int main(int argc, char *argv[])
{
    static port_t dev_port, owner_port, cmd_port;
    static port_t reply_port, read_port, write_port;
    int i, protocol;
    kern_return_t k_err;
    snddriver_handlers_t handlers = { 0, 0, 
    		write_started, write_completed,0,0,0,over_run, 0};
    msg_header_t *reply_msg;
    SNDSoundStruct *dspStruct;
    SNDSoundStruct *sound;
    short *location;
    int length;
    int low_water = 48*1024;
    int high_water = 512*1024;	// 64*1024 instead of 512*1024 causes drop-outs
    short *foo;
    int LENGTH;
    int low_SR = 0;
    int stereo = 0;

    if (argc < 2) {
	fprintf(stderr,"Usage: %s soundfile (16-bit linear)\n",argv[0]);
	exit(1);
    }
        

    k_err = SNDAcquire(SND_ACCESS_DSP|SND_ACCESS_OUT,0,0,0,
    	NULL_NEGOTIATION_FUN,0,&dev_port,&owner_port); 
    Error(k_err,"SND and DSP acquisition  ");
    
    k_err = snddriver_get_dsp_cmd_port(dev_port,owner_port,&cmd_port);
    Error(k_err,"Cmd port acquisition  ");

    k_err = SNDReadSoundfile(argv[1], &sound);
    Error(k_err,argv[1]);
    
    low_SR = (sound->samplingRate == SND_RATE_LOW);
    stereo = (sound->channelCount == 2);
    printf("Playing at %d Hz %s\n",((low_SR)?22050:44100),((stereo)?
    	"stereo":"mono"));

    k_err = SNDGetDataPointer(sound,(char**)&location,&length,&i);
    Error(k_err,"Data Pointer");

    
    protocol = SNDDRIVER_DSP_PROTO_RAW;
    k_err = snddriver_stream_setup(dev_port, owner_port,
    				 SNDDRIVER_DMA_STREAM_TO_DSP,
				 DMASIZE,2, 
				 low_water, high_water,
				 &protocol, &read_port);
    Error(k_err,"Stream 1 set_up");
    k_err = snddriver_stream_setup(dev_port, owner_port,((low_SR) ?
			SNDDRIVER_STREAM_DSP_TO_SNDOUT_22:
			SNDDRIVER_STREAM_DSP_TO_SNDOUT_44),
			    	 DMASIZE, 2, 
				 low_water, high_water,
				 &protocol, &write_port);
    Error(k_err,"Stream 2 set_up");
    
    k_err = snddriver_dsp_protocol(dev_port, owner_port, protocol);
    Error(k_err,"Protocol set-up  ");
    
    k_err = port_allocate(task_self(),&reply_port);
    Error(k_err,"Reply port allocation  ");

    k_err = SNDReadDSPfile("perso_b.lod", &dspStruct, NULL);
    Error(k_err,"Reading .lod file  ");

    k_err = SNDBootDSP(dev_port, owner_port, dspStruct);
    Error(k_err,"Booting DSP  ");
    printf("DSP booted\n");

    if(!stereo) {	// If mono, tell it to the DSP (host command 21 -> p:$2A)!
	k_err = snddriver_dsp_host_cmd(cmd_port,21,SNDDRIVER_LOW_PRIORITY);
	Error(k_err,"Mono host command  ");
    }


    // To use DMA to the DSP, you need alignment with vm pages. Therefore you
    // need to allocate virtual memory, and copy your sound in it. Here, we 
    // just copy an integer number of virtual memory pages. To play the whole
    // sound, you would need to allocate more memory, and copy the rest 
    // in a for loop. vm_page_size is a global variable containing the size of
    // the virtual memory pages (in bytes.)
    
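    LENGTH = (length*sizeof(short)/vm_page_size)*vm_page_size/sizeof(short);
    if (LENGTH == 0) {
	fprintf(stderr,"Sound is shorter than one vm page: nothing to send.\n");
	exit(1);
    }

    // vm_allocate() and vm_write() take sizes in bytes: 2*LENGTH for shorts.
    k_err = vm_allocate(task_self(),(vm_address_t *)(&foo),2*LENGTH,TRUE);
    Error(k_err,"VM Allocation  ");
    
    k_err = vm_write(task_self(), (vm_address_t)(foo), (pointer_t)location,2*LENGTH);
    Error(k_err,"VM Write  ");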

    k_err = snddriver_stream_start_writing(read_port,
    					 (void *)foo, LENGTH,
					 1,
					 0,0,
					 1,1,0,0,0,1, reply_port);
    Error(k_err,"Starting writing  ");


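    reply_msg = (msg_header_t *)malloc(MSG_SIZE_MAX);

    done = 0;
    while (done != 1) 
	{
	int cmd[2];	// 2 values: the header, and the value
	cmd[0] = 1;	// 1 stands for volume (Vol_Header in the DSP code)
	
	    printf("Value of the volume (max 8388607)? ");
	    if (scanf("%d",cmd+1) != 1)
		break;		// End of input or not a number: stop.
	    k_err = snddriver_dsp_write(cmd_port,cmd,2,sizeof(int),
   						SNDDRIVER_MED_PRIORITY);
	    Error(k_err,"Sending volume  ");

	    // Dispatch pending driver messages (started, completed, overrun)
	    // to the handlers; without this, done is never set to 1.
	    for (;;) {
		reply_msg->msg_size = MSG_SIZE_MAX;
		reply_msg->msg_local_port = reply_port;
		if (msg_receive(reply_msg, RCV_TIMEOUT, 0) != RCV_SUCCESS)
		    break;	// No message waiting.
		snddriver_reply_handler(reply_msg, &handlers);
	    }
	}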
    
}
// ---------------------------------- End of program.



SETTING UP THE DRIVER.

We start by reading the sound file and getting its sampling rate and number of channels, because we need this information to set up the streams correctly.
Then come the actual stream set-ups:

	protocol = SNDDRIVER_DSP_PROTO_RAW;
	k_err = snddriver_stream_setup(dev_port, owner_port,
					SNDDRIVER_DMA_STREAM_TO_DSP,
					DMASIZE,2, 
					low_water, high_water,
					&protocol, &read_port);

	k_err = snddriver_stream_setup(dev_port, owner_port,((low_SR) ?
			    SNDDRIVER_STREAM_DSP_TO_SNDOUT_22:
			    SNDDRIVER_STREAM_DSP_TO_SNDOUT_44),
					DMASIZE, 2, 
					low_water, high_water,
					&protocol, &write_port);
					
We set up two streams: one from the DSP to the DACs, with a sampling rate matching that of the sound file, and one from memory to the DSP. The latter is special: instead of SNDDRIVER_STREAM_TO_DSP (as in the non-DMA examples), it uses SNDDRIVER_DMA_STREAM_TO_DSP. This tells the driver that the stream from memory to the DSP will use the DMA protocol, which also means that the DSP code must implement the DMA protocol to get samples from the host.
The high_water value is now set to 512*1024, which is higher than the value used in the non-DMA example: as with the non-DMA protocol, if the high-water mark is not high enough, we get drop-outs. 512*1024 is an empirical value that gives satisfactory results.

The DSP is then loaded with the DSP code and booted. If the sound is mono (only one channel), host command 21 is issued to tell the DSP that only one channel is sent (see the DSP code below).


SENDING THE SAMPLES.

Another difference from a non-DMA stream is that the samples we send must be aligned with virtual memory pages (if they are not, you get a "bad alignment" message). This means that the first sample must lie exactly at the start of a virtual memory page. That is not usually the case when you read a sound file with SNDReadSoundfile(): the samples are simply copied to a newly allocated area of memory, which does not necessarily start on a virtual memory boundary. This is why we allocate virtual memory

    vm_allocate(task_self(),(vm_address_t *)(&foo),2*LENGTH,TRUE);

and copy the sound into it

    vm_write(task_self(), (vm_address_t)(foo), (pointer_t)location,2*LENGTH);

Refer to the Mach functions document for details about these functions.
These Mach functions allocate (respectively, copy) only an integer number of vm pages. The size of a vm page is given by the global variable vm_page_size, and is currently 8192 bytes. Therefore, if you want to copy the whole sound into virtual memory, you need to allocate more than is strictly needed, copy an integer number of pages with vm_write(), and copy the rest with a for loop. Here, we just copy as many whole pages as possible and drop the remaining samples:

    LENGTH = (length*sizeof(short)/vm_page_size)*vm_page_size/sizeof(short);

The samples are then sent with the usual call to snddriver_stream_start_writing():

    k_err = snddriver_stream_start_writing(read_port,
    					 (void *)foo, LENGTH,
					 1,
					 0,0,
					 1,1,0,0,0,1, reply_port);

Note that we ask for write_started, write_completed and overflow messages. Overflow messages are sent each time the DACs run out of samples. They are usually associated with blanks or drop-outs.


CONTROLLING THE PLAY-BACK.

Now that the driver is sending data to and receiving data from the DSP, we need to control the play-back, and therefore to send parameters to the DSP (play-back volume, amount of pitch shift...). We cannot use the scheme we used when there was only DMA from the DSP to the host, because samples and parameter values could collide: the driver cannot reliably write data and send a host command within one DMA-out buffer. It can, however, send many data words between DMA buffers. We therefore pass data as follows: each parameter update consists of a header and a value. The header indicates which parameter is to be updated, and the value is the new value. The DSP stores these words in a queue in its memory and, after each DMA-out buffer, dispatches the values to the parameters according to their headers.
For each parameter update we have two values, the first of which is the header (1, for the volume, in our case):

	int i[2];	
	i[0] = 1;	

These two values are sent using the classical snddriver_dsp_write() function.

	snddriver_dsp_write(cmd_port,i,2,sizeof(int),SNDDRIVER_MED_PRIORITY);

Note that it is the driver's responsibility to avoid collisions between DMA and parameters, and it does that pretty well.


DSP CODE.

The DSP code is given below. 
It can also be found in Examples/06_DMA_To_DSP.


    ;; -------------------------- Beginning of program
    

	include "ioequ.asm"

IW_Buff			equ	8192		;Start address of input buffer
Buff_size		equ	8191
Control_Queue	equ	0		; Start address of the control value queue
Control_Size	equ	99		; Size of that queue.
DMA_SIZE		equ	4096

DM_R_REQ		equ	$050001	;message to host to request dma-OUT
DM_W_REQ	equ	$040002	;message to host to request dma-IN 
VEC_R_DONE	equ	$0024		;host command: dma-OUT complete
VEC_W_DONE	equ	$0028		;host command: dma-IN complete

Vol_Header		equ	$01		; Signals that next value is a volume.
Stuff_Header	equ	$02		; Signals that next value is something.


;;;------------------------- Variable locations
;;;

x_sFlags		equ	$00fd		;dspstream flags
DMA_DONE	equ	0		;  indicates that dma is complete
DMA_ACCEPTED equ	1
Stop_Flag		equ	$00		; Stop DMA flag
bull				equ	$02
volume			equ	$03


writeHost macro source
_one	
	jclr	#m_htde,x:m_hsr,_one	
	movep	source,x:m_htx
	endm
	
readHost macro	dest
_two
	jclr	#m_hrdf,x:m_hsr,_two	
	movep	x:m_hrx,dest
	endm
	


	org	p:$0			
	jmp	reset

	org	p:$20
	movep	x:m_hrx,y:(R2)+
	nop

	org	p:$2A
	move	#>2,N1			; When the sound is mono.
	nop

	org	p:VEC_R_DONE		; DMA-OUT completed.
	bset	#DMA_DONE,x:x_sFlags
		
	org	p:$2C			; DMA-IN accepted: start reading.
	jsr	startDMA_In		

	org	p:100
	
reset
	movec   	#6,omr			;data rom enabled, mode 2
	bset    	#0,x:m_pbc		;host port
	bset		#3,x:m_pcddr		;   pc3 is an output with value
	bclr		#3,x:m_pcd		;   zero to enable the external ram
	movep   #>$000000,x:m_bcr	;no wait states on the external sram
        movep   #>$00BC00,x:m_ipr  	;intr levels: SSI=2, SCI=1, HOST=2
	clr		a
	move	a,x:x_sFlags		;clear flags
	bset    	#m_hcie,x:m_hcr		;host command interrupts
	move	#0,sr			;enable interrupts
	
	move	#>1,N1			; Stereo sound by default.
	jmp	main
	
		
main
	move	#>IW_Buff,R0
	move	#>Buff_size,M0
	move	#>0,R1
	move	#>DMA_SIZE-1,M1
	move	#>Control_Queue,R2
	move	#>Control_Queue,R3
	move	#>Control_Size,M2
	move	#>Control_Size,M3
	move	#>.9,a
	move	a,x:volume
	clr		a 
	move	a,x:x_sFlags


_main_loop
	jsr		Read_DMA_Buffer	; Get a buffer from the host
	jsr		Write_DMA_Buffer	; Send it back!
	jsr		update_para		; dispatch the received control values.
	jmp		_main_loop			; Until the next earthquake...
	
	
;; Subroutine that reads one complete DMA from the host, and puts it in the 
;; input buffer. If the sound is mono, then two samples are copied instead
;; of just one.

Read_DMA_Buffer
	jset		#m_hf1,x:m_hsr,Read_DMA_Buffer
	move	#>IW_Buff,R0
	bclr		#m_hrie,x:m_hcr		; Disable the host receive interrupt.
							; since the following values are samples...
	writeHost #DM_W_REQ
	move	#>IW_Buff,R0
	jclr		#m_hf0,x:m_hsr,_ready		
_ready
	btst 		#DMA_ACCEPTED,x:x_sFlags
	jcc		_ready
	bclr		#DMA_ACCEPTED,x:x_sFlags	; Re-arm the flag for the next buffer.
	
	move	#DMA_SIZE,b
	do		b,_end_DMA_loop
_clear
	jclr		#m_hrdf,x:m_hsr,_clear	
	movep	x:m_hrx,a
	move	a,x:bull
	jclr		#15,x:bull,_no_correct	; This is a modification which corrects
	move	#>$FF,a2			; the driver's bug. It sign-extends the
	move	#>$FF0000,X1		; received short value, if necessary
	or		X1,a	
_no_correct
	rep		N1			; If mono, copies samples twice.
	move	a,y:(R0)+		; for left and right channels.
_end_DMA_loop
	jclr		#m_hrdf,x:m_hsr,_then	
	move	x:m_hrx,X0		; Continue reading incoming data...
_then
	jset		#m_hf1,x:m_hsr,_end_DMA_loop	; until HF1 is reset.
	rts



;; Subroutine that sends a DMA buffer to the host.
;; This is a classical DMA out routine...

Write_DMA_Buffer	
	bset		#m_hrie,x:m_hcr		; enable reception of control values.
	move	#>IW_Buff,R0
	do		N1,_ackEnd		; If mono, we need to send two buffer
_DMA_out					; for each received one...
	jclr		#m_htde,x:m_hsr,_DMA_out
	movep	#DM_R_REQ,x:m_htx		
	
_ackBegin
	jclr		#m_hf1,x:m_hsr,_ackBegin	;    wait for HF1 to go high
	move	#>DMA_SIZE,b

	do		b,_prodDMA
_ddd	
	move	y:(R0)+,X1
	move	x:volume,X0
	mpyr	X0,X1,a
	writeHost a
	
_prodDMA
	btst		#DMA_DONE,x:x_sFlags
	jcs		_endDMA
	jclr		#m_htde,x:m_hsr,_prodDMA
	movep	#0,x:m_htx		    ; send zeros until noticed
	jmp		_prodDMA
_endDMA
	bclr		#DMA_DONE,x:x_sFlags	; Clear the flag for next buffer!
_ackEnd
	rts


;; Subroutine called when the host is ready to send the samples. It reads an
;; integer.

startDMA_In
	readHost X0				; The host sends an integer.
	bset		#DMA_ACCEPTED,x:x_sFlags	; But we don't really need it.
	rti

;; Subroutine that checks the control values queue, and dispatches the received
;; values to the corresponding parameters (here, only the volume...)


update_para
	move	R2,a
	move	R3,b		
	cmp	a,b #>Vol_Header,b	; Is the queue empty?
	jeq		_end
	move	y:(R3)+,a			; If not, what's the header?
	cmp	a,b #>Stuff_Header,b	; Is it a volume header?
	jeq		_update_vol			; YES: update the volume
	cmp	a,b 				; Is it a stuff header?
	jeq		_update_stuff		; YES: update the stuff, etc...
	jmp		update_para		; do it again Sam
_end
	rts
	
_update_vol				; Updates the value of the volume
	move	y:(R3)+,a		; The next value is the volume.
	move	a,x:volume
	jmp		update_para

_update_stuff				; Would update the value of another
	move	y:(R3)+,X0		; parameter (amount of reverb etc...).
	jmp		update_para		; Here the value is just skipped.
    
    ;; ----------------------- End of program.


To compile this DSP program, you would type:
asm56000 -a -b -os,so -l myProgram.asm

Host-to-DSP DMA transfer is initiated by the DSP and proceeds as follows:
• When the DSP is ready to read a DMA buffer, it sends the driver a DM_W_REQ request.
• The driver performs its own initializations and, when it is ready to start, sends a DMA-accepted host command (address $2C on the DSP), along with an integer containing additional information.
• Upon reception of the host command, the DSP can start reading data from the host; once one DMA buffer has been read, it keeps reading until HF1 is reset.
• When the host has finished sending a buffer, it resets HF1 and sends a host command to the DSP (VEC_W_DONE, address $28 in DSP program memory).
• When it receives this host command (or as soon as it sees that HF1 is low), the DSP can request another DMA buffer or send a DMA-out buffer to the host.

The program implemented here is very simple: it reads one DMA buffer from the host, sends it back to the host, also using the DMA protocol, and loops forever. The DMA-out protocol is the same as in the other examples (05_Dig_Ears, etc.). The DMA-in is done as follows:

A DM_W_REQ DMA request is sent to the host, and the DSP waits until it receives a DMA-accepted host command. This host command calls the startDMA_In subroutine, which reads an integer and sets a bit in x_sFlags:
 
	startDMA_In
	    readHost X0					; The host sends an integer.
	    bset	#DMA_ACCEPTED,x:x_sFlags	; But we don't really need it.
	    rti
    
The DSP polls that bit in x_sFlags to find out whether the driver is ready to send the data.
    
    _ready
	    btst 	#DMA_ACCEPTED,x:x_sFlags
	    jcc	_ready

Then it starts reading the samples from the host, sign-extending them if necessary (see Pitfalls for more information about why sign extension is needed).
After reading one DMA buffer, the DSP continues reading data sent by the host until the driver resets HF1, indicating that the DMA transfer is done.
    
    _end_DMA_loop
	    jclr	#m_hrdf,x:m_hsr,_then	
	    move	x:m_hrx,X0				; Continue reading incoming data...
    _then
	    jset	#m_hf1,x:m_hsr,_end_DMA_loop	; until HF1 is reset.

The DSP can then perform other tasks and eventually send the buffer back using the standard DSP-to-host DMA protocol.


RECEIVING PARAMETER UPDATES.

As we saw in the C program, the host sends parameters as pairs of values {header, value}. The DSP receives them while it is sending a DMA-out buffer (to avoid collisions), using host data receive interrupts, and puts them in a control queue indexed by R2:
	
	org	p:$20
	movep	x:m_hrx,y:(R2)+
	nop

These host data receive interrupts are disabled during DMA-in, because the driver is then sending only samples, and the DSP reads those by polling the host port (normal hardware handshake).
At the end of the DMA-out buffer, the DSP calls a subroutine, update_para, to update its parameters according to what's present in its control queue. The queue is examined using R3, until no control data is present:

update_para
	move	R2,a
	move	R3,b		
	cmp	a,b 	
	jeq		_end		;; _end simply returns (rts)
	
The headers in the queue are compared with the predefined headers, and the DSP jumps to the corresponding function when it recognizes one (here, Stuff_Header could stand for anything: a filter coefficient, etc.). This continues until the whole queue has been checked.

	move	#>Vol_Header,b
	move	y:(R3)+,a
	cmp	a,b 
	jeq		_update_vol			
	move	#>Stuff_Header,b	
	cmp	a,b 				
	jeq		_update_stuff
	......	
	jmp		update_para		

The update function simply stores the value that follows the header and returns to the beginning of the queue check:

_update_vol				
	move	y:(R3)+,a		
	move	a,x:volume
	jmp		update_para

This scheme for passing control data to the DSP is very general (very similar to the one used in the Orchestra). It makes it possible to send many data words in one call to snddriver_dsp_write(), and it always updates the parameters at the end of a DMA buffer, which is a nice way to make sure all the parameters are updated at the same time. This scheme could have been used in all the other examples where host commands were used instead.


IMPORTANT REMARKS.

• After sending a DMA-OUT buffer to the host, the DSP expects another DMA-IN buffer and therefore sends the host a DM_W_REQ DMA request. If the host has no more data to send, the driver stays stuck in that state, with the DSP waiting for a DMA buffer. To avoid this, the C program should tell the DSP, after the last DMA buffer, that all the samples have been sent, and then send one additional DMA buffer containing junk data to unblock the driver. This can be done, for example, with a special host command sent from the write_completed function. The DSP should recognize the host command, receive the following DMA buffer (throwing away the data), and NOT send another DMA-IN request after the end of the reception.

• Be sure you send the correct amount of data to the DSP when you use snddriver_stream_start_writing(). Sending unallocated or uncopied data can wedge the NeXT machine (with a long reboot...). The arguments to vm_allocate() and vm_write() are expressed in bytes, not in samples!