merge.c

This is merge.c in view mode; [Download] [Up]
/* Copyright William F. Schelter, University of Texas 1987 */

/* This file may be copied by anyone, but changes may not
be made without permission of the author.  The author hopes it will be
useful but cannot assume any responsibility for its use or problems
caused by its use.  The program is provided as is. */

/* The program takes two files file1 = orig and file2 = orig.V 

tutorial% merge orig orig.V > foo 

and copies orig according to the recipe in orig.V.  The advantage of
this program is that it does this according to the context of orig.
Thus even though orig might change slightly (eg some one added an
extra line to the copyright notice), the same change file will
probably still be valid.  

If the first argument is - then the orig is standard input.
If a third argument is supplied, it is the name of a file to use
instead of standard output.


tutorial% merge orig orig.V | merge - change2 final

would take the result of merge of orig and orig.V and use it to merge
with change2 to produce the file final.


The format of a change (.V) file is very simple: There is only ONE
type of command in a change file.  REPLACE X by Y.  Here X represents
a chunk of text in the orig file, and Y the substitution which you
wish to make for this occurrence.  The Y appears explicitly in the
change file, while the text X may be specified fully and explicitly,
OR by giving sufficient context from the beginning and end of X.  Thus
in general it takes three things to specify a change.  The beginning
of X (Xbegin), the end of X (Xend), and all of Y.  These three pieces
of text are separated by four delimiters.  The delimiters are not
single characters, but rather sequences of four characters.  This is
done so as to avoid having to quote the delimiter (see QUOTING below).
The delimiters are "\n@s[" "\n@s," "\n@s|" and "\n@s]". 
NOTE: The \n (Newline) Character IS PART OF THE DELIMITER in ALL CASES.

@s[X
@s|Y
@s]

Thus in the above case the X text is only "X" it does not have any
newlines in it! They belong to the delimiters.  For "X\n" we would see


@s[X

@s|Y
@s]


The general case where X is a very long chunk of text, or perhaps something
sensitive to copyright, so that you cannot include several pages, you 
could make Xbegin be the first few lines, and Xend the last few lines.
All intervening lines (including the Xbegin and the Xend, would be ripped
out, and replaced by Y.

@s[Xbegin
@s,Xend
@s|Y
@s]


One cycle of the merge may be thought of as:
The merge program looks in the change (.V) file for the next \ns[, 
in order to determine  the next values for Xbegin,Xend,and Y.
Having determined these, its position in the (.V) file will have 
advanced to after the \n@s].

The merge program then starts at its current position in the original
file and searches for the next occurrence of Xbegin, marking its
beginning, then for the end of Xend.  The inclusive interval so
marked, is deleted and Y is substituted. The current position in the
original file is now at the end of the Xend text.  The next Xbegin
text must occur after that point.  Only one pass is made through the
files.  

It is an error if the start of Xend does not follow
the end of Xbegin.  Thus Xbegin and Xend may NOT overlap.  A common
case will be that Xbegin is the entire interval and Xend = ""
In this case the merge program, if it finds \n@s| before \n@s,
will assume you want Xend="".

EXAMPLES:

@s[Hi bill
@s,
@s|new body
@s]

would delete the string "Hi bill" replacing it with "new body"
Xbegin="Hi bill"
Xend=""
Interval = "Hi bill"
Equivalently since the E interval is empty, we could have just
omitted the \@s,

@s[Hi bill
@s|new body
@s]


Example of change file with two changes:
****************

@s[(defmacro lcase (item &body body)
@s,      (setq  v (car rest))
@s|(defmacro lcase (items &body body)
      (setq  v (cadr rest))
@s]

Comments are allowed in change files.  In fact anything not between
matching "\ns[" and "\ns]" is a comment. 


@s[How is he
@s,
He is fine.
@s|He is sick.
@s]

*******
end of change file

The first change would replace the interval of the original file
"(defmacro....       (setq  v (car rest))"
by
"(defmacro lcase (items &body body)
      (setq  v (cadr rest))"

If the program could not find the interval "(defmacro....  (setq v
(car rest))" in the orignal file it would warn you. 

The intervals in the change file, must occur in the same order as in
the original file.  There is an emacs program merge.el which can
mechanically produce a changes file from an original and an edited
version.

Note: For convenience we pretend that the change file starts with
a new line, even if it does not.  Thus if @s[  are the first three
characters of the file and CHSTART1 = \n@s[, we count this as a
CHSTART1.  Since it is in the first column, it "appears" to have
the new line there.


QUOTING:

In order to have a change which involves one of the four letter
delimiters given above, we use the convention that "\n@@" in the first
column translates to "\n@".  You need not perform this quoting of @
unless the merger would be confused.  For example \n@(defun .. would
be ok, since this can't be mistaken for one of our delimiters.
Nonetheless \n@@(defun or \n@@s[ would translate to have one @ sign,
in the merge output.  The reason for not doubling all @ signs, is that
it is very easy to scan (visually) a change you are constructing, to
see that there are no @ signs in the first column, or at least none
which could be confused for the four letter change delimiters
"\n@s[","\n@s," ...  A poor human constructing a change (.V) file
should not have to sort through the X or Y text adding quoting
characters.

Note on length: Y may be any length, but Xbegin or Xend, may only be
CONTEXT_LIMIT long.

*/










/******************  THE CODE ********************/


#include <stdio.h>


#define CONTEXT_LIMIT 3000 /* size of the longest delimiter or replacement */

char *malloc();
void copy_rest();
char ssearch_for_string();

#define NULL_OUT (FILE *)0

#define CHSTART1 "\n@s["
#define CHSTART2 "\n@s,"
#define CHSTART3 "\n@s|"
#define CHSTART4 "\n@s]"
#define ACCEPT ",|"
#define NOACCEPT (char *) 0
#define NUL '\0'
#define TRUE 1
#define FALSE 0
#define eofch(ch) ((unsigned char)ch == (unsigned char) EOF)

char filenames[600];

#define myerror(string,arg) {(void)fprintf(stderr,string,arg); exit(-1);}

main(argc,argv)
int argc;
char *argv[];
{FILE *orig,*changes,*out;
 char *context,*endcontext;
 char *origname,*altername,*outname;
 char found;
 context=malloc(CONTEXT_LIMIT+2);
 endcontext=malloc(CONTEXT_LIMIT+2);

 outname=(char *)0;

 if (argc==1)
   {int tem;
    origname=filenames; altername=filenames+200; outname=filenames+400;
     /* get names from stdin */
     if (tem=scanf("%s %s %s",origname,altername,outname));
            else myerror("Three args weren't supplied: scanf returned %d\n",tem);
   }
 else{  if (!((argc==3) || (argc==4)))
       { myerror("Usage: merge file-orig file-changes [out-file]\n %d args given",argc-1);}
 else
   { origname=argv[1];
     altername=argv[2];
     if (argc >= 4) outname=argv[3];}};


/* now we have the names either from command or stdin, so open files */

if(origname[0]=='-' && origname[1]==NULL)
  orig=stdin;
 else{
 orig=fopen(origname,"r");
 if (!orig) {perror(origname); exit(-1);};
}
 
 changes=fopen(altername,"r");
 if (!changes) {perror(origname); exit(-1);};
 if (outname)
   {out=fopen(outname,"w");
    if (out); else {perror(outname); exit(-1);}}
 else out=stdout;

/* check if the file starts with chstart1 - newline. to avoid 
people thinking that starting file with @s[ is ok. */
 {char ch, *str = CHSTART1;
  while(*(++str)) /* skip the newline start */
    { (ch=getc(changes));
      if (ch == *str) ;
      else
      { ungetc(ch,changes); goto not_found;}
    }
   goto got_one;
  not_found:;}

 {while(search_for_string(changes,CHSTART1,NULL_OUT,FALSE) > 0)
    { got_one:
      if (found=
	  ssearch_for_string(changes,CHSTART2,context,CONTEXT_LIMIT,TRUE,
			     ACCEPT));
      else
	{myerror("\nNo end for start change context in change file:\n`%s'\n",context);};
      if (found==ACCEPT[1])
	*endcontext=NUL;
      else
	{
	  if /* there is probably a non null  endcontext */
	    (ssearch_for_string(changes,CHSTART3,endcontext, 
				CONTEXT_LIMIT,TRUE,NOACCEPT));
	  else
	    {myerror("No %s at beginning of line to denote end of change context",
		CHSTART3);}};
/* skip in orig down to the end of the context,copying thru begin context */
      if (search_for_string(orig,context,out,FALSE)>0); 
      else{myerror("\nCould not find the change start in original:\n`%s'\n"
	       ,context);};
      if  /* copy out the changed version */
	(search_for_string(changes,CHSTART4,out,TRUE)>0);
      else
	{myerror("No %s at beginning of line to denote end of change context",
	    CHSTART4);};
         
/*finish skipping over the region to be deleted in orig */	
      {if( search_for_string(orig,endcontext,NULL_OUT,FALSE) > 0);
       else
	 {myerror("\nCould not find the end of the change in original:\n`%s'\n",
	     endcontext);}}
    }
    copy_rest(orig,out);
    return 0;
  }}


string_match(sta,stb)
char *sta, *stb;
{while(*sta!=0)
   {if (*(sta++) != *(stb++)) return 0;}
  if (*stb==0) return 1; else return 0;
 }

void
copy_rest(file,out)
FILE *file,*out;
{register char ch;
while(1)
 {  
   ch=getc(file);
   if (eofch(ch) && feof(file)) break;
   putc(ch,out);}}

/* advance file to end of first occurrence of string, copying to out 
until the beginning of string */

#define USE_UNQUOTE 1

search_for_string(file,string,out,unquoting)
FILE *file,*out;
char *string;
int unquoting;
{int result;
 result=search_for_string1(file,string,out,USE_UNQUOTE && unquoting);
 return result;}


char *nxt,*lim,*ungetlim,*bp; 
char buffer[CONTEXT_LIMIT];

/*
void
myungetc(ch)
char ch;
{*bp++ = ch;}

char
mygetc(file)
FILE *file; 
{char x=((bp==buffer)? getc(file) : *--bp);
 return x;
}
*/

#define mygetc(file) ((bp==buffer)? getc(file) : *--bp)
#define myungetc(ch) *bp++ = ch

search_for_string1(file,string,out,unquoting)
FILE *file,*out;
char *string;
int unquoting;
{ /* char *nxt,*lim; */
  char  ch,*s;
  nxt=lim=(char *)0;
 bp=buffer;
 if (*string==NUL) return 1; unquoting;
  while(1)
  {begin:
     ch=mygetc(file);
   if ((eofch(ch)) &&(feof(file))) return 0;
   if( ch==*string)
     {				/* loop for checking */
       s = string;
       while(*(++s)!=0)
	 {(ch=mygetc(file));
	  if (eofch(ch) && feof(file))
	    {char *cp=string;while (cp++<=s)
	       {putc(*cp,file) ; return 0;}};
	  if (*s!=ch)
	    { if (out) putc(*string,out);
	      {char *cp=s;
	       if (!(unquoting && ch==string[1] && (s-string ==2)))
	           myungetc(ch);
	       while (--cp > string)
		 myungetc(*cp);
	      goto begin;}}
	}
        return 1;
   /*    printf("<found one>");  */
       }
   else if (out) putc(ch,out);};
  }



#define PUTC(ch,out) {if(ind++ < outlim) ((*(out++))=(ch));\
			       else return -1;}

char
ssearch_for_string1(file,string,out,outlim,unquoting,accept)
int outlim,unquoting;
FILE *file;
char *out;
char *string,*accept;
{register char ch,*s; int ind=0;
 if (*string==NUL) return 'a';
 while(1)
   {ch = getc(file);
  begin:
    if (feof(file)) return (char) 0;
    if (ch==(*string))
      {s=string; ind=0;
       while(*(++s)!=0)
	 {if ((*s==(ch=getc(file)))
	      || (accept && *s==*accept && ch == *(accept+1)));
	  else
	    {if (out)
	       {char *cp; cp=string;
		 if (unquoting && ch==string[1] && (s-string ==2))  s--;
		while (cp!=s)
		  {PUTC(*cp,out);cp++;}
	      }
	       break;}}
       if(*s==0)
	  {PUTC(((char) 0),out);
	 /* found a match */
	   return(ch);}
       else goto begin;}
    else if (out) PUTC(ch,out);
  }
}

char
ssearch_for_string(file,string,out1,outlim,unquoting,accept)
int outlim;
FILE *file;
char *out1;
char *string,*accept;
int unquoting;
{char result;
 result=ssearch_for_string1(file,string,out1,outlim,unquoting,accept);
 return result;}

/* 
* 
* To do:
* 1)The buffering for mygetc could be more efficient (in local variable).
* 2)Eliminate the double function calls used during debugging.
* 3)Improve error message, for help in finding context if a change
*   is not found.
*/
These are the contents of the former NiCE NeXT User Group NeXTSTEP/OpenStep software archive, currently hosted by Netfuture.ch.