ftp.nice.ch/peanuts/GeneralData/Usenet/news/1989/CSN-89.tar.gz#/comp-sys-next/1989/Nov/kernel-corruption-on-330Mb-hd???



Date: Sun 09-Nov-1989 19:31:36
From: Unknown
Subject: kernel corruption on 330Mb hd???

Has anyone had any problems with your NeXT not booting up off the hard
drive? What happens is the machine locks up during the boot process.
You fix the problem by booting off optical and copying the sdmach
file from optical to the hd. This has happened to me once and others
three more times. Any suggestions, comments?

Blake Hughes, undergrad, University of Kansas

>From: eht@f.word.cs.cmu.edu (Eric Thayer)
Date: Sun 09-Nov-1989 21:53:00
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

In article <17504@kuhub.cc.ukans.edu> 2FHGKINGLY@kuhub.cc.ukans.edu writes:
>Has anyone had any problems with your NeXT not booting up off the hard
>drive? What happens is the machine locks up during the boot process.
>You fix the problem by booting off optical and copying the sdmach
>file from optical to the hd. This has happened to me once and others
>three more times. Any suggestions, comments?

If you get the waiting for SCSI to become ready ............... I've seen
this before a couple of times.

>
>Blake Hughes, undergrad, University of Kansas
Date: Sun 09-Nov-1989 23:43:18
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

In article <6906@pt.cs.cmu.edu> eht@f.word.cs.cmu.edu (Eric Thayer) writes:
>In article <17504@kuhub.cc.ukans.edu> 2FHGKINGLY@kuhub.cc.ukans.edu writes:
>>Has anyone had any problems with your NeXT not booting up off the hard
>>drive? What happens is the machine locks up during the boot process.
>>You fix the problem by booting off optical and copying the sdmach
>>file from optical to the hd. This has happened to me once and others
>>three more times. Any suggestions, comments?
>
>If you get the waiting for SCSI to become ready ............... I've seen
>this before a couple of times.
>
>>
>>Blake Hughes, undergrad, University of Kansas
>
>--
>Eric H. Thayer School of Computer Science, Carnegie Mellon
>(412) 268-7679 5000 Forbes Ave, Pittsburgh, PA 15213

Well, I've got another piece of bad news. I was talking to my NeXT sales
rep yesterday. He had just gotten his 40 MB accelerator drives. Guess what,
they ARE Quantum drives and he didn't know which firmware these drives had.
Sigh, I think Apple just stuck it to their old buddy Steve. Maybe someone
from NeXT could check with Quantum to see whether they're ready for a WHOLE
bunch of returns!

Roger Jagoda
System Support
Cornell University
FQOJ@CORNELLA.CIT.CORNELL.EDU

>From: feldman@umd5.umd.edu (Mark Feldman)
Date: Sun 10-Nov-1989 21:14:34
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

In article <17504@kuhub.cc.ukans.edu> 2FHGKINGLY@kuhub.cc.ukans.edu writes:
...
>What happens is the machine locks up during the boot process.
>You fix the problem by booting off optical and copying the sdmach
>file from optical to the hd.
...
>Blake Hughes, undergrad, University of Kansas

I've had three systems corrupted the same way. It's a bug. Many others
have suffered the same bug. As of yet, no one knows what is causing the
boot file to become corrupted, let alone how to prevent it.

Mark

>From: anthony@cit-vax.Caltech.Edu (Lawrence Anthony)
Date: Sun 11-Nov-1989 02:14:49
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

In article <17504@kuhub.cc.ukans.edu> 2FHGKINGLY@kuhub.cc.ukans.edu writes:
>Has anyone had any problems with your NeXT not booting up off the hard
>drive? What happens is the machine locks up during the boot process.
>You fix the problem by booting off optical and copying the sdmach
>file from optical to the hd. This has happened to me once and others
>three more times. Any suggestions, comments?

This is a real problem and NeXT is aware of it. The current theory is that
some user level process is opening /sdmach carelessly for write and
"accidentally" dropping garbage. The symptoms seem to be that a block of
zeroes is written at the beginning of the data segment and on the next
reboot, the machine hangs after printing out the memory configuration.

The fix listed above is currently the best workaround for the problem, so
keep a distribution OD within your reach for a while. NeXT has a few sites
running some tests hoping to isolate the faulty program. Once that is done,
you should expect to see an updated version of the faulty bugger, possibly
in the archives on cc.purdue.edu, possibly available via email, possibly
available more directly from NeXT.

gerrit

>From: madler@tybalt.caltech.edu (Mark Adler)
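The symptom described in the post above, a block of zeroes inside /sdmach, could be checked for roughly as follows. This is only a sketch under stated assumptions, not a NeXT tool: the function name block_is_zero, the 512-byte default block size, and the block numbers are made up for illustration, and the post does not say where the data segment begins, so the block to test has to be found some other way.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): succeed if block number $2,
# of size $3 bytes (default 512), in file $1 contains only zero bytes.
block_is_zero() {
    file=$1
    blk=$2
    bs=${3:-512}
    # Number of bytes actually read from that block (0 if past end of file).
    total=$(dd if="$file" bs="$bs" skip="$blk" count=1 2>/dev/null | wc -c | tr -d ' ')
    # Number of bytes left after deleting every NUL byte from the block.
    nonzero=$(dd if="$file" bs="$bs" skip="$blk" count=1 2>/dev/null | tr -d '\000' | wc -c | tr -d ' ')
    # Zero-filled means: something was read, and nothing but NULs was in it.
    [ "$total" -gt 0 ] && [ "$nonzero" -eq 0 ]
}
```

A call such as `block_is_zero /sdmach 40 && echo "block 40 is all zeroes"` would report one suspicious block; the number 40 here is a placeholder, not a known offset.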
Date: Sun 10-Nov-1989 19:36:49
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

Yep, I've seen just that happen a few times to some NeXT's here. It seems
to be contagious since it happens to a few machines connected over ethernet
at the same time (but not all of them?). It's never happened (fingers crossed)
to my standalone NeXT (no net connection). I have no suggestions.

Mark Adler

>From: drapeau@choctaw.Stanford.EDU (George Drapeau)
Date: Sun 11-Nov-1989 18:50:18
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

In article <12609@cit-vax.Caltech.Edu> madler@tybalt.caltech.edu.UUCP (Mark Adler) writes:
>
>Yep, I've seen just that happen a few times to some NeXT's here. It seems
>to be contagious since it happens to a few machines connected over ethernet
>at the same time (but not all of them?). It's never happened (fingers crossed)
>to my standalone NeXT (no net connection). I have no suggestions.
>
>Mark Adler

Dare I ask... Virus????

>From: ali@polya.Stanford.EDU (Ali T. Ozer)
Date: Sun 11-Nov-1989 20:00:47
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

In article <5604@umd5.umd.edu> feldman@umd5.umd.edu (Mark Feldman) writes:
>In article <17504@kuhub.cc.ukans.edu> 2FHGKINGLY@kuhub.cc.ukans.edu writes:
>>What happens is the machine locks up during the boot process.
>>You fix the problem by booting off optical and copying the sdmach
>>file from optical to the hd.
>I've had three systems corrupted the same way. It's a bug. Many others
>have suffered the same bug. As of yet, no one knows what is causing the
>boot file to become corrupted, let alone how to prevent it.

Yes, this is a bug. NeXT is working on it. If your system freezes up during
the boot process, after announcing the amount of memory and possibly the
number of buffers used, then you might be bitten by this bug. You will need
to boot from a 1.0 optical to fix things; please diff your /sdmach file
against the good one from the OD; if they are different, copy the one from
the OD over the corrupted one.

If you can duplicate the problem, please send me mail and I'll get it to
the OS engineers.

Ali

>From: ed@uunet!dtgcube (Edward Jung)
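The compare-then-copy step described in the post above might be scripted along these lines. This is a hedged sketch, not a procedure from NeXT: the function name restore_if_different is hypothetical, and it uses cmp -s in place of the diff mentioned in the post, since cmp does a plain byte-for-byte comparison, which suits a binary kernel image.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): restore_if_different GOOD TARGET
# Copies GOOD over TARGET only when the two files differ byte for byte.
restore_if_different() {
    good=$1
    target=$2
    if cmp -s "$good" "$target"; then
        # Files are identical, so there is nothing to repair.
        echo "unchanged"
    else
        # Files differ (or TARGET is unreadable): overwrite with the good copy.
        cp "$good" "$target" && echo "restored"
    fi
}
```

After booting from the optical disk, a root user would call it as `restore_if_different /path/to/optical/sdmach /path/to/hd/sdmach`. Both paths are placeholders, because the thread does not say where either disk is mounted in that situation.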
Date: Sun 15-Nov-1989 18:36:04
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

Please be patient with me, as this is my first post to a net.

I seem to have suffered this kernel corruption problem, with a nasty twist.
Not only does the machine fail to boot from the hard drive (locking up
shortly after checking the RAM), but it will not boot from the 1.0 System
floptical. When I use the Mach boot-from-floptical command, the disk begins
to load in, but after setting up several of the daemons, I receive the
message that the window servers cannot be accessed/opened. The boot-up locks
at this point, but I *am* able to power down using the power switch. Still,
I cannot successfully boot up, and thus cannot fix the kernel corruption
problem as per the suggested fix posted in the various articles about this
problem.

This floptical problem could be due to a variety of things which have no
relation to the original kernel problem, and user error has not as yet been
ruled out. However, has anyone had this happen to them? If so, I would
greatly appreciate any input anyone has to offer. Thanks much for the time.

With Thanks,
Dwight Divine (div3@tank)
NeXT System Administrator
Usite, U of Chicago

>From: GFX@PSUVM.BITNET
>From: ali@polya.Stanford.EDU (Ali T. Ozer)
Date: Sun 16-Nov-1989 15:58:20
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

In article <12811@polya.Stanford.EDU> Ali T. Ozer writes:
>In article <5604@umd5.umd.edu> feldman@umd5.umd.edu (Mark Feldman) writes:
>>I've had three systems corrupted the same way. It's a bug. Many others
>>have suffered the same bug. As of yet, no one knows what is causing the
>>boot file to become corrupted, let alone how to prevent it.
>Yes, this is a bug. NeXT is working on it.

The bug has been discovered and there is a workaround, in fact, an incredibly
simple one. Launch a Shell, become root, and remove the executable bit on
your kernel:

	su
	[type password]
	chmod a-x /sdmach

The problem occurs if you try to launch an executable in the Mach preload
format; depending on how the pages are laid out in the file, a part of the
file might become corrupted if paging occurs after the file is "launched."

Mach preload executables are meant to be bootable images and are not meant
to be executed by the demand-paged system; thus your system will not lose
any functionality when you remove the executable bit. You will just be
assuring that the kernel is not launched inadvertently (either from the
Shell or with a double-click), which is probably what caused the
problem in all cases.

There are only two preload format files in the system, the kernel and the boot
file. The boot file has been shipped without the executable bit so it's fine.

Thanks to Alan Marcum and Avie for the explanation and workaround.

Ali

>From: feldman@umd5.umd.edu (Mark Feldman)
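To confirm that the workaround in the post above took effect, something like the following could be run as root. This is a sketch only: the function name strip_exec is hypothetical, and the check simply looks for an x in the mode string that ls -l prints.

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): remove every execute bit from
# a file, then confirm from the ls -l mode string that none remain.
strip_exec() {
    file=$1
    chmod a-x "$file" || return 1
    # The first ten characters of ls -l output are the mode, e.g. -r--r--r--
    mode=$(ls -l "$file" | cut -c1-10)
    case $mode in
        *x*) echo "still executable: $mode"; return 1 ;;
        *)   echo "ok: $mode" ;;
    esac
}
```

Called as `strip_exec /sdmach`, it performs the same chmod a-x as the post and then reports the resulting mode. For a file that starts out at mode 555 it prints `ok: -r--r--r--`.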
Date: Sun 16-Nov-1989 20:34:08
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

In article <12837@polya.Stanford.EDU> ali@Polya.Stanford.EDU (Ali T. Ozer) ...
>The bug has been discovered and there is a workaround, in fact, an incredibly
>simple one. Launch a Shell, become root, and remove the executable bit on
>your kernel:
>
>	su
>	[type password]
>	chmod a-x /sdmach

Ok, I read it, I did it, but I'm not very happy about the implications.

>The problem occurs if you try to launch an executable in the Mach preload
>format; depending on how the pages are laid out in the file, a part of the
>file might become corrupted if paging occurs after the file is "launched."

The files /sdmach and /odmach (which are the same file) are owned by root
and their permissions are 555 -- readable and executable by all, writable by
none. How is it that the file can be written to when it is executed by a
user other than root?

>Mach preload executables are meant to be bootable images and are not meant
>to be executed by the demand-paged system; thus your system will not lose
>any functionality when you remove the executable bit. You will just be
>assuring that the kernel is not launched inadvertently (either from the
>Shell or with a double-click), which is probably what caused the
>problem in all cases.

The fact that it is possible to write to a file when you don't have
permission is very bad. Very, very bad. And why would the system ever try
to page back to a program file? Me thought that that is what a swap file
was for.

>There are only two preload format files in the system, the kernel and the boot
>file. The boot file has been shipped without the executable bit so it's fine.

Not fine. Getting an error back when trying to execute one of these files
would be fine. Getting a core dump would be ok. Having the original, write
protected file corrupted is not.

>Thanks to Alan Marcum and Avie for the explanation and workaround.
>
>Ali
>

Ali, if the fix will keep my kernels from being corrupted, thanks! If there's
one thing that I can't stand, it's a corrupted kernel. But what am I missing?

Mark

p.s. If someone has a NeXT and does not have USENET access, how will they
find out about the fix?

>From: ali@polya.Stanford.EDU (Ali T. Ozer)
Date: Sun 17-Nov-1989 02:12:52
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

In article <5631@umd5.umd.edu> feldman@umd5.umd.edu (Mark Feldman) writes:
>In article <12837@polya.Stanford.EDU> I wrote:
>>The bug has been discovered and there is a workaround ...
>>The problem occurs if you try to launch an executable in the Mach preload
>>format; depending on how the pages are laid out in the file, a part of the
>>file might become corrupted if paging occurs after the file is "launched."
>
>The files /sdmach and /odmach (which are the same file) are owned by root
>and their permissions are 555 -- readable and executable by all, writable by
>none. How is it that the file can be written to when it is executed by a
>user other than root?

This is a bug, after all --- and bugs break rules. The bug will only occur
if you try to execute a preload format file, and even then only under
special circumstances, which the sdmach file exhibits. This bug will not
occur when executing normal demand-paged executables or trying to execute
other non-executable files.

Again --- this is not a file system bug but rather a bug in the program
loader trying to load a preload format file. sdmach is the only file in the
system that will cause this bug to occur.

>If someone has a NeXT and does not have USENET access, how will they find
>out about the fix?

NeXT is getting the news out to customers through various other channels.

Ali

>From: dcarpent@sjuphil.uucp (D. Carpenter)
Date: Sun 17-Nov-1989 13:03:40
From: Unknown
Subject: Re: kernel corruption on 330Mb hd???

>>If someone has a NeXT and does not have USENET access, how will they find
>>out about the fix?
>
>NeXT is getting the news out to customers through various other channels.
>
>Ali

What other channels? Does NeXT have any regular means for communicating
with its customers? All I know is what I read in this newsgroup or in the
newspapers and trade press. Being a NeXT owner without at the same time
being a NeXT support person can leave one feeling rather isolated.

These are the contents of the former NiCE NeXT User Group NeXTSTEP/OpenStep software archive, currently hosted by Marcel Waldvogel and Netfuture.ch.