[Discuss] The system that won't die

Alan W. Irwin irwin at beluga.phys.uvic.ca
Wed Apr 4 18:23:34 PDT 2007


On 2007-04-04 14:59-0700 Peter Scott wrote:

> At 01:13 PM 4/4/2007, Steven Kurylo wrote:
>>> merely send out the initial "system going down..." message (where
>>> they're supposed to) and then return without anything else happening.
>> 
>> What does the log say?  For instance on my RH machine
>> /var/log/messages says things like:
>> 
>> init: Switching to runlevel: 0
>> nfs: nfsd shutdown succeeded
>> snmpd: snmpd shutdown succeeded
>> xinetd[1188]: Exiting...
>> 
>> And so on for every service on the machine.  I'm wondering if you have
>> a service which is kernel related (misbehaving module?) and so it's
>> hanging the shutdown process.
>
> Good question, I forgot to say I'd looked there.  /var/log/messages:
>
> Apr  4 12:01:49 tweety shutdown[30643]: shutting down for system reboot
> Apr  4 12:02:20 tweety shutdown[30649]: shutting down for system reboot
> Apr  4 12:03:32 tweety shutdown[30666]: shutting down for system halt
> Apr  4 12:03:45 tweety shutdown[30670]: shutting down for system halt
> Apr  4 12:06:27 tweety shutdown[30698]: shutting down for system halt
> Apr  4 12:07:30 tweety shutdown[30738]: shutting down for system halt
>
> i.e., just the same message that goes out via wall every time I try, nothing 
> else.  Is there somewhere else I should look?  This is so weird.

It appears your shutdown process is starting okay each time you try it, which
means init is still responding.  Under these circumstances I think the idea
of a misbehaving module is a good working hypothesis.  You can get that from
a buggy module or, more likely, from an update of the currently running
kernel that leaves an already-loaded module inconsistent with the new kernel.
(When such updates happen for Debian stable, they warn you to reboot ASAP
because of the potential for such trouble.)
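
One rough way to test the "kernel updated underneath you" idea is to check
whether the kernel image on disk is newer than the last boot.  The sketch
below is only a heuristic and rests on assumptions of mine: that the image
lives at /boot/vmlinuz-<release> (Debian-style layout), that /proc/uptime is
available, and that stat is the GNU version.  The function names are made up
for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a file's mtime ($1) is later than the
# boot time ($2), both given in seconds since the epoch.
modified_since_boot() {
    [ "$1" -gt "$2" ]
}

# Hypothetical helper: estimate boot time as "now minus uptime".
boot_epoch() {
    now=$(date +%s)
    up=$(cut -d. -f1 /proc/uptime)
    echo $((now - up))
}

# Assumption: Debian-style image path for the running kernel.
image=/boot/vmlinuz-$(uname -r)

if [ -e "$image" ] && [ -r /proc/uptime ]; then
    if modified_since_boot "$(stat -c %Y "$image")" "$(boot_epoch)"; then
        echo "$image changed since boot; loaded modules may not match it"
    else
        echo "$image is older than the last boot"
    fi
else
    echo "cannot check: $image or /proc/uptime not found"
fi
```

A "changed since boot" result does not prove anything by itself, but it
would make the inconsistent-module explanation more plausible.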

As in any debugging exercise I would try to simplify the problem as much as
possible.

Did you start X with the venerable startx method?  If so, I would get out of
that and down to the console before you try anything else (or use the
appropriate run-level to do the same thing).  Then I would look at each
running process (with ps auxww) and kill off any extraneous ones that you
don't need.  One of those kills (if it concerns the misbehaving module)
might let the shutdown go smoothly afterwards, so keep good notes of exactly
what you try.  You may also want to look at the modules that have been
loaded (using lsmod).  Some of those may be associated with hardware (e.g.,
your NIC) that you don't really need while diagnosing/fixing the problem,
so you may want to try modprobe -r to get rid of them (as a last resort,
since it is often hard to tell whether a module is truly needed or not).
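
To narrow down which modules to try first, note that the third column of
lsmod output is the "Used by" count.  A module with a count of 0 is not
currently held by anything, so modprobe -r should at least be able to unload
it.  A minimal sketch (the function name unused_modules is mine, and a zero
count does not mean the module is unneeded):

```shell
#!/bin/sh
# Hypothetical helper: read lsmod-format text on stdin and print the names
# of modules whose "Used by" count (third column) is 0.
# The first line is the "Module  Size  Used by" header, so skip it.
unused_modules() {
    awk 'NR > 1 && $3 == 0 { print $1 }'
}

# Only run against the live system if lsmod is actually available.
if command -v lsmod >/dev/null 2>&1; then
    lsmod | unused_modules
fi
```

Each name it prints is a candidate for "modprobe -r NAME" (as root), one at
a time, retrying shutdown after each removal.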

In sum, I suggest you simplify as much as possible by killing processes and
removing modules (checking whether shutdown works after each step) until you
are really down to the bare basics.
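
For the note-keeping, something as simple as a timestamped log file would
do.  This is just a sketch: the note function and the default file name are
my own invention, and the sync is there because you may end up hitting the
reset button before the notes would otherwise reach the disk.

```shell
#!/bin/sh
# Assumption: notes go to a file in your home directory unless NOTES is set.
NOTES=${NOTES:-$HOME/shutdown-debug-notes.txt}

# Hypothetical helper: append one timestamped line to the notes file,
# then flush it to disk in case the machine has to be reset.
note() {
    printf '%s  %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >> "$NOTES"
    sync
}
```

Usage would be along the lines of: note "killed <process>, shutdown still
hangs" after each kill or modprobe -r, so the one step that makes a
difference is on record.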

>
>
> Yeah, I could hit the reset button, but I'm conditioned against that.  That 
> sort of thing is for Windoze boxes.
>

Excellent point.  You may learn something interesting (or we will!) from your
attempts to diagnose this.  However, at some point you will want to give in
and hit the reset button because (a) there may be no way out from a
misbehaving module except to reset (but oh, the shame!  :-) ), (b) you may
have killed an essential process or removed an essential module, or (c) bad
hardware might be the explanation.

Alan
__________________________
Alan W. Irwin

Astronomical research affiliation with Department of Physics and Astronomy,
University of Victoria (astrowww.phys.uvic.ca).

Programming affiliations with the FreeEOS equation-of-state implementation
for stellar interiors (freeeos.sf.net); PLplot scientific plotting software
package (plplot.org); the Yorick front-end to PLplot (yplot.sf.net); the
Loads of Linux Links project (loll.sf.net); and the Linux Brochure Project
(lbproject.sf.net).
__________________________

Linux-powered Science
__________________________

