How to kill a process

Question

Hi all,

I need to kill a very peculiar running process. The kill command doesn't work. I've tried:

kill -KILL <pid>
kill -9 <pid>
sudo kill -KILL <pid>
sudo kill -9 <pid>

and none of those worked!
Every time I run

ps -p <pid>

I see the process listed as:

PID TT STAT TIME COMMAND
542 p1 SX 0:00.01 ./att2

This process is peculiar because its parent is init (pid=1) and it is being debugged by init.
Any suggestions how I could kill/remove this process?

Thanks,

Antonello

Mac OS X (10.3.9)

Posted on Sep 16, 2006 9:26 AM

Answer 1

Sep 16, 2006 7:36 PM in response to Antonello Cruz

Hi Antonello,
That's unusual; I had to look that one up. The process state shows that it is asleep (S) and that it is being traced or debugged (X). Thus, execution of its main thread may have been halted at a breakpoint. I can see why the process can't handle signals, but I was under the impression that handling wasn't required for a SIGKILL to work.

The only processes that I've seen resist a SIGKILL are zombied ones. I was told that this occurs when a parent process dies without cleaning up, and so I imagine that the system has lost the path by which it communicates with such a process (possibly an analog of unlinking a file). Maybe the debugger died unnaturally and orphaned the process it was debugging. I can't explain why the state of the process wasn't changed, but if it's immune to a SIGKILL, it shouldn't be doing anything and probably isn't worth rebooting for, which is the only way to get rid of a zombie.
--
Gary
~~~~
C makes it easy for you to shoot yourself in the foot.
C++ makes that harder, but when you do, it blows
away your whole leg.
-- Bjarne Stroustrup

Answer 2

Sep 17, 2006 7:42 AM in response to Antonello Cruz

Antonello,
who started this particular process?

Generally speaking, zombie processes cannot be killed. If this is a zombie process, you may need to restart your machine.

Mihalis.

Answer 3

Sep 17, 2006 8:44 AM in response to Gary Kerbaugh

Hi Gary,

Thanks for taking the time to look it up and reply. The process in question was my solution for a class assignment. We should find a way to prevent an attacker from attaching to our running process through the ptrace system call. BTW, ptrace is what gdb uses to debug programs. The strategy I used works if the attacker doesn't have root privileges, but it seems to have the side effect of "demonizing" the process on Darwin... It worked without problems on Linux which was what the assignment required. Of course, I used my PowerBook G3 running OS X 10.3.9 to demonstrate my strategy in class. Hence I found that after sending a SIGKILL to my process, it would "stop running" but would still be "there".
My strategy relied on four facts: a process can be ptraced by only one other process at a time; the init process cannot be ptraced; orphaned processes are adopted by init; and a process can request that its parent ptrace it.
I am posting the code here in case anyone wants to play with it.
<pre>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <sys/ptrace.h>
#include <unistd.h>
#include <errno.h>
#include <signal.h>

int main(void)
{
    pid_t p;

    printf("Starting parent...\n");
    if ((p = fork()) == -1) /* an error has occurred */
    {
        perror("fork");
        exit(1);
    }
    if (p > 0) /* This is the parent process */
    {
        exit(0);
    }
    if (p == 0) /* This is the child */
    {
        while (getppid() != 1)
            ; /* busy-wait until the parent exits and init adopts us */
        printf("This is the child running...\n");
        printf("Calling ptrace(PT_TRACE_ME). "
               "The parent is not expecting it\n");
        if (ptrace(PT_TRACE_ME, 0, NULL, 0) == -1)
        {
            /* ptrace failed */
            perror("TRACEME failed");
            fprintf(stderr, "exiting\n");
            exit(1);
        }
        else
        {
            printf("TRACEME worked!\n");
            printf("parent process is %d\n", (int)getppid());
            printf("this process is %d\n", (int)getpid());

            int i = 0;

            while (1)
            {
                printf("count %d: ppid(%d), pid(%d)\n",
                       i++, (int)getppid(), (int)getpid());
                sleep(1);
            }
        }
    }
    return 0;
}
</pre>

Running kill -KILL <pid> makes the process stop its output, but doesn't really get rid of the process.

Here is the code that attempts to attach to a process:
<pre>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
    pid_t pid;
    int status;

    if (argc < 2)
    {
        printf("usage: %s <pid>\n", argv[0]);
        exit(1);
    }

    if ((pid = atoi(argv[1])) <= 0)
    {
        printf("Invalid pid number: %s\n", argv[1]);
        exit(1);
    }

    printf("attach pid: %d\n", (int)getpid());
    if (ptrace(PT_ATTACH, pid, NULL, 0) == -1)
    {
        perror("\n\n **** Attach failed");
    }
    else
    {
        printf("Waiting for the signal that we've attached\n");
        waitpid(pid, &status, 0);
        printf("Attached to pid [%d], "
               "hit enter to continue...\n", (int)pid);
        getchar();

        if (ptrace(PT_DETACH, pid, NULL, 0) == -1)
        {
            perror("\n\n **** Detach failed");
        }
    }
    return 0;
}
</pre>

The man page for ptrace states that a process should not call ptrace(PT_TRACE_ME) if the parent is not expecting it. However, it didn't have any side effects on Linux.

Anyway, I would expect that sending a SIGKILL to a process would terminate it, unless you don't have privileges over the process. I guess that's not the case on Darwin. Somehow the SIGKILL is being trapped along the way, even though the man page for signal states that SIGKILL and SIGSTOP cannot be caught or ignored. Or something else is going on with that process.

I just wonder what the implications are of not being able to kill a process that you own.

Antonello

PS: I just tested and if the call ptrace(PT_TRACE_ME) is removed the process can be killed.

Answer 4

Sep 17, 2006 8:58 AM in response to Mihalis Tsoukalos

Hi Mihalis,

I did! You can see the details in my reply to Gary.

Thanks,

Antonello

PS: Just wondering, what makes a process become a zombie? In this case, the process doesn't get killed because it is being ptraced by init.

Answer 5

Sep 17, 2006 10:41 AM in response to Antonello Cruz

Hi Antonello,
I'm in over my head here; I don't know the details of Darwin process management. However, as I suggested, it is my feeling that the problem is one of communication. I'm guessing that a zombied process doesn't ignore the SIGKILL; it simply doesn't get the signal. Fortunately, there's never been a report here of a zombied process using even a single CPU cycle.
--
Gary
~~~~
It is impossible for an optimist to be pleasantly surprised.

Answer 6

Sep 18, 2006 3:54 PM in response to Antonello Cruz

I'm wondering if the process is hung waiting for the parent process to trace it, as the man page for ptrace suggests. But the original parent has exited, and init doesn't know to trace the process. So the process has stopped.

from ptrace(2):

PT_TRACE_ME

"(If the parent process does not expect to trace the child, it will
probably be rather confused by the results; once the traced
process stops, it cannot be made to continue except via
ptrace().) "

I would think that you'd be able to trace the process if you're logged in as root. Have you tried attaching gdb or ktrace to this process as root?

Does

ps -opid,state,flags,ktrace,xstat,command -p <pid>

return anything interesting?

If the process is sleeping in the kernel, you're not going to be able to kill it, as signals are only delivered when the process exits the kernel.

Just my 2 cents.

Andy

Answer 7

Sep 18, 2006 4:56 PM in response to Nils C. Anderson

Hi Andy,

If I try to attach to the process using gdb (as root or not) I get the same error I'd get if I tried to attach using ptrace with the second program I posted. I believe gdb uses ptrace to attach. The message gdb gives is:
<pre>
Unable to attach to process-id 542: Device busy (16)
</pre>
Recall that this process was designed to prevent another process from attaching to it using a ptrace call!
<pre>
ps -opid,state,flags,ktrace,xstat,command -p 542
</pre>
returns
<pre>
PID STAT F KTRACE XSTAT COMMAND
542 SX 1806 0 9 ./att2
</pre>

When a process calls ptrace(PT_ATTACH, pid) it sends a SIGSTOP to process pid. However, if a process calls ptrace(PT_TRACE_ME) it doesn't send itself a SIGSTOP! What I believe happens if the parent is not expecting to trace the child, and this is undocumented, is that the child keeps running until it receives a signal telling it otherwise.
This undocumented behavior is supported by what I saw happening to the process. You can compile the two codes in the post above and try yourself. Just keep in mind that you will probably have to reboot to get rid of the 'zombied' process.

The process is definitely not running, since nothing is being printed out and the only thing it does is increment a counter and print it out. It is possible the process is sleeping in the kernel; I just have no idea what could wake it. I could speculate that the process was killed, but the process descriptor has not been cleaned up because the process was being traced. The tracing process would have to issue a ptrace(PT_DETACH, pid) to release the process descriptor so that the OS can clean it up.

Thanks for your two cents 🙂

Antonello

Answer 8

Sep 19, 2006 8:57 PM in response to Antonello Cruz

A process becomes a zombie when it dies but its parent process hasn't read its exit status. The exit status is stored in the process' control block, which is part of the process; and the parent is supposed to be able to read the exit status; so when the process exits, the OS can't discard its control block until the status has been read. Zombies don't consume CPU cycles because everything but their control block has been discarded; but control blocks still use up space in the kernel, so it's bad practice to create zombies.

Processes can prevent themselves from becoming zombies by executing a system call to "divorce" themselves from their parent -- I think the call is setsid, but the procedure has diverged on different variants of UNIX, so I'm not 100% sure.
Parents can prevent their children from becoming zombies by handling the SIGCHLD signal (SIGCLD on System V) and calling one of the wait system call variants to "reap" the child's exit status.
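
The life cycle described above can be sketched in a few lines of C. This is an illustrative example rather than anything from the thread: `spawn_and_reap` is a hypothetical helper that forks a child which exits at once and then collects its status with `waitpid()`. It reaps synchronously instead of from a SIGCHLD handler to keep the sketch short.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits immediately with exit_code, then reap it.
 * Returns the reaped exit code, or -1 on error. */
int spawn_and_reap(int exit_code)
{
    int status;
    pid_t pid;

    assert(exit_code >= 0 && exit_code <= 255);

    pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(exit_code);   /* child dies at once */

    /* From the child's exit until waitpid() returns, the child is a
     * zombie: only its control block, holding the exit status, remains. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

If the `waitpid()` call were left out and the parent kept running, the child would stay listed by ps as a zombie until the parent exited and init reaped it.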

Answer 9

Sep 20, 2006 12:26 AM in response to Karl Zimmerman

Hi Karl,
Since a time not long after I joined this list, when I first learned of zombied processes, I've wanted a cogent explanation of this. Thank you very much.

Since you're on a roll, maybe you could shed light on an expression that I don't think I grok. I think it was Andy that referred once to a process "sleeping in the kernel". It's the "in the kernel" part that I'm fuzzy on. Consequently, I don't even know if that's exactly what he said but if I've gotten close to something you can clarify, please do. I apologize for the fuzzy question and to the thread for suggesting a diversion.
--
Gary
~~~~
The best book on programming for the layman is "Alice
in Wonderland"; but that's because it's the best book
on anything for the layman.

Answer 10

mkfs

Sep 20, 2006 2:13 PM in response to Antonello Cruz

You can't ptrace to a process that has already been attached to via ptrace(); this is why your solution works in the first place. In order to not break things, you need to use the parent process to attach to the child, then have it wait on the child -- this prevents another process from attaching to the child.
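
A minimal sketch of a parent that does expect to trace its child is shown here. It is a variant of the suggestion above, not the poster's code: instead of the parent calling PT_ATTACH, the child calls PT_TRACE_ME while the parent is still alive and waiting. The helper name `trace_own_child` and the exit codes 42 and 99 are made up for illustration, and the sketch assumes ptrace is permitted in the environment where it runs.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that asks to be traced by a parent that expects it.
 * Returns the child's exit code (42 here), or -1 if tracing failed. */
int trace_own_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;

    if (pid == 0) {
        /* Child: the parent is alive and ready to act as tracer. */
        if (ptrace(PT_TRACE_ME, 0, NULL, 0) == -1)
            _exit(99);
        raise(SIGSTOP);   /* stop and hand control to the tracer */
        _exit(42);        /* reached only after the tracer resumes us */
    }

    /* Parent: wait for the traced child to stop. */
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    if (!WIFSTOPPED(status))
        return -1;        /* child exited early, e.g. ptrace refused */

    /* Resume the child where it stopped, delivering no signal. */
    if (ptrace(PT_CONTINUE, pid, (char *)1, 0) == -1) {
        kill(pid, SIGKILL);
        waitpid(pid, &status, 0);
        return -1;
    }

    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

While the parent holds the trace, a second tracer's attach attempt would be expected to fail in the same way gdb did earlier in the thread, and because the tracer is a live process that calls `waitpid()`, the child is resumed and reaped normally.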

The implementations of ptrace and signal handling are entirely different in Darwin and Linux; in fact Darwin provides the PT_DENY_ATTACH ptrace option that will do what you are intending.

You can review the ptrace and signal implementations here:
Darwin
http://fxr.watson.org/fxr/source/bsd/kern/mach_process.c?v=xnu-792.6.70
http://fxr.watson.org/fxr/source/bsd/kern/kern_sig.c?v=xnu-792.6.70

Linux
http://fxr.watson.org/fxr/source/kernel/ptrace.c?v=linux-2.6.11.8
http://fxr.watson.org/fxr/source/arch/i386/kernel/ptrace.c?v=linux-2.6.11.8
http://fxr.watson.org/fxr/source/kernel/signal.c?v=linux-2.6.11.8

The PT_TRACE_ME implementations are similar enough:

Darwin
if (uap->req == PT_TRACE_ME) {
    SET(p->p_flag, P_TRACED);
    /* Non-attached case, our tracer is our parent. */
    t->p_oppid = t->p_pptr->p_pid;
    return(0);
}

Linux
if (request == PTRACE_TRACEME) {
    /* are we already being traced? */
    if (current->ptrace & PT_PTRACED)
        goto out;
    ret = security_ptrace(current->parent, current);
    if (ret)
        goto out;
    /* set the ptrace bit in the process flags. */
    current->ptrace |= PT_PTRACED;
    ret = 0;
    goto out;
}

...with Darwin's setting of p_oppid ('/* Save parent pid during ptrace. XXX */') being the only difference of any substance.

You'll have to read through the rather complex signal handling code to find why Darwin doesn't send a KILL signal but Linux does.

It looks like signal_wake_up() in Linux sys/kernel/signal.c is intended to handle exactly this case, while issignal() in Darwin sys/bsd/kern/kern_sig.c stops the process regardless of whether the signal is a KILL or not (though I may have missed something in psignal_lock() that handles a ptraced KILL).

As for how to kill it, well, KILL (and HUP, STOP, ILL and all the others) are signals, and signals are not making it to your process. You need to modify the process task structure directly to unset the ptrace flag, which means using a kernel debugger or a kernel module. Or you could just reboot 😉

Actually there is one trick you could try. The way ptrace usually works is that when the target receives a signal, the target is stopped (via SIGSTOP) and control is turned over to the tracer, giving it a chance to intercept the signal. The tracer then starts the target using SIGCONT, and the target processes the signal.

There are two exceptions to this: SIGSTOP and SIGCONT are exempt from being caught, because they are used to manage the target process and must always get through. You can try sending a SIGKILL followed by a SIGCONT, in the hopes that the SIGKILL will be in the target's signal queue and the SIGCONT will make the target active (whereupon it will examine its signal queue). Not sure if it will work though; SIGKILL might never make it into the process signal queue.
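
The trick above can be sketched as follows. This is only an illustration: `kill_then_continue` and `demo_kill_then_continue` are hypothetical names, and the demo runs against an ordinary untraced child, where the SIGKILL alone is already enough. Whether the same sequence frees a process stuck under ptrace on Darwin is exactly the open question.

```c
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Queue a SIGKILL for pid, then send SIGCONT in the hope that the target
 * wakes up and notices the pending kill. Returns 0 if both calls succeed. */
int kill_then_continue(pid_t pid)
{
    assert(pid > 0);
    if (kill(pid, SIGKILL) == -1)
        return -1;
    return kill(pid, SIGCONT);
}

/* Demo on a plain child that just sleeps in pause().
 * Returns the number of the signal that terminated it, or -1 on error. */
int demo_kill_then_continue(void)
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        for (;;)
            pause();   /* child sleeps until a signal arrives */
    }

    if (kill_then_continue(pid) == -1)
        return -1;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

For the untraced child, `demo_kill_then_continue()` should report `SIGKILL` as the terminating signal. From a shell, the equivalent is kill -KILL <pid> followed by kill -CONT <pid>.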

Answer 11

mkfs

Sep 20, 2006 2:18 PM in response to Gary Kerbaugh

I believe he is referring to the process state, which is stored in the process or task structure in the kernel.

Darwin has the following process states:
#define SIDL    1   /* Process being created by fork. */
#define SRUN    2   /* Currently runnable. */
#define SSLEEP  3   /* Sleeping on an address. */
#define SSTOP   4   /* Process debugging or suspension. */
#define SZOMB   5   /* Awaiting collection by parent. */

(from http://fxr.watson.org/fxr/source/bsd/sys/proc.h?v=xnu-792.6.70#L149)

Sorry for the cut-n-paste reply, but I was in the area from my other post 🙂

Answer 12

Sep 20, 2006 3:41 PM in response to mkfs

Hi mkfs,

PT_DENY_ATTACH is neat! It is not documented in the man pages, however. One needs to look in the source code! Too bad I didn't know this before; it would have been interesting, after presenting my solution, to say "of course if you're using a BSD compatible OS you can just issue a ptrace(PT_DENY_ATTACH)!"
What is really weird is that when I try to attach to it, the attaching program seg faults instead of returning an error.
Would you know if seg faulting is the correct behavior for a program trying to attach to a process that called ptrace(PT_DENY_ATTACH)? I could not figure it out from the source...

Thanks,

Antonello

PS: a SIGSTOP followed by a SIGCONT did not clean up the process. I.e. after those signals, the process still showed in ps.

Answer 13

mkfs

Sep 20, 2006 6:00 PM in response to Antonello Cruz

Heya Antonello,

Actually PT_DENY_ATTACH is in the ptrace(2) man page:

PT_DENY_ATTACH
This request is the other operation used by the traced
process; it allows a process that is not currently being
traced to deny future traces by its parent. All other
arguments are ignored. If the process is currently being
traced, it will exit with the exit status of ENOTSUP;
otherwise, it sets a flag that denies future traces. An
attempt by the parent to trace a process which has set this
flag will result in a segmentation violation in the parent.
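
Going by the man page text quoted above, a process could opt out of tracing with a single call. The sketch below is a hedged illustration: `deny_debugger_attach` is a hypothetical wrapper, and the `#ifdef` fallback that reports `ENOTSUP` on systems without `PT_DENY_ATTACH` is an assumption added so the code also compiles outside Darwin.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ptrace.h>

/* Ask the kernel to refuse future trace attempts against this process.
 * Returns 0 on success, or -1 with errno set on failure. */
int deny_debugger_attach(void)
{
#ifdef PT_DENY_ATTACH
    /* Per the man page, all arguments other than the request are ignored. */
    return ptrace(PT_DENY_ATTACH, 0, NULL, 0);
#else
    /* Assumption: platforms without the request simply report "not supported". */
    errno = ENOTSUP;
    return -1;
#endif
}
```

A program would call this early in `main()`. Note the man page's warning: if the process is already being traced when it makes the request, it exits rather than returning an error.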

The man page is dated November 7, 1994 which cannot possibly be accurate 🙂 Still, check the man pages on an OS X box. I am using 10.4.7 on both of my OS X boxes, by the way.

Seg faulting: According to the man page, it is perfectly normal (though still a stupid response, they should have added an error for this).

Well, if you are up for a little hackery, there is an article out there called 'Abusing Mach on OS X' which details how to use task_for_pid():

http://www.milw0rm.org/papers/67

There is some work being done on disabling/fixing PT_DENY_ATTACH, which you might be able to use for your own purposes:

http://landonf.bikemonkey.org/code/macosx/TigerPT_DENYATTACH.20051121020514.50199.zadder.html
http://landonf.bikemonkey.org/code/macosx/ptracedenyattach.20041010201303.11809.mojo.html
http://steike.com/HowToDebugITunesWithGdb

These obviously aren't directly related to what you need, but you could modify the source of one of the kexts provided to unset the ptrace flag of the task you are attaching to. If you implement this as an unused ptrace command number, you can safely wrap ptrace on your local machine and 'un-attach' ptraced tasks at will.

Hmm, come to think of it, I could use something like that myself.

Answer 14

Sep 20, 2006 9:17 PM in response to mkfs

Oooh, thanks, mkfs! Yes, he was certainly referring to the process state SSLEEP. "Sleeping in the kernel" means the process made a system call which encountered some condition that meant it couldn't return in a timely manner; typically it needs data from some device. The kernel arranges for the hardware to deliver the data, which will take some time; rather than hang the whole machine, the kernel marks the process as "sleeping" (SSLEEP), then finds another runnable (SRUN) process to execute. Eventually, some event (like a hardware interrupt) causes the kernel to decide that the sleeping process is now runnable; the kernel changes its state to SRUN, and the next time the kernel needs to find a process to execute, the process that slept will get its chance.

(Note the "SZOMB 5 /* Awaiting collection by parent. */" 🙂)

Powerbook G4 1GHz Mac OS X (10.3.9)

Answer 15

Sep 22, 2006 5:37 PM in response to mkfs

A bit late, but what I was thinking about when I mentioned uninterruptible sleep was along the lines of the PZERO priority.

See ./bsd/kern/kern_synch.c in the xnu-792.6.76 source; there's a comment that sheds a little light.

/*
 * Give up the processor till a wakeup occurs
 * on chan, at which time the process
 * enters the scheduling queue at priority pri.
 * The most important effect of pri is that when
 * pri<=PZERO a signal cannot disturb the sleep;
 * if pri>PZERO signals will be processed.
 * If pri&PCATCH is set, signals will cause sleep
 * to return 1, rather than longjmp.
 * Callers of this routine must be prepared for
 * premature return, and check that the reason for
 * sleeping has gone away.
 *
 * if msleep was the entry point, than we have a mutex to deal with
 *
 * The mutex is unlocked before the caller is blocked, and
 * relocked before msleep returns unless the priority includes the PDROP
 * flag... if PDROP is specified, _sleep returns with the mutex unlocked
 * regardless of whether it actually blocked or not.
 */

static int
_sleep(
    caddr_t chan,
    int pri,
    const char *wmsg,
    u_int64_t abstime,
    int (*continuation)(int),
    lck_mtx_t *mtx)

Andy
