Saturday, January 29, 2011

How do I figure out why my (Ubuntu 8.04) server is thrashing and becoming unresponsive?

I have a server running Ubuntu 8.04. It's a web server (MySQL/PHP/Apache 2) and mail server (Dovecot/Postfix/SpamAssassin).

Normally, memory usage is nominal and everything runs smoothly. Every once in a while though, memory usage jumps through the roof and it starts thrashing and then becomes completely unresponsive, requiring a hard reboot.

The question is, how do I diagnose the problem? It doesn't happen on any consistent schedule, so I can't predict it. Is there something I can set up to catch the culprit?

Here's part of the log from about when the problem occurred:

Dec  5 07:58:28 mail kernel: [587023.374916] lowmem_reserve[]: 0 0 0 0
Dec  5 07:58:28 mail kernel: [587023.374919] DMA: 3*4kB 3*8kB 3*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1076kB
Dec  5 07:58:28 mail kernel: [587023.374926] DMA32: 88*4kB 30*8kB 10*16kB 0*32kB 4*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 2032kB
Dec  5 07:58:28 mail kernel: [587023.374934] Swap cache: add 5959244, delete 5959242, find 4577237/5361570, race 31+2723
Dec  5 07:58:28 mail kernel: [587023.374936] Free swap  = 0kB
Dec  5 07:58:28 mail kernel: [587023.374937] Total swap = 524280kB
Dec  5 07:58:28 mail kernel: [587023.374939] Free swap:            0kB
Dec  5 07:58:28 mail kernel: [587023.377091] 67584 pages of RAM
Dec  5 07:58:28 mail kernel: [587023.377096] 2652 reserved pages
Dec  5 07:58:28 mail kernel: [587023.377098] 5432 pages shared
Dec  5 07:58:28 mail kernel: [587023.377099] 2 pages swap cached
Dec  5 07:58:28 mail kernel: [587078.150437] pickup invoked oom-killer: gfp_mask=0x1201d2, order=0, oomkilladj=0
Dec  5 07:58:28 mail kernel: [587078.150450] Pid: 4649, comm: pickup Not tainted 2.6.24-22-xen #1
Dec  5 07:58:28 mail kernel: [587078.150453] 
Dec  5 07:58:28 mail kernel: [587078.150453] Call Trace:
Dec  5 07:58:28 mail kernel: [587078.150473]  [] oom_kill_process+0xf6/0x110
Dec  5 07:58:28 mail kernel: [587078.150478]  [] out_of_memory+0x19e/0x1e0
Dec  5 07:58:28 mail kernel: [587078.150483]  [] __alloc_pages+0x389/0x3c0
Dec  5 07:58:28 mail kernel: [587078.150490]  [] __do_page_cache_readahead+0x104/0x260
Dec  5 07:58:28 mail kernel: [587078.150495]  [] filemap_fault+0x2de/0x3e0
Dec  5 07:58:28 mail kernel: [587078.150500]  [] __do_fault+0x6a/0x5d0
Dec  5 07:58:28 mail kernel: [587078.150504]  [] handle_mm_fault+0x1d1/0xd60
Dec  5 07:58:28 mail kernel: [587078.150508]  [] do_sync_write+0xd9/0x120
Dec  5 07:58:28 mail kernel: [587078.150515]  [] do_page_fault+0x1f3/0x11e0
Dec  5 07:58:28 mail kernel: [587078.150530]  [] :ext3:free_rb_tree_fname+0x4c/0xb0
Dec  5 07:58:28 mail kernel: [587078.150535]  [] vfs_write+0x14e/0x190
Dec  5 07:58:28 mail kernel: [587078.150539]  [] error_exit+0x0/0x79

"mail" is the name of the server.

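A rough reading of the figures in the kernel log above, as a sketch rather than a diagnosis: the page count and swap totals are copied from the log, while the 4 kB page size is an assumption (it matches the smallest block size listed in the DMA lines).

```shell
#!/bin/sh
# Values copied from the kernel log; PAGE_KB=4 is an assumption.
PAGE_KB=4
RAM_PAGES=67584          # "67584 pages of RAM"
SWAP_TOTAL_KB=524280     # "Total swap = 524280kB"
SWAP_FREE_KB=0           # "Free swap  = 0kB"

# Integer arithmetic, so results are rounded down to whole MB.
ram_mb=$(( RAM_PAGES * PAGE_KB / 1024 ))
swap_used_mb=$(( (SWAP_TOTAL_KB - SWAP_FREE_KB) / 1024 ))

echo "RAM: ${ram_mb} MB, swap in use: ${swap_used_mb} MB"
```

Under that assumption the machine has roughly 264 MB of RAM and every bit of its swap was in use when the oom-killer ran.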
  • Troubleshooting memory usage

    Take a look at this page

    Nick : Ok, I found an entry like the one in the article. The log doesn't say which process is causing it though?
    From nrgyz
  • When you notice it is thrashing take a look at the output from

    ps aux
    

    and

    top
    

    If you want a "pretty" version of top you can install "htop" and use that instead.

    Your log files should also give you a clue. As the root user check out the logs in /var/log.

    I would highly recommend you install logwatch and have it email you log reports periodically so you can review logs that way too.

    J.Zimmerman : Also, if you are able to, I would recommend running two different servers for the services you described: one for web services and one for mail services.
    Nick : Once it starts thrashing I can't connect to it any more. The server's physically in another state, so no direct access. Is there a way to set up top to log the current state every 10 minutes or so? That way I can see its history once I reboot. I'll check out logwatch. With the logs, I'm not sure exactly what I should be looking for. There's lots of chatter.
    J.Zimmerman : You can add commands to the crontab of the root user (man crontab). 'sudo su - ' from an admin account on ubuntu will get you logged on as root. Once you are the root user you can then type 'crontab -e' and put commands to run on a schedule (see 'man crontab' for details). When a command is run via cron the output is sent to the user it is running as. If you edit /etc/aliases (man aliases) you can add your email address to the root account. Then all cron output would be sent to your email address.
    pauska : Set up remote syslogging.
  • I've had some problems with slow MySQL queries making the whole server unresponsive. It might help to enable log_slow_queries and perhaps look at the size of your buffers and cache.

    Also, if you spawn too many Apache 2 processes, they can hog a lot of memory as well.

    If you're running clamav to virus-scan e-mail and a LOT of mail comes in, it can eat up a lot of resources as well.

    From Htbaa
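The suggestions above (run ps and top from cron, and check whether Apache, MySQL or clamav is the one growing) can be combined into one small script. This is a minimal sketch, not something posted in the thread: the function names are hypothetical, and it assumes the procps versions of ps and free that ship with Ubuntu.

```shell
#!/bin/sh
# Sketch of a memory snapshot script meant to be run from root's crontab.
# Function names are hypothetical; nothing here comes from the original thread.

# Reads "RSS_kB COMMAND" pairs on stdin and prints the total RSS per
# command name, largest first.
sum_rss_by_command() {
    awk '{ total[$2] += $1 }
         END { for (c in total) printf "%d kB %s\n", total[c], c }' | sort -rn
}

# Prints one timestamped snapshot to stdout.
snapshot() {
    echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
    free -m
    echo "--- RSS per command (shared pages are counted once per process) ---"
    ps -eo rss=,comm= | sum_rss_by_command | head -n 10
    echo "--- largest individual processes by RSS ---"
    ps aux | sort -rn -k 6,6 | head -n 10
    echo
}

snapshot
```

Because the box cannot be reached once it starts thrashing, it probably makes more sense to append the output to a file on disk than to rely on cron's mail. Assuming the script is saved as /usr/local/sbin/mem-snapshot.sh (a hypothetical path), a line like this in root's crontab would record a snapshot every 10 minutes:

    */10 * * * * /usr/local/sbin/mem-snapshot.sh >> /var/log/mem-snapshot.log 2>&1

After the next hard reboot, the last few entries in that file should show which command's total was climbing. The per-command totals overstate real usage for programs that share memory between processes, so treat them as a trend indicator rather than an exact figure.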
