Why does Nagios puke all over my inbox?

By cypher.msix
So we (the IT group) set up a Nagios system running on Ubuntu. It monitors a bunch of our servers, and every so often we get a FLOOD of e-mails about servers going down and then immediately coming right back up again.

The Director is questioning why this is being reported, and our support staff has no idea what is going on.

Since I am one of ... well, the one person in the group who knows anything about Linux at all, I've been asked to see what's what.

I don't know much about Nagios, but as a web and software developer, I am not afraid of it either.

So what gives? Has anyone run into something like this before? I know my question is vague, but so is my understanding of how Nagios actually works. I guess I'm just throwing this question out there hoping that it's something that comes up often... maybe some rookie mistake? :-)


All Answers


So, the servers aren't really down?

by seanferd In reply to Why does Nagios puke all ...

And as to the flood: do you get a separate e-mail for each server, are there multiple copies, or what?

You may do better to look or ask here:
http://www.nagios.org/documentation
http://community.nagios.org/
http://forums.meulie.net/viewforum.php?f=60


Thanks

by cypher.msix In reply to So, the servers aren't re ...

No, the servers really aren't down and the flood is a separate e-mail for each server.

The contents are always the same:

Notification Type: PROBLEM
Host: server-name
State: DOWN
Address: 10.0.0.xxx
Info: CRITICAL - Host Unreachable (10.0.0.xxx)

Date/Time: Thu Mar 11 15:56:33 PST 2010

Thanks for the URLs, I'll check them out.
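If you want to check whether a flood really is "every server at once", it helps to pull the fields out of each notification first. Here is a minimal Python sketch, assuming the plain `Field: value` layout quoted above; `parse_notification` is a hypothetical helper, not something Nagios provides.

```python
def parse_notification(body: str) -> dict[str, str]:
    """Turn a 'Field: value' notification body into a dictionary."""
    fields: dict[str, str] = {}
    for line in body.splitlines():
        # Split on the first colon only, so "15:56:33" in the date stays intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():  # skip blank lines and lines without a colon
            fields[key.strip()] = value.strip()
    return fields


sample = """Notification Type: PROBLEM
Host: server-name
State: DOWN
Address: 10.0.0.xxx
Info: CRITICAL - Host Unreachable (10.0.0.xxx)

Date/Time: Thu Mar 11 15:56:33 PST 2010"""

alert = parse_notification(sample)
print(alert["Host"], alert["State"], alert["Date/Time"])
# server-name DOWN Thu Mar 11 15:56:33 PST 2010
```

Run it over a whole batch of alert e-mails and you can see at a glance which hosts "went down" and when.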


Ah. Described that way,

by seanferd In reply to Thanks

I would agree with mamies.


Thanks,

by mamies In reply to Ah. Described that way,

I've had the same issue before, but it was a faulty NIC in my Nagios box that was causing the problems.
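On a Linux box such as the Ubuntu machine running Nagios, the kernel keeps per-interface error and drop counters under `/sys/class/net/<interface>/statistics/`. Here is a minimal Python sketch for spotting a flaky NIC. It assumes the interface is called `eth0`, which may not match your system. The names `nic_error_counters` and `counters_increased` are hypothetical helpers.

```python
from pathlib import Path

# Per-interface counters the Linux kernel exposes in sysfs.
COUNTERS = ("rx_errors", "tx_errors", "rx_dropped", "tx_dropped")


def nic_error_counters(iface: str, sysfs_root: str = "/sys/class/net") -> dict[str, int]:
    """Read the error/drop counters for one network interface."""
    stats_dir = Path(sysfs_root) / iface / "statistics"
    return {name: int((stats_dir / name).read_text().strip()) for name in COUNTERS}


def counters_increased(before: dict[str, int], after: dict[str, int]) -> list[str]:
    """Names of counters that grew between two snapshots."""
    return [name for name in COUNTERS if after[name] > before[name]]


# Example (assumes the interface is named eth0; check with `ip link`):
# before = nic_error_counters("eth0")
# ...wait until the next alert flood...
# after = nic_error_counters("eth0")
# print(counters_increased(before, after))
```

Take one snapshot, wait through a period when alerts arrive, then take another. Errors or drops that keep climbing are worth a look before you blame the servers.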


Hey, you have the practical experience.

by seanferd In reply to Thanks,

For me, it would just be a guess, but a sort of network-obvious one, that the failure was with the Nagios box or cable or something.

Because if I can't see the servers, they're all down, right?
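That reasoning can be turned into a quick test against the alert history. If many different hosts go DOWN within a minute or two of each other, the problem is more likely on the Nagios side (box, NIC, cable, switch port) than on every server at once. Here is a rough Python sketch. The two-minute window and five-host threshold are arbitrary assumptions, and `find_bursts` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def find_bursts(events, window=timedelta(minutes=2), min_hosts=5):
    """Return (start_time, hosts) for groups of DOWN alerts that hit many hosts at once.

    events: iterable of (host, state, when) tuples, where `when` is a datetime.
    window and min_hosts are arbitrary thresholds; tune them for your network.
    """
    downs = sorted((when, host) for host, state, when in events if state == "DOWN")
    bursts = []
    i = 0
    while i < len(downs):
        start = downs[i][0]
        hosts = set()
        j = i
        # Collect every DOWN alert within `window` of the first one.
        while j < len(downs) and downs[j][0] - start <= window:
            hosts.add(downs[j][1])
            j += 1
        if len(hosts) >= min_hosts:
            bursts.append((start, sorted(hosts)))
            i = j  # report each burst once
        else:
            i += 1
    return bursts


# Example: six servers "go down" within six seconds, plus one unrelated alert.
base = datetime(2010, 3, 11, 15, 56, 33)
events = [(f"srv{k}", "DOWN", base + timedelta(seconds=k)) for k in range(6)]
events.append(("srv9", "DOWN", base + timedelta(hours=1)))
for start, hosts in find_bursts(events):
    print(start, hosts)
```

Feed it (host, state, time) tuples pulled from the notification e-mails. A burst that covers most of your hosts points back at the monitoring box.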


Something like that anyway :) (nt)

by mamies In reply to Hey, you have the practic ...

Basically

by mamies In reply to Why does Nagios puke all ...

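The way Nagios works is that the Nagios machine runs checks against the machine/s that you want to monitor. Basic host checks are usually just a ping. Things like disk space or memory need a small agent (such as NRPE) on the monitored machine, which the Nagios machine queries. The results are then displayed on its page or e-mailed to you.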

I would check the connectivity of the Nagios box, because if it drops out, it will think the servers have fallen over and send an email.
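Nagios decides whether something is OK from the exit code of a small check program (a "plugin"): 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN. The `Info: CRITICAL - Host Unreachable` line in the alert e-mail is that kind of plugin output. Here is a minimal Python sketch of a plugin-style host check. It is not a real Nagios plugin: it uses a TCP connect as a stand-in for ping, because ICMP needs raw sockets and root. The names `tcp_host_check` and `main` are hypothetical.

```python
import socket

# Exit codes defined by the Nagios plugin convention.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def tcp_host_check(host: str, port: int, timeout: float = 5.0) -> tuple[int, str]:
    """Hypothetical check: treat the host as up if a TCP connection succeeds.

    Real host checks usually ping (e.g. check_ping); a TCP connect is used
    here only because ICMP needs raw sockets and root privileges.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return OK, f"OK - {host}:{port} accepted a connection"
    except OSError as exc:  # refused, timed out, no route, etc.
        return CRITICAL, f"CRITICAL - {host}:{port} unreachable ({exc})"


def main(argv: list[str]) -> int:
    """Print one status line and return the exit code Nagios would read."""
    if len(argv) != 2:
        print("UNKNOWN - usage: HOST PORT")
        return UNKNOWN
    code, message = tcp_host_check(argv[0], int(argv[1]))
    print(message)
    return code
```

As a standalone script you would finish with `sys.exit(main(sys.argv[1:]))`. Because the check runs from the Nagios machine, a dropped link on that machine makes every check return CRITICAL at the same moment. That is exactly the flood pattern described in this thread.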


Thanks

by cypher.msix In reply to Basically

Thanks for the info. I had a feeling that's how it would work (since it can monitor drive space, memory usage and what-not).

Connectivity of the Nagios box... that makes sense to me and was brought up a few times as well. I just don't believe our support guys know how to figure that out. Time to go snooping.

Thanks for the info.


Is...

by mamies In reply to Thanks

Is Nagios installed on a server, or is it on a small box? If it's on a small box, it might be worth trying to reinstall it onto another box; this helps to eliminate hardware issues. Also try plugging it into a different port on the switch using a different patch cable.

Thanks,
