
Logging and Monitoring Enterprise Applications


One thing that has always astounded me about many “enterprise class

About

Justin James is the Lead Architect for Conigent.

21 comments
georgeou

I use a centralized syslog server that dumps into a SQL database, which I can write fast queries against, so I kind of like syslog. You can even centralize the Windows event viewer with various tools, including Microsoft MOM. What scares me are the proprietary logging methods that are outside of the norm. Those are really hard to scale.
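To make the "syslog in a SQL database" idea concrete, here is a minimal sketch using SQLite. The table layout, the column names, and the helper functions are assumptions made for illustration, not the schema of any particular syslog collector. Severity uses the standard syslog numbering, where 0 is emergency and 7 is debug, so "severity <= 3" means error or worse.

```python
import sqlite3


def open_store():
    """Create an in-memory store with a hypothetical syslog table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE syslog ("
        "received_at TEXT, host TEXT, severity INTEGER, message TEXT)"
    )
    # An index on the columns used for filtering keeps the query fast
    # as the table grows.
    conn.execute("CREATE INDEX idx_syslog_host_sev ON syslog (host, severity)")
    return conn


def record_event(conn, received_at, host, severity, message):
    """Insert one already-parsed syslog event."""
    conn.execute(
        "INSERT INTO syslog VALUES (?, ?, ?, ?)",
        (received_at, host, severity, message),
    )


def errors_by_host(conn, max_severity=3):
    """Count events at 'error' severity or worse, grouped by host."""
    return conn.execute(
        "SELECT host, COUNT(*) FROM syslog "
        "WHERE severity <= ? "
        "GROUP BY host ORDER BY COUNT(*) DESC, host",
        (max_severity,),
    ).fetchall()
```

A real deployment would point the collector at a server database rather than an in-memory one, but the query pattern is the same.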

felixhh

If your IP apps/devices are business-critical, Nagios would be the right choice. For sure, it's a lot of work, but once it's running, it's worth it, even if you need technicians to help with setup.

stress junkie

I remember when I wrote simple software applications. About sixty percent of my code was devoted to formatting input and output and checking data that was entered by the operator. Logging is much like that. I suppose that all of the debugging code that is often removed before a final product release could be left in and serve as the logging facility that you described. Otherwise I think that program managers would see the kind of logging that you describe as just more work to do. Given that commercial software products are often given an arbitrary release date and all development must be done by that date, I think that functions like detailed logging are not likely to be added to the feature list of most commercial applications. It looks like we will have to remain content with contriving a kind of health monitoring system by watching the messages of facilities that interact with large applications. We can watch security logs and CPU and disk I/O and network bandwidth usage, then interpret these data to infer the health of the applications.

Justin James

Do applications need better logging and monitoring facilities, or am I off my rocker?

Justin James

... that the application supports logging to syslog. Many of them do not. Indeed, just dumping to a text file seems to be the norm. J.Ja
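For an application that wants to support syslog rather than only a text file, the following is a minimal sketch using Python's standard logging module. The logger name, the collector address, and the choice of the LOCAL0 facility are assumptions for illustration.

```python
import logging
import logging.handlers


def build_syslog_logger(name, address=("localhost", 514)):
    """Return a logger that sends records to a syslog daemon over UDP.

    `address` is an assumed collector location; a Unix socket path such as
    "/dev/log" can be passed instead on systems that provide one.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers
        handler = logging.handlers.SysLogHandler(
            address=address,
            facility=logging.handlers.SysLogHandler.LOG_LOCAL0,
        )
        handler.setFormatter(
            logging.Formatter("%(name)s: %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Usage would look like `build_syslog_logger("orders-app").error("order import failed: batch=%s", 42)`; the same logger can also carry a file handler, so the text file does not have to go away.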

Justin James

I am curious, how does it monitor applications? J.Ja

Justin James

I think that you are right, that logging and monitoring is up there with "good documentation", "quality installation tools" and "access for disabled users" in that category of "there is no ROI" or "none of our competitors are doing it so we do not need to." It will never be a "killer feature" except to a niche market. But it would be great if an app or two in each major space raised the bar a touch. J.Ja

nelson.oles

.......... that no one mentioned NetIQ. It is quite customizable, setup is easy and will monitor anything you want to monitor with low system/network overhead. Just my 2¢.

karsten.breivik

Good points, Justin. Log4J does a lot, but it is still important that the developers pay attention to this. For non-Java programs, especially C programs, I often find the logging to be lacking. However, NMS systems like Nagios or even Unicenter are generally not very well suited to reading logs. They can generally tail a log file with a regexp and look for words like "critical", but that is generally not good enough. For .NET and J2EE there has been real progress in recent years, since most NMS vendors these days provide components that can do rather unintrusive introspection of JVMs and look for runtime errors. Both CA and Mercury have tools for this. I am sure open source will be getting there soon as well. As for log files, I found a small Perl program called Simple Event Correlator, or SEC, by Risto Vaarandi, which actually goes a long way in correlating events in the log files and helps get your NMS to look for the right things. We use Mercury Business Availability Center. It is a very nice tool for apps like Siebel, SAP, and web. You can even write custom code in Java, C or VB, but it comes at a steep price... I believe I saw an IBM initiative that did some of this as open source, but I did not investigate it that deeply.
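The gap between "grep for critical" and actual correlation can be illustrated with a small sketch. This is not SEC's rule syntax or behavior; it is a hypothetical threshold rule that raises one alert when several matching events land inside a time window. The log line format (timestamp, level, message) is also an assumption.

```python
import re
from collections import deque
from datetime import datetime, timedelta

# Assumed line format: "2024-01-01 12:00:00 ERROR something happened"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<msg>.*)$"
)


def correlate(lines, level="ERROR", threshold=3, window=timedelta(seconds=60)):
    """Yield (timestamp, count) when `threshold` events fall within `window`."""
    recent = deque()
    for line in lines:
        match = LINE_RE.match(line)
        if not match or match.group("level") != level:
            continue
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        recent.append(ts)
        # Drop events that have aged out of the window.
        while recent and ts - recent[0] > window:
            recent.popleft()
        if len(recent) >= threshold:
            yield ts, len(recent)
            recent.clear()  # start counting afresh after an alert
```

A single stray error produces nothing, while a burst produces one alert instead of one per line, which is closer to what an operator wants from an NMS.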

FBuchan

As the architect of a large enterprise application that logs everything from diagnostic hints, access and rejected access, record-level changes, and more...I have to say that the real barrier isn't overhead, or complexity, but perception of value. Almost none of our clients understand the audit functionality has value, and in a 5 year period we had so many want to turn it off, I had to have my team add a switch to do just that. The average system admin we deal with assumes there is serious overhead, even though our evidence shows it is minimal -- except in terms of storage over time, though the audit elements can be archived. My point is that I would bet that most of us who try to impart audit features end up with the same basic problem -- no one perceives the true value, and so they don't use it, so those features become orphaned over time. Very sad...and I do entirely agree with your contention about granularity, etc. It's just very sad it's so hard to get support for that feature-set.

debuggist

You can set Nagios to check a URL and search for a keyword in the response. Nagios checks at regular intervals and can notify you by email after one or more failures (also configurable).
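A check of this kind can be sketched as a small script that follows the Nagios plugin convention of one status line plus an exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function names and messages below are illustrative assumptions, not the stock plugin that ships with Nagios.

```python
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def evaluate(body, keyword):
    """Map a response body to a (status code, status line) pair."""
    if keyword in body:
        return OK, f"OK - keyword '{keyword}' found"
    return CRITICAL, f"CRITICAL - keyword '{keyword}' not found"


def check_url(url, keyword, timeout=10):
    """Fetch `url` and look for `keyword` in the response text."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except ValueError as exc:  # malformed URL
        return UNKNOWN, f"UNKNOWN - bad URL: {exc}"
    except OSError as exc:  # connection refused, timeout, HTTP error
        return CRITICAL, f"CRITICAL - request failed: {exc}"
    return evaluate(body, keyword)


def main(argv):
    """Plugin entry point: prints one status line, returns the exit code."""
    if len(argv) != 3:
        print("UNKNOWN - usage: check_keyword.py URL KEYWORD")
        return UNKNOWN
    code, message = check_url(argv[1], argv[2])
    print(message)
    return code
```

To run it as a plugin, the script would end with `sys.exit(main(sys.argv))`; the check interval and the number of failures before notification stay in the Nagios configuration, as described above.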

saihib

Do you really want to introduce that kind of overhead to your network?

Justin James

You are right, many NMS systems really cannot do much more than tail a file and apply a regex to it. Even many Java programs have nearly useless "logging"... dumping a Java exception to a file is only helpful to the programmer who wrote it. That is why the onus is really on the programmer. Nearly every error should be trapped, handled correctly, and logged. You really should never see exceptions floating in the flotsam, IMHO. J.Ja
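As a sketch of "trapped, handled correctly, and logged", the snippet below catches a bad input, records what happened in terms an operator can act on, and lets the caller carry on. The billing scenario, logger name, and function are hypothetical.

```python
import logging

logger = logging.getLogger("billing")


def parse_amount(raw, invoice_id):
    """Return the amount as a float, or None if the input is unusable.

    The log entry names the invoice, the bad value, and the consequence,
    instead of leaving a bare stack trace for someone to decode.
    """
    try:
        return float(raw)
    except (TypeError, ValueError):
        logger.error(
            "invoice %s: amount %r is not a number; invoice skipped",
            invoice_id,
            raw,
        )
        return None
```

The message says which record failed and what the application did about it, which is the part a raw exception dump leaves out.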

Justin James

For some reason, people seem to think that logging causes a lot of overhead, which it really doesn't except for storage. And as you say, that storage can be easily and cheaply archived, especially in a compressed format and stored "near line" on the chance that auditing is required. J.Ja
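Compressing old logs for near-line storage takes very little code. This is a minimal sketch using gzip from the standard library; the function name and the ".gz" naming scheme are assumptions, and a production setup would more likely lean on an existing log rotation tool.

```python
import gzip
import shutil
from pathlib import Path


def archive_log(path):
    """Compress `path` to `<path>.gz`, remove the original, return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # streams, so large logs are fine
    src.unlink()
    return dst
```

Log text is highly repetitive, so it tends to compress well, which is what keeps the storage cost of long-term audit data low.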

Justin James

It is nice to hear from someone who has used it for more than just SNMP collection and basic Web app pinging. Now I know it will do what I would like, and feel confident in recommending it at work. They use Netsaint now, the precursor to Nagios, and I am fairly certain that they are planning to upgrade anyways... J.Ja

grephead

I used to monitor everything in a medium biz environment with Nagios/Syslog. This included Windows/AD, AS400, and Linux operating systems, Oracle apps/db, open source apps, JD Edwards World ERP, multiple websites/ecommerce, infrastructure (switches, firewalls, IDS, routers, VPN), barcoding in the warehouse, and data warehouse/ETL/reporting systems. Note - your mileage may vary. I have Linux/Unix experience and I'm a DBA. Shell scripting is very helpful to do this setup.

I used Nagios as primary and Syslog-ng as secondary on a Linux server. Nagios performed checks on clients via passive checks (no client installs), SNMP, and active checks (nagios client software installation). I checked URL responses, any kind of TCP check, system performance and charted history with nagiosstat. I wrote very simple plugins for things like Pix if traffic, HTTPS reverse proxy, Oracle logs, barcode gun checks, etc.

I used the nt-syslog client for the Windows machines and sent it to a Linux syslog-ng collection point. For the syslog-ng I used swatch for real time monitoring. This worked very well - the Windows event log errors/warnings were put into syslog-ng. I knew immediately about failed ETL jobs, logon failure attempts, Exchange problems, disk errors, and IIS errors; anything that runs on Windows usually puts entries in the event log. The syslog also collected Cisco/network device info.

I had the clear text nagios/syslog traffic flowing over an admin VLAN. Not the best security but it would be very difficult for users to sniff it unless they could get on the VLAN. You can send the syslog from Windows over ssh - I never went that far.

The biggest pain was the legacy AS400. It had no native syslog capability that I could find, the SNMP randomly bombed, the guys running it had little to no understanding of TCP/IP and the internet, and the JD Edwards World ERP was written in the early 1990s. Anyone who has used JDE World can feel my pain here: the core ERP was written in the pre-internet era and wasn't very friendly for modern IT environments. I can't wait for Oracle to replace JDE with SOA/Fusion... The OS/400 SNMP gave disk, memory, cpu, etc. information.

felixhh

For sure, many applications are not HTTP. You can check for services on the machine or open ports, disk space, availability or whatever. You can run the scripts that are included or write your own, depending on what you want to check or how critical the server/service is. Most companies are very satisfied with this product because it's free, very customizable, it works proactively and it integrates into any alerting process. If you're interested, take a look at: http://nagios.sourceforge.net/docs/2_0/toc.html It's an active product with a large community. Unfortunately, my Linux knowledge is not as good as it should be...working on it :) regards, felix
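Two of the non-HTTP checks mentioned here, free disk space and an open TCP port, can be sketched as follows. The thresholds, function names, and status text are assumptions for illustration; the 0/1/2 return codes follow the Nagios plugin convention for OK, WARNING, and CRITICAL.

```python
import shutil
import socket

OK, WARNING, CRITICAL = 0, 1, 2


def classify_free(free_pct, warn_pct=20.0, crit_pct=10.0):
    """Turn a free-space percentage into a (status code, status line) pair."""
    if free_pct < crit_pct:
        return CRITICAL, f"DISK CRITICAL - {free_pct:.1f}% free"
    if free_pct < warn_pct:
        return WARNING, f"DISK WARNING - {free_pct:.1f}% free"
    return OK, f"DISK OK - {free_pct:.1f}% free"


def check_disk(path, warn_pct=20.0, crit_pct=10.0):
    """Check free space on the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return classify_free(usage.free / usage.total * 100, warn_pct, crit_pct)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A port check only proves that something is listening; it says nothing about whether the application behind the port is healthy, which is where application-level logging comes back in.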

Justin James

Not all "applications", and certainly not all enterprise class applications are Web-based apps! Indeed, very few of them are. J.Ja

Justin James

... the data throughput for logging and SNMP is pretty tiny. J.Ja

Justin James

... the 5% - 10% of traffic that is DNS. J.Ja

stress junkie

:D Actually if you ever watch network packets you may be amazed at the bandwidth that is being eaten up with ARP messages. Given that, a few SNMP messages will not be noticed.
