I've been doing some more reading about network event handling and found some interesting articles and a few facts that I'd like to share. I have my own ideas about handling network events, but I'm open to learning what other people do and why they prefer their approach. It helps me learn new approaches or validate the one I use. Sometimes I run into odd approaches, but they make me think about alternatives and occasionally point to a useful variation on my own approach.
I prefer using syslog over SNMP traps, and I just learned an interesting tidbit from a report Cisco did for a customer, in which they compared the number of traps with the number of syslog messages. A 6500 can send about 90 different traps, but it has about 6000 different syslog messages. Wow, that's more than 60 times as many message types via syslog as via traps (6000 / 90 is roughly 67). That puts numbers behind my impression that syslog is a much richer source of network events than traps.
I don't mind SNMP traps. In fact, I think of both syslog and traps as asynchronous network events; each just has a different format. I prefer syslog because it is simple and I can read the information without having to decode an OID. But since both function similarly and differ mainly in encoding (ignoring TCP transport for syslog and SNMP informs), I treat both as events.
Pete Welcher of Chesapeake NetCraftsmen and I were talking several years ago about handling syslog, and we agreed that it is useful to filter syslog messages by removing the common ones that are unimportant. Then look at what's left, because those are the less common and more important events. I'm thinking of things like Pinnacle or Coil ASIC errors in the 6500, or environmental events like a power supply or fan failure. I've even seen a rare memory parity error on 6500s (Cisco's message decoder says to reseat the card and, if the error persists, call TAC for a replacement card).
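Before you can filter out the common messages, you need to know which ones they are. Cisco IOS messages carry a tag of the form %FACILITY-SEVERITY-MNEMONIC, so counting lines by that tag quickly shows which message types dominate. The Python below is a minimal sketch that assumes plain-text log lines containing that tag. The function names and sample lines are made up for illustration.

```python
import re
from collections import Counter

# Cisco IOS messages contain a tag like "%SYS-5-CONFIG_I:".
# The facility part may itself contain hyphens, so it is matched loosely.
TAG_RE = re.compile(r"%([A-Z0-9_-]+)-([0-7])-([A-Z0-9_]+):")

def message_tag(line):
    """Return 'FACILITY-SEVERITY-MNEMONIC' for a Cisco syslog line, or None."""
    m = TAG_RE.search(line)
    if m is None:
        return None
    return "-".join(m.groups())

def tag_counts(lines):
    """Count each message type, most common first."""
    counts = Counter()
    for line in lines:
        tag = message_tag(line)
        if tag is not None:
            counts[tag] += 1
    return counts.most_common()

if __name__ == "__main__":
    # Made-up sample lines in Cisco's message format.
    sample = [
        "Jun  6 10:00:01 sw1 123: %SYS-5-CONFIG_I: Configured from console by vty0",
        "Jun  6 10:05:12 sw1 124: %LINK-3-UPDOWN: Interface GigabitEthernet1/1, changed state to down",
        "Jun  6 10:05:14 sw1 125: %LINK-3-UPDOWN: Interface GigabitEthernet1/1, changed state to up",
        "Jun  6 10:05:15 sw1 126: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/1, changed state to up",
    ]
    for tag, n in tag_counts(sample):
        print(n, tag)
```

Run against a few days of real logs, the top of this list is where to look for candidates to filter. The rare entries at the bottom are the ones worth reading.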
This week I did some web searches on the topic and found a couple of interesting web pages about processing events. The first is a blog post by Robert Fekete at http://lwn.net/Articles/369075/. In it, he quotes Marcus Ranum's definition of Artificial Ignorance: "a process whereby you throw away the log entries you know aren't interesting. If there's anything left after you've thrown away the stuff you know isn't interesting, then the leftovers must be interesting."
He goes on to describe processes for handling logs.
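Artificial ignorance is easy to sketch: keep a list of patterns for messages you already know are uninteresting, drop every line that matches one, and read whatever survives. The Python below is a minimal sketch. The two entries in IGNORE_PATTERNS and the sample lines are hypothetical examples, not recommendations; a real list would be built from your own logs.

```python
import re

# Hypothetical ignore list: regexes for messages you have decided are routine
# in your network. These two entries are illustrations only.
IGNORE_PATTERNS = [
    r"%SYS-5-CONFIG_I:",               # config changes tracked elsewhere
    r"%LINEPROTO-5-UPDOWN:.*Vlan\d+",  # SVI up/down chatter
]

def artificial_ignorance(lines, patterns):
    """Yield only the lines that match none of the 'known boring' patterns."""
    compiled = [re.compile(p) for p in patterns]
    for line in lines:
        if not any(rx.search(line) for rx in compiled):
            yield line

if __name__ == "__main__":
    # Made-up sample lines; only the last one matches no ignore pattern.
    lines = [
        "sw1: %SYS-5-CONFIG_I: Configured from console by vty0",
        "sw1: %LINEPROTO-5-UPDOWN: Line protocol on Interface Vlan10, changed state to up",
        "sw1: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/1, changed state to down",
    ]
    for leftover in artificial_ignorance(lines, IGNORE_PATTERNS):
        print(leftover)
```

Regular expressions let one entry cover a whole family of messages (every VLAN interface, for example). The trade-off is that an overly broad pattern can hide something important, so it pays to keep each entry narrow.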
The second article, by DataCenterWorks, at http://datacenterworks.com/stories/antilog.html, is titled "Sherlock Holmes on Log Files." In it, they use a quote from the Sherlock Holmes books: "It is an old maxim of mine that when you have excluded the impossible, whatever remains, however improbable, must be the truth."
They describe a similar process and include a script that you can use to quickly filter syslog messages. Its premise is that the messages from a normal day of operation can be quickly identified by looking at the logs over the course of several days. Discard those common messages, and what remains are the messages unique to the current day. [Note: I've not looked at Splunk recently to see if it offers that kind of functionality. If it doesn't, it would be a good feature to add.]
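The baseline idea can be sketched in a few lines. This is not the DataCenterWorks script, only a minimal Python illustration of the same premise. It assumes each line starts with a BSD-style timestamp such as "Jun  6 10:00:01", and it replaces digit runs with "#" so that the same kind of message on a different interface or with a different sequence number compares as equal. The normalization rules, function names, and sample lines are all hypothetical and would need tuning for real logs.

```python
import re

# Assumed format: lines begin with a timestamp like "Jun  6 10:00:01 ".
TIMESTAMP_RE = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+")
DIGITS_RE = re.compile(r"\d+")

def normalize(line):
    """Drop the timestamp, then replace every digit run with '#'."""
    line = TIMESTAMP_RE.sub("", line.rstrip("\n"))
    return DIGITS_RE.sub("#", line)

def new_today(baseline_days, today):
    """Return today's lines whose normalized form never appeared in the baseline days."""
    seen = set()
    for day in baseline_days:
        seen.update(normalize(line) for line in day)
    return [line for line in today if normalize(line) not in seen]

if __name__ == "__main__":
    # Made-up sample logs for two "normal" days and today.
    day1 = ["Jun  4 09:00:00 sw1 10: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/1, changed state to down"]
    day2 = ["Jun  5 09:30:00 sw1 57: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet2/3, changed state to down"]
    today = [
        "Jun  6 08:15:00 sw1 91: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet3/4, changed state to down",
        "Jun  6 08:20:00 sw1 92: %SYS-2-MALLOCFAIL: Memory allocation of 65536 bytes failed",
    ]
    for line in new_today([day1, day2], today):
        print(line)
```

How aggressively to normalize is the key design choice. Collapsing all digits keeps the leftover list short, but it also means a link going down on an interface that has never flapped before looks identical to the routine flaps in the baseline.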
Both of these approaches match the log processing paradigm that Pete and I have discussed in the past, and they even offer tools to aid in log processing. If you're running a Cisco infrastructure, you can use EEM (Embedded Event Manager, scriptable with Tcl in IOS) to generate custom events and do a lot more than what Cisco provides out of the box. For me, event management and notification is the first and most important function in a network management framework.
Original Post by: Terry Slattery on Jun 6, 2010
Terry Slattery, CCIE #1026, is a senior network engineer with decades of experience in the internetworking industry. Prior to joining Chesapeake NetCraftsmen as a full time consultant, Terry was the founder and CTO of Netcordia, and inventor of NetMRI, a suite of network management products. Terry started Netcordia as a consulting company in 2000 and transitioned to a network management product company in 2003. During the consulting days, he used his network design and implementation skills to lead a team in the design and implementation of a high availability network at a brokerage clearing house. Terry is the former President and founder of Chesapeake Computer Consultants, Inc., a networking and computer systems training and consulting company. He co-invented and patented the vLab(tm) internet-based remote lab system. He is co-author of the McGraw Hill text Advanced IP Routing in Cisco Networks. Terry led the team that developed the current Cisco IOS user interface under contract to Cisco Systems. Terry is experienced in the design and installation of large TCP/IP based networks and is a successful network protocol instructor. He is the second Cisco Certified Internetworking Expert (CCIE) #1026 and the first outside of Cisco. He enjoys membership on the Vanderbilt University Engineering School’s Industrial Advisory Board and the IEEE.