
Cleaning up MOM alerts

The company I am consulting at has had MOM installed for quite a while but no one has ever really "owned" it so alerts haven't always gotten processed like they should. I'm no MOM expert (fighting off the temptation to throw in a yo' momma comment here...) but I figured I would do some hacking around.

The first thing I noticed was that all of the servers were "Critical" in the status view, but there weren't nearly enough alerts in the Alerts view to account for that. After going through every menu 3 times looking for some special view filter (and creating a new custom view that still didn't show everything), I stumbled across the stupid clock/calendar in the upper-right corner.

Presto magic! I changed it from the default of 7 days to 1000 days and, lo and behold, I had thousands of alerts. I wanted to acknowledge them all because I figured MOM would just regenerate them if they were still valid issues that needed to be addressed. So I right-clicked to do a Select All, but alas, there is no 'Select All' option. Surely that option is in the menu somewhere, right? Nope. OK, I'll just select the first record, scroll to the bottom and do a Shift-Select. No dice there either. MOM popped up some stupid message about exceeding the limit for how many records can be selected at once. I went through the process of selecting the maximum number of records and acknowledging them about 10 times and (based on how far the scroll bar had moved) figured it would take me about 3 days to get them all acknowledged.

Now, being a programmer, I figure I can pull it off via scripting or SQL commands but I always like to see if there is a built-in mechanism first. I stumbled across the "Data Grooming" settings and it looked like adjusting them should cause the records to purge themselves automatically (at least anything older than 45 days). I set everything up to auto-resolve within 30 days and for all data older than 45 days to be groomed and let it run for a couple of days. Unfortunately, this didn't seem to work. I don't know if it just needs more time to run but I got impatient and decided to do things "My Way".

First off, let's open up the database and see how the data is stored. I checked a few tables to figure out how everything was cross-referenced (after dealing with the SMS database I never expect things to be organized), but I was pleasantly surprised to see that everything I needed to work with was in a single table named "Alert". I didn't run across it until after I had everything fixed, but Microsoft has a great breakdown of the MOM database/object properties.

Long story short, after a bit of tinkering with manual SQL changes to individual records to make sure everything worked the way I thought it did, I ended up running the following script in Query Analyzer:

UPDATE alert SET ResolutionState = 255 WHERE ResolutionState <> 255 AND Culprit = 'Citrix Metaframe'

Basically, it just takes all of the records that are still open and marks them as resolved (255 is the "Resolved" state; check the MS link above for a list of all possible values). I chose to limit my changes to alerts that came from 'Citrix Metaframe' because they made up about 95% of the total. Apparently the MOM Citrix Management Pack alerts every time a printer mapping fails, which happens ALL OF THE TIME. Someone had shut off the notifications because they were tired of getting e-mailed, but obviously all of the MOM alerts were still lurking out there, hidden by the 7-day filter that the Alerts view defaults to.
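For anyone trying the same thing, here is a rough sketch of how the update could be checked before it is made permanent. It is not what I actually ran. It assumes the same table and column names as my statement (alert, ResolutionState, Culprit) and assumes that "still open" simply means the state is anything other than 255, so verify both against your own database first.

```sql
-- Sketch only: alert, ResolutionState and Culprit are the names used in the
-- update statement; "<> 255" as the test for an open alert is an assumption.

-- 1. See which sources account for the outstanding alerts.
SELECT Culprit, COUNT(*) AS OpenAlerts
FROM alert
WHERE ResolutionState <> 255
GROUP BY Culprit
ORDER BY COUNT(*) DESC

-- 2. Run the update inside a transaction so the row count can be
--    compared with step 1 before anything is committed.
BEGIN TRANSACTION

UPDATE alert
SET ResolutionState = 255
WHERE ResolutionState <> 255
  AND Culprit = 'Citrix Metaframe'

SELECT @@ROWCOUNT AS RowsChanged

-- 3. If RowsChanged matches the 'Citrix Metaframe' count from step 1,
--    run COMMIT TRANSACTION; otherwise run ROLLBACK TRANSACTION.
```

The transaction holds locks on the table until it is committed or rolled back, and MOM is presumably still writing new alerts in the meantime, so don't leave it open while you go get coffee.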

So, I learned a bit more about MOM and I am on my way to cleaning up a bunch of other issues with their MOM setup. I am down to a much more manageable 150 active alerts that I need to review and decide how to handle.