I’ve been toying with the idea of a sort of “brain” process that keeps tabs on the systems I watch. For recurring problems with consistent solutions, my hope is that it can eventually take care of them without human intervention.
- Attracting or sending too much traffic to/from a peer? Query a snapshot of cFlow records and pick some CIDR blocks to block accordingly.
- Inbound DDoS attack on a single destination? Sink the traffic to a mitigation device up to a certain threshold of collateral damage. Past that, automatically inject a /32 route tagged with a BGP blackhole community upstream.
- Route being hijacked? Determine hijacked block size, and slice up an aggregate into more specific routes where possible.
- Router CPU spiking again? Capture a process list as it happens.
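Two of the rules above are simple enough to sketch in Python using only the standard `ipaddress` module. This is a rough illustration, not a finished design: the function names, the `max_prefixlen=24` cutoff (an assumption that upstreams will not accept anything longer than a /24), and the idea of measuring collateral damage as a single number are all hypothetical.

```python
import ipaddress


def more_specifics(aggregate, hijacked, max_prefixlen=24):
    """Split a hijacked block into two halves that are one bit more specific.

    Returns an empty list when the hijacked block is already as long as
    max_prefixlen (assumed here to be the longest prefix upstreams accept),
    since no more specific announcement is possible in that case.
    """
    agg = ipaddress.ip_network(aggregate)
    bad = ipaddress.ip_network(hijacked)
    if not bad.subnet_of(agg):
        raise ValueError(f"{bad} is not inside our aggregate {agg}")
    if bad.prefixlen >= max_prefixlen:
        return []
    return [str(net) for net in bad.subnets(prefixlen_diff=1)]


def ddos_action(target_ip, collateral, max_collateral):
    """Pick a response for an inbound attack on a single destination.

    `collateral` is a hypothetical measure of how much legitimate traffic
    is being hurt by sinking; past the threshold we fall back to a
    blackhole host route.
    """
    host_route = str(ipaddress.ip_network(target_ip))  # e.g. "192.0.2.10/32"
    if collateral <= max_collateral:
        return ("sink", host_route)
    return ("blackhole", host_route)


print(more_specifics("10.20.0.0/16", "10.20.4.0/22"))
# ['10.20.4.0/23', '10.20.6.0/23']
print(ddos_action("192.0.2.10", collateral=11, max_collateral=10))
# ('blackhole', '192.0.2.10/32')
```

Actually announcing the routes or tagging them with a blackhole community is left out on purpose; that part depends entirely on the routers and the upstream's policy.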
I’m envisioning a set of Ruby or Python libraries that can interact with the different systems and tools in a monitored environment, make some sane choices about what to do, and at the very least, be smart about alerting the administrator. I would much rather receive Nagios and Cacti threshold alerts over XMPP/Jabber than SMS while I’m in front of a real keyboard.
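The "be smart about alerting" part could start out as small as this Python sketch. Everything here is made up for illustration: the `Alert` fields, the `admin_online` flag (which I assume would come from something like XMPP presence), and the `senders` mapping of channel names to callables that would wrap a real XMPP client or SMS gateway.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Alert:
    source: str   # e.g. "nagios" or "cacti"
    host: str
    message: str


def pick_channel(admin_online: bool) -> str:
    """Prefer XMPP while the admin is at a keyboard, SMS otherwise."""
    return "xmpp" if admin_online else "sms"


def route_alert(alert: Alert, admin_online: bool,
                senders: Dict[str, Callable[[str], None]]) -> str:
    """Format the alert once and hand it to the chosen channel's sender."""
    channel = pick_channel(admin_online)
    senders[channel](f"[{alert.source}] {alert.host}: {alert.message}")
    return channel


# Stand-in senders that just collect messages; real ones would talk to
# an XMPP server or an SMS gateway.
sent = {"xmpp": [], "sms": []}
senders = {"xmpp": sent["xmpp"].append, "sms": sent["sms"].append}

alert = Alert("nagios", "core-rtr1", "CPU above threshold")
route_alert(alert, admin_online=True, senders=senders)
print(sent["xmpp"])  # ['[nagios] core-rtr1: CPU above threshold']
```

Keeping the channel choice separate from the senders means a new delivery method only needs one more entry in the mapping.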
Come to think of it, receiving alerts over SMS isn’t that great anyway. Email-to-SMS gateways add a lot of garbage to the message, and the messages show up as coming from different senders, so they never seem to thread on my phone. Having a programmatic way to access alert data would make it possible to gather it into a ‘Systems Alerts’ type of app on a mobile phone.
Here’s what I was brainstorming in OmniGraffle:
