Security Visualization - State of 2010 and 2011 Predictions

At the recent SANS Incident Response and Log Management Summit, I was part of a panel on security visualization. As an introduction, I presented the attached slides on security visualization trends and where we are today.
I looked at four areas of security visualization: Data, Cloud, Tools, and Security. I started by looking at the log maturity scale that I developed a while ago. Barely any of the companies present could place themselves to the right of the correlation point. It's sad, but probably everyone expected it. We have a long way to go with log analysis!

Data

It's very simple: if you don't have the data, you cannot visualize it. A lot of companies are still struggling to collect the necessary data. In some cases, the data is not even available because the applications do not generate it. This is where data analysis and security people have to start voicing their needs to application owners and developers so that the data they need gets generated. In addition, developers and security people have to communicate more to learn from each other. Ideally, it is not even the security folks who visualize and analyze the application logs, but the application people themselves. Just a thought!
What we will see next year is that the Big Data movement is going to enable us to crunch more and bigger data sets. Hopefully 2011 will also give us an interoperability standard that is going to ease log analysis.

Cloud

What does the cloud have to do with security visualization? Well, it has to do with processing power and with application development. Applications generate logs and logs are used for security visualization. Cloud services are new pieces of software that are being developed. We have a chance here to build visibility into those applications, meaning we have an opportunity to educate these developers to apply logging in the right way.
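To make that concrete, here is a minimal sketch, in Python, of what "applying logging in the right way" could look like: one event per line, with every value explicitly labeled, so a downstream analysis or visualization tool can parse it without guesswork. The field names and the login example are purely illustrative assumptions, not any standard.

```python
import logging

# A minimal sketch of parseable key=value application logging.
# The field names (event, user, src_ip, outcome) are illustrative.
logging.basicConfig(
    format="%(asctime)s app=%(name)s severity=%(levelname)s %(message)s",
    level=logging.INFO,
)
log = logging.getLogger("webshop")

def handle_login(user, src_ip, ok):
    # One event per line, every value labeled, trivially machine-readable.
    log.info("event=login user=%s src_ip=%s outcome=%s",
             user, src_ip, "success" if ok else "failure")

handle_login("alice", "10.0.0.5", True)
handle_login("mallory", "198.51.100.7", False)
```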
Next year we will see a lot of companies rolling their own log analysis systems based on big data technology such as Hadoop. We have seen a number of companies doing this already in 2010: Facebook, LinkedIn, Netflix, Zynga, etc. Traditional log management solutions just don't scale to these companies' needs. This will continue next year.
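For readers who haven't seen the pattern, here is a minimal sketch of the kind of job such a home-grown system runs, following the Hadoop Streaming convention of a mapper and reducer that read stdin and write stdout. The whitespace-separated log layout, with the source IP in the first column, is an assumption for illustration.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming sketch: count log events per source IP.
# Pass this file to a streaming job as the -mapper (with argument "map")
# and as the -reducer; the log layout is an assumption for illustration.
import sys

def mapper():
    for line in sys.stdin:
        fields = line.split()
        if fields:                       # skip blank lines
            print("%s\t1" % fields[0])   # emit source IP and a count of 1

def reducer():
    current, total = None, 0
    for line in sys.stdin:               # input arrives sorted by key
        key, count = line.rstrip("\n").split("\t")
        if key != current:
            if current is not None:
                print("%s\t%d" % (current, total))
            current, total = key, 0
        total += int(count)
    if current is not None:
        print("%s\t%d" % (current, total))

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```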

Tools

By tools I mean security visualization tools. We are absolutely nowhere with this. There are a couple of simple tools out there, but there is no tool that really does what we need: brushing, linked views, support for large data sets, ease of use, context, etc.
Next year won't really change anything in this area. What we will see is more and more tools being built on the Web. The cloud movement is partly responsible for this push, but so is the broad adoption of HTML5 with all of its goodness (e.g., WebSockets, Canvas). We will see advances in the social space with regard to visualization tools, and security will continue utilizing those tools to analyze security data. It's not ideal, because these tools are not meant for this, but hey, better than nothing! Maybe this will help create awareness and surface some interesting use cases for security visualization.
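As an illustration of the Web push model behind these tools, here is a minimal sketch that streams fabricated log events to a browser-based (Canvas) front end over WebSockets. It assumes the third-party Python "websockets" package; the event payload is invented, and a real tool would tail an actual log source.

```python
import asyncio
import json
import random

import websockets  # third-party package: pip install websockets

async def push_events(ws):
    # A real tool would tail a log source; here we fabricate events.
    while True:
        event = {"src_ip": "10.0.0.%d" % random.randint(1, 254),
                 "severity": random.choice(["info", "warn", "alert"])}
        await ws.send(json.dumps(event))  # the browser draws it on a canvas
        await asyncio.sleep(1)

async def main():
    async with websockets.serve(push_events, "localhost", 8765):
        await asyncio.Future()  # serve forever

asyncio.run(main())
```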

Security

What will we see in security visualization? Well, as we saw earlier, we don't have the data. What that means is that we haven't really had a chance to learn how to visualize that data. And because we didn't have that chance, we don't really understand our data. Read that again. I think this is an important point!
Next year will give us more bad security visualization examples. And I am lumping product displays into this. Have you looked at your tool lately? During the SANS summit, I had a chance to look at some of the vendors' dashboards. They are horrible: 3D charts, no legends, bad choices of colors, non-actionable dashboards, etc. Note to log management vendors: I offer a security visualization class; you might want to consider taking it! But back on topic. Visualization, just like security, will stay an afterthought. It gets added when everything else is already in place. We know how that generally turns out.

I know, I am painting a gloomy picture. Hopefully 2011 will have some surprises for us!

Network and Security Data Visualization Tool Vendor

If you are looking for security data visualization tools, you may want to consider Edge Technologies, Inc. (I work for them.) We do a lot of data visualization work in the defense/intelligence and telco space. Most of the data we visualize comes from real-time network and cyber security monitoring and management tools. There are a ton of visualization tools out there, including open source, but I'm not aware of any that share our focus on network/cyber security data and combine existing web content from underlying tools with new dynamic data visualizations to create a unified interface. If you are curious, check out our demo at www.edgeti.com or reach out to me at becky.land@edgeti.com.

Couple of other thoughts -

Data - Interoperability of log formats: tools should be able to deal with the tcpdump standard; in other words, an existing format that can log any part of the packet, or none of it. In terms of column separators, I reckon the best are the old ones again - CSV and TAB. That keeps it simple, plus it wouldn't be difficult to add options to a parsing application to use any preferred separator - such as a new line per column/value, or whatever suits the individual concerned - since separators can be changed around at any time.
Regular expressions work best for parsing, and could relatively easily be tied into a GUI app with Boolean option buttons, if that hasn't already been done - I mean that mostly for folks who don't know regex syntax. Grep already has options to colour the returned values in searches according to specified parameters, and of course Ethereal/Wireshark has done loads with tcpdump already in terms of logging options and displays.
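As a rough illustration of that separator point, here is a minimal Python sketch that parses log lines with a regex and re-emits them with whatever column separator the user prefers; the sample log layout and field names are assumptions.

```python
import csv
import re
import sys

# Parse log lines with a regex, then re-emit them with a user-chosen
# separator (comma, tab, or anything else). The pattern and field names
# are assumptions for illustration.
LINE_RE = re.compile(r"(?P<ts>\S+ \S+) (?P<host>\S+) (?P<msg>.*)")

def reparse(lines, separator=","):
    writer = csv.writer(sys.stdout, delimiter=separator)
    writer.writerow(["timestamp", "host", "message"])
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # silently skip lines that don't match the pattern
            writer.writerow([m.group("ts"), m.group("host"), m.group("msg")])

sample = ["2010-12-20 10:15:01 fw01 denied tcp 10.0.0.5 -> 198.51.100.7"]
reparse(sample, separator="\t")  # swap the separator at any time
```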

You probably know of this already - http://www.malwaredomainlist.com/ - but in case not, they have free lists of, as the name suggests, malware domains.
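Here is a minimal sketch of how such a list could feed log analysis, assuming the list has been downloaded to a local file with one domain per line (their actual export formats vary); the file name and log lines are invented for illustration.

```python
# Flag log entries whose domain appears on a downloaded blocklist.
# File name, list format, and log lines are assumptions for illustration.

def load_blocklist(path):
    with open(path) as fh:
        return {line.strip().lower() for line in fh if line.strip()}

def flag_hits(log_lines, blocklist):
    for line in log_lines:
        domain = line.split()[-1].lower()  # assume domain is the last field
        if domain in blocklist:
            yield line

bad = load_blocklist("malware_domains.txt")  # hypothetical local copy
logs = ["2010-12-20 10:15:02 client 10.0.0.5 resolved evil.example.com"]
for hit in flag_hits(logs, bad):
    print("BLOCKLISTED:", hit)
```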

Re. Tools - Traditional graphics programming looks like the best choice for developing tools that can deal with large datasets, display real-time changes graphically, scale them, and link them to other real-time events - games have been doing this for ages. They also do dashboards very well.

Take any multiplayer networked game - any genre would do, but focus on the FPS types (run-around-with-a-gun games) because they are more fast-paced, requiring lots of updates to the server from each player and then subsequent updates to each player's individual machine about what the other players are doing. The structure is already there in the code - these games tend to have real-time maps that show the locations of other players and changes to those players' status, and the maps already show up as on-screen windows as part of the visible dashboard. Substitute, for example, the individual gamers' machines with different nodes collecting logs in a network, with each node also receiving updates on the other nodes' status - which people can then view as an easy graphical representation (the map status display) from any of those nodes' management dashboards. Instead of the game, the server is running, say, an IDS, and the nodes are correlating data in real time according to the IDS ruleset, while of course also saving the log data for offline use.
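Here is a minimal Python sketch of that game-style architecture: each monitoring node pushes small status updates to a central collector, which keeps a live "map" of node states for any dashboard to render. The UDP transport, port number, and status fields are all assumptions.

```python
import json
import socket

PORT = 9999  # arbitrary choice for this sketch

def send_status(node_id, alerts):
    """Runs on each node, like a game client reporting its position."""
    msg = json.dumps({"node": node_id, "alerts": alerts}).encode()
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(msg, ("127.0.0.1", PORT))

def collector():
    """Runs centrally, like the game server updating the shared map."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", PORT))
    node_map = {}
    while True:
        data, _addr = sock.recvfrom(4096)
        status = json.loads(data)
        node_map[status["node"]] = status["alerts"]
        print("map:", node_map)  # a real dashboard would draw this instead
```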

Related to that is the visualisation hardware itself - especially when it comes to the big data that is going to need multiple monitors. I looked into this a bit in a degree project I did, and a good example of a free clustering OS with multiple-monitor support is Rocks, with the Viz Roll module added. The configuration isn't too complex - with that build you're looking at designating monitors to each processing node, and you can split them up in varying ways, such as n displays per processing node. It just gives more screen real estate to work on.
Rocks also recently teamed up with Windows HPC, so there's already groundwork there for interoperability too.

(This is a large file of many megabytes; a cached Google HTML version, without the images, is also available - http://www.rocksclusters.org/roll-documentation/viz/5.3/roll-viz-usersguide.pdf )

Another visualisation tool is Vampir - it's used for analysing what cluster nodes are up to, the messages they are passing to one another, and so forth. It has a granularity scale, though, that is very well suited to the kind of visualisation tools required here, in that it can zoom right in and out, and it shows this in a well-labelled graphical display - from, say, the node-communications level down to the raw data itself.

http://www.vampir.eu/

Another area to look to is music visualisers - again, the code is already there; it's just displaying different datasets from what's required for log analysis and so forth. I mean software such as SoundSpectrum's (like WhiteCap), or what ships with the Xbox 360, that matches music to a visual display. These look useful for pattern analysis - think of it like the DoomCube, but with more than three axes available. What that could be useful for is having very recognisable 3D waveforms for safe traffic status, so that when anything untoward takes place there is a very discernible change to the overall waveform. And these already work as animations too, in the sense that the coding is time-based - alter the code, and watch the data traffic playing out rather than the representation of the music.
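To make the waveform idea tangible, here is a minimal Python sketch that bins events per second and renders the counts as a simple waveform, so that a burst of untoward traffic visibly deforms the shape. The timestamps are invented, and the ASCII rendering stands in for the animated 2D/3D display described above.

```python
from collections import Counter

def waveform(event_seconds, width=50):
    # Bin events by second and draw one bar per second, scaled to the peak.
    counts = Counter(event_seconds)
    peak = max(counts.values())
    for second in sorted(counts):
        bar = "#" * int(counts[second] / peak * width)
        print("%6d | %s" % (second, bar))

# Steady baseline traffic with a burst at t=3 that deforms the waveform.
events = [0, 0, 1, 1, 2, 2] + [3] * 20 + [4, 4, 5, 5]
waveform(events)
```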