Two days before New Year's, something interesting happened in the world of cyber security. The Department of Homeland Security released a report on hacking activities by Russian Intelligence Services against the U.S. Government. The report was somewhat interesting; however, DHS also released a set of indicators in a .csv file with 956 lines of data. As the CEO of a new cyber security startup focused on using data in smarter, more interesting ways, this data tugged and pulled at me in a way that I did not expect. Over the next two days, in between (and through) family events, football games, and dogs grabbing food off of the counters, I sat on a stool in my in-laws' kitchen and tuned out the world. There was something about this analysis that I could not ignore.
“On October 7, 2016, the Department Of Homeland Security (DHS) and the Office of the Director of National Intelligence (DNI) issued a joint statement on election security compromises. DHS has released a Joint Analysis Report (JAR) attributing those compromises to Russian malicious cyber activity, designated as GRIZZLY STEPPE.”
This post summarizes some of the findings of the more detailed report that Dark Cubed published on New Year’s Eve 2016, titled “A Brief Analysis of the Cyber Indicators Related to GRIZZLY STEPPE.”
Involvement of Tor-Related Infrastructure
The first thing that stood out to me was the prevalence of infrastructure related to the Tor network. I stumbled on this as something significant during my reverse DNS analysis of the IPs released. Many of the DNS entries that came back included strings such as tor-exit or tor, which intrigued me. I then quickly jumped over to a popular list of Tor-related IP addresses maintained at https://www.dan.me.uk/tornodes.
Of the 876 IP addresses released by DHS, 211 appear on the list of all Tor nodes, which is right around 24% of the indicators. Looking at it from the other direction, as of the time of writing, there were 6,909 IP addresses on that list of Tor nodes. So the DHS list of IP addresses contained approximately 3% of all Tor nodes. While that is a small percentage, it is a significant number.
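The overlap figures above come down to a set intersection and two ratios. Here is a minimal sketch in Python, assuming both lists have already been loaded as plain IP strings; the function name `tor_overlap` is hypothetical and this is not necessarily how the original analysis was run.

```python
def tor_overlap(indicator_ips, tor_node_ips):
    """Count IPs present in both lists and express the overlap both ways."""
    indicators = set(indicator_ips)
    tor_nodes = set(tor_node_ips)
    shared = indicators & tor_nodes

    def percent(part, whole):
        # Guard against an empty list so the sketch never divides by zero.
        return 100 * part / whole if whole else 0.0

    return {
        "shared": len(shared),
        "pct_of_indicators": percent(len(shared), len(indicators)),
        "pct_of_tor_nodes": percent(len(shared), len(tor_nodes)),
    }


# Arithmetic check using the counts reported in this post.
print(round(100 * 211 / 876, 1))   # 24.1
print(round(100 * 211 / 6909, 1))  # 3.1
```

Because the Tor node list changes constantly, the result depends on when the list was downloaded.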
Now, numbers and statistics are interesting, but visualization lets this really sink in. Much of the time I spent analyzing the data released in the JAR was in building graph visualizations using Gephi. The image below shows the result of the reverse DNS analysis and the influence of Tor:
On the left-hand side, we see the reverse DNS data set visualized. The items are grouped by the associated TLD (e.g. .com, .net, .ru). The relationships in the graph flow from TLD -> Domain Name -> Fully Qualified Domain Name -> IP -> JAR. On the right-hand side, we have colored Tor-related IPs, domain names with Tor in them, and their parent nodes red. It is shocking to see in one glance how heavily the data is influenced by the Tor network. The bottom line from this analysis is a simple question: Even if an attack is routed through Tor, should the Government (or anyone else for that matter) be releasing the related Tor nodes as a part of a data set such as this?
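For readers who want to build a similar graph, the TLD -> Domain Name -> Fully Qualified Domain Name -> IP -> JAR chain described above can be sketched as an edge list. The helper names below are hypothetical, the sample hostname and address are placeholders, and the "last two labels" rule for finding the domain is a simplifying assumption, not the method used for the original graphs.

```python
import csv
import io


def hostname_edges(fqdn, ip, root="JAR"):
    """Return (parent, child) edges for TLD -> domain -> FQDN -> IP -> root."""
    name = fqdn.rstrip(".").lower()
    labels = name.split(".")
    tld = "." + labels[-1]
    # Assumption: the registered domain is the last two labels. This is wrong
    # for suffixes such as .co.uk; a public-suffix list would be more accurate.
    domain = ".".join(labels[-2:])
    edges = [(tld, domain)]
    if name != domain:
        edges.append((domain, name))
    edges.append((name, ip))
    edges.append((ip, root))
    return edges


def edges_to_csv(edges):
    """Serialize edges as a Source,Target CSV, dropping duplicate edges."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target"])
    for source, target in dict.fromkeys(edges):  # keeps first-seen order
        writer.writerow([source, target])
    return buffer.getvalue()


# Placeholder reverse DNS result, not taken from the JAR data.
print(edges_to_csv(hostname_edges("tor-exit.example.net", "192.0.2.10")))
```

A CSV with Source and Target columns is a common edge-list layout that Gephi's spreadsheet import can read, which is why the sketch uses it.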
Analysis of Geolocation
In the released .csv file, DHS identified the countries associated with each IP. This led me down the road of exploring what insights could be gleaned from a geolocation-based analysis. The first step, of course, was to simply map the IP addresses on Google Maps using the MaxMind database. The result was not necessarily helpful, but interesting nonetheless.
A quick glance shows that this data set covers quite a bit of territory. For a more detailed analysis, I jumped back into building out some visualizations in Gephi, a painstaking, methodical process.
“These cyber operations have included spearphishing campaigns targeting government organizations, critical infrastructure entities, think tanks, universities, political organizations, and corporations leading to the theft of information” — JAR-16–20296
The result was fascinating and in some ways looks like a daisy. Like the map above, the graph reveals that this is a pretty diverse data set, but there are a few very intriguing nuggets:
The first thing that stood out to me immediately was China and the Republic of Korea. Both countries were featured, yet were tied only to infrastructure that the JAR identified as Command and Control (C2). Looking at Russia, there was of course a significant number of nodes; however, only one was identified as C2. We see the same C2-only effect in countries represented by a smaller number of nodes, such as Puerto Rico (3), Thailand, and Hong Kong. What this means exactly is not clear, but it is definitely an interesting result and worth more consideration.
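The "C2 only" pattern described above can be checked with a simple per-country tally. This is a minimal sketch, assuming each indicator has been reduced to a (country, is_c2) pair; that input shape and the function name are hypothetical and do not reflect the actual column layout of the DHS file.

```python
from collections import defaultdict


def c2_only_countries(rows):
    """Return countries whose indicators are all flagged as C2."""
    totals = defaultdict(int)
    c2_counts = defaultdict(int)
    for country, is_c2 in rows:
        totals[country] += 1
        if is_c2:
            c2_counts[country] += 1
    # A country qualifies only when every one of its indicators is C2.
    return sorted(c for c in totals if c2_counts[c] == totals[c])


# Made-up sample rows for illustration, not values from the JAR data.
sample = [("AA", True), ("AA", True), ("BB", True), ("BB", False), ("CC", False)]
print(c2_only_countries(sample))  # ['AA']
```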
Analysis of Organizations
A different way to look at the data set involves evaluating the organizations that are related to the IP addresses themselves. Again, it took a significant amount of time to build the visualizations in a way that made them worthwhile, but the result was fascinating:
This view shows which organizations account for more of the IP addresses in the data set than others. Groups like the Russian broadband provider Scartel Ltd. are very prominent. We also see the influence of cloud providers such as Online S.A.S. (online.net) and Ovh Systems (ovh.com). We go into more detail in the paper, but the other interesting finding here is the three organizations associated with China that appear to be related to the C2 infrastructure discussed above.
Dark Cubed Analysis
The final, and most detailed, section of our analysis relates to proprietary data that Dark Cubed has been collecting for almost a year now. A unique part of our offering is that we provide a real-time, fully anonymous information sharing network capability to our customers that enables shared analytics without revealing customer identities. This provides us with a unique and interesting data set that lets us compare real network activity with sets of indicators like those released with GRIZZLY STEPPE.
This data provides a significant benefit when evaluating such data sets because it lets us mash up suspected bad activity with real activity to see where indicators might be too noisy. The graph below highlights the organizations related to GRIZZLY STEPPE, weighted by the volume at which Dark Cubed customers observed those indicators:
By using real-world data to analyze these data sets, we can instantly see the introduction of unwanted noise. Why are indicators associated with Yahoo, Twitter, Google, Microsoft, and EdgeCast (Verizon) included as something that network administrators should be looking for? At best, this creates a rabbit hole with no value. At worst, noisy threat intelligence companies might introduce these indicators into their data sets and create a world of headaches for companies both large and small.
The second analysis has to do with sizing the nodes based on the number of customers that observed the indicator. This provides us with a different perspective on “noisy” indicators:
We see above a relatively tight grouping in the middle associated with a group of C2 and non-C2 nodes that were seen on a large number of Dark Cubed customer networks. We also see a majority of the indicators were scattered across customers in a relatively balanced fashion.
Based on our initial look into the data, this is the result of broad, unfocused scanning activity that occurs from these IP addresses on a regular basis. In our report we dig into more detail on the most broadly seen indicators, but (spoiler alert) they just appear to be noise. In fact, four of the largest “C2” nodes in the graph above are simply Yahoo servers.
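Sizing nodes by how many customers observed an indicator amounts to counting distinct observers per indicator. Here is a minimal sketch, assuming observations arrive as (customer_id, indicator) pairs; that shape and the function name are hypothetical and say nothing about how Dark Cubed actually stores its data.

```python
from collections import defaultdict


def customers_per_indicator(observations):
    """Map each indicator to the number of distinct customers that saw it."""
    seen_by = defaultdict(set)
    for customer_id, indicator in observations:
        # A set ignores repeat sightings by the same customer.
        seen_by[indicator].add(customer_id)
    return {indicator: len(customers) for indicator, customers in seen_by.items()}


# Made-up observations using documentation-range addresses.
sample = [
    ("cust-1", "192.0.2.10"),
    ("cust-1", "192.0.2.10"),
    ("cust-2", "192.0.2.10"),
    ("cust-2", "198.51.100.5"),
]
print(customers_per_indicator(sample))
```

Counting distinct customers rather than raw hits keeps one very chatty network from inflating a node, which matters when the goal is to spot indicators that are broadly seen.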
The third and final analysis worth sharing outside of our more detailed report overlays the scores that we calculated for our customers when they observed these indicators on their own networks. Our scores range from low risk, high confidence to high risk, high confidence. The graph below shows the results of that analysis:
As we can quickly see, most of the infrastructure associated with the JAR was already known to be suspicious for a number of reasons. The items in the middle that are scored neutral (yellow) or low risk (green) are Yahoo and Google servers, by the way.
This is important to note because it indicates how noisy this data set really is. We cannot make a determination on whether the infrastructure was or was not used by the Russian Intelligence Services (RIS), because we have not seen the incident response data. However, we can say with certainty that the infrastructure related to GRIZZLY STEPPE and observed by Dark Cubed customers was not used exclusively by RIS threat actors.
Summary
In closing, this project was a fascinating deep dive into the power of data analytics and reinforced to me the necessity of using real network data to help remove noise from data sets such as the one released in association with GRIZZLY STEPPE. It also reinforced our vision of delivering threat intelligence and predictive analytics at scale to companies of all sizes in a vehicle they can actually afford and use.
As we continue to build and grow our early stage startup, we hope to be able to continue to contribute more data-driven analytics to help filter out the noise and allow organizations to protect what matters most.
This article was originally published on www.hackernoon.com.