Charles Leaver – Using Edit Distance Is Vital Part 1

Written By Jesse Sampson And Presented By Charles Leaver CEO Ziften


Why do attackers keep using the same tricks over and over? The simple answer is that they still work. For instance, Cisco’s 2017 Cybersecurity Report tells us that after years of decline, spam email with malicious attachments is again on the rise. In that conventional attack vector, malware authors typically mask their activities by using a filename similar to that of a common system process.

There is not necessarily a connection between a file’s path name and its contents: anyone who has tried to hide sensitive information by giving it a boring name like “taxes”, or changed the extension on a file attachment to get around email rules, knows this. Malware authors know it as well, and will often name their malware to resemble common system processes. For instance, “iexplore.exe” is Internet Explorer, but “iexplorer.exe” with an extra “r” could be anything. It’s easy even for experts to overlook this small difference.

The opposite problem, known .exe files running in unusual locations, is easy to solve using string functions and SQL sets.
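As a rough illustration of that check outside the database, here is a minimal Python sketch. It is not Ziften's implementation: the `EXPECTED_DIRS` allow-list, the directory names in it, and the `unusual_locations` helper are all hypothetical, chosen only to show the idea of comparing a known executable name against the set of folders where it normally lives.

```python
import ntpath  # Windows path rules, usable on any platform

# Hypothetical allow-list: known executable name -> directories where it is expected.
EXPECTED_DIRS = {
    "svchost.exe": {r"c:\windows\system32", r"c:\windows\syswow64"},
    "iexplore.exe": {r"c:\program files\internet explorer"},
}


def unusual_locations(paths):
    """Return the paths whose file name is known but whose folder is not expected."""
    flagged = []
    for path in paths:
        # Windows paths are case-insensitive, so compare in lower case.
        directory, name = ntpath.split(path.lower())
        expected = EXPECTED_DIRS.get(name)
        if expected is not None and directory not in expected:
            flagged.append(path)
    return flagged


observed = [
    r"C:\Windows\System32\svchost.exe",
    r"C:\Users\bob\AppData\svchost.exe",
    r"C:\Tools\notepad.exe",
]
print(unusual_locations(observed))
```

Names that are not in the allow-list at all, such as the notepad.exe path here, are ignored by this check; they are the subject of the close-match problem discussed next.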


What about the other case, finding close matches to the executable name? Most people begin their hunt for close string matches by sorting data and visually looking for discrepancies. This usually works well for a small set of data, perhaps even a single system. Finding these patterns at scale, however, requires an algorithmic approach. One established technique for “fuzzy matching” is edit distance: the minimum number of single-character edits needed to turn one string into another.
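To make the idea concrete, here is a minimal Python sketch of one common form of edit distance, the Levenshtein distance, which counts single-character insertions, deletions, and substitutions. The post does not say which variant is used in production, so treat the choice of Levenshtein and the `edit_distance` name as assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    # Keep the shorter string in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(edit_distance("iexplore.exe", "iexplorer.exe"))  # 1: one inserted "r"
```

Only two rows of the table are kept in memory at a time, which is enough when only the distance itself is needed and not the sequence of edits.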

What’s the best approach to computing edit distance? For Ziften, our technology stack includes HP Vertica, which makes this task easy. The web is full of data scientists and data engineers singing Vertica’s praises, so it will suffice to say that Vertica makes it easy to create custom functions that take full advantage of its power, from C++ power tools to statistical modeling scalpels in R and Java.

This Git repo is maintained by Vertica enthusiasts working in industry. It’s not an official offering, but the Vertica team is certainly aware of it, and moreover is thinking every day about how to make Vertica better for data scientists, a great space to watch. Best of all, it includes a function to compute edit distance! There are also other tools for natural language processing here, such as word stemmers and tokenizers.

Using edit distance on the top executable paths, we can quickly find the closest match to each of our top hits. This is an interesting data set, as we can sort by distance to find the closest matches over the whole data set, or we can sort by frequency of the top path to see what the closest match is to our most commonly used processes. This data can also be surfaced on contextual “report card” pages to show, e.g., the top five closest strings for a given path. Below is a toy example to give a sense of usage, based on real data ZiftenLabs observed in a customer environment.


Setting an upper limit of 0.2 seems to find good results in our experience, but the point is that the threshold can be adjusted to fit individual use cases. Did we find any malware? We see that “teamviewer_.exe” (should be just “teamviewer.exe”), “iexplorer.exe” (should be “iexplore.exe”), and “cvshost.exe” (should be “svchost.exe”, unless perhaps you work for CVS pharmacy…) all look strange. Since we’re already in our database, it’s also trivial to pull the associated MD5 hashes, Ziften suspicion scores, and other attributes to do a deeper dive.
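The thresholding step can be sketched in Python as follows. This is an illustration rather than the Vertica query behind the results above: the post does not say how the 0.2 figure is normalized, so dividing the Levenshtein distance by the length of the longer string is an assumption, and `normalized_distance` and `close_matches` are hypothetical helper names.

```python
def edit_distance(a, b):
    """Levenshtein distance: insertions, deletions, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def normalized_distance(a, b):
    """Assumed normalization: distance divided by the longer string's length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def close_matches(observed, known, threshold=0.2):
    """Pair each unknown name with its nearest known name, if within threshold."""
    results = []
    for name in observed:
        if name in known:
            continue  # exact matches are not interesting here
        nearest = min(known, key=lambda k: normalized_distance(name, k))
        score = normalized_distance(name, nearest)
        if score <= threshold:
            results.append((name, nearest, round(score, 3)))
    return sorted(results, key=lambda row: row[2])  # closest matches first


known = ["teamviewer.exe", "iexplore.exe", "svchost.exe", "chrome.exe"]
observed = ["chrome.exe", "cvshost.exe", "iexplorer.exe",
            "notepad.exe", "teamviewer_.exe"]
for row in close_matches(observed, known):
    print(row)
```

Under this assumed normalization, all three suspicious names from the example fall below 0.2, while an unrelated name such as notepad.exe does not.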


In this particular real-world environment, it turned out that teamviewer_.exe and iexplorer.exe were portable applications, not known malware. We helped the customer with further investigation of the user and system where we observed the portable applications, since use of portable apps on a USB drive could be evidence of suspicious activity. The more troubling find was cvshost.exe. Ziften’s intelligence feeds indicate that this is a suspicious file. Searching for the MD5 hash of this file on VirusTotal confirms the Ziften data, indicating that this is a potentially serious Trojan that could be part of a botnet or doing something even more harmful. Once the malware was found, however, it was easy to resolve the problem and make sure it stays resolved using Ziften’s ability to kill and persistently block processes by MD5 hash.

Even as we develop advanced predictive analytics to detect malicious patterns, it is important that we continue to improve our capabilities to hunt for known patterns and old tricks. Just because new threats emerge doesn’t mean the old ones go away!

If you liked this post, watch this space for the second part of this series, where we will apply this approach to hostnames to detect malware droppers and other malicious websites.
