DIY DNS firewall

LARG*feed is a high-confidence IP address threat feed created in-house with over 200,000 unique entities. DDoS attacks against LARG*net members, along with data collected by high-confidence sources like honeypots, provide a significant amount of the data that we incorporate into the feed. The most important feature of this system is its confidence level: we are confident that the IPs originating these attacks belong in the feed because they have already participated in an attack.

Threat feeds are typically measured by threat and confidence. A low threat is typically an unsophisticated attacker that’s mostly being blocked already. A medium threat is an actual exploit attempt. A high threat is typically an APT group as defined by the MITRE ATT&CK framework. Threat level is very rarely included in threat feeds, so we do not measure threat for LARG*feed.

Confidence level helps gauge and reduce the likelihood of false positives. Known attackers are automatically assigned a high confidence level since we know they’ve attacked before. Only high-confidence entries are added to LARG*feed to ensure we aren’t blocking unnecessarily. Other feeds don’t always indicate their confidence level, so be wary of these lists: they end up being more of an “indicators of compromise” list that requires further investigation. Such a list is not useful for blocking threats outright but could be part of your defense-in-depth solution, provided you have the resources to devote to these investigations.

DNS Firewall

URLhaus is a fantastic free resource that provides many feeds. We’re going to focus on DNS Response Policy Zone (RPZ), also known as DNS firewall, which allows you to block the resolution of certain domain names on your DNS resolver. URLhaus extracts domain names from malware URLs and offers them as an RPZ dataset conveniently formatted as a drop-in BIND DNS zone. This is a high-confidence feed because it blocks domains that are actively distributing malware.
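To make the RPZ idea concrete, here is a minimal sketch of what the policy records look like. The URLhaus dataset already arrives in this shape, so this is only an illustration for a hand-built list; the function name and the example domains are hypothetical. In an RPZ zone, a record of the form "name CNAME ." tells the resolver to answer NXDOMAIN for that name.

```python
def rpz_records(domains):
    """Return RPZ lines that answer NXDOMAIN for each domain and its subdomains.

    Hypothetical helper for illustration; not part of URLhaus or BIND.
    """
    # Normalise case, drop trailing dots and blanks, and de-duplicate.
    cleaned = {d.strip().lower().rstrip(".") for d in domains if d.strip()}
    lines = []
    for domain in sorted(cleaned):
        lines.append(f"{domain} CNAME .")    # block the domain itself
        lines.append(f"*.{domain} CNAME .")  # block every subdomain
    return lines


# Placeholder domains, not real feed entries.
example = rpz_records(["Malware.example", "bad.example."])
print("\n".join(example))
```

The resulting lines would sit below the SOA and NS records of the policy zone; the zone header and the resolver configuration that activates the zone are left out of this sketch.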

Keep in mind that DNS Firewall does not protect against phishing or adware, as perpetrators of this type of cyber malice are harder to pin down with confidence. The most important feature of DNS Firewall is its confidence level, so these categories are omitted to maintain it.

Phishing

Symantec’s 2019 Internet Security Threat Report indicates 65 percent of attacker groups use phishing as their primary infection vector. While open phishing threat feeds do exist and there are some good ones, they aren’t licensed for commercial use. Most are either unclear about confidence level or are too small to be effective. I have not been able to evaluate any commercial phishing threat feeds.

I had to try something to address the phishing threat, so I attempted a technological approach and Python came to my rescue. I pulled all the feeds I could find and looked for any intersections across them. I surmised that a domain appearing in multiple feeds must be illegitimate, so I planned to build my own list from all the available data. Unfortunately there were practically no overlaps: a potential pool of a million domains had only 65 intersections. This is all the more surprising when you imagine the finite number of threats on the internet as a large circle, with the threat feeds, regardless of size, comprising smaller portions within the whole.
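The intersection check described above can be sketched roughly as follows. This is not the original script, only a minimal reconstruction of the idea; the feed names and domains are made up, and each feed is assumed to have been reduced to a plain list of domain names already.

```python
from collections import Counter


def domains_in_multiple_feeds(feeds, min_feeds=2):
    """Return the domains that appear in at least `min_feeds` different feeds."""
    counts = Counter()
    for entries in feeds.values():
        # Use a set per feed so a domain repeated within one feed counts once.
        counts.update({d.strip().lower() for d in entries if d.strip()})
    return {domain for domain, n in counts.items() if n >= min_feeds}


# Hypothetical feeds with placeholder domains.
feeds = {
    "feed_a": ["login.example", "bank.example"],
    "feed_b": ["bank.example", "shop.example"],
    "feed_c": ["Bank.example", "mail.example"],
}
overlap = domains_in_multiple_feeds(feeds)
print(sorted(overlap))
```

Raising min_feeds makes the resulting list stricter, at the cost of making it even smaller.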

This scenario presents a remarkable problem: not only do we fundamentally not know how large the Total Threats circle is, but we also can’t estimate the percentage of false positives within the individual feeds. Such low intersection rates mean we are missing the big picture. How can these phishing threat feeds operate independently and almost never come across the same threats? The total number of phishing threats must be immense for there to be practically no overlap.

I tried a few other techniques after hitting that dead end, but ultimately I don’t believe a technological approach will work. Fundamentally the issue is not technological in nature but rather speaks to quality control standards like ISO 9001. The usual control procedure takes 20 quality measurements on each feed daily for at least 4 weeks so you can measure the standard deviation in your rated quality. I queried each entry of the list every half second. Each cycle of testing took about 95 hours, so the whole process took more than 4 weeks, but I eventually identified valuable phishing feeds. After that initial evaluation, retesting can be less rigorous.
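As a rough illustration of the measurement step, the spread in a feed’s rated quality can be summarized with the mean and sample standard deviation. The scores below are invented, and how a single quality score is produced (for example, the fraction of sampled entries judged valid) is an assumption, not something the process above prescribes.

```python
import statistics


def summarize_quality(scores):
    """Return (mean, sample standard deviation) for a feed's quality scores."""
    if len(scores) < 2:
        raise ValueError("need at least two measurements to estimate spread")
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical scores for one feed, one value per measurement.
scores = [0.80, 0.82, 0.78, 0.81, 0.79]
mean, spread = summarize_quality(scores)
print(f"mean={mean:.3f} stdev={spread:.3f}")
```

A feed whose scores show a small, stable spread is easier to trust than one with the same mean but large swings from one measurement to the next.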

Unfortunately I cannot list which are good or bad; the list would quickly become outdated and stop being useful. Quality changes over time, so one must continuously measure and monitor the feeds.

Here’s an even larger monkey wrench for quality: what happens when I pull the phishing threat feeds and query every entry against 8.8.8.8, Google’s public DNS, which does not do any filtering.

Total threat feed size: 789,436

Resolved: 179,261

The majority of the phishing domains in the feeds are not live entries. These feeds don’t do anything if the domain itself doesn’t get resolved. The entries that don’t resolve are effectively false positives, as there is no threat from a domain that doesn’t resolve.
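The liveness check behind those numbers can be sketched as below. This is an assumption-laden sketch, not the original script: it uses the operating system’s configured resolver through the standard library rather than querying 8.8.8.8 directly (which would need a DNS library), the function names are hypothetical, and the lookup is injectable so the logic can be exercised without network access.

```python
import socket


def resolves(domain, lookup=socket.getaddrinfo):
    """Return True if `domain` resolves to at least one address."""
    try:
        lookup(domain, None)
        return True
    except (OSError, UnicodeError):
        # socket.gaierror is a subclass of OSError; UnicodeError covers
        # names that cannot be encoded as valid DNS labels.
        return False


def live_entries(domains, lookup=socket.getaddrinfo):
    """Return the subset of `domains` that currently resolve."""
    return [d for d in domains if resolves(d, lookup)]


# Share of feed entries that resolved, using the totals reported above.
print(f"{179_261 / 789_436:.1%}")  # prints 22.7%
```

In other words, only about 23 percent of the entries were live at the time of testing. For a real run over hundreds of thousands of entries you would also want to pace the queries, as with the half-second interval mentioned earlier.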