We have been quiet about the Dark Web Map recently, mainly because we have been busy with other work. We have another dark web project that we will announce later this year, but for now I want to tell you about a major update to the Dark Web Map: version 2! In this post, I'll tell you what's new, what's changed, and why we made these decisions.
No embedded images
The biggest change in Dark Web Map v2 is that we have started crawling with embedded images disabled. If you read our How It's Made article, you may recall that we manually reviewed over 6,600 dark web sites and redacted depictions of sexual violence or children. While the dark web is notorious for violent, racist, gross, and illegal multimedia, our experience while building the first Dark Web Map was that very few sites put that kind of stuff on their home page. Of course, as with anything done by human beings on a large scale, there were oversights and mistakes made. Over time, we received notifications of additional materials that needed redacting, and it was quite time-consuming to make those small redactions, regenerate the entire map, and upload all of the tiles all over again.
If this were merely a technical problem, we could optimize the redaction process to make it more efficient. But as we prepared this update to the map, now a full year after the original release, we began to see more and more sites displaying shocking and horrifying content directly on their home page. In particular, a lot of new sites display child exploitation there.
The material is too awful for words.
As a result, we made the decision to start crawling dark web sites with embedded images disabled, i.e. not loading multimedia from <img> tags or from style sheets. Reviewing and redacting these sites would be traumatizing, and even possessing some of these screenshots would be illegal. On the upside, the decision to globally disable all images increases our agility: we can make more frequent updates to the map without needing to schedule time for a human being to review and redact screenshots. Some of the sites are completely blank when embedded images are disabled, but most of the sites are still very recognizable.
Note: there is a bug in our crawler whereby a response body with an image mime-type like image/jpeg served from the root URL will still be displayed. Not many sites host images at their root URL. We have reviewed the ones that do and will fix this bug in the next update of the Dark Web Map.
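As a rough illustration of the kind of check that would close this gap, here is a minimal Python sketch that inspects a response's Content-Type header before the page is rendered. The function name `is_image_response` is hypothetical, and this is not the crawler's actual code; it only shows one way to recognize an image mime-type.

```python
from typing import Optional


def is_image_response(content_type: Optional[str]) -> bool:
    """Return True if a Content-Type header value names an image type.

    Hypothetical helper: the real crawler's internals are not described
    in this post.
    """
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("image/")
```

A crawler could call this on the root URL's response and skip the screenshot whenever it returns True.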
Censoring onion names
When we built the first version of the dark web map, we had a lot of internal debate at our company about how exactly it should look, how it should work, and how we should present the data. One decision we made late in the process (after collecting the crawl data and doing most of the pre-processing) was to mask off onion names. For example, Facebook's onion is facebookcorewwwi.onion, but in the v1 map it is displayed with part of the name hidden.
We masked off the last 4 characters because we didn't want the map to be a directory of onion services. We didn't want to point people towards dangerous and/or illegal dark web sites. Some sites display onion addresses (their own address or others) in the text of the page, however, leading to a situation like this:
In this example from the v1 map, the onion name is masked off in the caption at the bottom, but the screenshot shows that the page displays its own onion address! This particular example is harmless, but this obviously defeats the purpose of masking. In the v2 map, we are now masking off onions that appear in the text of the page.
In this example, notice how the onion is masked off both in the caption and in the screenshot of the page itself. This process isn't perfect (it may not mask onions that are intentionally obscured), but it is a dramatic improvement, especially for onions that list links to illegal sites.
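To make the idea concrete, here is a minimal Python sketch of masking onion names found in page text. It is an illustration under stated assumptions, not our production code: the regular expression, the `mask_onions` name, and the `*` mask character are all assumptions. The only details taken from this post are that the last 4 characters are masked and that onion names come in a short form (16 characters) and a long form (56 characters).

```python
import re

# Onion names are base32: 16 characters (v2) or 56 characters (v3).
ONION_RE = re.compile(r"\b([a-z2-7]{56}|[a-z2-7]{16})\.onion\b", re.IGNORECASE)

MASK_LEN = 4     # the last 4 characters of the name are masked
MASK_CHAR = "*"  # assumption: the glyph used in the real map may differ


def mask_onions(text: str) -> str:
    """Replace the last MASK_LEN characters of every onion name in text."""

    def _mask(match: "re.Match[str]") -> str:
        name = match.group(1)
        return name[:-MASK_LEN] + MASK_CHAR * MASK_LEN + ".onion"

    return ONION_RE.sub(_mask, text)
```

For example, `mask_onions("Visit facebookcorewwwi.onion today")` returns `"Visit facebookcore****.onion today"`. As noted above, a pattern like this will miss addresses that a page obscures on purpose, such as names split up with spaces.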
Hunchly's Dark Web Report
Another big change is that we are now obtaining our onion data from the Hunchly Dark Web Report! Hunchly builds amazing OSINT software. As a side project, they also publish a free list of recently crawled onions in a daily report. We are using this report as a seed list for Dark Web Map v2.
As a side effect of this new data source, the map now includes v3 onions! These are the onions with really long, 56-character names. When these names appear in the map, they are abbreviated to fit into the space available.
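This post doesn't spell out how the names are shortened. One common approach is to cut out the middle of the string, and the Python sketch below shows that approach. The `abbreviate` function, the `max_len` default, and the `...` marker are all hypothetical choices made for illustration.

```python
def abbreviate(name: str, max_len: int = 24, ellipsis: str = "...") -> str:
    """Shorten name to at most max_len characters by eliding the middle.

    Hypothetical helper; the map's real abbreviation rule may differ.
    """
    if max_len <= len(ellipsis):
        raise ValueError("max_len must be longer than the ellipsis")
    if len(name) <= max_len:
        return name
    keep = max_len - len(ellipsis)  # characters of the name we can keep
    head = (keep + 1) // 2          # the front gets the extra character
    tail = keep // 2
    suffix = name[-tail:] if tail else ""
    return name[:head] + ellipsis + suffix
```

A 16-character name plus `.onion` is 22 characters and passes through unchanged at the default width, while a 56-character name is cut down to 24.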
We even had to upgrade the Tor proxy on our crawling machine because it was too old to support v3 onions!
This data source also gives us subdomains of onions. For example, Facebook runs internationalized versions of its onion:
The Dark Web Map now displays these subdomains, like ru-ru.facebookcorewwwi.onion, as seen in the caption underneath this Facebook page.
In Dark Web Map v1, if you found something interesting in the map and wanted to show it to somebody else, it was a very cumbersome process. You might take a few screenshots and/or provide directions about how to navigate to an item of interest.
Dark Web Map v2 makes it super easy to share specific locations in the map! Just copy and paste your current URL, and anybody else who opens up that URL will be taken directly to the same view that you are looking at.
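Shareable views like this are typically built by storing the current position and zoom level in the URL itself. The Python sketch below shows that general technique using the URL fragment. The `x`, `y`, and `zoom` parameter names, the two function names, and the example.com address are assumptions for illustration, not the map's actual URL format.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def view_to_url(base_url: str, x: float, y: float, zoom: int) -> str:
    """Encode a map view into the fragment of base_url (hypothetical format)."""
    parts = urlsplit(base_url)
    fragment = urlencode({"x": x, "y": y, "zoom": zoom})
    return urlunsplit(parts._replace(fragment=fragment))


def url_to_view(url: str) -> Tuple[float, float, int]:
    """Decode the view stored by view_to_url back into (x, y, zoom)."""
    params = parse_qs(urlsplit(url).fragment)
    return float(params["x"][0]), float(params["y"][0]), int(params["zoom"][0])
```

With this scheme, `view_to_url("https://example.com/map", 1.5, -2.25, 4)` gives `https://example.com/map#x=1.5&y=-2.25&zoom=4`, and `url_to_view` turns that URL back into the same view.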
Because of the decisions regarding images and the data source, we threw out all of the crawling data from the v1 map and started from scratch. As a result, the map has shrunk from 6.6k onions in the v1 map to 3.7k onions in the v2 map.
The difference in size isn't immediately obvious when you start navigating the map, and the tradeoff makes sense to us: we will be able to update the map more frequently going forward, which will eventually let us build bigger and bigger maps as we gather more historical data.