色情网站 Pornhub 利用 Websocket 防止广告被屏蔽
(NSFW level: medium)
We tried to find the most PG page on MindGeek’s network to use as an example- it wasn’t easy.
When I was building the prototype for BugReplay, I was evaluating different methods of capturing and analyzing network traffic from Chrome. One of the first things I saw that looked promising was the chrome.webRequest API.
From the docs: “Use the chrome.webRequest API to observe and analyze traffic and to intercept, block, or modify requests in-flight.”
That seemed to be exactly what I needed.
After experimenting with the Chrome webRequest API, I quickly realized there was a big problem. It didn’t allow me to analyze any WebSocket traffic, something I really wanted to support.
As I was searching the web trying to see if I was misreading the documentation or was looking in the wrong spot, I found a relevant bug report from 2012: “chrome.webRequest.onBeforeRequest doesn’t intercept WebSocket requests.” In the bug report, users were complaining that without the ability to block WebSockets, websites could get around ad blockers fairly easily. If WebSocket data was not visible to Chrome extensions via the webRequest API, they could not be blocked without some heavy duty hacks.
Initially, the risks to ad blockers seemed theoretical; the examples of sites that were employing this technique were very obscure. Then in August 2016, an employee of the company that owns Pornhub.com (MindGeek) started arguing against adding the WebSocket blocking capabilities to the Chrome API. Pornhub is the 63rd most visited site on the Internet according to Alexa. I checked out a few of MindGeek’s sites and sure enough, I could see ads coming through even though I had Adblock Plus on. The ads on Pornhub are marked ‘By Traffic Junky,’ which is an ad network owned by MindGeek.
In the screenshot below, you can see a banner at the top of the page announcing that the site is aware that the user is using an Ad Blocker, with an invitation to subscribe to a premium ads free version of the site. On the right side of the page you can see an advertisement.
How They Do It
When you visit Pornhub.com, it tries to detect if you have an ad blocker. If it detects one, it opens a WebSocket connection that acts as a backup mechanism for delivering ads.
Watching the BugReplay browser recording, you can see a number of network requests triggered that are blocked by AdBlock: They are marked Failed in the network traffic, and if you click one to inspect the detail pane you can see the failed reason is
net::ERR_BLOCKED_BY_CLIENT. That is the error reported by Chrome when an asset is blocked from loading.
You can find the WebSocket frames individually in the network panel or just look at the WebSocket create request which has links to all the individual frames. The name of the domain where the WebSocket connects is “ws://ws.adspayformy.site.” A decent joke aimed at ad blockers :)
When the WebSocket loads, the browser sends a frame with a JSON encoded payload for each of the spots it has available for ads. Checking out one of the WebSocket frames, you can see in the frame data the advertisement data is sent back with:
media_typeimage, so the page knows what kind of element to create (most of the ads are videos, I picked an image for this post because it was relatively tame).
- The Image itself, transmitted base64 encoded so it can be reconstructed using the data uri scheme
- An “img_type” (“image/jpeg”) to pass to the data uri.
Ad Blockers primarily work using the webRequest API, so constructing the ad by transmitting the data over the WebSocket as base64 is a pretty clever way of dodging the blocker.
On October 25th, 2016, there was some new activity on the Chromium ticket. A contributor wrote a patch adding the ability to block WebSockets using the webRequest api. If it’s accepted, it will eventually wind up in Chrome stable.
When or if that rolls out, the ad blocker extension writers can choose to remove the hacks for users of the latest Chrome, leaving content providers like Pornhub to figure out their next move in the ad blocking war.
For AdBlock Plus, “The wrapper performs a dummy web request before WebSocket messages are sent/received. The extension recognizes these dummy web requests as representing a WebSocket message. It intercepts and blocks them if the corresponding WebSocket message should be blocked. The WebSocket wrapper then allows / blocks the WebSocket message based on whether the dummy web request was blocked or not.” via
For uBlock Origin, they shipped a workaround that has the “ability to foil WebSocket using a CSP directive.”