Tuesday 31 July 2012

Limited Run and Facebook

So it's all over the news that Limited Run and Facebook are squaring off about click fraud. No big surprise: you can't be a large advertising platform and NOT eventually make the news. So let's do a bit of analysis on it. Of course, we can only speak for the traffic we've captured through our beta testing program; we can't speak for Facebook or Limited Run.

Javascript vs. No Javascript


We were also interested in the javascript vs. no javascript side of things on our own system. We found that 1.4% of all traffic didn't have javascript enabled, and that 1.5% of all pageviews had no javascript enabled. That's significantly different from what Limited Run was seeing. We all know that javascript and cookies are required by the large majority of websites folks use these days, and fraudsters know that if you don't look like the vast majority... well, you're not a good fraudster.

Secondly, if I were to write a good bot-style Facebook fraud engine, it would be in pure javascript. Facebook is not a traditional link-based site: many of the things that tell a fraudster where and how to click are buried inside the DOM (Document Object Model). The traditional approach of scraping a web page and clicking on its links would be less effective, and no matter how shabby Facebook's click fraud detection was, a bot like that would stick out like a sore thumb.

Javascript vs. Conversions


So we dug a little deeper and ran the numbers on the conversion rate for javascript-disabled visitors. It turned out to be 2.2%: not great, but not 0. We did, however, find a number of paid clicks coming from these visitors, so we dug a little further, and this is where it got interesting.
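The three numbers above (share of traffic, share of pageviews, and conversion rate for javascript-disabled visitors) are easy to mix up, so here's a minimal sketch of how they can be computed. It assumes each pageview record already says whether a javascript beacon fired (for example, detected via a <noscript> fallback pixel). The `PageView` record, its field names, and the rule that a visitor only counts as javascript-disabled if none of their pageviews ran javascript are all illustrative assumptions, not our actual tracking schema.

```python
from dataclasses import dataclass


@dataclass
class PageView:
    # Hypothetical record; a real tracker would carry much more.
    visitor_id: str
    js_enabled: bool        # assumed: True if the JavaScript beacon fired
    converted: bool = False


def js_disabled_stats(views: list[PageView]) -> dict[str, float]:
    """Share of visitors and pageviews without JavaScript, plus the
    conversion rate of the JavaScript-disabled visitors."""
    if not views:
        return {"visitor_share": 0.0, "pageview_share": 0.0, "no_js_conversion_rate": 0.0}

    ran_js: dict[str, bool] = {}
    converted: dict[str, bool] = {}
    for v in views:
        # A visitor counts as JS-enabled if any of their pageviews ran JS.
        ran_js[v.visitor_id] = ran_js.get(v.visitor_id, False) or v.js_enabled
        converted[v.visitor_id] = converted.get(v.visitor_id, False) or v.converted

    no_js_visitors = [vid for vid, js in ran_js.items() if not js]
    no_js_views = sum(1 for v in views if not v.js_enabled)
    no_js_conversions = sum(1 for vid in no_js_visitors if converted[vid])

    return {
        "visitor_share": len(no_js_visitors) / len(ran_js),
        "pageview_share": no_js_views / len(views),
        "no_js_conversion_rate": (
            no_js_conversions / len(no_js_visitors) if no_js_visitors else 0.0
        ),
    }


views = [
    PageView("a", True), PageView("a", True, converted=True),
    PageView("b", False), PageView("b", False, converted=True),
    PageView("c", False),
    PageView("d", True), PageView("e", True),
]
stats = js_disabled_stats(views)
print(stats)
```

Note that the visitor share and the pageview share can differ (here 2 of 5 visitors but 3 of 7 pageviews), which is why we report them separately.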

The ClickOptics Test


Using our own fancy-schmancy scoring algorithm, we took all of the traffic that didn't have javascript enabled and ran it through the algorithm. Here's a high-level breakdown, which, oddly enough, looks VERY close to the numbers Limited Run was seeing:

Good Traffic (you made more than you spent)        - 21.98%
Not So Bad (you spent a little, you made a little) - 2.99%
Bad (it cost you money, you made none)             - 75.01%
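Our scoring algorithm itself stays under wraps, but the bucketing step at the end is simple enough to sketch. This assumes you already have spend and revenue attributed per paid visitor; the function names and the exact cut-offs between buckets are illustrative assumptions, not how ClickOptics actually scores traffic.

```python
from collections import Counter


def bucket(spend: float, revenue: float) -> str:
    """Classify one paid visitor into the buckets listed above.

    Cut-offs are an assumption: earning more than it cost is "Good",
    earning something but not more than it cost is "Not So Bad",
    and earning nothing is "Bad".
    """
    if revenue > spend:
        return "Good"
    if revenue > 0:
        return "Not So Bad"
    return "Bad"


def breakdown(visitors: list[tuple[float, float]]) -> dict[str, float]:
    """Percentage of (spend, revenue) pairs that fall into each bucket."""
    counts = Counter(bucket(spend, revenue) for spend, revenue in visitors)
    total = len(visitors) or 1  # avoid dividing by zero on an empty list
    return {name: 100 * counts[name] / total for name in ("Good", "Not So Bad", "Bad")}


sample = [(1.00, 5.00), (2.00, 1.00), (1.50, 0.00), (0.50, 0.00)]
print(breakdown(sample))
# {'Good': 25.0, 'Not So Bad': 25.0, 'Bad': 50.0}
```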


Interesting! So if you break down the javascript-disabled traffic by itself, you start seeing some quality issues, but this can be misleading when it makes up only a fraction of your total traffic. Could this be the issue? Are we crazy? Let us know in the comments.

Closing Thoughts


In our opinion, Facebook advertising is a tool to increase your social engagement (duh). It's not there to serve people looking to actually buy products and services. Sure, you can get followers, likes and the rest of it, but that could translate to 0 visitors to your site and 0 conversions, simply because your target market is not on Facebook for commercial reasons. Use Facebook as part of your strategy, but if you're going to advertise on it, you'd better have a killer Facebook page with some very compelling reasons for people to go ANOTHER click deeper to your website.

Sunday 29 July 2012

Metrics - Acronyms for All

Interesting post on conversion rates and visitor engagement over on Inc.com. What I take away from this post is: know your site.

No one or two metrics (or acronyms) work for every single site in the same way. We keep beating this drum: know your site, know your site, know your site.

If you know your site, then you're going to boil down metrics, analytics, click fraud rates, conversions and the rest of it into your own formula that works, is repeatable and lends itself well to continuous experimentation.

Saturday 28 July 2012

Make Your PPC Expensive to Attack

I came across a pretty good write-up from seven years ago. It nails some really great points.

The main takeaway I got from this, and from the single comment posted on it, is that it's key to make your PPC campaign expensive to attack. What does this mean exactly?

1. Be a Sniper


Use a tool (we're a bit biased here, of course) to keep an eye on your traffic, visitor behaviour, social stickiness (we'll delve into this in a future post), and low-quality paid traffic. Start excluding known bad actors and increase the bids on your socially sticky keyword sets. Your competitors will have no choice but to replicate you (fail) or outbid you (triple fail). Don't try to do this without a tool that pares the data down into a manageable format and amount; reams of data do not equal actionable data.

2. Forget the 1%


We've already written that 80% of malware makes it past antivirus products. You're never going to have a silver bullet for traffic quality and click fraud prevention, and there will always be attackers who are extremely skilled. Forget about them. Like, now. If they're good enough to sneak past you day after day undetected, accept it and move on. Make it so that only the top 1% (see point 1 above for how) can do this to your online presence.

3. Stop Tracking Conversions from Multiple Sources


I never understood why people track conversions from AdWords, Google Analytics, and nine other tracking scripts. Then they try to compare each of them (they never jibe) to how many conversions they actually had, and pretty soon they have no frickin' clue what's going on. Use one very good conversion tracking system and stick to it. Finding it may take some trial and error, but stop trying to track from multiple places: you're just confusing yourself, and your traffic quality program suffers. Conversions are the ultimate goal of your online presence, so make sure your conversion tracking is bang on.


Click fraud is an arms race, one that involves many players: you the advertiser, Google the provider, the fraudster, and potentially a whole whack load of infected machines running bots. It's key to find a defensive program that takes only minutes a day to run, is multi-layered (computer security folks call it "defence in depth") and is effective. Let us know in the comments if you've found techniques that work for you, or whether you think we're full of it.

Tuesday 24 July 2012

The Google Conspiracy

I recently had a conversation with a fellow webmaster and former AdWords advertiser. When I asked why he had stopped advertising, he gave me a rough rundown of how he wasn't getting results and had started to see a drop in conversions vs. money spent.

It was at this point in the conversation that the inevitable happened:

He figured it was down to either Google itself generating clicks that looked like they came from anywhere in the world, or known bad traffic that Google just ignored.

*eye roll*

I've always wondered why people have these harebrained ideas about Google not caring if you get ripped off. It just never made sense.

There's an easy parallel to draw with the antivirus industry. Sure, they want to do just a good enough job to make you think they care, but not such a good job that they catch 100% of the malware, because then why would you buy updates in the future? ZDNet had reported that 80% of malware makes it past most AVs, but what if we saw click fraud rates that high? It would definitely make the bullshit Click Fraud Index look like a Strawberry Shortcake party on your AdWords account, wouldn't it!

The next inevitable line of argument is that Google offers little transparency on how it handles this problem, so how do you REALLY know? But again, neither does the AV industry. We can only assume (or, in theory, reverse engineer their software, which you can't do with Google) that they are doing everything in their power to protect your corporate and personal networks. But they fail, and they fail big.

Yet everyone keeps buying it.

Hopefully we can start to be the ones to put an end to this madness. Drop the Google conspiracy, and start paying attention to your website. Guard your PPC like you guard your network.

Friday 20 July 2012

More Mobile Madness

If you didn't read our previous post here, then check it out for some background information.

We've since added Verizon and T-Mobile to the list of telcos that are allowing private information to flow past their proxies.

In these carriers' cases, they are pushing out the MSISDN header, which gives away the end user's phone number. It's possible that the users' handsets are the ones giving this away; however, we definitely feel the carriers should make an effort to protect the users coming through their proxies.

Here's an example request from T-Mobile:

Accept-Language: en-US
x-wap-profile: http://wap.samsungmobile.com/uaprof/SGH-T959V.xml
User-Agent: Mozilla/5.0 (Linux; U; Android 2.2.1; en-us; SGH-T959V Build/FROYO) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1
Accept: */*
Accept-Charset: utf-8, iso-8859-1, utf-16, *;q=0.7
X-Nokia-MSISDN: 1XXXXXX0549
X-Nokia-sgsnipaddress: XXX.155.174.198
MSISDN: XXXXXX0549
X-Via: Harmony proxy
Connection: keep-alive

We found about 50 instances where folks' numbers were being disclosed in this fashion, and they were exclusively from Verizon and T-Mobile.
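If you want to check your own logs for this, the test is simple. Here's a minimal Python sketch that parses raw "Name: value" header lines and reports any header known from these requests to carry a phone number. The header list comes from the examples in these posts (MSISDN, X-Nokia-MSISDN, and the x-up-calling-line-id header from our previous post) and is certainly not exhaustive; the function names are our own.

```python
# Headers that carried a subscriber's phone number in the requests we've shown.
# Illustrative, not exhaustive: other carriers may use other names.
PHONE_NUMBER_HEADERS = {"msisdn", "x-nokia-msisdn", "x-up-calling-line-id"}


def parse_headers(raw: str) -> dict[str, str]:
    """Parse raw 'Name: value' lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:  # skip lines without a colon, such as a plain request line
            headers[name.strip().lower()] = value.strip()
    return headers


def leaked_phone_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return only the headers known to carry a phone number."""
    return {name: value for name, value in headers.items() if name in PHONE_NUMBER_HEADERS}


sample = """\
Accept-Language: en-US
X-Nokia-MSISDN: 1XXXXXX0549
MSISDN: XXXXXX0549
X-Via: Harmony proxy
"""
print(leaked_phone_headers(parse_headers(sample)))
# {'x-nokia-msisdn': '1XXXXXX0549', 'msisdn': 'XXXXXX0549'}
```

Header names are lower-cased before matching because HTTP header names are case-insensitive, and carriers don't agree on capitalization.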

Why Hasn't Anyone Said Anything?


Obviously, analytics companies and click fraud companies doing deep analysis must have seen this before. We are a new startup with very small (comparatively) traffic numbers. If we picked this up from a few very small samples of our beta testing customers' sites, we would be surprised if no one had noticed it before.

But here is the tough part of the situation. A good portion of analytics and click fraud detection comes down to being able to isolate the visitor on the other end of the HTTP stream. Having uniquely identifying information flow in is advantageous because it makes the click fraud detection company's job a lot easier. If only we could see every user's phone number! Hopefully no one is relying heavily on this type of information to build visitor profiles, or your algorithm is going to be severely bitched once all the mobile companies (and their users) get wind of it.

This is precisely why we all need to do a better job of approaching this problem, and why we feel our approach is different. Let's just hope these mobile carriers, and any others who are paying attention, fix these issues and stay on the lookout for them in the future.

Monday 16 July 2012

Head(er) Hunter - Beware the Misconfigured Proxy

As part of our beta testing program, we spend vast amounts of time manually poring over the traffic that flows into our tracking server. We're always looking for interesting traffic anomalies, unique signatures and in general we just find it neat to do technical deep dives on our client sites.

In a header hunting session about a week ago, we found some interesting requests. Here is one of them, with the identifying details snipped:

GET /[SNIPPED] HTTP/1.1
Host: [SNIPPED]
Accept: */*
Accept-Charset: *
Accept-Language: en-US
Cache-Control: no-cache
Pragma: no-cache
Referer: [SNIPPED]
User-Agent: Mozilla/5.0 (Linux; U; Android 2.3.5; en-us; ZTE-Z990G Build/GRJ22) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1
Via: 1.1 [PROXY WE DON'T WANT TO SHOW]
x-up-calling-line-id: 1[MOBILE_NUMBER]4510


This is interesting: a mobile customer of this particular carrier has their mobile number disclosed to our service (and to every other site they browse to through this proxy). This should sound familiar if you follow mobile tech, as The Register wrote about this exact issue regarding O2, a UK-based carrier that was in trouble for leaking mobile customers' numbers. Read it here.

This is bad. In this case, we disclosed the issue to the carrier, and their security team has begun an investigation. We'll post an update when we hear back. This isn't a new issue either: a presentation by Collin Mulliner (PDF) described some of the data leaks that were occurring... four years ago! We do, however, have a new twist on this; read on.

Hey Proxy, I Just Met You and This Is Crazy

All Carly Rae Jepsen references aside, we continued our header hunting session to see what other tidbits we could find, and we soon came across some more badness, although it's less obvious. These requests came from MetroPCS and Cricket customers in the US:

Cricket Example:


GET / HTTP/1.1
Proxy-Authorization: Basic Q2xpY2tPcHRpY3MgU2F5cyBIZWxsbw==
User-Agent: Cricket-A210/1.0 UP.Browser/6.3.0.7 (GUI) MMP/2.0
Accept-Charset: utf-8, US-ASCII, ISO-8859-1
Accept-Language: en; q=1.0, es; q=0.5
x-wap-profile: "wapuaprof.wap.mycricket.com"
Referer: [SNIPPED]
Accept: application/vnd.oma.dd+xml, application/vnd.phonecom.mmc-xml, application/vnd.wap.wmlc;type=4365, application/vnd.wap.wmlscriptc, application/vnd.wap.xhtml+xml, application/xhtml+xml;profile="http://www.wapforum.org/xhtml", multipart/mixed, multipart/related, text/css, text/html, text/plain, text/vnd.wap.wml;type=4365, audio/mid, audio/midi, audio/x-mid, audio/x-midi, audio/qcelp, audio/vnd.qcelp, audio/mp3, audio/mpeg, audio/pmd, audio/x-pmd, audio/vnd.pmd, application/x-pmd, application/vnd.pmd, audio/evrc, application/vnd.oma.drm.message, image/bmp, image/gif, image/jpeg, image/jpg, image/png, image/vnd.wap.wbmp, image/x-up-wpng, text/css
Host: [SNIPPED]
Cache-Control: no-cache, max-age=43200
Connection: keep-alive

MetroPCS Example:


GET / HTTP/1.1
Proxy-Authorization: Basic Q2xpY2tPcHRpY3MgU2F5cyBIZWxsbyBBZ2FpbiE=
User-Agent: sam-r380 UP.Browser/6.2.3.8 (GUI) MMP/2.0
Accept-Charset: iso-8859-1
Accept-Language: en, es
x-wap-profile: "http://uaprof.metropcs.net/UAProf/sam-r380.xml"
Referer: [SNIPPED]
Accept: application/octet-stream, application/vnd.oma.drm.content, application/vnd.oma.drm.message, application/vnd.oma.drm.rights+wbxml, application/vnd.oma.drm.rights+xml, application/vnd.phonecom.mmc-xml, application/vnd.wap.wmlc;type=4365, application/vnd.wap.wmlscriptc, application/vnd.wap.xhtml+xml, application/xhtml+xml;profile="http://www.wapforum.org/xhtml", image/bmp, image/gif, image/jpeg, image/png, image/vnd.wap.wbmp, image/x-up-wpng, multipart/mixed, multipart/related, text/css, text/html, text/plain, text/vnd.wap.wml;type=4365, application/x-smaf, application/vnd.smaf, audio/mid, application/vnd.wap.co, application/vnd.wap.si, application/vnd.wap.sia, application/vnd.wap.sl, application/vnd.oma.dd+xml, application/vnd.oma.drm.message, application/vnd.oma.drm.rights+xml, image/bmp, image/gif, image/png, image/jpeg, image/vnd.wap.wbmp, image/x-up-wpng, text/vnd.wap.wml, text/plain, text/html, text/css
Host: [SNIPPED]
Cache-Control: max-age=43200
Connection: keep-alive


Now this doesn't look all that bad, right? Except if you look a bit closer at the Proxy-Authorization header. This header is designed so that clients (i.e. your mobile phone) can send their credentials to a proxy, and that proxy can then send the same credentials on to the next proxy in the chain. A fancy ASCII diagram would look like this:

Mobile Phone -> Credentials -> Mobile Proxy -> Credentials -> HTTP Proxy -> ClickOptics.com

There is nothing wrong with this in itself; it's how large networks can segment proxy traffic. Typically, however, the next proxy in the chain has to send a Proxy-Authenticate header to request those credentials, and depending on the proxy configuration and its ACLs (Access Control Lists), the originating proxy will either send or withhold them. In our case, we weren't acting as a proxy, nor did we send out a Proxy-Authenticate header; these misconfigured proxies just volunteered the information on their own. So let's break down these headers from the two requests above:

Cricket:

Proxy-Authorization: Basic Q2xpY2tPcHRpY3MgU2F5cyBIZWxsbw==


If we base64-decode the encoded part of the string above, we get "ClickOptics Says Hello". However, if we decode the real value from the requests we trapped, we get:

[mobilenumber]@mycricket.com:[password]


MetroPCS:

Proxy-Authorization: Basic Q2xpY2tPcHRpY3MgU2F5cyBIZWxsbyBBZ2FpbiE=

Again, in this case you'll get "ClickOptics Says Hello Again!". However, the real request decodes to:

[mobilenumber]@aaah.mymetropcs.com:[password]
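Decoding these values takes no special tooling: the Basic scheme is just base64 over "user-id:password", which is an encoding, not encryption. Here's a minimal Python sketch using only the standard library (`decode_basic_credentials` is our own illustrative helper name). It reproduces the two harmless demo values above, and shows the shape of the real ones using the same placeholders; we're obviously not reprinting any real credentials.

```python
import base64


def decode_basic_credentials(header_value: str) -> tuple[str, str]:
    """Decode a 'Basic <base64>' Proxy-Authorization value into (user, password).

    Basic credentials are "user-id:password" and the user-id can't contain a
    colon, so we split on the first one. A value with no colon gives an empty password.
    """
    scheme, _, token = header_value.strip().partition(" ")
    if scheme.lower() != "basic":
        raise ValueError(f"not a Basic credential: {scheme!r}")
    decoded = base64.b64decode(token, validate=True).decode("utf-8")
    user, _, password = decoded.partition(":")
    return user, password


print(decode_basic_credentials("Basic Q2xpY2tPcHRpY3MgU2F5cyBIZWxsbw=="))
# ('ClickOptics Says Hello', '')

# The real headers decoded to this shape (placeholders kept as in the post).
real_shape = base64.b64encode(b"[mobilenumber]@mycricket.com:[password]").decode("ascii")
print(decode_basic_credentials("Basic " + real_shape))
# ('[mobilenumber]@mycricket.com', '[password]')
```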


Implications for Click Fraud


So, as always, we look at how this applies to click fraud. Because none of us wanted to break any laws, we didn't actually attempt to change any proxy settings on our phones (and we don't actually use these carriers). But assuming these proxies simply accept a username/password style authentication scheme, a fraudster could potentially harvest these credentials. Then, by writing a simple Android/iPhone application, they could feed this list of credentials into the app and begin executing click fraud that would appear to come from multiple mobile clients.

This wouldn't prevent pure IP tracking from picking this up, assuming the proxy they exit from is the same. But we all know that you need more than just IP tracking to determine a visitor's behaviour. Also, depending on how the carrier tracks usage, this might make it difficult for the carrier to trace back exactly who was doing any of this nefarious business.

Another attack vector could be a fraudster harvesting these mobile numbers and storing them. Then, by doing some simple math, they can determine which AdSense ads would yield a higher return than the cost of sending an SMS. They could then use a service like Twilio, or any other SMS service, to send out SMS messages with shortened URLs that appear to come from the recipient's carrier ("You've Won XYZ from Carrier ABC Click Here!"). Because SMS messages have a direct cost to send (unlike email SPAM), people see far less SMS spam, so folks would be pretty likely to click on them. Using a simple redirect, the unwitting mobile customer would be the one executing the click fraud. The redirection service could also track which mobile numbers were clicking on URLs and then re-market the SPAM to them. The unfortunate thing is that in pay-as-you-go situations, these messages will actually chew into the mobile user's budget.
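The "simple math" is just an expected-value comparison. As a rough sketch, assuming each click on one of these SMS links turns into exactly one paid ad click, and writing p for the fraction of recipients who click, r for the revenue the fraudster earns per fraudulent click, and c for the cost of sending one SMS (the symbols are ours, chosen for illustration), each message pays off for the attacker when:

```latex
p \cdot r > c \quad\Longleftrightarrow\quad r > \frac{c}{p} \qquad (p > 0)
```

Anything that drives p down (customers learning to ignore these messages) or r down (invalid clicks being caught and not paid out) pushes the attack below break-even.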

Considering you can easily determine what type of phone a remote user has, sending out "Upgrade Your XYZ Phone Now" SMS messages would seem to be a pretty good attack vector as well.

Disclosure

We did try reaching out to MetroPCS and Cricket in a variety of ways, but to no avail. So we felt it was in the best interest of their customers to know that they should use Wifi as often as possible, to avoid going through these proxies until their carriers have fixed the problem. The third carrier did respond and is actively investigating, so we'll leave them to it.
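On the carrier side, one plausible fix is for the last proxy in the chain to drop Proxy-Authorization (and any subscriber-identifying headers) before a request leaves for an origin server. The HTTP authentication specs describe this header as consumed by the first inbound proxy that expects credentials, relayed onward only to another proxy that cooperates in the authentication. Here's a minimal sketch of that policy; the function name, the header list, and the `next_hop_is_proxy` flag (which we assume the proxy already knows from its own routing) are all hypothetical.

```python
# Hypothetical policy for the last proxy before the public internet.
# Header names are compared case-insensitively, as HTTP requires.
NEVER_FORWARD_TO_ORIGIN = {"proxy-authorization", "x-up-calling-line-id"}


def headers_for_next_hop(headers: dict[str, str], next_hop_is_proxy: bool) -> dict[str, str]:
    """Return the request headers this proxy should pass to the next hop.

    Credentials may be relayed to a cooperating proxy further down the chain,
    but nothing in NEVER_FORWARD_TO_ORIGIN should reach an origin server
    such as a website or a tracking service.
    """
    if next_hop_is_proxy:
        return dict(headers)
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in NEVER_FORWARD_TO_ORIGIN
    }


request = {
    "Host": "example.com",
    "User-Agent": "Cricket-A210/1.0 UP.Browser/6.3.0.7 (GUI) MMP/2.0",
    "Proxy-Authorization": "Basic Q2xpY2tPcHRpY3MgU2F5cyBIZWxsbw==",
}
to_origin = headers_for_next_hop(request, next_hop_is_proxy=False)
to_proxy = headers_for_next_hop(request, next_hop_is_proxy=True)
print(sorted(to_origin))
# ['Host', 'User-Agent']
```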

Hope you enjoyed our analysis. This is one of our first posts on some real behind-the-scenes technical stuff, and we hope to do more in the future!

Thursday 12 July 2012

Opinions are Like ..... Everybody's Got One

I just took a peruse of a PPC Hero post here and ran into an older Inc.com posting here. Both wrestle with the problem of traffic quality vs. business quality. Both are great posts that you should read.

What I take away from both is that everyone has a different opinion on how things should be tracked and how metrics should be applied. Interestingly enough, however, if you start to combine many of the measures both articles use, you can build a fairly good picture of how to score your traffic.

That's right, you, the owner. After all, you have an opinion too.

Wednesday 11 July 2012

Measuring the Unmeasurable

In a posting on Forbes last month here, and another post referencing it here, we're seeing continued references to the Click Fraud Index that Click Forensics (now Adometry) put out in 2010. We haven't been able to find a reference to the CFI in some time, which we can only assume means that they either gave up trying to educate the masses or realized it's exceedingly difficult to measure click fraud overall. Either way, we're not overly disappointed that the index has disappeared; we're just disappointed to see it still being referenced so heavily.

Apples and Oranges - Click Fraud Isn't Globally Comparable


The problem with articles like the one in Forbes is that they stack Google up against the CFI, even though the CFI is calculated across 300 different advertising networks and platforms. So if the CFI says 19% is the number it's seeing across its whole sample, that includes networks that could very well have tremendous click fraud rates that skew the numbers. How do you compare Google's advertising platform to Mom n Pop's Big Ol' Search Network? You can't. And this is why the comparison just doesn't work.
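To see how a handful of small networks can drag an aggregate index around, here's a quick worked example with entirely made-up numbers (they are not CFI data, and the network names are hypothetical). The biggest network sits at 10% invalid clicks, yet the click-weighted aggregate comes out at 18%, a figure that describes none of the individual networks.

```python
def aggregate_rate(networks: dict[str, tuple[int, int]]) -> float:
    """Click-weighted invalid-click rate across networks.

    networks maps a network name to (total_clicks, invalid_clicks).
    """
    total = sum(clicks for clicks, _ in networks.values())
    invalid = sum(bad for _, bad in networks.values())
    return invalid / total


# Made-up numbers: one big, relatively clean network and a few small, dirty ones.
networks = {
    "big_search_network": (900_000, 90_000),  # 10% invalid
    "small_network_1": (40_000, 36_000),      # 90% invalid
    "small_network_2": (35_000, 31_500),      # 90% invalid
    "small_network_3": (25_000, 22_500),      # 90% invalid
}
for name, (clicks, bad) in networks.items():
    print(f"{name}: {bad / clicks:.0%}")
print(f"aggregate: {aggregate_rate(networks):.0%}")
```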

Google's platform is the largest target, with the most market penetration available to malware authors and clickbot operators. This is no different from vulnerability research targeting Windows more than other operating systems: if you're going to exploit something, you want to go as far and as wide as possible. Microsoft knows this all too well. It's also why Google is going to take the most heat, which is fair when you're king of the search castle. But lumping Google in with other search providers that can't possibly match its volume of data (and its mining capabilities) isn't fair, and ultimately it's just poor reporting.

Werewolves and Click Fraud


Click fraud doesn't have a silver bullet, and neither does security of any kind. There never will be one. But ultimately it boils down to getting the right data, in the right format, into the right hands (the webmaster/owner). Do some research on intrusion detection systems and you'll see that they are really good at gathering information, boiling it down, and letting the trained eye make the judgement calls.

I might be a bit biased, but we feel there are a few things click fraud prevention companies can do:


  1. Get REALLY good at one search provider and one advertising platform. What works for one, won't work for another.
  2. Refine the data so that the webmaster/owner can easily spot deviant behaviour; software will never be as good as the human eye.
  3. Double-check your numbers. It doesn't make sense to tell a webmaster to claim 1000 refundable clicks when their provider only charged them for 500. It makes you look bad, and it makes us all look bad.
  4. Stop publishing stats on a problem that is effectively unmeasurable (unless you have access to the search provider's advertising logs, at which point I'll shut up).
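Point 3 is the easiest one to automate. Here's a minimal sketch, assuming you can export billed clicks per day from your ad provider and your detection tool's refundable-click claims per day (the function name and data shapes are hypothetical): any day where the claim exceeds what was billed is a number that needs re-checking before it goes anywhere near a refund request.

```python
def overclaimed_days(billed: dict[str, int], claimed: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Return {day: (claimed, billed)} for every day where the refundable
    clicks claimed exceed the clicks the provider actually charged for."""
    return {
        day: (n_claimed, billed.get(day, 0))
        for day, n_claimed in claimed.items()
        if n_claimed > billed.get(day, 0)
    }


billed = {"day 1": 500, "day 2": 480}
claimed = {"day 1": 1000, "day 2": 60}
print(overclaimed_days(billed, claimed))
# {'day 1': (1000, 500)}
```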

Monday 9 July 2012

Head in the Clouds

An interesting post over here describes click fraudsters using the Amazon cloud to perpetrate click fraud on AdWords and AdSense.

I wonder how long it will be before there's a SaaS version of "Defraud Your Competitors", where you just punch in your domain as a whitelist and a list of keywords you'd like your competitors defrauded on. It would make it quite easy to distribute the load across multiple AWS servers, and the pricing model would be easy to compute: you know how much it costs in network resources (billed via AWS) to execute the attack, so you just mark it up appropriately.

I'm guessing this exists already, let us know where and we'd love to see how they do it.

Sunday 8 July 2012

The Click Fraud Insider

An interesting story of click fraud insiders is told here and is analyzed here. The original emails sent internally at SheKnows.com are published here.

It comes as no surprise to us that this type of thing is happening. It can definitely be viewed as a fox-guarding-the-henhouse type of relationship, but advertisers need qualified traffic and publishers need to pay their bills. Not much getting around that!

So how do you fight this type of attack? We're not totally sure.

You ultimately have to trust the advertising platform that you put your money into. Millions of ad clicks per month is not an easy dataset to analyze or parse through, and in some cases even deciding what a conversion is for your site can be tricky (think large brands: is a conversion a Facebook Like? A form someone filled out? Real money being spent?). These are not easy problems, and we definitely think they can't be totally solved by a pure software solution. The key is being able to drill down into a subset of your traffic and look at emerging patterns, not at overall site metrics (like click-through rates, etc.).
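As a rough illustration of drilling into subsets rather than staring at site-wide averages, here's a minimal sketch that groups paid clicks by a segment key (say, referring site or ad placement) and flags segments with enough volume whose conversion rate falls well below the site-wide rate. The segment names, the 50-click minimum, and the "less than half the overall rate" cut-off are all assumptions to tune for your own site, not a recommended rule.

```python
from collections import defaultdict


def segment_report(clicks, min_clicks=50, ratio=0.5):
    """Compare each segment's conversion rate against the site-wide rate.

    clicks: iterable of (segment, converted) pairs, one per paid click.
    Returns (overall_rate, flagged), where flagged lists (segment, clicks, rate)
    for segments with at least `min_clicks` clicks converting at less than
    `ratio` times the overall rate.
    """
    totals = defaultdict(lambda: [0, 0])  # segment -> [clicks, conversions]
    for segment, converted in clicks:
        totals[segment][0] += 1
        totals[segment][1] += int(converted)

    all_clicks = sum(n for n, _ in totals.values())
    all_conversions = sum(c for _, c in totals.values())
    overall = all_conversions / all_clicks if all_clicks else 0.0

    flagged = [
        (segment, n, c / n)
        for segment, (n, c) in totals.items()
        if n >= min_clicks and c / n < ratio * overall
    ]
    return overall, flagged


clicks = [("placement-a", True)] * 10 + [("placement-a", False)] * 90
clicks += [("placement-b", False)] * 100
print(segment_report(clicks))
# (0.05, [('placement-b', 100, 0.0)])
```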

I hopped onto SheKnows.com and took a peek at their ads (no, I didn't click on any). They all appeared to be routed through Google's DoubleClick. The interesting question is whether Google uses the same click fraud detection techniques on its DoubleClick platform as it does on AdWords.

If so, I would be interested to see a snapshot of what they tracked as invalid clicks from 10 days before the internal memos were sent all the way up to the point when it went public and the editors were suspended.

Friday 6 July 2012

Mobile Click Fraud - Not So Fast...

After reading this article, I support most of what he is saying: take some proactive steps to identify potential fraudsters, and above all know your website and know your visitors! That's all fine and dandy when you're dealing with regular users (and by regular I mean one user per IP address, in our perfect world), but not so with mobile.

And really, this is where lots of click fraud companies and advertising platforms have failed. Let's take an excerpt from the article:

Massive clicks in a short amount of time from a single IP address: Clickbots or pieces of software designed to generate huge volumes of traffic can sabotage results and can be flagged as potential sources of click fraud. This activity can also be suspicious if large amounts of clicks are originating from a single physical location.


So Many People, So Few IPs


We've seen in the past that Google has taken a stand on this type of analysis. It is exceedingly difficult to pin down a user based solely on IP address. Even if they are not behind a router or firewall that performs NAT, you can't simply assume that their IP address identifies them uniquely. This is why click fraud analysis is very, very hard to do in a fully automated fashion, and exceedingly difficult for mobile visitors.

Mobile phones on carrier networks typically pass through a proxy at some point, and you can have hundreds or thousands of phones all coming through the same proxy, and therefore the same IP address. So you could easily see a large volume of clicks from one IP in a short time span, which might look like an anomaly but really isn't malicious traffic.
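Here's a minimal sketch of the naive "many clicks from one IP in a short time" rule from the excerpt, so you can see why it misfires on mobile. It counts each IP's peak clicks inside a sliding time window and, alongside that, the number of distinct user agents seen from that IP. The window, the threshold, and the idea of reporting distinct user agents are our assumptions; a busy carrier proxy will typically show up here with many different handsets, which is a hint to dig deeper rather than a verdict either way.

```python
from collections import defaultdict


def busy_ips(clicks, window_seconds=60, threshold=20):
    """Flag IPs with at least `threshold` clicks inside any `window_seconds` window.

    clicks: iterable of (timestamp_in_seconds, ip, user_agent) tuples.
    Returns {ip: {"peak_clicks": n, "distinct_user_agents": m}} for flagged IPs.
    """
    by_ip = defaultdict(list)
    for ts, ip, user_agent in clicks:
        by_ip[ip].append((ts, user_agent))

    flagged = {}
    for ip, events in by_ip.items():
        times = sorted(ts for ts, _ in events)
        start = peak = 0
        for end, t in enumerate(times):
            # Shrink the window from the left until it spans < window_seconds.
            while t - times[start] >= window_seconds:
                start += 1
            peak = max(peak, end - start + 1)
        if peak >= threshold:
            flagged[ip] = {
                "peak_clicks": peak,
                "distinct_user_agents": len({ua for _, ua in events}),
            }
    return flagged


# 25 clicks in 25 seconds from one address, each from a different handset,
# versus 5 slow clicks from a single browser on another address.
clicks = [(t, "203.0.113.10", f"handset-{t}") for t in range(25)]
clicks += [(t * 30, "198.51.100.7", "same-browser") for t in range(5)]
print(busy_ips(clicks))
# {'203.0.113.10': {'peak_clicks': 25, 'distinct_user_agents': 25}}
```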

Hold the Phone!


So don't go hitting the AdWords IP Exclusion Tool just yet. Dig into your analytics a bit more, and in a future post we'll show you some quick tests you can do to determine whether you're looking at a mobile proxy or a bad actor, and some strategies to deal with it.

Thursday 5 July 2012

Click Fraud Blog - Here We Go!

Well, as always, the first post is the toughest. How do you tell a bunch of strangers that what you're going to write in this blank blog is going to be worth sticking around to read?


We definitely can't make you any guarantees, but here's what we do plan on talking about:



  • click fraud;
  • tricks that the bad guys are using every day to screw you and Google out of money;
  • suggestions for helping yourself improve traffic quality, by looking at the bad actors in your site visitor sea;
  • our own products (you knew this one was coming).

Another primary reason for starting this blog is that we've noticed click fraud has kind of fallen off the map. From 2005 to 2007 you were seeing reports and posts, we had a Click Fraud Index, and there was a whole host of activity. And then, like a houseful of ghosts at the end of a B-grade movie, it all just disappeared.

But the click fraud problem still remains, so what happened?

We have some theories, and we're going to start sharing some of them. We're also a software startup looking to make a dent in this problem with a unique solution, and hopefully to help the biggest advertising platform on the planet get better at prevention.

We hope you enjoy, and we're looking forward to interacting with y'all.