M(L)yth Busters: 6 myths of machine learning and fraud prevention debunked
1. You need to first define fraud to stop fraud
The typical approach to combating fraud is to study all the different ways fraudsters operate and find an indicator for each tactic based on what the fraudster is trying to achieve. But every time one tactic is identified, fraudsters evolve their tactics to evade detection. Increasingly, fraud resembles valid traffic.
Every time fraudsters find a new vulnerability, valuable time is lost defining the tactic and writing a rule to stop it. Using a strategic combination of machine learning techniques and models, TrafficGuard can detect new tactics much faster and more reliably than human analysis can. Time isn’t wasted developing rules, which means budgets are protected.
2. Machine learning doesn’t perform well in new or unfamiliar scenarios, and it can be polluted
Because machine learning can encode features of traffic in such intricate detail, it far outperforms human analysis in speed and accuracy, even in unfamiliar scenarios. That doesn’t mean it completely replaces human analysis. Humans play a vital role in the success of machine learning by closing the loop: they constantly check and validate the machine learning outcomes.
Pollution, where training data is contaminated so that models learn the wrong patterns, is a real risk. But the intricate analysis of data through feature engineering is actually what strengthens machine learning’s defence against it. The strongest machine learning incorporates multiplicity: multiple machine learning models and techniques, applied to a variety of data sets with a variety of feature engineering techniques.
Because it monitors a far greater number of signals, machine learning stands much firmer against pollution than human analysis on its own.
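To make multiplicity concrete, here is a minimal Python sketch of majority voting across several models, each looking at a different slice of the traffic data. The three scoring functions (click_rate_model, install_timing_model, device_spread_model) and their feature names are hypothetical stand-ins for trained models, not TrafficGuard’s actual models, and majority voting is just one simple way to combine them. What it illustrates: if one model’s training data is polluted, the others can still outvote it.

```python
from typing import Callable, Dict, List

# A "model" here is any function mapping a traffic record to a fraud score in [0, 1].
# These stand-ins are hypothetical; in practice each would be a trained model
# built on its own data set and its own feature engineering.
Model = Callable[[Dict[str, float]], float]


def click_rate_model(record: Dict[str, float]) -> float:
    # Hypothetical: looks only at click velocity.
    return min(record["clicks_per_minute"] / 60.0, 1.0)


def install_timing_model(record: Dict[str, float]) -> float:
    # Hypothetical: implausibly fast click-to-install times look automated.
    return 1.0 if record["click_to_install_seconds"] < 5 else 0.1


def device_spread_model(record: Dict[str, float]) -> float:
    # Hypothetical: many device IDs behind one IP suggests a device farm.
    return min(record["device_ids_per_ip"] / 50.0, 1.0)


def polluted_click_rate_model(record: Dict[str, float]) -> float:
    # Simulates a model whose training data was polluted so it never flags anything.
    return 0.0


def ensemble_is_invalid(models: List[Model], record: Dict[str, float],
                        threshold: float = 0.5) -> bool:
    """Majority vote: flag the record only if most models score it at or above threshold."""
    votes = sum(1 for model in models if model(record) >= threshold)
    return votes > len(models) / 2


suspicious = {"clicks_per_minute": 90, "click_to_install_seconds": 2, "device_ids_per_ip": 80}
healthy = [click_rate_model, install_timing_model, device_spread_model]
polluted = [polluted_click_rate_model, install_timing_model, device_spread_model]
print(ensemble_is_invalid(healthy, suspicious))   # True (3 of 3 votes)
print(ensemble_is_invalid(polluted, suspicious))  # True (2 of 3 votes still win)
```

Real systems typically combine far more models and weight them rather than counting votes equally; the equal-weight vote is used here only because it is the easiest combination rule to read.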
3. Machine learning leads to a black box mentality because of the complexity of invalidation
The importance of transparency is two-fold. Firstly, it helps an advertiser communicate to their traffic source why their traffic has been invalidated. Secondly, it holds verification companies accountable for their fraud diagnoses.
The delicate balance lies in providing the desired transparency while ensuring that fraudsters cannot reverse engineer the fraud detection. This challenge applies to every anti-fraud tool, not just those that leverage machine learning.
Transparency can be achieved through standardised definitions, a great example of which is the Media Rating Council’s (MRC’s) Invalid Traffic Standards. When used in fraud detection and prevention, these terms communicate the fraud diagnosis without revealing the mechanics of the detection.
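As an illustration of how a standardised category can carry a diagnosis without exposing detection mechanics, the Python sketch below keeps detailed reason codes internal and reports only the MRC category. The MRC guidelines distinguish General Invalid Traffic (GIVT) from Sophisticated Invalid Traffic (SIVT); the internal reason codes and their mapping to those two categories are hypothetical and simplified, so the MRC documentation remains the authoritative source for the definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical internal reason codes emitted by detection models. These stay
# private; this mapping to public categories is illustrative, not MRC-official.
INTERNAL_TO_STANDARD: Dict[str, str] = {
    "known_datacenter_ip": "GIVT",          # General Invalid Traffic
    "declared_crawler_user_agent": "GIVT",
    "device_farm_cluster": "SIVT",          # Sophisticated Invalid Traffic
    "click_injection_timing": "SIVT",
}


@dataclass
class Diagnosis:
    click_id: str
    internal_reasons: List[str] = field(default_factory=list)  # never shared


def public_report(diagnosis: Diagnosis) -> Dict[str, object]:
    """Share only the standardised category, not the detection mechanics.

    Unknown reason codes raise KeyError, so nothing is silently misreported.
    """
    categories = sorted({INTERNAL_TO_STANDARD[r] for r in diagnosis.internal_reasons})
    return {
        "click_id": diagnosis.click_id,
        "status": "invalid" if categories else "valid",
        "ivt_categories": categories,
    }


report = public_report(Diagnosis("c-123", ["device_farm_cluster", "known_datacenter_ip"]))
print(report)
# {'click_id': 'c-123', 'status': 'invalid', 'ivt_categories': ['GIVT', 'SIVT']}
```

The traffic source learns that a click was invalid and in which standard category, but not which signals or thresholds triggered it.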
Sharing intelligence upstream with traffic sources can also facilitate this process, especially if that intelligence is provided by an independent and unbiased verification solution. This helps traffic sources optimise fraud out of their supply, and strengthens the performance of the whole supply chain.
In lieu of unbiased verification, some advertisers implement custom validation rules that are often needlessly harsh, blocking high volumes of legitimate traffic along with some invalid traffic. With 3rd party verification, these rules are not necessary, so mitigation can be focused on traffic that is genuinely fraudulent.
4. Machine learning is new
Did you know the term “machine learning” dates back to the 1950s? So how is it that a topic older than the compact cassette tape is one of the hottest topics in technology today?
Access to powerful, scalable infrastructure through affordable subscription models has made machine learning feasible for a broader array of applications. This, together with the growing pool of talent and expertise in the field, has been the catalyst for machine learning’s recent, more widespread adoption.
5. Machine learning is a magical algorithm
Here at TrafficGuard, we are a magic-free zone! For us, machine learning isn’t about “cracking the code” or “finding THE magic algorithm”. It’s a combination of machine learning models and techniques that are trained and verified by our dedicated team of data scientists.
Contrary to popular belief, machine learning isn’t self-sufficient, and presenting it that way would do a huge disservice to our experienced, talented and coincidentally awesome team of data scientists and analysts. They are in charge of correctly collecting, preparing and storing the data; defining features; identifying the appropriate models and training them; and continuously verifying their effectiveness.
6. The more complex the algorithm the better
Machine learning doesn’t need to be a deep neural network. Simple models can outperform complex ones if they are well engineered. The most important aspect of machine learning is not the algorithm but the data. Data quality maintenance and data enrichment (methods to infer greater intelligence from the data you collect) have a much stronger influence on a successful outcome than the complexity of the algorithm.
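Here is a toy Python sketch of that idea, assuming hypothetical click and install timestamps that analysts have labelled. Enrichment derives a more informative feature, click-to-install time (CTIT), from the raw timestamps; very short CTITs are a commonly cited sign of click injection. A one-feature decision stump, about as simple as a learned model gets, is then fitted on the enriched data. The events, labels and the assumption that fraud sits below the threshold are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    click_ts: float    # hypothetical raw field: click time, seconds since epoch
    install_ts: float  # hypothetical raw field: install time, seconds since epoch
    is_fraud: bool     # hypothetical label from analyst review


def enrich(events: List[Event]) -> List[Tuple[float, bool]]:
    """Data enrichment: derive click-to-install time (CTIT) from raw timestamps."""
    return [(e.install_ts - e.click_ts, e.is_fraud) for e in events]


def fit_stump(rows: List[Tuple[float, bool]]) -> float:
    """Fit a decision stump that predicts fraud when CTIT < threshold.

    Candidate thresholds are midpoints between consecutive distinct CTIT
    values; the candidate with the fewest training errors wins.
    """
    values = sorted({ctit for ctit, _ in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best_threshold, best_errors = candidates[0], len(rows) + 1
    for threshold in candidates:
        errors = sum((ctit < threshold) != label for ctit, label in rows)
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def predict(threshold: float, ctit: float) -> bool:
    # Assumed direction: installs arriving faster than the threshold are flagged.
    return ctit < threshold


events = [
    Event(1000.0, 1001.0, True),
    Event(2000.0, 2002.0, True),
    Event(3000.0, 3003.0, True),
    Event(4000.0, 4040.0, False),
    Event(5000.0, 5060.0, False),
    Event(6000.0, 6120.0, False),
]
threshold = fit_stump(enrich(events))
print(threshold)                  # 21.5
print(predict(threshold, 4.0))    # True
print(predict(threshold, 300.0))  # False
```

The stump settles on a threshold halfway between the slowest fraudulent install and the fastest legitimate one in the sample. The model is trivial; what makes it work is the enriched CTIT feature, since the raw absolute timestamps carry little signal on their own.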