The role of first-party data in machine learning

October 11, 2022

All machine learning (ML) needs data in order to learn, and more is generally better. But for a machine learning model to perform well, the quality of that data matters even more than its sheer quantity.

It has always been key to train and continually feed ML models relevant first-party data. But, as the conditions for acquiring user data have evolved – from regulatory requirements to platform policies to public trust – and the use cases have transitioned – from driving brand awareness to delivering measurable, monetizable performance outcomes – a mobile app user acquisition strategy focused on first-party data has gone from important to absolutely essential.

Note: This is the first in a series on machine learning, data, and mobile performance advertising. Stay tuned for more installments in the coming months.

The role of data in mobile advertising

Marketers have long relied on a wide variety of information to better understand, reach, and engage their audiences through different channels. One key source has been first-party data, which AdExchanger defines as “data that is collected by a single entity via its direct relationship with the end user.” For any user behavior within an app – from actions like logins and clicks to engagements like ad views to conversions like subscriptions and payments – each of that user’s interactions becomes a piece of first-party data for that app.

In the past, this first-party user information was often supplemented with third-party data, which Epsilon defines as data acquired “from an outside source that is not the original collector of said data.” In the mobile ecosystem, one of the most prevalent (and powerful) sources of third-party data has historically been the advertising ID assigned at the device level by the mobile operating system: the Android Advertising ID (ADID) and the iOS Identifier for Advertisers (IDFA).

First-party and third-party are not the only categories of data. There’s also second-party data, which is data acquired from a partner, along with certain levels of explicit user consent. Increasingly, there’s also discussion of zero-party data, which is information voluntarily shared by a user, but independent of app behaviors – think surveys and quizzes, user settings, and form submissions. Through these various sources of data, advertisers could gain a granular understanding of their audiences and build robust targeting strategies to acquire new users.

The role of data in machine learning

Before we dive into the specific importance of first-party data today, let’s take a step back and highlight the importance of data overall for any ML model. No matter the goal of any ML algorithm (i.e., what it’s designed to do), quality data underpins its performance.

As the old saying goes, “garbage in, garbage out,” and this is especially true of machine learning. Without quality inputs, no ML model can reasonably be expected to produce quality outcomes. Data that’s out of scope or mislabeled (garbage in) can lead to incorrect outcomes (garbage out).

Data quality isn’t the only determining factor – ML models also require enough data that’s unbiased, varied, and contextually relevant – but data quality is still key.

For example, if we’re building a model to identify cats, datasets that include images of other common animals, such as dogs and lizards, would be necessary for the machine to learn the not-cat identifiers. But datasets that include artists’ renderings of prehistoric cats, or CAT scan data, may mislead the model because they lack relevant context.
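To make the cat example concrete, here is a minimal sketch of screening a dataset before training. The source categories, label names, and the keep_for_training helper are all hypothetical, chosen only to mirror the example above; a real pipeline would need far richer checks.

```python
from dataclasses import dataclass

# Hypothetical label and source vocabularies for the cat-identifier example.
VALID_LABELS = {"cat", "not_cat"}
IN_SCOPE_SOURCES = {"photo"}


@dataclass(frozen=True)
class Example:
    source: str  # e.g. "photo", "illustration", "medical_scan"
    label: str   # expected to be "cat" or "not_cat"


def keep_for_training(example: Example) -> bool:
    """Keep only examples that are in scope and carry a recognized label."""
    return example.source in IN_SCOPE_SOURCES and example.label in VALID_LABELS


raw = [
    Example("photo", "cat"),
    Example("photo", "not_cat"),     # dog or lizard photo: a useful negative
    Example("illustration", "cat"),  # artist's rendering of a prehistoric cat
    Example("medical_scan", "cat"),  # CAT scan: right word, wrong context
    Example("photo", "kat"),         # mislabeled record
]

clean = [ex for ex in raw if keep_for_training(ex)]
```

Note that the not-cat photos stay in: the filter removes data that is out of scope or mislabeled, not data that is merely negative.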

The growing importance of first-party data

In the past, mobile performance advertisers had a wide variety of sources of quality data to use, including both first-party and third-party options. But today, the number and quality of third-party data sources are diminishing.

Between policy changes like GDPR in Europe and CCPA in California and platform-level changes like Apple’s App Tracking Transparency (ATT) and Google’s Privacy Sandbox, quality third-party data that can be used to inform mobile performance marketing is in short supply. This is a major change for the mobile user acquisition space, as many demand-side platforms (DSPs) were heavily reliant on these third-party signals to inform their audience segmentation and targeting methods.

There is also a major downside to relying on third-party data sources to run mobile advertising campaigns. Not only is third-party data often erroneous or irrelevant, especially when not blended with first-party data, but its continued use carries legal and reputational risks.

In August 2022, beauty brand Sephora reached a settlement with the California Attorney General’s Office for $1.2 million after its e-commerce app allegedly ran afoul of CCPA data privacy requirements. And one month later, Instagram was issued a €405 million fine for allegedly violating the GDPR.

Even if an advertiser manages to find quality third-party data and avoid a hefty fine, it may still find that using it for advertising purposes angers or alienates its end users. After all, according to KPMG, 86% of survey respondents say data privacy is a growing concern for them while 40% do not trust companies to ethically use their data.

Moloco’s approach to first-party data

Moloco has always prioritized using first-party data in a privacy-safe manner to inform our ML models. By using the advertiser’s own first-party data, which is specific to the particular app, we can shorten the time needed to train the models and apply a zero-cost approach to learning before a campaign even launches, enabling the marketer to see a first return on investment in days or weeks. The budget saved by avoiding a lengthy and expensive training period can then be applied to scaling campaigns and driving greater performance that yields monetization.

Leveraging an app's first-party data creates many advantages for performance marketers. For instance, it makes campaigns more relevant and targeted on a bid-by-bid basis.

While emphasizing first-party data has advantages, the volume of first-party data is often limited, especially early in a campaign. And while training on high volumes of contextual data from a stable source will also avoid the garbage-in problem, it can easily succumb to bias when new externalities, like new regulations or new data structures, enter the picture.

Like the biggest ad platforms, Moloco’s machine learning engine is powered by deep neural networks that optimize immediately, iteratively, and in real time. We use deep learning technology to translate the marketer’s raw first-party data into artificial-like languages within our system. This allows us to scale limited first-party data to a sizable training dataset (i.e., generalization) that can teach the models how to deliver strong performance and returns that few others can match, given these data limitations.

Furthermore, we begin training our models using your raw first-party data even before the campaign launches. Our models learn the inputs that result in a desired output prior to purchasing any impressions on your behalf. This contrasts with traditional DSPs, which wait until they receive enough positive samples to train their models, a process that can take months or quarters to deliver return on ad spend (ROAS).
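As a rough illustration of learning from historical first-party data before any impressions are bought, the sketch below fits a tiny logistic regression to past in-app events. This is a deliberately simple stand-in, not Moloco’s deep neural networks: the feature names, the sample rows, and the training settings are all assumptions made for the example.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    rows: List[Tuple[List[float], int]], epochs: int = 500, lr: float = 0.5
) -> Tuple[List[float], float]:
    """Fit weights and bias with plain batch gradient descent on log loss."""
    n_features = len(rows[0][0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in rows:
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Hypothetical historical rows: ([completed_tutorial, added_to_cart], purchased)
history = [
    ([1.0, 1.0], 1),
    ([1.0, 0.0], 0),
    ([0.0, 1.0], 1),
    ([0.0, 0.0], 0),
    ([1.0, 1.0], 1),
    ([0.0, 0.0], 0),
]

# Training happens on past events only; no ad spend is involved at this stage.
weights, bias = train_logistic(history)


def predict(x: List[float]) -> float:
    """Estimated probability that a user with features x goes on to purchase."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

In this toy data the purchase outcome follows the added_to_cart signal, so the fitted model scores cart-adders above non-adders before a single impression has been purchased.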

And, while privacy regulations are upending the advertising techniques marketers have relied on to grow their user base, prompt users to take actions, and re-engage dormant users, Moloco provides a better alternative to third-party data that is more palatable to end users. Our inference models assess each bid request to determine whether the user is likely to meet an advertiser’s campaign goals, and calculate a bid price that works within the set budget. We don’t need IDFA data to do this assessment. And, in anticipation of continued changes to iOS policies, Moloco already provides specific options for SKAdNetwork (SKAN) traffic.
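The bid-request assessment described above can be pictured with a short sketch. It is illustrative only and not Moloco’s implementation: the logistic scoring function, the feature names and weights, the minimum-probability cutoff, and the rule of bidding expected value capped by remaining budget are all assumptions. Note that the request carries only contextual signals and no device advertising ID.

```python
import math
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class BidRequest:
    # Hypothetical contextual signals only; no IDFA or other device ID.
    features: Dict[str, float]


def predict_goal_probability(
    request: BidRequest, weights: Dict[str, float], bias: float
) -> float:
    """Logistic score: estimated chance the impression meets the campaign goal."""
    score = bias + sum(
        weights.get(name, 0.0) * value for name, value in request.features.items()
    )
    return 1.0 / (1.0 + math.exp(-score))


def compute_bid(
    request: BidRequest,
    weights: Dict[str, float],
    bias: float,
    value_per_goal: float,
    remaining_budget: float,
    min_probability: float = 0.01,
) -> Optional[float]:
    """Return a bid price, or None to skip the request."""
    probability = predict_goal_probability(request, weights, bias)
    if probability < min_probability or remaining_budget <= 0:
        return None  # unlikely to meet the goal, or nothing left to spend
    # Bid the expected value of the impression, never more than the budget allows.
    return min(probability * value_per_goal, remaining_budget)


# Example call with made-up signals and weights.
request = BidRequest({"is_gaming_app": 1.0, "evening_hours": 1.0})
weights = {"is_gaming_app": 0.8, "evening_hours": 0.4}
bid = compute_bid(
    request, weights, bias=-3.0, value_per_goal=20.0, remaining_budget=50.0
)
```

Capping the bid at the remaining budget is the simplest possible way to respect a spend limit; production systems typically pace spend over time rather than checking a single cap.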

To learn more about Moloco’s ML models, why they’re unique in the industry, and how they help performance marketers see success, be sure to grab your complimentary copy of our machine learning primer. Download our guide and learn how utilizing a DSP with machine learning, combined with deep neural networks, can boost your performance marketing results.