How Moloco built a service infrastructure that delivers over 1 trillion bids monthly 

Moloco Cloud DSP enables app marketers to scale performance quickly and efficiently through Moloco’s battle-tested prediction models. We combine our machine learning with our customers’ data to build custom models to hit specified goals (CPI, CPA, ROAS). As of January 2022, our product reaches 10B+ devices via 2.6M+ mobile apps in 250+ countries every month.

Running fast-evolving ad-tech services at a global scale posed significant technical challenges to our engineering teams. Since Moloco was founded in 2013, we have been focusing on building a technology stack that is horizontally scalable and highly optimized for both performance and cost. Our system makes over 1 trillion bids monthly, serving over 5M QPS at peak with a less than 100ms processing latency for each bid response we generate. At the same time, we keep reducing the unit cost for processing the bid requests as we scale. (This video interview with Google describes this in more detail.)

In this post, we describe the challenges in building a scalable Demand Side Platform (DSP) service and introduce the designs of our service infrastructure that address them.

Technical challenges

High throughput with low latency:

We process bid requests from multiple Real-Time Bidding (RTB) exchanges for app users all around the world. The volume of the incoming requests has been increasing steadily as the overall mobile ads market grows and our business expands globally. Due to real-time requirements in mobile advertising, the requests come with a stringent response deadline. Our bidding infrastructure systems need to process millions of bid requests per second within 100ms.

Transforming big data into ML models:

We handle millions of bid requests per second and respond to a sizable fraction of those requests by generating our bid responses. Then a fraction of the bid responses lead to ad impressions on mobile apps running on users’ devices, which in return trigger further actions by users (e.g., clicks, installations, purchases, or other types of conversions that come after clicks). This process generates petabytes of new data daily, and we need to carefully ingest, accumulate, and curate all this data to train our ML models while complying with data privacy regulations. We use the prediction results from the ML models to evaluate incoming bid requests, and continuously train the models as we keep receiving more recent data. This means our data pipelines should not have any bottlenecks while producing a very high volume of data in near real-time, and the resulting models should be lightweight and accurate.

Collaborative & interactive tools for complex analysis around big data:

Our data scientists & ML engineers need fast and effective tools to analyze large, complex data. We need efficient and reliable data pipelines to feed insights into them to enable data-driven decisions.

Anatomy of our online service infrastructure

In this section, we describe the main components of our online service infrastructure (shown below) that handles incoming bid requests and how they interact with each other.

Moloco Cloud DSP Infrastructure

When a publisher’s app has an available ad inventory, it announces the inventory slot via ad exchanges, which subsequently convert the inventory information into bid requests and share the requests with multiple DSPs, including Moloco. DSPs are potential buyers for the inventory. Moloco’s bid processor evaluates the request, determines whether to bid, how much to offer, and sends the decision back to the exchange – again, all within 100 milliseconds.

For a given bid request, the bid processor first parses the information on the offered inventory (e.g., compatible sizes of an ad image/video, orientation, etc.) and retrieves proprietary data from multiple feature stores based on the request’s properties and history data. The bid processor runs a quick evaluation of the request and stops processing it early if its value does not meet the bar, which is updated dynamically based on active campaigns, past performance, and remaining budgets. 

With this contextual information, the bid processor invokes our prediction service, which uses TensorFlow as the inference engine for our ML models. The bid processor determines the bidding price of each campaign based on the prediction results, and then runs an internal auction to select a target campaign that would deliver the highest value to the advertiser. The bid processor responds to the ad exchange with the internal auction winner to participate in bidding at the ad exchange level.

If our bid wins, certain types of post-bidding events (such as impressions and clicks) are delivered directly to our event processor service. Other types of post-bidding events (such as installations, in-app purchases, and custom conversion types our advertisers care about) are first collected by Mobile Measurement Partners (MMPs) and then selectively delivered to our event processor service by the MMPs. Our event processors first sanitize and anonymize those events and then use them for generating detailed reports for advertisers about their campaign activities and performances. The events are also securely piped into the training framework for our ML models to keep improving the campaign performance, and this whole cycle repeats.

Advertisers can manage their campaigns via our DSP Cloud, our support platform for DSP customer engagement. Any changes in the campaign settings made through the DSP Cloud will be propagated to the online bidding systems in real-time and reflected in the bidding process as rapidly as possible.

Performance of bidding infrastructure

In this section, we describe our design decisions to satisfy the system requirements for scalability, latency, and reliability.

The bid processor consumes the biggest resources in our online service infrastructure. We made sure that it has a minimal state stored in local machines. This enabled us to aggressively scale in/out the service and to even adopt preemptible VMs which are not generally suitable for continuously running services.

The whole bidding process should be completed within 50-100ms. To address the challenge of a hard deadline, we pre-generate features for apps and users, and store them in a low-latency key-value storage. The fields within the feature stores have different update intervals depending on their freshness requirements. The bid processor looks up the feature stores to build the context for the request, and quickly scopes down to a small number of candidate campaigns that would benefit the most from the specific incoming bid request (i.e., an ad inventory). With the pre-computation of large feature datasets and the cap on the candidate campaigns, we were able to reliably make bidding decisions with a low latency even though the volume of data and our customer base have seen continuous and rapid growth.

We maintain two-layered caches for the pre-generated features. Each bid processor has an in-memory cache for the most active apps and users. The internal representation of the features in the cache is optimized for space to fit many feature entries in the server’s memory. For the second layer cache, we use Redis, which is shared by all bid processor instances running in the same region. The L2 cache significantly reduces the load on the underlying persistent key-value storage while improving latency and start-up time for bid processors. The multi-layered caches also help us reduce costs by reducing expensive cross-zone traffic. Our cache hit ratio during normal operation is over 99.9%.

The bid processors and event processors have different reliability requirements. Bid processors have a relaxed requirement for availability. We implemented an adaptive load-shedding mechanism to protect the servers from transient load spikes. 

We built mechanisms to dynamically throttle requests from suspicious sources or ones that Moloco is unlikely to make bids for. The throttling configuration is tuned automatically by ML models based on our history logs. This intelligent throttling mechanism reduces the load on upstream services by 45%. 

In contrast to bid processors, we have a very strict availability requirement for event processors due to the following: 1) the loss of incoming events could negatively impact the KPIs for our customers, and 2) each post-bidding event is an extremely valuable training example for our ML models. We ensure a much higher availability for our event processors by replicating the incoming events over multiple highly-available storage systems. It ensures that the event logs are recoverable and accessible in the presence of any unpredictable disruptions or transient failures in one storage system. 

Having separate reliability models for different services enables us to make balanced decisions about the service configuration based on cost/benefit analysis. For example, we don’t rely on the auto-scaler for dynamically sizing event processors in normal operation, but we still enable it to automatically scale out when there’s an unexpected surge of incoming events or significant performance regression in the service.

Prediction and Optimization Engines

The core function of DSP is web-scale stochastic optimization, which determines an optimal bid price for a given ad inventory in the presence of uncertainties and constraints. Our ML models provide probabilistic predictions that distill patterns in big data into tendencies for such optimization. They predict various user engagements such as clicks, installations, in-app purchases, as well as environmental contexts such as market price distributions and forward-looking branching predictions. Our system delivers all of the above functionality with accuracy, efficiency, and robustness.

As the image below shows, our ML system fosters continuously evolving models by feeding various data sources and monitoring outputs in real-time. At a high level, it is composed of a feature generator, feature storage systems, training pipelines, monitoring systems, and model servers. The feature generator efficiently generates hundreds of features in both online and offline settings while ensuring consistency. The feature storage systems provide low-latency access to training pipelines while providing online analytical processing to ML engineers and data scientists. The training pipelines continuously update models to adapt to surrounding changes and self-correct, and both the monitoring systems and the serving systems provide layers of safeguards to run the updated models reliably in the production environment.


Our stochastic optimization engine then takes probabilistic predictions of models to perform real-time monitoring and control of bidding decisions, with carefully designed algorithms by our scientists, mathematicians, and economists.

We will share more details of our prediction and optimization engines in upcoming blog articles.

Data processing

The primary purpose of our data pipelines is to provide accurate and consistent pre-generated features of users and apps to our online service infrastructure and the ML feature generator. As of January 2022, our bid processors generate more than 340 billion bid records per day. Each record contains selected fields of a bid request along with the information about our handling of the request – for example, whether to bid or not, which campaign was selected if we decided to bid, the bid price we offered. Our event processor processes more than 2 billion events per day and converts the events into well-structured logs for analysis and reporting purposes.

The data pipelines are carefully designed and implemented to continuously process such a large volume of data and provide up-to-date user and app features to our bid processors in a timely fashion. They should be efficient in minimizing the data processing time and cost. They should also be scalable to adapt to the rapid changes in the volume of bid requests and events. The data pipelines are fully automated to provide consistent data to other system components while fulfilling their latency requirements. Upon transient failures in various parts of its underlying systems, our data pipeline can self-recover quickly and backfill missing data from original data sources without compromising data integrity. 

Scale and cost

Reducing bidding unit cost in Moloco Cloud DSP

It’s expected that our infrastructure spending increases as our business grows and our ML models evolve. However, it’s critical for our business success to keep the unit cost of bidding sufficiently low because doing so enables us to evaluate more bid opportunities and campaigns without increasing the service fees for our customers and hence maximize advertisers’ return on ad spending.

Cost efficiency has always been a high-priority initiative since we started operating our services. We reduced the bidding unit cost (dark blue line) by 43%, while the volume of bidding (blue bars) increased by 754% over the past two years (shown in the chart above). 

In this next section, we describe our approaches to improving the cost efficiency of our systems.

Cost-optimized funneling: 

A bid processor is the system that orchestrates the overall bidding process. It breaks the process into multiple well-defined phases that are sorted by their cost impact. From our analysis, the most expensive function is pricing, which determines an optimal bid price for each bid. Hence, we place it in a later phase, and in between phases, we install filters that reject low-value requests as early as possible.

Resource efficiency:

Our bidding systems are highly optimized for resource efficiency. The servers can fully utilize allocated CPUs and memory (both at 80+% utilization on average), while processing hundreds of requests concurrently per server. Also, we evaluated different machine types available on our underlying cloud provider and selected the most cost-effective machine type for our particular workloads.


Our systems scale automatically to keep the minimum resources for varying workloads. When we started operating our systems at scale, we initially built in-house scaling tools for our feature store systems and service clusters because the tools made available by our underlying cloud provider either lacked essential features or were not performing well for our use cases. We replaced the in-house scalers with cloud-provided tools recently. We configured the scalers to make scaling decisions based on various load metrics, including CPU consumption, the number of outstanding requests, and the available memory and disk spaces.


Based on the success of building and operating industry-leading mobile advertising platforms, Moloco is rapidly expanding its business domain. We aim to provide our powerful machine learning capabilities to adjacent industries that require high-quality match-making services along with industry-specific vertical integration and optimization. Our engineering teams are growing fast to strengthen our core technical competencies in large-scale data processing, machine learning (both in model design and training & inference infra), and data analysis.

Working on difficult challenges and solving complex problems is what we do. Our engineering teams are global and consist of software engineers, machine learning engineers, data scientists, data analysts, operations engineers, mathematicians, economists, and more. If working on machine learning solutions and very-large online-service infrastructure sounds exciting to you, Moloco is a great place to be. Check out our open positions and get in touch.