diff --git a/draft/use-cases/serving-ads.txt b/draft/use-cases/serving-ads.txt
index 71327a03617..48fb26a12f2 100644
--- a/draft/use-cases/serving-ads.txt
+++ b/draft/use-cases/serving-ads.txt
@@ -75,6 +75,8 @@ The schema for storing available ads consists of a single collection,
          ... ]
    }
 
+.. kind of confusing to call this collection "ad.zone", esp since there is only 1 and zone is just property
+
 For each (``site``, ``zone``) combination you'll store a list of ads,
 sorted by their ``ecpm`` values.
 
@@ -102,6 +104,10 @@ maximize the ad network's profits. This query resembles the following:
        ecpm, ad_group = ecpm_groups.next()
        return choice(list(ad_group))
 
+.. should be noted that here the list of ads get sorted client side for every call.
+   also full list gets pulled every time.
+   In case that the list is long, would be better to normalize, then use index on {site, zone, ecpm}
+
 Indexing
 ````````
 
@@ -225,6 +231,11 @@ This schema:
   timestamp. This facilitates rapid lookups of a stream of a particular
   type of event.
 
+.. probably should warn that storing impressions and clicks within user document can backfire.
+   Could grow very large, e.g. if it's a bot, and past 1000 will start slowing down db.
+   A more robust solution would keep each impression with its possible click in a separate document.
+   Could be in a capped collection if impressions do not need to be kept for long.
+
 Choosing an Ad to Serve
 ~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -411,3 +422,7 @@ customize the value of some keywords, but this is beyond the scope of this
 document. Because the ad service must sort all ads at display time, you
 may find performance issues if you if there are a large number of ads
 competing for the same display slot.
+
+.. here the processing will be done client side for every request.
+   This can become very expensive if a page gets hit a lot and has many possible ads.
+   Again here if normalize each ad, can use index on { site, zone, keywords }
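
Reviewer sketch for the "normalize, then use index on {site, zone, ecpm}" suggestion in the notes above. This is a minimal illustration, not the document's method. The embedded field name ``ads``, the sample values, and the helper names ``normalize_zone``, ``AD_INDEX``, and ``choose_top_ad`` are assumptions made for the example; only ``site``, ``zone``, and ``ecpm`` come from the text under review.

```python
from itertools import groupby
from random import choice

# Compound index keys in the (field, direction) form that PyMongo's
# Collection.create_index accepts; -1 means descending.
# The name AD_INDEX is hypothetical.
AD_INDEX = [("site", 1), ("zone", 1), ("ecpm", -1)]


def normalize_zone(zone_doc):
    """Flatten one embedded zone document into one document per ad.

    Assumes the embedded list lives under the key "ads".
    """
    return [
        {"site": zone_doc["site"], "zone": zone_doc["zone"], **ad}
        for ad in zone_doc.get("ads", [])
    ]


def choose_top_ad(sorted_ads):
    """Pick at random among the leading run of ads that share one ecpm.

    `sorted_ads` must already be ordered by ecpm, highest first, which is
    the order a query sorted on the index above would return. groupby is
    lazy, so only the first group (plus one more item) is consumed.
    """
    for _ecpm, group in groupby(sorted_ads, key=lambda ad: ad["ecpm"]):
        return choice(list(group))
    return None  # no ads for this (site, zone)


# Hypothetical sample data, for illustration only.
zone_doc = {
    "site": "example-site",
    "zone": "banner",
    "ads": [
        {"ad_unit_id": "a", "ecpm": 250},
        {"ad_unit_id": "b", "ecpm": 250},
        {"ad_unit_id": "c", "ecpm": 200},
    ],
}
flat_ads = normalize_zone(zone_doc)
picked = choose_top_ad(flat_ads)
```

Against a live database, ``AD_INDEX`` could be passed to PyMongo's ``create_index`` on the normalized collection, and ``sorted_ads`` could be the cursor from a ``find`` on ``site`` and ``zone`` sorted by ``ecpm`` descending. The server then does the ordering, and the client reads only the top group instead of the full list.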