Facebook continues to share technical details on how its recently introduced graph search feature works, with Sriram Sankar, who works on search quality and ranking for the social network, posting a detailed note on the topic on the Facebook Engineering page.
Sankar described how the graph search team set out to deliver the best results from search queries:
Search ranking is the process of choosing the best set of results for a search query. Our team combines data-driven techniques with intuition to create a process that is roughly:
- Come up with an idea for a ranking change — ideas come from a mix of creativity on the part of our ranking engineers and feedback from our users.
- Implement the ranking change, test it, and launch it to a very small fraction of our user base.
- Measure the impact of the ranking change.
Our goal is to maximize searcher happiness, which we do our best to quantify through metrics like click-through rates, NDCG (normalized discounted cumulative gain), engagement, etc. We have to measure the impact of ranking changes on all of these metrics to maintain a proper balance.
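NDCG, one of the metrics Sankar mentions, rewards rankings that put the most relevant results first. A minimal sketch of the computation, using hypothetical relevance grades:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: each result's relevance grade is
    discounted by the log of its 1-indexed rank."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    """Normalize DCG by the DCG of the ideal (best-first) ordering,
    so 1.0 means the ranking could not be improved."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades (3 = perfect, 0 = irrelevant) for a
# five-result page, in the order the ranker returned them.
print(round(ndcg([3, 2, 3, 0, 1]), 3))
```

A score near 1.0 indicates a near-ideal ordering; a ranking change that moves this number is what step three above would measure.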
Sankar then discussed how Unicorn, the engine behind graph search, deals with the massive amount of data on the social network:
A Unicorn instance (or vertical) may store an index that is too large to fit into the memory of a single machine. So we break up the index into multiple shards such that each shard can fit on a single machine. These machines are called index servers. Queries to Unicorn are sent to a vertical aggregator, which, in turn, broadcasts the queries to all index servers. Each index server retrieves entities from its shard and returns them to the vertical aggregator, which then combines all these entities and returns them to the caller.
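The shard-and-aggregator arrangement Sankar describes is a classic scatter-gather pattern. A minimal sketch, with invented shard data and scores (not Facebook's actual data model):

```python
import heapq

def index_server_retrieve(shard, term, limit=3):
    """Each index server scans only its own shard and returns its
    top-scoring (entity, score) pairs for the query term."""
    hits = [(entity, score) for entity, (terms, score) in shard.items()
            if term in terms]
    return heapq.nlargest(limit, hits, key=lambda hit: hit[1])

def vertical_aggregator(shards, term, limit=3):
    """Broadcast the query to every shard, then merge the partial
    result lists into one globally ranked list for the caller."""
    partials = [hit for shard in shards
                for hit in index_server_retrieve(shard, term, limit)]
    return heapq.nlargest(limit, partials, key=lambda hit: hit[1])

# Two toy shards mapping entity id -> (indexed terms, static score).
shards = [
    {"page:acme": ({"coffee"}, 0.9), "page:bean": ({"coffee"}, 0.4)},
    {"user:alice": ({"coffee", "shop"}, 0.7)},
]
print(vertical_aggregator(shards, "coffee"))
```

Because each shard fits in one machine's memory, the expensive scan happens in parallel and only small per-shard top lists cross the network.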
Unicorn is fundamentally an in-memory “database” with a query language for retrieval. Before we could use Unicorn for graph search, we had to extend it with search ranking capabilities. Here are some of the things we did:
- Keeping different entity types in separate Unicorn verticals.
- Extending the retrieval operations with weak “and” and strong “or.”
- Query rewriting.
- Forward index.
- Result-set scoring.
- Nested queries.
- A/B testing.
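The weak "and" operator from that list is worth a sketch. On my reading of the term (this is an illustration, not Facebook's implementation), it behaves like a strict AND except that a result may miss a bounded number of query terms, which admits more diverse candidates than a pure intersection:

```python
def weak_and(posting_lists, optional_hits=1):
    """Weak-'and' over term posting lists: a document qualifies if it
    appears in at least len(posting_lists) - optional_hits of them.
    With optional_hits=0 this degenerates to a strict AND."""
    required = len(posting_lists) - optional_hits
    counts = {}
    for plist in posting_lists:
        for doc in plist:
            counts[doc] = counts.get(doc, 0) + 1
    return {doc for doc, count in counts.items() if count >= required}

# Posting lists for two hypothetical query terms; doc ids are made up.
friends_of_me = {1, 2, 3}
lives_in_sf = {2, 3, 4}
print(weak_and([friends_of_me, lives_in_sf], optional_hits=0))  # strict AND
print(weak_and([friends_of_me, lives_in_sf], optional_hits=1))  # weak AND
```

Relaxing the intersection lets the later scoring stages, rather than retrieval alone, decide whether a near-miss is worth showing.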
The query-suggestions phase, as described by Sankar:
In the query-suggestions phase, the Unicorn top aggregator fans out the search request to all of the verticals and then blends together the results on the way back. The sequence of steps that takes place is:
- The NLP (natural-language processing) module parses the query and identifies portions to search in Unicorn. Steps two through nine are then performed simultaneously for each of these search requests.
- The top aggregator receives the request and fans it out for each vertical. Steps three through eight are performed simultaneously for each vertical.
- The top aggregator rewrites the query for the vertical. Each vertical has different query-rewriting requirements. Rewritten queries are typically augmented with additional searcher context.
- The top aggregator sends the rewritten query to the vertical — first to the vertical aggregator, which passes it on to each of the index servers.
- Each index server retrieves a specified number of entities from the index.
- Each of these retrieved entities is scored and the top results are returned from the index server to the vertical aggregator.
- The vertical aggregator combines the results from all index servers and sends them back to the top aggregator.
- The top aggregator performs result set scoring on the returned results separately for each vertical.
- The top aggregator runs the blending algorithm to combine the results from each vertical and returns this combined result set to the NLP module.
- Once the results for all of the search requests are back at the NLP module, it constructs all possible parse trees with this information, assigns a score to each parse tree, and shows the top parse trees as suggestions to the searcher.
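The rewrite-fan-out-blend core of those steps can be condensed into a toy end-to-end sketch. Every name, vertical, and score below is invented for illustration; the real system runs these stages in parallel across machines:

```python
# Toy verticals: entity name -> base score. Stand-ins for the
# vertical aggregator plus its index servers.
VERTICALS = {
    "users": {"alice smith": 0.9, "alice jones": 0.6},
    "pages": {"alice's bakery": 0.7},
}

def rewrite(query, vertical, searcher_context):
    """Step three: each vertical gets its own rewritten query,
    augmented with searcher context (here just tagged on)."""
    return f"{query} ctx:{searcher_context[vertical]}"

def query_vertical(vertical, rewritten):
    """Steps four through six: retrieve and score within a vertical."""
    base = rewritten.split(" ctx:")[0]
    return [(name, score) for name, score in VERTICALS[vertical].items()
            if name.startswith(base)]

def top_aggregator(query, searcher_context, limit=3):
    """Steps two, eight, and nine: fan out per vertical, then blend
    everything into one ranked suggestion list."""
    results = []
    for vertical in VERTICALS:
        rewritten = rewrite(query, vertical, searcher_context)
        results.extend(query_vertical(vertical, rewritten))
    return sorted(results, key=lambda r: r[1], reverse=True)[:limit]

print(top_aggregator("alice", {"users": "friends", "pages": "likes"}))
```

Even in this toy form, the shape is visible: per-vertical rewriting keeps each index specialized, while the blend step produces the single suggestion list the searcher sees.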
After offering equally detailed breakdowns of the search phase and query rewriting, Sankar discussed the role of machine learning in making results from graph search useful:
Generally, when people talk about search, they immediately mention machine learning and scoring. Scoring is an important part of the search-ranking pipeline, and it is the part that can be automated using machine learning.
Machine learning (as it applies to search) is the process of automatically coming up with a scoring function using data from lots of previous search queries and the actions taken on them. We build features that become inputs to the scoring function — features are developed using our domain knowledge and quantify important properties of the entities and/or the query. Examples of features are: the distance between a place and the searcher; the closeness of a user’s results from the searcher in terms of friend connections; the amount of overlap of the query string with the entity name; etc. Although machine learning frameworks can use many thousands of features, we have decided to limit the number of features and engineer these features well. We use machine learning frameworks that combine features in a simple manner — as a linear weighted combination of the feature values. This allows us to keep the scoring formula understandable, and we are able to make manual tweaks to the trained formulae as necessary.
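The "linear weighted combination" Sankar describes is deliberately simple. A sketch with hypothetical feature names and hand-picked weights (a trained model would learn the weights from click data):

```python
def linear_score(features, weights):
    """Score an entity as a linear weighted combination of feature
    values, the inspectable form described in the quote."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical features for one candidate result, echoing the
# examples in the quote (all names and values are illustrative).
features = {
    "query_name_overlap": 0.8,  # overlap of query string with entity name
    "friend_closeness": 0.5,    # social proximity to the searcher
    "inverse_distance": 0.2,    # nearby places score higher
}
weights = {
    "query_name_overlap": 2.0,
    "friend_closeness": 1.5,
    "inverse_distance": 0.5,
}
print(linear_score(features, weights))  # roughly 2.45
```

Because each weight multiplies one understandable feature, an engineer can read the formula, see why a result ranked where it did, and tweak a weight by hand, which is exactly the maintainability argument made above.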
The retrieval operations allow for diversity during retrieval using the weak “and” and strong “or” operators. To maintain this diversity during scoring, Unicorn offers the ability to plug in multiple scorers for the same search query. This means an entity may have multiple scores assigned to it, and we can then select the top results with respect to each of these scores. For example, we may decide to have three scoring functions for “nearby” search: one based on overall popularity, one that is social, and one that gives lots of weight to distance. We can then come up with a policy to have a certain number of results with respect to each function and let the result-set ranking make the final arbitration on which results to send up to the caller. We believe that a simple scoring framework, with additional support for diversity, yields a scoring framework that can be understood, maintained, and debugged most easily.
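The per-scorer quota policy from the "nearby" example can be sketched in a few lines. The entities, the three scorers, and the quota of one result each are all invented for illustration:

```python
def popularity_score(entity):
    return entity["popularity"]

def social_score(entity):
    return entity["friend_likes"]

def distance_score(entity):
    return -entity["distance_km"]  # nearer is better

def diversified_top(entities, scorers, quotas):
    """Score the same retrieved set with several scorers and keep a
    fixed quota of top not-yet-picked results per scorer, leaving
    final ordering to a later result-set ranking stage."""
    picked, seen = [], set()
    for scorer, quota in zip(scorers, quotas):
        taken = 0
        for entity in sorted(entities, key=scorer, reverse=True):
            if taken == quota:
                break
            if entity["id"] not in seen:
                picked.append(entity["id"])
                seen.add(entity["id"])
                taken += 1
    return picked

# Three hypothetical nearby places with made-up attributes.
entities = [
    {"id": "cafe_a", "popularity": 90, "friend_likes": 2, "distance_km": 5.0},
    {"id": "cafe_b", "popularity": 40, "friend_likes": 9, "distance_km": 3.0},
    {"id": "cafe_c", "popularity": 55, "friend_likes": 1, "distance_km": 0.4},
]
print(diversified_top(entities,
                      [popularity_score, social_score, distance_score],
                      (1, 1, 1)))
```

Each scorer contributes its own champion (the most popular place, the most socially relevant, the nearest), so the result set stays diverse even though every individual scoring function is simple.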
Readers: Did you have any idea that the process behind graph search was this complex?