The videos are:
Welcome: Ryan Mack, Facebook Boston site lead.
@Scale is a series of technical conferences for engineers who build or maintain systems that are designed for scale.
Query Evaluation Using Dynamic Code Generation: Magnus Bjornsson, senior director of engineering, Oracle.
Typical query evaluation in a database uses a static evaluator, which is built to handle all types of queries. For performance reasons, it has become more and more common in recent years to dynamically build the evaluator based on the query itself (using JIT compilation). In this talk, I’ll talk about the approach that we at Oracle/Endeca took in our own columnar, in-memory data store to dynamically generate the query evaluator at query time.
Data Movement for Distributed Execution: Derrick Rice, software engineer, HP Vertica.
Networked data-movement techniques are critical to scalability in distributed computing. As data sets have grown and analytics have increased in complexity, traditional approaches have run into some surprising problems. Exhaustion of ephemeral ports and OS buffers. Deadlock and unexpected performance degradations. Extraordinary overhead costs. Congestion control challenges and firmware corner cases.
This talk will introduce HP Vertica’s data-transmission layer and the challenges encountered in the context of its distributed execution engine. We will share our journey from a naive implementation to a topology-aware data flow. We will also look at what can be learned from other technologies and ask, “What’s next?” Looking forward, operating at scale will continue to reveal problems and require new techniques.
Scaling to Over 1,000,000 Requests per Second: Beth Logan, senior director of optimization, DataXu.
DataXu’s decisioning technology handles more than 1 million ad requests per second. To put this into context, Google Search handles 5,000 to 10,000 transactions per second, and Twitter handles 5,000 to 7,000 transactions per second. Behind this statistic is an incredible architecture that has enabled us to scale. We use a blend of open-source and homegrown tools to place ads, record their impact and learn and deploy our decisioning models automatically, all while running 24×7 in over 30 countries worldwide. In this presentation, we will dive into some of these tools and discuss the challenges we faced and the tradeoffs we made.
Cold Storage at Facebook: Ritesh Kumar, software engineer, Facebook.
Cold storage is an internally used Exabyte-scale archival storage system developed completely in-house at Facebook. We discuss some of the salient design features of the cold-storage stack and how it fits into the specific low-power hardware requirements for cold storage and its unique workload characteristics. We will discuss multiple aspects of the software stack including methods to practically keep storage very durable and highly efficient, and handling realistic operations such as handling incremental cluster growth and tolerating a myriad of hardware failures at scale.
Fractal Tree Indexing in MySQL and MongoDB: Tim Callaghan, vice president of engineering, Tokutek.
As transactional and indexed reporting data sets continue to grow, traditional B-tree indexing struggles to keep up, especially when the working set of data cannot fit in RAM (random-access memory). Fractal tree indexes were purpose built to overcome this limitation, while retaining the read properties we expect for our queries. We’ll start by covering the theoretical differences between the two indexing technologies. We’ll end the talk by discussing the benefits that fractal tree indexes bring to MySQL (TokuDB) and MongoDB (TokuMX). “Benchmarks or it doesn’t count,” so expect to see a few.
Scalable Collaborative Filtering on Top of Apache Giraph: Maja Kabiljo, software engineer, Facebook.
Apache Giraph is a highly performant distributed platform for doing graph and iterative computations. Collaborative filtering is a well-known recommendation technique that is often solved with matrix-factorization based algorithms. This talk will detail our scalable implementation of SGD and ALS methods for collaborative filtering on top of Giraph. We will describe our novel methods for distributing the problem and the related Giraph extensions that allows us to scale to more than 1 billion people and tens of millions of items. We will also review various additions required for handling Facebook’s data (for example, implicit and skewed item data). Finally, to complete our easy-to-use and holistic approach to scalable recommendations at Facebook, we detail our approach to quickly finding top-k recommendations per user.
Making Enterprise Software That’s as Easy to Install as Dropbox: Martin Martin, software architect, Infinio.
There’s a lot of great software that does cool stuff, but when it comes to software that is deeply embedded in your infrastructure, all too often, it’s too much of a project to deploy and try it out. We confronted the problem of intercepting and modifying the data stream between all the virtual machines in a data center and the back-end storage arrays that host their virtual disks. This talk describes how networking works at the TCP and link levels, and how we subvert that to make installation so easy and nondisruptive that you could try out our product over lunch.
H-Store/VoltDB Architecture vs. CEP Systems and Newer Streaming Architectures.
In 2007, researchers at Massachusetts Institute of Technology, Brown University and Yale University set out to build a new kind of relational database called H-Store. Commercially developed as VoltDB, it was suddenly possible to build applications that did millions of transactional operations per second at very low cost and with high fault tolerance. While suitable for micro-payments and other high-volume, traditional transactional work, many early customers built systems for stream processing. As the product evolved, more and more features were added to support streaming, event processing and ingestion workloads, including materialized views, Kafka ingestion and push-to-HDFS data migration. This talk will explain, through customer use-cases and some development back story, how the H-Store/VoltDB architecture compares to CEP systems and newer streaming architectures like Storm and Spark Streaming.
Large graph datasets, like online social networks or the World Wide Web, introduce new challenges to the field of systems design. Their size requires scaling resources horizontally by splitting data and queries across several computation units, but standard sharding and routing schemes that ignore the inherent graph structure of the datasets result in suboptimal performance characteristics. In this talk, we present an efficient distributed algorithm for graph partitioning and the problem of dividing a graph into equally sized components with as few edges connecting these components as possible, and show how its results can be used for optimizing distributed systems serving graph based datasets.
Scaling Cassandra and MySQL: Stefan Piesche, chief technology officer, Constant Contact.
Constant Contact used to scale data vertically in large DB2 databases attached to even larger SANs (storage-area networks). Since this is not only cost-prohibitive but poses significant scalability and availability issues, we have now two primary other data strategies.
Cassandra: We use Cassandra as a horizontally scalable data tier for key/value type data. We have around 350 Cassandra nodes spanning two data centers. That system provides 10 times the performance of the old RDBMS at one-tenth of the cost. This system is our consumer event-tracking system that scales to 100 terabytes of data and 150 billion records that arrive at a velocity of 10,000 per second.
Sharded MySQL: Our largest deploy is a 36-TB system spanning two data centers. But instead of just sharding the DB tier, we even shard the application tier using that system in order to provide complete transparency of the sharding mechanism. Our SOA allows for RESTful access of that data, without any knowledge of the underlying sharding mechanism. However, we have learned that this led to a substantial underutilization of the app tiers — a 96 node cluster of a Ruby on Rails application — so we are looking into proprietary DB-level sharding mechanisms, as well.
The mixture of RDMBS and NOSQL data tiers has caused issues in our analytics platform, a 150-TB Hadoop cluster. We use a similar mechanism to what Netflix does to read data from Cassandra nodes — reading from the SSTables to extract the data.
F4 — Photo Storage at Facebook: Joe Gasperetti, production engineer, and Satadru Pan, software engineer, Facebook.
Vertica Live Aggregate Projection: Nga Tran, software engineer, HP Vertica.
Live aggregate projections are a new type of projection, introduced in HP Vertica 7.1, that contain one or more columns of data that have been aggregated from a table. The data in a LAP are aggregated at load time, thus querying it is several times faster than applying the aggregation directly on the table.
Unlike other databases, which do not support incremental maintenance of non-distributive aggregate functions (e.g. MIN and MAX) for updates and deletes, incremental maintenance is possible in Vertica because the data in LAPs are *partially* aggregated.
This talk focuses on the implementation of Vertica LAPs, for which the current supported aggregate functions are (distributive) SUM, COUNT and (non-distributive) MIN, MAX, and TOP-K. By aggregating data per load on each partition, Vertica allows incremental maintenance on both distributive and non-distributive aggregation while allowing data to be deleted.
Scaling Crashlytics Answers — Real-Time High-Volume Analytics Processing with the lambda architecture: Ed Solovey, staff software engineer, Twitter.
In the 15 seconds that it takes you to read this, Answers will have processed 12 million events in support of its actionable, insightful and real-time mobile analytics dashboards. Learn how it leverages the lambda architecture and probabilistic algorithms to handle this influx of information. We’ll dig into how the two pillars of the lambda architecture — offline batch processing and real-time, stream-compute processing — come together to help us achieve a scalable, fault-tolerant, real-time data processing system.
Data @ Scale Building Scalable Caching Systems with mcrouter: Anton Likhtarov, software engineer, Facebook.
Modern large-scale Web infrastructures rely heavily on distributed caching (e.g., memcached) to process user requests. These caches serve as a temporary holding spot for the most commonly accessed data. However, this makes these services very sensitive to cache performance and reliability. mcrouter is the lynchpin of Facebook’s caching infrastructure. It handles the basics of routing requests to the appropriate hosts and managing the responses in a highly performant way. In addition, there are a lot of features in mcrouter that have been designed to dramatically improve the reliability of the caching infrastructure. The problems that mcrouter addresses are not specific to Facebook, but to distributed caching systems in general. As a result, Instagram and Reddit have also adopted mcrouter as the primary communication layer to their cache tiers. mcrouter is open-source software, and we hope it will be useful in many other applications that rely on caching. This talk gives a very brief overview of mcrouter and the basics of integrating it into different pieces of infrastructure.
Geo-Spatial Features in RocksDB: Igor Chanadi, database engineer, and Yin Wang, research scientist, Facebook.
RocksDB is an embeddable key-value store optimized for fast storage. It is based on log-structured merge-tree (LSM) architecture and is widely used across Facebook’s services. In this talk, we’ll present how we implemented spatial indexing on top of RocksDB’s LSM architecture, which enables us to efficiently store geo-spatial data in RocksDB. We’ll also discuss how we optimized the spatial indexing for bulk-load, read-only and read-mostly workloads. Finally, we’ll talk about how we use the geo-spatial features to build database serving OpenStreetMaps data, which can then be used to render map tiles using Mapnik.
Benefits of Big Data: Handling Operations at Scale: Don O’Neill, VP of engineering, operations and infrastructure, TripAdvisor.
Don O’Neill from TripAdvisor presents big data business lessons learnt from handling operations on a site with more than 280 million unique visitors every month, discussing Hadoop, log shipping and analytics, new operations monitoring and anomaly detection.
Scaling Redis and Memcached at Wayfair: Ben Clark, chief architect, Wayfair.
At Wayfair, we had to take the caching layer for our customer-facing websites from a simple master/slave pair of Memcached nodes in 2012 to a set of consistently hashed clusters of in-memory cache servers and persistent key-value stores, in multiple data centers, in time for the holiday rush of 2013. Building on the work of giants and innovators — particularly Akamai, Last.fm and Twitter — we used composable tools (Memcached, Redis, Ketama, Twemproxy, ZooKeeper) to create a resilient distributed system. It’s big. Well, that’s always relative. Maybe that’s too bold a claim, considering some of the other speakers at this conference. Let’s say it seems big to us, and we’ve been through some explosive growth over the last few years. It’s definitely inexpensive, strong and fast, and Ben Clark will describe our techniques and add-ons, which are available on GitHub, and explain how to do it yourself.
DBD: An Automated Design Tool for Vertica: Vivek Bharathan, software engineer, HP Vertica.
Query performance in any database system is heavily dependent on the organization and structure of the data. The task of automatically generating an optimal design becomes essential when dealing with large datasets. Vertica is a distributed, massively parallel columnar database that physically organizes data into projections. Projections are attribute subsets from one or more tables with tuples sorted by one or more attributes, which are replicated on or distributed across cluster nodes.
The key challenges involved in projection design are picking appropriate column sets, sort orders, cluster data distributions and column encodings. In this talk we shall discuss Vertica’s customizable physical design tool, called the Database Designer (DBD), which is tasked with producing projection designs that are optimized for various scenarios and applications. In particular, we will focus on the challenges that the DBD faces, and its evolution over the years.
Look Back Videos: Goranka Bjedov, capacity engineer, Facebook.
On Feb. 4, Facebook celebrated its 10th anniversary by releasing the A Look-Back videos product. Every Facebook user was given a 62-second video of their most important events over the course of their Facebook presence. If the users didn’t have a lot of activity, a cover page was generated instead. As reported in the press and on Facebook engineering blog, this project was realized in less than one month worth of time.
This talk will focus on discussing technical challenges related to the project, some of which include: compute, network, storage, distribution, projections and modeling. But more important, the talk will focus on which parts of infrastructure enabled successful undertaking on a project of this size. What are the infrastructure pieces that had to be in place to make this happen? What are the necessary parts that enable fast product development on a massive scale, while at the same time keeping the risk to the remainder of the service acceptable? How can you plan and execute a project of this magnitude and how do you mitigate risks?
Thank You: Mack.
@Scale is a series of technical conferences for engineers who build or maintain systems that are designed for scale.