A Shameless Plug for Shameless — Engineering Our Schemaless Data Store

From estimating how many small bars of soap guests will take home to figuring out how much Romulan Ale to order for in-town Star Trek conventions, the hotel industry faces a host of engineering and logistical challenges. And while they might not be as expansive as the entirety of the Alpha Quadrant, HotelTonight’s engineering challenges are robust enough to warp the work we do on the backend.

One enterprise-sized challenge we recently faced was figuring out how to manage the large volume of hotel rate updates while reading them quickly. For us, the solution was Shameless: an append-only, distributed, schemaless data store we built on top of MySQL.

Here’s how we made it so.

Updating our update system

When people browse HotelTonight room listings, they’re looking at hotel rates: snippets of information on rooms including price, room type, check-in and check-out dates, special discounts, and more. Keeping this list up to date is extremely important for us because one of our major advantages is that we offer great, easy-to-book last-minute deals.

This poses a unique problem for us. To fill all their rooms, hotels are likely to frequently update discounted rates as they approach the last minute. This means there’s a constant stream of new hotel rates for us to sort — and we want to provide our app users with the most recent listings. But we also have to ensure that our rates are readable and not constantly being repopulated on the app.

Our original solution of storing rates in a typical relational SQL table was reaching its limits due to write congestion, migration anxiety, and high maintenance cost. This is what it looked like pre-Shameless:

But as the growth rate of our app started to expand into exponential territory, this approach started to prove unstable. Here’s the size of that old rates table over time:

We needed a new solution. We wanted something that could handle the constant rate updates, so we knew we needed consistent write latency. Additionally, we wanted to enable versioning to make sure any agents adjusting any of the values in the rates (say, adding a discount to a room whose listing was about to expire) wouldn’t be interrupted. We also wanted to avoid having to create migrations whenever we planned on adding more data columns to rates.

A schemaless SQL split

The basic idea for Shameless was to split a regular SQL table into index tables and content tables. Index tables map the fields you want to query by to UUIDs. Content tables map UUIDs to model contents (bodies). In addition, both index and content tables are sharded. For example:

Because the store is append-only, every change is an SQL insert written to the end of a table, which makes access to the most recent data convenient and fast. For the same reason, we get versioning for free and can retrieve a snapshot of any rate at any point in time.
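To make the versioning idea concrete, here is a minimal in-memory sketch of an append-only table. It is not Shameless code: the `AppendOnlyLog` class, its method names, and the sample values are hypothetical, and a plain Ruby array stands in for a MySQL content table.

```ruby
# Toy append-only store: rows are only ever appended, never updated in place.
# Hypothetical illustration; this is not the Shameless implementation.
class AppendOnlyLog
  Row = Struct.new(:uuid, :version, :written_at, :body)

  def initialize
    @rows = [] # stands in for a content table
  end

  # Append a new version of the record identified by uuid.
  def append(uuid, body, written_at)
    version = @rows.count { |row| row.uuid == uuid }
    row = Row.new(uuid, version, written_at, body)
    @rows << row
    row
  end

  # Latest version: the last row appended for this uuid.
  def latest(uuid)
    @rows.reverse_each.find { |row| row.uuid == uuid }
  end

  # Snapshot: the newest version written at or before the given time.
  def as_of(uuid, time)
    @rows.reverse_each.find { |row| row.uuid == uuid && row.written_at <= time }
  end
end

log = AppendOnlyLog.new
t0 = Time.utc(2017, 1, 1, 12, 0, 0)
log.append("rate-1", { net_rate: 120.0 }, t0)
log.append("rate-1", { net_rate: 99.0 }, t0 + 3600)
```

Reading the newest row gives the current rate, while `log.as_of("rate-1", t0 + 60)` still returns the first version, because nothing was ever overwritten.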

The body of the model is schemaless; you can store arbitrary data structures in it. And under the hood, the body is serialized using MessagePack and is stored as a blob in a single database column — hence the need for index tables.
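As a rough illustration of what "schemaless" means here, the sketch below stores two differently shaped bodies in the same blob column. To keep it runnable with only the standard library, JSON stands in for MessagePack; that swap, the `ContentRow` struct, the helper names, and the sample fields are all assumptions made for the example.

```ruby
require "json"

# One content-table row: a UUID plus an opaque, serialized body.
ContentRow = Struct.new(:uuid, :body_blob)

# Serialize an arbitrary body into a single blob column.
# Assumption: JSON stands in for MessagePack in this sketch.
def dump_body(body)
  JSON.generate(body)
end

# Turn a stored blob back into a Ruby hash with symbol keys.
def load_body(blob)
  JSON.parse(blob, symbolize_names: true)
end

old_rate = { net_rate: 120.0, room_type: "king" }
# A later record adds a nested field; no table migration is involved.
new_rate = { net_rate: 99.0, room_type: "king", discounts: [{ code: "LASTMIN", percent: 10 }] }

rows = [
  ContentRow.new("uuid-1", dump_body(old_rate)),
  ContentRow.new("uuid-2", dump_body(new_rate))
]
decoded = rows.map { |row| load_body(row.body_blob) }
```

The database only ever sees a UUID and a string of bytes, so it cannot filter on anything inside the body.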

Since there’s no way to query the MessagePack payload, we need to extract the fields we want to query by to an index table. In our case, the index table contains three columns for the fields we want to query by — hotel_id, checkin_date and stay_length — and the uuid column mapping the query fields to a record. Hence, every query to Shameless becomes n+1 queries to the underlying database — one to get n UUIDs of matching records, and n to get the latest versions of each of those records (in the correct shard).
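The n+1 pattern can be sketched with plain arrays standing in for the sharded MySQL tables. Everything here is hypothetical: the `ToyRateStore` class, the choice of `hotel_id` as the shard key, the modulo shard function, and the sample rates are assumptions for illustration, not the gem's internals.

```ruby
require "securerandom"

# Toy model of the index/content split. Arrays stand in for MySQL tables.
class ToyRateStore
  INDEX_FIELDS = %i[hotel_id checkin_date stay_length].freeze

  attr_reader :query_count

  def initialize(shards_count: 4)
    @shards_count = shards_count
    @index = Array.new(shards_count) { [] }   # rows: index fields + uuid
    @content = Array.new(shards_count) { [] } # rows: uuid + body, append-only
    @query_count = 0
  end

  # Reuse the uuid if the index fields are already known, then append a version.
  def put(attrs)
    key = attrs.slice(*INDEX_FIELDS)
    shard = shard_for(key[:hotel_id])
    entry = @index[shard].find { |row| row.slice(*INDEX_FIELDS) == key }
    unless entry
      entry = key.merge(uuid: SecureRandom.uuid)
      @index[shard] << entry
    end
    body = attrs.reject { |field, _| INDEX_FIELDS.include?(field) }
    @content[shard] << { uuid: entry[:uuid], body: body }
    entry[:uuid]
  end

  # Returns the latest body for every record matching the filters.
  def where(hotel_id:, **filters)
    shard = shard_for(hotel_id)
    # Query 1: the index table yields the n matching UUIDs.
    @query_count += 1
    uuids = @index[shard]
      .select { |row| row[:hotel_id] == hotel_id && filters.all? { |k, v| row[k] == v } }
      .map { |row| row[:uuid] }
    # Queries 2..n+1: one content lookup per UUID; the last appended row wins.
    uuids.map do |uuid|
      @query_count += 1
      @content[shard].reverse_each.find { |row| row[:uuid] == uuid }[:body]
    end
  end

  private

  # Assumption: hotel_id is the shard key, so index and content rows co-locate.
  def shard_for(hotel_id)
    hotel_id % @shards_count
  end
end

store = ToyRateStore.new
store.put(hotel_id: 1, checkin_date: "2017-03-01", stay_length: 1, net_rate: 120.0)
store.put(hotel_id: 1, checkin_date: "2017-03-01", stay_length: 2, net_rate: 210.0)
store.put(hotel_id: 1, checkin_date: "2017-03-01", stay_length: 1, net_rate: 99.0) # new version
rates = store.where(hotel_id: 1, checkin_date: "2017-03-01")
```

In this run two records match, so `where` performs one index lookup plus two content lookups, and the first record comes back with its newer 99.0 rate.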

Here’s how the contents of the old table are now spread across Shameless:

Another great feature is that Shameless hides all the complexity of sharding and querying behind a straightforward API.

Shameless use, 101

Here’s how you can use it. The first step is to add Shameless to your Gemfile:

Next, define your store and assign it to an object (e.g., a constant):

After that, you need to define your models and indices. Your store may have multiple models, and each model may have multiple indices:

The first time around, you’ll need to create all the underlying tables for your models and indices. You can do it from a console or a rake task:

Writing a record to a Shameless store is pretty easy. The put method will perform an “upsert,” inserting the first version of a new record — or inserting a new version of an existing record:

To get rates back, use the where method:

These are the basics of Shameless. For more advanced uses (e.g., multiple cells per model, hopping between versions, or using Shameless as a log for stream processing similar to Kafka), check out the README on GitHub.

The pros and the cons of sharding the index tables

Thanks to the distributed architecture of Shameless, we’re able to maintain a healthy rate of around 2,000 writes/sec and around 3,000 reads/sec, with latency of around 5 ms for writes and 2 ms for reads.

But, as is usually the case with these kinds of projects, we had to make some trade-offs. The biggest compromise was that we could no longer rely on the database to guarantee ACID constraints, so we needed to move transaction management and the logic of concurrent writes into the application code. And since the store is distributed, we could no longer perform JOINs or aggregations. Still, Shameless was well worth it.
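The post does not say how concurrent writes are handled in application code. Purely as an illustration of what that can involve, the sketch below uses an optimistic version check, a common pattern for append-only data: a writer must state which version it last saw, and a stale write is rejected. `GuardedRecord`, `StaleWriteError`, and the in-process `Mutex` are hypothetical stand-ins, not part of Shameless.

```ruby
# Raised when a writer's view of the record is out of date.
class StaleWriteError < StandardError; end

# Toy append-only record whose writes are guarded in application code.
# Hypothetical illustration; not how Shameless itself coordinates writers.
class GuardedRecord
  def initialize
    @versions = []
    @lock = Mutex.new # stands in for whatever coordination the app uses
  end

  # Index of the newest version, or -1 if nothing has been written yet.
  def latest_version
    @lock.synchronize { @versions.size - 1 }
  end

  # Append a new version only if the caller saw the current latest version.
  def append(body, expected_version:)
    @lock.synchronize do
      current = @versions.size - 1
      unless expected_version == current
        raise StaleWriteError, "expected #{expected_version}, found #{current}"
      end
      @versions << body
      current + 1
    end
  end
end

record = GuardedRecord.new
record.append({ net_rate: 120.0 }, expected_version: -1)
seen = record.latest_version # two writers both read version 0
record.append({ net_rate: 110.0 }, expected_version: seen)
conflict = begin
  record.append({ net_rate: 105.0 }, expected_version: seen)
  false
rescue StaleWriteError
  true
end
```

The second writer's append fails because version 1 already exists, so the application can re-read the record and decide whether to retry.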

Hat tips from HT

We drew some inspiration for Shameless from teams that built similar solutions for similar types of problems. The most noteworthy are Uber, FriendFeed, and Pinterest.