Monitoring Jank: How we found the slowest parts of our UI

Published in Lever Engineering, May 16, 2017

This year we’ve had to revisit some decisions made when Lever had fewer — and smaller — customers. When it comes to the performance of Lever Hire, our applicant tracking system, strategies that work for small companies with one office do not work so well for larger organizations with many offices.

For example, we have a posting-picker component that customers can use to assign a job posting to a candidate, to filter the candidates list by job posting, and to see reports for a specific job posting. For companies with fewer than 50 job postings, downloading and rendering all postings in this picker is fast and simple. Also, keeping all of the account’s posting data on the page means we do not need to perform extra requests when the posting picker opens. For these customers, using the posting-picker is fast and easy.

Jank Buster seems like a good candidate for the “Eng” team

On the flip side, for companies with 1,000+ postings, the browser’s attempt to render all 1,000+ postings in the dropdown causes the interface to stutter. This stutter, the extra time the main thread spends processing while the user waits, is called jank.

At Lever, we want to offer a jank-free application, but we didn’t know how much jank was out there, nor did we know what was causing it. While we were already collecting data from the browser performance timing API to monitor initial page load performance, we did not have any visibility into how our app performed once it was loaded in the browser.

Measure twice

The first task of any performance work is to measure current performance in order to establish benchmarks to work against. To reduce jank, we needed a way to measure it.

Frames, explained

When web browsers need to draw updates to a page, like for scrolling or animations, they typically draw up to 60 times a second. Each drawing pass is called a frame. To have completely smooth animations and scrolling behavior, the browser can spend no more than 1000ms/60 (or 16.67ms) on each frame.

Jank, quantified

So, academically, we know that browsers render 60 frames per second. The question was: how can we write JavaScript code to measure frames that take too long? In modern browsers, it’s best to use CSS animations where possible. However, achieving some animation effects may require custom code. That’s where requestAnimationFrame comes in; it lets one schedule work for the next time the browser is ready to draw a frame. In practice, we found that the requestAnimationFrame callback is indeed called 60 times per second under normal conditions.

Consequently, if one measures how long it takes for a requestAnimationFrame callback to be invoked, one can quantify how much jank occurred using the difference between the time measured and the expected time under smooth conditions (which is approximately 16.67ms). If a requestAnimationFrame callback is invoked after 116.67ms, then the jankAmount for that frame is approximately 100ms.
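
To make the arithmetic concrete, here is a minimal sketch of that measurement. The function names and the injectable `raf` and `now` parameters are assumptions made for illustration (they let the sketch run outside a browser), not the production code.

```javascript
// Frame budget at 60 frames per second, in milliseconds (about 16.67).
const FRAME_BUDGET_MS = 1000 / 60;

// Jank for one frame: the time waited beyond the frame budget.
// Clamped at zero, because a callback scheduled mid-frame can fire early.
function jankAmount(elapsedMs, budgetMs = FRAME_BUDGET_MS) {
  return Math.max(0, elapsedMs - budgetMs);
}

// Measure how long the next frame callback takes to be invoked.
// `raf` and `now` default to the browser APIs but can be replaced by fakes.
function measureNextFrame(
  onMeasured,
  raf = globalThis.requestAnimationFrame,
  now = () => performance.now()
) {
  const start = now();
  raf(() => onMeasured(jankAmount(now() - start)));
}
```

With these definitions, `jankAmount(116.67)` comes out at roughly 100, matching the example above.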

Event, captured

Monitoring every single rendered frame for all users is a ridiculous proposition. The monitoring itself would cause performance issues, and the amount of data it would generate is not feasible for us to store. We determined that jank only really matters when the user is actively waiting for the browser to render. So, our strategy for monitoring is to capture the jankAmount for the frame immediately following a click or a keydown event.

To ensure that we start the jank measurement before any event handlers start doing their work, we use event capturing for the click and keydown event listeners we add to the window object.
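
A sketch of that registration might look like the following. The helper name `watchInteractions` is hypothetical; the only essential detail is the third argument to `addEventListener`, which selects the capture phase.

```javascript
// Register capture-phase listeners so `onInteraction` runs before
// handlers attached to elements further down the tree.
function watchInteractions(target, onInteraction) {
  const types = ['click', 'keydown'];
  for (const type of types) {
    // Passing `true` as the third argument selects the capture phase.
    target.addEventListener(type, onInteraction, true);
  }
  // Return a function that removes the listeners again.
  return () => {
    for (const type of types) {
      target.removeEventListener(type, onInteraction, true);
    }
  };
}
```

In a browser this would be called as `watchInteractions(window, handler)`. Because the listeners sit on the window in the capture phase, they fire before the handlers on the clicked element itself.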

Finally, in order to make the tracking more valuable, we generate a query selector string that can be used to identify the element that received the event by traversing up the DOM (Document Object Model) tree. We set a reasonable traversal limit to ensure our performance tracking code doesn’t add too much load. With this query selector, we can identify the specific interaction that produced the jank.
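
One way to build such a selector is sketched below. The function names, the default limit of five ancestors, and the choice to stop at the first element with an id are all assumptions for illustration. A production version would also need to escape unusual ids and class names.

```javascript
// Describe a single element as tag, optional #id, and optional .classes.
function describeElement(el) {
  let part = el.tagName.toLowerCase();
  if (el.id) part += '#' + el.id;
  // className is not a plain string on SVG elements, so guard the type.
  const classes = typeof el.className === 'string'
    ? el.className.trim().split(/\s+/).filter(Boolean)
    : [];
  if (classes.length) part += '.' + classes.join('.');
  return part;
}

// Walk up from `el`, collecting at most `maxDepth` elements.
function buildSelector(el, maxDepth = 5) {
  const parts = [];
  let node = el;
  while (node && node.tagName && parts.length < maxDepth) {
    parts.unshift(describeElement(node));
    if (node.id) break; // an id is normally unique, so stop climbing
    node = node.parentElement;
  }
  return parts.join(' > ');
}
```

The depth limit bounds the work done on every click or keystroke, which keeps the tracking code itself from adding noticeable load.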

Fine-tuned jank monitoring

The first version of our jank tracking code revealed some issues with our naive measurement solution. While most of the events we received made sense, there were quite a few outliers that indicated a need to tune our monitoring strategy.

Not all jank

Our initial lower bound for reporting jank was too low, so we received a lot of monitoring events that were not very valuable. We found a sweet spot by setting a threshold of 40ms of jank before we track an event, which increases the signal on the parts of the application that are slow.

Accounting for page transitions

Lever Hire is a Single Page Application. We found that the biggest outliers were clicks that would trigger page transitions. In these cases, some jank is expected as the new data, markup, and styles are applied to the brand new page. Knowing this, we now track the window’s location before and after we set up the jank tracking callback. This extra data helps us understand the reason for jank events, and allows us to filter out page transitions when measuring in-page rendering performance.
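
The idea can be sketched as follows. The record field names `locationBefore` and `locationAfter` and the helper names are hypothetical, and `getLocation` and `schedule` are passed in so the sketch does not depend on a browser.

```javascript
// Capture the location when the event fires and again when the
// scheduled callback runs, then hand both to `onDone`.
function withLocations(getLocation, schedule, onDone) {
  const locationBefore = getLocation();
  schedule(() => {
    onDone({ locationBefore, locationAfter: getLocation() });
  });
}

// A record is a page transition when the location changed in between.
function isPageTransition(record) {
  return record.locationBefore !== record.locationAfter;
}

// Keep only the records that measure in-page rendering.
function inPageOnly(records) {
  return records.filter((record) => !isPageTransition(record));
}
```

In a browser, `getLocation` could be `() => window.location.href` and `schedule` could be `(cb) => window.requestAnimationFrame(cb)`.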

Classic browsers

Shims, libraries that bring a new functionality to older environments, are common in web applications that support multiple browsers. To simplify our code, we shim window.requestAnimationFrame on browsers that do not support it natively. Since the shim is not connected to the browser’s actual rendering pipeline, we also track whether or not we are using the native version of window.requestAnimationFrame. While the jank measured using the shim still contains some signal on rendering behavior, the events are also more likely to be outliers due to the relative priority of the different types of callbacks in the event loop.
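
A simple way to record that flag is to check for the native function before installing the shim. The timer-based shim below is an assumption for illustration; this post does not say which shim is actually used.

```javascript
// Install a timer-based fallback when requestAnimationFrame is missing.
// Returns true when the native implementation was already present.
function installRafShim(root, now = Date.now) {
  const hasNative = typeof root.requestAnimationFrame === 'function';
  if (!hasNative) {
    // The fallback only approximates frame timing with a ~16.67ms timer.
    // It is not tied to the rendering pipeline, and its timestamp uses a
    // different time base than the native high-resolution one.
    root.requestAnimationFrame = (callback) =>
      root.setTimeout(() => callback(now()), 1000 / 60);
  }
  return hasNative;
}
```

Calling `installRafShim(window)` once at startup gives a boolean that can be attached to every tracked event, so shimmed measurements can be analyzed separately.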

Show me the code already

Putting it all together, we have jank monitoring code that looks kind of like this:
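
What follows is a minimal sketch reconstructed from the description in this post, not the production code. The `env` object, the `report` callback, the record field names, and the traversal limit of five are assumptions made for illustration.

```javascript
const FRAME_BUDGET_MS = 1000 / 60; // about 16.67ms per frame at 60fps
const JANK_THRESHOLD_MS = 40;      // ignore jank below this amount
const MAX_SELECTOR_DEPTH = 5;      // assumed traversal limit

// Build a short selector for the event target from a few ancestors.
function selectorFor(el) {
  const parts = [];
  let node = el;
  while (node && node.tagName && parts.length < MAX_SELECTOR_DEPTH) {
    parts.unshift(node.tagName.toLowerCase() + (node.id ? '#' + node.id : ''));
    if (node.id) break;
    node = node.parentElement;
  }
  return parts.join(' > ');
}

// `env` bundles the browser dependencies so fakes can be swapped in.
function startJankMonitor(env, report) {
  function onInteraction(event) {
    const start = env.now();
    const locationBefore = env.getLocation();
    env.raf(() => {
      const jank = env.now() - start - FRAME_BUDGET_MS;
      if (jank < JANK_THRESHOLD_MS) return; // not slow enough to track
      report({
        eventType: event.type,
        selector: selectorFor(event.target),
        jankAmount: Math.round(jank),
        locationBefore,
        locationAfter: env.getLocation(),
        nativeRaf: env.nativeRaf,
      });
    });
  }
  // Capture phase: start timing before other handlers do their work.
  env.target.addEventListener('click', onInteraction, true);
  env.target.addEventListener('keydown', onInteraction, true);
}

// In a browser, `env` might be wired up like this:
// startJankMonitor({
//   target: window,
//   raf: (cb) => window.requestAnimationFrame(cb),
//   now: () => performance.now(),
//   getLocation: () => window.location.href,
//   nativeRaf: true, // false when a shim is installed
// }, (record) => console.log(record));
```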

The actual code in production is a bit more complex (more try/catch action and Lever specific stuff) but this is more or less how we monitor jank.

Cut once

Once we had the data flowing, we were able to identify particularly slow parts of our application using the query selector on the tracked events. After putting the tracking in place, we released some updates we believed would improve rendering performance for some of our jankiest components. And, according to our jank monitoring in our ELK stack, we were quite successful!

Pic(k)s or it didn’t happen

The posting picker mentioned earlier was indeed one of our jankiest components. The posting picker is a dropdown connected to a text box that allows the user to quickly filter and select the desired posting. In some cases, each keystroke could lead to a frustrating experience as the browser struggled to render the hundreds of postings that have the letter “S” in them.

New hotness on the left, old and busted on the right.

To improve this experience we did something very simple: limit the number of postings rendered in the dropdown list. By limiting the number of results we render, the UI can respond more quickly to each subsequent keystroke and the user’s flow is not disrupted. There are some drawbacks to this change, especially for users who are not used to typing in the field to select a posting, but for users at companies with hundreds of job postings the experience is drastically improved.
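
The change can be sketched in a few lines. The posting shape (`{ name }`), the function name, and the limit of 50 are assumptions for illustration; the post does not state the actual cap.

```javascript
// Return at most `limit` postings whose name contains the query.
// Stopping early means the dropdown never renders more than `limit` rows.
function visiblePostings(postings, query, limit = 50) {
  const needle = query.trim().toLowerCase();
  const results = [];
  for (const posting of postings) {
    if (results.length >= limit) break;
    if (posting.name.toLowerCase().includes(needle)) {
      results.push(posting);
    }
  }
  return results;
}
```

Because the loop stops as soon as the limit is reached, a broad query like "S" costs about the same to filter and render as a narrow one.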

Save time, increase delight

Our jank monitoring allows me to confidently say that this work has saved our customers’ time. Since most of our users are at work, this time translates pretty directly to saving them money. Also, beyond nickels and dimes, we know anecdotally that jank contributes to user frustration; so less jank == happier users!

While we made progress in improving performance, there is still plenty of jank left to reduce. If that sounds like an exciting challenge to you, please check out our jobs page!