Alert about unprocessed crash reports


At TPA we have recently spent some time improving the way we handle crash reports, specifically how we group similar reports. Whenever a crash report is uploaded to our backend, we try to figure out whether we have already seen a similar crash, so that we can group individual crash reports into issues.

The end result should be that instead of being shown thousands of individual reports, the developer is presented with tens of issues, which hopefully correspond to the actual underlying bugs in the app.

At the same time we want to point the developer as close as possible to the source code line causing the crash.

Three reasons prompted us to dive into this topic again:

  • We wanted to be better at grouping reports from different versions of the same app.
  • We noticed that some types of crashes would generate several essentially similar issues.
  • As TPA has grown, we have collected more than one million crash reports from various iOS and Android apps, which gave us a perfect starting point for revisiting our algorithms.

Grouping is not an exact science

A single version of an app might provide us with tens of thousands of crash reports. This makes it futile for a grouping algorithm to compare a new report to every single other report we have received. Therefore we calculate a hash of the stack trace [1], which will be the grouping key.
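The idea of a grouping key can be sketched roughly as follows. This is a minimal illustration, not our production algorithm: the choice of SHA-256, the plain-string frames, and the names `grouping_key`, `add_report`, and `issues` are all assumptions made for the example. The exception type is mixed in to show that the key can draw on more than the stack trace alone.

```python
import hashlib

def grouping_key(exception_type, frames):
    """Hash the exception type and stack frames into one hex digest."""
    digest = hashlib.sha256()
    digest.update(exception_type.encode("utf-8"))
    for frame in frames:
        digest.update(b"\n")  # separator so frame boundaries affect the hash
        digest.update(frame.encode("utf-8"))
    return digest.hexdigest()

# Hypothetical in-memory store: grouping key -> ids of the reports in that issue.
issues = {}

def add_report(report_id, exception_type, frames):
    """File a report under its issue with a single dictionary lookup."""
    key = grouping_key(exception_type, frames)
    issues.setdefault(key, []).append(report_id)
    return key

# Made-up frames, for illustration only.
trace = ["com.example.Cart.total", "com.example.Checkout.pay"]
first = add_report("r1", "NullPointerException", trace)
second = add_report("r2", "NullPointerException", trace)
other = add_report("r3", "IllegalStateException", trace)
```

Here `r1` and `r2` end up in the same issue, while `r3` gets its own. The cost of filing a report does not depend on how many reports have already been received.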

Until now, the hash favored exact equality over similarity. In some situations this would result in crashes with the same underlying cause being grouped separately due to differences in OS versions or minor code changes unrelated to the issue.

To improve this, we need to decide carefully how closely two stack traces must match before we put them in the same group.

For every decision we make in our algorithms, it is possible to come up with a set of crash reports that will not be grouped ideally. We therefore took a random sample of more than 100,000 reports and ran them through a selection of algorithms. We picked the algorithms that gave us the best trade-off between keeping the number of groups low and keeping each group tied to a single suspected cause.
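A comparison of this kind could look something like the sketch below. Everything in it is hypothetical: the three candidate key functions, the `Symbol:line` frame strings, and above all the hand-assigned cause labels, which stand in for whatever judgment is used to decide the suspected cause of a crash.

```python
from collections import defaultdict

def evaluate(key_fn, labelled_reports):
    """Group reports with key_fn; return (group count, groups mixing several causes)."""
    causes_by_key = defaultdict(set)
    for frames, cause in labelled_reports:
        causes_by_key[key_fn(frames)].add(cause)
    mixed = sum(1 for causes in causes_by_key.values() if len(causes) > 1)
    return len(causes_by_key), mixed

# Three hypothetical candidates, from strictest to loosest and back.
def exact(frames):
    return tuple(frames)  # every frame, including line numbers

def top_frame_only(frames):
    return frames[0]  # only the crashing frame

def symbols_only(frames):
    return tuple(f.rsplit(":", 1)[0] for f in frames)  # drop line numbers

# Made-up sample: (stack frames, manually assigned suspected cause).
sample = [
    (["Cart.total:10", "Checkout.pay:55"], "nil price"),
    (["Cart.total:12", "Checkout.pay:57"], "nil price"),
    (["Cart.total:10", "Widget.refresh:8"], "stale cache"),
]

candidates = {
    "exact": exact,
    "top_frame_only": top_frame_only,
    "symbols_only": symbols_only,
}
results = {name: evaluate(fn, sample) for name, fn in candidates.items()}
```

On this toy sample, `exact` produces three groups, `top_frame_only` produces two but merges two different causes into one, and `symbols_only` produces two groups without mixing causes.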

The Result

We believe the end result is a set of solid improvements.

In our new algorithms we put more weight on similarity: the fact that the line causing the crash has moved a bit should not result in a whole new issue, and neither should a different OS version on the device.
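To make the difference concrete, here is a small sketch that contrasts an equality-oriented key with a similarity-oriented one. It is only an illustration under assumed names: the `Frame` type, `strict_key`, `tolerant_key`, and the sample frames are invented for the example, and the real algorithms weigh more signals than this.

```python
import hashlib
from typing import NamedTuple

class Frame(NamedTuple):
    module: str
    symbol: str
    line: int

def _digest(parts):
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()

def strict_key(os_version, frames):
    """Equality-oriented key: OS version and line numbers are part of the hash."""
    parts = [os_version] + [f"{f.module}:{f.symbol}:{f.line}" for f in frames]
    return _digest(parts)

def tolerant_key(os_version, frames):
    """Similarity-oriented key: os_version is accepted but deliberately ignored,
    and line numbers are left out of the hash."""
    parts = [f"{f.module}:{f.symbol}" for f in frames]
    return _digest(parts)

# The same crash before and after an unrelated edit moved the line by three.
old_build = [Frame("MyApp", "CartViewController.reload()", 42)]
new_build = [Frame("MyApp", "CartViewController.reload()", 45)]
unrelated = [Frame("MyApp", "LoginViewController.submit()", 45)]

same_issue_strict = strict_key("15.1", old_build) == strict_key("16.0", new_build)
same_issue_tolerant = tolerant_key("15.1", old_build) == tolerant_key("16.0", new_build)
```

With the strict key the two reports land in different issues. With the tolerant key they land in the same one, while a crash in a different function still gets its own issue.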

iOS/tvOS: We need your help

Our new iOS/tvOS algorithms require us to be able to do a complete symbolication of the parts of the crashed thread related to the app’s code. Therefore we need you to upload the matching .dSYM files to us [2].

If we are missing any .dSYMs, we will notify you with the yellow button you see above and provide you with a list of the UUIDs we require. We will also provide you with the raw crash reports in case you have mislaid the .dSYMs or need some help finding them.

When you have uploaded the required .dSYMs, we will process the crash reports. If you do not upload the .dSYMs within 30 days, we will delete the unprocessed crash reports.
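The bookkeeping described here can be pictured with a short sketch. It is not our backend code: the report dictionaries, the `dsym_uuids` and `received` fields, the placeholder UUIDs, and the assumption that the 30 days are counted from when a report was received are all made up for illustration.

```python
from datetime import datetime, timedelta

RETENTION = timedelta(days=30)  # the 30-day window mentioned in the text

def missing_uuids(pending_reports, uploaded_uuids):
    """Return the .dSYM UUIDs that pending reports need but nobody has uploaded."""
    needed = set()
    for report in pending_reports:
        needed.update(report["dsym_uuids"])
    return needed - set(uploaded_uuids)

def triage(pending_reports, uploaded_uuids, now):
    """Sort pending reports into 'ready', 'waiting', and 'expired' lists of ids."""
    uploaded = set(uploaded_uuids)
    result = {"ready": [], "waiting": [], "expired": []}
    for report in pending_reports:
        if set(report["dsym_uuids"]) <= uploaded:
            result["ready"].append(report["id"])    # can be symbolicated now
        elif now - report["received"] > RETENTION:
            result["expired"].append(report["id"])  # would be deleted
        else:
            result["waiting"].append(report["id"])  # still needs a .dSYM
    return result

# Placeholder UUIDs; real ones are 128-bit identifiers.
pending = [
    {"id": "a", "received": datetime(2024, 2, 20), "dsym_uuids": {"AAAA"}},
    {"id": "b", "received": datetime(2024, 2, 20), "dsym_uuids": {"BBBB"}},
    {"id": "c", "received": datetime(2024, 1, 10), "dsym_uuids": {"CCCC"}},
]
now = datetime(2024, 3, 1)
still_missing = missing_uuids(pending, {"AAAA"})
status = triage(pending, {"AAAA"}, now)
```

In this example report `a` is ready to be processed, `b` is still waiting for its .dSYM, and `c` has passed the 30-day window.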

Android: We may need your help

We will process and group any Android crash report, but you may want to upload any ProGuard mapping files to get the best possible starting point for tracking down bugs.
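To show what a mapping file buys you, here is a deliberately simplified sketch that reads only the class-name lines of a ProGuard mapping file (`original.Name -> obfuscated.name:`) and translates obfuscated class names back. The sample mapping and the function names are invented for the example. Real retracing, for instance with ProGuard's ReTrace tool, also resolves methods, fields, and line ranges.

```python
def parse_class_mapping(mapping_text):
    """Return {obfuscated_class: original_class} from ProGuard mapping text."""
    classes = {}
    for raw in mapping_text.splitlines():
        line = raw.rstrip()
        # Skip blank lines, comments, and indented member lines.
        if not line or line.startswith("#") or raw[0].isspace():
            continue
        # Class lines look like "original.Name -> obfuscated.name:".
        if line.endswith(":") and " -> " in line:
            original, obfuscated = line[:-1].split(" -> ", 1)
            classes[obfuscated.strip()] = original.strip()
    return classes

def retrace_class(class_name, classes):
    """Translate an obfuscated class name; unknown names pass through unchanged."""
    return classes.get(class_name, class_name)

# Made-up mapping file contents.
mapping = (
    "com.example.shop.Cart -> a.a:\n"
    "    int total -> a\n"
    "    void add(int) -> a\n"
    "com.example.shop.Checkout -> a.b:\n"
)
classes = parse_class_mapping(mapping)
readable = retrace_class("a.a", classes)
```

Without the mapping file, a frame in class `a.a` says little about where to look. With it, the same frame points at `com.example.shop.Cart`.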

  1. We also use additional information from the report, e.g. the exception being thrown. 

  2. Fastlane and The Fastlane TPA Plugin are great at making sure that all needed artifacts are available to our backend.