Home Crashlytics + Big Query: Creating a More Immediate and Convenient Crash Tracking Tool
Post
Cancel

Crashlytics + Big Query: Creating a More Immediate and Convenient Crash Tracking Tool

Crashlytics + Big Query: Creating a More Immediate and Convenient Crash Tracking Tool

Integrating Crashlytics and Big Query to automatically forward crash records to a Slack Channel

Results

Pinkoi iOS Team Real Photo

Pinkoi iOS Team Real Photo

First, let’s look at the results. We query Crashlytics crash records regularly every week; filter out the top 10 issues with the most crashes; and send the information to a Slack Channel, making it convenient for all iOS teammates to quickly understand the current stability.

Problem

For app developers, the Crash-Free Rate can be said to be the most important metric; the data represents the proportion of app users who did not encounter crashes. I think every app would hope its Crash-Free Rate ~= 99.9%; but the reality is that it’s impossible. As long as there is code, there can be bugs, not to mention some crashes are caused by underlying issues (Apple) or third-party SDKs. Additionally, the DAU (Daily Active Users) volume can also impact the Crash-Free Rate. The higher the DAU, the more likely it is to encounter many sporadic crash issues.

Since a 100% crash-free app does not exist, tracking and handling crashes becomes very important. Besides the most common Google Firebase Crashlytics (formerly Fabric), there are other options like Bugsnag and Bugfender. I haven’t compared these tools personally, so interested friends can research on their own. If you use other tools, the content introduced in this article won’t be applicable.

Crashlytics

The benefits of choosing Crashlytics are:

  • Stability, backed by Google
  • Free, easy, and quick to install
  • Besides crashes, it can also log error events (e.g., Decode Error)
  • One Firebase suite can handle everything: other services include Google Analytics, Realtime Database, Remote Config, Authentication, Cloud Messaging, Cloud Storage…

Side note: It is not recommended to build a formal service entirely on Firebase, as the charges can become very expensive once the traffic increases… it’s a trap.

Crashlytics also has many drawbacks:

  • Crashlytics does not provide an API to query crash data
  • Crashlytics only stores crash records for the last 90 days
  • Crashlytics’ Integrations support and flexibility are extremely poor

The most painful part is the poor support and flexibility of Integrations, coupled with the lack of an API to write scripts to connect crash data. This means you have to manually check Crashlytics for crash records from time to time to track crash issues.

Crashlytics only supports the following Integrations:

  1. [Email Notification] — Trending stability issues (crash issues encountered by more and more people)
  2. [Slack, Email Notification] — New Fatal Issue (crash issue)
  3. [Slack, Email Notification] — New Non-Fatal Issue (non-crash issue)
  4. [Slack, Email Notification] — Velocity Alert (crash issues that suddenly increase in number)
  5. [Slack, Email Notification] — Regression Alert (issues that were solved but reappeared)
  6. Crashlytics to Jira issue

The content and rules of the above Integrations cannot be customized.

Initially, we directly used 2. New Fatal Issue to Slack or Email, and for Email, we used Google Apps Script to trigger subsequent processing scripts; however, this notification would bombard the notification channel crazily, because it would notify for any issue, big or small, or even sporadic crashes caused by user devices or iOS itself. As DAU increased, we were bombarded by these notifications every day, and only about 10% of them were truly valuable, related to our program errors, and encountered by many users.

As a result, it did not solve the problem of Crashlytics being difficult to track automatically, and we still had to spend a lot of time reviewing whether the issue was important.

Crashlytics + Big Query

After searching around, we only found this method, and the official also only provides this method; this is the trap under the free candy coating. I guess neither Crashlytics nor Analytics Event will or plan to launch an API for users to query data via API; because the only official suggestion is to import the data into Big Query for use, and Big Query charges for storage and queries beyond the free quota.

Storage: The first 10 GB per month is free.

Query: The first 1 TB per month is free. (The query quota means how much data is processed when you run a Select query)

For details, refer to Big Query pricing.

The setup details for Crashlytics to Big Query can be found in the official documentation, which requires enabling GCP services, binding a credit card, etc.

Start Using Big Query to Query Crashlytics Log

After setting up the Crashlytics Log to Big Query import cycle and completing the first import with data, we can start querying the data.

First, go to Firebase Project -> Crashlytics -> Click the “•••” in the top right corner of the list -> Click “BigQuery dataset”.

After going to GCP -> Big Query, you can select “firebase_crashlytics” in the left “Explorer” -> select your Table name -> “Detail” -> You can view the Table information on the right, including the latest modification time, used capacity, storage period, etc.

Make sure there is imported data available for querying.

You can switch to the “SCHEMA” tab at the top to view the Table’s column information or refer to the official documentation.

Click the “Query” button in the top right to open an interface with an assisted SQL Builder (if you are not familiar with SQL, it is recommended to use this):

Or directly click “COMPOSE NEW QUERY” to open a blank Query Editor:

Regardless of the method, it is the same text editor; after entering the SQL, you can automatically complete the SQL syntax check and estimate the query quota cost in the top right (This query will process XXX when run.):

After confirming the query, click “RUN” in the top left to execute the query, and the results will be displayed in the Query results section below.

⚠️ Pressing “RUN” to execute the query will accumulate the query quota and incur charges; so please be careful not to run queries recklessly.

If you are unfamiliar with SQL, you can first understand the basic usage and then refer to the Crashlytics official examples for modification:

1. Count the number of crashes per day for the past 30 days:

1
2
3
4
5
6
7
8
9
10
SELECT
  COUNT(DISTINCT event_id) AS number_of_crashes,
  FORMAT_TIMESTAMP("%F", event_timestamp) AS date_of_crashes
FROM
 `yourProjectID.firebase_crashlytics.yourTableName`
GROUP BY
  date_of_crashes
ORDER BY
  date_of_crashes DESC
LIMIT 30;

2. Query the top 10 most frequent crashes in the past 7 days:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
SELECT
  DISTINCT issue_id,
  COUNT(DISTINCT event_id) AS number_of_crashes,
  COUNT(DISTINCT installation_uuid) AS number_of_impacted_user,
  blame_frame.file,
  blame_frame.line
FROM
  `yourProjectID.firebase_crashlytics.yourTableName`
WHERE
  event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(),INTERVAL 168 HOUR)
  AND event_timestamp < CURRENT_TIMESTAMP()
GROUP BY
  issue_id,
  blame_frame.file,
  blame_frame.line
ORDER BY
  number_of_crashes DESC
LIMIT 10;

However, the data retrieved using this official example is sorted differently from what you see in Crashlytics. This is likely because it groups by blame_frame.file (nullable) and blame_frame.line (nullable).

3. Query the top 10 devices with the most crashes in the past 7 days:

1
2
3
4
5
6
7
8
9
10
11
12
13
SELECT
  device.model,
COUNT(DISTINCT event_id) AS number_of_crashes
FROM
  `yourProjectID.firebase_crashlytics.yourTableName`
WHERE
  event_timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 168 HOUR)
  AND event_timestamp < CURRENT_TIMESTAMP()
GROUP BY
  device.model
ORDER BY
  number_of_crashes DESC
LIMIT 10;

For more examples, please refer to the official documentation.

If your SQL query returns no data, first ensure that the Crashlytics data for the specified conditions has been imported into Big Query (for example, the default SQL example queries the crash records of the day, but the data might not have been synchronized yet, so no results are found); if there is data, then check whether the filter conditions are correct.

Top 10 Crashlytics Issue Big Query SQL

Here, we modify the official example from point 2. We want the results to match the crash issues and sorting data we see on the first page of Crashlytics.

Top 10 crash issues in the past 7 days:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
SELECT 
  DISTINCT issue_id, 
  issue_title, 
  issue_subtitle, 
  COUNT(DISTINCT event_id) AS number_of_crashes, 
  COUNT(DISTINCT installation_uuid) AS number_of_impacted_user 
FROM 
  `yourProjectID.firebase_crashlytics.yourTableName`
WHERE 
  is_fatal = true 
  AND event_timestamp >= TIMESTAMP_SUB(
    CURRENT_TIMESTAMP(), 
    INTERVAL 7 DAY
  ) 
GROUP BY 
  issue_id, 
  issue_title, 
  issue_subtitle 
ORDER BY 
  number_of_crashes DESC 
LIMIT 
  10;

Comparison of Crashlytics’ Top 10 crash issues results, matched ✅.

Use Google Apps Script to regularly query & forward to Slack

Go to Google Apps Script homepage -> Log in with the same account as Big Query -> Click “New Project” in the upper left corner, and you can rename the project after opening a new project.

First, let’s complete the integration with Big Query to get the query data:

Refer to the official documentation example, and bring in the above Query SQL.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
function queryiOSTop10Crashes() {
  var request = {
    query: 'SELECT DISTINCT issue_id, issue_title, issue_subtitle, COUNT(DISTINCT event_id) AS number_of_crashes, COUNT(DISTINCT installation_uuid) AS number_of_impacted_user FROM `firebase_crashlytics.YourTableName` WHERE is_fatal = true AND event_timestamp >= TIMESTAMP_SUB( CURRENT_TIMESTAMP(), INTERVAL 7 DAY ) GROUP BY issue_id, issue_title, issue_subtitle ORDER BY number_of_crashes DESC LIMIT 10;',
    useLegacySql: false
  };
  var queryResults = BigQuery.Jobs.query(request, 'YourProjectID');
  var jobId = queryResults.jobReference.jobId;

  // Check on status of the Query Job.
  var sleepTimeMs = 500;
  while (!queryResults.jobComplete) {
    Utilities.sleep(sleepTimeMs);
    sleepTimeMs *= 2;
    queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId);
  }

  // Get all the rows of results.
  var rows = queryResults.rows;
  while (queryResults.pageToken) {
    queryResults = BigQuery.Jobs.getQueryResults(projectId, jobId, {
      pageToken: queryResults.pageToken
    });
    Logger.log(queryResults.rows);
    rows = rows.concat(queryResults.rows);
  }

  var data = new Array(rows.length);
  for (var i = 0; i < rows.length; i++) {
    var cols = rows[i].f;
    data[i] = new Array(cols.length);
    for (var j = 0; j < cols.length; j++) {
      data[i][j] = cols[j].v;
    }
  }

  return data
}

query: The parameters can be arbitrarily replaced with the written Query SQL.

The structure of the returned object is as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
[
  [
    "67583e77da3b9b9d3bd8feffeb13c8d0",
    "<compiler-generated> line 2147483647",
    "specialized @nonobjc NSAttributedString.init(data:options:documentAttributes:)",
    "417",
    "355"
  ],
  [
    "a590d76bc71fd2f88132845af5455c12",
    "libnetwork.dylib",
    "nw_endpoint_flow_copy_path",
    "259",
    "207"
  ],
  [
    "d7c3b750c3e5587c91119c72f9f6514d",
    "libnetwork.dylib",
    "nw_endpoint_flow_copy_path",
    "138",
    "118"
  ],
  [
    "5bab14b8f8b88c296354cd2e",
    "CoreFoundation",
    "-[NSCache init]",
    "131",
    "117"
  ],
  [
    "c6ce52f4771294f9abaefe5c596b3433",
    "XXX.m line 975",
    "-[XXXX scrollToMessageBottom]",
    "85",
    "57"
  ],
  [
    "712765cb58d97d253ec9cc3f4b579fe1",
    "<compiler-generated> line 2147483647",
    "XXXXX.heightForRow(at:tableViewWidth:)",
    "67",
    "66"
  ],
  [
    "3ccd93daaefe80f024cc8a7d0dc20f76",
    "<compiler-generated> line 2147483647",
    "XXXX.tableView(_:cellForRowAt:)",
    "59",
    "59"
  ],
  [
    "f31a6d464301980a41367b8d14f880a3",
    "XXXX.m line 46",
    "-[XXXX XXX:XXXX:]",
    "50",
    "41"
  ],
  [
    "c149e1dfccecff848d551b501caf41cc",
    "XXXX.m line 554",
    "-[XXXX tableView:didSelectRowAtIndexPath:]",
    "48",
    "47"
  ],
  [
    "609e79f399b1e6727222a8dc75474788",
    "Pinkoi",
    "specialized JSONDecoder.decode<A>(_:from:)",
    "47",
    "38"
  ]
]

You can see it is a two-dimensional array.

Add the function to forward to Slack:

Continue adding the new function below the above code.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
function sendTop10CrashToSlack() {

  var iOSTop10Crashes = queryiOSTop10Crashes();
  var top10Tasks = new Array();
  
  for (var i = 0; i < iOSTop10Crashes.length ; i++) {
    var issue_id = iOSTop10Crashes[i][0];
    var issue_title = iOSTop10Crashes[i][1];
    var issue_subtitle = iOSTop10Crashes[i][2];
    var number_of_crashes = iOSTop10Crashes[i][3];
    var number_of_impacted_user = iOSTop10Crashes[i][4];

    var strip_title = issue_title.replace(/[\<|\>]/g, '');
    var strip_subtitle = issue_subtitle.replace(/[\<|\>]/g, '');
    
    top10Tasks.push("<https://console.firebase.google.com/u/1/project/YOUR_FIREBASE_PROJECTID/crashlytics/app/YOUR_FIREBASE_APP_PROJECT_ID/issues/"+issue_id+"|"+(i+1)+". Crash: "+number_of_crashes+" times ("+number_of_impacted_user+" users) - "+strip_title+" "+strip_subtitle+">");
  }

  var messages = top10Tasks.join("\n");
  var payload = {
    "blocks": [
      {
        "type": "header",
        "text": {
          "type": "plain_text",
          "text": ":bug::bug::bug: iOS Top 10 Crashes in the Last 7 Days :bug::bug::bug:",
          "emoji": true
        }
      },
      {
        "type": "divider"
      },
      {
        "type": "section",
        "text": {
          "type": "mrkdwn",
          "text": messages
        }
      },
      {
        "type": "divider"
      },
      {
        "type": "actions",
        "elements": [
          {
            "type": "button",
            "text": {
              "type": "plain_text",
              "text": "View Last 7 Days in Crashlytics",
              "emoji": true
            },
            "url": "https://console.firebase.google.com/u/1/project/YOUR_FIREBASE_PROJECTID/crashlytics/app/YOUR_FIREBASE_APP_PROJECT_ID/issues?time=last-seven-days&state=open&type=crash&tag=all"
          },
          {
            "type": "button",
            "text": {
              "type": "plain_text",
              "text": "View Last 30 Days in Crashlytics",
              "emoji": true
            },
            "url": "https://console.firebase.google.com/u/1/project/YOUR_FIREBASE_PROJECTID/crashlytics/app/YOUR_FIREBASE_APP_PROJECT_ID/issues?time=last-thirty-days&state=open&type=crash&tag=all"
          }
        ]
      },
      {
        "type": "context",
        "elements": [
          {
            "type": "plain_text",
            "text": "Crash counts and versions are only counted for the last 7 days, not all data.",
            "emoji": true
          }
        ]
      }
    ]
  };

  var slackWebHookURL = "https://hooks.slack.com/services/XXXXX"; //Replace with your in-coming webhook URL
  UrlFetchApp.fetch(slackWebHookURL,{
    method             : 'post',
    contentType        : 'application/json',
    payload            : JSON.stringify(payload)
  })
}

If you don’t know how to obtain the incoming WebHook URL, you can refer to the “Obtaining Incoming WebHooks App URL” section in this article.

Testing & Scheduling

At this point, your Google Apps Script project should have the above two functions.

Next, please select the “sendTop10CrashToSlack” function at the top, and then click Debug or Run to execute a test run; since the first execution requires authentication, please execute it at least once before proceeding to the next step.

After successfully executing a test run, you can start setting up the schedule for automatic execution:

Select the clock icon on the left, then choose “+ Add Trigger” at the bottom right.

For the first “Choose which function to run” (entry point of the function to be executed), change it to sendTop10CrashToSlack. The time period can be set according to personal preference.

⚠️⚠️⚠️ Please be aware that each query will accumulate and incur charges, so do not set it up carelessly; otherwise, you might end up bankrupt due to automatic scheduling.

Completion

Example Result Image

Example Result Image

From now on, you can quickly track the current app crash issues on Slack; you can even discuss them directly there.

App Crash-Free Users Rate?

If you want to track the App Crash-Free Users Rate, you can refer to the next article “Crashlytics + Google Analytics Automatic Query for App Crash-Free Users Rate

Further Reading

If you have any questions or feedback, feel free to contact me.

===

本文中文版本

===

This article was first published in Traditional Chinese on Medium ➡️ View Here



This post is licensed under CC BY 4.0 by the author.

2021 Pinkoi Tech Career Talk - Decoding the High-Efficiency Engineering Team

The Past and Present of iOS Privacy and Convenience