January 28, 2018
Core dumps have been used for debugging since the earliest computers. In the context of this post, a core dump is not a dump of a CPU's registers, memory, cache, and so on, but something higher level: the state of my web application. The contents are different, but the philosophy is the same: have a snapshot of the machine or software at the time of failure so that it can be investigated for the possible cause.
At Grofers, we have a React application, and we use Redux to maintain the state of the application in one magnificent object. This object is the source of truth for most of the rendering and behavior of the application. Apart from this state, local storage and cookies hold bits of state that are mostly copies of subsets of the main Redux state.
At any given point in time, if you copy the Redux state from one browser to another, you can expect to pretty much replicate the application's behavior. But some bits in local storage or cookies might still produce different results on different machines. If you replicate local storage and cookies as well, you're pretty much done.
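To make the idea concrete, here is a minimal sketch of what capturing such a snapshot could look like. The names `AppSnapshot`, `StorageLike`, and `captureSnapshot` are hypothetical and not from our codebase. In a browser you would pass `store.getState`, `window.localStorage`, and `document.cookie`. Two caveats: cookies marked HttpOnly are not visible through `document.cookie`, and the JSON round-trip drops anything in the state that is not serializable.

```typescript
// Hypothetical shape of an application "core dump".
interface AppSnapshot {
  capturedAt: string;
  reduxState: unknown;
  localStorage: Record<string, string>;
  cookies: Record<string, string>;
}

// The small subset of the Web Storage API this sketch relies on.
interface StorageLike {
  readonly length: number;
  key(index: number): string | null;
  getItem(key: string): string | null;
}

// Turn a "name=value; other=value" cookie string into a plain object.
function parseCookies(cookieString: string): Record<string, string> {
  const cookies: Record<string, string> = {};
  for (const pair of cookieString.split(";")) {
    const trimmed = pair.trim();
    if (trimmed === "") continue;
    const eq = trimmed.indexOf("=");
    const name = eq === -1 ? trimmed : trimmed.slice(0, eq);
    const value = eq === -1 ? "" : trimmed.slice(eq + 1);
    cookies[name] = value;
  }
  return cookies;
}

// Copy every key/value pair out of a storage object.
function dumpStorage(storage: StorageLike): Record<string, string> {
  const out: Record<string, string> = {};
  for (let i = 0; i < storage.length; i++) {
    const key = storage.key(i);
    if (key === null) continue;
    const value = storage.getItem(key);
    if (value !== null) out[key] = value;
  }
  return out;
}

// Assumption: the Redux state is JSON-serializable, so a JSON round-trip
// gives a detached deep copy of it.
function captureSnapshot(
  getState: () => unknown,
  storage: StorageLike,
  cookieString: string
): AppSnapshot {
  return {
    capturedAt: new Date().toISOString(),
    reduxState: JSON.parse(JSON.stringify(getState())),
    localStorage: dumpStorage(storage),
    cookies: parseCookies(cookieString),
  };
}
```

The result is a plain object, so `JSON.stringify(snapshot)` is enough to hand it to someone else.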
I’ve pondered the idea of using a core dump of an application to debug it since the day I was introduced to it. Quite recently I had an opportunity, in the form of a bug that only happened on my machine. There was a UI element that should not show up for any user after a given set of actions. With the same set of actions, the element was showing up on my screen. Given that I knew how the application works, it would have been easy for me to find out exactly what was wrong, but being able to replicate the bug on others’ machines seemed like a better problem to solve first.
To quickly validate the method, I used a crude approach: I copied the state and cookies from my erroneous browser session and pasted them into my server’s response, which made sure the same state and cookies reached all of the clients. After replicating all of the state, I was able to replicate the bug. Once it was replicated, a quick inspection of the difference between the corrupt state and the working state helped me track down the cookie that was causing it. This cookie was used to decide whether or not to show the UI element in question. A recent change in the code did not consider the case of stale cookies, which is why I saw this undesired behavior.
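I did that inspection by hand, but comparing a broken dump against a working one is easy to automate. The sketch below is one way to do it, assuming both dumps are plain JSON values; `diffSnapshots` and the `hide_widget` cookie in the usage note are made-up names for illustration, not the actual cookie from our bug.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

interface Difference {
  path: string;
  broken: Json | undefined;
  working: Json | undefined;
}

function isPlainObject(v: Json | undefined): v is { [key: string]: Json } {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Walk both values together and report every path where they differ.
// Objects are compared key by key; arrays and primitives are compared
// as a whole by their serialized form.
function diffSnapshots(
  broken: Json | undefined,
  working: Json | undefined,
  path = ""
): Difference[] {
  if (isPlainObject(broken) && isPlainObject(working)) {
    // Union of keys from both sides, without duplicates.
    const keys = Object.keys(broken).concat(
      Object.keys(working).filter((k) => !(k in broken))
    );
    const diffs: Difference[] = [];
    for (const key of keys) {
      const childPath = path === "" ? key : `${path}.${key}`;
      diffs.push(...diffSnapshots(broken[key], working[key], childPath));
    }
    return diffs;
  }
  if (JSON.stringify(broken) === JSON.stringify(working)) return [];
  return [{ path, broken, working }];
}
```

Called with a broken dump containing `cookies: { hide_widget: "0", theme: "dark" }` and a working dump containing only `cookies: { theme: "dark" }`, it returns a single difference at the path `cookies.hide_widget`, which is the kind of pointer that led me to the stale cookie.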
The crude way worked, but ideally I’d like this to be easier. For engineers, a browser extension or something similar could capture all of this state and let me export the dump for someone else to import and replicate the application state. For a wider set of users, something as simple as a Report a Bug action on the website that achieves a similar result would be even better.
In a way, capturing a snapshot of the application is about capturing its context in enough detail to predict its behavior. Being able to replicate weird bugs through transferable context is an extremely powerful debugging tool. People have used such tools in all kinds of applications, like DTrace, or MDB for Node.js. Everyone puts monitoring and reporting in place for their application, but capturing the right context in a consumable manner is what is missing at times. When it is missing, all of that monitoring and reporting is rendered useless.
So how do we get there? How do we capture the context? What should we capture? How should we share it? There are services that integrate with your web app to capture console logs, network requests, and any other contextual data you may want for debugging, such as Instabug and LogRocket. It can also be as simple as a Report a Bug button that captures all the data an engineer might need to debug the application. Whatever you use, make sure it’s easy to change what data you capture and when you capture it. The last thing you want is to adopt another tool with the promise of easier debuggability but end up at zero because you cannot use the tool the way you want.
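As a rough illustration of "easy to change what you capture", a Report a Bug action could be built around a small registry of named collectors. This is only a sketch under my own assumptions: `BugReporter`, `register`, and `buildReport` are hypothetical names, not the API of Instabug, LogRocket, or any other tool.

```typescript
// A collector is any function that returns a piece of debugging context.
type Collector = () => unknown;

interface NamedCollector {
  name: string;
  collect: Collector;
}

class BugReporter {
  private collectors: NamedCollector[] = [];

  // Add a collector, replacing any existing one with the same name.
  register(name: string, collect: Collector): void {
    this.unregister(name);
    this.collectors.push({ name, collect });
  }

  // Stop capturing a piece of context.
  unregister(name: string): void {
    this.collectors = this.collectors.filter((c) => c.name !== name);
  }

  // Run every collector. A failing collector is recorded in the report
  // instead of aborting it, so one bad source does not lose the rest.
  buildReport(): Record<string, unknown> {
    const report: Record<string, unknown> = {};
    for (const { name, collect } of this.collectors) {
      try {
        report[name] = collect();
      } catch (err) {
        report[name] = { collectorError: String(err) };
      }
    }
    return report;
  }
}
```

Wiring it up would then be a matter of calls like `reporter.register("reduxState", () => store.getState())` at startup and sending `reporter.buildReport()` to wherever your bug reports go when the button is clicked. Changing what gets captured is a one-line change to the registrations.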