Priority Web Sudden Crash

A worldwide live video streaming company with thousands of customers and hundreds of employees found, at the start of a regular working day, that its Priority application had suddenly stopped working. Imagine the pressure of having to find the cause and get the company’s core system up and running again as soon as possible! Thankfully, with AimBetter, the team could quickly identify the cause and reestablish operations in a matter of minutes.

In Priority Web environments, the database and application servers are typically separated, with one server functioning as the web application server – an IIS server – and another as the database server – an SQL Server in this case. When the application needs to retrieve or store data, the IIS server connects to the database using a database connection string, allowing the web application to query the database and view or update data within the database storage. The query is executed if the connection and authentication are successful, and the required results are retrieved and displayed in the web application.

In this episode, a single, apparently harmless process of retrieving data from the database brought down an entire web server, disrupting the operations of a company whose activities are completely web-based. Every minute of downtime is costly, so being able to recover rapidly from this critical situation was a lifesaver.

The AimBetter RTPS raised an alert the moment the problem began, which made it easy to trace the issue’s root cause. The first finding was that the web server had stopped because disk C, which is essential for system functionality, had run out of available space.
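As a rough illustration of the kind of check behind such an alert, here is a minimal Python sketch that reports free space on a volume and flags it when it falls below a threshold. The function name `free_space_report` and the 10% threshold are assumptions chosen for this example; this is not how AimBetter implements its alerts.

```python
import shutil


def free_space_report(path, min_free_fraction=0.10):
    """Return (free_bytes, total_bytes, is_low) for the volume holding `path`.

    `min_free_fraction` is a hypothetical threshold: the volume is flagged
    as low when less than this fraction of its capacity is free.
    """
    usage = shutil.disk_usage(path)
    is_low = usage.free < usage.total * min_free_fraction
    return usage.free, usage.total, is_low


if __name__ == "__main__":
    # "." is used so the sketch runs anywhere; on the web server in this
    # story the path of interest would be the system drive.
    free, total, low = free_space_report(".")
    print(f"free: {free / 1e9:.1f} GB of {total / 1e9:.1f} GB, low={low}")
```

Running a check like this on a schedule would only show the current state; it would not provide the stored history that made the investigation described here possible.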

Because AimBetter stores historical server data, it was possible to investigate what happened around the time the disk filled up. With all the historical data stored, the investigation can be carried out without connecting to the server or interfering with its operation.

A side-by-side graphical display of free disk space and I/O data transfer showed that a single process generated a peak in I/O and consumed more than 400 GB of free disk space!
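The same reasoning can be applied to any series of free-space samples: look for the interval with the steepest fall and then inspect what was running at that time. The sketch below assumes a list of (timestamp, free GB) pairs; the function name `largest_drop` and all of the numbers are illustrative and are not taken from the incident.

```python
from datetime import datetime


def largest_drop(samples):
    """Find the steepest fall between consecutive free-space samples.

    `samples` is a list of (timestamp, free_gb) pairs sorted by time.
    Returns (start, end, drop_gb), or None if there are fewer than two samples.
    """
    best = None
    for (t0, f0), (t1, f1) in zip(samples, samples[1:]):
        drop = f0 - f1
        if best is None or drop > best[2]:
            best = (t0, t1, drop)
    return best


# Illustrative values only (hypothetical, not measured data).
samples = [
    (datetime(2024, 1, 1, 23, 30), 450),
    (datetime(2024, 1, 1, 23, 45), 448),
    (datetime(2024, 1, 2, 0, 0), 300),
    (datetime(2024, 1, 2, 0, 15), 40),
    (datetime(2024, 1, 2, 0, 30), 2),
]

start, end, drop = largest_drop(samples)
print(f"steepest drop: {drop} GB between {start:%H:%M} and {end:%H:%M}")
```

The interval returned is the window in which to look for the heaviest writers, which is the next step in the investigation.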

Searching for the most-accessed file around midnight, during the sudden decrease in disk space, made it simple to identify the process involved in this occurrence. The following screenshot shows that process ID 7788 was heavily writing new temp files of the Priority application to disk C.

The first and immediate solution to this critical issue was to delete these temp files, which returned the server to normal operation in less than 20 minutes.
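A cleanup like this is safer when it is first run in a mode that only reports what it would delete. The following minimal sketch shows that pattern; the function name `remove_temp_files` and the `*.tmp` pattern are assumptions for illustration, since the article does not state how the Priority temp files are named or where they are stored.

```python
from pathlib import Path


def remove_temp_files(temp_dir, pattern="*.tmp", dry_run=True):
    """Delete files in `temp_dir` that match `pattern`; return bytes reclaimed.

    With dry_run=True (the default) nothing is deleted, and the return value
    is the number of bytes that a real run would free. The "*.tmp" pattern
    is a hypothetical placeholder.
    """
    freed = 0
    for path in Path(temp_dir).glob(pattern):
        if path.is_file():
            freed += path.stat().st_size
            if not dry_run:
                path.unlink()
    return freed
```

Calling `remove_temp_files(some_dir)` first, checking the reported size, and only then repeating the call with `dry_run=False` reduces the risk of deleting files that the application still needs.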

For a long-term resolution that would prevent a recurrence, AimBetter displayed the specific query run by the process involved, giving the company’s DBA the information needed to optimize it.

In the above screenshot, you can see that AimBetter displays the query run by process ID 7788. Beside the “Client Process” label, AimBetter shows the user running this query, “Cris****,” although the database user is Priority’s generic web application user, “tabula.” The same view displays the client process, a “Report Generation,” pinpointing this specific report generation as the root cause of the whole Priority application crash.

AimBetter also provided a recommendation for improvement, particularly suggesting changes to avoid Cartesian nested loops. Cartesian nested loops occur when rows from one table are combined with all rows from another table without a proper join condition, creating a massive number of combinations and overwhelming the system. With these insights and recommendations, the DBA was able to optimize the query, preventing future occurrences of the issue and ensuring the stability of the Priority application.

With this recommendation in hand and the ability to download the execution plan directly from AimBetter with a single click, the company’s DBA saved significant time and effort and could start improving the long-running query whose excess temp files had clogged disk C.
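To make the Cartesian effect described above concrete, the sketch below compares the row counts of a join with and without a join condition. It uses Python’s built-in SQLite as a stand-in for SQL Server, and the `orders` and `customers` tables are hypothetical; the actual Priority report query is not shown in this article.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real database server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
""")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(i, f"customer {i}") for i in range(100)],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(i, i % 100) for i in range(1000)],
)

# No join condition: every order is paired with every customer.
cartesian_rows = conn.execute(
    "SELECT COUNT(*) FROM orders, customers"
).fetchone()[0]

# Proper join condition: each order is paired only with its own customer.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM orders o "
    "JOIN customers c ON c.customer_id = o.customer_id"
).fetchone()[0]

print(cartesian_rows, joined_rows)  # 100000 1000
```

With only 1,000 and 100 rows the unconstrained query already produces 100 times more rows than the joined one. On production-sized tables the same mistake can produce intermediate results large enough to spill to disk.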

If you’re interested in learning more about how AimBetter can enhance your IT management by preventing disruptions and speeding up recovery from unexpected events, please contact us today!
