You know how it feels. After releasing a new version, a service starts behaving in an unexpected way, and it's up to you to save the day. But where to start?
Criteo processes 150 billion requests per day across more than 4,000 front-end servers. As members of the Criteo Performance team, we investigate critical issues in this kind of environment.
In this talk, you will follow our insights, mistakes and false leads during a real-world case.
We will cover every phase of the investigation, from early detection to the actual fix, and detail our tricks and tools along the way, including:
- Using metrics to detect and assess the issue;
- What you can (and cannot) get from a profiler to form a sound assumption;
- Digging into the CLR data structures with a decompiler, WinDBG and SOS to validate your assumption;
- Automating memory dump analysis with ClrMD to build your own tools when WinDBG falls short.
Christophe Nasarre has been developing and shipping software on Microsoft stacks for more than 25 years. Since 1996, he has also worked as a technical reviewer for MSPress, Addison-Wesley and other publishers on books such as "CLR via C#" and the latest editions of "Windows Internals".
He shares tools and insights on .NET and Windows development on his blog. Christophe has also presented technical sessions both internally at Microsoft and to ISVs and customers at public events.
Kevin Gosse has been using Microsoft .NET technologies for 15 years, across client, server, and mobile applications. He is currently employed at Criteo, where he works on scalability, debugging, and optimization issues.