What is it?

ETWController is a tool for troubleshooting Windows performance issues. It can profile

  • Your local machine
  • A remote machine
  • Both machines simultaneously

With a profiling tool it is normally very hard to deduce from the profiling data alone which steps the user performed and where the issue that annoyed the user occurred. To solve that once and for all, ETWController records a full screenshot of all attached monitors for every mouse click and Enter key press, and additionally every two seconds. All user actions like

  • Mouse Click
  • Keyboard Press
  • Mouse Move

can be recorded along with the screenshots into the profiling data. This is very helpful for seeing what the user did and how the UI (if there is one) reacted. To make things even better, on Windows 10 machines you need only this tool and no additional downloads, because WPR.exe, which enables easy system-wide ETW profiling with ETWController, is part of Windows 10. On other machines you need to install the Windows Performance Toolkit, which brings along WPR.exe (the recording tool) and WPA (the profiling data viewer). It is part of the Windows 10 SDK, where you need to install only the Windows Performance Toolkit part (40 MB).

image

Usage Scenarios

Record data on the machine(s) where the user has a problem

  • You can xcopy deploy ETWController or start it from a remote network share if you are not allowed to install any software on client machines
  • You can customize the settings for a specific problem in ETWController.exe.config so the user can start the tool easily and you record only the data you are interested in.
    1. User starts ETWController
    2. Select Trace Collection tab
    3. Press Start
    4. Reproduce the issue
    5. Press Stop
    6. The user sends the collected .zip/.7z file back to you.
  • You not only get a nice compressed file but also (if you enable it) many screenshots that document the actions the user performed
    • Do not ask me if you have legal issues with that. Work something out with your clients.
    • One possible way to avoid disclosing confidential information is to analyze the data on the client machine, so no confidential data ever leaves the user's machine.

Distributed Profiling

Most software comes with a backend on one or more server machines. It can be challenging to analyze issues in a distributed setup. If a problem shows up you need to check if

  • The client machine has an issue
  • The network in between is to blame
  • Or the backend(s) have a problem

To support that, you can start ETWController on both machines and drive profiling from the machine where you initiate the server requests. ETWController can send the keyboard and mouse events to the remote machine via a dedicated socket, so you can synchronize the ETW traces from both machines in time without needing exactly synchronized clocks on both machines. Each input event gets a unique ID, so for e.g. mouse click 145 you can find the exact point in time on both the client and the remote machine where a hang started.
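Because the same event IDs show up in both traces, the offset between the two clocks can be estimated from them. The following Python sketch is not part of ETWController. It assumes you have already exported the input events of each trace into a dictionary that maps event ID to a timestamp in seconds; the export step, the function names and the sample numbers are made up for illustration.

```python
from statistics import median


def estimate_clock_offset(local_events, remote_events):
    """Return the median (remote - local) timestamp difference in seconds
    over all event IDs that were logged on both machines."""
    shared_ids = local_events.keys() & remote_events.keys()
    if not shared_ids:
        raise ValueError("the two traces have no event IDs in common")
    return median(remote_events[i] - local_events[i] for i in shared_ids)


def to_remote_time(local_timestamp, offset):
    """Translate a timestamp of the local trace into the remote trace's time."""
    return local_timestamp + offset


# Hypothetical sample data: event ID -> seconds since the start of each trace.
local = {144: 10.0, 145: 12.5, 146: 15.0}
remote = {144: 13.0, 145: 15.5, 146: 18.25}

offset = estimate_clock_offset(local, remote)
click_145_on_remote = to_remote_time(local[145], offset)
```

The median is used instead of the mean so that a single event that was delayed on the network does not skew the estimate.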

If neither the client nor the server machine shows anything unusual, you can take the next step and use Wireshark to record the network data. In Wireshark you can filter for the ETWController input socket to find e.g. mouse click 145 and check whether unusually high packet round trip times, TCP retransmissions or something equally bad happened on the network at that time. If so, you need to track down which switch was overloaded or had not been rebooted for a long time, and present the customer with definitive proof that their network infrastructure has an issue.

Script Data Recording

ETWController can be the driver or just another data source in your own data collection strategy. If you use the UI, you can configure your own profiling start/stop scripts as you like. ETWController will be happy to execute them for you. You can record the generated mouse and keyboard events with your own tracing script along with the provided screenshots and HTML report if you wish. You can also configure ETWController from the command line to fully automate the data collection without any user-visible UI. It is up to you and your use case how you want to use it.

 

Quickstart

  • Install the Windows Performance Toolkit from the latest Windows 10 SDK
  • Unzip ETWController (XCopy Deployable)
  • Start ETWController.exe

image

The default settings should be ok for now. If not you can change quite a lot of things in the Configuration dialog.

  • Click on Trace Collection Tab.

image

  • Press Start, then press the Show Output button to see the script output.

image

  • Reproduce some performance issue.
  • Press Stop

image

 

The trace state should now flash blue while the data is collected. The first stop can take quite some time, because the pdbs for all .NET native images must be created; on machines with large .NET projects this can take up to 15 minutes. Later stop calls are much faster.

image

  • Press Open Trace

This will launch WPA with my simplified WPA profile from my blog, which makes ETW analysis much easier, especially for managed code.

While the trace loads, you can examine the captured screenshots, which the xxwpr.cmd script copies next to the ETL file with the extension .Screenshots.

There is a file named Report.html which you can open. In this report you can configure the size of the displayed screenshots to get a better overview of where the interesting things happened.

image

For each mouse click event a screenshot is taken, named Screenshot_dd, where dd is the number of mouse clicks since trace start. To make it easier to see where the user clicked, a red square is drawn around the mouse coordinates (see picture below). A second screenshot, named Screenshot_ddAfter500ms, is taken 500 ms later.

That should make it easy to see whether the UI reacted to a click event within a reasonable time.
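If a trace contains many clicks, pairing the two screenshots of each click by file name can save some scrolling. The Python sketch below is not part of ETWController. It relies only on the Screenshot_dd / Screenshot_ddAfter500ms naming described above; the .jpg extension in the sample names and the helper names are assumptions for illustration.

```python
import re
from pathlib import Path

# Matches the file name without extension, e.g. "Screenshot_12" or
# "Screenshot_12After500ms". The extension itself is not checked.
SCREENSHOT_NAME = re.compile(r"^Screenshot_(\d+)(After500ms)?$")


def pair_screenshots(file_names):
    """Group screenshot files by click number.

    Returns {click_number: {"at_click": name, "after": name}}, sorted by
    click number. Files that do not follow the naming scheme are ignored.
    """
    pairs = {}
    for name in file_names:
        match = SCREENSHOT_NAME.match(Path(name).stem)
        if match is None:
            continue
        click = int(match.group(1))
        slot = "after" if match.group(2) else "at_click"
        pairs.setdefault(click, {})[slot] = name
    return dict(sorted(pairs.items()))


def clicks_missing_followup(pairs):
    """Return the click numbers that have no screenshot taken 500 ms later."""
    return [click for click, shots in pairs.items() if "after" not in shots]


# Hypothetical directory listing of a .Screenshots location.
names = ["Screenshot_2.jpg", "Screenshot_1.jpg",
         "Screenshot_1After500ms.jpg", "Report.html"]
pairs = pair_screenshots(names)
```

To run it against real data you could pass something like `[p.name for p in Path(folder).iterdir()]`, where `folder` is wherever your screenshots ended up.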

image

In the meantime the trace file should have opened in WPA of the Windows Performance Toolkit. There you see our mouse, keyboard and screenshot events nicely lined up in the HookEvents graphs. Guided by the screenshots, it is now much easier to drill down to the interesting time region where the user experienced a performance glitch.

image

From there it is the usual WPA analysis with more details than ever.

 

Distributed Profiling Background

In a distributed world, performance troubleshooting has become much harder. Now we have at least two computers and the Internet involved. Sane developers blame the network for every performance problem that involves a remote server, which may or may not be true. It is nearly impossible to correlate user input on one computer (let's say a key press) with the associated network traffic and the subsequent actions on a remote server, since most of the time the network traffic is encrypted or too hard to follow because of the huge amount of data transferred.

This is where ETWController comes into play. The name stems from Event Tracing for Windows (ETW), the most detailed and fastest profiling facility on Windows. If you do not know it, you have one more reason to learn how to make use of it. As the name suggests, ETWController controls profiling on two machines simultaneously. It can start and stop ETW tracing on the local and the remote machine at the same time, making it an ideal buddy for capturing data on one or more systems.

It allows you to simultaneously capture and correlate profiling information from the client, the network and the server. When the user presses a key or a mouse button, the event can be sent to the remote server over a dedicated port. ETWController comes with a built-in keyboard and mouse logger, which writes the captured keyboard and mouse events to ETW locally and sends them in plain text over a configurable port to the remote server, where another instance of ETWController receives the user events and logs them as ETW events as well.

Additionally there is a "Slow" button, which can be assigned to a mouse or keyboard hotkey and which logs a user-configurable message to the local computer, the network stream and the remote machine. With this hotkey you can create marker events whenever you experience sluggish behavior or other interesting incidents. This makes it very easy to identify the exact point in time in the network stream where a slowdown happened: with e.g. Wireshark you can look directly at the plain text data and search for your user-defined message in a multi-GB network trace.
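Since the marker travels as plain text, you can even locate it without opening the capture in a GUI. The Python sketch below is not part of ETWController and is no replacement for Wireshark; it only scans a file for a byte string in chunks, so that a multi-GB capture does not have to fit into memory. It assumes the capture was saved uncompressed, that the marker message fits into a single packet, and that it is sent in an ASCII-compatible encoding. The marker text "SLOW-42" in the usage note is a made-up example.

```python
def find_marker_offsets(path, marker, chunk_size=1 << 20):
    """Return the byte offsets at which `marker` (bytes) occurs in the file.

    The file is read in chunks. The last len(marker) - 1 bytes of each
    window are carried over to the next one, so a marker that straddles a
    chunk boundary is still found, and no match is reported twice.
    """
    if not marker:
        raise ValueError("marker must not be empty")
    overlap = len(marker) - 1
    offsets = []
    tail = b""
    window_start = 0  # file offset of the first byte of the current window
    with open(path, "rb") as capture:
        while True:
            chunk = capture.read(chunk_size)
            if not chunk:
                break
            window = tail + chunk
            index = window.find(marker)
            while index != -1:
                offsets.append(window_start + index)
                index = window.find(marker, index + 1)
            tail = window[-overlap:] if overlap else b""
            window_start += len(window) - len(tail)
    return offsets
```

A call such as `find_marker_offsets("capture.pcap", b"SLOW-42")` returns file offsets, not packet numbers. They tell you whether and roughly where the marker occurs; for the timing details you still switch to Wireshark.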

Sure, you can wade through gigabytes of network traces if you wish, but I want to get out of the network trace analyzer as fast as possible. These marker events on a dedicated port make searching for and marking strategic events trivial. Every logged event gets a unique number, which allows you to search for specific mouse/keyboard events as well. The data flow is shown below:

image

Currently ETWController does not start or stop Wireshark captures. You need to start network capturing on the network devices yourself. But with the help of the keyboard events and the Slow/Fast marker events you can now correlate both ETW traces and the network trace without any trouble. That makes it much easier to find the point of interest in any captured ETW and/or network stream. You can e.g. watch your network load during an integration test and insert marker events at interesting spikes to check whether the network performance dropped due to network issues or whether the server or the client was busy doing something else (e.g. the virus scanner was active or your application performed a garbage collection).

Last edited May 28 at 6:40 PM by Alois, version 19