/****************************************************************************
**
** Copyright (C) 2015 The Qt Company Ltd.
** Contact: http://www.qt.io/licensing
**
** This file is part of Qt Creator
**
**
** GNU Free Documentation License
**
** Alternatively, this file may be used under the terms of the GNU Free
** Documentation License version 1.3 as published by the Free Software
** Foundation and appearing in the file included in the packaging of this
** file.
**
**
****************************************************************************/

// **********************************************************************
// NOTE: the sections are not ordered by their logical order to avoid
// reshuffling the file each time the index order changes (i.e., often).
// Run the fixnavi.pl script to adjust the links to the index order.
// **********************************************************************

/*!
    \contentspage {Qt Creator Manual}
    \previouspage creator-clang-static-analyzer.html
    \page creator-cpu-usage-analyzer.html
    \nextpage creator-autotest.html

    \title Analyzing CPU Usage

    \commercial

    \QC is integrated with the Linux Perf tool (commercial only) that can be
    used to analyze the CPU usage of an application on embedded devices and, to
    a limited extent, on Linux desktop platforms. The CPU Usage Analyzer uses
    the Perf tool bundled with the Linux kernel to take periodic snapshots of
    the call chain of an application and visualizes them in a timeline view.

    \section1 Using the CPU Usage Analyzer

    The CPU Usage Analyzer needs to be able to locate debug symbols for the
    binaries involved. For debug builds, debug symbols are always generated.
    Edit the project build settings to generate debug symbols also for release
    builds.

    To use the CPU Usage Analyzer:

    \list 1
        \li To generate debug symbols also for applications compiled in release
            mode, select \uicontrol {Projects}, and then select
            \uicontrol Details next to \uicontrol {Build Steps} to view the
            build steps.

        \li Select the \uicontrol {Generate separate debug info} check box, and
            then select \uicontrol Yes to recompile the project.

        \li Select \uicontrol {Analyze > CPU Usage Analyzer} to profile the
            current application.

        \li Select the
            \inlineimage qtcreator-analyze-start-button.png
            (\uicontrol Start) button to start the application from the
            CPU Usage Analyzer.

            \note If data collection does not start automatically, select the
            \inlineimage qtcreator-analyzer-button.png
            (\uicontrol {Collect profile data}) button.

    \endlist

    When you start analyzing an application, the application is launched, and
    the CPU Usage Analyzer immediately begins to collect data. This is indicated
    by the time running in the \uicontrol Recorded field. However, as the data
    is passed through the Perf tool and an extra helper program bundled with
    \QC, and both buffer and process it on the fly, data may arrive in \QC
    several seconds after it was generated. An estimate for this delay is given
    in the \uicontrol {Processing delay} field.

    Data is collected until you select the
    \uicontrol {Stop collecting profile data} button or terminate the
    application.

    Select the \uicontrol {Stop collecting profile data} button to disable the
    automatic start of the data collection when an application is launched.
    Profile data will still be generated, but \QC will discard it until you
    select the button again.

    \section1 Specifying CPU Usage Analyzer Settings

    To specify global settings for the CPU Usage Analyzer, select
    \uicontrol Tools > \uicontrol Options > \uicontrol Analyzer >
    \uicontrol {CPU Usage Analyzer}. For each run configuration, you can also
    use specialized settings. Select \uicontrol Projects > \uicontrol Run, and
    then select \uicontrol Details next to
    \uicontrol {CPU Usage Analyzer Settings}.

    \section2 Selecting Call Graph Mode

    Select the command to invoke Perf in the \uicontrol {Call graph mode} field.
    The \uicontrol {Frame Pointer}, or \c fp, mode relies on frame pointers
    being available in the profiled application.

    The \uicontrol {Dwarf} mode also works without frame pointers, but
    generates significantly more data. Qt and most system libraries are
    compiled without frame pointers by default, so the frame pointer mode is
    only useful with customized systems.

    \section2 Setting Stack Snapshot Size

    In the dwarf mode, Perf takes periodic snapshots of the application stack,
    which are then analyzed and \e unwound by the CPU Usage Analyzer. Set the
    size of the stack snapshots in the \uicontrol {Stack snapshot size} field.
    Large stack snapshots result in a larger volume of data to be transferred
    and processed. Small stack snapshots may fail to capture the call chains of
    highly recursive applications or of applications that otherwise use the
    stack intensively.

    \section2 Setting Sampling Frequency

    Set the sampling frequency for Perf in the \uicontrol {Sampling frequency}
    field. High sampling frequencies result in more accurate data, at the
    expense of a higher overhead and a larger volume of profiling data being
    generated. The actual sampling frequency is determined by the Linux kernel
    on the target device, which takes the frequency set for Perf merely as
    advice. There may be a significant difference between the sampling frequency
    you request and the actual result.

    In general, if you configure the CPU Usage Analyzer to collect more data
    than it can transmit over the connection between the target and the host
    device, the application may get blocked while Perf is trying to send the
    data, and the processing delay may grow excessively. You should then lower
    the \uicontrol {Sampling frequency} or the \uicontrol {Stack snapshot size}.
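
    As a rough illustration of why these two settings matter, the following
    Python sketch multiplies them to estimate the volume of stack snapshot
    data per second. It is a back-of-the-envelope estimate only: the function
    name and the example values are hypothetical, and the estimate ignores
    per-sample headers, other event records, and the fact that the kernel
    treats the requested frequency merely as advice.

```python
def estimated_data_rate(sampling_frequency_hz, stack_snapshot_bytes):
    """Return a rough estimate, in bytes per second, of the stack snapshot
    data produced for one continuously active thread.

    Assumption: every sample carries one full stack snapshot and nothing else.
    """
    return sampling_frequency_hz * stack_snapshot_bytes


# Hypothetical settings: 250 samples per second, 4096-byte stack snapshots.
rate = estimated_data_rate(250, 4096)
print(f"{rate / 1024:.0f} KiB/s")
```

    If an estimate like this approaches the bandwidth of the connection to the
    device, lowering either setting reduces the data volume proportionally.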

    \section2 Adding Command Line Options For Perf

    You can specify additional command line options to be passed to Perf when
    recording data in the \uicontrol {Additional arguments} field. You may want
    to specify \c{--no-delay} or \c{--no-buffering} to reduce the processing delay.
    However, those options are not supported by all versions of Perf, and Perf
    may not start if an unsupported option is given.

    \section1 Analyzing Collected Data

    The \uicontrol Timeline view displays a graphical representation of CPU
    usage per thread and a condensed view of all recorded events.

    \image cpu-usage-analyzer.png "CPU Usage Analyzer"

    Each category in the timeline describes a thread in the application. Move
    the cursor over an event (6) on a row to see how long it takes and which
    function in the source it represents. To display the information only when
    an event is selected, disable the
    \uicontrol {View Event Information on Mouseover} button (5).

    The outline (10) summarizes the period for which data was collected. Drag
    the zoom range (8) or click the outline to move on the outline. You can
    also move between events by selecting the
    \uicontrol {Jump to Previous Event} (1) and \uicontrol {Jump to Next Event}
    (2) buttons.

    Select the \uicontrol {Show Zoom Slider} button (3) to open a slider that
    you can use to set the zoom level. You can also drag the zoom handles (9).
    To reset the default zoom level, right-click the timeline to open the
    context menu, and select \uicontrol {Reset Zoom}.

    \section2 Selecting Event Ranges

    You can select an event range (7) to view the time it represents or to zoom
    into a specific region of the trace. Select the \uicontrol {Select Range}
    button (4) to activate the selection tool. Then click in the timeline to
    specify the beginning of the event range. Drag the selection handle to
    define the end of the range.

    You can also use event ranges to measure delays between two subsequent
    events. Place a range between the end of the first event and the beginning
    of the second event. The \uicontrol Duration field displays the delay
    between the events in milliseconds.

    To zoom into an event range, double-click it.

    To remove an event range, close the \uicontrol Selection dialog.

    \section2 Understanding the Data

    Generally, events in the timeline view indicate how long a function call
    took. Move the mouse over them to see details. The details always include
    the address of the function, the approximate duration of the call, the ELF
    file the function resides in, the number of samples collected with this
    function call active, the total number of times this function was
    encountered in the thread, and the number of samples this function was
    encountered in at least once.

    For functions with debug information available, the details include the
    location in source code and the name of the function. You can click on such
    events to move the cursor in the code editor to the part of the code the
    event is associated with.

    As the Perf tool only provides periodic samples, the CPU Usage Analyzer
    cannot determine the exact time when a function was called or when it
    returned. You can, however, see exactly when a sample was taken in the
    second row of each thread. The CPU Usage Analyzer assumes that if the same
    function is present at the same place in the call chain in multiple
    consecutive samples, then this represents a single call to the respective
    function. This is, of course, a simplification. Also, there may be other
    functions being called between the samples taken, which do not show up in
    the profile data. However, statistically, the data is likely to show the
    functions that spend the most CPU time most prominently.
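
    The merging heuristic described above can be sketched in Python as
    follows. This is an illustration under stated assumptions, not the actual
    implementation in the CPU Usage Analyzer: the function name, the sample
    format (a time stamp plus a call chain listed from the outermost frame
    inwards), and the interpretation of "the same place in the call chain" as
    "same depth with identical callers" are all hypothetical.

```python
def merge_samples(samples):
    """Merge consecutive samples into (function, depth, first_ts, last_ts) events.

    samples: list of (timestamp, call_chain) tuples, ordered by time, where
    call_chain lists function names with the outermost frame first.
    """
    events = []
    open_events = []  # open_events[d] = [function, depth, first_ts, last_ts]
    for timestamp, chain in samples:
        # Count how many leading frames match the calls that are still open.
        common = 0
        while (common < len(chain) and common < len(open_events)
               and open_events[common][0] == chain[common]):
            common += 1
        # Calls below the common prefix are no longer on the stack: close them.
        events.extend(tuple(e) for e in open_events[common:])
        del open_events[common:]
        # Calls still on the stack are assumed to be the same, ongoing call.
        for event in open_events:
            event[3] = timestamp
        # Frames that were not present before start new calls.
        for depth in range(common, len(chain)):
            open_events.append([chain[depth], depth, timestamp, timestamp])
    events.extend(tuple(e) for e in open_events)
    # Order by start time, then by depth, as a timeline would show them.
    return sorted(events, key=lambda e: (e[2], e[1]))


# Hypothetical samples: f is seen twice in a row, then g, then f again.
samples = [
    (0, ["main", "f"]),
    (1, ["main", "f"]),
    (2, ["main", "g"]),
    (3, ["main", "f"]),
]
for event in merge_samples(samples):
    print(event)
```

    In this example, the first two samples are merged into a single call to
    \c f, while the later sample of \c f is treated as a separate call because
    \c g was observed in between. Any calls that started and returned between
    two samples do not appear at all.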

    If a function without debug information is encountered, further unwinding
    of the stack may fail. Unwinding will also fail if a QML or JavaScript
    function is encountered, and for some symbols implemented in assembler. If
    unwinding fails, only part of the call chain is displayed, and the
    surrounding functions may seem to be interrupted. This does not necessarily
    mean they were actually interrupted during the execution of the
    application, but only that they could not be found in the stacks where the
    unwinding failed.

    Kernel functions included in call chains are shown on the third row of each
    thread. All kernel functions are summarized and not differentiated any
    further, because most of the time kernel symbols cannot be resolved when the
    data is analyzed.

    The coloring of the events represents the actual sample rate for the
    specific thread they belong to, across their duration. The Linux kernel
    will only take a sample of a thread if the thread is active. At the same
    time, the kernel tries to maintain a constant overall sampling frequency.
    Thus, differences in the sampling frequency between different threads
    indicate that the thread with more samples taken is more likely to be the
    overall bottleneck, and the thread with fewer samples taken has likely
    spent time waiting for external events such as I/O or a mutex.

    \section1 Loading Perf Data Files

    You can load any \c perf.data files generated by recent versions of the
    Linux Perf tool and view them in \QC. Select \uicontrol Analyze >
    \uicontrol {Load Trace} to load a file. The CPU Usage Analyzer needs to know
    the context in which the data was recorded to find the debug symbols.
    Therefore, you have to specify the kit that the application was built with
    and the folder where the application executable is located.

    The Perf data files are generated by calling \c {perf record}. Make sure to
    generate call graphs when recording data by starting Perf with the
    \c {--call-graph} option. Also check that the necessary debug symbols are
    available to the CPU Usage Analyzer, either at a standard location
    (\c /usr/lib/debug or next to the binaries), or as part of the Qt package
    you are using.
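
    As a minimal sketch, the following Python helper assembles a
    \c {perf record} command line that generates call graphs. The helper name,
    the default values, and the application path are hypothetical. The
    \c {-F} (sampling frequency) and \c {-o} (output file) options and the
    \c {dwarf,<size>} form of \c {--call-graph} are assumed to be supported by
    the Perf version on your device, so check its \c {perf record}
    documentation before relying on them.

```python
import shlex


def perf_record_command(program, mode="dwarf", stack_size=4096,
                        frequency=250, output="perf.data"):
    """Build an argument list for recording call graphs with perf.

    Assumption: the installed perf accepts these options; the defaults are
    illustrative values, not recommendations.
    """
    # In dwarf mode the stack snapshot size is appended after a comma.
    call_graph = f"{mode},{stack_size}" if mode == "dwarf" else mode
    return ["perf", "record",
            "--call-graph", call_graph,
            "-F", str(frequency),
            "-o", output,
            "--", program]


# Print the command line for a hypothetical application binary.
print(shlex.join(perf_record_command("./myapp")))
```

    The resulting \c perf.data file can then be loaded as described above.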

    The CPU Usage Analyzer can read Perf data files generated in either frame
    pointer or dwarf mode. However, to generate the files correctly, numerous
    preconditions have to be met. All system images for the
    \l{http://doc.qt.io/QtForDeviceCreation/qtee-supported-platforms.html}
    {Qt for Device Creation reference devices}, except for Freescale iMX53 Quick
    Start Board and SILICA Architect Tibidabo, are correctly set up for
    profiling in the dwarf mode. For other devices, check whether Perf can read
    back its own data in a sensible way by checking the output of
    \c {perf report} or \c {perf script} for the recorded Perf data files.

    \section1 Troubleshooting

    The CPU Usage Analyzer might fail to record data for the following reasons:

    \list 1
        \li The connection between the target device and the host may not be
            fast enough to transfer the data produced by Perf. Try lowering
            the \uicontrol {Stack snapshot size} or
            \uicontrol {Sampling frequency} settings.
        \li Perf may be buffering the data forever, never sending it. Add
            \c {--no-delay} or \c {--no-buffering} to the
            \uicontrol {Additional arguments} field.
        \li Some versions of Perf will not start recording unless given a
            certain minimum sampling frequency. Try with a
            \uicontrol {Sampling frequency} of 1000.
        \li On some devices, for example Boundary Devices i.MX6 Boards, the
            Perf support is not very stable and the Linux kernel may randomly
            fail to record data after some time. Perf can use different types
            of events to trigger samples. You can get a list of available event
            types by running \c {perf list} on the device and add
            \c {-e <event type>} to the \uicontrol {Additional arguments} field
            to change the event type to be used. The choice of event type
            affects the performance and stability of the sampling.
            \c {-e cpu-clock} is a safe but relatively slow option as it
            does not use the hardware performance counters, but drives the
            sampling from software. After the sampling has failed, reboot the
            device, because the kernel may have disabled important parts of
            the performance counters system.
    \endlist

    Output from the helper program that processes the data is displayed in the
    \uicontrol {General Messages} output pane.
*/