Commit 054eb1ab authored by Tvrtko Ursulin

benchmarks/gem_wsim: Command submission workload simulator

Tool which emits batch buffers to engines with configurable
sequences, durations, contexts, dependencies and userspace waits.

Unfinished but shows promise so sending out for early feedback.

v2:
 * Load workload descriptors from files. (also -w)
 * Help text.
 * Calibration control if needed. (-t)
 * NORELOC | LUT to eb flags.
 * Added sample workload to wsim/workload1.

v3:
 * Multiple parallel different workloads (-w -w ...).
 * Multi-context workloads.
 * Variable (random) batch length.
 * Load balancing (round robin and queue depth estimation).
 * Workloads delays and explicit sync steps.
 * Workload frequency (period) control.

v4:
 * Fixed queue-depth estimation by creating separate batches
   per engine when qd load balancing is on.
 * Dropped separate -s cmd line option. It can turn itself on
   automatically when needed.
 * Keep a single status page and lie about the write hazard
   as suggested by Chris.
 * Use batch_start_offset for controlling the batch duration.
   (Chris)
 * Set status page object cache level. (Chris)
 * Moved workload description to a README.
 * Tidied example workloads.
 * Some other cleanups and refactorings.

v5:
 * Master and background workloads (-W / -w).
 * Single batch per step is enough even when balancing. (Chris)
 * Use hars_petruska_f54_1_random IGT functions and seed to zero
   at start. (Chris)
 * Use WC cache domain when WC mapping. (Chris)
 * Keep seqnos 64-bytes apart in the status page. (Chris)
 * Add workload throttling and queue-depth throttling commands.
   (Chris)

v6:
 * Added two more workloads.
 * Merged RT balancer from Chris.

v7:
 * Merged NO_RELOC patch from Chris.
 * Added missing RT balancer to help text.

TODO list:

 * Fence support.
 * Batch buffer caching (re-use pool).
 * Better error handling.
 * Less 1980's workload parsing.
 * More workloads.
 * Threads?
 * ... ?
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
parent cf6f2c9b
@@ -14,6 +14,7 @@ benchmarks_prog_list = \
 	gem_prw \
 	gem_set_domain \
 	gem_syslatency \
+	gem_wsim \
 	kms_vblank \
 	prime_lookup \
 	vgem_mmap \
Workload descriptor format
==========================

ctx.engine.duration_us.dependency.wait,...
<uint>.<str>.<uint>[-<uint>].<int <= 0>.<0|1>,...
d|p|s|t|q.<uint>,...

For duration a range can be given from which a random value will be picked
before every submit. Since this and seqno management require CPU access to
objects, care needs to be taken to ensure the submit queue is deep enough
that these operations do not affect the execution speed unless that is
desired.
Additional workload steps are also supported:

 'd' - Adds a delay (in microseconds).
 'p' - Adds a delay relative to the start of the previous loop so that each
       loop starts execution with a given period.
 's' - Synchronises the pipeline to a batch relative to the step.
 't' - Throttles every n batches.
 'q' - Throttles to a maximum queue depth of n.

Engine ids: RCS, BCS, VCS, VCS1, VCS2, VECS
Example (leading spaces must not be present in the actual file):
----------------------------------------------------------------
1.VCS1.3000.0.1
1.RCS.500-1000.-1.0
1.RCS.3700.0.0
1.RCS.1000.-2.0
1.VCS2.2300.-2.0
1.RCS.4700.-1.0
1.VCS2.600.-1.1
p.16000
The above workload described in plain language works like this:
1. A batch is sent to the VCS1 engine which will be executing for 3ms on the
GPU and userspace will wait until it is finished before proceeding.
2-4. Now three batches are sent to RCS with durations of 0.5-1ms (random
duration range), 3.7ms and 1ms respectively. The first batch has a data
dependency on the preceding VCS1 batch, and the last of the group depends
on the first from the group.
5. Now a 2.3ms batch is sent to VCS2, with a data dependency on the 3.7ms
RCS batch.
6. This is followed by a 4.7ms RCS batch with a data dependency on the 2.3ms
VCS2 batch.
7. Then a 0.6ms VCS2 batch is sent depending on the previous RCS one. In the
   same step the tool is told to wait until the batch completes before
   proceeding.
8. Finally the tool is told to wait long enough to ensure the next iteration
starts 16ms after the previous one has started.
When workload descriptors are provided on the command line, commas must be used
instead of new lines.
1.VCS1.3000.0.1
1.RCS.1000.-1.0
1.RCS.3700.0.0
1.RCS.1000.-2.0
1.VCS2.2300.-2.0
1.RCS.4700.-1.0
1.VCS2.600.-1.1
0.VECS.1400-1500.0.0
0.RCS.1000-1500.-1.0
s.-2
2.VCS2.50-350.0.1
1.VCS1.1300-1400.0.1
0.VECS.1400-1500.0.0
0.RCS.100-300.-1.1
2.RCS.1300-1500.0.0
2.VCS2.100-300.-1.1
1.VCS1.900-1400.0.1
1.VCS.3000.0.1
1.RCS.1000.-1.0
1.RCS.3700.0.0
1.RCS.1000.-2.0
1.VCS.2300.-2.0
1.RCS.4700.-1.0
1.VCS.600.-1.1
0.VECS.1400-1500.0.0
0.RCS.1000-1500.-1.0
s.-2
1.VCS.50-350.0.1
1.VCS.1300-1400.0.1
0.VECS.1400-1500.0.0
0.RCS.100-300.-1.1
1.RCS.1300-1500.0.0
1.VCS.100-300.-1.1
1.VCS.900-1400.0.1
t.5
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
0.VCS1.500-2000.0.0
q.5
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
0.VCS.500-2000.0.0
@@ -1557,6 +1557,32 @@ bool __igt_fork(void)
 }
 
+/**
+ * igt_child_done:
+ * @pid: PID of the child that has exited
+ *
+ * Lets the IGT core know that one of the children has exited.
+ */
+void igt_child_done(pid_t pid)
+{
+	int i = 0;
+	int found = -1;
+
+	igt_assert(num_test_children > 0);
+
+	for (i = 0; i < num_test_children; i++) {
+		if (pid == test_children[i]) {
+			found = i;
+			break;
+		}
+	}
+
+	igt_assert(found >= 0);
+
+	num_test_children--;
+	for (i = found; i < num_test_children; i++)
+		test_children[i] = test_children[i + 1];
+}
+
 /**
  * igt_waitchildren:
  *
@@ -688,6 +688,7 @@ bool __igt_fork(void);
 #define igt_fork(child, num_children) \
 	for (int child = 0; child < (num_children); child++) \
 		for (; __igt_fork(); exit(0))
+void igt_child_done(pid_t pid);
 void igt_waitchildren(void);
 void igt_waitchildren_timeout(int seconds, const char *reason);