1. 16 May, 2018 1 commit
  2. 10 May, 2017 3 commits
  3. 25 Apr, 2017 1 commit
    • benchmarks/gem_wsim: Command submission workload simulator · 054eb1ab
      Tvrtko Ursulin authored
      A tool which emits batch buffers to engines with configurable
      sequences, durations, contexts, dependencies and userspace waits.
      Unfinished, but it shows promise, so sending it out for early feedback.
       * Load workload descriptors from files. (also -w)
       * Help text.
       * Calibration control if needed. (-t)
       * NORELOC | LUT to eb flags.
       * Added sample workload to wsim/workload1.
       * Multiple parallel different workloads (-w -w ...).
       * Multi-context workloads.
       * Variable (random) batch length.
       * Load balancing (round robin and queue depth estimation).
       * Workload delays and explicit sync steps.
       * Workload frequency (period) control.
       * Fixed queue-depth estimation by creating separate batches
         per engine when qd load balancing is on.
       * Dropped separate -s cmd line option. It can turn itself on
         automatically when needed.
       * Keep a single status page and lie about the write hazard
         as suggested by Chris.
       * Use batch_start_offset for controlling the batch duration.
       * Set status page object cache level. (Chris)
       * Moved workload description to a README.
       * Tidied example workloads.
       * Some other cleanups and refactorings.
       * Master and background workloads (-W / -w).
       * Single batch per step is enough even when balancing. (Chris)
       * Use hars_petruska_f54_1_random IGT functions and seed to zero
         at start. (Chris)
       * Use WC cache domain when WC mapping. (Chris)
       * Keep seqnos 64-bytes apart in the status page. (Chris)
       * Add workload throttling and queue-depth throttling commands.
       * Added two more workloads.
       * Merged RT balancer from Chris.
       * Merged NO_RELOC patch from Chris.
       * Added missing RT balancer to help text.
      TODO list:
       * Fence support.
       * Batch buffer caching (re-use pool).
       * Better error handling.
       * Less 1980's workload parsing.
       * More workloads.
       * Threads?
       * ... ?
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: "Rogozhkin, Dmitry V" <dmitry.v.rogozhkin@intel.com>
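      The commit above mentions two balancing modes: round robin and
      queue-depth estimation. A minimal sketch of what those selection
      policies look like, assuming a hypothetical per-engine seqno
      bookkeeping struct (this is an illustration, not gem_wsim's
      actual code):

      ```c
      #include <stdint.h>

      /* Hypothetical per-engine bookkeeping: the seqno of the last
       * batch submitted to the engine and the last seqno observed
       * complete in the status page.  Their difference estimates the
       * queue depth (batches still in flight). */
      struct engine {
          uint32_t submitted;
          uint32_t completed;
      };

      static unsigned qd_estimate(const struct engine *e)
      {
          return e->submitted - e->completed;
      }

      /* Round robin: ignore load, just cycle through the engines. */
      static unsigned balance_rr(unsigned nengines, unsigned *rr)
      {
          unsigned pick = *rr;
          *rr = (*rr + 1) % nengines;
          return pick;
      }

      /* Queue-depth balancing: pick the engine with the fewest
       * batches still in flight. */
      static unsigned balance_qd(const struct engine *engines,
                                 unsigned nengines)
      {
          unsigned best = 0, i;

          for (i = 1; i < nengines; i++)
              if (qd_estimate(&engines[i]) < qd_estimate(&engines[best]))
                  best = i;
          return best;
      }
      ```

      This also shows why the commit needed separate batches per engine
      for queue-depth balancing: the estimate only works if completions
      can be attributed to the engine that executed them.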
  4. 27 Sep, 2016 1 commit
  5. 28 Aug, 2016 1 commit
  6. 04 Aug, 2016 2 commits
  7. 20 Jun, 2016 1 commit
  8. 19 May, 2016 1 commit
    • benchmarks: Add gem_exec_fault · 99c015af
      Chris Wilson authored
      If we specify an unobtainable alignment (e.g. 63 bits) the kernel will
      evict the object from the GTT and then fail to rebind it. We can use
      this to measure how long it takes to move objects around in the GTT by
      running execbuf followed by the unbind. For small objects, this will be
      dominated by the nop execution time, but for larger objects it will be
      rate-limited by how fast we can rewrite the PTEs.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
  9. 09 Mar, 2016 1 commit
  10. 19 Dec, 2015 2 commits
    • benchmarks: Remove gem_wait · 39bad606
      Chris Wilson authored
      Superseded by gem_latency.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    • benchmark: Measure of latency of producers -> consumers, gem_latency · c9da0b52
      Chris Wilson authored
      The goal is to measure how long it takes for clients waiting on results
      to wake up after a buffer completes, and in doing so ensure scalability
      of the kernel to a large number of clients.
      We spawn a number of producers. Each producer submits a busyload to the
      system and records in the GPU the BCS timestamp of when the batch
      completes. Then each producer spawns a number of waiters, who wait upon
      the batch completion, read the current BCS timestamp register and
      compare it against the recorded value.
      By varying the number of producers and consumers, we can study different
      aspects of the design, in particular how many wakeups the kernel does
      for each interrupt (end of batch). The more wakeups on each batch, the
      longer it takes for any one client to finish.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
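      The latency computation the commit describes reduces to a tick
      delta between two reads of the BCS timestamp register, handled
      modulo the register width. A sketch of that arithmetic, where the
      register width and tick period are illustrative assumptions, not
      values taken from the benchmark:

      ```c
      #include <stdint.h>

      /* Assumed 32-bit timestamp register and an assumed tick period
       * in nanoseconds -- both hypothetical, for illustration only.
       * The producer records the register value at batch completion;
       * each waiter reads it again on wakeup. */
      #define TS_MASK     0xffffffffu
      #define NS_PER_TICK 80

      static uint64_t wakeup_latency_ns(uint32_t completed_ts,
                                        uint32_t woken_ts)
      {
          /* Unsigned subtraction handles the register wrapping. */
          uint32_t ticks = (woken_ts - completed_ts) & TS_MASK;

          return (uint64_t)ticks * NS_PER_TICK;
      }
      ```

      Reading the timestamp from the GPU itself, rather than a CPU
      clock, is what lets the benchmark attribute the delay to the
      wakeup path rather than to clock skew between clients.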
  11. 22 Nov, 2015 1 commit
  12. 30 Oct, 2015 1 commit
    • benchmark/gem_wait: poc for benchmarking i915_wait_request overhead · 9024a72d
      Chris Wilson authored
      One scenario under recent discussion is that of having a thundering herd
      in i915_wait_request - where the overhead of waking up every waiter for
      every batchbuffer was significantly impacting customer throughput. This
      benchmark tries to replicate something to that effect by having a large
      number of consumers generating a busy load (a large copy followed by
      lots of small copies to generate lots of interrupts) and tries to wait
      upon all the consumers concurrently (to reproduce the thundering herd
      effect). To measure the overhead, we have a bunch of CPU hogs - less
      kernel overhead in waiting should allow more CPU throughput.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
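      A userspace analogue of the thundering herd the commit describes
      can be sketched with a condition variable: many waiters block on
      one event and a single broadcast (the "interrupt") wakes all of
      them, so the wakeup cost scales with the number of waiters. This
      is an illustration of the effect, not the benchmark's code:

      ```c
      #include <pthread.h>

      struct herd {
          pthread_mutex_t lock;
          pthread_cond_t cond;
          int done;    /* the "batch completed" flag */
          int woken;   /* how many waiters observed completion */
      };

      static void *waiter(void *arg)
      {
          struct herd *h = arg;

          pthread_mutex_lock(&h->lock);
          while (!h->done)
              pthread_cond_wait(&h->cond, &h->lock);
          h->woken++;
          pthread_mutex_unlock(&h->lock);
          return NULL;
      }

      /* Spawn nwaiters threads, wake them all with one broadcast, and
       * return how many woke up; each "batch" pays one wakeup per
       * waiter, which is exactly the overhead the benchmark measures. */
      static int run_herd(int nwaiters)
      {
          struct herd h = {
              PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0
          };
          pthread_t threads[64];
          int i;

          for (i = 0; i < nwaiters && i < 64; i++)
              pthread_create(&threads[i], NULL, waiter, &h);

          pthread_mutex_lock(&h.lock);
          h.done = 1;                      /* the "batch" completes... */
          pthread_cond_broadcast(&h.cond); /* ...and every waiter wakes */
          pthread_mutex_unlock(&h.lock);

          for (i = 0; i < nwaiters && i < 64; i++)
              pthread_join(threads[i], NULL);
          return h.woken;
      }
      ```

      In the kernel case the fix being explored was to avoid waking
      every waiter per interrupt; here, the equivalent would be
      pthread_cond_signal() chained from waiter to waiter.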
  13. 06 Oct, 2015 1 commit
  14. 11 Aug, 2015 1 commit
  15. 10 Aug, 2015 1 commit
  16. 09 Aug, 2015 1 commit
    • benchmarks: Record and replay calls to EXECBUFFER2 · 0393e728
      Chris Wilson authored
      This slightly idealises the behaviour of clients with the aim of
      measuring the kernel overhead of different workloads. This test focuses
      on the cost of relocating batchbuffers.
      A trace file is generated with an LD_PRELOAD intercept around
      execbuffer, which we can then replay at our leisure. The replay
      replaces the real buffers with a set of empty ones, so the only thing
      the kernel has to do is parse the relocations, but without a real
      workload we lose the impact of having to rewrite active buffers.
      Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
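      The LD_PRELOAD interception technique the commit relies on is
      the standard dlsym(RTLD_NEXT) shim around ioctl(). A minimal
      sketch of that pattern (the real tool also serializes the
      execbuffer arguments to the trace file; here we only log the
      request number):

      ```c
      /* Build as a shim:  cc -shared -fPIC -o trace.so trace.c -ldl
       * Use:              LD_PRELOAD=./trace.so <client> */
      #define _GNU_SOURCE
      #include <dlfcn.h>
      #include <stdarg.h>
      #include <stdio.h>

      int ioctl(int fd, unsigned long request, ...)
      {
          static int (*real_ioctl)(int, unsigned long, ...);
          void *argp;
          va_list ap;

          /* Resolve the next (real) ioctl in the lookup chain once. */
          if (!real_ioctl)
              real_ioctl = (int (*)(int, unsigned long, ...))
                      dlsym(RTLD_NEXT, "ioctl");

          va_start(ap, request);
          argp = va_arg(ap, void *);
          va_end(ap);

          /* The real intercept would match
           * DRM_IOCTL_I915_GEM_EXECBUFFER2 here and dump *argp to the
           * trace file; this sketch just logs every request. */
          fprintf(stderr, "ioctl(%d, 0x%lx)\n", fd, request);

          return real_ioctl(fd, request, argp);
      }
      ```

      Because the shim forwards every call unchanged, the traced client
      behaves normally while the trace accumulates on stderr (or, in
      the real tool, in the trace file).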
  17. 24 Jul, 2015 2 commits
  18. 23 Jul, 2015 3 commits
  19. 22 Jul, 2015 1 commit
  20. 25 Apr, 2014 1 commit
    • tests/gem_userptr_benchmark: Benchmarking userptr surfaces and impact · d3057d7a
      Tvrtko Ursulin authored
      This adds a small benchmark for the new userptr functionality.
      Apart from basic surface creation and destruction, the impact of
      having userptr surfaces in the process address space is also tested.
      The reason is the impact of MMU notifiers on common address-space
      operations such as munmap(), which is per-process.
        * Moved to benchmarks.
        * Added pointer read/write tests.
        * Changed output to say iterations per second instead of
          operations per second.
        * Multiply result by batch size for multi-create* tests
          for a more comparable number with create-destroy test.
        * Use ALIGN macro.
        * Catchup with big lib/ reorganization.
        * Removed unused code and one global variable.
        * Fixed up some warnings.
        * Fixed feature test, does not matter here but makes it
          consistent with gem_userptr_blits and clearer.
      Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Brad Volkin <bradley.d.volkin@intel.com>
      Reviewed-by: Brad Volkin <bradley.d.volkin@intel.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
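      The measurement the commit describes - timing an address-space
      operation that MMU notifiers can slow down - has this basic
      skeleton. This sketch only times munmap() of a plain anonymous
      mapping; the benchmark does the same with userptr objects present
      in the address space so the notifier overhead shows up in the
      delta:

      ```c
      #define _GNU_SOURCE
      #include <stdint.h>
      #include <sys/mman.h>
      #include <time.h>

      /* Time a single munmap() of an anonymous mapping, in
       * nanoseconds; returns -1 if the mapping could not be created. */
      static int64_t time_munmap_ns(size_t len)
      {
          struct timespec t0, t1;
          void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

          if (p == MAP_FAILED)
              return -1;

          clock_gettime(CLOCK_MONOTONIC, &t0);
          munmap(p, len);
          clock_gettime(CLOCK_MONOTONIC, &t1);

          return (t1.tv_sec - t0.tv_sec) * 1000000000LL +
                 (t1.tv_nsec - t0.tv_nsec);
      }
      ```

      Comparing this number with and without userptr surfaces alive in
      the process is what exposes the per-process MMU-notifier cost.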
  21. 24 Apr, 2014 1 commit