1. 02 Apr, 2015 2 commits
  2. 26 Aug, 2014 1 commit
    • x86: Replace __get_cpu_var uses · 89cbc767
      Christoph Lameter authored
      __get_cpu_var() is used for multiple purposes in the kernel source. One of
      them is address calculation via the form &__get_cpu_var(x).  This calculates
      the address for the instance of the percpu variable of the current processor
      based on an offset.
      
      Other use cases are for storing and retrieving data from the current
      processor's percpu area.  __get_cpu_var() can be used as an lvalue when
      writing data or on the right side of an assignment.
      
      __get_cpu_var() is defined as:
      
      #define __get_cpu_var(var) (*this_cpu_ptr(&(var)))
      
      __get_cpu_var() only ever performs an address calculation. However, store
      and retrieve operations could use a segment prefix (or a global register
      on other platforms) to avoid the address calculation.
      
      this_cpu_write() and this_cpu_read() can directly take an offset into a
      percpu area and use optimized assembly code to read and write per cpu
      variables.
      
      This patch converts __get_cpu_var into either an explicit address
      calculation using this_cpu_ptr() or into a use of this_cpu operations
      that take the offset directly.  This avoids address calculations, and
      fewer registers are used in the generated code.
      
      Transformations done to __get_cpu_var()
      
      1. Determine the address of the percpu instance of the current processor.
      
      	DEFINE_PER_CPU(int, y);
      	int *x = &__get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(&y);
      
      2. Same as #1, but this time an array structure is involved.
      
      	DEFINE_PER_CPU(int, y[20]);
      	int *x = __get_cpu_var(y);
      
          Converts to
      
      	int *x = this_cpu_ptr(y);
      
      3. Retrieve the content of the current processors instance of a per cpu
      variable.
      
      	DEFINE_PER_CPU(int, y);
      	int x = __get_cpu_var(y);
      
         Converts to
      
      	int x = __this_cpu_read(y);
      
      4. Retrieve the content of a percpu struct
      
      	DEFINE_PER_CPU(struct mystruct, y);
      	struct mystruct x = __get_cpu_var(y);
      
         Converts to
      
      	memcpy(&x, this_cpu_ptr(&y), sizeof(x));
      
      5. Assignment to a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y) = x;
      
         Converts to
      
      	__this_cpu_write(y, x);
      
      6. Increment/Decrement etc of a per cpu variable
      
      	DEFINE_PER_CPU(int, y);
      	__get_cpu_var(y)++;
      
         Converts to
      
      	__this_cpu_inc(y);
      
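      As a combined illustration (a sketch only; the percpu variable and
      field names below are hypothetical, not taken from the patch), a
      percpu statistics counter before and after the conversion might look
      like this, assuming a context where the __this_cpu_* forms are safe
      (preemption already disabled):
      
      	struct pkt_stats { unsigned long bytes; };
      	DEFINE_PER_CPU(struct pkt_stats, pkt_stats);
      	DEFINE_PER_CPU(unsigned long, pkt_count);
      
      	/* Before: both accesses go through an address calculation. */
      	static void account_packet_old(unsigned int len)
      	{
      		struct pkt_stats *s = &__get_cpu_var(pkt_stats);
      
      		s->bytes += len;
      		__get_cpu_var(pkt_count)++;
      	}
      
      	/* After: this_cpu ops take the offset directly and can use a
      	 * segment prefix on x86, avoiding the address calculation. */
      	static void account_packet_new(unsigned int len)
      	{
      		__this_cpu_add(pkt_stats.bytes, len);
      		__this_cpu_inc(pkt_count);
      	}
      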
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86@kernel.org
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
  3. 02 Sep, 2013 1 commit
  4. 28 May, 2013 1 commit
  5. 21 Apr, 2013 1 commit
  6. 16 Feb, 2013 1 commit
  7. 06 Feb, 2013 5 commits
  8. 24 Oct, 2012 1 commit
  9. 05 Jul, 2012 1 commit
  10. 18 May, 2012 1 commit
  11. 09 May, 2012 1 commit
    • perf/x86-ibs: Precise event sampling with IBS for AMD CPUs · 450bbd49
      Robert Richter authored
      
      
      This patch adds support for precise event sampling with IBS. There are
      two counting modes, counting either cycles or micro-ops. If the
      corresponding performance counter events (hw events) are set up with
      the precise flag, the request is redirected to the ibs pmu:
      
       perf record -a -e cpu-cycles:p ...    # use ibs op counting cycle count
       perf record -a -e r076:p ...          # same as -e cpu-cycles:p
       perf record -a -e r0C1:p ...          # use ibs op counting micro-ops
      
      Each ibs sample contains a linear address that points to the
      instruction that caused the sample to trigger. With ibs we have
      skid 0, so ibs supports precise levels 1 and 2. Samples are marked
      with the PERF_EFLAGS_EXACT flag set. In rare cases IBS is unable to
      record the rip correctly; the PERF_EFLAGS_EXACT flag is then cleared
      and the rip is taken from pt_regs.
      
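      As a userspace illustration (a sketch, not part of the patch), the
      same request can be made programmatically by setting precise_ip in
      perf_event_attr before perf_event_open(2); samples serviced by IBS
      then carry PERF_RECORD_MISC_EXACT_IP in the sample header:
      
      	#include <linux/perf_event.h>
      	#include <sys/syscall.h>
      	#include <string.h>
      	#include <unistd.h>
      
      	static int open_precise_cycles(void)
      	{
      		struct perf_event_attr attr;
      
      		memset(&attr, 0, sizeof(attr));
      		attr.size = sizeof(attr);
      		attr.type = PERF_TYPE_HARDWARE;
      		attr.config = PERF_COUNT_HW_CPU_CYCLES;
      		attr.sample_period = 100000;
      		attr.precise_ip = 1;	/* like cpu-cycles:p above */
      
      		/* pid = -1, cpu = 0: per-CPU, as with perf record -a */
      		return syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
      	}
      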
      V2:
      * don't drop samples in precise level 2 if rip is invalid, instead
        support the PERF_EFLAGS_EXACT flag
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20120502103309.GP18810@erda.amd.com
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. 26 Apr, 2012 1 commit
  13. 16 Mar, 2012 1 commit
  14. 05 Mar, 2012 1 commit
  15. 02 Mar, 2012 1 commit
  16. 06 Dec, 2011 1 commit
    • perf, x86: Fix event scheduler for constraints with overlapping counters · bc1738f6
      Robert Richter authored
      
      
      The current x86 event scheduler fails to resolve scheduling problems
      of certain combinations of events and constraints. This happens if the
      counter mask of such an event is not a subset of any other counter
      mask of a constraint with an equal or higher weight, e.g. constraints
      of the AMD family 15h pmu:
      
                              counter mask    weight
      
       amd_f15_PMC30          0x09            2  <--- overlapping counters
       amd_f15_PMC20          0x07            3
       amd_f15_PMC53          0x38            3
      
      The scheduler then fails to find an existing solution. Here is an
      example:
      
       event code     counter         failure         possible solution
      
       0x02E          PMC[3,0]        0               3
       0x043          PMC[2:0]        1               0
       0x045          PMC[2:0]        2               1
       0x046          PMC[2:0]        FAIL            2
      
      The event scheduler may not select the correct counter in the first
      cycle because it needs to know which subsequent events will be
      scheduled. It may then fail to schedule the events.
      
      To solve this, we now save the scheduler state of events with
      overlapping counter constraints.  If we fail to schedule the events,
      we roll back to those states and try to use another free counter.
      
      Constraints with overlapping counters are marked with a newly
      introduced overlap flag. We set the overlap flag for such constraints
      to give the scheduler a hint about which events to select for counter
      rescheduling. The EVENT_CONSTRAINT_OVERLAP() macro can be used for this.
      
      Care must be taken, as the rescheduling algorithm is O(n!), which will
      dramatically increase the number of scheduling cycles on an
      over-committed system. The number of such EVENT_CONSTRAINT_OVERLAP()
      macros and their counter masks must be kept to a minimum. Thus, the
      current stack is limited to 2 states to limit the number of loops the
      algorithm takes in the worst case.
      
      On systems with no overlapping-counter constraints, this
      implementation does not increase the loop count compared to the
      previous algorithm.
      
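      A standalone sketch (simplified, not the kernel implementation) of
      the rollback idea, using the four events from the table above:
      alternative counters are retried only for events whose constraint
      carries the overlap flag, which is exactly what the saved states
      make possible:
      
      	#include <stdbool.h>
      	#include <stdio.h>
      
      	struct event { unsigned int idxmsk; bool overlap; };
      
      	/* 0x02E on PMC[3,0]; 0x043/0x045/0x046 on PMC[2:0] */
      	static const struct event ev[] = {
      		{ 0x09, true }, { 0x07, false }, { 0x07, false }, { 0x07, false },
      	};
      	static int assign[4];
      
      	static bool sched(unsigned int used, int e)
      	{
      		if (e == 4)
      			return true;
      		for (int c = 0; c < 6; c++) {
      			unsigned int bit = 1u << c;
      
      			if (!(ev[e].idxmsk & bit) || (used & bit))
      				continue;
      			assign[e] = c;
      			if (sched(used | bit, e + 1))
      				return true;
      			if (!ev[e].overlap)	/* no saved state: give up */
      				return false;
      			/* overlap: roll back, try the next free counter */
      		}
      		return false;
      	}
      
      	int main(void)
      	{
      		if (sched(0, 0))	/* prints the solution 3, 0, 1, 2 */
      			for (int e = 0; e < 4; e++)
      				printf("event %d -> PMC%d\n", e, assign[e]);
      		return 0;
      	}
      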
      V2:
      * Renamed redo -> overlap.
      * Reimplementation using perf scheduling helper functions.
      
      V3:
      * Added WARN_ON_ONCE() if out of save states.
      * Changed function interface of perf_sched_restore_state() to use bool
        as return value.
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1321616122-1533-3-git-send-email-robert.richter@amd.com
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  17. 10 Oct, 2011 1 commit
  18. 06 Oct, 2011 1 commit
  19. 27 Sep, 2011 1 commit
  20. 26 Sep, 2011 1 commit
  21. 14 Aug, 2011 1 commit
  22. 01 Jul, 2011 1 commit
    • perf, arch: Add generic NODE cache events · 89d6c0b5
      Peter Zijlstra authored
      
      
      Add a NODE level to the generic cache events, which is used to measure
      local vs. remote memory accesses. Like all other cache events, an
      ACCESS is HIT+MISS; if there is no way to distinguish between reads
      and writes, do reads only, etc.
      
      The below needs filling out for !x86 (which I filled out with
      unsupported events).
      
      I'm fairly sure ARM can leave it like that since it doesn't strike me as
      an architecture that even has NUMA support. SH might have something since
      it does appear to have some NUMA bits.
      
      Sparc64, PowerPC and MIPS certainly want a good look there since they
      clearly are NUMA capable.
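      
      As a userspace illustration (a sketch, not part of the patch), a NODE
      event is requested with the existing generic cache-event encoding
      (id | op << 8 | result << 16), using the new PERF_COUNT_HW_CACHE_NODE
      id this patch introduces:
      
      	#include <linux/perf_event.h>
      	#include <sys/syscall.h>
      	#include <sys/types.h>
      	#include <string.h>
      	#include <unistd.h>
      
      	static int open_node_read_misses(pid_t pid)
      	{
      		struct perf_event_attr attr;
      
      		memset(&attr, 0, sizeof(attr));
      		attr.size = sizeof(attr);
      		attr.type = PERF_TYPE_HW_CACHE;
      		attr.config = PERF_COUNT_HW_CACHE_NODE |
      			      (PERF_COUNT_HW_CACHE_OP_READ << 8) |
      			      (PERF_COUNT_HW_CACHE_RESULT_MISS << 16);
      
      		return syscall(__NR_perf_event_open, &attr, pid, -1, -1, 0);
      	}
      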
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: David Miller <davem@davemloft.net>
      Cc: Anton Blanchard <anton@samba.org>
      Cc: David Daney <ddaney@caviumnetworks.com>
      Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1303508226.4865.8.camel@laptop
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  23. 29 Apr, 2011 1 commit
  24. 19 Apr, 2011 2 commits
  25. 16 Feb, 2011 1 commit
    • perf, x86: Add support for AMD family 15h core counters · 4979d272
      Robert Richter authored
      
      
      This patch adds support for AMD family 15h core counters. There are
      major changes compared to family 10h. First, there is a new perfctr
      msr range for up to 6 counters. Northbridge counters are separate
      now. This patch only adds support for core counters. Second, certain
      events may only be scheduled on certain counters. For this we need to
      extend the event scheduling and constraints.
      
      We use cpu feature flags to calculate family 15h msr address offsets.
      This way we can later implement a faster ALTERNATIVE() version for
      this.
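      
      A sketch of the resulting MSR addressing (the helper name and the
      is_fam15h parameter are illustrative; the MSR base values are as
      later found in the kernel headers): family 15h interleaves the
      per-counter control and counter MSRs, so the per-index offset
      doubles:
      
      	#include <stdbool.h>
      
      	#define MSR_K7_EVNTSEL0		0xc0010000	/* fam10h, 4 counters */
      	#define MSR_F15H_PERF_CTL	0xc0010200	/* fam15h, 6 counters */
      
      	static unsigned int amd_eventsel_msr(unsigned int index, bool is_fam15h)
      	{
      		if (is_fam15h)
      			return MSR_F15H_PERF_CTL + (index << 1);
      		return MSR_K7_EVNTSEL0 + index;
      	}
      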
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110215135210.GB5874@erda.amd.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  26. 08 Dec, 2010 1 commit
  27. 10 Nov, 2010 1 commit
  28. 18 Oct, 2010 1 commit
  29. 03 Jul, 2010 1 commit
  30. 02 Apr, 2010 5 commits