  1. Sep 24, 2021
    • intel_idle: enable interrupts before C1 on Xeons · c227233a
      Artem Bityutskiy authored
      
      Enable local interrupts before requesting C1 on the last two generations
      of Intel Xeon platforms: Sky Lake, Cascade Lake, Cooper Lake, Ice Lake.
      This decreases average C1 interrupt latency by about 5-10%, as measured
      with the 'wult' tool.
      
      The '->enter()' function of the driver enters C-states with local
      interrupts disabled by executing the 'monitor' and 'mwait' pair of
      instructions. If an interrupt happens, the CPU exits the C-state and
      continues executing instructions after 'mwait'. It does not jump to
      the interrupt handler, because local interrupts are disabled. The
      cpuidle subsystem enables interrupts a bit later, after doing some
      housekeeping.
      
      With this patch, we enable local interrupts before requesting C1. In
      this case, if the CPU wakes up because of an interrupt, it will jump
      to the interrupt handler right away. The cpuidle housekeeping will be
      done after the pending interrupt(s) are handled.
      
      Enabling interrupts before entering a C-state has measurable impact
      for faster C-states, like C1. Deeper, but slower C-states like C6 do
      not really benefit from this sort of change, because their latency is
      a lot higher compared to the delay added by cpuidle housekeeping.
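
      A rough way to see why is sketched below. It models the wake-up delay
      as the C-state exit latency plus a fixed housekeeping delay and returns
      the housekeeping share in permille. This is only a toy model: the
      function name and every number used with it are hypothetical, not
      measurements.

      ```c
      #include <stdint.h>

      /*
       * Share of the total wake-up delay (in permille) caused by the
       * housekeeping done before interrupts are re-enabled.
       * Toy model: total delay = exit latency + housekeeping delay.
       */
      uint32_t housekeeping_share_permille(uint32_t exit_latency_ns,
                                           uint32_t housekeeping_ns)
      {
              uint64_t total = (uint64_t)exit_latency_ns + housekeeping_ns;

              if (total == 0)
                      return 0;

              return (uint32_t)(((uint64_t)housekeeping_ns * 1000) / total);
      }
      ```

      With a hypothetical 200 ns housekeeping delay, a state with a 2 us exit
      latency yields 90 permille (about 9%), while a state with a 133 us exit
      latency yields 1 permille.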
      
      This change was also tested with cyclictest and dbench. In case of Ice
      Lake, the average cyclictest latency decreased by 5.1%, and the average
      'dbench' throughput increased by about 0.8%. Both tests were run for 4
      hours with only C1 enabled (all other idle states, including 'POLL',
      were disabled). CPU frequency was pinned to HFM, and uncore frequency
      was pinned to the maximum value. The other platforms had similar
      single-digit percentage improvements.
      
      It is worth noting that this patch affects 'cpuidle' statistics a tiny
      bit.  Before this patch, C1 residency did not include the interrupt
      handling time, but with this patch, it will include it. This is similar
      to what happens in case of the 'POLL' state, which also runs with
      interrupts enabled.
      
      Suggested-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. Jun 09, 2021
    • intel_idle: Adjust the SKX C6 parameters if PC6 is disabled · 64233338
      Chen Yu authored
      Because cpuidle assumes worst-case C-state parameters, PC6 parameters
      are used for describing C6, which is worst-case for requesting CC6.
      When PC6 is enabled, this is appropriate. But if PC6 is disabled
      in the BIOS, the exit latency and target residency should be adjusted
      accordingly.
      
      Exit latency:
      Previously the C6 exit latency was measured as the PC6 exit latency.
      With PC6 disabled, the C6 exit latency should be the one of CC6.
      
      Target residency:
      With PC6 disabled, an idle duration that falls between the CC6 and PC6
      target residencies, [CC6, PC6), would make the idle governor choose C1E
      over C6, which hurts energy efficiency. Lower the bar for requesting C6
      when PC6 is disabled.
      
      To fill this gap, check if PC6 is disabled in the BIOS in the
      MSR_PKG_CST_CONFIG_CONTROL (0xe2) register. If so, use the CC6 exit latency
      for C6 and set target_residency to 3 times the new exit latency. [This
      is consistent with how intel_idle driver uses _CST to calculate the
      target_residency.] As a result, the OS would be more likely to choose C6
      over C1E when PC6 is disabled, which is reasonable, because if C6 is
      enabled, it implies that the user cares about energy, so choosing C6 more
      frequently makes sense.
      
      The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC
      wakeup as the 99.99th percentile. CLX and CPX have the same CPU model
      number as SKX, and their CC6 exit latencies (96us and 89us, respectively)
      are close to the SKX one, so reuse the SKX value for them.
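
      The adjustment described above can be sketched as follows. This is a
      simplified illustration, not the driver code: the struct, the function
      name and the boolean argument are hypothetical, and reading the MSR to
      decide whether PC6 is disabled is left out. Only the 92us value and the
      "3 times the exit latency" rule come from this changelog.

      ```c
      #include <stdbool.h>

      /* Illustrative C-state parameters, in microseconds. */
      struct cstate_params {
              unsigned int exit_latency;
              unsigned int target_residency;
      };

      /* CC6 exit latency measured on SKX, per the changelog. */
      #define SKX_CC6_EXIT_LATENCY_US 92

      /*
       * If PC6 is disabled, describe C6 with CC6 parameters: use the CC6
       * exit latency and set the target residency to 3 times that latency.
       * Otherwise keep the worst-case (PC6) parameters unchanged.
       */
      void adjust_c6_params(struct cstate_params *c6, bool pc6_disabled)
      {
              if (!pc6_disabled)
                      return;

              c6->exit_latency = SKX_CC6_EXIT_LATENCY_US;
              c6->target_residency = 3 * c6->exit_latency;
      }
      ```

      Starting from placeholder PC6 parameters, a call with pc6_disabled set
      to true leaves exit_latency at 92 and target_residency at 276.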
      
      There is a concern that it might be better to use a more generic approach
      instead of optimizing every platform. However, if the required code
      complexity and different PC6 bit interpretation on different platforms
      are taken into account, tuning the code per platform seems to be an
      acceptable tradeoff.
      
      Link: https://intel.github.io/wult/ # [1]
      Suggested-by: Len Brown <len.brown@intel.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Reviewed-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      [ rjw: Subject and changelog edits ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  3. Apr 08, 2021
  4. Mar 18, 2021
  5. Jan 22, 2021
  6. Dec 30, 2020
    • intel_idle: add SnowRidge C-state table · 9cf93f05
      Artem Bityutskiy authored
      
      Add C-state table for the SnowRidge SoC which is found on Intel Jacobsville
      platforms.
      
      The following has been changed.
      
       1. C1E latency changed from 10us to 15us. It was measured using the
          open source "wult" tool (the "nic" method, 15us is the 99.99th
          percentile).
      
       2. C1E power break even changed from 20us to 25us, which may result
          in less C1E residency in some workloads.
      
       3. C6 latency changed from 50us to 130us. Measured the same way as C1E.
      
      The C6 C-state is supported only by some SnowRidge revisions, so add a C-state
      table commentary about this.
      
      On SnowRidge, C6 support is enumerated via the usual mechanism: "mwait" leaf of
      the "cpuid" instruction. The 'intel_idle' driver does check this leaf, so even
      though C6 is present in the table, the driver will only use it if the CPU does
      support it.
      
      Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  7. Dec 03, 2020
  8. Nov 24, 2020
  9. Oct 27, 2020
    • intel_idle: Fix max_cstate for processor models without C-state tables · 4e0ba557
      Chen Yu authored
      
      Currently, the intel_idle driver gets the C-state information from ACPI
      _CST if it does not recognize the processor model. However, C-state
      indices in _CST start at 1, unlike the indices in the intel_idle
      driver's internal C-state table.
      
      While intel_idle_max_cstate_reached() was previously introduced to
      deal with intel_idle driver's internal c-state table, re-using
      this function directly on _CST is incorrect.
      
      Fix this by subtracting 1 from the index when checking max_cstate
      in the _CST case.
      
      For example, with intel_idle.max_cstate=1 appended to the boot command line:
      Before the patch:
      grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
      POLL
      After the patch:
      grep . /sys/devices/system/cpu/cpu0/cpuidle/state*/name
      /sys/devices/system/cpu/cpu0/cpuidle/state0/name:POLL
      /sys/devices/system/cpu/cpu0/cpuidle/state1/name:C1_ACPI
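
      The off-by-one can be reproduced with a small sketch. The helper
      max_cstate_reached() plays the role of intel_idle_max_cstate_reached(),
      but its body, the count_cst_states() function and the 'fixed' switch are
      assumptions made for illustration rather than the driver's code.

      ```c
      #include <stdbool.h>

      /* Check written for the internal table, where C1 has index 0. */
      bool max_cstate_reached(int cstate, int max_cstate)
      {
              return cstate + 1 > max_cstate;
      }

      /*
       * Count the _CST states that get registered. _CST indices start
       * at 1, so the fixed variant passes (cstate - 1) to the check.
       */
      int count_cst_states(int cst_count, int max_cstate, bool fixed)
      {
              int registered = 0;

              for (int cstate = 1; cstate <= cst_count; cstate++) {
                      int index = fixed ? cstate - 1 : cstate;

                      if (max_cstate_reached(index, max_cstate))
                              break;
                      registered++;
              }

              return registered;
      }
      ```

      With max_cstate set to 1, the unfixed variant registers no _CST state
      (only POLL remains), while the fixed variant registers one, which
      matches the sysfs output shown above.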
      
      Fixes: 18734958 ("intel_idle: Use ACPI _CST for processor models without C-state tables")
      Reported-by: Pengfei Xu <pengfei.xu@intel.com>
      Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      [ rjw: Changelog edits ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  10. Oct 16, 2020
    • intel_idle: Ignore _CST if control cannot be taken from the platform · 75af76d0
      Mel Gorman authored
      
      e6d4f08a ("intel_idle: Use ACPI _CST on server systems") avoids
      enabling c-states that have been disabled by the platform with the
      exception of C1E.
      
      Unfortunately, BIOS implementations are not always consistent in terms
      of how capabilities are advertised and control cannot always be handed
      over. If control cannot be handed over then intel_idle reports that "ACPI
      _CST not found or not usable" but does not clear acpi_state_table.count
      meaning the information is still partially used.
      
      This patch ignores ACPI information if CST control cannot be requested from
      the platform. This was only observed on a number of Haswell platforms that
      had identical CPUs but not identical BIOS versions.  While this problem
      may be rare overall, 24 separate test cases were bisected to this specific
      commit across 4 separate test machines, so it is worth addressing. If the
      situation occurs, the kernel behaves as it did before commit e6d4f08a
      and uses any c-states that are discovered.
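
      The idea can be sketched in a few lines. Everything here is a simplified
      assumption: struct cst_table stands in for acpi_state_table, and
      cst_extract() with its arguments is a hypothetical helper, not the
      driver's function.

      ```c
      #include <stdbool.h>

      /* Hypothetical stand-in for acpi_state_table. */
      struct cst_table {
              int count;
      };

      /*
       * Return true if the _CST data may be used. If control cannot be
       * claimed from the platform, reset the count so that no stale
       * _CST data is consulted later.
       */
      bool cst_extract(struct cst_table *table, int states_found,
                       bool control_claimed)
      {
              table->count = states_found;

              if (states_found > 0 && control_claimed)
                      return true;

              table->count = 0;
              return false;
      }
      ```

      When control_claimed is false, the function returns false and leaves
      count at 0, so later code sees no _CST states at all instead of a
      partially filled table.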
      
      The affected test cases were all ones that involved a small number of
      processes -- exec microbenchmark, pipe microbenchmark, git test suite,
      netperf, tbench with one client and system call microbenchmark. Each
      case benefits from being able to use turboboost which is prevented if the
      lower c-states are unavailable. This may mask real regressions specific
      to older hardware so it is worth addressing.
      
      C-state status before and after the patch
      
      5.9.0-vanilla            POLL     latency:0      disabled:0 default:enabled
      5.9.0-vanilla            C1       latency:2      disabled:0 default:enabled
      5.9.0-vanilla            C1E      latency:10     disabled:0 default:enabled
      5.9.0-vanilla            C3       latency:33     disabled:1 default:disabled
      5.9.0-vanilla            C6       latency:133    disabled:1 default:disabled
      5.9.0-ignore-cst-v1r1    POLL     latency:0      disabled:0 default:enabled
      5.9.0-ignore-cst-v1r1    C1       latency:2      disabled:0 default:enabled
      5.9.0-ignore-cst-v1r1    C1E      latency:10     disabled:0 default:enabled
      5.9.0-ignore-cst-v1r1    C3       latency:33     disabled:0 default:enabled
      5.9.0-ignore-cst-v1r1    C6       latency:133    disabled:0 default:enabled
      
      Patch enables C3/C6.
      
      Netperf UDP_STREAM
      
      netperf-udp
                                            5.5.0                  5.9.0
                                          vanilla        ignore-cst-v1r1
      Hmean     send-64         193.41 (   0.00%)      226.54 *  17.13%*
      Hmean     send-128        392.16 (   0.00%)      450.54 *  14.89%*
      Hmean     send-256        769.94 (   0.00%)      881.85 *  14.53%*
      Hmean     send-1024      2994.21 (   0.00%)     3468.95 *  15.85%*
      Hmean     send-2048      5725.60 (   0.00%)     6628.99 *  15.78%*
      Hmean     send-3312      8468.36 (   0.00%)    10288.02 *  21.49%*
      Hmean     send-4096     10135.46 (   0.00%)    12387.57 *  22.22%*
      Hmean     send-8192     17142.07 (   0.00%)    19748.11 *  15.20%*
      Hmean     send-16384    28539.71 (   0.00%)    30084.45 *   5.41%*
      
      Fixes: e6d4f08a ("intel_idle: Use ACPI _CST on server systems")
      Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
      Cc: 5.6+ <stable@vger.kernel.org> # 5.6+
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • intel_idle: mention assumption that WBINVD is not needed · 8bb2e2a8
      Alexander Monakov authored
      
      Intel SDM does not explicitly say that entering a C-state via MWAIT will
      implicitly flush CPU caches as appropriate for that C-state. However,
      documentation for individual Intel CPU generations does mention this
      behavior.
      
      Since intel_idle binds to any Intel CPU with MWAIT, list this assumption
      of MWAIT behavior.
      
      In passing, reword opening comment to make it clear that the driver can
      load on any old and future Intel CPU with MWAIT.
      
      Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  11. Aug 26, 2020
  12. Jul 30, 2020
  13. Jul 29, 2020
    • cpuidle: change enter_s2idle() prototype · efe97112
      Neal Liu authored
      
      Control Flow Integrity (CFI) is a security mechanism that disallows
      changes to the original control flow graph of a compiled binary,
      making control-flow hijacking attacks significantly harder to perform.

      init_state_node() assigns the same callback function to two function
      pointers that are declared with different prototypes.
      
      static int init_state_node(struct cpuidle_state *idle_state,
                                 const struct of_device_id *matches,
                                 struct device_node *state_node)
      {
              ...
              idle_state->enter = match_id->data;
              ...
              idle_state->enter_s2idle = match_id->data;
      }
      
      Function declarations:
      
      struct cpuidle_state {
              ...
              int (*enter) (struct cpuidle_device *dev,
                            struct cpuidle_driver *drv,
                            int index);

              void (*enter_s2idle) (struct cpuidle_device *dev,
                                    struct cpuidle_driver *drv,
                                    int index);
      };
      
      In this case, a call through either enter() or enter_s2idle() fails the
      CFI check, because both pointers share one callee and its type can match
      only one of the two prototypes.

      Align the prototype of enter_s2idle() with that of enter(), because
      enter() needs a return value for some use cases. The return value of
      enter_s2idle() is currently not needed.
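
      A minimal sketch of the aligned layout follows. The types and names
      (idle_dev, idle_drv, idle_state, enter_idle, init_state) are hypothetical
      stand-ins for the cpuidle ones. The point is only that, once both
      pointers share one prototype, a single callback can be assigned to both
      without a type mismatch.

      ```c
      /* Hypothetical stand-ins for the cpuidle device and driver types. */
      struct idle_dev { int id; };
      struct idle_drv { int id; };

      struct idle_state {
              int (*enter)(struct idle_dev *dev, struct idle_drv *drv,
                           int index);
              /* Same prototype as enter(); callers may ignore the result. */
              int (*enter_s2idle)(struct idle_dev *dev, struct idle_drv *drv,
                                  int index);
      };

      /* One callback whose type matches both function pointers. */
      int enter_idle(struct idle_dev *dev, struct idle_drv *drv, int index)
      {
              (void)dev;
              (void)drv;
              return index;
      }

      void init_state(struct idle_state *state)
      {
              state->enter = enter_idle;
              state->enter_s2idle = enter_idle;
      }
      ```

      Because the callee's type now equals the type of each pointer, an
      indirect-call type check of the kind CFI performs has nothing to
      reject for either call site.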
      
      Signed-off-by: Neal Liu <neal.liu@mediatek.com>
      Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  14. Jul 16, 2020
  15. Jun 29, 2020
    • intel_idle: Eliminate redundant static variable · dab20177
      Rafael J. Wysocki authored
      
      The value of the lapic_timer_always_reliable static variable in
      the intel_idle driver reflects the boot_cpu_has(X86_FEATURE_ARAT)
      value and so it also reflects the static_cpu_has(X86_FEATURE_ARAT)
      value.
      
      Hence, the lapic_timer_always_reliable check in intel_idle() is
      redundant. Apart from that check, lapic_timer_always_reliable is only
      used in two places in which boot_cpu_has(X86_FEATURE_ARAT) can be
      used directly.
      
      Eliminate the lapic_timer_always_reliable variable in accordance
      with the above observations.
      
      No intentional functional impact.
      
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  16. Mar 24, 2020
  17. Feb 11, 2020
  18. Feb 03, 2020
    • intel_idle: Introduce 'states_off' module parameter · 4dcb78ee
      Rafael J. Wysocki authored
      
      In certain system configurations it may not be desirable to use some
      C-states assumed to be available by intel_idle and the driver needs
      to be prevented from using them even before the cpuidle sysfs
      interface becomes accessible to user space.  Currently, the only way
      to achieve that is by setting the 'max_cstate' module parameter to a
      value lower than the index of the shallowest of the C-states in
      question, but that may be overly intrusive, because it effectively
      makes all of the idle states deeper than the 'max_cstate' one go
      away (and the C-state to avoid may be in the middle of the range
      normally regarded as available).
      
      To allow that limitation to be overcome, introduce a new module
      parameter called 'states_off' to represent a list of idle states to
      be disabled by default in the form of a bitmask and update the
      documentation to cover it.
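
      As an illustration of the bitmask form, the sketch below assumes that
      bit i of the mask corresponds to idle state i (as in the sysfs state<i>
      directories). The function name is hypothetical and the code is not
      taken from the driver.

      ```c
      #include <stdbool.h>

      /* Bit i set in the mask means idle state i is off by default. */
      bool state_off_by_default(unsigned int states_off, unsigned int index)
      {
              /* Indices beyond the mask width cannot be selected. */
              if (index >= sizeof(states_off) * 8)
                      return false;

              return (states_off >> index) & 1u;
      }
      ```

      Under that assumption, a mask of 0x4 turns off only state 2 and leaves
      the deeper states available, which a 'max_cstate' limit cannot express.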
      
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • intel_idle: Introduce 'use_acpi' module parameter · 3a5be9b8
      Rafael J. Wysocki authored
      
      For diagnostics, it is generally useful to be able to make intel_idle
      take the system's ACPI tables into consideration even if that is not
      required for the given processor model, so introduce a new module
      parameter, 'use_acpi', to make that happen and update the documentation
      to cover it.
      
      While at it, fix the 'no_acpi' module parameter name in the
      documentation.
      
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  19. Jan 22, 2020
  20. Jan 13, 2020
    • x86/msr-index: Clean up bit defines for IA32_FEATURE_CONTROL MSR · 32ad73db
      Sean Christopherson authored
      
      As pointed out by Boris, the defines for bits in IA32_FEATURE_CONTROL
      are quite a mouthful, especially the VMX bits which must differentiate
      between enabling VMX inside and outside SMX (TXT) operation.  Rename the
      MSR and its bit defines to abbreviate FEATURE_CONTROL as FEAT_CTL to
      make them a little friendlier on the eyes.
      
      Arguably, the MSR itself should keep the full IA32_FEATURE_CONTROL name
      to match Intel's SDM, but a future patch will add a dedicated Kconfig,
      file and functions for the MSR. Using the full name for those assets is
      rather unwieldy, so bite the bullet and use IA32_FEAT_CTL so that its
      nomenclature is consistent throughout the kernel.
      
      Opportunistically, fix a few other annoyances with the defines:
      
        - Relocate the bit defines so that they immediately follow the MSR
          define, e.g. aren't mistaken as belonging to MISC_FEATURE_CONTROL.
        - Add whitespace around the block of feature control defines to make
          it clear they're all related.
        - Use BIT() instead of manually encoding the bit shift.
        - Use "VMX" instead of "VMXON" to match the SDM.
        - Append "_ENABLED" to the LMCE (Local Machine Check Exception) bit to
          be consistent with the kernel's verbiage used for all other feature
          control bits.  Note, the SDM refers to the LMCE bit as LMCE_ON,
          likely to differentiate it from IA32_MCG_EXT_CTL.LMCE_EN.  Ignore
          the (literal) one-off usage of _ON, the SDM is simply "wrong".
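
      For illustration, defines written in the style this list describes might
      look like the sketch below. The macro names and bit positions are
      assumptions and are not quoted from the patch, BIT() is a local stand-in
      for the kernel helper, and feat_ctl_has() is a hypothetical function
      added only to make the example usable.

      ```c
      #include <stdbool.h>

      /* Local stand-in for the kernel's BIT() helper. */
      #define BIT(n) (1UL << (n))

      /* Assumed names and bit positions, for illustration only. */
      #define FEAT_CTL_LOCKED                         BIT(0)
      #define FEAT_CTL_VMX_ENABLED_INSIDE_SMX         BIT(1)
      #define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX        BIT(2)
      #define FEAT_CTL_LMCE_ENABLED                   BIT(20)

      /* Return true if every bit in 'mask' is set in the MSR value. */
      bool feat_ctl_has(unsigned long msr, unsigned long mask)
      {
              return (msr & mask) == mask;
      }
      ```

      Written this way, each define shows its bit number directly instead of a
      hand-written shift expression.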
      
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Link: https://lkml.kernel.org/r/20191221044513.21680-2-sean.j.christopherson@intel.com
  21. Dec 27, 2019
    • intel_idle: Use ACPI _CST on server systems · e6d4f08a
      Rafael J. Wysocki authored
      
      In many cases, especially on server systems, it is desirable to avoid
      enabling C-states that have been disabled in the platform firmware
      (BIOS) setup, except for C1E.
      
      As a rule, the C-states disabled this way are not listed by ACPI
      _CST, so if that is used by intel_idle along with the specific
      table of C-states that it has for the given processor, the C-states
      disabled through the platform firmware will not be enabled by default
      by intel_idle.
      
      Accordingly, set the use_acpi flag (introduced previously) in all
      server processor profiles defined in intel_idle (so as to make it use
      ACPI _CST to decide which C-states to enable by default) and set
      the CPUIDLE_FLAG_ALWAYS_ENABLE flag (also introduced previously)
      for C1E in all C-states tables in intel_idle that contain C1 too
      (so that C1E is enabled regardless of whether or not it is listed
      by ACPI _CST).
      
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • intel_idle: Add module parameter to prevent ACPI _CST from being used · 4ec32d9e
      Rafael J. Wysocki authored
      
      Add a new module parameter called "no_acpi" to the intel_idle driver
      to allow the driver to be prevented from using ACPI _CST via kernel
      command line.
      
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • intel_idle: Allow ACPI _CST to be used for selected known processors · bff8e60a
      Rafael J. Wysocki authored
      
      Update the intel_idle driver to get the C-states information from ACPI
      _CST in some cases in which the processor is known to the driver, as long as
      that information is available and the new use_acpi flag is set in the
      profile of the processor in question.
      
      In the cases when there is a specific table of C-states for the given
      processor in the driver, that table is used as the primary source of
      information on the available C-states, but if ACPI _CST is present,
      the C-states that are not listed by it will not be enabled by default
      (they still can be enabled later by user space via sysfs, though).
      
      The new CPUIDLE_FLAG_ALWAYS_ENABLE flag can be used for marking
      C-states that should be enabled by default even if they are not
      listed by ACPI _CST.
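
      The default-enable rule described above can be sketched as a single
      predicate. The function name, its arguments and the flag value are
      hypothetical; FLAG_ALWAYS_ENABLE merely stands in for
      CPUIDLE_FLAG_ALWAYS_ENABLE, and how the driver matches table entries
      against _CST is not modeled.

      ```c
      #include <stdbool.h>

      /* Hypothetical value, stand-in for CPUIDLE_FLAG_ALWAYS_ENABLE. */
      #define FLAG_ALWAYS_ENABLE 0x1u

      /*
       * Decide whether a C-state from the driver's table is enabled by
       * default: without _CST everything is enabled; with _CST a state
       * must be listed by it or carry the always-enable flag.
       */
      bool enabled_by_default(bool cst_present, bool listed_in_cst,
                              unsigned int flags)
      {
              if (!cst_present)
                      return true;

              if (flags & FLAG_ALWAYS_ENABLE)
                      return true;

              return listed_in_cst;
      }
      ```

      A state that this predicate leaves disabled is still present in the
      table, so user space can enable it later via sysfs as noted above.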
      
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>