  1. Dec 05, 2019
    • bitops: introduce the for_each_set_clump8 macro · 169c474f
      William Breathitt Gray authored
      Patch series "Introduce the for_each_set_clump8 macro", v18.
      
      While adding GPIO get_multiple/set_multiple callback support for various
      drivers, I noticed a pattern of looping manifesting that would be useful
      standardized as a macro.
      
      This patchset introduces the for_each_set_clump8 macro and utilizes it
      in several GPIO drivers.  The for_each_set_clump8 macro facilitates a
      for-loop syntax that iterates over a memory region in entire groups
      of set bits at a time.
      
      For example, suppose you would like to iterate over a 32-bit integer 8
      bits at a time, skipping over 8-bit groups with no set bit, where
      XXXXXXXX represents the current 8-bit group:
      
          Example:        10111110 00000000 11111111 00110011
          First loop:     10111110 00000000 11111111 XXXXXXXX
          Second loop:    10111110 00000000 XXXXXXXX 00110011
          Third loop:     XXXXXXXX 00000000 11111111 00110011
      
      Each iteration of the loop returns the next 8-bit group that has at
      least one set bit.
      
      The for_each_set_clump8 macro has four parameters:
      
          * start: set to the bit offset of the current clump
          * clump: set to the current clump value
          * bits: bitmap to search within
          * size: bitmap size in number of bits
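
      Taken together, a minimal usage sketch (the caller and its pr_info
      output are hypothetical; the macro and its parameters are as
      documented above):

      ```
      #include <linux/bitops.h>   /* for_each_set_clump8 */
      #include <linux/printk.h>

      /* Log every 8-bit clump that contains at least one set bit. */
      static void log_set_clumps8(const unsigned long *bits, unsigned int nbits)
      {
              unsigned long start;
              unsigned long clump;

              for_each_set_clump8(start, clump, bits, nbits)
                      pr_info("clump 0x%02lx at bit offset %lu\n", clump, start);
      }
      ```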
      
      In this version of the patchset, the for_each_set_clump macro has been
      reimplemented and simplified based on the suggestions provided by Rasmus
      Villemoes and Andy Shevchenko in the version 4 submission.
      
      In particular, the function of the for_each_set_clump macro has been
      restricted to handle only 8-bit clumps; the drivers that use the
      for_each_set_clump macro only handle 8-bit ports so a generic
      for_each_set_clump implementation is not necessary.  Thus, a solution
      for large clumps (i.e.  those larger than the width of a bitmap word)
      can be postponed until a driver appears that actually requires such a
      generic for_each_set_clump implementation.
      
      For what it's worth, a semi-generic for_each_set_clump (i.e.  for clumps
      smaller than the width of a bitmap word) can be implemented by simply
      replacing the hardcoded '8' and '0xFF' instances with respective
      variables.  I have not yet had a need for such an implementation, and
      since it falls short of a true generic for_each_set_clump function, I
      have decided to forgo such an implementation for now.
      
      In addition, the bitmap_get_value8 and bitmap_set_value8 functions are
      introduced to get and set 8-bit values respectively.  Their use is based
      on the behavior suggested in the patchset version 4 review.
      
      This patch (of 14):
      
      This macro iterates over each 8-bit group of bits (clump) with set
      bits, within a bitmap memory region.  For each iteration, "start" is
      set to the bit offset of the found clump, while the respective clump
      value is stored at the location pointed to by "clump".  Additionally,
      the bitmap_get_value8 and bitmap_set_value8 functions are introduced
      to respectively get and set an 8-bit value in a bitmap memory region.
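
      A sketch of how the two helpers can look (assuming, as the converted
      drivers do, that "start" is a multiple of 8):

      ```
      #include <linux/bitops.h>   /* BIT_WORD, BITS_PER_LONG */
      #include <linux/types.h>

      static inline unsigned long bitmap_get_value8(const unsigned long *map,
                                                    unsigned long start)
      {
              const size_t index = BIT_WORD(start);
              const unsigned long offset = start % BITS_PER_LONG;

              return (map[index] >> offset) & 0xFF;
      }

      static inline void bitmap_set_value8(unsigned long *map, unsigned long value,
                                           unsigned long start)
      {
              const size_t index = BIT_WORD(start);
              const unsigned long offset = start % BITS_PER_LONG;

              map[index] &= ~(0xFFUL << offset);      /* clear the old 8-bit value */
              map[index] |= value << offset;          /* write the new one */
      }
      ```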
      
      [gustavo@embeddedor.com: fix potential sign-extension overflow]
        Link: http://lkml.kernel.org/r/20191015184657.GA26541@embeddedor
      [akpm@linux-foundation.org: s/ULL/UL/, per Joe]
      [vilhelm.gray@gmail.com: add for_each_set_clump8 documentation]
        Link: http://lkml.kernel.org/r/20191016161825.301082-1-vilhelm.gray@gmail.com
      Link: http://lkml.kernel.org/r/893c3b4f03266c9496137cc98ac2b1bd27f92c73.1570641097.git.vilhelm.gray@gmail.com
      
      
      Signed-off-by: William Breathitt Gray <vilhelm.gray@gmail.com>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
      Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Suggested-by: Lukas Wunner <lukas@wunner.de>
      Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Bartosz Golaszewski <bgolaszewski@baylibre.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Phil Reid <preid@electromag.com.au>
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: Mathias Duckeck <m.duckeck@kunbus.de>
      Cc: Morten Hein Tiljeset <morten.tiljeset@prevas.dk>
      Cc: Sean Nyekjaer <sean.nyekjaer@prevas.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Feb 07, 2018
    • lib: optimize cpumask_next_and() · 0ade34c3
      Clement Courbet authored
      We've measured that we spend ~0.6% of sys cpu time in cpumask_next_and().
      It's essentially a joined iteration in search of a nonzero bit, which
      is currently implemented as a lookup join (find a nonzero bit on the
      lhs, look up the rhs to see whether it's set there).
      
      Implement a direct join (find a nonzero bit on the incrementally built
      join).  Also add generic bitmap benchmarks in the new `test_find_bit`
      module for the new function (see `find_next_and_bit` in [2] and [3] below).
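
      In sketch form (simplified; the cpumask bounds checking is omitted),
      the lookup join becomes a single call into the new word-at-a-time AND
      search:

      ```
      #include <linux/cpumask.h>

      /* Direct join: find_next_and_bit() ANDs the two bitmaps word by
       * word while searching, instead of finding a set bit in src1p and
       * then testing whether it is also set in src2p. */
      static inline unsigned int cpumask_next_and_sketch(int n,
                      const struct cpumask *src1p, const struct cpumask *src2p)
      {
              return find_next_and_bit(cpumask_bits(src1p), cpumask_bits(src2p),
                                       nr_cpumask_bits, n + 1);
      }
      ```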
      
      For cpumask_next_and, direct benchmarking shows that it's 1.17x to 14x
      faster with a geometric mean of 2.1 on 32 CPUs [1].  No impact on memory
      usage.  Note that on Arm, the new pure-C implementation still outperforms
      the old one that uses a mix of C and asm (`find_next_bit`) [3].
      
      [1] Approximate benchmark code:
      
      ```
        unsigned long src1p[nr_cpumask_longs] = {pattern1};
        unsigned long src2p[nr_cpumask_longs] = {pattern2};
        for (/*a bunch of repetitions*/) {
          for (int n = -1; n <= nr_cpu_ids; ++n) {
            asm volatile("" : "+rm"(src1p)); // prevent any optimization
            asm volatile("" : "+rm"(src2p));
            unsigned long result = cpumask_next_and(n, src1p, src2p);
            asm volatile("" : "+rm"(result));
          }
        }
      ```
      
      Results:
      pattern1    pattern2     time_before/time_after
      0x0000ffff  0x0000ffff   1.65
      0x0000ffff  0x00005555   2.24
      0x0000ffff  0x00001111   2.94
      0x0000ffff  0x00000000   14.0
      0x00005555  0x0000ffff   1.67
      0x00005555  0x00005555   1.71
      0x00005555  0x00001111   1.90
      0x00005555  0x00000000   6.58
      0x00001111  0x0000ffff   1.46
      0x00001111  0x00005555   1.49
      0x00001111  0x00001111   1.45
      0x00001111  0x00000000   3.10
      0x00000000  0x0000ffff   1.18
      0x00000000  0x00005555   1.18
      0x00000000  0x00001111   1.17
      0x00000000  0x00000000   1.25
      -----------------------------
                     geo.mean  2.06
      
      [2] test_find_next_bit, X86 (skylake)
      
       [ 3913.477422] Start testing find_bit() with random-filled bitmap
       [ 3913.477847] find_next_bit: 160868 cycles, 16484 iterations
       [ 3913.477933] find_next_zero_bit: 169542 cycles, 16285 iterations
       [ 3913.478036] find_last_bit: 201638 cycles, 16483 iterations
       [ 3913.480214] find_first_bit: 4353244 cycles, 16484 iterations
       [ 3913.480216] Start testing find_next_and_bit() with random-filled bitmap
       [ 3913.481074] find_next_and_bit: 89604 cycles, 8216 iterations
       [ 3913.481075] Start testing find_bit() with sparse bitmap
       [ 3913.481078] find_next_bit: 2536 cycles, 66 iterations
       [ 3913.481252] find_next_zero_bit: 344404 cycles, 32703 iterations
       [ 3913.481255] find_last_bit: 2006 cycles, 66 iterations
       [ 3913.481265] find_first_bit: 17488 cycles, 66 iterations
       [ 3913.481266] Start testing find_next_and_bit() with sparse bitmap
       [ 3913.481272] find_next_and_bit: 764 cycles, 1 iterations
      
      [3] test_find_next_bit, arm (v7 odroid XU3).
      
      [  267.206928] Start testing find_bit() with random-filled bitmap
      [  267.214752] find_next_bit: 4474 cycles, 16419 iterations
      [  267.221850] find_next_zero_bit: 5976 cycles, 16350 iterations
      [  267.229294] find_last_bit: 4209 cycles, 16419 iterations
      [  267.279131] find_first_bit: 1032991 cycles, 16420 iterations
      [  267.286265] Start testing find_next_and_bit() with random-filled bitmap
      [  267.302386] find_next_and_bit: 2290 cycles, 8140 iterations
      [  267.309422] Start testing find_bit() with sparse bitmap
      [  267.316054] find_next_bit: 191 cycles, 66 iterations
      [  267.322726] find_next_zero_bit: 8758 cycles, 32703 iterations
      [  267.329803] find_last_bit: 84 cycles, 66 iterations
      [  267.336169] find_first_bit: 4118 cycles, 66 iterations
      [  267.342627] Start testing find_next_and_bit() with sparse bitmap
      [  267.356919] find_next_and_bit: 91 cycles, 1 iterations
      
      [courbet@google.com: v6]
        Link: http://lkml.kernel.org/r/20171129095715.23430-1-courbet@google.com
      [geert@linux-m68k.org: m68k/bitops: always include <asm-generic/bitops/find.h>]
        Link: http://lkml.kernel.org/r/1512556816-28627-1-git-send-email-geert@linux-m68k.org
      Link: http://lkml.kernel.org/r/20171128131334.23491-1-courbet@google.com
      
      
      Signed-off-by: Clement Courbet <courbet@google.com>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Yury Norov <ynorov@caviumnetworks.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Apr 17, 2015
    • lib: rename lib/find_next_bit.c to lib/find_bit.c · 840620a1
      Yury Norov authored
      
      This file contains the implementations of all find_*_bit{,_le}
      functions, so giving it a more generic name seems reasonable.
      
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: George Spelvin <linux@horizon.com>
      Cc: Alexey Klimov <klimov.linux@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib: move find_last_bit to lib/find_next_bit.c · 8f6f19dd
      Yury Norov authored
      
      Currently the whole 'find_*_bit' family is located in
      lib/find_next_bit.c, except for 'find_last_bit', which is in
      lib/find_last_bit.c.  There seems to be no major benefit to keeping
      it separate.
      
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: George Spelvin <linux@horizon.com>
      Cc: Alexey Klimov <klimov.linux@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib: find_*_bit reimplementation · 2c57a0e2
      Yury Norov authored
      
      This patchset reworks the find_bit function family to achieve better
      performance and decrease text size.  All of the rework is done in
      patch 1; patches 2 and 3 only move and rename code.
      
      It was boot-tested on x86_64 and MIPS (big-endian) machines.
      Performance tests were run in userspace with code like this:
      
      	/* addr[] is filled from /dev/urandom */
      	unsigned long ret = 0;

      	start = clock();
      	while (ret < nbits)
      		ret = find_next_bit(addr, nbits, ret + 1);

      	end = clock();
      	printf("%ld\t", (unsigned long) end - start);
      
      On an Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz the measurements are
      (for find_next_bit, nbits is 8M; for find_first_bit, 80K):
      
      	find_next_bit:		find_first_bit:
      	new	current		new	current
      	26932	43151		14777	14925
      	26947	43182		14521	15423
      	26507	43824		15053	14705
      	27329	43759		14473	14777
      	26895	43367		14847	15023
      	26990	43693		15103	15163
      	26775	43299		15067	15232
      	27282	42752		14544	15121
      	27504	43088		14644	14858
      	26761	43856		14699	15193
      	26692	43075		14781	14681
      	27137	42969		14451	15061
      	...			...
      
      The find_next_bit performance gain is 35-40%; for find_first_bit
      there is no measurable difference.
      
      On ARM machines, there is an arch-specific implementation of find_bit.
      
      Thanks a lot to George Spelvin and Rasmus Villemoes for hints and
      helpful discussions.
      
      This patch (of 3):
      
      The new implementation takes less space in the source file (see
      diffstat) and in the object file: for me it's 710 vs 453 bytes of
      text.  It also shows better performance.
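
      The shared core can be sketched as follows: one helper serves both
      find_next_bit (invert == 0UL) and find_next_zero_bit (invert == ~0UL)
      by XOR-ing each fetched word with an invert mask (a sketch close to
      the patch, not the verbatim code):

      ```
      #include <linux/bitmap.h>
      #include <linux/bitops.h>
      #include <linux/kernel.h>   /* round_down, min */

      static unsigned long _find_next_bit(const unsigned long *addr,
                      unsigned long nbits, unsigned long start, unsigned long invert)
      {
              unsigned long tmp;

              if (!nbits || start >= nbits)
                      return nbits;

              tmp = addr[start / BITS_PER_LONG] ^ invert;

              /* Handle the first, possibly partial, word. */
              tmp &= BITMAP_FIRST_WORD_MASK(start);
              start = round_down(start, BITS_PER_LONG);

              while (!tmp) {
                      start += BITS_PER_LONG;
                      if (start >= nbits)
                              return nbits;

                      tmp = addr[start / BITS_PER_LONG] ^ invert;
              }

              return min(start + __ffs(tmp), nbits);
      }
      ```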
      
      The find_last_bit description is also fixed due to an obvious typo.
      
      [akpm@linux-foundation.org: include linux/bitmap.h, per Rasmus]
      Signed-off-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Reviewed-by: George Spelvin <linux@horizon.com>
      Cc: Alexey Klimov <klimov.linux@gmail.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Daniel Borkmann <dborkman@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Valentin Rothberg <valentinrothberg@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. Mar 24, 2011
    • bitops: introduce CONFIG_GENERIC_FIND_BIT_LE · 0664996b
      Akinobu Mita authored
      
      This introduces CONFIG_GENERIC_FIND_BIT_LE to tell whether to use the
      generic implementation of find_*_bit_le() in lib/find_next_bit.c or not.
      
      For now we select CONFIG_GENERIC_FIND_BIT_LE for all architectures which
      enable CONFIG_GENERIC_FIND_NEXT_BIT.
      
      But m68knommu wants to define its own faster find_next_zero_bit_le()
      and continue using the generic find_next_{,zero_}bit()
      (CONFIG_GENERIC_FIND_NEXT_BIT and !CONFIG_GENERIC_FIND_BIT_LE).
      
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • asm-generic: change little-endian bitops to take any pointer types · a56560b3
      Akinobu Mita authored
      
      This makes the little-endian bitops take any pointer types by changing the
      prototypes and adding casts in the preprocessor macros.
      
      That would seem to at least make all the filesystem code happier, and they
      can continue to do just something like
      
        #define ext2_set_bit __test_and_set_bit_le
      
      (or whatever the exact sequence ends up being).
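
      The resulting shape, sketched for the big-endian generic case
      (mirroring asm-generic/bitops/le.h; BITOP_LE_SWIZZLE is 0 on
      little-endian, so this is an illustration rather than the verbatim
      patch):

      ```
      /* The le op takes a plain void * and converts internally, so a
       * caller such as the ext2_set_bit define above needs no casts. */
      static inline int __test_and_set_bit_le(int nr, void *addr)
      {
              return __test_and_set_bit(nr ^ BITOP_LE_SWIZZLE, addr);
      }
      ```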
      
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Kyle McMartin <kyle@mcmartin.ca>
      Cc: Matthew Wilcox <willy@debian.org>
      Cc: Grant Grundler <grundler@parisc-linux.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Kazumoto Kojima <kkojima@rr.iij4u.or.jp>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • asm-generic: rename generic little-endian bitops functions · c4945b9e
      Akinobu Mita authored
      
      As a preparation for providing little-endian bitops for all
      architectures, this renames the generic implementations of the
      little-endian bitops (removing the "generic_" prefix and adding the
      "_le" postfix):
      
      s/generic_find_next_le_bit/find_next_bit_le/
      s/generic_find_next_zero_le_bit/find_next_zero_bit_le/
      s/generic_find_first_zero_le_bit/find_first_zero_bit_le/
      s/generic___test_and_set_le_bit/__test_and_set_bit_le/
      s/generic___test_and_clear_le_bit/__test_and_clear_bit_le/
      s/generic_test_le_bit/test_bit_le/
      s/generic___set_le_bit/__set_bit_le/
      s/generic___clear_le_bit/__clear_bit_le/
      s/generic_test_and_set_le_bit/test_and_set_bit_le/
      s/generic_test_and_clear_le_bit/test_and_clear_bit_le/
      
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Acked-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Hans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Greg Ungerer <gerg@uclinux.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. Apr 29, 2008
    • bitops: remove "optimizations" · fee4b19f
      Thomas Gleixner authored
      
      The mapsize optimizations, which were moved from x86 to the generic
      code in commit 64970b68, increased the binary size on non-x86
      architectures.
      
      Looking into the real effects of the "optimizations", it turned out
      that they are not used in find_next_bit() and find_next_zero_bit().
      
      The ones in find_first_bit() and find_first_zero_bit() are used in a
      couple of places but none of them is a real hot path.
      
      Remove the "optimizations" all together and call the library functions
      unconditionally.
      
      Boot-tested on x86 and compile tested on every cross compiler I have.
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. Apr 26, 2008
    • x86: generic versions of find_first_(zero_)bit, convert i386 · 77b9bd9c
      Alexander van Heukelum authored
      
      Generic versions of __find_first_bit and __find_first_zero_bit
      are introduced as simplified versions of __find_next_bit and
      __find_next_zero_bit. Their compilation and use are guarded by
      a new config variable GENERIC_FIND_FIRST_BIT.
      
      The generic versions of find_first_bit and find_first_zero_bit
      are implemented in terms of the newly introduced __find_first_bit
      and __find_first_zero_bit.
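
      An illustrative sketch of the simplification (not the verbatim 2008
      code): with the search fixed to start at bit 0, no offset masking is
      needed, so the first-bit variant reduces to a plain word scan:

      ```
      #include <linux/bitops.h>
      #include <linux/kernel.h>   /* min */

      unsigned long __find_first_bit(const unsigned long *addr, unsigned long size)
      {
              unsigned long idx;

              for (idx = 0; idx * BITS_PER_LONG < size; idx++) {
                      if (addr[idx])
                              return min(idx * BITS_PER_LONG + __ffs(addr[idx]),
                                         size);
              }

              return size;    /* no set bit found */
      }
      ```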
      
      This patch does not remove the i386-specific implementation,
      but it does switch i386 to use the generic functions by setting
      GENERIC_FIND_FIRST_BIT=y for X86_32.
      
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, generic: optimize find_next_(zero_)bit for small constant-size bitmaps · 64970b68
      Alexander van Heukelum authored
      
      This moves an optimization for searching constant-sized small
      bitmaps from x86_64-specific code to generic code.
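
      Sketched (simplified from the patch; a constant size equal to
      BITS_PER_LONG needs an extra zero check, since __ffs(0) is
      undefined):

      ```
      static __always_inline unsigned long
      find_next_bit(const unsigned long *addr, unsigned long size,
                    unsigned long offset)
      {
              unsigned long value;

              /* For a compile-time-constant size that fits in one word, the
               * search collapses to a mask plus a single __ffs() (one bsf
               * instruction on x86). */
              if (__builtin_constant_p(size) && size < BITS_PER_LONG) {
                      value = *addr & (~0UL << offset);
                      value |= 1UL << size;   /* sentinel: empty bitmap -> size */
                      return __ffs(value);
              }

              /* Size not constant, or larger than a word: library call. */
              return __find_next_bit(addr, size, offset);
      }
      ```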
      
      On an i386 defconfig (the x86#testing one), the size of vmlinux hardly
      changes with this applied. I have observed only four places where this
      optimization avoids a call into find_next_bit:
      
      In the functions return_unused_surplus_pages, alloc_fresh_huge_page,
      and adjust_pool_surplus, this patch avoids a call for a 1-bit bitmap.
      In __next_cpu a call is avoided for a 32-bit bitmap. That's it.
      
      On x86_64, 52 locations are optimized with a minimal increase in
      code size:
      
      Current #testing defconfig:
      	146 x bsf, 27 x find_next_*bit
         text    data     bss     dec     hex filename
         5392637  846592  724424 6963653  6a41c5 vmlinux
      
      After removing the x86_64 specific optimization for find_next_*bit:
      	94 x bsf, 79 x find_next_*bit
         text    data     bss     dec     hex filename
         5392358  846592  724424 6963374  6a40ae vmlinux
      
      After this patch (making the optimization generic):
      	146 x bsf, 27 x find_next_*bit
         text    data     bss     dec     hex filename
         5392396  846592  724424 6963412  6a40d4 vmlinux
      
      [ tglx@linutronix.de: build fixes ]
      
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: change x86 to use generic find_next_bit · 6fd92b63
      Alexander van Heukelum authored
      
      The versions with inline assembly are in fact slower on the machines
      I tested them on (in userspace: Athlon XP 2800+, P4-like Xeon 2.8GHz,
      AMD Opteron 270).  The i386 version needed a fix similar to 06024f21
      to avoid crashing the benchmark.
      
      Benchmark using: gcc -fomit-frame-pointer -Os. For each bitmap size
      1...512, for each possible bitmap with one bit set, for each possible
      offset: find the position of the first bit starting at offset. If you
      follow ;). Times include setup of the bitmap and checking of the
      results.
      
      		Athlon		Xeon		Opteron 32/64bit
      x86-specific:	0m3.692s	0m2.820s	0m3.196s / 0m2.480s
      generic:	0m2.622s	0m1.662s	0m2.100s / 0m1.572s
      
      If the bitmap size is not a multiple of BITS_PER_LONG and no set
      (cleared) bit is found, the x86-specific find_next_bit
      (find_next_zero_bit) returns a value outside the range [0, size].
      The generic version always returns exactly size.  The generic version
      also uses unsigned long everywhere, while the x86 versions use a
      mishmash of int, unsigned (int), long and unsigned long.
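
      A hypothetical illustration of the boundary behavior:

      ```
      DECLARE_BITMAP(bits, 33);       /* 33-bit bitmap */

      bitmap_zero(bits, 33);          /* no set bits anywhere */

      /* Generic version: returns exactly 33 (== size).  The old x86 asm
       * version could return a value past 33, up to the next multiple of
       * BITS_PER_LONG. */
      unsigned long pos = find_next_bit(bits, 33, 0);
      ```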
      
      Using the generic version does give a slightly bigger kernel, though.
      
      defconfig:	   text    data     bss     dec     hex filename
      x86-specific:	4738555  481232  626688 5846475  5935cb vmlinux (32 bit)
      generic:	4738621  481232  626688 5846541  59360d vmlinux (32 bit)
      x86-specific:	5392395  846568  724424 6963387  6a40bb vmlinux (64 bit)
      generic:	5392458  846568  724424 6963450  6a40fa vmlinux (64 bit)
      
      Signed-off-by: Alexander van Heukelum <heukelum@fastmail.fm>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. Apr 16, 2005
    • Linux-2.6.12-rc2 · 1da177e4
      Linus Torvalds authored
      Initial git repository build. I'm not bothering with the full history,
      even though we have it. We can create a separate "historical" git
      archive of that later if we want to, and in the meantime it's about
      3.2GB when imported into git - space that would just make the early
      git days unnecessarily complicated, when we don't have a lot of good
      infrastructure for it.
      
      Let it rip!