    x86/mm: Rework lazy TLB to track the actual loaded mm (commit 3d28ebce)
    Author: Andy Lutomirski

    Lazy TLB state is currently managed in a rather baroque manner.
    AFAICT, there are three possible states:
    
     - Non-lazy.  This means that we're running a user thread or a
       kernel thread that has called use_mm().  current->mm ==
       current->active_mm == cpu_tlbstate.active_mm and
       cpu_tlbstate.state == TLBSTATE_OK.
    
     - Lazy with user mm.  We're running a kernel thread without an mm
       and we're borrowing an mm_struct.  We have current->mm == NULL,
       current->active_mm == cpu_tlbstate.active_mm, cpu_tlbstate.state
       != TLBSTATE_OK (i.e. TLBSTATE_LAZY or 0).  The current cpu is set
       in mm_cpumask(current->active_mm).  CR3 points to
       current->active_mm->pgd.  The TLB is up to date.
    
     - Lazy with init_mm.  This happens when we call leave_mm().  We
       have current->mm == NULL, current->active_mm ==
       cpu_tlbstate.active_mm, but that mm is only relevant insofar as
       the scheduler is tracking it for refcounting.  cpu_tlbstate.state
       != TLBSTATE_OK.  The current cpu is clear in
       mm_cpumask(current->active_mm).  CR3 points to swapper_pg_dir,
       i.e. init_mm->pgd.
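    The three states above can be summarized as a small classifier. This
    is a user-space sketch, not kernel code: struct mm_model, struct
    old_cpu_view, classify_old() and the numeric TLBSTATE_* values are
    hypothetical stand-ins for mm_struct, current, cpu_tlbstate and the
    kernel's own constants, and CR3 is modeled as a pointer to the mm
    whose page tables it references.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of an mm: one cpumask bit per CPU. */
struct mm_model { int id; unsigned long cpumask; };

/* Model values only; the kernel's real definitions may differ. */
enum tlbstate { TLBSTATE_NONE = 0, TLBSTATE_OK = 1, TLBSTATE_LAZY = 2 };

enum old_lazy_state { NON_LAZY, LAZY_USER_MM, LAZY_INIT_MM, INCONSISTENT };

/* Everything the pre-patch description looks at, for one CPU. */
struct old_cpu_view {
    struct mm_model *cur_mm;        /* current->mm */
    struct mm_model *cur_active_mm; /* current->active_mm */
    struct mm_model *tlb_active_mm; /* cpu_tlbstate.active_mm */
    int tlb_state;                  /* cpu_tlbstate.state */
    struct mm_model *cr3_mm;        /* mm whose pgd CR3 points to */
    int cpu;                        /* this CPU's number */
};

static bool cpu_in_mask(const struct mm_model *mm, int cpu)
{
    return (mm->cpumask >> cpu) & 1UL;
}

/* Map a snapshot onto one of the three documented states. */
static enum old_lazy_state classify_old(const struct old_cpu_view *v,
                                        const struct mm_model *init_mm)
{
    /* All three states require current->active_mm == cpu_tlbstate.active_mm. */
    if (v->cur_active_mm != v->tlb_active_mm)
        return INCONSISTENT;
    if (v->tlb_state == TLBSTATE_OK)
        return v->cur_mm == v->cur_active_mm ? NON_LAZY : INCONSISTENT;
    /* Both lazy states are kernel threads without their own mm. */
    if (v->cur_mm != NULL)
        return INCONSISTENT;
    if (cpu_in_mask(v->cur_active_mm, v->cpu) && v->cr3_mm == v->cur_active_mm)
        return LAZY_USER_MM;
    if (!cpu_in_mask(v->cur_active_mm, v->cpu) && v->cr3_mm == init_mm)
        return LAZY_INIT_MM;
    return INCONSISTENT;
}
```

    Note how much must agree for a snapshot to be valid; the patch
    replaces most of these cross-checks with a single loaded_mm pointer.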
    
    This patch simplifies the situation.  Other than perf, x86 stops
    caring about current->active_mm at all.  We have
    cpu_tlbstate.loaded_mm pointing to the mm that CR3 references.  The
    TLB is always up to date for that mm.  leave_mm() just switches us
    to init_mm.  There are no longer any special cases for mm_cpumask,
    and switch_mm() switches mms without worrying about laziness.
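    A minimal model of the new scheme, assuming that switching mms moves
    this CPU's bit from the old mm's cpumask to the new one's (init_mm's
    mask is not cleared on the way out). struct tlbstate_model,
    model_switch_mm(), model_leave_mm() and init_mm_model are hypothetical
    names, and assigning loaded_mm stands in for writing CR3.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an mm: one cpumask bit per CPU. */
struct mm_model { int id; unsigned long cpumask; };

/* Stand-in for init_mm. */
static struct mm_model init_mm_model = { 0, 0 };

/* Hypothetical per-CPU state: the mm that CR3 currently references. */
struct tlbstate_model { struct mm_model *loaded_mm; };

static void model_switch_mm(struct tlbstate_model *t, int cpu,
                            struct mm_model *next)
{
    struct mm_model *prev = t->loaded_mm;

    if (prev == next)
        return;                         /* CR3 already points at next */
    if (prev != &init_mm_model)
        prev->cpumask &= ~(1UL << cpu); /* this CPU no longer uses prev */
    next->cpumask |= 1UL << cpu;        /* ... and now uses next */
    t->loaded_mm = next;                /* models write_cr3(next->pgd) */
}

/* In this model, leaving an mm is just an ordinary switch to init_mm. */
static void model_leave_mm(struct tlbstate_model *t, int cpu)
{
    model_switch_mm(t, cpu, &init_mm_model);
}
```

    Because loaded_mm always names the mm in CR3, there is no separate
    "lazy with init_mm" case to special-case in the cpumask handling.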
    
    After this patch, cpu_tlbstate.state serves only to tell the TLB
    flush code whether it may switch to init_mm instead of doing a
    normal flush.
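    The flush-time role of cpu_tlbstate.state can be sketched as follows,
    assuming a lazy CPU answers a flush request by switching to init_mm
    (as leave_mm() does) instead of flushing. All names, including the
    TLBSTATE_*_MODEL values, are hypothetical; the flushes counter only
    marks where a real TLB flush would happen.

```c
#include <assert.h>
#include <stddef.h>

struct mm_model { int id; };

/* Stand-in for init_mm. */
static struct mm_model init_mm_model = { 0 };

/* Model values only. */
enum { TLBSTATE_OK_MODEL = 1, TLBSTATE_LAZY_MODEL = 2 };

struct tlbstate_model {
    struct mm_model *loaded_mm; /* mm referenced by CR3 */
    int state;                  /* consulted only by the flush path */
    int flushes;                /* number of normal flushes performed */
};

/* Handle a request to flush the TLB for the currently loaded mm. */
static void model_flush_request(struct tlbstate_model *t)
{
    if (t->state != TLBSTATE_OK_MODEL) {
        /* Lazy: drop the mm entirely rather than flushing it. */
        t->loaded_mm = &init_mm_model;
        return;
    }
    /* Non-lazy: the real code would invalidate TLB entries here. */
    t->flushes++;
}
```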
    
    This makes fairly extensive changes to xen_exit_mmap(), which used
    to look a bit like black magic.
    
    Perf is unchanged.  With or without this change, perf may behave a bit
    erratically if it tries to read user memory in kernel thread context.
    We should build on this patch to teach perf to never look at user
    memory when cpu_tlbstate.loaded_mm != current->mm.
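    The perf rule proposed above reduces to a pointer comparison.
    perf_may_read_user_memory() is a hypothetical helper, not an existing
    perf or kernel function; its two arguments stand in for
    cpu_tlbstate.loaded_mm and current->mm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mm_model { int id; };

/*
 * Only touch user memory if the page tables in CR3 belong to the task
 * being sampled. A kernel thread has current->mm == NULL, so it never
 * qualifies.
 */
static bool perf_may_read_user_memory(const struct mm_model *loaded_mm,
                                      const struct mm_model *current_mm)
{
    return current_mm != NULL && loaded_mm == current_mm;
}
```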

    Signed-off-by: Andy Lutomirski <luto@kernel.org>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Arjan van de Ven <arjan@linux.intel.com>
    Cc: Borislav Petkov <bpetkov@suse.de>
    Cc: Dave Hansen <dave.hansen@intel.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Nadav Amit <nadav.amit@gmail.com>
    Cc: Nadav Amit <namit@vmware.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>