Blame jemalloc/TUNING.md

Packit Service 724aca
This document summarizes the common approaches for performance fine-tuning with
jemalloc (as of 5.1.0).  The default configuration of jemalloc tends to work
reasonably well in practice, and most applications should not have to tune any
options.  However, in order to cover a wide range of applications and avoid
pathological cases, the default settings are sometimes kept conservative, which
makes them suboptimal even for many common workloads.  When jemalloc is properly
tuned for a specific application / workload, it is common to improve
system-level metrics by a few percent, or to make favorable trade-offs.

## Notable runtime options for performance tuning

Runtime options can be set via
[malloc_conf](http://jemalloc.net/jemalloc.3.html#tuning).
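
As a minimal sketch of one way to supply such a string, the snippet below compiles the options into the program through the global `malloc_conf` symbol, which jemalloc reads during initialization.  It assumes an unprefixed jemalloc build; the option values are illustrative rather than recommended, and `tuning_has_option()` is a hypothetical helper added only so the string can be inspected.

```c
#include <stdbool.h>
#include <string.h>

/*
 * jemalloc looks for this symbol while it initializes.  If the program is not
 * linked against jemalloc, this is simply an unused global.
 * The option values are illustrative only.
 */
const char *malloc_conf =
    "abort_conf:true,background_thread:true,metadata_thp:auto";

/*
 * Hypothetical helper (not part of jemalloc): report whether an option string
 * such as "metadata_thp:auto" appears in the compiled-in configuration.
 */
bool tuning_has_option(const char *option)
{
    return option != NULL && strstr(malloc_conf, option) != NULL;
}
```

The same string can be passed without recompiling through the `MALLOC_CONF` environment variable, which is usually more convenient while experimenting.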

* [background_thread](http://jemalloc.net/jemalloc.3.html#background_thread)

    Enabling jemalloc background threads generally improves the tail latency for
    application threads, since unused memory purging is shifted to the dedicated
    background threads.  In addition, unintended purging delay caused by
    application inactivity is avoided with background threads.

    Suggested: `background_thread:true` when jemalloc-managed threads can be
    allowed.

* [metadata_thp](http://jemalloc.net/jemalloc.3.html#opt.metadata_thp)

    Allowing jemalloc to utilize transparent huge pages for its internal
    metadata usually reduces TLB misses significantly, especially for programs
    with a large memory footprint and frequent allocation / deallocation
    activities.  Metadata memory usage may increase due to the use of huge
    pages.

    Suggested for allocation-intensive programs: `metadata_thp:auto` or
    `metadata_thp:always`, which is expected to improve CPU utilization at a
    small memory cost.

* [dirty_decay_ms](http://jemalloc.net/jemalloc.3.html#opt.dirty_decay_ms) and
  [muzzy_decay_ms](http://jemalloc.net/jemalloc.3.html#opt.muzzy_decay_ms)

    Decay time determines how fast jemalloc returns unused pages back to the
    operating system, and therefore provides a fairly straightforward trade-off
    between CPU and memory usage.  A shorter decay time purges unused pages
    faster to reduce memory usage (usually at the cost of more CPU cycles spent
    on purging), and vice versa.

    Suggested: tune the values based on the desired trade-offs.

* [narenas](http://jemalloc.net/jemalloc.3.html#opt.narenas)

    By default jemalloc uses multiple arenas to reduce internal lock contention.
    However, a high arena count may also increase overall memory fragmentation,
    since arenas manage memory independently.  When a high degree of parallelism
    is not expected at the allocator level, a lower number of arenas often
    improves memory usage.

    Suggested: if low parallelism is expected, try a lower arena count while
    monitoring CPU and memory usage.

* [percpu_arena](http://jemalloc.net/jemalloc.3.html#opt.percpu_arena)

    Enables dynamic thread-to-arena association based on the running CPU.  This
    has the potential to improve locality, e.g. when thread-to-CPU affinity is
    present.

    Suggested: try `percpu_arena:percpu` or `percpu_arena:phycpu` if
    thread migration between processors is expected to be infrequent.
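
Some of these knobs also have run-time counterparts in the `mallctl()` namespace.  The sketch below toggles background threads and changes the default dirty decay time for arenas created afterwards.  It makes several assumptions: GCC or Clang on an ELF platform (for the weak declaration, which lets the program build and run even when jemalloc is absent), an unprefixed jemalloc build, and the hypothetical wrapper names `set_background_thread()` and `set_default_dirty_decay_ms()`.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Weak declaration: the address is NULL when no unprefixed jemalloc is linked
 * or preloaded.  The signature follows jemalloc's documented mallctl().
 */
extern int mallctl(const char *name, void *oldp, size_t *oldlenp,
                   void *newp, size_t newlen) __attribute__((weak));

/*
 * Enable or disable jemalloc's background purging threads at run time.
 * Returns 0 on success, ENOSYS without jemalloc, or mallctl()'s error code.
 */
int set_background_thread(bool enable)
{
    if (!mallctl)
        return ENOSYS;
    return mallctl("background_thread", NULL, NULL, &enable, sizeof(enable));
}

/*
 * Change the default dirty decay time (in milliseconds) that newly created
 * arenas start with.  Existing arenas keep their current setting.
 */
int set_default_dirty_decay_ms(ssize_t decay_ms)
{
    if (!mallctl)
        return ENOSYS;
    return mallctl("arenas.dirty_decay_ms", NULL, NULL,
                   &decay_ms, sizeof(decay_ms));
}
```

Both wrappers return `ENOSYS` when jemalloc is not present, so callers can treat tuning as best-effort.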

Examples:

* High resource consumption application, prioritizing CPU utilization:

    `background_thread:true,metadata_thp:auto` combined with relaxed decay time
    (increased `dirty_decay_ms` and / or `muzzy_decay_ms`,
    e.g. `dirty_decay_ms:30000,muzzy_decay_ms:30000`).

* High resource consumption application, prioritizing memory usage:

    `background_thread:true` combined with shorter decay time (decreased
    `dirty_decay_ms` and / or `muzzy_decay_ms`,
    e.g. `dirty_decay_ms:5000,muzzy_decay_ms:5000`), and lower arena count
    (e.g. number of CPUs).

* Low resource consumption application:

    `narenas:1,lg_tcache_max:13` combined with shorter decay time (decreased
    `dirty_decay_ms` and / or `muzzy_decay_ms`, e.g.
    `dirty_decay_ms:1000,muzzy_decay_ms:0`).

* Extremely conservative -- minimize memory usage at all costs, only suitable
  when allocation activity is very rare:

    `narenas:1,tcache:false,dirty_decay_ms:0,muzzy_decay_ms:0`

Note that it is recommended to combine the options with `abort_conf:true`, which
aborts immediately on illegal options.
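
To check that a combination like the ones above actually took effect, the effective values can be read back from the read-only `opt.*` entries of the `mallctl()` namespace.  This is a sketch under the following assumptions: GCC or Clang on an ELF platform (for the weak declaration), an unprefixed jemalloc build, and a hypothetical helper name `report_effective_options()`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Weak declaration: NULL when no unprefixed jemalloc is linked or preloaded. */
extern int mallctl(const char *name, void *oldp, size_t *oldlenp,
                   void *newp, size_t newlen) __attribute__((weak));

/*
 * Print a few effective option values to `out`.
 * Returns the number of options reported, or -1 if jemalloc is unavailable.
 */
int report_effective_options(FILE *out)
{
    if (!mallctl) {
        fprintf(out, "jemalloc not available\n");
        return -1;
    }

    int reported = 0;
    size_t len;

    unsigned narenas;
    len = sizeof(narenas);
    if (mallctl("opt.narenas", &narenas, &len, NULL, 0) == 0) {
        fprintf(out, "opt.narenas: %u\n", narenas);
        reported++;
    }

    ssize_t dirty_decay_ms;
    len = sizeof(dirty_decay_ms);
    if (mallctl("opt.dirty_decay_ms", &dirty_decay_ms, &len, NULL, 0) == 0) {
        fprintf(out, "opt.dirty_decay_ms: %zd\n", dirty_decay_ms);
        reported++;
    }

    bool background_thread;
    len = sizeof(background_thread);
    if (mallctl("opt.background_thread", &background_thread, &len,
                NULL, 0) == 0) {
        fprintf(out, "opt.background_thread: %s\n",
                background_thread ? "true" : "false");
        reported++;
    }

    return reported;
}
```

Calling `report_effective_options(stderr)` once at startup is a cheap way to confirm that the intended configuration is the one in use.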

## Beyond runtime options

In addition to the runtime options, there are a number of programmatic ways to
improve application performance with jemalloc.

* [Explicit arenas](http://jemalloc.net/jemalloc.3.html#arenas.create)

    Manually created arenas can help performance in various ways, e.g. by
    managing locality and contention for specific usages.  For example,
    applications can explicitly allocate frequently accessed objects from a
    dedicated arena with
    [mallocx()](http://jemalloc.net/jemalloc.3.html#MALLOCX_ARENA) to improve
    locality.  In addition, explicit arenas often benefit from individually
    tuned options, e.g. relaxed [decay
    time](http://jemalloc.net/jemalloc.3.html#arena.i.dirty_decay_ms) if
    frequent reuse is expected.

* [Extent hooks](http://jemalloc.net/jemalloc.3.html#arena.i.extent_hooks)

    Extent hooks allow customization for managing underlying memory.  One
    performance-oriented use case is to utilize huge pages.  For example,
    [HHVM](https://github.com/facebook/hhvm/blob/master/hphp/util/alloc.cpp)
    uses explicit arenas with customized extent hooks to manage 1GB huge pages
    for frequently accessed data, which reduces TLB misses significantly.

* [Explicit thread-to-arena
  binding](http://jemalloc.net/jemalloc.3.html#thread.arena)

    It is common for some threads in an application to have different memory
    access / allocation patterns.  Threads with heavy workloads often benefit
    from explicit binding, e.g. binding very active threads to dedicated arenas
    may reduce contention at the allocator level.
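
The explicit-arena and thread-binding ideas above can be combined in a few `mallctl()` calls.  The following is a sketch, not a definitive implementation: it assumes GCC or Clang on an ELF platform (for the weak declaration), an unprefixed jemalloc build, and that `percpu_arena` is left disabled.  `bind_thread_to_new_arena()` is a hypothetical helper name.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>

/* Weak declaration: NULL when no unprefixed jemalloc is linked or preloaded. */
extern int mallctl(const char *name, void *oldp, size_t *oldlenp,
                   void *newp, size_t newlen) __attribute__((weak));

/*
 * Create a new arena, give it its own dirty decay time, and bind the calling
 * thread to it.  On success the arena index is stored in *arena_out.
 * Returns 0 on success, ENOSYS without jemalloc, or mallctl()'s error code.
 */
int bind_thread_to_new_arena(ssize_t dirty_decay_ms, unsigned *arena_out)
{
    if (!mallctl)
        return ENOSYS;

    /* "arenas.create" returns the index of the newly created arena. */
    unsigned arena;
    size_t len = sizeof(arena);
    int err = mallctl("arenas.create", &arena, &len, NULL, 0);
    if (err != 0)
        return err;

    /* Per-arena tuning: build the "arena.<i>.dirty_decay_ms" name. */
    char name[64];
    snprintf(name, sizeof(name), "arena.%u.dirty_decay_ms", arena);
    err = mallctl(name, NULL, NULL, &dirty_decay_ms, sizeof(dirty_decay_ms));
    if (err != 0)
        return err;

    /* Bind the calling thread to the new arena. */
    err = mallctl("thread.arena", NULL, NULL, &arena, sizeof(arena));
    if (err != 0)
        return err;

    *arena_out = arena;
    return 0;
}
```

A very active worker thread could call this once at startup, for instance with a relaxed decay time such as `30000`.  Individual allocations can instead be directed to the arena with `mallocx()` and `MALLOCX_ARENA()`, which requires including `<jemalloc/jemalloc.h>`.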