MEMKIND

The memkind library is a user-extensible heap manager built on top of jemalloc. It enables control of memory characteristics and partitioning of the heap between kinds of memory. The kinds of memory are defined by operating system memory policies that have been applied to virtual address ranges. Memory characteristics supported by memkind without user extension include control of NUMA and page size features. The jemalloc non-standard interface has been extended to enable specialized arenas to request virtual memory from the operating system through the memkind partition interface. Through the other memkind interfaces, the user can control and extend memory partition features and allocate memory while selecting enabled features. The memkind interface also allows creating and controlling file-backed memory (PMEM kind) on a specified device.

Contents

  1. Interfaces
  2. Dependencies
  3. Building and Installing
  4. Run Requirements
  5. Kind Requirements
  6. Kernel
  7. NVDIMM volatile usage
  8. The Detection Mechanism of the Kind
  9. Setting Logging Mechanism
  10. Setting Heap Manager
  11. Testing
  12. Simulating High Bandwidth Memory
  13. Notes
  14. Status
  15. Disclaimer

Interfaces

The memkind library delivers four interfaces:

  • hbwmalloc.h - recommended for high-bandwidth memory use cases (stable)
  • memkind.h - generic interface for more complex use cases (stable)
  • pmem_allocator.h - the C++ allocator that satisfies the C++ Allocator requirements used for PMEM memory use cases (stable)
  • memkind_allocator.h - the C++ allocator that satisfies the C++ Allocator requirements used for static kinds (stable)

For more detailed information about these interfaces, see the corresponding man pages (located in the man/ subdirectory):

man memkind

man hbwmalloc

man pmemallocator

man memkindallocator

Dependencies

Install the following packages on the build system:

  • autoconf
  • automake
  • gcc-c++
  • libnuma-devel
  • libtool
  • numactl-devel
  • unzip

To use automatic recognition of PMEM NUMA nodes in MEMKIND_DAX_KMEM:

  • libdaxctl-devel (v66 or later)

Building and Installing

The memkind library has a dependency on a related fork of jemalloc. The configure scripts and gtest source code are distributed with the source tarball included in the source RPM, and this tarball is created with the memkind "make dist" target. In contrast to the distributed source tarball, the git repository does not include any generated files. For this reason some additional steps are required when building from a checkout of the git repo. Those steps include running the bash script called "autogen.sh" prior to configure. This script will populate a VERSION file based on "git describe", and use autoreconf to generate a configure script.

Building and installing memkind in the standard system location can be as simple as typing the following in the root directory of the source tree:

./autogen.sh
./configure
make
make install

To install the library into another location, use the --prefix option of configure, e.g.:

./autogen.sh
./configure --prefix=/usr
make
make install

This will install files to /usr/lib, /usr/include, /usr/share/doc/, and /usr/share/man.

See the output of:

./configure --help

for more information about either the memkind or the jemalloc configuration options.

jemalloc

The jemalloc source was forked from jemalloc version 5.2.1. This source tree is located within the jemalloc subdirectory of the memkind source. The jemalloc source code has been kept close to the original form, and in particular the build system has been lightly modified.

Run Requirements

Applications that link dynamically against the memkind library need the following packages installed at run time:

  • libnuma
  • numactl
  • pthread

Kind Requirements

X marks a requirement of the given kind; - means the feature is not required.

Memory kind                    NUMA  HBW Memory  Hugepages  Device DAX  Filesystem supporting hole punching
MEMKIND_DEFAULT                -     -           -          -           -
MEMKIND_HUGETLB                X     X           X          -           -
MEMKIND_HBW                    X     X           -          -           -
MEMKIND_HBW_ALL                X     X           -          -           -
MEMKIND_HBW_HUGETLB            X     X           X          -           -
MEMKIND_HBW_ALL_HUGETLB        X     X           X          -           -
MEMKIND_HBW_PREFERRED          X     X           -          -           -
MEMKIND_HBW_PREFERRED_HUGETLB  X     X           X          -           -
MEMKIND_HBW_INTERLEAVE         X     X           -          -           -
MEMKIND_REGULAR                X     -           -          -           -
MEMKIND_DAX_KMEM               X     -           -          X           -
MEMKIND_DAX_KMEM_ALL           X     -           -          X           -
MEMKIND_DAX_KMEM_PREFERRED     X     -           -          X           -
PMEM kind                      -     -           -          -           X

Kernel

To control NUMA, huge pages, and file-backed memory correctly, the following Linux kernel requirements must be satisfied:

  • NUMA

Requires a kernel patch introduced in Linux v3.11 that affects the behavior of the NUMA system calls. Red Hat has back-ported this patch to the v3.10 kernel in the RHEL 7.0 GA release, so RHEL 7.0 onward supports memkind even though this kernel version predates v3.11.

  • Hugepages

Functionality related to hugepage allocation requires two kernel patches, patch1 and patch2. Without them, physical memory may end up on an incorrect NUMA node.

  • 2MB Pages

To use the interfaces for obtaining 2MB pages please be sure to follow the instructions here and pay particular attention to the use of the procfs files:

/proc/sys/vm/nr_hugepages
/proc/sys/vm/nr_overcommit_hugepages

for enabling the kernel's huge page pool.
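As a rough illustration of inspecting these procfs files from C, the sketch below reads a single integer from a given path. The helper names read_procfs_long and print_hugepage_pool are hypothetical and not part of memkind; only the two procfs paths come from the text above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of memkind): reads one non-negative integer
 * from a procfs-style file. Returns the value, or -1 if the file cannot be
 * opened or does not start with a number. */
long read_procfs_long(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char buf[64];
    long value = -1;
    if (fgets(buf, sizeof(buf), f) != NULL) {
        char *end = NULL;
        errno = 0;
        long parsed = strtol(buf, &end, 10);
        if (errno == 0 && end != buf && parsed >= 0)
            value = parsed;
    }
    fclose(f);
    return value;
}

/* Prints the two huge page pool settings named in the text above. */
void print_hugepage_pool(void)
{
    printf("nr_hugepages:            %ld\n",
           read_procfs_long("/proc/sys/vm/nr_hugepages"));
    printf("nr_overcommit_hugepages: %ld\n",
           read_procfs_long("/proc/sys/vm/nr_overcommit_hugepages"));
}
```

Calling print_hugepage_pool() before the first huge page allocation gives a quick indication of whether the pool has been configured at all; a value of -1 means the file could not be read.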

  • Filesystem supporting hole punching

To use the PMEM kind, make sure that the filesystem used for PMEM creation supports the FALLOC_FL_PUNCH_HOLE flag.
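One way to check this requirement ahead of time is to try the operation on a scratch file. The sketch below is an assumption about how such a probe could look, not something memkind ships: supports_hole_punching is a hypothetical helper that creates a temporary file in the given directory and calls fallocate() with FALLOC_FL_PUNCH_HOLE.

```c
#define _GNU_SOURCE /* for fallocate() and FALLOC_FL_* in <fcntl.h> */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical probe (not part of memkind).
 * Returns 1 if hole punching works in dir, 0 if the filesystem reports
 * EOPNOTSUPP, and -1 if the probe itself could not be completed
 * (e.g. dir does not exist or is not writable). */
int supports_hole_punching(const char *dir)
{
    assert(dir != NULL);

    char path[PATH_MAX];
    int n = snprintf(path, sizeof(path), "%s/punch_probe_XXXXXX", dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file is removed once fd is closed */

    int result = -1;
    /* Give the file a non-zero size, then try to punch a hole in it. */
    if (ftruncate(fd, 1 << 20) == 0) {
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      0, 4096) == 0)
            result = 1;
        else if (errno == EOPNOTSUPP)
            result = 0;
    }
    close(fd);
    return result;
}
```

The directory passed in should be the one intended for the PMEM kind, for example a mount point of a DAX-enabled filesystem.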

  • Device DAX

To use the MEMKIND_DAX_KMEM_* kinds you need Linux kernel 5.1 or later (with the DEV_DAX_KMEM kernel option enabled) and an existing DAX device. If you have loaded the dax_pmem_compat module instead of dax_pmem, please read this article and migrate the device model to use alternative device-dax drivers, e.g. kmem.

To migrate the device model, type:

daxctl migrate-device-model

To create a new DAX device you can type:

ndctl create-namespace --mode=devdax --map=mem

To display a list of created devices:

ndctl list

If you have already created a device in another mode, you can reconfigure it using:

ndctl create-namespace --mode=devdax --map=mem --force -e namespaceX.Y

where namespaceX.Y is the namespace you want to reconfigure.

For more details about creating a new namespace, read this.

Conversion from a DAX device to a NUMA node can be performed using the following command:

daxctl reconfigure-device daxX.Y --mode=system-ram

where daxX.Y is the DAX device you want to reconfigure.

This will migrate the device from the device_dax driver to the kmem driver. After this step, daxctl should list the device in system-ram mode:

daxctl list

Example output after reconfiguring dax2.0 as system-ram:

[
  {
    "chardev":"dax1.0",
    "size":263182090240,
    "target_node":3,
    "mode":"devdax"
  },
  {
    "chardev":"dax2.0",
    "size":263182090240,
    "target_node":4,
    "mode":"system-ram"
  }
]

You should also be able to check the new NUMA node configuration with:

numactl -H

Example output, where NUMA node 4 is NUMA node created from persistent memory:

available: 3 nodes (0-1,4)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83
node 0 size: 192112 MB
node 0 free: 185575 MB
node 1 cpus: 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111
node 1 size: 193522 MB
node 1 free: 193107 MB
node 4 cpus:
node 4 size: 250880 MB
node 4 free: 250879 MB
node distances:
node   0   1   4
  0:  10  21  17
  1:  21  10  28
  4:  17  28  10

NVDIMM volatile usage

Memkind supports using persistent memory as an extension of DRAM. The library provides this volatile memory solution in two separate ways, described below.

DAX device

NVDIMM memory exposed as a DAX device is supported by the MEMKIND_DAX_KMEM_* kinds. With this solution, persistent memory is seen by the OS as separate NUMA nodes.

Memkind offers two ways to use these kinds:

  • implicitly, by letting the memkind library automatically recognize NUMA nodes created from persistent memory using libdaxctl-devel
  • explicitly, by setting the MEMKIND_DAX_KMEM_NODES environment variable to a comma-separated list of NUMA nodes that will be treated as NUMA nodes created from persistent memory; this setting overrides the automatic recognition
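To illustrate the expected shape of the explicit setting, the following sketch validates a comma-separated node list such as "2,4" before it is exported. Both helpers are hypothetical and are not memkind's own parser; whether memkind accepts other syntax (for example ranges) is not covered here, so the sketch assumes plain non-negative integers separated by commas.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not memkind's parser): parses a comma-separated list
 * of non-negative NUMA node ids into out[0..max-1]. Returns the number of
 * ids parsed, or -1 on malformed input or if more than max ids are given. */
int parse_node_list(const char *list, int *out, int max)
{
    assert(list != NULL && out != NULL && max > 0);

    int count = 0;
    const char *p = list;
    while (*p != '\0') {
        if (!isdigit((unsigned char)*p))
            return -1;              /* every entry must start with a digit */

        char *end = NULL;
        errno = 0;
        long node = strtol(p, &end, 10);
        if (errno != 0 || node > INT_MAX || count == max)
            return -1;
        out[count++] = (int)node;

        if (*end == ',') {
            p = end + 1;
            if (*p == '\0')
                return -1;          /* trailing comma */
        } else if (*end == '\0') {
            p = end;
        } else {
            return -1;              /* unexpected separator */
        }
    }
    return count;
}

/* Reads MEMKIND_DAX_KMEM_NODES from the environment.
 * Returns -2 if the variable is unset, otherwise the parse result. */
int dax_kmem_nodes_from_env(int *out, int max)
{
    const char *value = getenv("MEMKIND_DAX_KMEM_NODES");
    if (value == NULL)
        return -2;
    return parse_node_list(value, out, max);
}
```

With the numactl output shown earlier, a value of "4" would name the NUMA node created from persistent memory.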

DAX filesystem

The PMEM kind supports the traditional malloc/free interfaces on a memory-mapped file. This allows the use of persistent memory as volatile memory, for cases where the region of persistent memory is useful to an application but the application doesn't need it to be persistent. The PMEM kind is most useful when used with Direct Access storage (DAX), which is memory-addressable persistent storage that supports load/store access without being paged via the system page cache.

An application using the memkind library can manage data placement:

Data placement  Memory kind
PMEM (fsdax)    PMEM kind
PMEM (devdax)   MEMKIND_DAX_KMEM kind
DRAM            e.g. MEMKIND_DEFAULT kind

Currently, the PMEM kind is supported only by the jemalloc heap manager.

The Detection Mechanism of the Kind

One notable feature of memkind is its ability to detect the kind of previously allocated memory.

Operations                                   Memkind API function
Freeing memory                               memkind_free(kind, ptr)
Reallocating memory                          memkind_realloc(kind, ptr, size)
Obtaining the size of allocated memory       memkind_malloc_usable_size(kind, ptr)
Reallocating memory to reduce fragmentation  memkind_defrag_reallocate(kind, ptr)

The operations above can be unified across all memory kinds in use by passing NULL as the kind argument to these functions.

For more details, please see the following example.

Important note: Looking up the correct kind can incur a serious performance penalty, which can be avoided by specifying the correct kind explicitly.

Setting Logging Mechanism

In the memkind library, logging can be enabled by setting the MEMKIND_DEBUG environment variable. Setting MEMKIND_DEBUG to "1" prints messages such as errors and general information about the environment to stderr.

Setting Heap Manager

In the memkind library, heap management can be adjusted with the MEMKIND_HEAP_MANAGER environment variable, which allows switching to one of the available heap managers. Values:

  • JEMALLOC - sets the jemalloc heap manager
  • TBB - sets the Intel Threading Building Blocks heap manager. This option requires the Intel Threading Building Blocks library to be installed.

If MEMKIND_HEAP_MANAGER is not set, the jemalloc heap manager is used by default.

Testing

All existing tests pass. For more information on how to execute tests see the CONTRIBUTING file.

When tests are run on a NUMA platform without high bandwidth memory, the MEMKIND_HBW_NODES environment variable is used in conjunction with "numactl --membind" to force standard allocations to one NUMA node and high bandwidth allocations to a different NUMA node. See the next section for more details.

Simulating High Bandwidth Memory

One way to test for the benefit of high bandwidth memory on a dual socket Intel(R) Xeon(TM) system is to use the QPI bus to simulate slow memory. This is not an accurate model of the bandwidth and latency characteristics of the on-package memory of Intel's 2nd generation Intel(R) Xeon Phi(TM) Product Family, but it is a reasonable way to determine which data structures rely critically on bandwidth.

If the application a.out has been modified to use high bandwidth memory through the memkind library, this can be done with numactl as follows in the bash shell:

export MEMKIND_HBW_NODES=0
numactl --membind=1 --cpunodebind=0 a.out

or with csh:

setenv MEMKIND_HBW_NODES 0
numactl --membind=1 --cpunodebind=0 a.out

The MEMKIND_HBW_NODES environment variable set to zero will bind high bandwidth allocations to NUMA node 0. The --membind=1 flag to numactl will bind standard allocations, static and stack variables to NUMA node 1. The --cpunodebind=0 option to numactl will bind the process threads to CPUs associated with NUMA node 0. With this configuration standard allocations will be fetched across the QPI bus, and high bandwidth allocations will be local to the process CPU.

Notes

  • Using memkind with Transparent Huge Pages (THP) enabled may result in an undesirably high memory footprint. To avoid that, disable THP using the following instruction.

Status

Different interfaces can have different maturity levels (as described in the corresponding man pages). Feedback on design and implementation is greatly appreciated.

Disclaimer

SEE COPYING FILE FOR LICENSE INFORMATION.

THIS SOFTWARE IS PROVIDED AS A DEVELOPMENT SNAPSHOT TO AID COLLABORATION AND WAS NOT ISSUED AS A RELEASED PRODUCT BY INTEL.