libhugetlbfs HOWTO
==================

Author: David Gibson <dwg@au1.ibm.com>, Adam Litke <agl@us.ibm.com>, and others
Last updated: December 07, 2011

Introduction
============

In Linux(TM), access to hugepages is provided through a virtual file
system, "hugetlbfs".  The libhugetlbfs library works with hugetlbfs to
provide more convenient, application-level services.  In particular,
libhugetlbfs has three main functions:

	* library functions
libhugetlbfs provides functions that allow an application to
explicitly allocate and use hugepages more easily than it could by
directly accessing the hugetlbfs filesystem.

	* hugepage malloc()
libhugetlbfs can be used to make an existing application use hugepages
for all its malloc() calls.  This works on an existing (dynamically
linked) application binary without modification.

	* hugepage text/data/BSS
libhugetlbfs, in conjunction with the included special linker scripts,
can be used to make an application which will store its executable text,
its initialized data or BSS, or all of the above in hugepages.  This
requires relinking an application, but does not require source-level
modifications.

This HOWTO explains how to use the libhugetlbfs library.  It is for
application developers or system administrators who wish to use any of
the above functions.

The libhugetlbfs library is a focal point to simplify and standardise
the use of the kernel API.

Prerequisites
=============

Hardware prerequisites
----------------------

You will need a CPU with some sort of hugepage support, which is
handled by your kernel.  This covers recent x86, AMD64, 64-bit
PowerPC(R) (POWER4, PPC970 and later), and IBM System z CPUs.

Currently, only x86, AMD64 and PowerPC are fully supported by
libhugetlbfs.  IA64 and Sparc64 have a working malloc, and SH64
should also, but it has not been tested.  IA64, Sparc64, and SH64
do not support segment remapping at this time.  IBM System z supports
malloc and also segment remapping with --hugetlbfs-align.

Kernel prerequisites
--------------------

To use all the features of libhugetlbfs you will need a 2.6.16 or
later kernel.  Many things will work with earlier kernels, but they
have important bugs and missing features.  The later sections of the
HOWTO assume a 2.6.16 or later kernel.  The kernel must also have
hugepages enabled, that is to say the CONFIG_HUGETLB_PAGE and
CONFIG_HUGETLBFS options must be switched on.

To check if hugetlbfs is enabled, use one of the following methods:

  * (Preferred) Use "grep hugetlbfs /proc/filesystems" to see if
    hugetlbfs is a supported file system.
  * On kernels which support /proc/config.gz (for example SLES10
    kernels), you can search for the CONFIG_HUGETLB_PAGE and
    CONFIG_HUGETLBFS options in /proc/config.gz
  * Finally, attempt to mount hugetlbfs. If it works, the required
    hugepage support is enabled.

Any kernel which meets the above test (even old ones) should support
at least basic libhugetlbfs functions, although old kernels may have
serious bugs.
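
The preferred check above can be wrapped in a small shell function.
This is only a sketch: the function name has_hugetlbfs and its optional
file argument are illustrative and not part of libhugetlbfs; the
argument exists so the function can be tried against a sample file.

```shell
#!/bin/sh
# Sketch: report whether hugetlbfs appears in a filesystems listing.
# The optional argument defaults to /proc/filesystems.
has_hugetlbfs() {
    fs_list="${1:-/proc/filesystems}"
    # Each line is "[nodev]<TAB>fsname"; compare the last field exactly.
    awk '$NF == "hugetlbfs" { found = 1 } END { exit !found }' "$fs_list"
}
```

For example, "has_hugetlbfs && echo supported" prints "supported" only
when the running kernel lists hugetlbfs.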

The MAP_PRIVATE flag instructs the kernel to return a memory area that
is private to the requesting process.  To use MAP_PRIVATE mappings,
libhugetlbfs's automatic malloc() (morecore) feature, or the hugepage
text, data, or BSS features, you will need a kernel with hugepage
Copy-on-Write (CoW) support.  The 2.6.16 kernel has this.

PowerPC note: The malloc()/morecore features will generate warnings if
used on PowerPC chips with a kernel where hugepage mappings don't
respect the mmap() hint address (the "hint address" is the first
parameter to mmap(), when MAP_FIXED is not specified; the kernel is
not required to mmap() at this address, but should do so when
possible).  2.6.16 and later kernels do honor the hint address.
Hugepage malloc()/morecore should still work on kernels that do not
honor the hint, but the size of the hugepage heap will be limited (to
around 256M for 32-bit and 1TB for 64-bit).

The 2.6.27 kernel introduced support for multiple huge page sizes for
systems with the appropriate hardware support.  Unless specifically
requested, libhugetlbfs will continue to use the default huge page size.

Toolchain prerequisites
-----------------------

The library uses a number of GNU-specific features, so you will need to use
both gcc and GNU binutils.  For PowerPC and AMD64 systems you will need a
"biarch" compiler, which can build both 32-bit and 64-bit binaries.  To use
hugepage text and data segments, GNU binutils version 2.17 (or later) is
recommended.  Older versions will work with restricted functionality.

Configuration prerequisites
---------------------------

Direct access to the hugepage pool has been deprecated in favor of the
hugeadm utility.  This utility can be used for finding the available
hugepage pools and adjusting their minimum and maximum sizes, depending
on kernel support.

To list all available hugepage pools and their current min and max values:
	hugeadm --pool-list

To set the 2MB pool minimum to 10 pages:
	hugeadm --pool-pages-min 2MB:10

Note: when min pages are adjusted, the max pool size will be adjusted to
keep the same number of overcommit pages available, if kernel support is
available.

To add 15 pages to the maximum for 2MB pages:
	hugeadm --pool-pages-max 2MB:+15

For more information see man 8 hugeadm.

The raw kernel interfaces (as described below) are still available.

In kernels before 2.6.24, hugepages must be allocated at boot-time via
the hugepages= command-line parameter or at run-time via the
/proc/sys/vm/nr_hugepages sysctl. If memory is restricted on the system,
boot-time allocation is recommended. Hugepages so allocated will be in
the static hugepage pool.

In kernels starting with 2.6.24, the hugepage pool can grow on demand.
To use this feature, /proc/sys/vm/nr_overcommit_hugepages
should be set to the maximum size of the hugepage pool. No hugepages
need to be allocated via /proc/sys/vm/nr_hugepages or hugepages= in this
case. Hugepages so allocated will be in the dynamic hugepage pool.

To run the libhugetlbfs testsuite (see below), allocating 25
static hugepages is recommended. Due to memory restrictions, the number
of hugepages requested may not be allocated if the allocation is
attempted at run-time. Users should verify the actual number of
hugepages allocated by:

       hugeadm --pool-list

or

       grep HugePages_Total /proc/meminfo

With 25 hugepages allocated, most tests should succeed. However, with
smaller hugepage sizes, many more hugepages may be necessary.
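
The /proc/meminfo check can be scripted so that a run of the testsuite
is skipped when the pool is too small.  The sketch below is not part of
libhugetlbfs: the function names and the optional file argument are
illustrative assumptions.  Note that the HugePages_* fields in
/proc/meminfo describe the default huge page size only.

```shell
#!/bin/sh
# Sketch: print the value of one HugePages_* or Hugepagesize field.
# Usage: hugepage_field FIELD [MEMINFO_FILE]
# The file argument defaults to /proc/meminfo.
hugepage_field() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Succeed if at least WANTED pages are in the pool.
# Usage: have_hugepages WANTED [MEMINFO_FILE]
have_hugepages() {
    total=$(hugepage_field HugePages_Total "${2:-/proc/meminfo}")
    [ "${total:-0}" -ge "$1" ]
}
```

For example, "have_hugepages 25 || echo 'pool too small'" warns before
the testsuite is started with fewer than the recommended 25 pages.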

To use libhugetlbfs features, as well as to run the testsuite, hugetlbfs
must be mounted.  Each hugetlbfs mount point is associated with a page
size.  To choose the size, use the pagesize mount option.  If this option
is omitted, the default huge page size will be used.

To mount the default huge page size:

       mkdir -p /mnt/hugetlbfs
       mount -t hugetlbfs none /mnt/hugetlbfs

To mount 64KB pages (assuming hardware support):

       mkdir -p /mnt/hugetlbfs-64K
       mount -t hugetlbfs none -opagesize=64k /mnt/hugetlbfs-64K

If hugepages should be available to non-root users, the permissions on
the mountpoint need to be set appropriately.
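
To confirm which hugetlbfs mount points exist after the steps above,
the kernel's mount table can be filtered by file system type.  This is
a sketch only; the function name and the optional file argument are
illustrative, and whether a pagesize option is shown in the output
depends on the kernel.

```shell
#!/bin/sh
# Sketch: list hugetlbfs mount points and their mount options.
# Usage: hugetlbfs_mounts [MOUNTS_FILE]
# The file argument defaults to /proc/mounts.
hugetlbfs_mounts() {
    # Fields: device mountpoint fstype options dump pass
    awk '$3 == "hugetlbfs" { print $2, $4 }' "${1:-/proc/mounts}"
}
```

Each output line holds a mount point followed by its options, which
can then be checked with ls -ld to review the permissions.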

Installation
============

1. Type "make" to build the library

This will create "obj32" and/or "obj64" under the top level
libhugetlbfs directory, and build, respectively, 32-bit and 64-bit
shared and static versions (as applicable) of the library into each
directory.  This will also build (but not run) the testsuite.

On i386 systems, only the 32-bit library will be built.  On PowerPC
and AMD64 systems, both 32-bit and 64-bit versions will be built (the
32-bit AMD64 version is identical to the i386 version).

2. Run the testsuite with "make check"

Running the testsuite is a good idea to ensure that the library is
working properly, and is quite quick (under 3 minutes on a 2GHz Apple
G5).  "make func" will run just the functionality tests (a subset of
"make check"), omitting the stress tests, which is much quicker.
The testsuite contains tests both for the library's features and for
the underlying kernel hugepage functionality.

NOTE: The testsuite must be run as the root user.

WARNING: The testsuite contains testcases explicitly designed to test
for a number of hugepage related kernel bugs uncovered during the
library's development.  Some of these testcases WILL CRASH HARD a
kernel without the relevant fixes.  2.6.16 contains all such fixes for
all testcases included as of this writing.

3. (Optional) Install to system paths with "make install"

This will install the library images to the system lib/lib32/lib64 as
appropriate, the helper utilities and the manual pages.  By default
it will install under /usr/local.  To put it somewhere else use
PREFIX=/path/to/install on the make command line.  For example:

	make install PREFIX=/opt/hugetlbfs
will install under /opt/hugetlbfs.

"make install" will also install the linker scripts and wrapper for ld
used for hugepage text/data/BSS (see below for details).

Alternatively, you can use the library from the directory in which it
was built, using the LD_LIBRARY_PATH environment variable.

To install only the library with linker scripts, the manual pages, or the
helper utilities separately, use the install-libs, install-man and
install-bin targets respectively.  This can be useful when, for example, you
wish to install the utilities but not override the distribution-supported
version of libhugetlbfs.

Usage
=====

Using hugepages for malloc() (morecore)
---------------------------------------

This feature allows an existing (dynamically linked) binary executable
to use hugepages for all its malloc() calls.  To run a program using
the automatic hugepage malloc() feature, you must set several
environment variables:

1. Set LD_PRELOAD=libhugetlbfs.so
  This tells the dynamic linker to load the libhugetlbfs shared
  library, even though the program wasn't originally linked against it.

  Note: If the program is linked against libhugetlbfs, preloading the
        library may lead to application crashes. You should skip this
        step in that case.

2. Set LD_LIBRARY_PATH to the directory containing libhugetlbfs.so
  This is only necessary if you haven't installed libhugetlbfs.so to a
  system default path.  If you set LD_LIBRARY_PATH, make sure the
  directory referenced contains the right version of the library
  (32-bit or 64-bit) as appropriate to the binary you want to run.

3. Set HUGETLB_MORECORE
  This enables the hugepage malloc() feature, instructing libhugetlbfs
  to override libc's normal morecore() function with a hugepage
  version and use it for malloc().  From this point all malloc()s
  should come from hugepage memory until it runs out.  This option can
  be specified in three ways:

  To use the default huge page size:
       HUGETLB_MORECORE=yes

  To use a specific huge page size:
       HUGETLB_MORECORE=<pagesize>

  To use Transparent Huge Pages (THP):
       HUGETLB_MORECORE=thp

  Note: The thp setting requires a kernel that supports Transparent
        Huge Pages.

Usually it's preferable to set these environment variables on the
command line of the program you wish to run, rather than using
"export", because you'll only want to enable the hugepage malloc() for
particular programs, not everything.

Examples:

If you've installed libhugetlbfs in the default place (under
/usr/local) which is in the system library search path use:
  $ LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes <your app command line>

If you have built libhugetlbfs in ~/libhugetlbfs and haven't installed
it yet, the following would work for a 64-bit program:

  $ LD_PRELOAD=libhugetlbfs.so LD_LIBRARY_PATH=~/libhugetlbfs/obj64 \
	HUGETLB_MORECORE=yes <your app command line>
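
The three steps can also be collected into a small wrapper so that the
variables apply to one command only.  This is a sketch, not something
shipped with libhugetlbfs: morecore_env and with_hugepage_malloc are
hypothetical names, and the wrapper assumes the library directory
contains no whitespace.

```shell
#!/bin/sh
# Sketch: print the environment assignments for hugepage malloc().
# Usage: morecore_env MODE [LIBDIR]
#   MODE   is "yes", "thp", or a page size such as 16M
#   LIBDIR is an optional directory holding libhugetlbfs.so
morecore_env() {
    printf 'LD_PRELOAD=libhugetlbfs.so\n'
    if [ -n "${2:-}" ]; then
        printf 'LD_LIBRARY_PATH=%s\n' "$2"
    fi
    printf 'HUGETLB_MORECORE=%s\n' "$1"
}

# Run a command with those settings applied to that command only.
# Usage: with_hugepage_malloc MODE LIBDIR command [args...]
# Pass "" as LIBDIR when the library is on the default search path.
with_hugepage_malloc() {
    mode=$1
    libdir=$2
    shift 2
    # Word splitting of the substitution is intended here.
    env $(morecore_env "$mode" "$libdir") "$@"
}
```

For example, with_hugepage_malloc yes "" followed by your application's
command line corresponds to the first example above.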

Under some circumstances, you might want to specify the address where
the hugepage heap is located.  You can do this by setting the
HUGETLB_MORECORE_HEAPBASE environment variable to the heap address in
hexadecimal.  NOTE: this will not work on PowerPC systems with old kernels
which don't respect the hugepage hint address; see Kernel Prerequisites
above.  Also note that this option is ignored for THP morecore.

By default, the hugepage heap begins at roughly the same place a
normal page heap would, rounded up by an amount determined by your
platform.  For 32-bit PowerPC binaries the normal page heap address is
rounded up to a multiple of 256MB (that is, putting it in the next MMU
segment); for 64-bit PowerPC binaries the address is rounded up to a
multiple of 1TB.  On all other platforms the address is rounded up to
a multiple of the hugepage size.
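
The rounding described above is ordinary round-up-to-a-multiple
arithmetic.  The sketch below only illustrates that calculation; it
does not reproduce the library's code, the function name round_up is
hypothetical, and it assumes the shell's arithmetic is at least 64
bits wide.

```shell
#!/bin/sh
# Sketch: round ADDR up to the next multiple of ALIGN and print the
# result in hex.  Both arguments may be decimal or 0x-prefixed hex.
# Usage: round_up ADDR ALIGN
round_up() {
    printf '0x%x\n' $(( ($1 + $2 - 1) / $2 * $2 ))
}
```

For a normal heap address of 0x10012345, "round_up 0x10012345
0x10000000" (256MB alignment) prints 0x20000000, while "round_up
0x10012345 0x200000" (2MB hugepages) prints 0x10200000.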

By default, the hugepage heap will be prefaulted by libhugetlbfs to
guarantee enough hugepages exist and are reserved for the application
(if this were not done, applications could receive a SIGKILL signal if
hugepages needed for the heap are used by another application before
they are faulted in). This leads to local-node allocations when no
memory policy is in place for hugepages. Therefore, it is recommended to
use

  $ numactl --interleave=all <your app command line>

to recover some of the performance lost to local-node allocations on
large NUMA systems. This can still result in poor performance for those
applications which carefully place their threads on particular nodes
(such as by using OpenMP). In that case, thread-local allocation is
preferred so users should select a memory policy that corresponds to
the run-time behavior of the process' CPU usage. Users can specify
HUGETLB_NO_PREFAULT to prevent the prefaulting of hugepages and instead
rely on run-time faulting of hugepages.  NOTE: specifying
HUGETLB_NO_PREFAULT on a system where hugepages are available to and
used by many processes can result in some applications receiving SIGKILL,
so its use is not recommended in high-availability or production
environments.

By default, the hugepage heap does not shrink.  To enable hugepage heap
shrinking, set HUGETLB_MORECORE_SHRINK=yes.  NB: We have been seeing some
unexpected behavior from glibc's malloc when this is enabled.

Using hugepage shared memory
----------------------------

Hugepages are used for shared memory segments if the SHM_HUGETLB flag is
set when calling shmget() and the pool is large enough. For hugepage-unaware
applications, libhugetlbfs overrides shmget() and adds the SHM_HUGETLB flag
if the environment variable HUGETLB_SHM is set to "yes". The steps to use
hugepages with applications not linked to libhugetlbfs are similar to
morecore except for step 3.

1. Set LD_PRELOAD=libhugetlbfs.so
  This tells the dynamic linker to load the libhugetlbfs shared
  library, even though the program wasn't originally linked against it.

  Note: If the program is linked against libhugetlbfs, preloading the
        library may lead to application crashes. You should skip this
        step in that case.

2. Set LD_LIBRARY_PATH to the directory containing libhugetlbfs.so
  This is only necessary if you haven't installed libhugetlbfs.so to a
  system default path.  If you set LD_LIBRARY_PATH, make sure the
  directory referenced contains the right version of the library
  (32-bit or 64-bit) as appropriate to the binary you want to run.

3. Set HUGETLB_SHM=yes
   The shmget() call is overridden whether the application is linked
   against libhugetlbfs or the library is preloaded. When this environment
   variable is set, the SHM_HUGETLB flag is added to the call and the size
   parameter is aligned to back the shared memory segment with huge pages.
   In the event hugepages cannot be used, small pages will be used instead
   and a warning will be printed to explain the failure.

   Note: It is not possible to select any huge page size other than the
         system default for this option.  If the kernel supports multiple
         huge page sizes, the size used for shared memory can be changed by
         altering the default huge page size via the default_hugepagesz
         kernel boot parameter.

Using hugepage text, data, or BSS
---------------------------------

To use the hugepage text, data, or BSS segments feature, you need to specially
link your application.  How this is done depends on the version of GNU ld.  To
support ld versions older than 2.17, libhugetlbfs provides custom linker
scripts that must be used to achieve the required binary layout.  With version
2.17 or later, the system default linker scripts should be used.

To link an application for hugepages, you should use the ld.hugetlbfs
script included with libhugetlbfs in place of your normal linker.  Without any
special options this will simply invoke GNU ld with the same parameters.  When
it is invoked with options detailed in the following sections, ld.hugetlbfs
will call the system linker with all of the options necessary to link for
hugepages.  If a custom linker script is required, it will also be selected.

If you installed ld.hugetlbfs using "make install", or if you run it
from the place where you built libhugetlbfs, it should automatically
be able to find the libhugetlbfs linker scripts.  Otherwise you may
need to explicitly instruct it where to find the scripts with the
option:
	--hugetlbfs-script-path=/path/to/scripts
(The linker scripts are in the ldscripts/ subdirectory of the
libhugetlbfs source tree).

	Linking the application with binutils-2.17 or later:
	----------------------------------------------------

This method will use the system default linker scripts.  Only one linker option
is required to prepare the application for hugepages:

	--hugetlbfs-align

will instruct ld.hugetlbfs to call GNU ld with two options that increase the
alignment of the resulting binary.  For reference, the options passed to ld are:

	-z common-page-size=<value>	and
	-z max-page-size=<value>

	Linking the application with binutils-2.16 or older:
	----------------------------------------------------

To link a program with a custom linker script, one of the following linker
options should be specified:

	--hugetlbfs-link=B

will link the application to store BSS data (only) into hugepages

	--hugetlbfs-link=BDT

will link the application to store text, initialized data and BSS data
into hugepages.

These are the only two available options when using custom linker scripts.
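
The choice between the two linking methods depends only on the binutils
version, so a build script could make it automatically.  The following
is a sketch under that assumption; hugetlbfs_link_option is a
hypothetical helper, not part of libhugetlbfs, and it expects a version
string with at least a major and minor component.

```shell
#!/bin/sh
# Sketch: pick the ld.hugetlbfs option for a given binutils version.
# Usage: hugetlbfs_link_option VERSION [SEGMENTS]
#   VERSION  is a dotted version string such as 2.16 or 2.38
#   SEGMENTS is B or BDT (default BDT); only used for versions < 2.17
hugetlbfs_link_option() {
    major=${1%%.*}
    rest=${1#*.}
    minor=${rest%%.*}
    if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 17 ]; }; then
        echo "--hugetlbfs-align"
    else
        echo "--hugetlbfs-link=${2:-BDT}"
    fi
}
```

For example, "hugetlbfs_link_option 2.16 B" prints --hugetlbfs-link=B,
and "hugetlbfs_link_option 2.38" prints --hugetlbfs-align.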

	A note about the custom libhugetlbfs linker scripts:
	----------------------------------------------------

Linker scripts are usually distributed with GNU binutils and they may contain a
partial implementation of new linker features.  As binutils evolves, the linker
scripts supplied with previous versions become obsolete and are upgraded.

Libhugetlbfs distributes one set of linker scripts that must work across
several Linux distributions and binutils versions.  This has worked well for
some time but binutils-2.17 (including some late 2.16 builds) has made changes
that are impossible to accommodate without breaking the libhugetlbfs linker
scripts for older versions of binutils.  This is why the linker scripts (and
the --hugetlbfs-link ld.hugetlbfs option) have been deprecated for binutils >=
2.17 configurations.

If you are using a late 2.16 binutils version (such as 2.16.91) and are
experiencing problems with huge page text, data, and bss, you can check
binutils for the incompatibility with the following command:

	ld --verbose | grep SPECIAL

If any matches are returned, then the libhugetlbfs linker scripts may not work
correctly.  In this case you should upgrade to binutils >= 2.17 and use the
--hugetlbfs-align linking method.

	Linking via gcc:
	----------------

In many cases it's normal to link an application by invoking gcc,
which will then invoke the linker with appropriate options, rather
than invoking ld directly.  In such cases it's usually best to
convince gcc to invoke the ld.hugetlbfs script instead of the system
linker, rather than modifying your build procedure to invoke
ld.hugetlbfs directly; compilers often add special libraries
or other linker options which can be fiddly to reproduce by hand.
To make this easier, 'make install' will install ld.hugetlbfs into
$PREFIX/share/libhugetlbfs and create an 'ld' symlink to it.

Then with gcc, you invoke it as a linker with two options:

	-B $PREFIX/share/libhugetlbfs

This option tells gcc to look in a non-standard location for the
linker, thus finding our script rather than the normal linker. This
can optionally be set in the CFLAGS environment variable.

	-Wl,--hugetlbfs-align
OR	-Wl,--hugetlbfs-link=B
OR	-Wl,--hugetlbfs-link=BDT

This option instructs gcc to pass the option after the comma down to the
linker, thus invoking the special behaviour of the ld.hugetlbfs script. This
can optionally be set in the LDFLAGS environment variable.

If you use a compiler other than gcc, you will need to consult its
documentation to see how to convince it to invoke ld.hugetlbfs in
place of the system linker.
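
The two gcc options can be assembled by a small helper so that the
install prefix is written in one place only.  This is a sketch;
hugetlbfs_gcc_flags is a hypothetical name, and the prefix is assumed
to contain no whitespace.

```shell
#!/bin/sh
# Sketch: print the two gcc options described above.
# Usage: hugetlbfs_gcc_flags PREFIX LINK_OPTION
#   PREFIX       is the libhugetlbfs install prefix, e.g. /usr/local
#   LINK_OPTION  is --hugetlbfs-align, --hugetlbfs-link=B, or
#                --hugetlbfs-link=BDT
hugetlbfs_gcc_flags() {
    printf '%s %s\n' "-B $1/share/libhugetlbfs" "-Wl,$2"
}
```

A link step might then be written with the command substitution
$(hugetlbfs_gcc_flags /usr/local --hugetlbfs-align) placed among the
other gcc arguments.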

	Running the application:
	------------------------

The specially-linked application needs the libhugetlbfs library, so
you might need to set the LD_LIBRARY_PATH environment variable so the
application can locate libhugetlbfs.so.  Depending on the method used to link
the application, the HUGETLB_ELFMAP environment variable can be used to control
how hugepages will be used.

	When using --hugetlbfs-link:
	----------------------------

The custom linker script determines which segments may be remapped into
hugepages and this remapping will occur by default.  The following setting will
disable remapping entirely:

	HUGETLB_ELFMAP=no

	When using --hugetlbfs-align:
	-----------------------------

This method of linking an application permits greater flexibility at runtime.
Using HUGETLB_ELFMAP, it is possible to control which program segments are
placed in hugepages.  The following four settings will cause the indicated
segments to be placed in hugepages:

	HUGETLB_ELFMAP=R	Read-only segments (text)
	HUGETLB_ELFMAP=W	Writable segments (data/BSS)
	HUGETLB_ELFMAP=RW	All segments (text/data/BSS)
	HUGETLB_ELFMAP=no	No segments

It is possible to select specific huge page sizes for read-only and writable
segments by using the following advanced syntax:

	HUGETLB_ELFMAP=[R[=<pagesize>]:[W[=<pagesize>]]

For example:

	Place read-only segments into 64k pages and writable into 16M pages
	HUGETLB_ELFMAP=R=64k:W=16M

	Use the default for read-only segments, 1G pages for writable segments
	HUGETLB_ELFMAP=R:W=1G

	Use 16M pages for writable segments only
	HUGETLB_ELFMAP=W=16M
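
The settings and examples above can be generated from a per-segment
choice.  The sketch below is illustrative only: elfmap_value is a
hypothetical helper, and it produces just the forms that appear in this
section.

```shell
#!/bin/sh
# Sketch: build a HUGETLB_ELFMAP value from per-segment choices.
# Usage: elfmap_value RO RW
#   each argument is "-" (do not remap), "default" (default huge page
#   size), or an explicit page size such as 64k or 16M
elfmap_value() {
    # Both segments at the default size is written as plain RW.
    if [ "$1" = default ] && [ "$2" = default ]; then
        echo "RW"
        return
    fi
    out=""
    case "$1" in
        -)       ;;
        default) out="R" ;;
        *)       out="R=$1" ;;
    esac
    case "$2" in
        -)       ;;
        default) out="${out:+$out:}W" ;;
        *)       out="${out:+$out:}W=$2" ;;
    esac
    # Nothing selected means remapping is disabled.
    echo "${out:-no}"
}
```

For example, "elfmap_value 64k 16M" prints R=64k:W=16M and
"elfmap_value - 16M" prints W=16M, matching the examples above.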

	Default remapping behavior:
	---------------------------

If --hugetlbfs-link was used to link an application, the chosen remapping mode
is saved in the binary and becomes the default behavior.  Setting
HUGETLB_ELFMAP=no will disable all remapping and is the only way to modify the
default behavior.

For applications linked with --hugetlbfs-align, the default behavior is to not
remap any segments into huge pages.  To set or display the default remapping
mode for a binary, the included hugeedit command can be used:

hugeedit [options] target-executable
   options:
   --text,--data	Remap the specified segment into huge pages by default
   --disable		Do not remap any segments by default

When target-executable is the only argument, hugeedit will display the default
remapping mode without making any modifications.

When a binary is remapped according to its default remapping policy, the
system default huge page size will be used.

	Environment variables:
	----------------------
Packit 2d622a
Packit 2d622a
There are a number of private environment variables which can affect
Packit 2d622a
libhugetlbfs:
Packit 2d622a
	HUGETLB_DEFAULT_PAGE_SIZE
Packit 2d622a
		Override the system default huge page size for all uses
Packit 2d622a
		except hugetlb-backed shared memory

	HUGETLB_RESTRICT_EXE
		By default, libhugetlbfs will act on any program that it
		is loaded with, either via LD_PRELOAD or by explicitly
		linking with -lhugetlbfs.

		There are situations in which it is desirable to restrict
		libhugetlbfs' actions to specific programs.  For example,
		some ISV applications are wrapped in a series of scripts
		that invoke bash, python, and/or perl.  It is more
		convenient to set the environment variables related
		to libhugetlbfs before invoking the wrapper scripts,
		yet this has the unintended and undesirable consequence
		of causing the script interpreters to use and consume
		hugepages.  There is no obvious benefit to causing the
		script interpreters to use hugepages, and there is a
		clear disadvantage: fewer hugepages are available to
		the actual application.

		To address this scenario, set HUGETLB_RESTRICT_EXE to a
		colon-separated list of programs to which the other
		libhugetlbfs environment variables should apply.  (If
		not set, libhugetlbfs will attempt to apply the requested
		actions to all programs.)  For example,

		    HUGETLB_RESTRICT_EXE="hpcc:long_hpcc"

		will restrict libhugetlbfs' actions to programs named
		/home/fred/hpcc and /bench/long_hpcc but not /usr/hpcc_no.

	HUGETLB_ELFMAP
		Control or disable segment remapping (see above)

	HUGETLB_MINIMAL_COPY
		If equal to "no", the entire segment will be copied;
		otherwise (the default), only the necessary parts will
		be copied, which can be much more efficient

	HUGETLB_FORCE_ELFMAP
		Explained in "Partial segment remapping"

	HUGETLB_MORECORE
	HUGETLB_MORECORE_HEAPBASE
	HUGETLB_NO_PREFAULT
		Explained in "Using hugepages for malloc()
		(morecore)"

	HUGETLB_VERBOSE
		Specify the verbosity level of debugging output from 1
		to 99 (default is 1)

	HUGETLB_PATH
		Specify the path to the hugetlbfs mount point

	HUGETLB_SHARE
		Explained in "Sharing remapped segments"

	HUGETLB_DEBUG
		Set to 1 if an application segfaults.  Gives very detailed
		output and runs extra diagnostics.
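
The HUGETLB_RESTRICT_EXE example above suggests that each list entry is
compared with the final path component of the program.  The shell function
below is only an illustration of that rule under this assumption; it is not
the library's own code, and the function name is made up for this sketch.

```shell
# Illustration only: mimics the matching rule suggested by the example above.
# Assumption: an entry matches when it equals the program's final path component.
matches_restrict_exe() {
    prog=${1##*/}          # strip the directory part, e.g. /home/fred/hpcc -> hpcc
    list=$2                # colon-separated list, e.g. "hpcc:long_hpcc"
    case ":$list:" in
        *":$prog:"*) return 0 ;;
        *)           return 1 ;;
    esac
}
```

With the list "hpcc:long_hpcc", the function succeeds for /home/fred/hpcc and
/bench/long_hpcc and fails for /usr/hpcc_no, in line with the example.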

	Sharing remapped segments:
	--------------------------

By default, libhugetlbfs uses anonymous, unlinked hugetlbfs files
to store remapped program segment data.  This means that if the same
program is started multiple times using hugepage segments, multiple
huge pages will be used to store the same program data.

To reduce this waste, libhugetlbfs can be instructed to allow
sharing segments between multiple invocations of a program.  To do
this, the HUGETLB_SHARE variable must be set for all the processes
in question.  This variable has two possible values:
	anything but 1: the default, indicates no segments should be shared
	1: indicates that read-only segments (i.e. the program text,
	   in most cases) should be shared; read-write segments (data
	   and bss) will not be shared.

If the HUGETLB_MINIMAL_COPY variable is set for any program using
shared segments, it must be set to the same value for all invocations
of that program.

Segment sharing is implemented by creating persistent files in a
hugetlbfs filesystem containing the necessary segment data.  By
default, these files are stored in a subdirectory of the first located
hugetlbfs filesystem, named 'elflink-uid-XXX' where XXX is the uid of
the process using sharing.  This directory must be owned by the uid in
question, and have mode 0700.  If it doesn't exist, libhugetlbfs will
create it automatically.  This means that (by default) separate
invocations of the same program by different users will not share huge
pages.

The location for storing the hugetlbfs page files can be changed by
setting the HUGETLB_SHARE_PATH environment variable.  If set, this
variable must contain the path of an accessible, already created
directory located in a hugetlbfs filesystem.  The owner and mode of
this directory are not checked, so this method can be used to allow
processes of multiple uids to share huge pages.  IMPORTANT SECURITY
NOTE: any process sharing hugepages can insert arbitrary executable
code into any other process sharing hugepages in the same directory.
Therefore, when using HUGETLB_SHARE_PATH, the directory created *must*
allow access only to a set of uids who are mutually trusted.
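
One possible way to prepare such a directory is sketched below.  It assumes
that the mutually trusted uids all belong to one Unix group; the function
name is hypothetical, and the path and group used with it are placeholders
rather than anything defined by libhugetlbfs.

```shell
# Hypothetical helper: prepare a directory for use as HUGETLB_SHARE_PATH.
# $1: directory inside a mounted hugetlbfs filesystem (path is an assumption)
# $2: group (name or gid) whose members all trust one another
make_share_dir() {
    dir=$1
    group=$2
    mkdir -p "$dir" || return 1
    chgrp "$group" "$dir" || return 1
    # Owner and group get full access; everyone else gets none.
    chmod 770 "$dir"
}
```

A call such as make_share_dir /mnt/hugetlbfs/shared hpcusers (both names are
placeholders) could then be followed by setting HUGETLB_SHARE=1 and
HUGETLB_SHARE_PATH=/mnt/hugetlbfs/shared for every process that should
share segments.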

The files created in hugetlbfs for sharing are persistent, and must be
manually deleted to free the hugepages in question.  Future versions
of libhugetlbfs should include tools and scripts to automate this
cleanup.
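
Until such tools exist, the manual cleanup might look like the sketch below.
It assumes the default elflink-uid-XXX location described above and that no
running process still needs the shared files; the function is hypothetical
and is not shipped with libhugetlbfs.

```shell
# Hypothetical cleanup helper; not shipped with libhugetlbfs.
# $1: hugetlbfs mount point (assumption: the one libhugetlbfs located first)
# Removes the calling user's shared segment files from the default
# elflink-uid-XXX directory so the hugepages backing them can be freed.
clean_shared_segments() {
    dir="$1/elflink-uid-$(id -u)"
    [ -d "$dir" ] || return 0      # nothing to clean
    find "$dir" -type f -exec rm -f {} +
}
```

For example, clean_shared_segments /mnt/hugetlbfs, where the mount point is a
placeholder.  As with any deleted file, the pages are only released once no
process still has the file mapped.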

	Partial segment remapping
	-------------------------

libhugetlbfs has limited support for remapping a normal, non-relinked
binary's data, text and BSS into hugepages. To enable this feature,
HUGETLB_FORCE_ELFMAP must be set to "yes".

Partial segment remapping is not guaranteed to work. Most importantly, a
binary's segments must be large enough even when not relinked by
libhugetlbfs:

	architecture	address		minimum segment size
	------------	-------		--------------------
	i386, x86_64	all		hugepage size
	ppc32		all		256M
	ppc64		0-4G		256M
	ppc64		4G-1T		1020G
	ppc64		1T+		1T

The raw size, though, is not sufficient to indicate whether remapping will
succeed, because segment alignment also matters. Since the binary is not
relinked, however, it is relatively straightforward to 'test and see'.

NOTE: You must use LD_PRELOAD to load libhugetlbfs.so when using
partial remapping.
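
A 'test and see' run could be wrapped as in the sketch below.  The function
name is made up for illustration, and the sketch assumes that the dynamic
linker can find libhugetlbfs.so without further help.

```shell
# Hypothetical wrapper: run an unmodified binary with partial segment remapping.
# Assumption: libhugetlbfs.so can be found by the dynamic linker.
run_with_partial_remap() {
    env LD_PRELOAD=libhugetlbfs.so HUGETLB_FORCE_ELFMAP=yes "$@"
}
```

For example, run_with_partial_remap ./myapp, where ./myapp is a placeholder
for the binary under test.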


Examples
========

Example 1:  Application Developer
---------------------------------

To have a program use hugepages, complete the following steps:

1. Make sure you are working with kernel 2.6.16 or greater.

2. Modify the build procedure so your application is linked against
libhugetlbfs.

For the remapping, you link against the library with the appropriate
linker script (if necessary or desired).  Linking against the library
should result in transparent usage of hugepages.
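
As a minimal sketch of step 2, the helper below appends -lhugetlbfs to an
otherwise ordinary compiler invocation.  The function name is hypothetical,
and the sketch assumes the library is installed where the compiler driver
can find it; any linker script options would be passed through unchanged.

```shell
# Hypothetical build helper: link a program against libhugetlbfs.
# Assumption: the library and its development files are installed where the
# compiler driver named by $CC (default: cc) can find them.
build_with_hugetlbfs() {
    ${CC:-cc} "$@" -lhugetlbfs
}
```

For example, build_with_hugetlbfs -o myapp myapp.c, where the file names are
placeholders.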

Example 2:  End Users and System Administrators
-----------------------------------------------

To have an application use libhugetlbfs, complete the following steps:

1. Make sure you are using kernel 2.6.16 or greater.

2. Make sure the library is in the library search path, which you can
set with the LD_LIBRARY_PATH environment variable. You might need to
set other environment variables, including LD_PRELOAD as described above.
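
A launcher along the following lines is one way to combine these settings.
It is only a sketch: the function name and the library directory passed to
it are assumptions, and HUGETLB_MORECORE=yes is the setting covered in
"Using hugepages for malloc() (morecore)".

```shell
# Hypothetical launcher for an unmodified, dynamically linked application.
# $1: directory containing libhugetlbfs.so (assumption: not already on the
#     default library search path); remaining arguments: the command to run.
run_with_hugetlbfs() {
    libdir=$1
    shift
    env LD_LIBRARY_PATH="$libdir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        LD_PRELOAD=libhugetlbfs.so \
        HUGETLB_MORECORE=yes \
        "$@"
}
```

For example, run_with_hugetlbfs /opt/libhugetlbfs/lib ./myapp, where both the
directory and the program name are placeholders.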


Troubleshooting
===============

The library has a certain amount of debugging code built in, which can
be controlled with the environment variable HUGETLB_VERBOSE.  By
default the debug level is "1" which means the library will only print
relatively serious error messages.  Setting HUGETLB_VERBOSE=2 or
higher will enable more debug messages (at present 2 is the highest
debug level, but that may change).  Setting HUGETLB_VERBOSE=0 will
silence the library completely, even in the case of errors - the only
exception is in cases where the library has to abort(), which can
happen if something goes wrong in the middle of unmapping and
remapping segments for the text/data/bss feature.

If an application fails to run, set the environment variable HUGETLB_DEBUG
to 1. This causes additional diagnostics to be run. This information should
be included when sending bug reports to the libhugetlbfs team.
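
To capture that information, a failing command could be rerun through a
wrapper such as the sketch below.  The function name is hypothetical, and
the sketch assumes the library writes its messages to standard error; 99 is
simply the top of the verbosity range listed under "Environment variables".

```shell
# Hypothetical wrapper: rerun a failing command with extra diagnostics and
# save whatever it writes to stderr for a bug report.
# $1: log file to write; remaining arguments: the command to run.
# Assumption: the library's messages go to stderr.
run_with_diagnostics() {
    log=$1
    shift
    env HUGETLB_VERBOSE=99 HUGETLB_DEBUG=1 "$@" 2>"$log"
}
```

For example, run_with_diagnostics hugetlb.log ./myapp, where both names are
placeholders.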

Specific Scenarios:
-------------------

ISSUE:	When using the --hugetlbfs-align or -zmax-page-size link options, the
	linker complains about truncated relocations and the build fails.

TRY:	Build the program with the --relax linker option.  Either add
	-Wl,--relax to CFLAGS or --relax to LDFLAGS.

ISSUE:	When using the xB linker script with a 32 bit binary on an x86 host with
	NX support enabled, the binary segfaults.

TRY:	Recompile with the --hugetlbfs-align option to use the new relinking
	method, or boot your kernel with noexec32=off.


Trademarks
==========

This work represents the view of the author and does not necessarily
represent the view of IBM.

PowerPC is a registered trademark of International Business Machines
Corporation in the United States, other countries, or both.  Linux is
a trademark of Linus Torvalds in the United States, other countries,
or both.