How to force any Linux application to use Hugepages without modifying the source code


In this article we’ll look at how to force any application to use Huge Pages (HugeTLB) without modifying its source code or recompiling it.

Installing required tools

First of all, to achieve our goal we’ll have to install libhugetlbfs. This powerful library allows us to back text, data areas, malloc() calls and shared memory with Hugepages.

On RHEL-like Linux distros (CentOS, Scientific Linux, PacketLinux etc.) we can easily install libhugetlbfs by executing:

# sudo yum -y install libhugetlbfs-utils libhugetlbfs

libhugetlbfs is the library that gives any application easy access to hugepages.

libhugetlbfs-utils is a set of user-space tools for configuring and managing the hugepage environment.

Configuring Huge Pages

Once the required RPMs are installed, we can proceed with configuring hugepages on our Linux system.

Using the old raw kernel interfaces

On RHEL 4 or 5 based distros:

Before we start, let’s check the current Hugepage configuration. If the system has not yet been configured to use Hugepages, the output of the following command should be 0 (zero).

# grep HugePages_Total /proc/meminfo
HugePages_Total:   0

Now let’s check the size of a hugepage; we need it to calculate the number of hugepages our application will require. I recommend using the root account to manage Hugepages.

# grep Hugepagesize /proc/meminfo
Hugepagesize:     2048 kB

The output (2048 kB) shows that the size of a single Huge Page on this system is 2MB, which is pretty much the default setup for RHEL-based Linux. So, if we need a 4GB Huge Pages pool, 2048 Huge Pages need to be allocated (4096MB / 2MB = 2048).
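The same calculation can be scripted. Here is a minimal sketch in shell; pages_needed is a hypothetical helper name (it is not part of libhugetlbfs), and the 2048 kB fallback is an assumption used only when /proc/meminfo reports no Hugepagesize:

```shell
# Hypothetical helper: number of huge pages needed for a pool.
#   $1 = pool size in MB, $2 = huge page size in kB (as shown in /proc/meminfo)
pages_needed() {
    # Round up so the pool is never smaller than requested.
    echo $(( ($1 * 1024 + $2 - 1) / $2 ))
}

# Read the huge page size from the running system; fall back to 2048 kB
# (an assumption) when the kernel does not report one.
page_kb=$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo 2>/dev/null)
page_kb=${page_kb:-2048}

echo "Pages needed for a 4096 MB pool: $(pages_needed 4096 "$page_kb")"
```

With 2MB pages this gives the 2048 pages used in the rest of the article; on a system with a different huge page size the number changes accordingly.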

To allocate our 2048 Huge Pages we can use:

# echo 2048 > /proc/sys/vm/nr_hugepages

Please Note: Allocating a large number of Hugepages on a system that is running Virtual Machines or other memory-hungry applications may take a long time to complete, so shut them down before executing the previous command.

Alternatively, we can make the same temporary allocation with sysctl:

# sysctl -w vm.nr_hugepages=2048

To make sure our system always allocates 4GB of Hugepages at every reboot, add the line vm.nr_hugepages = 2048 to /etc/sysctl.conf.
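The persistent setting lives in /etc/sysctl.conf as a vm.nr_hugepages line. Below is a minimal sketch of writing that line without creating duplicates; set_sysctl_line is a hypothetical helper, and the file to edit is passed as an argument so it can be tried on a copy first:

```shell
# Hypothetical helper: set "key = value" in a sysctl.conf-style file,
# replacing any earlier assignment of the same key.
#   $1 = key, $2 = value, $3 = file to edit
set_sysctl_line() {
    key=$1 value=$2 file=$3
    tmp=$(mktemp) || return 1
    if [ -f "$file" ]; then
        # Keep every line except an existing assignment of the same key.
        awk -v k="$key" '{
            line = $0
            sub(/^[ \t]+/, "", line)
            split(line, kv, /[ \t]*=/)
            if (kv[1] != k) print
        }' "$file" > "$tmp"
    fi
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the original file keeps its owner and mode.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example (as root), then reload the file:
#   set_sysctl_line vm.nr_hugepages 2048 /etc/sysctl.conf && sysctl -p
```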

Now let’s check if the system has been configured correctly:

# grep HugePages_Total /proc/meminfo

We should be able to see this output now:

HugePages_Total:   2048

To check how many pages are free:

# grep HugePages_Free /proc/meminfo

And the output will depend on how many pages are left free at the moment we run the command.
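The individual grep calls can be combined into one summary. This is only a sketch: hugepage_summary is a hypothetical helper that reads meminfo-style text on standard input, and it treats "used" simply as total minus free, which ignores reserved pages:

```shell
# Hypothetical helper: summarise huge page usage from /proc/meminfo-style input.
hugepage_summary() {
    awk '
        /^HugePages_Total:/ { total = $2 }
        /^HugePages_Free:/  { free = $2 }
        /^Hugepagesize:/    { size_kb = $2 }
        END {
            used = total - free
            printf "total=%d free=%d used=%d used_mb=%d\n",
                   total, free, used, used * size_kb / 1024
        }'
}

# On a live system:
if [ -r /proc/meminfo ]; then
    hugepage_summary < /proc/meminfo
fi
```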

Using the modern hugeadm tool

Direct access to the hugepage pool (the raw kernel interfaces) has been deprecated in favor of the hugeadm utility, so in the next steps we are going to use hugeadm to configure the huge page pool. Use this method if you have a distro based on RHEL 6 or higher (CentOS 6.x/7.x, PacketLinux 1.x/2.x and so on).

To list all huge page pools available on a system (and display their min and max values) we can use:

# hugeadm --pool-list
      Size  Minimum  Current  Maximum  Default
   2097152        0        0        0        *

To set the 2MB pool minimum to 512 pages:

# hugeadm --pool-pages-min 2MB:512

And to set our 2048 max pages:

# hugeadm --pool-pages-max 2MB:2048

At this point pool-list should display something like:

# hugeadm --pool-list
      Size  Minimum  Current  Maximum  Default
   2097152      512      512     2048        *
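The Size column is expressed in bytes (2097152 bytes is the 2MB pool). As a small sketch, the listing can be made easier to read with awk; pool_list_mb is a hypothetical helper, fed here with the sample output from this article rather than a live hugeadm call:

```shell
# Hypothetical helper: print each pool from `hugeadm --pool-list` with its
# size converted from bytes to MB. The header line is skipped because its
# first field is not a number.
pool_list_mb() {
    awk '$1 ~ /^[0-9]+$/ {
        printf "%dMB min=%d cur=%d max=%d\n", $1 / 1048576, $2, $3, $4
    }'
}

# Sample input copied from the listing above:
pool_list_mb <<'EOF'
      Size  Minimum  Current  Maximum  Default
   2097152      512      512     2048        *
EOF
# prints: 2MB min=512 cur=512 max=2048

# On a live system: hugeadm --pool-list | pool_list_mb
```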

To use libhugetlbfs features hugetlbfs must be mounted.  Each hugetlbfs mount point is associated with a page size.  To choose the size, use the pagesize mount option.  If this option is omitted, the default huge page size will be used.

Completing huge page configuration

To mount the default huge page size:

# mkdir -p /mnt/hugetlbfs
# mount -t hugetlbfs none /mnt/hugetlbfs

To mount 64KB pages (if the system hardware supports it):

# mkdir -p /mnt/hugetlbfs-64K
# mount -t hugetlbfs none -opagesize=64k /mnt/hugetlbfs-64K

If the application runs under a non-root account (which is probably the case most of the time), then permissions on the mount point need to be set appropriately. For example, if the application that we’ll force to use huge pages runs as the user postfix, then:

Either we can use:

# mount -t hugetlbfs none /mnt/hugetlbfs -o uid=postfix -o gid=postfix

Or we can mount as root and then:

# chown postfix:postfix /mnt/hugetlbfs
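Before moving on, it can be worth confirming that hugetlbfs is really mounted. Here is a minimal sketch; hugetlbfs_mounts is a hypothetical helper that reads /proc/mounts-style lines (device, mount point, file system type, options) and prints every hugetlbfs mount point:

```shell
# Hypothetical helper: list hugetlbfs mount points from /proc/mounts-style input.
hugetlbfs_mounts() {
    # Field 3 is the file system type, field 2 the mount point.
    awk '$3 == "hugetlbfs" { print $2 }'
}

# On a live system:
if [ -r /proc/mounts ]; then
    hugetlbfs_mounts < /proc/mounts
fi
```

If nothing is printed, no hugetlbfs mount exists yet and the mount command above needs to be run again.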

At this point we can also set an optimal value for Shared Memory Max (shmmax) in /proc/sys/kernel/shmmax, which should be set to the size of the largest shared memory segment we want to be able to use. To do this we can use hugeadm again:

# hugeadm --set-recommended-shmmax

And now we can ask for a report of the complete configuration by executing:

# hugeadm --explain
Total System Memory: 31792 MB

Mount Point          Options
/mnt/hugetlbfs       rw,seclabel,relatime

Huge page pools:
      Size  Minimum  Current  Maximum  Default
   2097152      512      512     2048        *

Huge page sizes with configured pools:
2097152

The recommended shmmax for your currently allocated huge pages is 4294967296 bytes.
To make shmmax settings persistent, add the following line to /etc/sysctl.conf:
  kernel.shmmax = 4294967296

To make your hugetlb_shm_group settings persistent, add the following line to /etc/sysctl.conf:
  vm.hugetlb_shm_group = 0

Note: Permanent swap space should be preferred when dynamic huge page pools are used.

Ok, we are ready to proceed with the next steps. Follow the instructions in the output above if you want to make your changes permanent on your system. (I recommend doing this only AFTER you’ve done some good testing with your application.)

Using libhugetlbfs

At this point it is time to start running your application with libhugetlbfs, so that it will use Hugepages.

The general syntax to do this is:

LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes <command>

Where <command> is the name of your application (including file path if different from local directory).

So, for example, to run VIM (vi) using hugetlb we can type:

LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes vi ./example.txt
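If you launch applications this way often, the two variables can be wrapped in a small shell function. This is just a convenience sketch; run_with_hugepages is a hypothetical name, and it assumes libhugetlbfs.so can be found by the dynamic loader:

```shell
# Hypothetical wrapper: run any command with the libhugetlbfs preload
# settings shown above, without exporting them into the current shell.
run_with_hugepages() {
    env LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes "$@"
}

# Example:
#   run_with_hugepages vi ./example.txt
```

Using env keeps the variables scoped to that single command, so other programs started from the same shell are not affected.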

That’s it, thanks for reading!

7 thoughts on “How to force any Linux application to use Hugepages without modifying the source code”

  1. Thanks for the post. I followed the steps and, by inspecting /proc//maps, I could verify that huge page memory gets loaded into the target application’s memory space; however, it’s not clear how this memory is being put to use. Heap, executable and shared library text memory are still backed by 4 KB pages.

    • Hi, np, glad you liked it.
      About your problem, do you have more info to share?

      By default the hugepage heap should begin approximately where your standard heap begins; the appropriate starting point will depend on your system architecture.

      To use hugepages for TEXT, DATA and BSS segments, I am afraid, you’ll have to specially link your application. This requires an ld release higher than 2.17 and, possibly, special scripts that are generally provided with libhugetlbfs. The script to use is “ld.hugetlbfs”; look at its help for more details.

      Another trick, in case your libhugetlbfs is installed in non-standard path, can be:

      LD_PRELOAD=libhugetlbfs.so LD_LIBRARY_PATH=/path-to-your-libhugetlbfs/obj64 \
      HUGETLB_MORECORE=yes your-64bit-application-run-command

      Mind the spaces between each variable assignment.

      Also make sure your kernel and system architecture fully support hugepages, because otherwise you may get only a limited set of functionality. Intel/AMD 64-bit supports them fully, and so do recent Linux releases.

      Hope this helps,
      – Paolo

  2. Great post!
    I am curious how hugepages is used when an application calls malloc() something smaller than the hugepage size. For example, if my hugepagesz is set to 1G, and my application (without using the special script for linking) calls malloc() something like 1MB, will I waste 1023MB? Or it won’t but it will for mmap-ed area as mmap() is operated by pages? Just trying to understand the side effect instead of blindly going for hugepages. Also, is there a way to check how many hugepages are used by one process? Thanks in advance!

    • Thanks,
      and sorry for the delay of my answer, but I am quite busy at the moment.

      You asked quite a few questions, so I have tried to condense my answers to avoid writing a book down here:

      To check how many hugepages are being used by a process you can use:

      # sudo grep huge /proc/911/numa_maps

      If the process is using hugepages then the output would look like:

      02200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N0=1 kernelpagesize_kB=2048
      7f5e0cbbd000 default file=/usr/lib64/libhugetlbfs.so mapped=13 active=2 N0=13 kernelpagesize_kB=4
      7f5e0cbcd000 default file=/usr/lib64/libhugetlbfs.so
      7f5e0cdcc000 default file=/usr/lib64/libhugetlbfs.so anon=1 dirty=1 N0=1 kernelpagesize_kB=4
      7f5e0cdcd000 default file=/usr/lib64/libhugetlbfs.so anon=1 dirty=1 N0=1 kernelpagesize_kB=4

      Here 911 is the process PID. The example above was taken from the vi example I provided in the article, running on a 64-bit Atom server with CentOS 7; although the file is called numa_maps, it works on every server-class Intel x86_64 CPU.
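To turn that listing into a single number, the per-node counters of the huge mappings can be added up. This is a rough sketch only: count_huge_pages is a hypothetical helper, and it assumes that on lines flagged "huge" each N<node>=<count> field counts huge pages:

```shell
# Hypothetical helper: count huge pages in numa_maps-style input on stdin.
count_huge_pages() {
    awk '{
        is_huge = 0
        n = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "huge") is_huge = 1
            # Per-node page counters look like N0=1, N1=12, ...
            if ($i ~ /^N[0-9]+=[0-9]+$/) { split($i, kv, "="); n += kv[2] }
        }
        if (is_huge) pages += n
    }
    END { print pages + 0 }'
}

# Example (as root), using the PID from the comment above:
#   count_huge_pages < /proc/911/numa_maps
```

Multiplying the result by the kernelpagesize_kB value of those lines gives the amount of huge page memory the process has touched.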

      ————————————————————————-

      To check how allocation works you can create a simple C program that allocates memory for whatever data you want and then trace it with strace, as I did in the following example. This is the result:

      read(0, "15\n", 1024) = 3
      mmap(0x1e00000, 2097152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_HUGETLB, -1, 0) = 0x1e00000
      write(1, "Generating elements of array: 173626067 + 2107398681 + 115180039 + 10822782 + 738081557 + 165324258 + 2133575751 + 1095570634 + "..., 231) = 231
      exit_group(0) = ?
      +++ exited with 0 +++

      Here is the same simple code without forcing hugetlb:

      read(0, "15\n", 1024) = 3
      brk(NULL) = 0x15f1000
      brk(0x1612000) = 0x1612000
      brk(NULL) = 0x1612000
      write(1, "Generating elements of array: 1942370943 + 2082658003 + 949010807 + 895494400 + 1631054737 + 587052173 + 1079432021 + 1985267562"..., 236) = 236
      exit_group(0) = ?
      +++ exited with 0 +++

      As you can see, the exact same code works in two completely different ways.

      When you execute it via LD_PRELOAD=libhugetlbfs.so HUGETLB_MORECORE=yes ./paolo-test, it uses mmap to allocate the memory for the array, whereas when you don’t force hugetlb it uses brk, as it normally would.

      ———————————————————

      Hugepages are not a panacea and should be used only when you have a problem that they can effectively solve, hence don’t use them blindly 😉

      ———————————————————

      Hope this answers all your questions. I have also published articles on how to explore Linux processes that show how to analyse the memory used by a process in different ways; they may help you in your hugepages discovery/analysis.

      Thanks,
      – Paolo

    • The “(deleted)” refers to the fact that libhugetlbfs uses a pseudo file system (hugetlbfs) and normally removes the temporary pseudo files it creates.

      Hope this helps; more info can be found in the libhugetlbfs source code.

      Thanks,
      – Paolo
