Optimizing Linux Swappiness: Lessons from Upgrading to RHEL 8

As part of our continual improvement process, we have spent the last few years migrating our Red Hat Enterprise Linux 7 machines and those running compatible alternatives. With version 7 reaching end of life on 30 June 2024, the last machines running it are being replaced by the more recent and actively maintained version 8. We describe the rollover of the various systems in some detail and then focus on one particular topic that had us scratching our heads for a while.

Migrating always starts with a good plan

The first step is to notify the customers that their server needs an upgrade. This invariably leads to the question “and how long will the service be unavailable?” The only way to answer it is to define a detailed migration plan, test the various steps and time them. Leaving some margin for errors and unexpected behavior is obviously wise, but in general we were able to predict all our migrations accurately and delivered every one of them within the announced downtime window.

So, how did we go about this on IBM Cloud? IBM offers to set up a new system with your preferred OS on a new disk while keeping the old disk available for mounting, preserving all other aspects of the machine. You obviously still need to install the necessary packages and configure the services so the new system delivers the same service as the old instance did. We used the following scenario:

Create a script on the old system that will prepare the new system to be in line with the requirements.
Request IBM to upgrade the server while keeping the old disk available for mounting once the system is back up.
Log in to the new server and mount the old disk at a fixed location.
Run the upgrade script to make sure the new server delivers the same service as the old server did before.
Do a rigorous check to see if everything is working as before.

Minimizing downtime and human error

Our update scripts are simple bash scripts. Typically they have the following main ingredients:

Install a set of packages that you need (keep a list when you are doing the test migration).
Copy the necessary configuration files (relevant /etc files, SSH keys, and HTTPS keys and certificates).
Copy relevant data files.
(Re)start relevant services.
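The four ingredients above can be sketched as a small bash script. This is a minimal sketch under stated assumptions, not our production script: the mount point /mnt/olddisk, the list files packages.txt, paths.txt and services.txt, and the example device /dev/xvdc1 are all hypothetical. It uses dnf, the package manager on RHEL 8, and set -euo pipefail so the script stops at the first failing step instead of carrying on with a half-migrated system.

```bash
#!/usr/bin/env bash
# Minimal sketch of a migration script for the new server (hypothetical layout).
# Assumes: the old disk gets mounted read-only at $OLD_ROOT, and packages.txt,
# paths.txt and services.txt list one entry per line ('#' starts a comment).
set -euo pipefail

OLD_ROOT=${OLD_ROOT:-/mnt/olddisk}   # hypothetical fixed mount point of the old disk

# Print the non-empty, non-comment lines of a list file.
list_entries() {
  grep -Ev '^[[:space:]]*(#|$)' "$1" || true
}

# Mount the old disk read-only so the migration can never modify it.
mount_old_disk() {
  local device=$1                      # e.g. /dev/xvdc1 (hypothetical device name)
  mkdir -p "$OLD_ROOT"
  mountpoint -q "$OLD_ROOT" || mount -o ro "$device" "$OLD_ROOT"
}

# 1. Install the packages collected during the test migration.
install_packages() {
  local -a pkgs
  mapfile -t pkgs < <(list_entries "$1")
  if (( ${#pkgs[@]} > 0 )); then
    dnf install -y "${pkgs[@]}"
  fi
}

# 2./3. Copy configuration and data paths (absolute, e.g. /etc/httpd) from the
# old disk to the same location on the new system, preserving owner, mode and times.
copy_paths() {
  local list=$1 dest_root=${2:-}       # dest_root is only set when trying it out
  local path
  while IFS= read -r path; do
    mkdir -p "$dest_root$(dirname "$path")"
    cp -a "$OLD_ROOT$path" "$dest_root$(dirname "$path")/"
  done < <(list_entries "$list")
}

# 4. Enable and (re)start the services.
restart_services() {
  local svc
  while IFS= read -r svc; do
    systemctl enable "$svc"
    systemctl restart "$svc"
  done < <(list_entries "$1")
}

main() {
  mount_old_disk "$1"
  install_packages packages.txt
  copy_paths paths.txt
  restart_services services.txt
}

# Run as root with the old disk's device, e.g.: ./migrate.sh /dev/xvdc1
# Without arguments nothing runs, so the functions can be sourced and tried out.
if (( $# > 0 )); then
  main "$@"
fi
```

Because configuration formats can change between major releases, the path list is best built up and verified during the test migration rather than copying all of /etc wholesale.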

This strategy allows for an easy transition to a new OS. You can time the relevant parts, and automating most of the process minimizes both downtime and human error. Rebooting the machine into the new OS typically takes up to 10 minutes, and running the upgrade script takes on the order of 10 to 20 minutes, depending on how many additional packages need to be installed.
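Timing each part can be as simple as wrapping every step in a small helper. A minimal sketch, assuming bash: the timed function is a hypothetical helper of our own and relies on bash's built-in SECONDS counter, so it only has one-second resolution, which is plenty for steps that take minutes.

```bash
#!/usr/bin/env bash
# Minimal sketch: run one migration step and report how long it took, so the
# total downtime can be estimated during the test migration.
# Uses bash's built-in SECONDS counter (whole-second resolution).
timed() {
  local start=$SECONDS rc=0
  "$@" || rc=$?
  echo "step '$*' took $(( SECONDS - start ))s (exit code $rc)" >&2
  return "$rc"
}

# Usage with hypothetical step functions of a migration script:
#   timed install_packages packages.txt
#   timed copy_paths paths.txt
```

Writing the report to stderr keeps it apart from the output of the step itself, so the timings can be collected in a separate log.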

Testing these migration scripts obviously takes some effort, which is where having several environments (development, test, acceptance, and operational test-and-evaluation) provides a great playground to iron out all inconsistencies before upgrading the production system. That said, once you have a tried and true scenario, moving other servers becomes easier, since most of the machines have similar requirements.

New system, new requirements?

For most of our systems the above scenario worked out perfectly, and many of them have been running for months, even years, without a hiccup. Some smaller machines, however, ran into memory problems. There was enough memory overall, but some processes would force the contents of main memory to be paged out to swap, which occasionally made the machine’s response time really slow.

Our first thought was to simply give the machines some additional memory and be done with it. The typical story is that newer OS releases carry more bloat and just need a bit more memory. It did seem to do the trick, except for one machine, and it is here that the story of swappiness starts.

After lengthy investigation we noticed that one particular web service would occasionally see very high load and spawn many processes. Those processes would push the memory of the sizable database out to swap. After a few seconds the web service would scale back down, in line with its configuration and the lower demand. The result was lots of free main memory, lots of memory in use for (file system) caching, and a database with all its memory swapped to disk, which made the database very slow.
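Seeing which processes actually have their memory in swap is what points at a victim like the database. A minimal sketch of one way to do that, reading the VmSwap field the kernel reports in /proc/<pid>/status; the function name and its optional arguments (an alternative proc directory and the number of lines to show) are hypothetical, added so it can be tried on a test directory. The free -m command shows the other half of the picture: free memory versus buff/cache.

```bash
#!/usr/bin/env bash
# Minimal sketch: list the processes with the most memory in swap, using the
# VmSwap field (in kB) of /proc/<pid>/status. Output: "<swap kB> <pid> <name>".
top_swap_users() {
  local proc_root=${1:-/proc} count=${2:-10}
  local status pid
  for status in "$proc_root"/[0-9]*/status; do
    [[ -r $status ]] || continue
    pid=${status%/status}
    pid=${pid##*/}
    # Kernel threads have no VmSwap line; processes may exit while we read.
    awk -v pid="$pid" '
      /^Name:/   { sub(/^Name:[ \t]*/, ""); name = $0 }
      /^VmSwap:/ { swap = $2 }
      END        { if (swap > 0) print swap, pid, name }
    ' "$status" 2>/dev/null
  done | sort -rn | head -n "$count"
}

# Example: top_swap_users        (ten largest swap users on this machine)
```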

Our first instinct was to lower the kernel’s so-called “swappiness”, in the hope that it would swap out fewer pages and instead evict a little more of the sizable file system cache, giving the database some more breathing room. Playing around with the swappiness parameter in /proc/sys/vm/swappiness did not change the behavior, however. It felt like it didn’t do anything at all. But why was that?
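For reference, this is how the global value is normally changed, both at runtime and persistently through a drop-in file that systemd-sysctl reads at boot. A minimal sketch: the function name and the file name /etc/sysctl.d/99-swappiness.conf are hypothetical choices, it must run as root, and it only accepts 0 to 100, the valid range on the RHEL 8 kernel. As described above, on RHEL 8 this alone did not change the behavior of our services.

```bash
#!/usr/bin/env bash
# Minimal sketch: set the global vm.swappiness now and keep it across reboots.
# Must run as root. The drop-in file name is a hypothetical choice.
set_global_swappiness() {
  local value=$1 conf=${2:-/etc/sysctl.d/99-swappiness.conf}
  local re='^(100|[1-9]?[0-9])$'     # 0-100, the valid range on the RHEL 8 kernel
  if ! [[ $value =~ $re ]]; then
    echo "invalid swappiness: $value (expected 0-100)" >&2
    return 1
  fi
  # Persistent setting, applied by systemd-sysctl.service during boot.
  printf 'vm.swappiness = %s\n' "$value" > "$conf"
  # Runtime setting, equivalent to writing the value to /proc/sys/vm/swappiness.
  sysctl -w "vm.swappiness=$value"
}

# Example: set_global_swappiness 10
```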

Looking for more swappiness…

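It turns out that on RHEL 8 and its variants, systemd creates a separate control group (cgroup) for the units it manages, such as services, sockets and swap devices, and under cgroup v1 every one of these memory cgroups carries its own swappiness value. A new cgroup copies its parent’s value when it is created, so changing /proc/sys/vm/swappiness afterwards does not reach the cgroups that already exist. A simple find shows a pretty long list.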

# find /sys/fs/cgroup -name '*swappiness'
/sys/fs/cgroup/memory/system.slice/irqbalance.service/memory.swappiness
/sys/fs/cgroup/memory/system.slice/systemd-update-utmp.service/memory.swappiness
/sys/fs/cgroup/memory/system.slice/dev-xvdb1.swap/memory.swappiness
/sys/fs/cgroup/memory/system.slice/systemd-udevd-control.socket/memory.swappiness
/sys/fs/cgroup/memory/system.slice/systemd-journal-flush.service/memory.swappiness
/sys/fs/cgroup/memory/system.slice/systemd-sysctl.service/memory.swappiness
/sys/fs/cgroup/memory/system.slice/systemd-udevd.service/memory.swappiness
/sys/fs/cgroup/memory/system.slice/systemd-udevd-kernel.socket/memory.swappiness
...
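Rather than reading these files one by one, the per-cgroup values can be compared with the global one. A minimal sketch, assuming the cgroup v1 memory hierarchy is mounted at /sys/fs/cgroup/memory as in the listing above; the function name is hypothetical, and the two paths are parameters only so it can be tried on a test directory. It prints each deviating value followed by the cgroup path.

```bash
#!/usr/bin/env bash
# Minimal sketch: compare every cgroup v1 memory.swappiness with the global
# vm.swappiness and print "<value><TAB><cgroup path>" for those that differ.
swappiness_mismatches() {
  local cg_root=${1:-/sys/fs/cgroup/memory}
  local global_file=${2:-/proc/sys/vm/swappiness}
  local global file value
  global=$(<"$global_file")
  while IFS= read -r -d '' file; do
    value=$(<"$file")
    if [[ $value != "$global" ]]; then
      printf '%s\t%s\n' "$value" "${file%/memory.swappiness}"
    fi
  done < <(find "$cg_root" -name memory.swappiness -print0)
}

# Example: swappiness_mismatches | sort -n
```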

A simple one-line bash loop sets “all” of these per-cgroup values to the global swappiness we originally intended:

# for cgfile in $(find /sys/fs/cgroup -name '*swappiness'); do cat /proc/sys/vm/swappiness > "$cgfile"; done

This did indeed improve the situation greatly: fewer pages were swapped out, giving the database the breathing room it needed to deliver its service at the required rate. Although the system no longer had any real issues, it would still swap enough pages to disk to trigger our monitoring alarms. We eventually lowered the swappiness further on a number of specific slices, which reduced the superfluous swapping to a completely acceptable level.
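The one-liner writes to every cgroup, while the later, more targeted tuning only touched particular slices. A minimal sketch of a helper that covers both cases, assuming cgroup v1 as on RHEL 8; the function name is hypothetical. Keep in mind that these files live in a virtual filesystem, so the values are lost at reboot, and a cgroup created later (for example when a service is started again) starts from its parent’s value.

```bash
#!/usr/bin/env bash
# Minimal sketch: write one swappiness value to every cgroup v1 memory.swappiness
# file in a subtree, the top cgroup of the subtree included. Must run as root.
# Writing to the root cgroup's file also changes the global vm.swappiness.
set_cgroup_swappiness() {
  local value=$1 subtree=${2:-/sys/fs/cgroup/memory}
  local re='^(100|[1-9]?[0-9])$'     # 0-100, the valid range on the RHEL 8 kernel
  local file failed=0
  if ! [[ $value =~ $re ]]; then
    echo "invalid swappiness: $value (expected 0-100)" >&2
    return 1
  fi
  while IFS= read -r -d '' file; do
    # Keep going if a cgroup disappears in the meantime, but report it.
    if ! echo "$value" > "$file"; then
      echo "could not write $file" >&2
      failed=1
    fi
  done < <(find "$subtree" -name memory.swappiness -print0)
  return "$failed"
}
```

For example, set_cgroup_swappiness 10 applies 10 everywhere, while set_cgroup_swappiness 1 /sys/fs/cgroup/memory/system.slice/mariadb.service would only lower it for a single service; mariadb.service is an illustrative unit name, not necessarily the database we ran.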

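It looks like the community noticed this odd behavior too. RHEL 9 uses cgroup v2 by default, and cgroup v2 has no per-cgroup swappiness at all: the global vm.swappiness applies to every process, and per-service memory behavior is controlled through other settings such as memory.low and memory.swap.max.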
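Whether a host still has per-cgroup swappiness can be checked from the filesystem type mounted at /sys/fs/cgroup. A minimal sketch, assuming GNU coreutils stat; the function name is hypothetical. A tmpfs there indicates the cgroup v1 layout (legacy or hybrid) with a memory.swappiness file per cgroup, and cgroup2fs the unified cgroup v2 hierarchy, where only the global vm.swappiness exists.

```bash
#!/usr/bin/env bash
# Minimal sketch: report which cgroup layout a host uses.
#   tmpfs     -> cgroup v1 (legacy or hybrid): per-cgroup memory.swappiness
#   cgroup2fs -> unified cgroup v2: only the global vm.swappiness
cgroup_mode() {
  local fstype
  fstype=$(stat -f -c %T "${1:-/sys/fs/cgroup}") || return 1
  case $fstype in
    cgroup2fs) echo v2 ;;
    tmpfs)     echo v1 ;;
    *)         echo "unknown ($fstype)" ;;
  esac
}

# Example: cgroup_mode
```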

Conclusions

Migrating with a good plan and a detailed script minimizes downtime. Keep in mind, however, that despite all efforts to maintain stability, specific workloads can suffer from a migration. Fortunately, the Linux kernel has many tunable parameters, and with the necessary insight, tuning the right ones can bring many workloads back to optimal performance.

If you need migration services or a workload is not working as expected, reach out to Nexperteam’s experienced Linux specialists.
