System administrators often get complaints that servers are responding slowly and performing poorly. One of the most common reasons for this is that the server is heavily loaded, in other words, overloaded.
How do you troubleshoot these server performance issues in Linux?
There can be a number of reasons for high load on the server, such as:
- Inadequate RAM/CPU
- Slow hard disk drives
- Unoptimized software applications/modules
In this article, I am going to explain how to identify the bottleneck and where you need to focus.
1) First, let's check the server load:
First, let us look at the server load. You can execute the "uptime" command to find the current load, but the "top" command is better. Top helps you identify how many CPUs are being reported; you should be able to see something like cpu00, cpu01, etc. A load of ~1 for each CPU is reasonable. More than 1 per CPU indicates that processes are waiting on resources such as CPU, memory, or I/O. The higher the value, the more processes are queued for resources, which means your server is heavily loaded. For example, you're fine with a load of 7.80 if you have 8 CPUs.
Another thing to consider while looking at the load via uptime or top is to understand what it actually shows.
15:33:35 up 180 days, 5:17, 6 users, load average: 8.76, 6.77, 5.42
The first value (8.76) shows the load average over the last 1 minute, while the second (6.77) and third (5.42) show the averages over the last 5 and 15 minutes respectively. Here the recent load is higher than the longer-term averages, so it's probably a spike; let's look further.
Are you sure your server's load is actually a problem? Sometimes servers can handle much more load than the numbers suggest; load averages are not perfectly accurate and cannot always be the ultimate deciding factor. Move ahead only if your load figures are genuinely something to worry about.
Note: A P4 CPU with HT (Hyper-Threading) technology will be reported as 2 CPUs in top, even though the server has only one physical CPU. For example, a server with 4 physical HT CPUs will be reported by top as having 8 CPUs.
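If you are not sure how many logical CPUs the kernel actually sees (for example because of HT), you can check it directly. This is just a quick sketch; nproc comes with coreutils and lscpu with util-linux, so both should be available on most modern distributions:
# nproc
# grep -c processor /proc/cpuinfo
# lscpu | grep -iE 'socket|core|thread'
Compare that count against the load average: a load roughly equal to the number of logical CPUs means they are fully busy, but not necessarily overloaded.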
2) Check the RAM (memory):
Note: Perform these checks multiple times to reach a firm conclusion.
# free -m
The output should look similar to this:
             total       used       free     shared    buffers     cached
Mem:          1963       1912         50          0         28        906
-/+ buffers/cache:        978        985
Swap:         1027        157        869
Look at the output. Don't panic that almost all the RAM appears used up. Look at the "-/+ buffers/cache" line, which shows that 985 MB of RAM is effectively still free, because memory used for buffers and cache can be reclaimed. As long as you have enough memory available there and your server is not using much swap, you're pretty fine on RAM.
Whenever the server does not have enough memory to hold all the application processes and data, it starts to use swap, which is a part of your disk mapped as memory. Disk is comparatively very slow, so heavy swap usage will further slow down your system. As a rule of thumb, at least 200 MB available in buffers/cache and no more than 200 MB of swap usage is good.
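To see whether the server is actively swapping right now, rather than just holding a few old pages in swap, you can watch the si/so columns of vmstat. This is a minimal check, assuming vmstat (from the procps package) is installed; "2 5" prints five samples at two-second intervals:
# vmstat 2 5
Sustained non-zero values under si (swap-in) and so (swap-out) mean the server is genuinely short on memory, not just caching aggressively.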
If you find that RAM is the issue, look at the top output to see which application processes are using the most memory. If your application is consuming too much, you should look into optimizing the application and its related scripts.
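If you prefer a quick non-interactive snapshot over navigating top (inside top you can press Shift+M to sort by memory), a simple ps one-liner works as well; take it as a convenience sketch:
# ps aux --sort=-%mem | head
The biggest memory consumers appear at the top of this list, which usually points straight at the process to optimize.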
Alternatively, you can increase the RAM as well.
3) Check if I/O (input/output) usage is excessive:
If there are too many read/write requests on a single hard disk drive, it will become slow and you'll have to upgrade it to a faster drive (with higher RPM and more cache). The alternative is to split the load across multiple drives by spreading the data using RAID. To identify whether you have I/O issues:
# top
Read the value under the "iowait" column (shown as %wa in some versions) for each CPU. In an ideal situation it should be near 0%. If you see a higher value, perhaps during a load spike, recheck it multiple times to reach a firm conclusion. Anything consistently above 15% is bad. Next, you can check the speed of your hard disk drive to see if it's really lagging.
Try "df -h" command to check which is the drive that your data/Filesystem
resides on.
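df -h tells you the filesystem, but hdparm needs the underlying block device. lsblk (part of util-linux on most distributions) shows how filesystems map back to disks, so you know which /dev/sdX to test:
# df -h
# lsblk
Once you know the device, in this example /dev/sda, test it with hdparm.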
# hdparm -Tt /dev/sda
The output:
/dev/sda:
 Timing cached reads:   1484 MB in 2.01 seconds = 739.00 MB/sec
 Timing buffered disk reads:  62 MB in 3.00 seconds = 20.66 MB/sec
The cached reads look great, most likely because of the disk's onboard cache, but the buffered disk reads are only 20.66 MB/sec. Anything below about 25 MB/sec is something you should worry about.
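hdparm only measures raw sequential read speed. To see how busy the disk is under the real workload, iostat from the sysstat package (which may need to be installed separately) reports per-device utilization and wait times; the flags below are just a common starting point:
# iostat -x 2 5
Look at the %util and await (r_await/w_await on newer versions) columns: a device sitting near 100% utilization with high wait times is your bottleneck.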
4) Check the CPU consumption:
# top
Check the top output to find out whether you're using too much CPU power. Look at the value under "id" (idle) beside each CPU entry; anything below 45% idle is something you should really worry about. As a next step, look at the %sy, %us, %id, and %wa values as well. From the top output you can also determine which process is using the most CPU.
In our example, the problem was with I/O usage and a slow hard disk; we would need to upgrade to a faster drive or implement a RAID-style solution.
The troubleshooting process can never be covered completely in one article, and no article can teach you everything you need to reach expert level. You need to keep learning.
Hope this helps..! Happy Troubleshooting..!