System administrators often get complaints that servers are responding slowly and performing poorly. One of the most common reasons for this is that the server is heavily loaded, in other words, overloaded.
How do you troubleshoot these server performance issues in Linux?
There can be a number of reasons for high load on the server, such as:
- Inadequate RAM/CPU
- Slow hard disk drives
- Unoptimized software applications/modules
In this article, I am going to explain how to identify the bottleneck and where you need to focus.
1) First, let's check the server load:
First, let us look at the server load. You can execute the "uptime" command to find the current load, but the "top" command is better. Top helps you identify how many CPUs are being reported; you should be able to see something like cpu00, cpu01, etc. A load of ~1 for each CPU is reasonable. More than 1 per CPU indicates that processes are waiting on resources such as CPU, memory, or I/O. The higher the value, the more processes are queued for resources, which means your server is heavily loaded. For example, you're fine with a load of 7.80 if you have 8 CPUs.
Another thing to consider while looking at the load via uptime or top is to understand what it actually shows.
15:33:35 up 180 days, 5:17, 6 users, load average: 8.76, 6.77, 5.42
The first value (8.76) shows the load average over the last 1 minute, while the second (6.77) and third (5.42) show the averages over the last 5 and 15 minutes respectively. Here the recent load is higher than the longer-term averages, so it's probably a spike; let's look further.
Are you sure your server's load is actually a problem? Sometimes servers can handle much more load than the numbers suggest; load averages are not perfectly accurate and cannot always be the ultimate deciding factor. Move ahead only if your load figures are genuinely something to worry about.
Note: A P4 CPU with HT (Hyper-Threading) technology will be reported as 2 CPUs in top, even though the server has only one physical CPU. For example, a server with 4 physical HT CPUs will be reported by top as having 8 CPUs.
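If you are not sure how many logical CPUs the kernel actually sees (for example because of HT), you can check it directly. This is just a quick sketch; nproc comes with coreutils and lscpu with util-linux, so both should be available on most modern distributions:
# nproc
# grep -c processor /proc/cpuinfo
# lscpu | grep -iE 'socket|core|thread'
Compare that count against the load average: a load roughly equal to the number of logical CPUs means they are fully busy, but not necessarily overloaded.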
2) Check the RAM (memory):
Note: Perform these checks multiple times to reach a firm conclusion.
# free -m
The output should look similar to this:
             total       used       free     shared    buffers     cached
Mem:          1963       1912         50          0         28        906
-/+ buffers/cache:        978        985
Swap:         1027        157        869
Look at the output. Don't panic that almost all the RAM appears used up. Look at the "-/+ buffers/cache" line, which shows that 985 MB of RAM is effectively still free, because memory used for buffers and cache can be reclaimed. As long as you have enough memory available there and your server is not using much swap, you're pretty fine on RAM.
Whenever the server does not have enough memory to hold all the application processes and data, it starts to use swap, which is a part of your disk mapped as memory. Disk is comparatively very slow, so heavy swap usage will further slow down your system. As a rule of thumb, at least 200 MB available in buffers/cache and no more than 200 MB of swap usage is good.
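To see whether the server is actively swapping right now, rather than just holding a few old pages in swap, you can watch the si/so columns of vmstat. This is a minimal check, assuming vmstat (from the procps package) is installed; "2 5" prints five samples at two-second intervals:
# vmstat 2 5
Sustained non-zero values under si (swap-in) and so (swap-out) mean the server is genuinely short on memory, not just caching aggressively.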
If you find that RAM is the issue, look at the top output to see which application processes are using the most memory. If your application is consuming too much, you should look into optimizing the application and its related scripts.
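If you prefer a quick non-interactive snapshot over navigating top (inside top you can press Shift+M to sort by memory), a simple ps one-liner works as well; take it as a convenience sketch:
# ps aux --sort=-%mem | head
The biggest memory consumers appear at the top of this list, which usually points straight at the process to optimize.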
Alternatively, you can increase the RAM as well.
3) Check if I/O (input/output) usage is excessive:
If there are too many read/write requests on a single hard disk drive, it will become slow and you'll have to upgrade it to a faster drive (with higher RPM and more cache). The alternative is to split the load across multiple drives by spreading the data using RAID. To identify whether you have I/O issues:
# top
Read the value under the "iowait" column (shown as %wa in some versions) for each CPU. In an ideal situation it should be near 0%. If you see a higher value, perhaps during a load spike, recheck it multiple times to reach a firm conclusion. Anything consistently above 15% is bad. Next, you can check the speed of your hard disk drive to see if it's really lagging.
Try "df -h" command to check which is the drive that your data/Filesystem
resides on.
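df -h tells you the filesystem, but hdparm needs the underlying block device. lsblk (part of util-linux on most distributions) shows how filesystems map back to disks, so you know which /dev/sdX to test:
# df -h
# lsblk
Once you know the device, in this example /dev/sda, test it with hdparm.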
# hdparm -Tt /dev/sda
The output:
/dev/sda:
 Timing cached reads:   1484 MB in 2.01 seconds = 739.00 MB/sec
 Timing buffered disk reads:  62 MB in 3.00 seconds = 20.66 MB/sec
The cached reads look great, most likely because of the disk's onboard cache, but the buffered disk reads are only 20.66 MB/sec. Anything below about 25 MB/sec is something you should worry about.
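hdparm only measures raw sequential read speed. To see how busy the disk is under the real workload, iostat from the sysstat package (which may need to be installed separately) reports per-device utilization and wait times; the flags below are just a common starting point:
# iostat -x 2 5
Look at the %util and await (r_await/w_await on newer versions) columns: a device sitting near 100% utilization with high wait times is your bottleneck.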
4) Check the CPU consumption:
# top
Check the top output to find out whether you're using too much CPU power. Look at the value under "id" (idle) beside each CPU entry; anything below 45% idle is something you should really worry about. As a next step, look at the %sy, %us, %id, and %wa values as well. From the top output you can also determine which process is using the most CPU.
In our example, the problem was with I/O usage and a slow hard disk; we would need to upgrade to a faster drive or implement a RAID-style solution.
The troubleshooting process can never be covered completely in one article, and no article can teach you everything you need to reach expert level. You need to keep learning.
Hope this helps..! Happy Troubleshooting..!