A common question is "why is my CPU utilization at 100%?" There is a great deal of concern about the measurement of CPU at the Oracle server level.
If you suspect a CPU utilization problem, see these important notes on 100% CPU and Oracle. Also see Oracle and CPU utilization metrics.
Also see my notes on OS Busy scripts.
Once we understand that CPU resources are scarce (just like RAM resources) and not to be wasted, we need to understand how to tell if our Oracle server is making optimal use of its computing hardware.
There are many OS utilities that allow us to see CPU utilization statistics, including top, vmstat, sar and glance, as well as uptime and procinfo.
Each of these tools displays CPU metrics at a finer level of detail than Oracle. This is because the OS does not reveal all processor details to applications (to UNIX, Oracle is just another application), so the best place to see what's going on inside your server is the operating system's CPU monitors. These report different metrics on CPU utilization:
- The runqueue - This is the far left-hand column of the vmstat command display (labeled with an "r"). It reports the total length of the CPU dispatcher queue. When the runqueue exceeds the number of CPUs on the server, you have an overloaded server with a CPU bottleneck.
- The load average - This is defined as the sum of the run queue length and the number of jobs currently running on the CPUs. Each display of the load average consists of three numbers. Most often they run from left to right as the load average over the past 1, 5, and 15 minutes. Occasionally, however, the reverse order appears (e.g. like that shown in the top output).
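The runqueue rule above reduces to a single comparison. The following Python sketch is illustrative only: the function name is hypothetical, and the CPU count is passed in rather than detected, because the way to count processors differs between UNIX dialects.

```python
def is_cpu_bound(runqueue: int, ncpus: int) -> bool:
    """Apply the rule above: a run queue longer than the CPU count
    indicates a CPU bottleneck."""
    if ncpus < 1:
        raise ValueError("ncpus must be at least 1")
    return runqueue > ncpus


# Illustrative (runqueue, CPU count) pairs; substitute the "r" column
# from vmstat and the real processor count of your server.
for runqueue, ncpus in [(100, 32), (4, 4), (0, 6)]:
    state = "CPU-bound" if is_cpu_bound(runqueue, ncpus) else "not CPU-bound"
    print(f"r={runqueue} on {ncpus} CPUs: {state}")
```

A run queue equal to the number of CPUs is treated as not overloaded here, since the rule speaks of the queue exceeding the CPU count.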
There are a host of UNIX commands that display CPU and memory consumption. While there are dialect-specific utilities such as glance, we will look at the common vmstat and top utilities.
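Using top to monitor CPU
The "top" command can be used to display CPU utilization. The metrics are:
- load average - The load average is computed as the sum of the run queue length and the number of jobs currently running on the CPUs.
- CPU states - This shows percentage metrics for current processor usage.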
System: corp-hp1 Thu Jul 6 09:14:23 2000
Load averages: 0.04, 0.03, 0.03
340 processes: 336 sleeping, 4 running
Cpu states:
CPU LOAD USER NICE SYS IDLE BLOCK SWAIT INTR SSYS
0 0.06 5.0% 0.0% 0.6% 94.4% 0.0% 0.0% 0.0% 0.0%
1 0.06 0.0% 0.0% 0.8% 99.2% 0.0% 0.0% 0.0% 0.0%
2 0.06 0.8% 0.0% 0.0% 99.2% 0.0% 0.0% 0.0% 0.0%
3 0.06 0.0% 0.0% 0.2% 99.8% 0.0% 0.0% 0.0% 0.0%
4 0.00 0.0% 0.0% 0.0% 100.0% 0.0% 0.0% 0.0% 0.0%
5 0.00 0.2% 0.0% 0.0% 99.8% 0.0% 0.0% 0.0% 0.0%
--- ---- ----- ----- ----- ----- ----- ----- ----- -----
avg 0.04 1.0% 0.0% 0.2% 98.8% 0.0% 0.0% 0.0% 0.0%
Memory: 493412K (229956K) real, 504048K (253952K) virtual, 767868K free Page# 1/49
CPU TTY PID USERNAME PRI NI SIZE RES STATE TIME %WCPU %CPU COMMAND
0 ? 26835 applmgr 154 20 30948K 11936K sleep 0:49 3.91 3.90 f45runw
2 ? 27210 applmgr 154 20 31316K 12836K sleep 0:49 1.91 1.91 f45runw
5 ? 36 root 152 20 0K 0K run 56:28 1.16 1.16 vxfsd
1 ? 347 root 154 20 32K 96K sleep 567:15 1.11 1.11 syncer
5 ? 27429 oracle 154 20 20736K 2608K sleep 0:23 0.39 0.38 oraclePROD
4 ? 27067 oracle 154 20 21984K 3792K sleep 1:31 0.36 0.36 oraclePROD
Using svmon on AIX
root@AIX1 [/]#svmon
size inuse free pin virtual
memory 1048566 1023178 4976 55113 251293
pg space 524288 10871
work pers clnt
pin 55116 0 0
in use 250952 772224 2
Where:
size = the number of real memory frames (size of real memory)
inuse = the number of frames containing pages
pin = the number of frames containing pinned pages
The svmon command can also be used with the -P option to display characteristics for a specific process ID (PID):
Root> svmon -P 26060
-------------------------------------------------------------------------------
Pid Command Inuse Pin Pgsp Virtual 64-bit Mthrd
26060 pr 6871 1607 1022 6001 N N
Vsid Esid Type Description Inuse Pin Pgsp Virtual Addr Range
24029 d work shared library text 3992 0 22 2779 0..65535
0 0 work kernel seg 2509 1606 926 2897 0..32767 :
65475..65535
105e4 2 work process private 188 1 48 230 0..273 :
65298..65535
285ea f work shared library data 92 0 26 95 0..919
185e6 1 pers code,/dev/lvs001:301 81 0 - - 0..149
6c59b - pers /dev/lvs001:92402 6 0 - - 0..9
744fd - pers /dev/lvs001:763909 3 0 - - 0..9
7c5ff - pers /dev/lvs001:1327130 0 0 - - 0..29
The w command
The w command shows the load average, which is computed from the current runqueue values. The first line of the w display shows the same information as uptime.
$ w
22:42:14 up 2:34, 2 users, load average: 0.00, 0.00, 0.00
USER TTY LOGIN@ IDLE JCPU PCPU WHAT
terry :0 20:10 ?xdm? 5:24 1.49s gnome-session
terry pts/1 22:22 0.00s 0.24s 0.04s /usr/sbin/sshd
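For scripted monitoring, the three load average figures can be pulled out of the header line printed by w or uptime, or out of the "Load averages:" line that top prints. This is a minimal Python sketch; the function name and the regular expression are assumptions made for illustration, and the exact header wording can vary between UNIX dialects.

```python
import re
from typing import Tuple

# Matches both "load average: 0.00, 0.00, 0.00" (w, uptime)
# and "Load averages: 0.04, 0.03, 0.03" (top).
_LOAD_RE = re.compile(
    r"load averages?:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", re.IGNORECASE
)


def parse_load_averages(line: str) -> Tuple[float, float, float]:
    """Return the three load average figures found in a header line."""
    match = _LOAD_RE.search(line)
    if match is None:
        raise ValueError("no load average found in: " + repr(line))
    first, second, third = (float(value) for value in match.groups())
    return first, second, third


print(parse_load_averages("22:42:14 up 2:34, 2 users, load average: 0.00, 0.00, 0.00"))
print(parse_load_averages("Load averages: 0.04, 0.03, 0.03"))
```

The function returns the figures in the order they are printed and does not assume which interval each one covers.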
Using SAR
The sar utility (System Activity Reporter) is quite popular on HP/UX, and is becoming widely available for AIX and Solaris systems. sar has much of the same functionality as vmstat, but provides additional details.
There are four major flags in sar:
sar -u = to see CPU
sar -w = for swapping
sar -b = for buffer activity
sar -d = for disk usage
sar -w (memory switching and swapping activity)
swpin/s Number of process swap-ins per second
swpot/s Number of process swap-outs per second
bswin/s Number of 512-byte units swapped in per second
bswot/s Number of 512-byte units swapped out per second
pswch/s Number of process context switches per second
ROOT-/
>sar -w 5 5
HP-UX corp-hp1 B.11.00 U 9000/800 08/09/00
19:37:57 swpin/s bswin/s swpot/s bswot/s pswch/s
19:38:02 0.00 0.0 0.00 0.0 222
19:38:07 0.00 0.0 0.00 0.0 314
19:38:12 0.00 0.0 0.00 0.0 280
19:38:17 0.00 0.0 0.00 0.0 295
19:38:22 0.00 0.0 0.00 0.0 359
Average 0.00 0.0 0.00 0.0 294
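Captured sar output is easy to post-process. The sketch below, written in Python under the assumption that the report has the layout shown above (a timestamped header line followed by timestamped samples), turns the report into dictionaries and recomputes the average context switch rate. The function and variable names are hypothetical.

```python
import re
from statistics import mean

# The sar -w report shown above, pasted in as text.
SAR_W_OUTPUT = """\
HP-UX corp-hp1 B.11.00 U 9000/800 08/09/00
19:37:57 swpin/s bswin/s swpot/s bswot/s pswch/s
19:38:02 0.00 0.0 0.00 0.0 222
19:38:07 0.00 0.0 0.00 0.0 314
19:38:12 0.00 0.0 0.00 0.0 280
19:38:17 0.00 0.0 0.00 0.0 295
19:38:22 0.00 0.0 0.00 0.0 359
Average 0.00 0.0 0.00 0.0 294
"""

TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2}$")


def parse_sar_table(text):
    """Return one {column: value} dict per timestamped sample line."""
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not TIMESTAMP.match(fields[0]):
            continue  # skip the banner, blank lines and the Average line
        if columns is None:
            columns = fields[1:]  # the first timestamped line is the header
            continue
        samples.append(dict(zip(columns, map(float, fields[1:]))))
    return samples


samples = parse_sar_table(SAR_W_OUTPUT)
average_switches = mean(sample["pswch/s"] for sample in samples)
print(f"{len(samples)} samples, average pswch/s = {average_switches:.0f}")
```

The recomputed figure can be checked against the Average line that sar prints at the end of the report.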
sar -u (CPU report)
cpu cpu number (only on a multi-processor
system with the -M option);
%usr user mode;
%sys system mode
%wio idle with some process waiting for I/O
%idle otherwise idle.
ROOT-/
>sar -u 2 5
HP-UX burleson B.11.00 U 9000/800 08/09/00
08:37:06 %usr %sys %wio %idle
08:37:07 43 57 0 0
08:37:08 45 55 0 0
08:37:09 44 56 0 0
08:37:10 44 56 0 0
08:37:11 43 57 0 0
08:37:12 52 48 0 0
08:37:13 49 51 0 0
08:37:14 49 51 0 0
08:37:15 57 43 0 0
08:37:16 65 35 0 0
08:37:17 40 29 12 19
08:37:18 23 20 12 44
08:37:19 0 1 0 99
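To pick out the intervals in which the CPUs had no idle time, the samples can be filtered on %usr + %sys. The following Python sketch uses the last five samples from the report above; the class name, function name and the 100 percent threshold are assumptions chosen for illustration.

```python
from typing import List, NamedTuple


class CpuSample(NamedTuple):
    time: str
    usr: int     # %usr
    system: int  # %sys
    wio: int     # %wio
    idle: int    # %idle


# The last five samples of the sar -u report shown above.
SAMPLES = [
    CpuSample("08:37:15", 57, 43, 0, 0),
    CpuSample("08:37:16", 65, 35, 0, 0),
    CpuSample("08:37:17", 40, 29, 12, 19),
    CpuSample("08:37:18", 23, 20, 12, 44),
    CpuSample("08:37:19", 0, 1, 0, 99),
]


def fully_busy(samples: List[CpuSample], threshold: int = 100) -> List[CpuSample]:
    """Return the samples where %usr + %sys reaches the threshold."""
    return [s for s in samples if s.usr + s.system >= threshold]


for sample in fully_busy(SAMPLES):
    print(f"{sample.time}: usr={sample.usr}% sys={sample.system}% idle={sample.idle}%")
```

A fully busy interval is not by itself proof of overload; the run queue still has to be compared with the number of CPUs.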
sar -b (buffer activity report)
bread/s Number of physical reads per second from disk
bwrit/s Number of physical writes per second
lread/s Number of reads per second from buffer cache
lwrit/s Number of writes per second to buffer cache
%rcache Buffer cache hit ratio for read requests
%wcache Buffer cache hit ratio for write requests
pread/s Number of reads per second from character devices
pwrit/s Number of writes per second to character devices
root>sar -b 1 6
HP-UX corp-hp1 B.11.00 U 9000/800 08/09/00
19:44:53 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
19:44:54 0 91 100 9 19 53 0 0
19:44:55 0 0 0 0 5 100 0 0
19:44:56 0 6 100 9 8 0 0 0
19:44:57 0 30 100 9 20 55 0 0
19:44:58 0 1 100 0 3 100 0 0
19:44:59 0 1 100 9 4 0 0 0
Average 0 22 100 6 10 39 0 0
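The hit ratio columns can be reproduced from the physical and logical counts. The Python sketch below assumes the ratio is 100 * (1 - physical / logical), which is how sar documentation commonly describes it. Reporting zero when there is no logical I/O, or when physical I/O exceeds logical I/O, is an assumption inferred from the sample rows above. The function name is hypothetical.

```python
def cache_hit_ratio(physical: float, logical: float) -> float:
    """Buffer cache hit ratio in percent.

    physical: bread/s or bwrit/s (I/O that went to disk)
    logical:  lread/s or lwrit/s (I/O requested from the buffer cache)
    """
    if logical <= 0:
        return 0.0  # assumption: no logical I/O is reported as 0
    ratio = 100.0 * (1.0 - physical / logical)
    return max(0.0, ratio)  # assumption: negative ratios are shown as 0


# (bwrit/s, lwrit/s) pairs taken from the sample report above.
for bwrit, lwrit in [(9, 19), (0, 5), (9, 8), (9, 20)]:
    print(f"bwrit/s={bwrit} lwrit/s={lwrit} %wcache={cache_hit_ratio(bwrit, lwrit):.0f}")
```

Because sar rounds every column for display, a ratio recomputed from the printed Average line can differ from the printed percentage by a point.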
Using sadc
The sadc utility (the system activity data collector, part of the System Activity Report Package) is a popular tool that can be used inside cron to schedule collections of server statistics.
All of the sadc reports are located in the /usr/lbin/sa directory. These reports must be run as root and provide detailed server information. One of the most popular sadc reports is sa1:
#! /usr/bin/sh
# @(#) $Revision: 72.3 $
# sa1.sh
DATE=`date +%d`
ENDIR=/usr/lbin/sa
DFILE=/var/adm/sa/sa$DATE
cd $ENDIR
if [ $# = 0 ]
then
    exec $ENDIR/sadc 1 1 $DFILE
else
    exec $ENDIR/sadc $* $DFILE
fi
Using glance to monitor Oracle CPU
For complete details, see my notes on monitoring Oracle with glance.
The glance utility is provided on HP/UX systems to provide a graphical display of server performance. It displays current CPU, memory, disk and swap consumption, and also reports on the top processes.
Using the vmstat utility to monitor Oracle
The UNIX vmstat utility is especially useful for monitoring the performance of Oracle databases. You'll find vmstat on almost all implementations of UNIX, including Linux. See also the notes on monitoring Oracle CPU with vmstat, and building a CPU monitor for Oracle.
The vmstat utility is the most common UNIX monitor utility. It is found on virtually all dialects of UNIX (vmstat is called osview on IRIX), and vmstat quickly displays server values. These values include:
r = runqueue - When this value exceeds the number of CPUs (lsdev -C | grep Proc | wc -l), then the server is experiencing a CPU bottleneck.
pi = page in - Any non-zero value indicates that the server may be short on memory and that RAM pages are being sent to the swap disk. However, this can also occur when numerous programs are accessing their memory for the first time, so always remember to check the scan rate (sr) column. If both are non-zero, then you are short on RAM.
sr = scan rate - If we see sr rising steadily, we know that the paging daemon is busy allocating memory pages.
For AIX and HP/UX, vmstat provides the following CPU values. These values are expressed as percentages and will sum to 100.
us = user CPU percentage
sy = system CPU percentage
id = idle CPU percentage
wa = wait CPU percentage
When us+sy approaches 100, the CPUs are busy, but not necessarily overloaded. Only the run queue value determines CPU overload, and only when r exceeds the number of CPUs on the server.
When wa values exceed 20, more than 20% of the processing time is spent waiting for a resource, usually I/O. It is common to see high wa values during backups and exports, but high wa values can also indicate an I/O bottleneck.
>vmstat 3
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 0 84283 207 0 1 1 59 174 0 178 40 142 18 4 75 4
0 0 84283 187 0 4 0 0 0 0 144 294 70 2 1 91 6
0 0 84283 184 0 0 0 0 0 0 171 740 99 5 2 89 4
0 0 84283 165 0 0 0 0 0 0 173 193 98 1 8 52 40
0 0 84283 150 0 3 0 0 0 0 205 615 136 4 2 87 6
0 0 84283 141 0 1 0 0 0 0 281 935 192 5 0 91 4
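The rules of thumb above (r greater than the number of CPUs, wa above 20, pi and sr both non-zero) can be applied to captured vmstat output automatically. This Python sketch assumes the 17-column AIX layout shown above and reads the columns by position, because the header uses "sy" twice. The function names and the CPU count of 4 are hypothetical.

```python
# Three of the sample rows shown above, with their header.
VMSTAT_OUTPUT = """\
r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 0 84283 207 0 1 1 59 174 0 178 40 142 18 4 75 4
0 0 84283 187 0 4 0 0 0 0 144 294 70 2 1 91 6
0 0 84283 165 0 0 0 0 0 0 173 193 98 1 8 52 40
"""


def parse_vmstat(text):
    """Return one dict per data row, reading columns by position."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 17 or not all(f.isdigit() for f in fields):
            continue  # skip headers, separators and blank lines
        v = [int(f) for f in fields]
        rows.append({"r": v[0], "pi": v[5], "sr": v[8],
                     "us": v[13], "sy": v[14], "id": v[15], "wa": v[16]})
    return rows


def diagnose(row, ncpus):
    """Apply the rules of thumb described above to one vmstat row."""
    findings = []
    if row["r"] > ncpus:
        findings.append("CPU bottleneck")
    if row["wa"] > 20:
        findings.append("I/O wait")
    if row["pi"] > 0 and row["sr"] > 0:
        findings.append("short on RAM")
    return findings


NCPUS = 4  # hypothetical CPU count for this server
for row in parse_vmstat(VMSTAT_OUTPUT):
    print(row, diagnose(row, NCPUS))
```

A single sample that trips a rule is only a hint; the thresholds are meant to be watched over a series of intervals.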
vmstat for Solaris
The display format for vmstat in Solaris is quite different from AIX and HP/UX. In Solaris the "vmstat -n" command is used to display server stats. The relevant columns are:
pi = page-ins
us = CPU user time
sy = CPU system time
id = CPU idle time
r = runqueue - If this exceeds the number of CPUs then you are CPU-bound
In the example below, we sample an overstressed Oracle server. Note that us + sy = 100, and that the r value far exceeds the 32 CPUs on this server:
root> vmstat -n 1
memory page faults
avm free re at pi po fr de sr in sy cs
41128 118400 4424 92 0 11 90 0 0 1124 77234 4113
CPU
cpu procs
us sy id r b w
49 51 0 100 2 0
46 54 0
49 51 0
42 58 0
54719 115379 4508 105 0 10 102 0 0 1107 78021 3912
44 56 0 67 2 0
56 44 0
58 42 0
45 55 0
54719 118479 4305 113 0 10 116 0 0 1070 75044 4085
41 59 0 67 2 0
56 44 0
50 50 0
50 50 0
54719 125113 4088 124 0 10 124 0 0 1055 75103 4520
52 48 0 67 2 0
50 50 0
65 35 0
53 47 0
54719 141189 3659 116 0 9 127 0 0 1065 71355 4882
60 40 0 67 2 0
60 40 0
61 39 0
61 39 0
54719 178306 3113 104 0 9 309 0 0 1075 64446 4741
4 15 81 67 2 0
9 13 78
16 9 75
10 9 81