Unix & Linux, AIX, HP-UX, Solaris Memory and CPU Utilization

A common question is "why is my CPU utilization at 100%?".  There is a great deal of concern about the measurement of CPU at the Oracle server level.
If you suspect a CPU utilization problem, see these important notes on 100% CPU and Oracle.  Also see Oracle and CPU utilization metrics.
Also see my notes on OS Busy scripts.
Once we understand that CPU resources are scarce (just like RAM resources) and not to be wasted, we need to understand how to tell whether our Oracle server is making optimal use of its computing hardware.
There are many OS utilities that allow us to see CPU utilization statistics, including the ones described below, as well as uptime and procinfo.
Each of these tools displays CPU metrics at a finer level of detail than Oracle does.  This is because the OS does not reveal all processor details to applications (to UNIX, Oracle is just another application), so the best place to see what's going on inside your server is the operating system's CPU monitors.  These report several different metrics on CPU utilization:
  • The runqueue - This is the far left-hand column of the vmstat command display (labeled with an "r").  It reports the total length of the CPU dispatcher queue.  When the runqueue exceeds the number of CPUs on the server, you have an overloaded server with a CPU bottleneck.
     
  • The load average - This is defined as the sum of the run queue length and the number of jobs currently running on the CPUs.  Each display of the load average consists of three numbers.  Most often these are shown from left to right as the load averages over the past 1, 5, and 15 minutes.  Occasionally, however, a tool lists them in a different order (e.g. like that shown in the top output).
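
The runqueue rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and local_snapshot() assumes that the 1-minute load average is an acceptable stand-in for the run queue, which is a simplification because the load average also counts running jobs.

```python
import os


def run_queue_per_cpu(run_queue, cpu_count):
    """Return the run queue length divided by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return run_queue / cpu_count


def is_cpu_bound(run_queue, cpu_count):
    """Rule of thumb from the text: overloaded when the run queue exceeds the CPU count."""
    return run_queue > cpu_count


def local_snapshot():
    """Read this host's load averages (Unix-like systems only).

    Assumption: the 1-minute load average is used as a rough proxy for
    the run queue; it is not the same number that vmstat reports as "r".
    """
    one, five, fifteen = os.getloadavg()
    cpus = os.cpu_count() or 1
    return {"cpus": cpus, "load": (one, five, fifteen),
            "cpu_bound": is_cpu_bound(one, cpus)}


# Figures quoted later in this article: a run queue of 100 on a 32-CPU server.
print(is_cpu_bound(100, 32))   # True
print(is_cpu_bound(4, 6))      # False
```

Calling local_snapshot() on the database server gives a quick first impression, but the "r" column of vmstat remains the figure to trust.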
There are a host of UNIX commands that display CPU and memory consumption.  While there are dialect-specific utilities such as glance, we will look at the common vmstat and top utilities.

Using top to monitor CPU
The "top" command can be used to display CPU utilization.  The metrics are:
  • load average - The load average is computed as the sum of the run queue length and the number of jobs currently running on the CPUs.
  • CPU states - This shows percentage metrics for current processor usage.



System: corp-hp1                                      Thu Jul  6 09:14:23 2000
Load averages: 0.04, 0.03, 0.03
340 processes: 336 sleeping, 4 running
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
 0    0.06   5.0%   0.0%   0.6%  94.4%   0.0%   0.0%   0.0%   0.0%
 1    0.06   0.0%   0.0%   0.8%  99.2%   0.0%   0.0%   0.0%   0.0%
 2    0.06   0.8%   0.0%   0.0%  99.2%   0.0%   0.0%   0.0%   0.0%
 3    0.06   0.0%   0.0%   0.2%  99.8%   0.0%   0.0%   0.0%   0.0%
 4    0.00   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
 5    0.00   0.2%   0.0%   0.0%  99.8%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg   0.04   1.0%   0.0%   0.2%  98.8%   0.0%   0.0%   0.0%   0.0%

Memory: 493412K (229956K) real, 504048K (253952K) virtual, 767868K free  Page# 1/49

CPU TTY  PID USERNAME PRI NI   SIZE    RES STATE    TIME %WCPU  %CPU COMMAND
 0   ? 26835 applmgr  154 20 30948K 11936K sleep    0:49  3.91  3.90 f45runw
 2   ? 27210 applmgr  154 20 31316K 12836K sleep    0:49  1.91  1.91 f45runw
 5   ?    36 root     152 20     0K     0K run     56:28  1.16  1.16 vxfsd
 1   ?   347 root     154 20    32K    96K sleep  567:15  1.11  1.11 syncer
 5   ? 27429 oracle   154 20 20736K  2608K sleep    0:23  0.39  0.38 oraclePROD
 4   ? 27067 oracle   154 20 21984K  3792K sleep    1:31  0.36  0.36 oraclePROD
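
If you capture top output to a file, the per-CPU table can be read programmatically. The sketch below assumes the HP-UX style "Cpu states" layout shown above; parse_cpu_states is a hypothetical helper, not part of top, and other platforms format this section differently.

```python
TOP_SAMPLE = """\
Cpu states:
CPU   LOAD   USER   NICE    SYS   IDLE  BLOCK  SWAIT   INTR   SSYS
 0    0.06   5.0%   0.0%   0.6%  94.4%   0.0%   0.0%   0.0%   0.0%
 1    0.06   0.0%   0.0%   0.8%  99.2%   0.0%   0.0%   0.0%   0.0%
 2    0.06   0.8%   0.0%   0.0%  99.2%   0.0%   0.0%   0.0%   0.0%
 3    0.06   0.0%   0.0%   0.2%  99.8%   0.0%   0.0%   0.0%   0.0%
 4    0.00   0.0%   0.0%   0.0% 100.0%   0.0%   0.0%   0.0%   0.0%
 5    0.00   0.2%   0.0%   0.0%  99.8%   0.0%   0.0%   0.0%   0.0%
---   ----  -----  -----  -----  -----  -----  -----  -----  -----
avg   0.04   1.0%   0.0%   0.2%  98.8%   0.0%   0.0%   0.0%   0.0%
"""


def parse_cpu_states(text):
    """Return one dict per CPU row of the 'Cpu states' table."""
    rows, header = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("---"):
            header = None          # the separator ends the per-CPU rows
            continue
        if fields[0] == "CPU" and "IDLE" in fields:
            header = fields        # column names for the rows that follow
            continue
        if header and fields[0].isdigit() and len(fields) == len(header):
            row = {"CPU": int(fields[0])}
            for name, value in zip(header[1:], fields[1:]):
                row[name] = float(value.rstrip("%"))
            rows.append(row)
    return rows


cpus = parse_cpu_states(TOP_SAMPLE)
busiest = min(cpus, key=lambda row: row["IDLE"])
print(busiest["CPU"], busiest["IDLE"])   # 0 94.4
```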

Using svmon on AIX


root@AIX1 [/]#svmon

               size      inuse       free        pin    virtual          
memory      1048566    1023178       4976      55113     251293
pg space     524288      10871

               work       pers       clnt
pin           55116          0          0
in use       250952     772224          2
Where:
size = the number of real memory frames (the size of real memory)
inuse = the number of frames containing pages
pin = the number of frames containing pinned pages
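
Frame counts are easier to judge once converted to megabytes. The sketch below assumes that each frame is a 4 KB page, which is the usual svmon unit but should be confirmed for your AIX level; PAGE_KB, frames_to_mb and parse_svmon_memory are hypothetical names used only for illustration.

```python
PAGE_KB = 4  # assumption: svmon counts 4 KB frames


def frames_to_mb(frames, page_kb=PAGE_KB):
    """Convert a frame count to megabytes."""
    return frames * page_kb / 1024


def parse_svmon_memory(text):
    """Pick out the 'memory' row of the svmon summary."""
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "memory":
            size, inuse, free, pin, virtual = (int(f) for f in fields[1:6])
            return {"size": size, "inuse": inuse, "free": free,
                    "pin": pin, "virtual": virtual}
    raise ValueError("no 'memory' row found")


SVMON_SAMPLE = "memory      1048566    1023178       4976      55113     251293"

mem = parse_svmon_memory(SVMON_SAMPLE)
pct_inuse = mem["inuse"] / mem["size"] * 100
print(frames_to_mb(mem["free"]))   # 19.4375
print(round(pct_inuse, 1))         # 97.6
```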
The svmon command can also be used with the -P option to display characteristics for a specific process ID (PID):

Root> svmon -P 26060

-------------------------------------------------------------------------------
     Pid Command        Inuse      Pin     Pgsp  Virtual   64-bit    Mthrd
   26060 pr              6871     1607     1022     6001        N        N

  Vsid     Esid Type Description           Inuse   Pin Pgsp Virtual Addr Range
 24029        d work shared library text    3992     0   22  2779   0..65535
     0        0 work kernel seg             2509  1606  926  2897   0..32767 :
                                                                    65475..65535
 105e4        2 work process private         188     1   48   230   0..273 :
                                                                    65298..65535
 285ea        f work shared library data      92     0   26    95   0..919
 185e6        1 pers code,/dev/lvs001:301     81     0    -     -   0..149
 6c59b        - pers /dev/lvs001:92402         6     0    -     -   0..9
 744fd        - pers /dev/lvs001:763909        3     0    -     -   0..9
 7c5ff        - pers /dev/lvs001:1327130       0     0    -     -   0..29

The w command
The w command shows the load average, which is computed from the current runqueue values.  It also shows the same information that uptime does.
$ w
22:42:14 up  2:34,  2 users,  load average: 0.00, 0.00, 0.00
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
terry    :0        20:10   ?xdm?   5:24   1.49s gnome-session
terry    pts/1     22:22    0.00s  0.24s  0.04s /usr/sbin/sshd
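
The three load figures can be pulled out of a w or uptime line with a regular expression. This is a minimal sketch: parse_load_average is a hypothetical helper, and the pattern only assumes the "load average:" or "Load averages:" wording seen in the outputs in this article.

```python
import re

# Matches "load average: 0.00, 0.00, 0.00" and "Load averages: 0.04, 0.03, 0.03".
LOAD_RE = re.compile(
    r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", re.IGNORECASE
)


def parse_load_average(line):
    """Return the three load averages found in a w, uptime or top line."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError("no load average found in line")
    return tuple(float(value) for value in match.groups())


print(parse_load_average(
    "22:42:14 up  2:34,  2 users,  load average: 0.00, 0.00, 0.00"))
print(parse_load_average("Load averages: 0.04, 0.03, 0.03"))
```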



Using SAR

The sar utility (System Activity Reporter) is quite popular on HP/UX and is becoming widely available for AIX and Solaris systems.  sar has much of the same functionality as vmstat, but provides additional details.
There are four major flags in sar:
sar -u = to see CPU
sar -w = for swapping
sar -b = for buffer activity
sar -d = for disk usage

sar -w (memory switching and swapping activity)
 
swpin/s        Number of process swapins per second
swpot/s        Number of process swapouts per second
bswin/s        Number of 512-byte units swapped in per second
bswot/s        Number of 512-byte units swapped out per second
pswch/s        Number of process context switches per second
 
 
 
ROOT-/
>sar -w 5 5
 
HP-UX corp-hp1 B.11.00 U 9000/800    08/09/00
 
19:37:57 swpin/s bswin/s swpot/s bswot/s pswch/s
19:38:02    0.00     0.0    0.00     0.0     222
19:38:07    0.00     0.0    0.00     0.0     314
19:38:12    0.00     0.0    0.00     0.0     280
19:38:17    0.00     0.0    0.00     0.0     295
19:38:22    0.00     0.0    0.00     0.0     359
 
Average     0.00     0.0    0.00     0.0     294
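
The Average line can be reproduced from the individual samples, which is a handy check when you post-process saved sar output. The parser below is a sketch that assumes the column layout shown above; parse_sar_w is a hypothetical name.

```python
SAR_W_SAMPLE = """\
19:37:57 swpin/s bswin/s swpot/s bswot/s pswch/s
19:38:02    0.00     0.0    0.00     0.0     222
19:38:07    0.00     0.0    0.00     0.0     314
19:38:12    0.00     0.0    0.00     0.0     280
19:38:17    0.00     0.0    0.00     0.0     295
19:38:22    0.00     0.0    0.00     0.0     359

Average     0.00     0.0    0.00     0.0     294
"""


def parse_sar_w(text):
    """Return one dict per timestamped sample; the Average row is skipped."""
    columns, samples = None, []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        label, values = fields[0], fields[1:]
        if columns is None:
            # The header row is a timestamp followed by column names.
            if ":" in label and not values[0][0].isdigit():
                columns = values
            continue
        if ":" in label and len(values) == len(columns):
            samples.append(dict(zip(columns, map(float, values))))
    return samples


samples = parse_sar_w(SAR_W_SAMPLE)
mean_switches = sum(s["pswch/s"] for s in samples) / len(samples)
print(mean_switches)   # 294.0
```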
 
 
 
sar -u (CPU report)
 
cpu            CPU number (only on a multi-processor system with the -M option)
%usr           user mode
%sys           system mode
%wio           idle with some process waiting for I/O
%idle          otherwise idle
 
 
ROOT-/
>sar -u 2 5
 
HP-UX burleson B.11.00 U 9000/800    08/09/00
 
08:37:06    %usr    %sys    %wio   %idle
08:37:07      43      57       0       0
08:37:08      45      55       0       0
08:37:09      44      56       0       0
08:37:10      44      56       0       0
08:37:11      43      57       0       0
08:37:12      52      48       0       0
08:37:13      49      51       0       0
08:37:14      49      51       0       0
08:37:15      57      43       0       0
08:37:16      65      35       0       0
08:37:17      40      29      12      19
08:37:18      23      20      12      44
08:37:19       0       1       0      99
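
A simple way to read a long sar -u listing is to flag the samples where the CPUs had no idle time at all. The sketch below uses the last five rows of the output above; parse_sar_u and saturated are hypothetical helpers, and treating %idle = 0 as "saturated" is an assumption made for illustration.

```python
SAR_U_SAMPLE = """\
08:37:06    %usr    %sys    %wio   %idle
08:37:15      57      43       0       0
08:37:16      65      35       0       0
08:37:17      40      29      12      19
08:37:18      23      20      12      44
08:37:19       0       1       0      99
"""


def parse_sar_u(text):
    """Return one dict per data row (timestamp plus four integer percentages)."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5 and all(f.isdigit() for f in fields[1:]):
            usr, sys_pct, wio, idle = map(int, fields[1:])
            samples.append({"time": fields[0], "usr": usr, "sys": sys_pct,
                            "wio": wio, "idle": idle})
    return samples


def saturated(samples):
    """Timestamps of samples in which the CPUs were never idle."""
    return [s["time"] for s in samples if s["idle"] == 0]


print(saturated(parse_sar_u(SAR_U_SAMPLE)))   # ['08:37:15', '08:37:16']
```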
 
sar -b (buffer activity report)
 
bread/s        Number of physical reads per second from disk
bwrit/s        Number of physical writes per second to disk
lread/s        Number of reads per second from buffer cache
lwrit/s        Number of writes per second to buffer cache
%rcache        Buffer cache hit ratio for read requests
%wcache        Buffer cache hit ratio for write requests
pread/s        Number of reads per second from character (raw) devices
pwrit/s        Number of writes per second to character (raw) devices
 
 
root>sar -b 1 6
 
HP-UX corp-hp1 B.11.00 U 9000/800    08/09/00
 
19:44:53 bread/s lread/s %rcache bwrit/s lwrit/s %wcache pread/s pwrit/s
19:44:54       0      91     100       9      19      53       0       0
19:44:55       0       0       0       0       5     100       0       0
19:44:56       0       6     100       9       8       0       0       0
19:44:57       0      30     100       9      20      55       0       0
19:44:58       0       1     100       0       3     100       0       0
19:44:59       0       1     100       9       4       0       0       0
 
Average        0      22     100       6      10      39       0       0
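
The two hit ratios can be derived from the logical and physical counts. The sketch below assumes the usual definition, (logical - physical) / logical expressed as a percentage, with zero reported when there is no logical activity or when the result would be negative; that assumption agrees with the sample rows above, but cache_hit_pct is a hypothetical helper rather than anything sar itself provides.

```python
def cache_hit_pct(logical, physical):
    """Buffer cache hit ratio as a whole percentage.

    Assumption: a ratio that would be undefined (no logical I/O) or
    negative (more physical than logical I/O) is reported as 0.
    """
    if logical <= 0:
        return 0
    ratio = (logical - physical) / logical * 100
    return max(0, round(ratio))


# (lwrit/s, bwrit/s) pairs taken from the sar -b output above.
print(cache_hit_pct(19, 9))    # 53
print(cache_hit_pct(20, 9))    # 55
print(cache_hit_pct(8, 9))     # 0
# (lread/s, bread/s) from the first sample.
print(cache_hit_pct(91, 0))    # 100
```

The Average line is computed by sar from unrounded values, so recomputing it from the rounded averages shown (10 and 6) gives 40 rather than the 39 displayed.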

Using sadc

The sadc utility (system activity data collector) is part of the System Activity Report package and can be run from cron to schedule collections of server statistics.
All of the sadc scripts are located in the /usr/lbin/sa directory.  They must be run as root and provide detailed server information.  One of the most popular is sa1:
#! /usr/bin/sh
# @(#) $Revision: 72.3 $
#       sa1.sh
 
DATE=`date +%d`
ENDIR=/usr/lbin/sa
DFILE=/var/adm/sa/sa$DATE
cd $ENDIR
if [ $# = 0 ]
then
        exec $ENDIR/sadc 1 1 $DFILE
else
        exec $ENDIR/sadc $* $DFILE
fi
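
Because sa1 builds the data file name from the day of the month (date +%d), the file for any given date can be located with a one-line helper. This is a sketch only: sa_data_file is a hypothetical name, and the default directory is simply the DFILE location used in the script above.

```python
from datetime import date


def sa_data_file(day, directory="/var/adm/sa"):
    """Return the daily data file path that sa1 would write for this date."""
    return f"{directory}/sa{day:%d}"


print(sa_data_file(date(2000, 8, 9)))    # /var/adm/sa/sa09
print(sa_data_file(date(2000, 7, 31)))   # /var/adm/sa/sa31
```

Since only the day number is used, a file is overwritten when the same day comes round in a later month.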

Using glance to monitor Oracle CPU

For complete details, see my notes on monitoring Oracle with glance.
The glance utility is provided on HP/UX systems to provide a graphical display of server performance.  It displays current CPU, memory, disk and swap consumption, and also reports on the top processes.

Using the vmstat utility to monitor Oracle

The UNIX vmstat utility is especially useful for monitoring the performance of Oracle databases. You'll find vmstat on almost all implementations of UNIX, including Linux. Also see my notes on monitoring Oracle CPU with vmstat and building a CPU monitor for Oracle.
The vmstat utility is the most common UNIX monitoring utility.  It is found on virtually all dialects of UNIX (vmstat is called osview on IRIX), and it quickly displays server values.  These values include:
r = runqueue - When this value exceeds the number of CPUs (on AIX, lsdev -C | grep Proc | wc -l), the server is experiencing a CPU bottleneck.
pi = page in - Any non-zero value suggests that the server is short on memory and that RAM pages are being exchanged with the swap disk.  However, this can also occur when numerous programs are accessing their memory for the first time, so always remember to check the scan rate "sr" column.  If both are non-zero, then you are short on RAM.
sr = scan rate - If we see "sr" rising steadily, we know that the paging daemon is busy searching for memory pages to free.

For AIX and HP/UX, vmstat provides the following CPU values.  These values are expressed as percentages and will sum to 100:
us = user CPU percentage
sy = system CPU percentage
id = idle CPU percentage
wa = wait CPU percentage
When us+sy approaches 100, the CPUs are busy, but not necessarily overloaded.  Only the run queue value indicates CPU overload, and only when "r" exceeds the number of CPUs on the server.
When "wa" values exceed 20, more than 20% of the processing time is spent waiting for a resource, usually I/O.  It is common to see high wa values during backups and exports, but high wa values can also indicate an I/O bottleneck.

>vmstat 3
 
kthr     memory             page              faults        cpu    
----- ----------- ------------------------ ------------ -----------
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0 84283   207   0   1   1  59  174   0 178   40 142 18  4 75  4
 0  0 84283   187   0   4   0   0    0   0 144  294  70  2  1 91  6
 0  0 84283   184   0   0   0   0    0   0 171  740  99  5  2 89  4
 0  0 84283   165   0   0   0   0    0   0 173  193  98  1  8 52 40
 0  0 84283   150   0   3   0   0    0   0 205  615 136  4  2 87  6
 0  0 84283   141   0   1   0   0    0   0 281  935 192  5  0 91  4
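
The three rules of thumb above (r greater than the CPU count, pi and sr both non-zero, wa above 20) can be applied to each vmstat sample mechanically. The sketch below assumes the 17-column AIX layout shown in this output; the column list, the CPU count of 4 and the function names are all hypothetical and would need adjusting for your server.

```python
# Column order of the AIX vmstat output above. "sy" appears twice in the
# header, so the first occurrence (system calls) is renamed here.
COLUMNS = ["r", "b", "avm", "fre", "re", "pi", "po", "fr", "sr", "cy",
           "in", "sy_calls", "cs", "us", "sy", "id", "wa"]

VMSTAT_SAMPLE = """\
 r  b   avm   fre  re  pi  po  fr   sr  cy  in   sy  cs us sy id wa
 0  0 84283   207   0   1   1  59  174   0 178   40 142 18  4 75  4
 0  0 84283   187   0   4   0   0    0   0 144  294  70  2  1 91  6
 0  0 84283   165   0   0   0   0    0   0 173  193  98  1  8 52 40
"""


def parse_vmstat(text):
    """Return one dict per numeric row that has the expected column count."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == len(COLUMNS) and all(f.isdigit() for f in fields):
            samples.append(dict(zip(COLUMNS, map(int, fields))))
    return samples


def diagnose(sample, cpu_count):
    """Apply the rules of thumb from the text to a single sample."""
    notes = []
    if sample["r"] > cpu_count:
        notes.append("CPU bottleneck")
    if sample["pi"] > 0 and sample["sr"] > 0:
        notes.append("short on RAM")
    if sample["wa"] > 20:
        notes.append("high I/O wait")
    return notes


for sample in parse_vmstat(VMSTAT_SAMPLE):
    print(diagnose(sample, cpu_count=4))   # cpu_count=4 is an assumed value
```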

vmstat for Solaris

The display format for vmstat in Solaris is quite different from that of AIX and HP/UX.  In Solaris the "vmstat -n" command is used to display server stats.  The relevant columns are:

pi = page-ins
us = CPU user time
sy = CPU system time
id = CPU idle time
r = runqueue - If this exceeds the number of CPUs then you are CPU-bound

In the example below, we sample an overstressed Oracle server.  Note that us + sy = 100, and that the r value far exceeds the 32 CPUs on this server:



root> vmstat -n 1


memory                     page                          faults
     avm    free   re   at    pi   po    fr   de    sr     in     sy    cs 
   41128  118400 4424   92     0   11    90    0     0   1124  77234  4113
CPU
    cpu          procs
 us sy id    r     b     w
 49 51  0  100     2     0
 46 54  0
 49 51  0
 42 58  0
   54719  115379 4508  105     0   10   102    0     0   1107  78021  3912
 44 56  0   67     2     0
 56 44  0
 58 42  0
 45 55  0
   54719  118479 4305  113     0   10   116    0     0   1070  75044  4085
 41 59  0   67     2     0
 56 44  0
 50 50  0
 50 50  0
   54719  125113 4088  124     0   10   124    0     0   1055  75103  4520
 52 48  0   67     2     0
 50 50  0
 65 35  0
 53 47  0
   54719  141189 3659  116     0    9   127    0     0   1065  71355  4882
 60 40  0   67     2     0
 60 40  0
 61 39  0
 61 39  0
   54719  178306 3113  104     0    9   309    0     0   1075  64446  4741
  4 15 81   67     2     0
  9 13 78
 16  9 75
 10  9 81
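
In this layout the run queue appears only on the first CPU line of each sample, the one that carries six numbers (us sy id r b w). The sketch below picks out those lines and compares r with the CPU count; it assumes exactly the layout shown above, and run_queue_samples is a hypothetical helper.

```python
VMSTAT_N_SAMPLE = """\
   41128  118400 4424   92     0   11    90    0     0   1124  77234  4113
 us sy id    r     b     w
 49 51  0  100     2     0
 46 54  0
   54719  115379 4508  105     0   10   102    0     0   1107  78021  3912
 44 56  0   67     2     0
 56 44  0
"""


def run_queue_samples(text):
    """Return the us/sy/id/r values from each six-number CPU line."""
    result = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 6 and all(f.isdigit() for f in fields):
            us, sy, idle, r, _blocked, _swapped = map(int, fields)
            result.append({"us": us, "sy": sy, "id": idle, "r": r})
    return result


CPU_COUNT = 32  # the CPU count quoted in the text for this server

for sample in run_queue_samples(VMSTAT_N_SAMPLE):
    print(sample["r"], sample["r"] > CPU_COUNT)
```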




sangeethakumar
